We have the technology, today, to build a consumer-grade, non-invasive, real-world mind reading device. What's more, I think it's possible to accurately replicate the experience of the Pensieve from Harry Potter, a magical device whereby multiple people can visit the memories of another person in a VR-like environment. This opens the door to two skill sets found exclusively in the wizarding world, Legilimency (mind reading) and Occlumency (defense against mind reading), which suddenly become accessible to us in the real world in the form of prompt engineering.
Before I lay out the road map, I want to establish some of the building blocks and inspiration for how we're going to get there. There are a few ideas you need to be familiar with if this is going to make any sense.
Magical Inspiration
The Harry Potter Wiki describes the Pensieve as such:
The Pensieve was enchanted to recreate memories so that they become re-liveable, taking every detail stored in the subconscious and recreating it faithfully. Once memories were siphoned into the Pensieve, a witch or wizard could enter them by dipping their head in the water and view the memory from a third-person point of view, almost as if travelling into the past. However, since it's just a memory and not actual time travel, the user of the Pensieve was non-participant, meaning that their presence had no effect on anything that happened in the memory.
In the books, there are numerous times when memories are extracted from people to be put into the Pensieve. Occasionally, multiple people visit the same memory together. The case of Horace Slughorn, ashamed of having told Tom Riddle about Horcruxes, shows that it is even possible to tamper with your own memories in a way that obfuscates what you actually remember, presenting yourself in a better light. The Pensieve is often used in the context of criminal investigations or to provide an additional look into someone's mind for better context.
Aside from the Pensieve, there are two areas of study in the wizarding world called Legilimency and Occlumency. Again, I'll let the Harry Potter wiki explain the two:
Legilimency was the act of magically navigating through the many layers of a person's mind and correctly interpreting one's findings. A person who practised this art was known as a Legilimens. Muggles may have called this "mind-reading", but practitioners disdained the term as naive. The opposite of Legilimency was Occlumency, which was used to shield one's mind from the invasion and influence of a Legilimens.
In the books, people can study and train to get better at Legilimency and Occlumency, not unlike any other scholarly subject. The better you are at navigating the layers and complexities of someone's mind, the more useful information you can extract from it. The same is true of Occlumency, but for defense. Even though these two fields of study are separate from the Pensieve, it is advantageous for us to treat "defending against an intruder trying to extract thoughts from you" and "intentionally obfuscating your memories that someone is trying to view" as the same thing.
Technical Inspiration
Most people think about the technology behind ChatGPT as being exclusively for text. You give the AI text, it gives you text back (or in some cases, images, video, or audio). Under the hood, these words are converted into numbers (simply by saying that "The = 1", "quick = 2", etc.), lots of fun matrix math is performed on those numbers, and the resulting list of numbers is converted back into other words.
But really, it's just a sequence of input data that is converted into numbers, which is then converted into output data on the other side. Literally anything that can be represented as a sequence of data points can be converted into any other sequence of data points, given enough training data to learn an appropriate conversion. It could be "text to image", "image to driving commands", "protein chain to 3D structure", or... "brain signal to fully generated 3D VR world".
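To make that "words become numbers" step concrete, here's a toy sketch in Python. This is purely illustrative and nothing like a real tokenizer; production models use learned subword vocabularies (e.g. BPE), not a hand-written dictionary.

```python
# A toy illustration of the "words become numbers" idea.
vocab = {"the": 1, "quick": 2, "brown": 3, "fox": 4}

def encode(text: str) -> list[int]:
    # Words in, numbers out
    return [vocab[word] for word in text.lower().split()]

def decode(ids: list[int]) -> str:
    # Numbers in, words out
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)

print(encode("The quick brown fox"))  # [1, 2, 3, 4]
print(decode([1, 2, 3, 4]))           # "the quick brown fox"
```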
There is a company called OpenBCI that sells consumer grade EEG (Electroencephalography) development kits.
The kits are a little pricey, but from what I've seen online, they have a fairly well-established software suite that lets you iterate quickly. What you see below is a screenshot of their software reading the EEG data from a user wearing an EEG cap. They have library integrations for many languages that let you work with the EEG data programmatically.
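To give a taste of what "programmatically" looks like, here's a minimal sketch using BrainFlow, the open-source library OpenBCI supports (pip install brainflow). The board ID assumes an 8-channel Cyton, and the serial port is a placeholder that varies by machine.

```python
import time
from brainflow.board_shim import BoardShim, BrainFlowInputParams, BoardIds

params = BrainFlowInputParams()
params.serial_port = "/dev/ttyUSB0"  # placeholder; depends on your machine
board = BoardShim(BoardIds.CYTON_BOARD, params)

board.prepare_session()
board.start_stream()
time.sleep(5)                   # let 5 seconds of data accumulate
data = board.get_board_data()   # rows are channels, columns are samples
board.stop_stream()
board.release_session()

# Pull out just the EEG rows (8 channels on the Cyton)
eeg_channels = BoardShim.get_eeg_channels(BoardIds.CYTON_BOARD)
eeg = data[eeg_channels]
print(eeg.shape)
```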
Note how the EEG data consists of 8 different signal streams, or "channels". This will be important for our prototyping stages later.
Putting it all together
In short, this is the loop we are creating.
Starting off, the user would need to wear one of those EEG caps so we could record their brain signals. They would also need to wear a VR headset. As noted in the above screenshot, there is an EEG/VR headset combo listed for $28K, but I'm pretty sure a solution could be cobbled together for much cheaper than that (in the range of ~$3K-5K).
The brain would generate EEG signals as a series of channels. Those signals would be sent to the VR headset in real time. From there, the channels would be fed into the neural network to generate the 3D environment.
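In code, the whole loop is just a few stages chained together. Everything below is a placeholder; none of these components exist yet, but it shows how simple the outer structure is.

```python
# A high-level sketch of the closed loop. eeg_source, model, and headset
# are hypothetical objects standing in for the three real components.
def pensieve_loop(eeg_source, model, headset):
    while True:
        channels = eeg_source.read_window()     # 8-channel EEG window
        scene = model.generate_scene(channels)  # brain signal -> 3D world
        headset.render(scene)                   # user reacts to the scene,
                                                # which produces new EEG data
```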
Generating points in 3D space isn't a new idea. There is a company called Zoo that is building text-to-CAD software, and Meshy, which does text-to-3D-model and image-to-3D-model. Once we generate the environment around the user, their reaction to it and interaction with it influences their brain to generate new EEG data, creating a self-sustaining cycle.
Data Collection and Curation
The loop as a whole isn’t that tough to build, but we’re missing a very important part: a fully trained neural network that can do the proper translation between brain signals and a 3D VR environment. To do that we need data — and lots of it. Without even looking it up, I can tell you that there isn't enough training data out there. It's just too obscure of a modality translation.
The play here would be to walk around (or pay people to walk around) with an EEG cap on and a depth camera strapped to their head, GoPro style, along with a way to either save the data locally or stream it to the cloud, and just record everything they do 24/7. Each camera frame would be linked to the EEG data recorded at the same moment.
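Here's a minimal sketch of that frame-to-EEG linking, assuming both streams carry timestamps; the half-second window is an arbitrary choice on my part.

```python
import numpy as np

def pair_frames_with_eeg(frame_times, eeg_times, eeg_data, window_s=0.5):
    """frame_times: (F,) seconds, eeg_times: (S,) seconds, eeg_data: (8, S)."""
    eeg_times = np.asarray(eeg_times)
    pairs = []
    for t in frame_times:
        # Keep the EEG samples from the half second leading up to this frame
        mask = (eeg_times >= t - window_s) & (eeg_times <= t)
        pairs.append(eeg_data[:, mask])
    return pairs
```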
The nice part about this system is that we already have a high-level understanding of which brain waves correspond to which mental states, which means we can more or less label the data in an unsupervised manner (see the sketch after the list below):
Delta:
Frequency Range: 0.1 - 3.5 Hz
Characteristics: Slowest waves, highest amplitude
Associated States: Deep sleep, unconsciousness
Theta:
Frequency Range: 3.5 - 7.5 Hz
Characteristics: Slow activity
Associated States: Creativity, intuition, daydreaming, subconscious mind
Alpha:
Frequency Range: 8 - 12 Hz
Characteristics: Relaxed, alert state
Associated States: Relaxation, meditation, mental coordination
Beta:
Frequency Range: 13 - 30 Hz
Characteristics: Fast activity
Associated States: Alertness, active thinking, anxiety
Gamma:
Frequency Range: > 30 Hz
Characteristics: High frequency
Associated States: Information processing, consciousness
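As promised above, here's a minimal sketch of that unsupervised labeling: compute the power in each band over a short EEG window and take the dominant band as a crude state label. It assumes a (channels × samples) array sampled at 250 Hz (the OpenBCI Cyton's default rate), and the labeling rule itself is a deliberate oversimplification.

```python
import numpy as np
from scipy.signal import welch

BANDS = {
    "delta": (0.1, 3.5), "theta": (3.5, 7.5), "alpha": (8, 12),
    "beta": (13, 30), "gamma": (30, 100),
}

def band_powers(eeg: np.ndarray, fs: int = 250) -> dict:
    # Power spectral density over 2-second windows, averaged across channels
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    psd = psd.mean(axis=0)
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum()
            for name, (lo, hi) in BANDS.items()}

def rough_label(eeg: np.ndarray, fs: int = 250) -> str:
    # Dominant band as a crude, unsupervised state label
    powers = band_powers(eeg, fs)
    return max(powers, key=powers.get)
```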
The process of collecting data would, of course, be very expensive and labor intensive. We need a cheaper way of prototyping this. Thankfully, video offers an analogous setup we can use to prove out the concept. Rather than inputting EEG data and outputting a 3D world, we can use a fast Fourier transform (FFT) to deconstruct a video's audio signal into 8 component bands and pair those with the image of the frame the audio was pulled from.
Over enough samples, this could let us generate images from audio. This would prove out the viability of the idea without the need to do large scale data collection of brain waves.
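Here's a minimal sketch of that audio featurization, assuming a mono audio track as a NumPy array with a known sample rate and frame rate; the log-spaced band edges are my own arbitrary choice, picked to mimic the 8 EEG channels.

```python
import numpy as np

def frame_audio_features(audio: np.ndarray, sample_rate: int, fps: int,
                         frame_idx: int, n_bands: int = 8) -> np.ndarray:
    # Grab the slice of audio that lines up with this video frame
    samples_per_frame = sample_rate // fps
    start = frame_idx * samples_per_frame
    window = audio[start:start + samples_per_frame]

    # FFT magnitude spectrum of that slice
    spectrum = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)

    # Sum the spectrum into 8 log-spaced bands from 20 Hz to Nyquist
    edges = np.geomspace(20, sample_rate / 2, n_bands + 1)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])
```

Each frame then gets an 8-number "signal" vector as its training input, structurally the same shape as the EEG channels.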
Use Cases
And now for the fun part. Assuming this can be created, there are a few really cool capabilities this unlocks. First of all, we would be able to record memories as easily as we can record audio or video today. One or more people could experience the memories of another as they remember it. This could be done via a first-person perspective or in third person via a Gaussian splat.
Imagine being able to see your grandparents' wedding as they saw it themselves, not just a set of black and white images. Or maybe we could better prepare soldiers by placing them in the memories of senior officers who have been on the battlefield. Psychologists could live through the experiences of their patients, dramatically reducing the time it takes to really understand how to help them. Journalists could experience the memories of their sources first hand. Avid readers could experience the vision an author had in their head when they wrote a novel. We could record and relive our own dreams as a multiplayer experience. The key here is that we don't need a device attached to the person to record the initial experience -- the person's brain is the recorder.
The best prompt engineers of our generation might also see viable career paths with a technology like this. Imagine you were a criminal investigator, except instead of just listening to what the person says and seeing their body language, you had a direct view into the way they interpret the world. Good prompt engineers build up intuitions around language models. The best ones can get any kind of output they need, no matter how much safety post-training the LLM has gone through. This is very much akin to Legilimency, if you'll grant me a slightly abstract interpretation of prompt engineering.
Safety training an LLM ≈ someone who is resistant to questioning ≈ Occlumency
All three of these things can be thought of in a similar light. If someone does not want to reveal their true thoughts, they can train themselves to think differently when someone (like a skilled prompt engineer) is trying to elicit specific memories from them.
For example, if someone stole a pink elephant, the prompt engineer could ask the thief to think about a pink elephant, with the goal of eliciting the specific set of brain waves that would generate their first-person perspective of the crime itself. A skilled Occlumens under questioning would simply not elicit those memories. Better yet, some would even be able to modify their own memories during questioning.
This could also be used in psychotherapy. If someone has a fear of spiders, we could create a virtual environment that uses the closed-loop feedback system to condition the user to no longer be afraid of spiders. The same could be said for trauma and PTSD. We could specify any target EEG state to optimize for, and the system would work towards it. We could improve people's mental health with far more tact and effectiveness than we do today.
On the video game side of things, we could make a VR horror game based on this technology. Most horror games are only okay at being horrifying. Low-grade games rely on jump scares rather than creating a true sense of terror. Good ones use ambiance and the unknown to create an ever-present sense of unease. Well, it turns out we can actually measure somebody's sense of fear via brain signals (specifically theta waves). Using the closed-loop system, if we develop a reward model that optimizes for fear, we could create the objectively scariest game ever. Every interaction in the game would be customized to the user and would constantly evolve during gameplay as the user becomes accustomed to new fears.
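As a sketch, such a "reward model" could be as simple as scoring each live EEG window by its relative theta-band power and letting the game mutate content toward whatever maximizes that score. Treating theta power alone as "fear" is a big assumption on my part; this just shows the shape of the signal.

```python
import numpy as np
from scipy.signal import welch

def fear_reward(eeg: np.ndarray, fs: int = 250) -> float:
    """Score a (channels x samples) EEG window by relative theta power."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    psd = psd.mean(axis=0)  # average across channels
    theta = psd[(freqs >= 3.5) & (freqs < 7.5)].sum()
    return float(theta / psd.sum())  # fraction of power in the theta band
```

The game engine would then keep whatever content changes push this score up, per player, in real time.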
Caveats and Side Notes
With all that said, and as excited as I am, there are a few things I want to mention regarding this approach.
We have the technology to write directly to the brain, but the methods are often invasive, imprecise, and limited. Doing it non-invasively, via our ears and eyeballs, makes it a more consumer-friendly product.
fMRI offers much higher spatial resolution than an EEG cap but is much slower. Since we are looking for real-time feedback, and I don't know of a portable fMRI system with a software suite rich enough to hook into a game engine, EEG is used instead. The resolution of the EEG data is also the main reason I would not advise this be used in criminal investigations: reading signals from outside the skull is simply too lossy and subject to external noise at this time. This may improve with higher quality devices.
I have no idea how this would perform for people with aphantasia, the inability to visualize things in their head.
It might be possible to use this technology to better communicate with animals, Dr. Dolittle style, but it's not clear to me that's actually possible; it would need much further investigation.