Explaining Embeddings To My Uber Driver

Widely known as fotiecodes, I'm an open source enthusiast, software developer, mentor, and SaaS founder. I'm passionate about creating software solutions that are scalable and accessible to all, and I am dedicated to building innovative SaaS products that empower businesses to work smarter, not harder.
Lisbon. If you have ever been here, you know the streets are a wild mix of steep hills, narrow cobblestone alleys, and confusing one-way roads. Navigating this city takes serious skill. The yellow trams rattle past, tourists spill onto the roads looking for the best bakeries, and drivers somehow weave through it all without losing their minds.
Yesterday, I was riding in an Uber heading toward the Alfama district. My driver, João, was expertly dodging a parked delivery van while smoothly shifting gears. To break the silence, he asked the classic question.
"So, what do you do?"
I said, "What do you mean? As in, how?"

"For work," he said.

"I work in tech," I said. "Mostly dealing with artificial intelligence."
João looked at me in the rearview mirror. "AI? Like ChatGPT? Man, I tried that the other day. It feels like it actually understands what I am saying. How does it do that? Does it read a dictionary?"
It was a fair question. How do we teach a machine made of silicon and wire to understand human language? The secret sauce comes down to a concept called "embeddings." I decided to give João the non-technical breakdown, using the very city we were driving through.
Computers only speak math
First, we have to state a basic truth: computers are basically very fast calculators. They do not understand English. They do not understand Portuguese. They only understand numbers.
If you type the word "dog" into a computer, it has no idea what a dog is. It does not know about barking, fur, or fetching tennis balls. It just sees a sequence of characters: d, o, and g. If you type the word "puppy", the computer thinks it is a completely unrelated thing.
For a long time, software engineers tried to solve this by making massive lists. They would try to tag words manually. But language is way too messy for that. People use slang. Words have double meanings.
I pointed out the window at a cafe. "João, if I want to tell a computer what a 'pastel de nata' is, I cannot just give it the dictionary definition. It will not know what 'pastry' or 'custard' means either. It is a dead end."
Mapping meaning to coordinates
He nodded. "So how do you fix it?"
I asked him how his GPS works. He pointed to his phone mounted on the dashboard. "It uses satellites to find my latitude and longitude."
Exactly. Two simple numbers can describe any location on the planet. If you know the coordinates of Lisbon and the coordinates of Madrid, you can use basic math to figure out how far apart they are.
What if we did the exact same thing for words?
This is what an embedding is. It is a way to turn words, sentences, or even images into coordinates on a map. But instead of a map of the physical world, it is a map of meaning.
Let us imagine a very simple map with only two directions. The left-right direction measures how "animal-like" something is. The up-down direction measures how "domesticated" something is.
If we map the word "cat", we put it high up on the "animal" scale and high up on the "domesticated" scale. If we map the word "tiger", it gets a high "animal" score, but a very low "domesticated" score. If we map the word "car", it gets a zero on the "animal" scale.
Now, the computer does not need to know what a cat or a tiger is. It just looks at the coordinates. It sees that "cat" and "tiger" are close together on the map. It mathematically knows they are related. Embeddings do the exact same thing, but instead of mapping a physical city, they map meaning. We take a word and assign it a set of numbers. We call this list of numbers a vector. It acts just like a set of GPS coordinates for an idea.
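If you like seeing the idea in code, here is a tiny sketch of that two-direction map. The coordinates are completely made up for the sake of illustration; a real model learns them from data.

```python
import math

# A toy "meaning map" with two hand-picked directions:
# (how animal-like, how domesticated). Purely illustrative numbers.
words = {
    "cat":   (0.9, 0.9),
    "tiger": (0.9, 0.1),
    "car":   (0.0, 0.2),
}

def distance(a, b):
    """Straight-line distance between two points on the map."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

print(distance(words["cat"], words["tiger"]))  # short: related concepts
print(distance(words["cat"], words["car"]))    # long: unrelated concepts
```

The computer never learns what fur is. It just sees that the "cat" point and the "tiger" point sit close together, and the "car" point sits far away.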
Neighborhoods of meaning
He stopped at a crosswalk to let a group of tourists pass. "So every word has a location?" he asked.
"Yes. And here is the cool part. On a physical map, places that are close together share a neighborhood. If you are standing at a bakery in Belém, you are very close to the Jerónimos Monastery. In the mathematical space of an embedding, words that share a similar meaning are placed right next to each other."
When you ask an AI a question, it calculates the distance between these coordinates to figure out how related two ideas are. If the distance is short, the computer knows the concepts are strongly related.
Adding more dimensions
He accelerated past the Praça do Comércio. "But words have different meanings," he noted. "An apple is a fruit, but it is also the company that made your phone."
"You are absolutely right," I said. "That is the exact problem scientists faced. If we only use two numbers, like a flat map, we do not have enough room to capture all the complexity of human language."
To fix this, an embedding does not just use two numbers. It uses hundreds of numbers. Sometimes thousands.
Instead of a flat map, imagine a space with hundreds of different directions. One direction might measure how "fluffy" a word is; dogs and pillows would score high there. Another direction might measure whether a word is an animal or a machine, and another whether it is a food.
By plotting words across hundreds of different axes, the computer captures incredible detail. Because of this, the math works out in fascinating ways. If you take the coordinates for "king", subtract the coordinates for "man", and add the coordinates for "woman", the computer lands almost exactly on the coordinates for "queen".
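Here is that famous bit of arithmetic as a toy sketch. The four-number vectors are hand-picked just to make the geometry visible; real models learn hundreds of dimensions from text, and the pattern emerges on its own.

```python
import math

# Hand-picked toy vectors. Imagined axes: (royalty, maleness, femaleness, person).
# A real model would learn these numbers; these exist only to show the geometry.
vectors = {
    "king":  [0.9, 0.9, 0.1, 1.0],
    "man":   [0.1, 0.9, 0.1, 1.0],
    "woman": [0.1, 0.1, 0.9, 1.0],
    "queen": [0.9, 0.1, 0.9, 1.0],
}

# king - man + woman, computed component by component
result = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Which known word sits closest to where we landed?
nearest = min(vectors, key=lambda word: distance(vectors[word], result))
print(nearest)  # "queen"
```

Subtracting "man" removes the maleness direction, adding "woman" puts femaleness back in, and the point we land on is right next to "queen".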
The computer still does not know what a queen is. It just knows the geometry.
Why this matters
"Okay," he said, turning onto my street. "That makes sense. But what do we actually use this for?"
I explained that this technology powers almost everything we do online today. Think about searching for something on the internet. Ten years ago, search engines just matched exact letters. If you typed "sneakers" into a search bar, the system only looked for websites containing that exact word.
Today, search engines use embeddings. When you type "sneakers", the system turns your word into a mathematical coordinate. It then looks for all the documents that live in that exact same neighborhood. It will happily show you pages about "running shoes" or "trainers" because those words share almost the exact same location in the computer's mathematical brain.
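A toy sketch of that kind of search: turn the query into a vector, then rank documents by how closely their vectors point in the same direction. The three-number "embeddings" and document titles here are invented for illustration; a real engine gets its vectors from a trained model and searches millions of pages.

```python
import math

# Hypothetical document embeddings; a real system would compute these
# with a trained model, not by hand.
docs = {
    "Best running shoes of 2024": [0.8, 0.1, 0.1],
    "How to bake custard tarts":  [0.0, 0.9, 0.1],
    "Lisbon tram schedules":      [0.1, 0.1, 0.9],
}
query_vec = [0.9, 0.0, 0.1]  # pretend embedding for the query "sneakers"

def cosine(a, b):
    """Cosine similarity: near 1.0 means pointing the same way in meaning-space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

best = max(docs, key=lambda title: cosine(docs[title], query_vec))
print(best)  # the running-shoes page wins, without containing "sneakers"
```

Notice that the winning page never mentions the word "sneakers". It wins purely because its vector lives in the same neighborhood as the query's.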
This powers modern search engines, product recommendations, and the massive language models that everyone is talking about right now. By converting our messy, complicated languages into clean mathematical coordinates, we finally gave machines a way to process ideas.
Reaching the destination
We pulled up at my stop. The Uber app beeped on João's phone to signal the end of the ride.
"So," he said, putting the car in park. "It is just map reading. The computer builds a map of the dictionary, and related words live on the same street."
"You nailed it," I said, opening the door. No magic involved. Just high school geometry scaled up to a massive degree, happening millions of times a second.
I grabbed my stuff, said my goodbyes, and stepped out into the Lisbon sun. Next time I need to understand a complicated tech concept, I might just try explaining it in the back of an Uber, haha.



