
Embeddings: The Unsung Hero of Modern AI

6 min

Why understanding vector spaces is more important than understanding GPT.

Everyone's talking about GPT. But if you want to actually build useful AI systems, you need to understand embeddings first.

What Even Is an Embedding?

An embedding is just a list of numbers that represents something. A word. A sentence. An image. A user's preferences. Anything, really.

The magic is that similar things end up with similar numbers.
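Here's a minimal sketch of that idea. The three-dimensional vectors below are made up for illustration (real embeddings have hundreds of dimensions and come from a trained model), but the measurement, cosine similarity, is the standard one:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction,
    # 0.0 means unrelated, -1.0 means opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-picked toy "embeddings":
coffee   = [0.9, 0.1, 0.3]
espresso = [0.8, 0.2, 0.4]
bicycle  = [0.1, 0.9, 0.2]

print(cosine_similarity(coffee, espresso))  # high: similar meaning
print(cosine_similarity(coffee, bicycle))   # low: unrelated
```

"Similar things end up with similar numbers" just means the first score comes out much higher than the second.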

Why This Matters

Let me show you with a real example. Say you're building a search system (I've done this a few times).

  • Traditional search: User types "best coffee shops" → you look for documents containing those exact words.
  • Embedding search: User types "best coffee shops" → you convert that to numbers → find documents with similar numbers → return results about "top cafes," "great espresso places," and "where to get good coffee."

The second approach *understands meaning*. That's the difference.
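The embedding approach fits in a few lines. One caveat: the `embed` function here is a stand-in that just hashes words into a vector so the example runs without downloading anything. A real system would use a trained model (e.g. a sentence-transformers model), which is what actually makes "top cafes" land near "best coffee shops":

```python
import numpy as np

def embed(text, dim=64):
    # Placeholder embedding: hash each word into a slot, then normalize.
    # Swap in a real text-embedding model for semantic matching.
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

docs = ["top cafes in town", "great espresso places", "how to fix a bicycle"]
doc_vecs = np.stack([embed(d) for d in docs])

def search(query, k=2):
    q = embed(query)
    scores = doc_vecs @ q                 # cosine similarity (unit vectors)
    best = np.argsort(scores)[::-1][:k]   # highest scores first
    return [(docs[i], float(scores[i])) for i in best]

print(search("great espresso"))
```

The whole pipeline is: embed the documents once, embed each query, rank by similarity. Everything else is plumbing.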

The Three Laws of Embeddings

1. **Similar things cluster together** - "king" and "queen" are close in embedding space
2. **Relationships are preserved** - king - man + woman ≈ queen (yes, really)
3. **Context changes everything** - "bank" the financial institution vs "bank" of a river have different embeddings
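Law 2 is worth seeing as arithmetic. The 2-d vectors below are chosen by hand to mimic the famous word2vec analogy (one axis for "royalty," one for "maleness"); real embeddings learn this structure from data:

```python
import numpy as np

vectors = {
    "king":  np.array([0.9, 0.9]),   # royalty high, maleness high
    "queen": np.array([0.9, 0.1]),   # royalty high, maleness low
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
}

# king - man + woman: remove "maleness", add "femaleness"
result = vectors["king"] - vectors["man"] + vectors["woman"]

# Nearest word to the result, by Euclidean distance:
nearest = min(vectors, key=lambda w: np.linalg.norm(vectors[w] - result))
print(nearest)  # queen
```

Subtracting `man` strips the direction the two words share, and adding `woman` moves along the gender axis, landing on `queen`.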

Practical Tips

  • Use pre-trained embeddings first. Training your own is almost never worth it.
  • Dimension matters. 384 dimensions is usually enough. 1536 is overkill for most tasks.
  • Normalize your vectors. Cosine similarity is your friend.
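On that last tip: a quick sketch of why normalization pays off. Once every vector has unit length, cosine similarity reduces to a plain dot product, which is exactly what vector databases compute fastest:

```python
import numpy as np

def normalize(v):
    # Scale a vector to unit length; leave the zero vector alone.
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

a = normalize(np.array([3.0, 4.0]))   # -> [0.6, 0.8]
b = normalize(np.array([6.0, 8.0]))   # same direction, twice the magnitude

print(a @ b)  # ≈ 1.0: identical direction, maximal cosine similarity
```

Without normalization, `a @ b` would conflate direction (meaning) with magnitude, and longer documents would look "more similar" just for being longer.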
The Punchline

You can build a remarkably powerful AI system with nothing but good embeddings and a vector database. No LLM required.

Sometimes the simplest solution is the best one.
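To make that concrete, here's a toy in-memory stand-in for a vector database, brute-force nearest-neighbour over normalized vectors. (Real systems such as FAISS or pgvector add indexing so this scales past a few hundred thousand items; the class name and three-dimensional vectors are mine, purely for illustration.)

```python
import numpy as np

class TinyVectorDB:
    """Brute-force vector store: add unit vectors, query by dot product."""

    def __init__(self):
        self.items = []  # list of (item_id, unit vector) pairs

    def add(self, item_id, vec):
        v = np.asarray(vec, dtype=float)
        self.items.append((item_id, v / np.linalg.norm(v)))

    def query(self, vec, k=3):
        q = np.asarray(vec, dtype=float)
        q = q / np.linalg.norm(q)
        scored = [(item_id, float(v @ q)) for item_id, v in self.items]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

db = TinyVectorDB()
db.add("doc-a", [0.9, 0.1, 0.0])
db.add("doc-b", [0.0, 0.9, 0.1])
print(db.query([1.0, 0.0, 0.0], k=1))  # doc-a ranks first
```

Pair something like this with a real embedding model and you have a working semantic search system, no LLM in sight.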

margin scribbles:

this is the stuff they should teach in school

the king-queen thing blew my mind when I first learned it

thanks for reading!

→ found this useful? let me know at hello@meghavi.me
