
Embeddings: The Unsung Hero of Modern AI

6 min

Why understanding vector spaces is more important than understanding GPT.

Everyone's talking about GPT. But if you want to actually build useful AI systems, you need to understand embeddings first.

What Even Is an Embedding?

An embedding is just a list of numbers that represents something. A word. A sentence. An image. A user's preferences. Anything, really.

The magic is that similar things end up with similar numbers.
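Here's a minimal sketch of that idea. The three-dimensional vectors below are made up for illustration (real embeddings have hundreds of dimensions and come from a trained model), but the measurement, cosine similarity, is the standard one:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction,
    # 0.0 means unrelated, -1.0 means opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-picked toy "embeddings":
coffee   = [0.9, 0.1, 0.3]
espresso = [0.8, 0.2, 0.4]
bicycle  = [0.1, 0.9, 0.2]

print(cosine_similarity(coffee, espresso))  # high: similar meaning
print(cosine_similarity(coffee, bicycle))   # low: unrelated
```

"Similar things end up with similar numbers" just means the first score comes out much higher than the second.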

Why This Matters

Let me show you with a real example. Say you're building a search system (I've done this a few times).

  • Traditional search: User types "best coffee shops" → you look for documents containing those exact words.
  • Embedding search: User types "best coffee shops" → you convert that to numbers → find documents with similar numbers → return results about "top cafes," "great espresso places," and "where to get good coffee."

The second approach *understands meaning*. That's the difference.
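The embedding approach fits in a few lines. One caveat: the `embed` function here is a stand-in that just hashes words into a vector so the example runs without downloading anything. A real system would use a trained model (e.g. a sentence-transformers model), which is what actually makes "top cafes" land near "best coffee shops":

```python
import numpy as np

def embed(text, dim=64):
    # Placeholder embedding: hash each word into a slot, then normalize.
    # Swap in a real text-embedding model for semantic matching.
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

docs = ["top cafes in town", "great espresso places", "how to fix a bicycle"]
doc_vecs = np.stack([embed(d) for d in docs])

def search(query, k=2):
    q = embed(query)
    scores = doc_vecs @ q                 # cosine similarity (unit vectors)
    best = np.argsort(scores)[::-1][:k]   # highest scores first
    return [(docs[i], float(scores[i])) for i in best]

print(search("great espresso"))
```

The whole pipeline is: embed the documents once, embed each query, rank by similarity. Everything else is plumbing.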

The Three Laws of Embeddings

1. **Similar things cluster together** - "king" and "queen" are close in embedding space
2. **Relationships are preserved** - king - man + woman ≈ queen (yes, really)
3. **Context changes everything** - "bank" the financial institution vs "bank" of a river have different embeddings
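Law 2 is worth seeing as arithmetic. The 2-d vectors below are chosen by hand to mimic the famous word2vec analogy (one axis for "royalty," one for "maleness"); real embeddings learn this structure from data:

```python
import numpy as np

vectors = {
    "king":  np.array([0.9, 0.9]),   # royalty high, maleness high
    "queen": np.array([0.9, 0.1]),   # royalty high, maleness low
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
}

# king - man + woman: remove "maleness", add "femaleness"
result = vectors["king"] - vectors["man"] + vectors["woman"]

# Nearest word to the result, by Euclidean distance:
nearest = min(vectors, key=lambda w: np.linalg.norm(vectors[w] - result))
print(nearest)  # queen
```

Subtracting `man` strips the direction the two words share, and adding `woman` moves along the gender axis, landing on `queen`.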

Practical Tips

  • Use pre-trained embeddings first. Training your own is almost never worth it.
  • Dimension matters. 384 dimensions is usually enough. 1536 is overkill for most tasks.
  • Normalize your vectors. Cosine similarity is your friend.
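On that last tip: a quick sketch of why normalization pays off. Once every vector has unit length, cosine similarity reduces to a plain dot product, which is exactly what vector databases compute fastest:

```python
import numpy as np

def normalize(v):
    # Scale a vector to unit length; leave the zero vector alone.
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

a = normalize(np.array([3.0, 4.0]))   # -> [0.6, 0.8]
b = normalize(np.array([6.0, 8.0]))   # same direction, twice the magnitude

print(a @ b)  # ≈ 1.0: identical direction, maximal cosine similarity
```

Without normalization, `a @ b` would conflate direction (meaning) with magnitude, and longer documents would look "more similar" just for being longer.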
The Punchline

You can build a remarkably powerful AI system with nothing but good embeddings and a vector database. No LLM required.

Sometimes the simplest solution is the best one.
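To make that concrete, here's a toy in-memory stand-in for a vector database, brute-force nearest-neighbour over normalized vectors. (Real systems such as FAISS or pgvector add indexing so this scales past a few hundred thousand items; the class name and three-dimensional vectors are mine, purely for illustration.)

```python
import numpy as np

class TinyVectorDB:
    """Brute-force vector store: add unit vectors, query by dot product."""

    def __init__(self):
        self.items = []  # list of (item_id, unit vector) pairs

    def add(self, item_id, vec):
        v = np.asarray(vec, dtype=float)
        self.items.append((item_id, v / np.linalg.norm(v)))

    def query(self, vec, k=3):
        q = np.asarray(vec, dtype=float)
        q = q / np.linalg.norm(q)
        scored = [(item_id, float(v @ q)) for item_id, v in self.items]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

db = TinyVectorDB()
db.add("doc-a", [0.9, 0.1, 0.0])
db.add("doc-b", [0.0, 0.9, 0.1])
print(db.query([1.0, 0.0, 0.0], k=1))  # doc-a ranks first
```

Pair something like this with a real embedding model and you have a working semantic search system, no LLM in sight.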

margin scribbles:

this is the stuff they should teach in school

the king-queen thing blew my mind when I first learned it

thanks for reading!

→ found this useful? let me know at hello@meghavi.me
