Embeddings

Vector representations of tokens in high-dimensional space.

Overview

Embeddings map discrete token IDs to dense, learned vectors; during training the geometry of these vectors comes to reflect meaning, so semantically related tokens end up near one another in the space.
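
As a rough illustration, nearness can be measured with cosine similarity (a minimal sketch assuming PyTorch; the vectors here are synthetic stand-ins, not learned embeddings):

    import torch
    import torch.nn.functional as F

    # Two toy vectors standing in for learned token embeddings. In a trained
    # model, semantically related tokens tend to have high cosine similarity.
    cat = torch.randn(4096)
    kitten = cat + 0.1 * torch.randn(4096)   # hypothetical nearby embedding

    print(F.cosine_similarity(cat, kitten, dim=0))  # close to 1.0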

Key Concepts

Token Embeddings

  • Learned lookup table (the embedding matrix, shape vocab_size × d_model)
  • Each token ID indexes one row and retrieves its vector
  • Typical dimensions: 2048, 4096, 8192 (a lookup sketch follows this list)
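
A minimal lookup sketch, assuming PyTorch (the vocabulary size and token IDs below are made up for illustration):

    import torch
    import torch.nn as nn

    vocab_size, d_model = 50_000, 4096         # d_model: one of the widths above
    embed = nn.Embedding(vocab_size, d_model)  # the learned embedding matrix

    token_ids = torch.tensor([[17, 523, 8]])   # (batch=1, seq_len=3) token IDs
    vectors = embed(token_ids)                 # -> shape (1, 3, 4096)
    print(vectors.shape)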

Layer Normalization

  • Normalizes activations across the feature dimension, stabilizing training
  • Applied before (pre-norm) or after (post-norm) the attention and MLP blocks (see the sketch below)
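
A minimal sketch of the normalization itself, again assuming PyTorch. Per position, the output has roughly zero mean and unit standard deviation over the features (in pre-norm placement a block computes x + sublayer(ln(x))):

    import torch
    import torch.nn as nn

    d_model = 4096
    ln = nn.LayerNorm(d_model)              # normalizes over the last dimension

    h = torch.randn(2, 5, d_model) * 3 + 1  # activations with drifting scale
    out = ln(h)
    print(out.mean(dim=-1))                 # ~0 at every position
    print(out.std(dim=-1))                  # ~1 at every position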

Dropout

  • Regularization: randomly zeroes a fraction of activations during training
  • Reduces overfitting; disabled at inference time (see the sketch below)
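
A minimal sketch assuming PyTorch, showing the train/eval difference (p=0.1 is an arbitrary illustrative rate):

    import torch
    import torch.nn as nn

    drop = nn.Dropout(p=0.1)   # zero 10% of activations during training
    x = torch.ones(2, 8)

    drop.train()
    print(drop(x))   # some entries zeroed, survivors scaled by 1/(1-p)

    drop.eval()
    print(drop(x))   # identity at inference: input passes through unchanged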

Residual Connections

  • Adds a block's input to its output, y = x + f(x), enabling very deep networks
  • Gives gradients an identity path around each block, improving gradient flow (sketch below)
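
A minimal sketch of the y = x + f(x) pattern, assuming PyTorch (ResidualBlock and the linear-plus-GELU sublayer are illustrative choices, not a prescribed architecture):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # y = x + f(x): the identity path lets gradients flow straight through
        def __init__(self, d_model):
            super().__init__()
            self.f = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())

        def forward(self, x):
            return x + self.f(x)

    block = ResidualBlock(64)
    x = torch.randn(1, 64)
    print(block(x).shape)   # (1, 64): same shape, so the addition is valid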