Transformers and Modern LLMs

"Attention Is All You Need" paper, 3Blue1Brown video

The Core Innovation: Attention

Attention is the core of the transformer, an architecture that improves on RNNs (roughly, networks that apply the same layers repeatedly, once per token, carrying a hidden state forward).
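A rough sketch of what attention computes, assuming the scaled dot-product form from the paper, softmax(QK^T / sqrt(d_k))V, with a single head and no masking. The names attention and row_softmax are made up for this note, and plain Python lists stand in for tensors:

```python
import math

def row_softmax(scores):
    """Turn a list of scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (single head, no mask)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # How well this query matches each key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = row_softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

Each output is a weighted average of the values, so every token can draw on every other token in one step instead of passing information along a chain as an RNN does.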

Some terms:

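Temperature: How much randomness is applied when sampling from the probability distribution over next tokens. Higher values flatten the distribution, giving more varied ("creative") but more erratic output; 0 means always picking the most likely token, which is deterministic and conservative.
Top-k: The number of highest-probability tokens to sample from in the next-token distribution.
Top-p: The cumulative probability threshold: sample only from the smallest set of most likely tokens whose probabilities add up to at least p. Do not also set top-k when top-p is set to a value.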
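A minimal sketch of how these three settings could act on a model's raw scores (logits), assuming the common scheme of dividing logits by the temperature before the softmax and then filtering. The function names are hypothetical, and real libraries may apply the filters in a different order or handle ties differently:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def filter_top_k(probs, k):
    """Keep the k most likely tokens, zero out the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

def filter_top_p(probs, p):
    """Keep the smallest set of most likely tokens whose total reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    filtered = [q if i in kept else 0.0 for i, q in enumerate(probs)]
    total = sum(filtered)
    return [q / total for q in filtered]

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Return the index of the sampled next token."""
    if temperature == 0:
        # Temperature 0 is treated as greedy decoding (an assumption here).
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    if top_k is not None:
        probs = filter_top_k(probs, top_k)
    if top_p is not None:
        probs = filter_top_p(probs, top_p)
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

For example, sample_next_token(logits, temperature=0.7, top_p=0.9) samples from a slightly sharpened distribution restricted to the tokens that make up 90% of the probability mass.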

#ai #ai/conceptual