🔑Large language models are probabilistic models of documents, assigning a probability to each sequence of word tokens.
💡An autoregressive language model predicts the next token based on the previous tokens.
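The autoregressive idea can be sketched with a toy next-token table: the probability of a whole sequence factors, by the chain rule, into a product of next-token probabilities conditioned on the history. The table and its probabilities below are invented purely for illustration, not from any real model.

```python
# Toy autoregressive "model": maps a history of tokens to a
# categorical distribution over the next token (made-up numbers).
def next_token_probs(history):
    table = {
        (): {"the": 0.5, "a": 0.5},
        ("the",): {"cat": 0.6, "dog": 0.4},
        ("the", "cat"): {"sat": 0.9, "ran": 0.1},
    }
    return table.get(tuple(history), {"<unk>": 1.0})

# Chain rule: P(w_1..w_n) = product over t of P(w_t | w_1..w_{t-1}).
def sequence_prob(tokens):
    p = 1.0
    for t in range(len(tokens)):
        p *= next_token_probs(tokens[:t]).get(tokens[t], 0.0)
    return p

print(sequence_prob(["the", "cat", "sat"]))  # 0.5 * 0.6 * 0.9
```

A real language model replaces the lookup table with a neural network, but the factorization is the same.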
🌍Perplexity is two raised to the average number of bits needed to communicate the next word under a language model; lower perplexity means the model is less surprised by the text.
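The relationship between bits and perplexity can be made concrete: average the negative log2-probability the model assigns to each token (cross-entropy in bits per token), then exponentiate. The probability values here are stand-ins, not from a real model.

```python
import math

def perplexity(token_probs):
    # Cross-entropy in bits per token: average -log2 P(token).
    avg_bits = -sum(math.log2(p) for p in token_probs) / len(token_probs)
    # Perplexity is 2 to that power: the effective branching factor.
    return 2 ** avg_bits

# A model that assigns 1/4 to every token is choosing uniformly
# among 4 options, so its perplexity is 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0
```

This is why perplexity is often read as the number of equally likely words the model is effectively choosing between at each step.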
🤔Language models typically assume a fixed-length history and model the next word as a categorical distribution over the vocabulary.
🔬Perplexity has long been a useful metric for language modeling, enabling direct comparison between different models on the same evaluation text.