N-gram
An N-gram is a contiguous sequence of n items from a given sample of text or speech. N-grams are widely used in natural language processing (NLP) for tasks such as text prediction, spelling correction, speech recognition, and language modeling.
The "N" in the N-gram represents the number of sequence items. For instance:
- A 1-gram (or unigram) is a sequence of one item (e.g., a single word).
- A 2-gram (or bigram) is a sequence of two items (e.g., two adjacent words in a sentence).
- A 3-gram (or trigram) is a sequence of three items, and so on.
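
To make the definition concrete, here is a minimal sketch of n-gram extraction in Python. The `ngrams` helper and the sample sentence are illustrative assumptions, not part of any standard library.

```python
def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) found in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 1))  # unigrams: [('the',), ('cat',), ('sat',), ...]
print(ngrams(tokens, 2))  # bigrams:  [('the', 'cat'), ('cat', 'sat'), ...]
print(ngrams(tokens, 3))  # trigrams: [('the', 'cat', 'sat'), ...]
```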
An N-gram language model estimates the likelihood of the next word given the preceding N−1 words. This is useful in applications like text prediction software, where anticipating the next word or phrase can improve user experience and efficiency. In language modeling more broadly, N-grams provide a way to estimate a probability distribution over word sequences, giving models a foundation for understanding and generating human-like text.
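
As a rough illustration of how such estimates can be computed, the sketch below builds a bigram model from raw counts (a maximum-likelihood estimate): the probability of a word given the previous word is the count of that word pair divided by the count of the previous word. The toy corpus and function names are assumptions made for the example; a real model would use a much larger corpus and smoothing for unseen pairs.

```python
from collections import Counter

# Tiny illustrative corpus; "<s>" marks the start of each sentence.
corpus = ["the cat sat", "the cat slept", "the dog sat"]

bigram_counts = Counter()
unigram_counts = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split()
    for prev, word in zip(tokens, tokens[1:]):
        bigram_counts[(prev, word)] += 1
        unigram_counts[prev] += 1

def bigram_prob(prev, word):
    """Estimate P(word | prev) = count(prev, word) / count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("the", "cat"))  # 2/3: "cat" follows "the" in 2 of 3 sentences
print(bigram_prob("cat", "sat"))  # 1/2: "sat" follows "cat" in 1 of 2 cases
```

A text-prediction feature can rank candidate next words by this probability; longer contexts (trigrams and beyond) follow the same counting idea with a window of N−1 preceding words.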