Sukyong Hong Interview
- LLM = self-attention decoder (decoder-only Transformer)
- https://poloclub.github.io/transformer-explainer
Top-K Sampling
- Sample from only the K most probable tokens
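The note above can be sketched as a minimal top-K sampler: keep only the K highest-logit tokens, renormalize with a softmax over the survivors, and sample. The logits dictionary is an illustrative toy, not real model output.

```python
import math
import random

def top_k_sample(logits, k=3, seed=0):
    """Keep only the k highest-logit tokens, renormalize, then sample."""
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Softmax over the surviving candidates (subtract max for stability).
    m = max(v for _, v in top)
    exps = {t: math.exp(v - m) for t, v in top}
    total = sum(exps.values())
    probs = {t: e / total for t, e in exps.items()}
    rng = random.Random(seed)
    return rng.choices(list(probs), weights=list(probs.values()))[0]

# Toy logits: with k=2, only "cat" or "dog" can ever be sampled.
logits = {"cat": 4.0, "dog": 3.5, "car": 1.0, "the": 0.2}
print(top_k_sample(logits, k=2))
```

With k=2 the low-probability tokens "car" and "the" are cut off entirely, which is the point of top-K: it bounds how far down the tail sampling can reach.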
Top-P Sampling
- Sample from the smallest set of tokens whose cumulative probability reaches p, so the number of candidates varies with the distribution
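A minimal sketch of the top-P (nucleus) idea: sort tokens by probability, keep adding candidates until the cumulative probability reaches p, then sample from that variable-size set. The logits are illustrative.

```python
import math
import random

def top_p_sample(logits, p=0.9, seed=0):
    """Sample from the smallest token set whose cumulative probability >= p."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    ranked = sorted(((t, e / total) for t, e in exps.items()),
                    key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cum += prob
        if cum >= p:
            break  # candidate count varies: peaked distributions keep few tokens
    rng = random.Random(seed)
    return rng.choices([t for t, _ in kept],
                       weights=[pr for _, pr in kept])[0]
```

On a very peaked distribution the nucleus collapses to one token; on a flat one it grows, which is the "candidate count changes" behavior the note describes.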
RAG
- An embedding model turns text directly into vectors
- Search by cosine similarity between vectors
- When storing knowledge, keep the original text alongside its vector
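The RAG retrieval step above can be sketched end to end. The `embed` function here is a toy word-count vector standing in for a real embedding model, and the documents are made up; the structure (store original text with its vector, rank by cosine similarity) follows the notes.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy "embedding": word counts over a fixed vocabulary.
# A real system would call an embedding model here.
VOCAB = ["refund", "shipping", "policy", "password", "reset"]

def embed(text):
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

# Store the original text next to its vector, as the notes advise.
store = [(doc, embed(doc)) for doc in [
    "refund policy refund within 30 days",
    "password reset link expires in 1 hour",
]]

def retrieve(query, k=1):
    qv = embed(query)
    ranked = sorted(store, key=lambda dv: cosine(qv, dv[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(retrieve("how do I reset my password"))
```

Keeping the original text in the store matters because the vector alone cannot be handed to the LLM; the retrieved text is what goes into the prompt.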
Forcing Korean while saving tokens
- Write the prompt in English (fewer tokens), then add "Speak Korean."
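A small sketch of the trick: English instructions usually tokenize more compactly than Korean, so the system prompt stays in English and only the final directive forces the output language. The message shape follows the common chat-completion format, which is an assumption here.

```python
# English instructions (token-cheap) plus a final language directive.
system_prompt = (
    "You are a customer-support assistant. "
    "Answer concisely using the provided context. "
    "Speak Korean."
)

# Typical chat-completion message list (format assumed, not from the notes).
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "배송이 언제 오나요?"},
]

print(messages[0]["content"].endswith("Speak Korean."))
```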
Query Rewriting
- Expand the query with about 5 synonyms before searching
- Rewrite the query using the conversation history
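Both rewriting ideas above can be sketched briefly. The synonym table and the history format are illustrative assumptions; a real system would generate the synonyms with an LLM.

```python
# Hypothetical synonym table standing in for LLM-generated expansions.
SYNONYMS = {
    "refund": ["reimbursement", "money back", "return payment"],
    "ship": ["deliver", "send"],
}

def expand_query(query, max_synonyms=5):
    """Return the original query plus up to max_synonyms synonym variants."""
    variants = [query]
    for word in query.lower().split():
        for syn in SYNONYMS.get(word, []):
            if len(variants) - 1 >= max_synonyms:
                return variants
            variants.append(query.lower().replace(word, syn))
    return variants

def rewrite_with_history(query, history):
    """Prepend recent turns so a follow-up like 'how long does it take?'
    carries the context needed for retrieval."""
    recent = " ".join(history[-2:])
    return f"{recent} {query}".strip()

print(expand_query("refund status"))
```

Each variant is searched separately (or concatenated), widening recall for documents that use different wording than the user did.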
Reranking
- Retrieve the top 50 candidates first
- Score them with a cross-encoder ranking model
- Recalculate the similarity and rerank
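The three steps above form a two-stage pipeline, sketched here with toy scorers. The word-overlap functions are stand-ins: stage 1 would normally be a vector search, and stage 2 a real cross-encoder model that reads the query and document jointly.

```python
def cheap_retrieve(query, corpus, k=50):
    """Stage 1: fast, recall-oriented retrieval (toy: any word overlap)."""
    q = set(query.lower().split())
    hits = [d for d in corpus if q & set(d.lower().split())]
    return hits[:k]

def cross_encoder_score(query, doc):
    """Stage 2 stand-in: score the (query, doc) pair jointly.
    Here Jaccard overlap; a real cross-encoder is a neural model."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

def rerank(query, corpus, k=50, top_n=3):
    """Retrieve top-k cheaply, then re-score and keep the best top_n."""
    candidates = cheap_retrieve(query, corpus, k)
    return sorted(candidates,
                  key=lambda d: cross_encoder_score(query, d),
                  reverse=True)[:top_n]
```

The design point: cross-encoders are accurate but too slow to score the whole corpus, so the cheap stage narrows the field to ~50 candidates and the expensive stage recomputes similarity only on those.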