How the tokenizer for GPT-4 (tiktoken) works and why it can't reverse strings

2,963 listens
Chris breaks down the ChatGPT (GPT-4) tokenizer and shows why large language models such as GPT, Llama-2 and Mistral struggle to reverse words. Chris looks at how words, programming languages, different natural languages and even Morse code are tokenized, and shows how tokenizers tend to be biased towards English and programming languages.
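
To make the reversal problem concrete, here is a minimal sketch using a toy tokenizer. Everything in it is an assumption for illustration: `TOY_VOCAB` and `toy_tokenize` are hypothetical, the vocabulary is not the real GPT-4 vocabulary, and greedy longest-match is only a stand-in for the byte pair encoding that tiktoken actually uses. It shows the general idea that a model receives multi-character tokens rather than letters, so reversing the token sequence is not the same as reversing the characters.

```python
import string

# Hypothetical toy vocabulary: two multi-character tokens plus single letters.
# This is NOT the real GPT-4 / tiktoken vocabulary.
TOY_VOCAB = ["token", "izer"] + list(string.ascii_lowercase)


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split; a simplified stand-in for BPE."""
    tokens = []
    i = 0
    while i < len(text):
        # Longest vocabulary entry matching at position i;
        # fall back to the raw character if nothing matches.
        candidates = (v for v in vocab if text.startswith(v, i))
        match = max(candidates, key=len, default=text[i])
        tokens.append(match)
        i += len(match)
    return tokens


word = "tokenizer"
tokens = toy_tokenize(word)                   # what the model "sees"
reversed_tokens = "".join(reversed(tokens))   # reversing at token level
reversed_chars = word[::-1]                   # the answer a user expects

print(tokens)                        # ['token', 'izer']
print(reversed_tokens)               # izertoken
print(reversed_chars)                # rezinekot
print(toy_tokenize(reversed_chars))  # nine single-letter tokens
```

In this toy setup the forward word is two tokens, while its character-level reversal falls apart into nine single-letter tokens that share nothing with the original two. To inspect real GPT-4 token boundaries, the tiktoken library provides `tiktoken.encoding_for_model("gpt-4")`, whose `encode` and `decode` methods convert between text and token ids; the actual splits will differ from this sketch.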