How the Gemma/Gemini Tokenizer Works - Gemma/Gemini vs GPT-4 vs Mistral
In this video, we go under the hood of the Gemini, gemma-7b and gemma-2b tokenizers. We look at their large vocabulary and the impact it has on the size of the model, and at how Google has prioritized people, places, cultures, languages and things over an efficient vocabulary built from frequent sub-words. Chris also introduces his new tokenizer benchmark test, dataset and tokenizer visualizer tools.
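To make "impact on the size of the model" concrete, here is a rough back-of-the-envelope sketch: every vocabulary entry gets its own row in the token-embedding matrix, so that matrix holds vocab_size × hidden_dim weights. The vocabulary and hidden sizes below are assumptions taken from the publicly released model configs (Gemma: 256,000 tokens; Mistral-7B: 32,000 tokens) and are for illustration only. The helper name `embedding_params` is hypothetical.

```python
# Rough size of a token-embedding matrix: one row of hidden_dim weights per vocab entry.
# ASSUMPTION: (vocab_size, hidden_dim) pairs below are taken from public model configs
# and are illustrative only; they are not measurements from the video.

def embedding_params(vocab_size: int, hidden_dim: int) -> int:
    """Number of weights in a vocab_size x hidden_dim embedding table (hypothetical helper)."""
    return vocab_size * hidden_dim

CONFIGS = {
    "gemma-2b": (256_000, 2_048),
    "gemma-7b": (256_000, 3_072),
    "mistral-7b": (32_000, 4_096),
}

if __name__ == "__main__":
    for name, (vocab, dim) in CONFIGS.items():
        n = embedding_params(vocab, dim)
        # Print the table size in millions of weights.
        print(f"{name:>10}: {vocab:>7,} x {dim:>5,} = {n / 1e6:,.1f}M embedding weights")
```

Under these assumptions, the 256k vocabulary costs roughly 524M weights in gemma-2b's embedding table versus roughly 131M for Mistral-7B's 32k vocabulary, despite Mistral's wider hidden size. A model with a separate (untied) output projection would pay roughly the same cost a second time.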
GitHub
---------------
https://github.com/chrishayuk/tokenizer-benchmark