An overview of how quantization works in neural networks, from weight compression to inference calculations and training. It is based on a Stanford lecture; the full slides are at https://docs.google.com/presentation/d/1zGm5bqGrkAepwJZ5PABiYjrIKq1pDnzafa8ZYeaFhXY/edit?usp=sharing
There are also two Colab notebooks to accompany this screencast at:
https://drive.google.com/file/d/1Pn-lNOty-P1fX6G66oa6wvsEn46Xyx8Y
https://colab.research.google.com/drive/1h0eFudI7gy5AGYdxBFdoVpggwBN5ftMu
See https://tinymlbook.com for more about this series.