If you need help with anything quantization- or ML-related (e.g. debugging code), feel free to book a 30-minute consultation session! https://calendly.com/oscar-savolainen
I'm also available for long-term freelance work, e.g. training and productionizing models, teaching AI concepts, etc.
*Video Summary:*
In this video, we go over the theory of how to statically quantize a PyTorch model in Eager mode. A minimal code sketch of the full workflow follows the timestamps below.
*Timestamps:*
00:00 Intro
03:05 Required Architecture Changes (QuantStubs / DeQuantStubs / FloatFunctionals)
08:54 Fusing modules
12:18 Assignment of QConfigs (the quantization recipe for each module)
15:26 Preparing the model for quantization (i.e. making the model fake-quantizable)
20:25 Converting the model to a "true" quantized int8 model
23:06 Conclusion
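
To make the steps concrete, here is a minimal sketch of the eager-mode post-training static quantization workflow. The toy model, layer sizes, and calibration data are invented for illustration and are not from the video; the fake-quantize variant discussed in the video would use prepare_qat (with a QAT qconfig) in place of prepare.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, fuse_modules,
    get_default_qconfig, prepare, convert,
)
from torch.ao.nn.quantized import FloatFunctional

class ToyModel(nn.Module):
    """Hypothetical model, for illustration only."""
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()           # float -> quantized entry point
        self.conv = nn.Conv2d(3, 3, 3, padding=1)
        self.bn = nn.BatchNorm2d(3)
        self.relu = nn.ReLU()
        self.skip_add = FloatFunctional()  # stands in for "+" so it can carry qparams
        self.dequant = DeQuantStub()       # quantized -> float exit point

    def forward(self, x):
        x = self.quant(x)
        y = self.relu(self.bn(self.conv(x)))
        y = self.skip_add.add(y, x)        # residual add via FloatFunctional
        return self.dequant(y)

model = ToyModel().eval()

# 1) Fuse conv+bn+relu into a single module (eval mode for post-training quantization)
fuse_modules(model, [["conv", "bn", "relu"]], inplace=True)

# 2) Assign a QConfig (the quantization recipe for each module)
model.qconfig = get_default_qconfig("fbgemm")  # x86 backend; "qnnpack" for ARM

# 3) Prepare: insert observers so activation ranges can be recorded
prepare(model, inplace=True)

# Calibrate with representative data (dummy data here)
with torch.no_grad():
    for _ in range(10):
        model(torch.randn(1, 3, 32, 32))

# 4) Convert to a "true" int8 quantized model
convert(model, inplace=True)
```

Note that fuse_modules expects the model in eval mode for the post-training path, and FloatFunctional.add replaces the + operator, which eager mode cannot otherwise quantize.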
For more background on what it means to quantize a tensor, see:
https://www.youtube.com/watch?v=rzMs-wKQU_U&feature=youtu.be
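
As a quick refresher (the scale and zero-point here are arbitrary), per-tensor affine quantization maps floats to ints via a scale and zero-point:

```python
import torch

x = torch.randn(4)
# Per-tensor affine quantization: q = round(x / scale) + zero_point
xq = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)
print(xq.int_repr())    # the stored int8 values
print(xq.dequantize())  # back to float, now carrying quantization error
```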
*Links (PyTorch documentation):*
- Quant/DeQuant stub definition: https://pytorch.org/docs/stable/_modules/torch/ao/quantization/stubs.html
- FloatFunctionals definition: https://pytorch.org/docs/stable/generated/torch.ao.nn.quantized.FloatFunctional.html
- QConfig: https://pytorch.org/docs/stable/generated/torch.ao.quantization.qconfig.QConfig.html
- `prepare_qat`: https://pytorch.org/docs/stable/generated/torch.ao.quantization.prepare_qat.html
- Converting the model: https://pytorch.org/docs/stable/generated/torch.ao.quantization.convert.html