We start with a simple signal processing workload and then accelerate it by several orders of magnitude using the following libraries:
1. Numpy: https://numpy.org
2. Numba: https://numba.pydata.org
3. Dask: https://dask.org
4. CuPy: https://cupy.chainer.org
5. Numba CUDA: https://numba.pydata.org/numba-doc/dev/cuda/index.html
We eventually run a real-time streaming system on multi-GPU hardware. For more on GPU computing in Python generally, see RAPIDS at https://rapids.ai
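To make the starting point concrete, here is a hypothetical stand-in for the kind of workload being accelerated: smooth a batch of noisy signals, then inspect their spectra with an FFT. The `smooth` function and the array shapes here are illustrative assumptions, not the actual kernels from the notebooks; one appeal of the libraries listed above is that code written in this NumPy style can often be ported with little change (for example, CuPy mirrors much of the NumPy API on the GPU).

```python
import numpy as np

# Hypothetical workload sketch: the real notebooks define their own kernels.

def smooth(signals):
    """Three-point moving average along the last axis (vectorized NumPy)."""
    out = signals.copy()
    out[:, 1:-1] = (signals[:, :-2] + signals[:, 1:-1] + signals[:, 2:]) / 3.0
    return out

rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 1024))   # four noisy 1024-sample signals

smoothed = smooth(batch)
spectra = np.abs(np.fft.rfft(smoothed, axis=-1))  # one-sided magnitude spectra

print(smoothed.shape, spectra.shape)     # (4, 1024) (4, 513)
```

A version of this same code can run on the GPU by swapping `numpy` for `cupy`, or be compiled with Numba's `@njit` when written as explicit loops, which is the path the rest of the post walks through.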
Notebooks:
- CPU: https://nbviewer.jupyter.org/urls/gist.github.com/mrocklin/148da2076cac9ba183071241b3d8e18a/raw/5e0e08cb1c2cab5d8a92ecf4221bf6b9f0eabf17/pipeline-cpus.ipynb
- GPU: https://nbviewer.jupyter.org/urls/gist.github.com/mrocklin/148da2076cac9ba183071241b3d8e18a/raw/5e0e08cb1c2cab5d8a92ecf4221bf6b9f0eabf17/pipeline-gpus.ipynb
- Gist: https://gist.github.com/mrocklin/148da2076cac9ba183071241b3d8e18a
Update: Jacob Tomlinson ran this same computation on a gaming laptop to see the performance one can get from a cheap consumer system: https://www.youtube.com/watch?v=7Bw1OqVuLtQ