In this video, Matt Rocklin gives a brief introduction to Dask Futures. You'll learn about how Dask Futures can be used to parallelize a simple embarrassingly parallel workflow, as well as task dependencies and memory management.
What is Dask?
Dask is a free and open-source library for parallel computing in Python. Dask is a community project maintained by developers and organizations.
What is a Dask Future?
Dask Delayed and Futures represent lightweight mechanisms for building and running custom task graphs, while staying within traditional Python coding patterns. This combination — regular Python code with a powerful distributed scheduler — enables all kinds of industry or discipline-specific workloads to be parallelized for fast, large-scale computation.
Dask supports a real-time task framework that extends Python’s concurrent.futures interface. This interface is good for arbitrary task scheduling like dask.delayed, but is immediate rather than lazy, which provides some more flexibility in situations where the computations may evolve over time.
These features depend on the second generation task scheduler found in dask.distributed (which, despite its name, runs very well on a single machine).
Share your feedback with us in the comments and let us know:
- Did you find the video helpful?
- Do you have specific questions about Dask Futures?
Learn more at https://docs.dask.org/en/latest/futures.html
KEY MOMENTS
00:00:00 Intro
00:00:26 Workflow Example
00:02:02 Dask Futures
00:03:00 Concurrent Futures
00:03:56 Task Dependencies and Memory Management
00:05:46 More Workflow Examples
00:08:49 Secret to Releasing Some Memory
00:10:48 Dask Manages Memory Like Python