Parsing JSON input is a common task in network-oriented services. Many web APIs return data as JSON that is then fed into another process. Parsing and validating JSON text can be a significant bottleneck for a high-throughput network service.
This month, Richard Thomson will give us an overview of the simdjson library for parsing JSON inputs. From a feature perspective, this library gives us fast parsing and validation of JSON input documents and provides an API for traversing the parsed document.
What's more interesting about this library is the approach taken by the implementation: it uses SIMD CPU instruction set extensions to parse and validate the JSON input. The algorithm makes two passes over the input: the first pass scans the text for interesting characters, and the second pass builds a document structure, using SIMD instructions for number parsing and UTF-8 string validation. The library also uses a dynamic dispatch mechanism so that a single build of the library can select the appropriate set of SIMD instructions at runtime.
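To give a feel for what the first pass looks for, here is a minimal scalar sketch in standard C++. It is not simdjson's code: the function name `find_structurals` is hypothetical, and a plain byte-by-byte loop stands in for the SIMD instructions the library uses. It only records the offsets of braces, brackets, colons, commas, and opening quotes that lie outside string literals, which is a simplification of what a real structural index would hold.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of simdjson: a scalar stand-in for the
// first pass. It returns the offsets of "interesting" characters that
// appear outside of string literals.
std::vector<std::size_t> find_structurals(std::string_view json)
{
    std::vector<std::size_t> offsets;
    bool in_string = false; // currently inside a "..." literal
    bool escaped = false;   // previous character was a backslash
    for (std::size_t i = 0; i < json.size(); ++i)
    {
        const char c = json[i];
        if (in_string)
        {
            if (escaped)
                escaped = false;   // this character is escaped; skip it
            else if (c == '\\')
                escaped = true;    // next character is escaped
            else if (c == '"')
                in_string = false; // closing quote ends the literal
            continue;
        }
        switch (c)
        {
        case '"':
            in_string = true;
            offsets.push_back(i);  // opening quote marks where a string starts
            break;
        case '{': case '}': case '[': case ']': case ':': case ',':
            offsets.push_back(i);
            break;
        default:
            break;
        }
    }
    return offsets;
}
```

A second pass could then walk the returned offsets instead of every byte of the input, which is the general idea behind splitting the work in two.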
The presentation will cover the following topics:
- A brief overview of SIMD instruction set extensions
- How do we extract data from the parsed document structure?
- What does parsing look like with SIMD instructions?
- How does the implementation use dynamic dispatch for instruction set selection at runtime?
- How does the implementation expose SIMD operations?
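As a rough illustration of the runtime selection topic above, the following sketch picks one of several kernels the first time it is called and reuses that choice afterwards. Everything here is an assumption made for illustration: the `implementation` struct, `select_implementation`, and the quote-counting kernels are hypothetical names, not simdjson's API, and the "wide" kernel is ordinary scalar code standing in for a real SIMD routine. The CPU check relies on the GCC/Clang `__builtin_cpu_supports` builtin on x86 and reports no support elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

using count_fn = std::size_t (*)(std::string_view);

// Portable baseline kernel: counts double-quote characters one byte at a time.
std::size_t count_quotes_fallback(std::string_view text)
{
    std::size_t n = 0;
    for (char c : text)
        if (c == '"')
            ++n;
    return n;
}

// Placeholder for a SIMD kernel. A real one would use intrinsics; this
// scalar version handles four bytes per iteration, then the remainder.
std::size_t count_quotes_wide(std::string_view text)
{
    std::size_t n = 0;
    std::size_t i = 0;
    for (; i + 4 <= text.size(); i += 4)
        for (std::size_t j = 0; j < 4; ++j)
            if (text[i + j] == '"')
                ++n;
    for (; i < text.size(); ++i)
        if (text[i] == '"')
            ++n;
    return n;
}

// Runtime CPU check; assumes GCC or Clang on x86, otherwise reports false.
bool cpu_has_avx2()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

bool always_supported() { return true; }

// Hypothetical description of one implementation of the kernel.
struct implementation
{
    const char* name;
    bool (*is_supported)();
    count_fn count_quotes;
};

// Returns the first candidate the current CPU supports, best first.
const implementation& select_implementation()
{
    static const implementation candidates[] = {
        {"avx2-placeholder", cpu_has_avx2, count_quotes_wide},
        {"fallback", always_supported, count_quotes_fallback},
    };
    for (const implementation& impl : candidates)
        if (impl.is_supported())
            return impl;
    return candidates[1]; // not reached: the fallback is always supported
}

// Public entry point: selects once, then calls through the stored pointer.
std::size_t count_quotes(std::string_view text)
{
    static const implementation& active = select_implementation();
    return active.count_quotes(text);
}
```

Callers only ever see `count_quotes`; which kernel runs behind it depends on the machine the single binary happens to be running on.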
simdjson: https://simdjson.org/
Example code: https://github.com/LegalizeAdulthood/comics-simdjson
Meetup: https://www.meetup.com/utah-cpp-programmers/
Past topics: https://utahcpp.wordpress.com/past-meeting-topics/
Future topics: https://utahcpp.wordpress.com/past-meeting-topics/
00:00 Introduction
01:30 Sample Data Set
07:30 Sample Program Demonstration
09:03 Review of SIMD
11:28 Instruction Set Extensions
14:30 simdjson Library
20:30 simdjson Parsing Algorithm
26:00 Sample Code
31:25 simdjson_result
34:09 DOM Iterator Support
35:24 Querying the DOM for Data
39:40 Pretty Printing
44:11 Impression of Using the DOM
45:09 Parser Lifetime Requirements
46:07 DOM Internal Structure
50:00 DOM Traversal
54:27 Implementation Selection
57:40 cpuid Wrapper
1:01:53 Dynamic Dispatch
1:05:00 Haswell Parser
1:09:45 Inspiration for Your SIMD Code
1:14:45 Summary