Parsing JSON at Gigabytes per Second with simdjson

Parsing JSON inputs is a common task in network-oriented services. Many web APIs provide data as JSON outputs that are fed into another process, and parsing and validating JSON text can be a significant bottleneck for a high-throughput network service. This month, Richard Thomson will give us an overview of the simdjson library for parsing JSON inputs. From a feature perspective, the library provides fast parsing and validation of JSON documents, plus an API for traversing the parsed document.

What's more interesting about this library is its implementation approach: it uses SIMD CPU instruction set extensions to parse and validate JSON input. The algorithm makes two passes. The first pass scans the input text for structurally interesting characters. The second pass builds a document structure and uses SIMD instructions for number parsing and UTF-8 string validation. The library also uses a dynamic dispatch mechanism so that a single build can select the appropriate set of SIMD instructions at runtime.

The presentation will cover the following topics:
- A brief overview of SIMD instruction set extensions
- How do we extract data from the parsed document structure?
- What does parsing look like with SIMD instructions?
- How does the implementation use dynamic dispatch to select an instruction set at runtime?
- How does the implementation expose SIMD operations?
simdjson: https://simdjson.org/
Example code: https://github.com/LegalizeAdulthood/comics-simdjson
Meetup: https://www.meetup.com/utah-cpp-programmers/
Past topics: https://utahcpp.wordpress.com/past-meeting-topics/
Future topics: https://utahcpp.wordpress.com/past-meeting-topics/

00:00 Introduction
01:30 Sample Data Set
07:30 Sample Program Demonstration
09:03 Review of SIMD
11:28 Instruction Set Extensions
14:30 simdjson Library
20:30 simdjson Parsing Algorithm
26:00 Sample Code
31:25 simdjson_result
34:09 DOM Iterator Support
35:24 Querying the DOM for Data
39:40 Pretty Printing
44:11 Impression of Using the DOM
45:09 Parser Lifetime Requirements
46:07 DOM Internal Structure
50:00 DOM Traversal
54:27 Implementation Selection
57:40 cpuid Wrapper
1:01:53 Dynamic Dispatch
1:05:00 Haswell Parser
1:09:45 Inspiration for Your SIMD Code
1:14:45 Summary