GraphGeeks Talk Ep8: How To Create Knowledge Graphs from Unstructured Data

6,395 listens
Join us for a step-by-step guide to working with unstructured data sources for constructing and updating knowledge graphs!

Repo: https://github.com/DerwenAI/strwythura

This talk looks at the best way to convert text documents into a knowledge graph. According to open source libraries for GraphRAG, a dominant notion is: "Just use an LLM to generate a graph automatically, which should be good enough to use." For those working with graphs in regulated environments or mission-critical apps, this isn't appropriate. And for downstream use cases, there's a larger question: How can we build KGs from both structured and unstructured data sources, and keep human expert reviews in the loop, while taking advantage of LLMs and other deep learning models?

We'll review the broader practices in knowledge graph construction. You'll learn a general process that includes parsing the text (e.g., based on spaCy pipelines), then using textgraph methods to build a lexical graph. We'll generate a semantic layer atop this, making use of named entity recognition and entity extraction, and leveraging previous entity resolution work with structured data sources to perform entity linking. Then, making use of relation extraction to connect pairs of nodes, we'll enrich the semantics for edges in the graph. In each step, we're using LLMs and other deep learning models to augment narrowly defined tasks within the overall workflow. Using domain-specific resources such as a thesaurus, we'll show how to perform semantic random walks to expand the graph. Finally, we'll show graph analytics to make use of the graph, tying into what's needed for use cases such as GraphRAG.

Paco Nathan leads DevRel for the Entity Resolved Knowledge Graph practice area at Senzing.com and is a computer scientist with 40+ years of tech industry experience and core expertise in data science, natural language, graph technologies, and cloud computing. He's the author of numerous books, videos, and tutorials about these topics.
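For readers who want a feel for the lexical graph step mentioned in the episode description, here is a minimal sketch using only the Python standard library. It is not taken from the strwythura repo: the function names (`tokenize`, `build_lexical_graph`, `rank_nodes`), the tiny stopword list, and the window size are all hypothetical, a regex tokenizer stands in for a spaCy pipeline, and the PageRank-style ranking is just one common textgraph method assumed here for illustration.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stopword list, not a real linguistic resource.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "we", "is"}


def tokenize(sentence):
    """Lowercase, keep alphabetic tokens, drop stopwords (stand-in for a real parser)."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def build_lexical_graph(text, window=3):
    """Build an undirected, weighted co-occurrence graph as nested dicts.

    Two tokens are linked when they appear within `window` positions of each
    other in the same sentence; repeated co-occurrence increases the weight.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for sentence in re.split(r"[.!?]+", text):
        tokens = tokenize(sentence)
        for i, src in enumerate(tokens):
            for dst in tokens[i + 1 : i + window]:
                if src != dst:
                    graph[src][dst] += 1.0
                    graph[dst][src] += 1.0
    return graph


def rank_nodes(graph, damping=0.85, iterations=50):
    """Weighted PageRank-style power iteration over the co-occurrence graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    # Total edge weight leaving each node, used to normalize contributions.
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        rank = {
            n: (1 - damping) / len(nodes)
            + damping * sum(rank[m] * w / out_weight[m] for m, w in graph[n].items())
            for n in nodes
        }
    return rank


if __name__ == "__main__":
    sample = "Graphs link entities. Graphs store entities."
    lexical_graph = build_lexical_graph(sample)
    ranks = rank_nodes(lexical_graph)
    for term in sorted(ranks, key=ranks.get, reverse=True):
        print(term, round(ranks[term], 3))
```

In a fuller workflow along the lines the talk describes, the highest-ranked terms would be candidates for the semantic layer, where entity linking and relation extraction operate with human review in the loop rather than relying on a single LLM call.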