# Lemma Graph
This project introduces the notion of a _lemma graph_ as an intermediate representation.
Effectively, this provides a kind of cache during the processing of each "chunk" of text.
Think of the end result as "enhanced tokenization" for text used to generate graph data elements.
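As a minimal sketch of this idea, one chunk's parse results might be collected into lemma nodes and co-occurrence edges as shown below; the `build_lemma_graph` function and the `(lemma, pos)` tuple format are illustrative stand-ins, not this project's actual API, and a real pipeline would attach much richer NLP metadata:

```python
from collections import defaultdict

def build_lemma_graph(parsed_tokens):
    """Collect lemma nodes and co-occurrence edges from one parsed chunk.

    `parsed_tokens` is assumed to be a list of (lemma, pos) tuples,
    e.g. simplified output from an NLP pipeline such as spaCy.
    """
    nodes = defaultdict(lambda: {"count": 0, "pos": set()})
    edges = defaultdict(int)

    # each distinct lemma becomes a node, keeping parse metadata
    for lemma, pos in parsed_tokens:
        nodes[lemma]["count"] += 1
        nodes[lemma]["pos"].add(pos)

    # link lemmas that co-occur in adjacent positions within the chunk
    for (lemma_a, _), (lemma_b, _) in zip(parsed_tokens, parsed_tokens[1:]):
        if lemma_a != lemma_b:
            edges[(lemma_a, lemma_b)] += 1

    return nodes, edges

tokens = [("graph", "NOUN"), ("represent", "VERB"), ("graph", "NOUN")]
nodes, edges = build_lemma_graph(tokens)
```

The point is that parse artifacts which would normally be thrown away (part-of-speech tags, counts, adjacency) get retained as node and edge metadata for later steps.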
Other projects might call this by different names:
an "evidence graph" in [#wen2023mindmap](biblio.md#wen2023mindmap)
or a "dynamically growing local KG" in [#loganlpgs19](biblio.md#loganlpgs19).
The lemma graph collects metadata from NLP parsing, entity linking, etc., which generally gets discarded in many applications.
Therefore the lemma graph becomes rather "noisy", and in most cases would be too big to store across the analysis of a large corpus.
Leveraging this intermediate form, the valuable information from each chunk (nodes, edges, properties, probabilities, etc.) gets collected and aggregated into the overall document analysis.
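One way to picture that aggregation step is shown below; the helper name, the edge-weight representation, and the evidence threshold are all hypothetical, sketching how per-chunk results could be merged while pruning the "noisy" edges:

```python
from collections import Counter

def aggregate_chunks(chunk_edge_lists, min_weight=2):
    """Merge per-chunk edge weights, then prune low-evidence edges.

    Each element of `chunk_edge_lists` maps (source, target) pairs to
    counts observed in one lemma graph; the threshold is illustrative.
    """
    totals = Counter()
    for edges in chunk_edge_lists:
        totals.update(edges)

    # discard "noisy" edges with too little supporting evidence
    return {edge: w for edge, w in totals.items() if w >= min_weight}

chunks = [
    {("graph", "node"): 3, ("node", "edge"): 1},
    {("graph", "node"): 2},
]
merged = aggregate_chunks(chunks, min_weight=2)
```

The per-chunk lemma graphs stay ephemeral; only the aggregated, thresholded results need to persist for the document-level analysis.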
Consequently, this project explores the use of topological transforms on graphs to enhance representations for [_graph levels of detail_](https://blog.derwen.ai/graph-levels-of-detail-ea4226abba55), i.e., being able to understand a graph at varying levels of abstraction.
Note that adjacent areas of interest include emerging work on:

  - _graph of relations_
  - _foundation models for KGs_
Bootstrapping a _lemma graph_ with initial semantic relations allows for "sampling" from a curated KG to enhance the graph algorithms used, e.g., through _semantic random walks_, which can incorporate heterogeneous sources and relatively large-scale external KGs.
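A simplified sketch of a weight-biased walk follows; a real semantic random walk would bias each step using relations sampled from the curated KG, which is abstracted here into plain edge weights, and the function name and adjacency format are assumptions for illustration:

```python
import random

def semantic_random_walk(adjacency, start, steps, rng):
    """Walk a graph, preferring edges with higher weights.

    `adjacency` maps a node to {neighbor: weight}; the weighting is a
    simplified stand-in for bias from KG-sampled semantic relations.
    """
    walk = [start]
    node = start
    for _ in range(steps):
        neighbors = adjacency.get(node)
        if not neighbors:
            break  # dead end: no outgoing edges
        choices = list(neighbors)
        weights = [neighbors[n] for n in choices]
        node = rng.choices(choices, weights=weights, k=1)[0]
        walk.append(node)
    return walk

adjacency = {"graph": {"node": 3, "edge": 1}, "node": {"graph": 2}}
walk = semantic_random_walk(adjacency, "graph", 4, random.Random(42))
```

Because the walk only touches local neighborhoods, it can sample from a large external KG without loading the whole graph.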
This mechanism also creates opportunities for distributed processing, because the "chunks" of text can follow a _task parallel_ pattern, accumulating the extracted results from each lemma graph into a graph database.
Augmenting a KG iteratively over time follows a similar pattern.
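A toy illustration of that task-parallel pattern, using only the Python standard library: `process_chunk` is a stand-in for the real parse-and-build-lemma-graph step, and the final `Counter` stands in for accumulation into a graph database.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def process_chunk(text):
    # toy per-chunk stage: count lowercase word "lemmas";
    # stands in for parsing a chunk and building its lemma graph
    return Counter(text.lower().split())

chunks = ["Graphs model relations", "relations link graph nodes"]

# chunks are independent, so they map cleanly onto a worker pool
with ThreadPoolExecutor(max_workers=4) as pool:
    per_chunk = list(pool.map(process_chunk, chunks))

# accumulate each chunk's results, as one would into a graph database
totals = Counter()
for counts in per_chunk:
    totals.update(counts)
```

Since each chunk's lemma graph is built independently, only the accumulation step needs coordination, which is what makes the pattern easy to distribute.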