This project implements an LLM-augmented `textgraph` algorithm for
constructing a _lemma graph_ from raw, unstructured text sources.

The `TextGraphs` library is based on work developed by
[Derwen](https://derwen.ai/graph)
in 2023 Q2 for customer apps and used in our `Cysoni`
product.

This library integrates code from:

* [`SpanMarker`](https://github.com/tomaarsen/SpanMarkerNER/)
* [`spaCy-DBpedia-Spotlight`](https://github.com/MartinoMensio/spacy-dbpedia-spotlight)
* [`REBEL`](https://github.com/Babelscape/rebel)
* [`OpenNRE`](https://github.com/thunlp/OpenNRE/)
* [`qwikidata`](https://github.com/kensho-technologies/qwikidata)
* [`pulp`](https://github.com/coin-or/pulp)
* [`spaCy`](https://spacy.io/)
* [`HF transformers`](https://huggingface.co/docs/transformers/index)
* [`PyTextRank`](https://github.com/DerwenAI/pytextrank/)

For more background about early efforts which led to this line of inquiry, see the recent talks:

* ["Language, Graphs, and AI in Industry"](https://derwen.ai/s/mqqm)
  **Paco Nathan**, K1st World (2023-10-11) ([video](https://derwen.ai/s/4h2kswhrm3gc))
* ["Language Tools for Creators"](https://derwen.ai/s/rhvg)
  **Paco Nathan**, FOSSY (2023-07-13)

The `TextGraphs` library shows integrations of several of these kinds
of components, complemented with use of graph queries, graph algorithms,
and other related tooling.

Admittedly, the results present a "hybrid" approach:
it's not purely "generative", whatever that might mean.

A core principle here is to provide results from the natural language
workflows which can be reviewed through expert feedback.
In other words, how can we support a
_human-in-the-loop_ (HITL) process?

Another principle has been to create a Python library built to produce
configurable, extensible pipelines.
Care has been given to writing code that can be run concurrently
(e.g., leveraging `asyncio`), using dependencies which have
business-friendly licenses, and paying attention to security concerns.

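
As a rough illustration of what running pipeline work concurrently with `asyncio` can look like, the sketch below fans out per-document work with `asyncio.gather`. The coroutine `extract_phrases` is hypothetical and its "extraction" is only a placeholder; neither it nor `run_pipeline` is part of the `TextGraphs` API.

```python
import asyncio


async def extract_phrases(doc_id: int, text: str) -> tuple[int, list[str]]:
    """Hypothetical pipeline stage; the "extraction" is only a placeholder."""
    # Simulate waiting on I/O, e.g., a call to a remote entity-linking service.
    await asyncio.sleep(0.01)
    # Placeholder logic: keep tokens longer than four characters.
    phrases = sorted({tok.strip(".,").lower() for tok in text.split() if len(tok) > 4})
    return doc_id, phrases


async def run_pipeline(texts: list[str]) -> dict[int, list[str]]:
    """Run the hypothetical stage over all documents concurrently."""
    # gather() schedules every coroutine at once and returns results in input order.
    results = await asyncio.gather(
        *(extract_phrases(i, text) for i, text in enumerate(texts))
    )
    return dict(results)


if __name__ == "__main__":
    docs = [
        "Werner Herzog is a remarkable filmmaker.",
        "Graph algorithms help.",
    ]
    print(asyncio.run(run_pipeline(docs)))
```

This pattern helps most when stages spend their time waiting on I/O, such as remote entity-linking or knowledge graph lookups; CPU-bound model inference would block the event loop and needs threads or processes instead.
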
The library provides three main affordances for AI applications:

1. With the default settings, one can use `TextGraphs` to extract ranked key phrases from raw text, even without using any of the additional deep learning models.
2. Going a few steps further, one can generate an RDF or LPG graph from raw texts, and make use of _entity linking_, _relation extraction_, and other techniques to ground the natural language parsing by leveraging a knowledge graph which represents a particular domain. Default examples use graphs derived from Wikimedia projects: DBpedia, Wikidata, etc.
3. A third set of goals for `TextGraphs` is to provide a "playground" or "gym" for evaluating _graph levels of detail_, i.e., abstraction layers for knowledge graphs, and for exploring some of the emerging work to produce _foundation models_ for knowledge graphs through topological transforms.

Regarding the third point, consider how language parsing produces
graphs by definition, although NLP results tend to be quite _noisy_.
Yet the annotations inferred by NLP pipelines often get thrown out.
This seemed like a good opportunity to generate sample data for
"condensing" graphs into more abstracted representations.
In other words, patterns within the noisy parse results
can be condensed into more refined knowledge graph elements.

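
To make the idea of condensing more concrete, here is a minimal sketch under stated assumptions: the parse edges are hypothetical, hard-coded tuples rather than real `spaCy` output, and "condensing" is reduced to mapping surface forms onto lemma nodes, then keeping only edges seen at least `min_count` times. The names `condense` and `lemma_variants` are illustrative only; this is one simple reading of the idea, not the `TextGraphs` method.

```python
from collections import Counter, defaultdict

# Hypothetical, hard-coded parse edges standing in for real NLP output:
# (head text, head lemma, dependency label, child text, child lemma)
ParseEdge = tuple[str, str, str, str, str]

PARSE_EDGES: list[ParseEdge] = [
    ("graphs", "graph", "compound", "Knowledge", "knowledge"),
    ("graph", "graph", "compound", "knowledge", "knowledge"),
    ("Graphs", "graph", "compound", "knowledge", "knowledge"),
    ("represent", "represent", "dobj", "domains", "domain"),
    ("represents", "represent", "dobj", "domain", "domain"),
    ("was", "be", "nsubj", "it", "it"),
]


def condense(edges: list[ParseEdge], min_count: int = 2) -> dict[tuple[str, str, str], int]:
    """Collapse surface-form edges onto lemma nodes; keep edges seen min_count+ times."""
    counts = Counter(
        (head_lemma, label, child_lemma)
        for _, head_lemma, label, _, child_lemma in edges
    )
    return {edge: n for edge, n in counts.items() if n >= min_count}


def lemma_variants(edges: list[ParseEdge]) -> dict[str, set[str]]:
    """Keep the surface forms behind each lemma node, e.g., for HITL review."""
    variants: defaultdict[str, set[str]] = defaultdict(set)
    for head, head_lemma, _, child, child_lemma in edges:
        variants[head_lemma].add(head)
        variants[child_lemma].add(child)
    return dict(variants)


if __name__ == "__main__":
    for (head, label, child), n in condense(PARSE_EDGES).items():
        print(f"{head} -[{label}]-> {child}  (x{n})")
    print(lemma_variants(PARSE_EDGES)["graph"])
```

Here six noisy edges condense into two lemma-level edges, while `lemma_variants` retains the annotations that would otherwise be discarded, so an expert can still inspect which surface forms support each node.
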
Note that while the `spaCy` library for NLP plays a central role, the
`TextGraphs` library is not intended to become a `spaCy` pipeline.