shreyaspimpalgaonkar commited on
Commit
4099389
1 Parent(s): c202072

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +82 -3
README.md CHANGED
@@ -1,3 +1,82 @@
1
- ---
2
- license: apache-2.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ ---
4
+
5
+ # Triplex
6
+
7
+ <!-- Provide a quick summary of what the model is/does. -->
8
+
9
+ Triplex is a model for creating knowledge graphs from unstructured data. It works by extracting triplets - simple statements consisting of a subject, predicate, and object - from text or other data sources. Try the demo here: [kg.sciphi.ai](kg.sciphi.ai)
10
+
11
+ ## Model Details
12
+
13
+ It is a finetuned version of Phi3-3.8B on a high quality proprietary dataset constructed using DBPedia, Wikidata, and other data sources.
14
+
15
+ ### Model Description
16
+
17
+
18
+
19
+ - **Developed by:** [https://www.SciPhi.ai](SciPhi.ai)
20
+
21
+ ### Model Sources
22
+
23
+ <!-- Provide the basic links for the model. -->
24
+
25
+ - **Repository:** [https://www.github.com/SciPhi-AI/R2R](https://www.github.com/SciPhi-AI/R2R)
26
+ - **Blog:** [https://www.sciphi.ai/blog/triplex](https://www.sciphi.ai/blog/triplex)
27
+ - **Demo:** [kg.sciphi.ai](kg.sciphi.ai)
28
+
29
+
30
+ ```python
31
+
32
+ import json
33
+ from transformers import AutoModelForCausalLM, AutoTokenizer
34
+
35
+
36
+ def triplextract(model, tokenizer, text, entity_types, predicates):
37
+
38
+ input_format = """
39
+ **Entity Types:**
40
+ {entity_types}
41
+
42
+ **Predicates:**
43
+ {predicates}
44
+
45
+ **Text:**
46
+ {text}
47
+ """
48
+
49
+ message = input_format.format(entity_types = json.dumps({"entity_types": entity_types}), predicates = json.dumps({"predicates": predicates}), text = text)
50
+
51
+ messages = [{'role': 'user', 'content': message}]
52
+ input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt = True, return_tensors="pt").to("cuda")
53
+ output = tokenizer.decode(model.generate(input_ids=input_ids, max_length=2048)[0], skip_special_tokens=True)
54
+ print(output)
55
+ return output
56
+
57
+
58
+
59
+ tokenizer = AutoTokenizer.from_pretrained("sciphi/triplex", trust_remote_code=True)
60
+ model = AutoModelForCausalLM.from_pretrained("sciphi/triplex", trust_remote_code=True)
61
+
62
+
63
+ model.to("cuda")
64
+
65
+ model.eval()
66
+
67
+
68
+ entity_types = [ "LOCATION", "POSITION", "DATE", "CITY", "COUNTRY", "NUMBER" ]
69
+
70
+ predicates = [ "POPULATION", "AREA" ]
71
+
72
+ text = """
73
+ San Francisco,[24] officially the City and County of San Francisco, is a commercial, financial, and cultural center in Northern California.
74
+
75
+ With a population of 808,437 residents as of 2022, San Francisco is the fourth most populous city in the U.S. state of California behind Los Angeles, San Diego, and San Jose.
76
+ """
77
+
78
+ prediction = triplextract(model, tokenizer, text, entity_types, predicates)
79
+ print(prediction)
80
+
81
+
82
+ ```