This is a version of flan-t5-xl
fine-tuned on the KELM Corpus to take in sentences and output triplets of the form subject-relation-object
to be used for knowledge graph generation.
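To make the knowledge-graph use case concrete, here is a minimal sketch of how subject-relation-object triplets could be grouped into a graph. The `build_graph` helper and the adjacency-map layout are assumptions for illustration, not part of this model or its training code, and the second triplet is made up.

```python
from collections import defaultdict


def build_graph(triplets):
    """Group (subject, relation, object) triplets into an adjacency map.

    Hypothetical helper: each subject maps to a list of (relation, object) edges.
    """
    graph = defaultdict(list)
    for subject, relation, obj in triplets:
        graph[subject].append((relation, obj))
    return dict(graph)


triplets = [
    ("Hugging Face", "instance of", "Business"),
    # Made-up triplet, included only to show a second edge on the same node.
    ("Hugging Face", "country", "United States"),
]
graph = build_graph(triplets)
```

A plain dictionary keeps the sketch dependency-free; a graph library could be swapped in if traversal or visualization is needed.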
The model uses custom tokens to delimit triplets:
special_tokens = ['<triplet>', '</triplet>', '<relation>', '<object>']
tokenizer.add_tokens(special_tokens)
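As a rough illustration of how these tokens frame a triplet, the sketch below builds a string in the layout seen in the model's sample output. The `linearize` function is a hypothetical helper, and the exact target format used during fine-tuning is an assumption inferred from that output.

```python
def linearize(subject, relation, obj):
    """Format one triplet with the delimiter tokens.

    Hypothetical helper: the layout mirrors the model's sample output and is
    not confirmed to match the fine-tuning targets exactly.
    """
    return f"<triplet> {subject} <relation> {relation} <object> {obj} </triplet>"


example = linearize("Hugging Face", "instance of", "Business")
```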
You can use it like this:
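import torch

# `model` and `tokenizer` are assumed to be loaded already, with the special tokens above added.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()
new_input = "Hugging Face, Inc. is an American company that develops tools for building applications using machine learning."
inputs = tokenizer(new_input, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(input_ids=inputs["input_ids"].to(device))
# Keep special tokens so the triplet delimiters stay visible in the decoded text.
print(tokenizer.batch_decode(outputs, skip_special_tokens=False)[0])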
Output: <pad><triplet> Hugging Face <relation> instance of <object> Business </triplet></s>
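The decoded string can be turned back into tuples with a small parser. The following is a sketch only: `parse_triplets` and `TRIPLET_RE` are hypothetical names, and it assumes every triplet is wrapped as `<triplet> ... <relation> ... <object> ... </triplet>` as in the output above. Spans that do not match this pattern are skipped silently.

```python
import re

# Hypothetical pattern based on the sample output; lazy groups capture the
# subject, relation, and object with surrounding whitespace trimmed.
TRIPLET_RE = re.compile(
    r"<triplet>\s*(.*?)\s*<relation>\s*(.*?)\s*<object>\s*(.*?)\s*</triplet>",
    re.DOTALL,
)


def parse_triplets(decoded):
    """Return a list of (subject, relation, object) tuples found in `decoded`."""
    return [tuple(match) for match in TRIPLET_RE.findall(decoded)]


decoded = "<pad><triplet> Hugging Face <relation> instance of <object> Business </triplet></s>"
print(parse_triplets(decoded))
# → [('Hugging Face', 'instance of', 'Business')]
```

Because malformed spans are dropped rather than raising, it may be worth comparing the number of parsed triplets with the number of `<triplet>` tokens in the output to spot generation errors.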
This model still isn't perfect, and may make mistakes! I'm working on fine-tuning it for longer and on a more diverse set of data.
Model: bew/t5_sentence_to_triplet_xl (base model: google/flan-t5-xl)