---
language:
- en
license: apache-2.0
library_name: peft
datasets:
- kelm
pipeline_tag: text2text-generation
base_model: google/flan-t5-xl
---
This is a version of `flan-t5-xl` fine-tuned on the [KELM Corpus](https://github.com/google-research-datasets/KELM-corpus) to take in sentences and output triplets of the form `subject-relation-object` for use in knowledge graph generation.

The model uses custom tokens to delimit triplets, so they must be added to the tokenizer:
```python
# Delimiters that mark each triplet and the start of its relation and object
special_tokens = ['<triplet>', '</triplet>', '<relation>', '<object>']
tokenizer.add_tokens(special_tokens)
```
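To make the target format concrete, here is a minimal sketch that serializes `(subject, relation, object)` tuples with these delimiters. `linearize_triplets` is a hypothetical helper, not the code used to build the training data. The spacing mirrors the example output in this card, and joining several triplets with a single space is an assumption.

```python
def linearize_triplets(triplets):
    """Serialize (subject, relation, object) tuples using the delimiter tokens.

    Hypothetical helper: the spacing follows the example output in this card,
    and the separator between triplets (a single space) is an assumption.
    """
    parts = [
        f"<triplet> {subject} <relation> {relation} <object> {obj} </triplet>"
        for subject, relation, obj in triplets
    ]
    return " ".join(parts)


print(linearize_triplets([("Hugging Face", "instance of", "Business")]))
# <triplet> Hugging Face <relation> instance of <object> Business </triplet>
```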
You can use it like this:
```python
import torch

# Assumes `model` (the fine-tuned model) and `tokenizer` (with the tokens above) are already loaded
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

new_input = "Hugging Face, Inc. is an American company that develops tools for building applications using machine learning."
inputs = tokenizer(new_input, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(input_ids=inputs["input_ids"])
# skip_special_tokens=False also keeps <pad> and </s> in the decoded text
print(tokenizer.batch_decode(outputs, skip_special_tokens=False)[0])
```
Output: `<pad><triplet> Hugging Face <relation> instance of <object> Business </triplet></s>`
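Because the decoded string still contains `<pad>` and `</s>` alongside the delimiters, some post-processing is needed before the triplets can feed a knowledge graph. The sketch below is one possible approach, assuming generations follow the pattern in the output above; `parse_triplets` is a hypothetical helper, and a regular expression is just one way to do the extraction.

```python
import re

# Matches one "<triplet> subject <relation> relation <object> object </triplet>" group,
# trimming the whitespace around each field.
TRIPLET_PATTERN = re.compile(
    r"<triplet>\s*(.*?)\s*<relation>\s*(.*?)\s*<object>\s*(.*?)\s*</triplet>",
    re.DOTALL,
)


def parse_triplets(decoded):
    """Extract (subject, relation, object) tuples from a decoded generation.

    Hypothetical helper: text outside the triplet delimiters, such as <pad>
    and </s>, is ignored, and malformed groups are silently skipped.
    """
    return TRIPLET_PATTERN.findall(decoded)


output = "<pad><triplet> Hugging Face <relation> instance of <object> Business </triplet></s>"
print(parse_triplets(output))
# [('Hugging Face', 'instance of', 'Business')]
```

Skipping malformed groups keeps the parser simple, but it also hides generation errors; logging unmatched text may be worthwhile if you rely on the triplets downstream.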
This model still isn't perfect and may make mistakes! I'm working on fine-tuning it for longer and on a more diverse set of data.