---
license: apache-2.0
datasets:
- kelm
language:
- en
library_name: peft
pipeline_tag: text2text-generation
---

This is a version of `flan-t5-xl` fine-tuned on the [KELM Corpus](https://github.com/google-research-datasets/KELM-corpus). It takes in sentences and outputs `subject-relation-object` triplets for knowledge graph generation.

The model uses custom tokens to delimit triplets:

```python
special_tokens = ['', '', '', '']
tokenizer.add_tokens(special_tokens)
```

You can use it like this:

```python
import torch

# Assumes `model` and `tokenizer` are already loaded, with the special tokens above added
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

new_input = "Hugging Face, Inc. is an American company that develops tools for building applications using machine learning."
inputs = tokenizer(new_input, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(input_ids=inputs["input_ids"].to(device))

# Keep special tokens so the triplet delimiters stay visible in the output
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=False)[0])
```

Output: ` Hugging Face instance of Business `

This model still isn't perfect and may make mistakes! I'm working on fine-tuning it for longer and on a more diverse set of data.
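To use the output for knowledge graph generation, the decoded string has to be split back into triplets. The following is a minimal sketch, not part of this model or any library. It assumes each triplet is linearized as *start token, subject, relation token, relation, object token, object, end token*; that layout is an assumption, since the card does not show the exact token strings. The names `[START]`, `[REL]`, `[OBJ]`, `[END]` are hypothetical placeholders, and `parse_triplets` and `build_graph` are illustrative helpers. Pass the real entries of `special_tokens` in place of the placeholders.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]


def parse_triplets(decoded: str, start: str, rel: str, obj: str, end: str) -> List[Triplet]:
    """Extract (subject, relation, object) triplets from a decoded output string.

    Assumes (hypothetically) that each triplet is written as
    `start subject rel relation obj object end`. Anything outside the
    delimiters, such as T5's `<pad>` and `</s>`, is ignored.
    """
    pattern = re.compile(
        re.escape(start) + r"(.*?)" + re.escape(rel) + r"(.*?)"
        + re.escape(obj) + r"(.*?)" + re.escape(end),
        re.DOTALL,
    )
    triplets: List[Triplet] = []
    for match in pattern.finditer(decoded):
        subject, relation, object_ = (part.strip() for part in match.groups())
        triplets.append((subject, relation, object_))
    return triplets


def build_graph(triplets: List[Triplet]) -> Dict[str, List[Tuple[str, str]]]:
    """Group triplets into an adjacency list: subject -> [(relation, object), ...]."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, relation, object_ in triplets:
        graph[subject].append((relation, object_))
    return dict(graph)


if __name__ == "__main__":
    # Hypothetical placeholder delimiters, for illustration only
    decoded = "<pad> [START] Hugging Face [REL] instance of [OBJ] Business [END]</s>"
    triplets = parse_triplets(decoded, "[START]", "[REL]", "[OBJ]", "[END]")
    print(triplets)               # [('Hugging Face', 'instance of', 'Business')]
    print(build_graph(triplets))  # {'Hugging Face': [('instance of', 'Business')]}
```

The non-greedy `(.*?)` groups let several triplets in one output be matched in order. Keying the graph by subject means that repeated subjects collect all their outgoing edges.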