|
--- |
|
license: apache-2.0 |
|
--- |
|
|
|
# Triplex |
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
Triplex is a model for creating knowledge graphs from unstructured data developed by [SciPhi.AI](https://www.sciphi.ai). |
|
|
|
It works by extracting triplets - simple statements consisting of a subject, predicate, and object - from text or other data sources. |
|
|
|
Try the demo here: [kg.sciphi.ai](https://kg.sciphi.ai) |
|
|
|
## Model Details |
|
|
|
It is a finetuned version of Phi3-3.8B on a high quality proprietary dataset constructed using DBPedia, Wikidata, and other data sources. |
|
|
|
### Model Sources |
|
|
|
- **Blog:** [https://www.sciphi.ai/blog/triplex](https://www.sciphi.ai/blog/triplex) |
|
- **Demo:** [kg.sciphi.ai](kg.sciphi.ai) |
|
- **R2R Repository:** [https://www.github.com/SciPhi-AI/R2R](https://www.github.com/SciPhi-AI/R2R) |
|
|
|
```python |
|
|
|
import json |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
def triplextract(model, tokenizer, text, entity_types, predicates): |
|
|
|
input_format = """ |
|
**Entity Types:** |
|
{entity_types} |
|
|
|
**Predicates:** |
|
{predicates} |
|
|
|
**Text:** |
|
{text} |
|
""" |
|
|
|
message = input_format.format( |
|
entity_types = json.dumps({"entity_types": entity_types}), |
|
predicates = json.dumps({"predicates": predicates}), |
|
text = text) |
|
|
|
messages = [{'role': 'user', 'content': message}] |
|
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt = True, return_tensors="pt").to("cuda") |
|
output = tokenizer.decode(model.generate(input_ids=input_ids, max_length=2048)[0], skip_special_tokens=True) |
|
return output |
|
|
|
model = AutoModelForCausalLM.from_pretrained("sciphi/triplex", trust_remote_code=True).to('cuda').eval() |
|
tokenizer = AutoTokenizer.from_pretrained("sciphi/triplex", trust_remote_code=True) |
|
|
|
entity_types = [ "LOCATION", "POSITION", "DATE", "CITY", "COUNTRY", "NUMBER" ] |
|
predicates = [ "POPULATION", "AREA" ] |
|
text = """ |
|
San Francisco,[24] officially the City and County of San Francisco, is a commercial, financial, and cultural center in Northern California. |
|
|
|
With a population of 808,437 residents as of 2022, San Francisco is the fourth most populous city in the U.S. state of California behind Los Angeles, San Diego, and San Jose. |
|
""" |
|
|
|
prediction = triplextract(model, tokenizer, text, entity_types, predicates) |
|
print(prediction) |
|
|
|
|
|
``` |