---
license: apache-2.0
language:
- en
pipeline_tag: text-classification
library_name: transformers
---

# yizhao-risk-en-scorer

## Introduction

This is a BERT model fine-tuned on a high-quality English financial dataset. It generates a security risk score for a given text, which helps identify and remove data with security risks from financial datasets, thereby reducing the proportion of illegal or undesirable data. For the complete data cleaning process, please refer to [YiZhao](https://github.com/HITsz-TMG/YiZhao).

## Quickstart

Here is an example code snippet for generating a security risk score with this model.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

text = "You are a smart robot"

# Model ID or local path of the checkpoint to load
risk_model_name = "risk-model-en-v0.1"
risk_tokenizer = AutoTokenizer.from_pretrained(risk_model_name)
risk_model = AutoModelForSequenceClassification.from_pretrained(risk_model_name)

# Tokenize the input, truncating it to the model's maximum length
risk_inputs = risk_tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
risk_outputs = risk_model(**risk_inputs)

# The model emits a single logit per input, which is used as the risk score
risk_logits = risk_outputs.logits.squeeze(-1).float().detach().numpy()
risk_score = risk_logits.item()

result = {
    "text": text,
    "risk_score": risk_score
}

print(result)
# {'text': 'You are a smart robot', 'risk_score': 0.11226219683885574}
```
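
To illustrate how such a score could drive the removal step described in the Introduction, here is a minimal sketch of a threshold filter. It is not the YiZhao pipeline itself. The function name `split_by_risk`, the `score_fn` parameter, and the `RISK_THRESHOLD` value are all hypothetical, and the sketch assumes that a higher score means higher risk, which this card does not state explicitly.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical cut-off: this card does not publish a recommended threshold.
RISK_THRESHOLD = 0.5


def split_by_risk(
    texts: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = RISK_THRESHOLD,
) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Split texts into (kept, flagged) using one risk score per text.

    Assumes a higher score means higher risk; flip the comparison if the
    model's scale runs the other way.
    """
    kept: List[str] = []
    flagged: List[Tuple[str, float]] = []
    for text in texts:
        score = score_fn(text)
        if score >= threshold:
            # Keep the score alongside the text so flagged items can be reviewed.
            flagged.append((text, score))
        else:
            kept.append(text)
    return kept, flagged


if __name__ == "__main__":
    # Stand-in scores so the sketch runs without downloading the model.
    demo_scores = {
        "Quarterly revenue rose 4% year over year.": 0.05,
        "Placeholder for a risky passage.": 0.90,
    }
    kept, flagged = split_by_risk(demo_scores, lambda t: demo_scores[t])
    print("kept:", kept)
    print("flagged:", flagged)
```

In practice, `score_fn` would wrap the tokenizer and model calls from the Quickstart snippet and return `risk_score` for one text. A suitable threshold would need to be chosen by inspecting scores on a labeled sample of your own data.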