habiakl committed
Commit 3cbe94e
1 Parent(s): eaf800d
Add model weights
- README.md +32 -0
- config.json +3 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +3 -0
- tokenizer.json +3 -0
- tokenizer_config.json +3 -0
- training_args.bin +3 -0
- vocab.txt +0 -0
README.md
ADDED
@@ -0,0 +1,32 @@
# Financial Relation Extraction

## Process

The model detects whether a relationship exists between financial terms in a sentence and, when one does, identifies its type. Example use cases:

* An A-B trust is a joint trust created by a married couple for the purpose of minimizing estate taxes. (<em>Relationship **exists**, type: **is**</em>)
* There are no withdrawal penalties. (<em>Relationship **does not exist**, type: **x**</em>)
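
The two examples suggest that a single predicted label answers both questions: the label **x** means no relationship was found, and any other label names the relationship type. The following is a minimal sketch of that decoding step, assuming the classifier returns one label string per sentence. `RelationResult` and `decode_prediction` are hypothetical names used for illustration and are not part of the released model.

```python
from typing import NamedTuple


class RelationResult(NamedTuple):
    """Outcome of classifying one sentence (hypothetical structure)."""
    exists: bool   # True when a relationship was detected
    relation: str  # relationship type, or "x" when none was detected


# Assumption: "x" is the label emitted for "no relationship".
NO_RELATION = "x"


def decode_prediction(label: str) -> RelationResult:
    """Split one predicted label into the detection and qualification answers."""
    return RelationResult(exists=(label != NO_RELATION), relation=label)


print(decode_prediction("is"))
print(decode_prediction("x"))
```

Keeping detection and qualification in one label means a single classifier suffices, at the cost of treating "no relationship" as just another class.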

## Data

The data consists of definitions of financial indicators collected from several sources (Wikimedia, IFRS, Investopedia). Each definition was split into sentences, and the term relationships in each sentence were extracted using the [Stanford Open Information Extraction](https://nlp.stanford.edu/software/openie.html) module.
A typical row in the dataset consists of a definition sentence and its corresponding relationship label.
The labels were restricted to the five most frequently identified relationships: **x** (no relationship), **has**, **is in**, **is**, and **are**.
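
To make the row format concrete, here is a small sketch of how such rows and labels might be represented before training. The `Example` class, the ordering of `LABELS`, and the resulting integer ids are illustrative assumptions; the ordering actually used for training is not stated here. The two sentences are the examples from the Process section.

```python
from dataclasses import dataclass

# The five labels named in the README; this ordering (and each id) is an assumption.
LABELS = ["x", "has", "is in", "is", "are"]
LABEL2ID = {label: idx for idx, label in enumerate(LABELS)}
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}


@dataclass(frozen=True)
class Example:
    """One dataset row: a definition sentence and its relationship label."""
    sentence: str
    label: str

    def __post_init__(self) -> None:
        # Reject labels outside the five retained relationships.
        if self.label not in LABEL2ID:
            raise ValueError(f"unknown label: {self.label!r}")

    @property
    def label_id(self) -> int:
        """Integer id of the label under the assumed ordering."""
        return LABEL2ID[self.label]


rows = [
    Example("An A-B trust is a joint trust created by a married couple "
            "for the purpose of minimizing estate taxes.", "is"),
    Example("There are no withdrawal penalties.", "x"),
]
print([(row.label, row.label_id) for row in rows])
```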

## Model

The model is a standard DistilBERT base (uncased) transformer from the Hugging Face library. See the [Hugging Face DistilBERT base model](https://huggingface.co/distilbert-base-uncased) card for more details about the model.
In addition, the model has been pretrained to initialize weights that would otherwise be unused when loaded from an existing stock pretrained model.
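
As a rough sanity check on which base model the checkpoint corresponds to, the size recorded in this commit's Git LFS pointer for `pytorch_model.bin` (267872407 bytes) can be turned into an approximate parameter count. The sketch assumes the weights are stored as 32-bit floats and ignores serialization overhead, so it is an upper-bound estimate rather than an exact count. `approx_params_millions` is a hypothetical helper.

```python
# Size in bytes of pytorch_model.bin, taken from its Git LFS pointer in this commit.
CHECKPOINT_BYTES = 267_872_407

# Assumption: weights are stored as 32-bit floats (4 bytes per parameter).
BYTES_PER_PARAM = 4


def approx_params_millions(num_bytes: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Upper-bound estimate of the parameter count, in millions."""
    return num_bytes / bytes_per_param / 1e6


estimate = approx_params_millions(CHECKPOINT_BYTES)
print(f"about {estimate:.1f}M parameters")
```

The estimate is close to the roughly 66 million parameters usually quoted for DistilBERT base, which is consistent with the linked model card.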

## Metrics

The evaluation metrics are precision, recall, and F1-score. The following is the classification report on the test set.

| relation | precision | recall | f1-score | support |
| ------------ |:---------:|:------:|:--------:| -------:|
| has | 0.7416 | 0.9674 | 0.8396 | 2362 |
| is in | 0.7813 | 0.7925 | 0.7869 | 2362 |
| is | 0.8650 | 0.6863 | 0.7653 | 2362 |
| are | 0.8365 | 0.8493 | 0.8429 | 2362 |
| x | 0.9515 | 0.8302 | 0.8867 | 2362 |
| | | | | |
| macro avg | 0.8352 | 0.8251 | 0.8243 | 11810 |
| weighted avg | 0.8352 | 0.8251 | 0.8243 | 11810 |
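
The relationship between the columns of the report can be checked with a few lines of arithmetic. The sketch below recomputes each F1-score as the harmonic mean of the reported precision and recall, then averages the per-class values. The numbers are copied from the report; `f1_score` and `macro_average` are hypothetical helper names, and small last-digit differences are expected because the inputs are already rounded to four decimals.

```python
# (precision, recall, reported F1) per relation, copied from the classification report.
REPORT = {
    "has":   (0.7416, 0.9674, 0.8396),
    "is in": (0.7813, 0.7925, 0.7869),
    "is":    (0.8650, 0.6863, 0.7653),
    "are":   (0.8365, 0.8493, 0.8429),
    "x":     (0.9515, 0.8302, 0.8867),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values) -> float:
    """Unweighted mean over classes."""
    values = list(values)
    return sum(values) / len(values)


for relation, (p, r, reported) in REPORT.items():
    print(f"{relation:<6} recomputed F1 = {f1_score(p, r):.4f} (reported {reported:.4f})")

macro_p = macro_average(p for p, _, _ in REPORT.values())
macro_r = macro_average(r for _, r, _ in REPORT.values())
macro_f1 = macro_average(f for _, _, f in REPORT.values())
print(f"macro avg: precision={macro_p:.4f} recall={macro_r:.4f} f1={macro_f1:.4f}")
```

Because every class has the same support (2362), the weighted average coincides with the macro average, which is why the last two rows of the report are identical.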
config.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:07f7615afabda7ff754ea77e3a06a2d218132bfcc3aa42e22f22ac1585bd7718
size 774
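
The files in this commit other than README.md are tracked with Git LFS, so the diff shows a three-line pointer (`version`, `oid`, `size`) instead of the file contents. The following is a minimal sketch of reading such a pointer, using the config.json pointer from this commit as input. `parse_lfs_pointer` is a hypothetical helper that handles only these three keys.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents for config.json as recorded in this commit.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:07f7615afabda7ff754ea77e3a06a2d218132bfcc3aa42e22f22ac1585bd7718
size 774
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], info["size_bytes"])
```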
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ff72da51e34eb2d892c303c6f5f7beed57b13965a4c9a0e1379fb95654da8d30
size 267872407
special_tokens_map.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:303df45a03609e4ead04bc3dc1536d0ab19b5358db685b6f3da123d05ec200e3
size 112
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ecafc709dc78a0d00e3bc20477606e97ccfd239fdfddd7e53fcb24300ba0bc13
size 466247
tokenizer_config.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:87fab29eb94840215d6b277994841550362ceff337f16bbf44e9af30fd2fb62d
size 291
training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e2ab6d3f261a834531ac404acb765265a52a8016338c732b78daa1f299bf6002
size 2415
vocab.txt
ADDED
The diff for this file is too large to render.