ghh001 committed on
Commit
2982506
1 Parent(s): 53cc723

Update README.md

Files changed (1)
  1. README.md +67 -1
README.md CHANGED
@@ -1,6 +1,72 @@
  ---
  license: apache-2.0
  ---
+ # 1.Differences from knowlm-13b-zhixi
  Compared to zjunlp/knowlm-13b-zhixi, zjunlp/knowlm-13b-ie is slightly more practical for information extraction but less generally applicable.

+ zjunlp/knowlm-13b-ie draws on a sample of roughly 10% of the data from Chinese-English information extraction datasets, to which negative sampling is then applied. For instance, if dataset A contains the labels [a, b, c, d, e, f], we first sample 10% of the data from A. A given sample 's' might contain only labels a and b. We then randomly add relations that it does not originally have, such as c and d, drawn from the specified list of relation candidates. When it encounters these added relations, the model might output text similar to 'NAN'. This method gives the model some ability to generate 'NAN' outputs, enhancing its information extraction capability while weakening its generalization ability.
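The negative sampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing code: the function names `subsample` and `add_negative_relations`, the dictionary layout of a sample, and the number of negatives per sample are all hypothetical.

```python
import random

NAN = "NAN"  # placeholder target for relations that a sample does not contain


def subsample(dataset, fraction, rng):
    """Keep roughly `fraction` of the dataset (at least one item)."""
    k = max(1, round(len(dataset) * fraction))
    return rng.sample(dataset, k)


def add_negative_relations(present, candidates, num_negatives, rng):
    """Add relations the sample lacks and give them the NAN target.

    `present` maps each relation the sample really has to its annotations
    (hypothetical layout). Returns the relation list for the prompt and the
    per-relation targets.
    """
    absent = [r for r in candidates if r not in present]
    negatives = rng.sample(absent, min(num_negatives, len(absent)))
    # Keep the candidate-list order so positives and negatives are interleaved.
    prompt_relations = [r for r in candidates if r in present or r in negatives]
    targets = {r: present.get(r, NAN) for r in prompt_relations}
    return prompt_relations, targets


rng = random.Random(0)
candidates = ["a", "b", "c", "d", "e", "f"]
# Toy dataset A: every sample only has relations a and b.
dataset_a = [{"a": ["triple 1"], "b": ["triple 2"]} for _ in range(20)]

kept = subsample(dataset_a, 0.10, rng)  # about 10% of A
relations, targets = add_negative_relations(kept[0], candidates, 2, rng)
```

After this step each kept sample lists four relations in its prompt: the two it really has, with their original annotations as targets, and two randomly chosen absent ones whose target is `NAN`.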
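+
+ # 2.IE template
+ NER supports the following templates:
+ ```python
+ # Template strings are kept verbatim; changing them changes the prompt sent to the model.
+ # The entity_convert_target* names are not defined in this snippet; see ner_template.py.
+ entity_template_zh = {
+     # 0: Given the candidate entity type list {s_schema}, extract the entities that may appear in the input below. Answer in the format {s_format}.
+     0:'已知候选的实体类型列表:{s_schema},请你根据实体类型列表,从以下输入中抽取出可能存在的实体。请按照{s_format}的格式回答。',
+     # 1: I will give you an input; based on the entity type list {s_schema}, extract the entities it may contain and answer in the form {s_format}.
+     1:'我将给你个输入,请根据实体类型列表:{s_schema},从输入中抽取出可能包含的实体,并以{s_format}的形式回答。',
+     # 2: Extract possible entities from the given input according to the entity type list and answer in the format {s_format}; entity type list = {s_schema}.
+     2:'我希望你根据实体类型列表从给定的输入中抽取可能的实体,并以{s_format}的格式回答,实体类型列表={s_schema}。',
+     # 3: The entity type list is {s_schema}. Which entities might this sentence contain? You may first identify the entities, then decide their types. Answer in the format {s_format}.
+     3:'给定的实体类型列表是{s_schema}\n根据实体类型列表抽取,在这个句子中可能包含哪些实体?你可以先别出实体, 再判断实体类型。请以{s_format}的格式回答。',
+ }
+
+ entity_int_out_format_zh = {
+     0:['"(实体,实体类型)"', entity_convert_target0],  # "(entity, entity type)"
+     1:['"实体是\n实体类型是\n\n"', entity_convert_target1],  # "entity is ... / entity type is ..."
+     2:['"实体:实体类型\n"', entity_convert_target2],  # "entity: entity type"
+     3:["JSON字符串[{'entity':'', 'entity_type':''}, ]", entity_convert_target3],  # JSON string
+ }
+
+ entity_template_en = {
+     0:'Identify the entities and types in the following text and where entity type list {s_schema}. Please provide your answerin the form of {s_format}.',
+     1:'From the given text, extract the possible entities and types . The types are {s_schema}. Please format your answerin the form of {s_format}.',
+ }
+
+ entity_int_out_format_en = {
+     0:['(Entity, Type)', entity_convert_target0_en],
+     1:["{'Entity':'', 'Type':''}", entity_convert_target1_en],
+ }
+ ```
+
+ The schema and output format appear in the templates as the placeholders {s_schema} and {s_format}, and must be supplied by the user.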
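As an illustration of how the two placeholders can be filled in, here is a minimal sketch using `str.format`. The template and output format are copied verbatim from the English NER entries above; the helper name `build_instruction`, the example schema, and the choice to render the schema as a Python list are assumptions, and convert.py may serialize the schema differently.

```python
# Copied verbatim from entity_template_en[0] above (including its spacing).
template = (
    'Identify the entities and types in the following text and where entity '
    'type list {s_schema}. Please provide your answerin the form of {s_format}.'
)
# Copied from entity_int_out_format_en[0][0] above.
out_format = '(Entity, Type)'

# Hypothetical schema; real schemas come from the dataset's schema file.
schema = ['person', 'organization', 'location']


def build_instruction(template, schema, out_format):
    """Substitute the schema and output format into a template string."""
    # The values are passed as arguments, so braces inside them (as in the
    # JSON-style formats) are inserted literally and not re-interpreted.
    return template.format(s_schema=schema, s_format=out_format)


instruction = build_instruction(template, schema, out_format)
print(instruction)
```

The input text to extract from is supplied separately from the instruction; how the two are combined is defined by convert.py and is not shown here.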
+
+ Please refer to [ner_template.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/ner_template.py), [re_template.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/re_template.py), and [ee_template.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/ee_template.py) for more details about the templates.
+
+
+ # 3.Convert script
+ We provide a script, [convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py), to uniformly convert data into KnowLM instructions.
+
+ The [data](https://github.com/zjunlp/DeepKE/tree/main/example/llm/InstructKGC/data) directory shows the data format expected for each task before running convert.py.
+
+ ```bash
+ python kg2instruction/convert.py \
+   --src_path data/NER/sample.json \
+   --tgt_path data/NER/processed.json \
+   --schema_path data/NER/schema.json \
+   --language zh \
+   --task NER \
+   --sample 0 \
+   --all
+ ```
+
+ # 4.Evaluate
+
+ We provide a script, [evaluate.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/evaluate.py), to convert the model's string output into a list and calculate F1.
+
+ ```bash
+ python kg2instruction/evaluate.py \
+   --standard_path data/NER/processed.json \
+   --submit_path data/NER/processed.json \
+   --task ner \
+   --language zh
+ ```
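To make the "string to list, then F1" idea concrete, here is a small self-contained sketch. It is not the logic of evaluate.py: the function names `parse_pairs` and `micro_f1`, the regular expression, and the choice of micro-averaged F1 over exact (entity, type) matches are all assumptions, and only the `(Entity, Type)` output format is handled.

```python
import re

# Matches "(entity, type)" with either an ASCII or a full-width comma.
PAIR_RE = re.compile(r'\(([^(),,]+)[,,]([^(),,]+)\)')


def parse_pairs(text):
    """Turn a model output string into a set of (entity, type) pairs."""
    return {(ent.strip(), typ.strip()) for ent, typ in PAIR_RE.findall(text)}


def micro_f1(gold_texts, pred_texts):
    """Micro-averaged F1 over exact (entity, type) matches."""
    tp = n_gold = n_pred = 0
    for gold_text, pred_text in zip(gold_texts, pred_texts):
        gold, pred = parse_pairs(gold_text), parse_pairs(pred_text)
        tp += len(gold & pred)
        n_gold += len(gold)
        n_pred += len(pred)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: the second example has no entities, so both sides say NAN.
gold_outputs = ["(Alice, person)(Acme, organization)", "NAN"]
pred_outputs = ["(Alice, person)(Acme, location)", "NAN"]

score = micro_f1(gold_outputs, pred_outputs)
print(f"F1 = {score:.2f}")
```

In this toy case one of the two predicted pairs matches the gold annotation, so precision and recall are both 0.5, and so is F1. A `NAN` output parses to an empty set and contributes nothing to either count.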
+