add schema
- README.md +44 -22
- README_CN.md +43 -21
README.md
CHANGED
@@ -9,30 +9,30 @@ zjunlp/knowlm-13b-ie samples around 10% of the data from Chinese-English information extraction datasets


# 2.IE template
-
```python
-entity_template_zh = {
-0:'…',
-1:'…',
-2:'…',
-3:'…',
}

-entity_int_out_format_zh = {
-0:…,
-1:…,
-2:['"实体类型:实体\n"', entity_convert_target2],
-3:["JSON字符串[{'entity':'', 'entity_type':''}, ]", entity_convert_target3],
}

-entity_template_en = {
-0:'…',
-1:'…',
-}
-
-entity_int_out_format_en = {
-0:['(Entity, Type)', entity_convert_target0_en],
-1:["{'Entity':'', 'Type':''}", entity_convert_target1_en],
}
```

@@ -41,9 +41,31 @@ Both the schema and format placeholders ({s_schema} and {s_format}) are embedded within the templates.

-# 3.Convert script

@@ -61,7 +83,7 @@ python kg2instruction/convert.py \

-# 4.Usage

@@ -77,7 +99,7 @@ If GPU memory is not enough, you can use `--bits 8` or `--bits 4`.

-# 5.Evaluate

|
|
9 |
|
10 |
|
11 |
# 2.IE template
|
12 |
+
RE supports the following templates:
```python
+relation_template_zh = {
+0:'已知候选的关系列表:{s_schema},请你根据关系列表,从以下输入中抽取出可能存在的头实体与尾实体,并给出对应的关系三元组。请按照{s_format}的格式回答。',
+1:'我将给你个输入,请根据关系列表:{s_schema},从输入中抽取出可能包含的关系三元组,并以{s_format}的形式回答。',
+2:'我希望你根据关系列表从给定的输入中抽取可能的关系三元组,并以{s_format}的格式回答,关系列表={s_schema}。',
+3:'给定的关系列表是{s_schema}\n根据关系列表抽取关系三元组,在这个句子中可能包含哪些关系三元组?请以{s_format}的格式回答。',
+}
+
+relation_int_out_format_zh = {
+0:['"(头实体,关系,尾实体)"', relation_convert_target0],
+1:['"头实体是\n关系是\n尾实体是\n\n"', relation_convert_target1],
+2:['"关系:头实体,尾实体\n"', relation_convert_target2],
+3:["JSON字符串[{'head':'', 'relation':'', 'tail':''}, ]", relation_convert_target3],
}

+relation_template_en = {
+0:'Identify the head entities (subjects) and tail entities (objects) in the following text and provide the corresponding relation triples from relation list {s_schema}. Please provide your answer as a list of relation triples in the form of {s_format}.',
+1:'From the given text, extract the possible head entities (subjects) and tail entities (objects) and give the corresponding relation triples. The relations are {s_schema}. Please format your answer as a list of relation triples in the form of {s_format}.',
}

+relation_int_out_format_en = {
+0:['(Subject, Relation, Object)', relation_convert_target0_en],
+1:["{'head':'', 'relation':'', 'tail':''}", relation_convert_target1_en],
}
```


For a more comprehensive understanding of the templates, please refer to the files [ner_template.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/ner_template.py), [re_template.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/re_template.py) and [ee_template.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/ee_template.py).
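
Both placeholders, {s_schema} and {s_format}, are embedded in the templates above, so building a prompt is a plain string substitution. A minimal sketch follows; the helper name `fill_template` and the sample relation list are illustrative, not part of the repository.

```python
# Entry 0 of relation_template_en and its output-format description, copied from above.
relation_template_en = {
    0: 'Identify the head entities (subjects) and tail entities (objects) in the following text and provide the corresponding relation triples from relation list {s_schema}. Please provide your answer as a list of relation triples in the form of {s_format}.',
}
relation_int_out_format_en = {
    0: ['(Subject, Relation, Object)', None],  # second slot is the converter (relation_convert_target0_en above); None as a stand-in here
}

def fill_template(template: str, schema: list, format_desc: str) -> str:
    """Substitute a candidate relation list and an output-format description into a template."""
    return template.format(s_schema=str(schema), s_format=format_desc)

prompt = fill_template(relation_template_en[0],
                       ['founded by', 'located in'],        # illustrative relation list
                       relation_int_out_format_en[0][0])
print(prompt)
```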


+# 3.Common relationship types
+
+```python
+{
+'组织': ['别名', '位于', '类型', '成立时间', '解散时间', '成员', '创始人', '事件', '子组织', '产品', '成就', '运营'],
+'医学': ['别名', '病因', '症状', '可能后果', '包含', '发病部位'],
+'事件': ['别名', '类型', '发生时间', '发生地点', '参与者', '主办方', '提名者', '获奖者', '赞助者', '获奖作品', '获胜者', '奖项'],
+'运输': ['别名', '位于', '类型', '属于', '途径', '开通时间', '创建时间', '车站等级', '长度', '面积'],
+'人造物件': ['别名', '类型', '受众', '成就', '品牌', '产地', '长度', '宽度', '高度', '重量', '价值', '制造商', '型号', '生产时间', '材料', '用途', '发现者或发明者'],
+'生物': ['别名', '学名', '类型', '分布', '父级分类单元', '主要食物来源', '用途', '长度', '宽度', '高度', '重量', '特征'],
+'建筑': ['别名', '类型', '位于', '临近', '名称由来', '长度', '宽度', '高度', '面积', '创建时间', '创建者', '成就', '事件'],
+'自然科学': ['别名', '类型', '性质', '生成物', '用途', '组成', '产地', '发现者或发明者'],
+'地理地区': ['别名', '类型', '所在行政领土', '接壤', '事件', '面积', '人口', '行政中心', '产业', '气候'],
+'作品': ['别名', '类型', '受众', '产地', '成就', '导演', '编剧', '演员', '平台', '制作者', '改编自', '包含', '票房', '角色', '作曲者', '作词者', '表演者', '出版时间', '出版商', '作者'],
+'人物': ['别名', '籍贯', '国籍', '民族', '朝代', '出生时间', '出生地点', '死亡时间', '死亡地点', '专业', '学历', '作品', '职业', '职务', '成就', '所属组织', '父母', '配偶', '兄弟姊妹', '亲属', '同事', '参与'],
+'天文对象': ['别名', '类型', '坐标', '发现者', '发现时间', '名称由来', '属于', '直径', '质量', '公转周期', '绝对星等', '临近']
+}
+```
+
+Here, [schema](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/schema.py) provides 12 text topics and the common relationship types under each topic.
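
Assuming schema.py exposes this mapping as an ordinary Python dict (only its contents are shown in this commit, and the variable name below is an assumption), picking the candidate relation list for a topic is a single lookup:

```python
# Two topics copied (abridged) from the mapping above; assume schema.py exposes the full dict.
schema = {
    '人物': ['别名', '籍贯', '国籍', '民族', '出生时间', '出生地点', '职业', '作品'],
    '组织': ['别名', '位于', '类型', '成立时间', '创始人', '子组织', '产品'],
}

def candidate_relations(topic: str) -> list:
    """Return the candidate relation types for a text topic, e.g. to fill {s_schema}."""
    return schema.get(topic, [])

print(candidate_relations('人物'))
```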


+# 4.Convert script

A script named [convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py) is provided to facilitate the uniform conversion of data into KnowLM instructions. Before running convert.py, see the [data](https://github.com/zjunlp/DeepKE/tree/main/example/llm/InstructKGC/data) directory for the expected input format of each task.

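Conceptually, the conversion pairs each sample's text with a filled template (the instruction) and a target string in the chosen output format. The record keys below ('instruction'/'input'/'output') and the schema serialization are assumptions for illustration only; the authoritative layout is defined by convert.py and the sample files under the data directory.

```python
import json

# Entry 0 of relation_template_zh, copied from the templates above.
relation_template_zh = {
    0: '已知候选的关系列表:{s_schema},请你根据关系列表,从以下输入中抽取出可能存在的头实体与尾实体,并给出对应的关系三元组。请按照{s_format}的格式回答。',
}

# Hypothetical sketch of one converted RE sample; the field names are an assumption,
# not the layout produced by convert.py.
record = {
    "instruction": relation_template_zh[0].format(
        s_schema="['创始人', '成立时间']",
        s_format="(头实体,关系,尾实体)",
    ),
    "input": "示例公司由张三于2020年创立。",
    "output": "(示例公司,创始人,张三)\n(示例公司,成立时间,2020年)",
}
print(json.dumps(record, ensure_ascii=False, indent=2))
```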


+# 5.Usage
We provide a script, [inference.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/src/inference.py), for direct inference with the `zjunlp/knowlm-13b-ie` model. Please refer to the [README.md](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/README.md) for environment configuration and other details.


+# 6.Evaluate

We provide a script, [evaluate.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/evaluate.py), to convert the model's string output into a list and calculate the F1 score.

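The parsing and scoring logic lives in evaluate.py; as a rough, self-contained sketch of the idea (parse "(head, relation, tail)" strings, then micro-average F1 over exact-match triples; the helper names are illustrative, not the script's API):

```python
import re

def parse_triples(output: str) -> set:
    """Parse '(head,relation,tail)' groups out of a model output string (illustrative parser)."""
    return {tuple(part.strip() for part in m.split(','))
            for m in re.findall(r'\(([^()]*)\)', output) if len(m.split(',')) == 3}

def micro_f1(preds: list, golds: list) -> float:
    """Micro-averaged F1 over exact-match triples across all samples."""
    tp = fp = fn = 0
    for pred, gold in zip(preds, golds):
        p, g = parse_triples(pred), parse_triples(gold)
        tp += len(p & g)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(micro_f1(["(ZJU,located in,Hangzhou)"],
               ["(ZJU,located in,Hangzhou)\n(ZJU,founded in,1897)"]))  # 0.666...
```
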
README_CN.md
CHANGED
@@ -7,32 +7,33 @@ zjunlp/knowlm-13b-ie samples about 10% of the data from the Chinese-English information extraction datasets


# 2. Information extraction templates
-
```python
-entity_template_zh = {
-0: '…',
-1: '…',
-2: '…',
-3: '…',
}

-entity_int_out_format_zh = {
-0: …,
-1: …,
-2: ['"实体类型:实体\n"', entity_convert_target2],
-3: ["JSON字符串[{'entity':'', 'entity_type':''}, ]", entity_convert_target3],
}

-entity_template_en = {
-0: '…',
-1: '…',
}

-entity_int_out_format_en = {
-0: ['(Entity, Type)', entity_convert_target0_en],
-1: ["{'Entity':'', 'Type':''}", entity_convert_target1_en],
-}
```

@@ -41,7 +42,28 @@ entity_int_out_format_en = {

-# 3. Conversion script

@@ -57,7 +79,7 @@ python kg2instruction/convert.py \

-# 4. Usage

@@ -72,7 +94,7 @@ CUDA_VISIBLE_DEVICES="0" python src/inference.py \

-# 5. Evaluation



# 2. Information extraction templates
+Relation extraction (RE) supports the following templates:

```python
+relation_template_zh = {
+0:'已知候选的关系列表:{s_schema},请你根据关系列表,从以下输入中抽取出可能存在的头实体与尾实体,并给出对应的关系三元组。请按照{s_format}的格式回答。',
+1:'我将给你个输入,请根据关系列表:{s_schema},从输入中抽取出可能包含的关系三元组,并以{s_format}的形式回答。',
+2:'我希望你根据关系列表从给定的输入中抽取可能的关系三元组,并以{s_format}的格式回答,关系列表={s_schema}。',
+3:'给定的关系列表是{s_schema}\n根据关系列表抽取关系三元组,在这个句子中可能包含哪些关系三元组?请以{s_format}的格式回答。',
+}
+
+relation_int_out_format_zh = {
+0:['"(头实体,关系,尾实体)"', relation_convert_target0],
+1:['"头实体是\n关系是\n尾实体是\n\n"', relation_convert_target1],
+2:['"关系:头实体,尾实体\n"', relation_convert_target2],
+3:["JSON字符串[{'head':'', 'relation':'', 'tail':''}, ]", relation_convert_target3],
}

+relation_template_en = {
+0:'Identify the head entities (subjects) and tail entities (objects) in the following text and provide the corresponding relation triples from relation list {s_schema}. Please provide your answer as a list of relation triples in the form of {s_format}.',
+1:'From the given text, extract the possible head entities (subjects) and tail entities (objects) and give the corresponding relation triples. The relations are {s_schema}. Please format your answer as a list of relation triples in the form of {s_format}.',
}

+relation_int_out_format_en = {
+0:['(Subject, Relation, Object)', relation_convert_target0_en],
+1:["{'head':'', 'relation':'', 'tail':''}", relation_convert_target1_en],
}
```


+# 3. Common relation types
+
+```python
+{
+'组织': ['别名', '位于', '类型', '成立时间', '解散时间', '成员', '创始人', '事件', '子组织', '产品', '成就', '运营'],
+'医学': ['别名', '病因', '症状', '可能后果', '包含', '发病部位'],
+'事件': ['别名', '类型', '发生时间', '发生地点', '参与者', '主办方', '提名者', '获奖者', '赞助者', '获奖作品', '获胜者', '奖项'],
+'运输': ['别名', '位于', '类型', '属于', '途径', '开通时间', '创建时间', '车站等级', '长度', '面积'],
+'人造物件': ['别名', '类型', '受众', '成就', '品牌', '产地', '长度', '宽度', '高度', '重量', '价值', '制造商', '型号', '生产时间', '材料', '用途', '发现者或发明者'],
+'生物': ['别名', '学名', '类型', '分布', '父级分类单元', '主要食物来源', '用途', '长度', '宽度', '高度', '重量', '特征'],
+'建筑': ['别名', '类型', '位于', '临近', '名称由来', '长度', '宽度', '高度', '面积', '创建时间', '创建者', '成就', '事件'],
+'自然科学': ['别名', '类型', '性质', '生成物', '用途', '组成', '产地', '发现者或发明者'],
+'地理地区': ['别名', '类型', '所在行政领土', '接壤', '事件', '面积', '人口', '行政中心', '产业', '气候'],
+'作品': ['别名', '类型', '受众', '产地', '成就', '导演', '编剧', '演员', '平台', '制作者', '改编自', '包含', '票房', '角色', '作曲者', '作词者', '表演者', '出版时间', '出版商', '作者'],
+'人物': ['别名', '籍贯', '国籍', '民族', '朝代', '出生时间', '出生地点', '死亡时间', '死亡地点', '专业', '学历', '作品', '职业', '职务', '成就', '所属组织', '父母', '配偶', '兄弟姊妹', '亲属', '同事', '参与'],
+'天文对象': ['别名', '类型', '坐标', '发现者', '发现时间', '名称由来', '属于', '直径', '质量', '公转周期', '绝对星等', '临近']
+}
+```
+
+Here, [schema](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/schema.py) provides 12 text topics and the common relation types under each topic.
+
+# 4. Conversion script

A script named [convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py) is provided to convert the data uniformly into instructions that can be fed directly to KnowLM. Before running convert.py, see the [data](https://github.com/zjunlp/DeepKE/tree/main/example/llm/InstructKGC/data) directory for the expected data format of each task.


+# 5. Usage
We provide the script [inference.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/src/inference.py) for direct inference with the `zjunlp/knowlm-13b-ie` model; please refer to the [README.md](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/README.md) for environment configuration and other details.

If GPU memory is insufficient, you can use `--bits 8` or `--bits 4`.


+# 6. Evaluation
We provide a script, [evaluate.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/evaluate.py), to convert the model's string output into a list and calculate the F1 score.