ghh001 committed
Commit c96ca5c
1 Parent(s): 0f13d4b

add Dataset

Files changed (2):
  1. README.md +33 -3
  2. README_CN.md +31 -5
README.md CHANGED
@@ -1,6 +1,18 @@
 ---
 license: apache-2.0
----
+----
+
+
+- [1.Differences from knowlm-13b-zhixi](#1differences-from-knowlm-13b-zhixi)
+- [2.IE template](#2ie-template)
+- [3.Common relationship types](#3common-relationship-types)
+- [4.Convert script](#4convert-script)
+- [5.Datasets](#5datasets)
+- [6.Usage](#6usage)
+- [7.Evaluate](#7evaluate)
+
+
+
 # 1.Differences from knowlm-13b-zhixi
 Compared to zjunlp/knowlm-13b-zhixi, zjunlp/knowlm-13b-ie exhibits slightly stronger practicality in information extraction but with a decrease in its general applicability.
 
@@ -94,7 +106,25 @@ python kg2instruction/convert_test.py \
 ```
 
 
-# 5.Usage
+# 5.Datasets
+
+
+Below are some ready-to-use processed datasets:
+
+| Name | Download Links | Quantity | Description |
+| -------------- | -------------- | ------ | ------------------------------------------------------------ |
+| KnowLM-IE.json | [Google Drive](https://drive.google.com/file/d/1hY_R6aFgW4Ga7zo41VpOVOShbTgBqBbL/view?usp=sharing) <br/> [HuggingFace](https://huggingface.co/datasets/zjunlp/KnowLM-IE) | 281860 | Dataset mentioned in [InstructIE](https://arxiv.org/abs/2305.11527) |
+| KnowLM-ke | [HuggingFace](https://huggingface.co/datasets/zjunlp/knowlm-ke) | XXXX | Contains all instruction data (General, IE, Code, COT, etc.) used for training [zjunlp/knowlm-13b-zhixi](https://huggingface.co/zjunlp/knowlm-13b-zhixi) |
+
+
+`KnowLM-IE.json`: Contains the fields `'id'` (unique identifier), `'cate'` (text category), `'instruction'` (extraction instruction), `'input'` (input text), `'output'` (output text), and `'relation'` (triples). The `'relation'` field can be used to freely construct extraction instructions and outputs. `'instruction'` has 16 formats (4 prompts * 4 output formats), and `'output'` is generated in the output format specified by `'instruction'`.
+
+`KnowLM-ke`: Contains only the fields `'instruction'`, `'input'`, and `'output'`. The files `ee-en.json`, `ee_train.json`, `ner-en.json`, `ner_train.json`, `re-en.json`, and `re_train.json` under its directory contain Chinese and English IE instruction data.
+
+
+
+
+# 6.Usage
 We provide a script, [inference.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/src/inference.py), for direct inference with the `zjunlp/knowlm-13b-ie` model. Please refer to the [README.md](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/README.md) for environment configuration and other details.
 
 ```bash
@@ -110,7 +140,7 @@ If GPU memory is not enough, you can use `--bits 8` or `--bits 4`.
 
 
 
-# 6.Evaluate
+# 7.Evaluate
 
 We provide a script at [evaluate.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/evaluate.py) to convert the string output of the model into a list and calculate the F1 score.
 
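The `KnowLM-IE.json` records described in the new `# 5.Datasets` section carry a `'relation'` field (a list of triples) from which extraction instructions and outputs can be built. Below is a minimal sketch of that idea; the toy record, the head/relation/tail key names, and the prompt wording are illustrative assumptions, not taken from the dataset card, and the prompt is not necessarily one of the 16 official instruction formats.

```python
import json

# Toy KnowLM-IE-style record. The top-level field names follow the "# 5.Datasets"
# description; the inner layout of 'relation' (head/relation/tail keys) and the
# sample values are assumptions for illustration only.
record = {
    "id": "example-0001",
    "cate": "Person",
    "input": "Steve Jobs co-founded Apple in 1976.",
    "relation": [
        {"head": "Steve Jobs", "relation": "company founded", "tail": "Apple"},
    ],
}

# Build one possible instruction/output pair from the 'relation' triples.
instruction = (
    "Extract all (head, relation, tail) triples from the following text "
    "and return them as a JSON list.\n" + record["input"]
)
output = json.dumps(
    [[t["head"], t["relation"], t["tail"]] for t in record["relation"]],
    ensure_ascii=False,
)

print(instruction)
print(output)  # [["Steve Jobs", "company founded", "Apple"]]
```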
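The `evaluate.py` script referenced under `# 7.Evaluate` parses the model's string output into a list of triples and computes F1. The following is a minimal sketch of a set-based triple F1 of that kind, offered as an independent illustration of the metric rather than the actual DeepKE implementation.

```python
# Set-based triple F1: an illustration of the kind of score evaluate.py reports,
# not the actual DeepKE implementation.
def triple_f1(pred, gold):
    """pred, gold: lists of (head, relation, tail) tuples; exact-match scoring."""
    pred_set, gold_set = set(pred), set(gold)
    tp = len(pred_set & gold_set)                      # exact-match true positives
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [("Steve Jobs", "company founded", "Apple")]
pred = [("Steve Jobs", "company founded", "Apple"),
        ("Apple", "founded in", "1976")]
print(triple_f1(pred, gold))  # 0.666... (precision 0.5, recall 1.0)
```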
README_CN.md CHANGED
@@ -1,4 +1,13 @@
-# 1. Differences from knowlm-13b-zhixi
+- [1.Differences from knowlm-13b-zhixi](#1与-knowlm-13b-zhixi-的区别)
+- [2.IE template](#2信息抽取模板)
+- [3.Common relation types](#3常见的关系类型)
+- [4.Convert script](#4转换脚本)
+- [5.Datasets](#5现成数据集)
+- [6.Usage](#6使用)
+- [7.Evaluate](#7评估)
+
+
+# 1.Differences from knowlm-13b-zhixi
 
 Compared to zjunlp/knowlm-13b-zhixi, zjunlp/knowlm-13b-ie exhibits slightly stronger practicality in information extraction but with a decrease in its general applicability.
 
@@ -6,7 +15,7 @@ zjunlp/knowlm-13b-ie samples about 10% of the data from the Chinese and English information extraction datasets
 
 
 
-# 2. IE template
+# 2.IE template
 Relation Extraction (RE) supports the following templates:
 
 ```python
@@ -63,7 +72,7 @@ relation_int_out_format_en = {
 
 Here, [schema](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/schema.py) provides 12 text topics and the common relation types under each topic.
 
-# 4. Convert script
+# 4.Convert script
 
 We provide scripts, [convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py) and [convert_test.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert_test.py), to uniformly convert data into instructions that can be fed directly to KnowLM. Before running convert.py, please refer to the [data](https://github.com/zjunlp/DeepKE/tree/main/example/llm/InstructKGC/data) directory, which contains the expected data format for each task.
 
@@ -91,7 +100,24 @@ python kg2instruction/convert_test.py \
 ```
 
 
-# 5. Usage
+# 5.Datasets
+
+Below are some ready-to-use processed datasets:
+
+| Name | Download | Quantity | Description |
+| -------------- | -------------- | ------ | ------------------------------------------------------------ |
+| KnowLM-IE.json | [Google Drive](https://drive.google.com/file/d/1hY_R6aFgW4Ga7zo41VpOVOShbTgBqBbL/view?usp=sharing) <br/> [HuggingFace](https://huggingface.co/datasets/zjunlp/KnowLM-IE) | 281860 | Dataset mentioned in [InstructIE](https://arxiv.org/abs/2305.11527) |
+| KnowLM-ke | [HuggingFace](https://huggingface.co/datasets/zjunlp/knowlm-ke) | XXXX | All instruction data (General, IE, Code, COT, etc.) used for training [zjunlp/knowlm-13b-zhixi](https://huggingface.co/zjunlp/knowlm-13b-zhixi) |
+
+
+`KnowLM-IE.json`: Contains the fields `'id'` (unique identifier), `'cate'` (text topic), `'instruction'` (extraction instruction), `'input'` (input text), `'output'` (output text), and `'relation'` (triples). The `'relation'` field can be used to freely construct extraction instructions and outputs. `'instruction'` has 16 formats (4 prompts * 4 output formats), and `'output'` is generated in the output format specified by `'instruction'`.
+
+
+`KnowLM-ke`: Contains only the fields `'instruction'`, `'input'`, and `'output'`. The files `ee-en.json`, `ee_train.json`, `ner-en.json`, `ner_train.json`, `re-en.json`, and `re_train.json` under its directory contain Chinese and English IE instruction data.
+
+
+
+# 6.Usage
 We provide a script, [inference.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/src/inference.py), for direct inference with the `zjunlp/knowlm-13b-ie` model. Please refer to the [README.md](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/README.md) for environment configuration and other details.
 
 ```bash
@@ -106,7 +132,7 @@ CUDA_VISIBLE_DEVICES="0" python src/inference.py \
 If GPU memory is insufficient, you can use `--bits 8` or `--bits 4`.
 
 
-# 6. Evaluate
+# 7.Evaluate
 We provide a script at [evaluate.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/evaluate.py) to convert the model's string output into a list and compute the F1 score.
 
 ```bash