zjunlp/knowlm-13b-ie

@@ -162,7 +162,14 @@ wiki_cate_schema_zh = {
 # 5. Conversion Scripts
-We provide a script named [convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py) that converts data into instructions that can be fed directly to KnowLM. Before running convert.py, refer to the [data](https://github.com/zjunlp/DeepKE/tree/main/example/llm/InstructKGC/data) directory, which contains the expected data format for each task.
 ```bash
 python kg2instruction/convert.py \
@@ -177,6 +184,8 @@ python kg2instruction/convert.py \
   --random_sort        # whether to randomly shuffle the schema list in the instruction
 ```
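 `schema_path` specifies the path to the schema file (a JSON file). The schema file contains 3 lines of JSON strings that organize the schema information in a fixed format. Taking the NER task as an example, each line means the following: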
@@ -215,8 +224,14 @@ python kg2instruction/convert.py \
 </details>
-[convert_test.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert_test.py) does not require the data to have label fields (`entity`, `relation`, `event`). It only needs an `input` field and a `schema_path`, which makes it suitable for processing test data.
 ```bash
 python kg2instruction/convert_test.py \
@@ -228,6 +243,7 @@ python kg2instruction/convert_test.py \
     --sample 0
 ```
 The following is an example of data conversion for a named entity recognition (NER) task:

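 # 5. Conversion Scripts
+**Training Data Conversion**
+Before data is fed to the model, it must be **formatted** to include the `instruction` and `input` fields. For this purpose, we provide the script [kg2instruction/convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py), which batch-converts data into a format the model can use directly.
+> Before using the [kg2instruction/convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py) script, be sure to consult the [data](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/data) directory, which details the data format required for each task. See sample.json for the format of the data before conversion, schema.json for how the schema is organized, and processed.json for the format of the data after conversion.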
 ```bash
 python kg2instruction/convert.py \
   --random_sort        # whether to randomly shuffle the schema list in the instruction
 ```
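+**Negative sampling**: Suppose dataset A contains the labels [a, b, c, d, e, f], and a given sample s involves only labels a and b. The goal is to randomly add to the candidate relation list some relations that are unrelated to s, such as c and d. Note that in the output, the labels c and d are either omitted or output as `NAN`.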
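
The negative-sampling idea can be illustrated with a small Python sketch. It is not the code in `convert.py`: the function names `add_negative_labels` and `build_output`, their parameters, and the dictionary-shaped output are hypothetical and chosen only for this example. It assumes the labels of a sample are known and that unrelated labels are drawn uniformly at random.

```python
import random


def add_negative_labels(sample_labels, all_labels, num_negatives, rng=None):
    """Return (candidates, negatives): the sample's own labels plus
    randomly chosen labels that the sample does not involve.

    Hypothetical helper for illustration; not part of convert.py.
    """
    rng = rng or random.Random()
    positives = list(dict.fromkeys(sample_labels))  # de-duplicate, keep order
    positive_set = set(positives)
    unrelated = [label for label in all_labels if label not in positive_set]
    k = min(num_negatives, len(unrelated))
    negatives = rng.sample(unrelated, k)
    return positives + negatives, negatives


def build_output(annotations, candidates, emit_nan=True):
    """Build the target output for the candidate labels.

    Labels with no annotation (the sampled negatives) are either
    skipped or emitted as the string "NAN", depending on emit_nan.
    """
    output = {}
    for label in candidates:
        items = annotations.get(label, [])
        if items:
            output[label] = items
        elif emit_nan:
            output[label] = "NAN"
    return output


if __name__ == "__main__":
    dataset_labels = ["a", "b", "c", "d", "e", "f"]
    annotations = {"a": ["x"], "b": ["y"]}  # sample s involves only a and b
    candidates, negatives = add_negative_labels(
        annotations.keys(), dataset_labels, num_negatives=2
    )
    print(candidates)
    print(build_output(annotations, candidates))
```

Setting `emit_nan=False` corresponds to the other option described above, where the unrelated labels are left out of the output entirely.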
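 `schema_path` specifies the path to the schema file (a JSON file). The schema file contains 3 lines of JSON strings that organize the schema information in a fixed format. Taking the NER task as an example, each line means the following: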
 </details>
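+More detailed schema file information can be found in the `schema.json` file of each task directory under the [data](./data) directory.
+**Test Data Conversion**
+For **test data**, you can use the [kg2instruction/convert_test.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert_test.py) script. It does not require the data to contain label fields (`entity`, `relation`, `event`); you **only need** to provide the `input` field and the corresponding `schema_path`.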
 ```bash
 python kg2instruction/convert_test.py \
     --sample 0
 ```
+**Data Conversion Example**
 The following is an example of data conversion for a named entity recognition (NER) task: