ghh001 committed on
Commit
533f9fb
1 Parent(s): 7f82448

Update README_EN.md

  1. README_EN.md +15 -2
README_EN.md CHANGED
@@ -178,8 +178,11 @@ Here [schema](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC
178
 
179
  # 5.Convert script
180
 
181
- A script named [convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py) is provided to facilitate the uniform conversion of data into KnowLM instructions. The [data](https://github.com/zjunlp/DeepKE/tree/main/example/llm/InstructKGC/data) directory contains the expected data format for each task before executing convert.py.
182
 
 
 
 
183
 
184
 
185
  ```bash
@@ -196,8 +199,11 @@ python kg2instruction/convert.py \
196
  ```
197
 
198
 
 
 
199
  The `schema_path` specifies the path to a schema file (a JSON file). The schema file consists of three lines of JSON strings, organized in a fixed format. Taking Named Entity Recognition (NER) as an example, the meanings of each line are as follows:
200
 
 
201
  ```
202
  ["BookTitle", "Address", "Movie", ...] # List of entity types
203
  [] # Empty list
@@ -238,8 +244,13 @@ For Event Extraction with Arguments (EEA) tasks:
238
  </details>
239
 
240
 
 
 
 
 
 
 
241
 
242
- [convert_test.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert_test.py) does not require data to have label (`entity`, `relation`, `event`) fields, only needs to have an `input` field and provide a `schema_path` is suitable for processing test data.
243
 
244
  ```bash
245
  python kg2instruction/convert_test.py \
@@ -251,6 +262,8 @@ python kg2instruction/convert_test.py \
251
  --sample 0
252
  ```
253
 
 
 
254
 
255
  Here is an example of data conversion for Named Entity Recognition (NER) task:
256
 
 
178
 
179
  # 5.Convert script
180
 
181
+ **Training Data Transformation**
182
 
183
+ Before inputting data into the model, it needs to be formatted to include `instruction` and `input` fields. To assist with this, we offer a script [kg2instruction/convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py), which can batch convert data into a format directly usable by the model.
184
+
185
+ > Before using the [kg2instruction/convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py) script, please review the [data](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/data) directory: `sample.json` shows the format of the data before conversion, `schema.json` illustrates the organization of the schema, and `processed.json` shows the format of the data after conversion.
186
 
187
 
188
  ```bash
 
199
  ```
200
 
201
 
202
+ **Negative Sampling**: Suppose dataset A contains the labels [a, b, c, d, e, f], and a given sample s involves only labels a and b. Negative sampling randomly adds labels from the candidate list that are unrelated to s, such as c and d. In the output, the entries for c and d are either omitted or output as `NAN`.
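
The negative sampling idea described above can be illustrated with a minimal sketch. It is not the code in `convert.py`: the function names `build_candidate_labels` and `build_output`, the `num_negatives` parameter, and the shape of the `gold` dictionary are all hypothetical and chosen only for illustration. The sketch also assumes the `NAN` variant of the output rather than omitting the negative labels.

```python
import random


def build_candidate_labels(positive_labels, all_labels, num_negatives, rng=None):
    """Return the sample's own labels followed by randomly drawn negative labels.

    Hypothetical helper for illustration; not taken from convert.py.
    """
    if rng is None:
        rng = random.Random()
    # Keep the positives in order and without duplicates.
    positives = list(dict.fromkeys(positive_labels))
    # Negatives may only come from labels the sample does not already use.
    pool = [label for label in all_labels if label not in positives]
    negatives = rng.sample(pool, min(num_negatives, len(pool)))
    return positives + negatives


def build_output(candidates, gold, placeholder="NAN"):
    """Map each candidate label to its gold values, or to the placeholder if it has none."""
    return {label: gold.get(label, placeholder) for label in candidates}


# Dataset A has labels a..f; sample s only involves a and b.
all_labels = ["a", "b", "c", "d", "e", "f"]
gold = {"a": ["value for a"], "b": ["value for b"]}

candidates = build_candidate_labels(gold.keys(), all_labels, num_negatives=2,
                                    rng=random.Random(0))
output = build_output(candidates, gold)
print(candidates)
print(output)
```

Passing a seeded `random.Random` keeps the draw reproducible, which makes it easier to compare converted files across runs.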
203
+
204
  The `schema_path` specifies the path to a schema file (a JSON file). The schema file consists of three lines of JSON strings, organized in a fixed format. Taking Named Entity Recognition (NER) as an example, the meanings of each line are as follows:
205
 
206
+
207
  ```
208
  ["BookTitle", "Address", "Movie", ...] # List of entity types
209
  [] # Empty list
 
244
  </details>
245
 
246
 
247
+ For more detailed information on the schema file, you can refer to the `schema.json` file in the respective task directories under the [data](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/data) directory.
248
+
249
+
250
+ **Testing Data Transformation**
251
+
252
+ For test data, you can use the [kg2instruction/convert_test.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert_test.py) script, which does not require the data to contain label fields (`entity`, `relation`, `event`); it needs only the `input` field and the corresponding `schema_path`.
253
 
 
254
 
255
  ```bash
256
  python kg2instruction/convert_test.py \
 
262
  --sample 0
263
  ```
264
 
265
+ **Data Transformation Examples**
266
+
267
 
268
  Here is an example of data conversion for Named Entity Recognition (NER) task:
269