ghh001 committed on
Commit
ba92567
1 Parent(s): ed06ca4

Update README.md

  1. README.md +8 -0
README.md CHANGED
@@ -1,3 +1,11 @@
  ---
  license: apache-2.0
  ---
+ Compared to zjunlp/knowlm-13b-zhixi, zjunlp/knowlm-13b-ie is somewhat more practical for information extraction (IE), but its general-purpose ability is reduced.
+
+ zjunlp/knowlm-13b-ie samples about 10% of the examples from Chinese and English IE datasets and applies negative sampling to them. For instance, if dataset A contains the labels [a, b, c, d, e, f], we first sample 10% of the examples from A. A given sample s may contain only labels a and b; we then randomly add relations that s does not contain, such as c and d, to the relation candidate list specified in the instruction. When the model encounters such a relation, it outputs text similar to 'NAN'. This gives the model some ability to output 'NAN' and strengthens its extraction capability, but weakens its general-purpose capability.
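
The negative sampling step described in this change can be illustrated roughly as follows. This is only a minimal sketch, not the authors' actual pipeline: the sample layout (a dict with `text` and `relations`), the helper names `sample_fraction` and `add_negative_relations`, and the `num_negatives` and `seed` parameters are all assumptions made for illustration. The 10% fraction and the 'NAN' output are the only details taken from the README text.

```python
import random

NAN = "NAN"  # output used for candidate relations that a sample does not contain


def sample_fraction(dataset, fraction=0.1, seed=0):
    """Randomly pick roughly `fraction` of the dataset (hypothetical helper)."""
    rng = random.Random(seed)
    k = max(1, round(len(dataset) * fraction))
    return rng.sample(dataset, k)


def add_negative_relations(sample, all_labels, num_negatives=2, seed=0):
    """Add relations the sample lacks to its candidate list and map them to NAN.

    `sample` is assumed to look like:
        {"text": "...", "relations": {"a": [...], "b": [...]}}
    """
    rng = random.Random(seed)
    present = list(sample["relations"])
    # Labels from the dataset's label set that this sample does not contain.
    absent = [label for label in all_labels if label not in sample["relations"]]
    negatives = rng.sample(absent, min(num_negatives, len(absent)))
    # Mix real and negative relations so their position carries no signal.
    candidates = present + negatives
    rng.shuffle(candidates)
    # Real relations keep their extracted values; negatives map to NAN.
    target = {rel: sample["relations"].get(rel, NAN) for rel in candidates}
    return {"text": sample["text"], "candidates": candidates, "target": target}


labels = ["a", "b", "c", "d", "e", "f"]
example = {"text": "example sentence", "relations": {"a": ["x"], "b": ["y"]}}
augmented = add_negative_relations(example, labels)
print(augmented["candidates"])
print(augmented["target"])
```

In this sketch the candidate list handed to the model always contains the sample's true relations plus a few it lacks, so the training target includes explicit 'NAN' entries for the latter.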