a-mannion committed on
Commit dd61435 · verified · 1 parent: 988b2af

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -10,7 +10,7 @@ tags:
  - pytorch
  ---
 
- # Jargon-legal-4096
+ # Jargon-general-legal
 
  [Jargon](https://hal.science/hal-04535557/file/FB2_domaines_specialises_LREC_COLING24.pdf) is an efficient transformer encoder LM for French, combining the LinFormer attention mechanism with the RoBERTa model architecture.
 
@@ -25,9 +25,9 @@ Jargon is available in several versions with different context sizes and types o
  |-------------------------------------------------------------------------------------|:-----------------------:|:----------------:|
  | [jargon-general-base](https://huggingface.co/PantagrueLLM/jargon-general-base) | scratch |8.5GB Web Corpus|
  | [jargon-general-biomed](https://huggingface.co/PantagrueLLM/jargon-general-biomed) | jargon-general-base |5.4GB Medical Corpus|
- | [jargon-general-legal](https://huggingface.co/PantagrueLLM/jargon-general-legal) | jargon-general-base |18GB Legal Corpus
+ | [jargon-general-legal](https://huggingface.co/PantagrueLLM/jargon-general-legal) (this model) | jargon-general-base |18GB Legal Corpus
  | [jargon-multidomain-base](https://huggingface.co/PantagrueLLM/jargon-multidomain-base) | jargon-general-base |Medical+Legal Corpora|
- | [jargon-legal](https://huggingface.co/PantagrueLLM/jargon-legal) | scratch |18GB Legal Corpus|
+ | [jargon-legal](https://huggingface.co/PantagrueLLM/jargon-legal) | scratch |18GB Legal Corpus|
  | [jargon-legal-4096](https://huggingface.co/PantagrueLLM/jargon-legal-4096) | scratch |18GB Legal Corpus|
  | [jargon-biomed](https://huggingface.co/PantagrueLLM/jargon-biomed) | scratch |5.4GB Medical Corpus|
  | [jargon-biomed-4096](https://huggingface.co/PantagrueLLM/jargon-biomed-4096) | scratch |5.4GB Medical Corpus|
@@ -58,13 +58,13 @@ For more info please check out the [paper](https://hal.science/hal-04535557/file
 
  ## Using Jargon models with HuggingFace transformers
 
- You can get started with `jargon-legal-4096` using the code snippet below:
+ You can get started with this model using the code snippet below:
 
  ```python
  from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
 
- tokenizer = AutoTokenizer.from_pretrained("PantagrueLLM/jargon-legal-4096", trust_remote_code=True)
- model = AutoModelForMaskedLM.from_pretrained("PantagrueLLM/jargon-legal-4096", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained("PantagrueLLM/jargon-general-legal", trust_remote_code=True)
+ model = AutoModelForMaskedLM.from_pretrained("PantagrueLLM/jargon-general-legal", trust_remote_code=True)
 
  jargon_maskfiller = pipeline("fill-mask", model=model, tokenizer=tokenizer)
  output = jargon_maskfiller("Il est allé au <mask> hier")
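The variant table touched by this commit records, for each Jargon model, the checkpoint it starts from (`scratch` or a parent model) and its training corpus. That lineage is what distinguishes `jargon-general-legal` from `jargon-legal-4096`, the model the README named before this commit. Below is a minimal sketch, assuming the table's second column means "initialized from". It encodes the rows shown in the diff as a Python dict. The names `JARGON_VARIANTS`, `init_chain`, and `hub_id` are hypothetical and are not part of the Jargon or transformers APIs.

```python
# Hypothetical encoding of the README's variant table:
# model name -> (initialized from, training data), copied from the rows in this diff.
# "scratch" is assumed to mean the model has no parent checkpoint.
JARGON_VARIANTS = {
    "jargon-general-base": ("scratch", "8.5GB Web Corpus"),
    "jargon-general-biomed": ("jargon-general-base", "5.4GB Medical Corpus"),
    "jargon-general-legal": ("jargon-general-base", "18GB Legal Corpus"),
    "jargon-multidomain-base": ("jargon-general-base", "Medical+Legal Corpora"),
    "jargon-legal": ("scratch", "18GB Legal Corpus"),
    "jargon-legal-4096": ("scratch", "18GB Legal Corpus"),
    "jargon-biomed": ("scratch", "5.4GB Medical Corpus"),
    "jargon-biomed-4096": ("scratch", "5.4GB Medical Corpus"),
}


def init_chain(name: str) -> list[str]:
    """Follow 'initialized from' links until reaching a model trained from scratch."""
    chain = [name]
    while True:
        parent, _corpus = JARGON_VARIANTS[chain[-1]]
        if parent == "scratch":
            return chain
        if parent in chain:  # guard against a cycle in a mis-edited table
            raise ValueError(f"cycle detected at {parent}")
        chain.append(parent)


def hub_id(name: str) -> str:
    """Build the repo id in the form used by the README's from_pretrained calls."""
    return f"PantagrueLLM/{name}"


if __name__ == "__main__":
    print(init_chain("jargon-general-legal"))  # ['jargon-general-legal', 'jargon-general-base']
    print(init_chain("jargon-legal-4096"))  # ['jargon-legal-4096']
    print(hub_id("jargon-general-legal"))  # PantagrueLLM/jargon-general-legal
```

Per the table, both models list the same 18GB legal corpus. The difference the sketch surfaces is that `jargon-general-legal` is initialized from `jargon-general-base`, while `jargon-legal-4096` starts from scratch.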