louisbrulenaudet committed
Commit b8aa404
1 Parent(s): e742085

readme update

Files changed (3)
  1. .DS_Store +0 -0
  2. 1_Pooling/config.json +7 -0
  3. README.md +19 -17
.DS_Store ADDED
Binary file (8.2 kB).
 
1_Pooling/config.json ADDED
@@ -0,0 +1,7 @@
+{
+  "word_embedding_dimension": 768,
+  "pooling_mode_cls_token": true,
+  "pooling_mode_mean_tokens": false,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false
+}
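This new pooling config selects CLS-token pooling over 768-dimensional word embeddings; sentence-transformers rebuilds the corresponding Pooling module from `1_Pooling/config.json` when the model is loaded. A minimal sketch of the equivalent module, for illustration only:

```python
from sentence_transformers import models

# Illustrative sketch of the Pooling module described by 1_Pooling/config.json:
# keep the [CLS] token embedding, disable mean/max/sqrt-len pooling.
pooling = models.Pooling(
    word_embedding_dimension=768,
    pooling_mode_cls_token=True,
    pooling_mode_mean_tokens=False,
    pooling_mode_max_tokens=False,
    pooling_mode_mean_sqrt_len_tokens=False,
)
```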
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
 
 ---
 
-# {MODEL_NAME}
+# Domain-adapted BERT for General Legal Practice
 
 This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
 
@@ -28,7 +28,7 @@ Then you can use the model like this:
 from sentence_transformers import SentenceTransformer
 sentences = ["This is an example sentence", "Each sentence is converted"]
 
-model = SentenceTransformer('{MODEL_NAME}')
+model = SentenceTransformer("louisbrulenaudet/tsdae-lemone-mbert-base")
 embeddings = model.encode(sentences)
 print(embeddings)
 ```
@@ -51,32 +51,23 @@ def cls_pooling(model_output, attention_mask):
 sentences = ['This is an example sentence', 'Each sentence is converted']
 
 # Load model from HuggingFace Hub
-tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
-model = AutoModel.from_pretrained('{MODEL_NAME}')
+tokenizer = AutoTokenizer.from_pretrained("louisbrulenaudet/tsdae-lemone-mbert-base")
+model = AutoModel.from_pretrained("louisbrulenaudet/tsdae-lemone-mbert-base")
 
 # Tokenize sentences
-encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
 
 # Compute token embeddings
 with torch.no_grad():
     model_output = model(**encoded_input)
 
 # Perform pooling. In this case, cls pooling.
-sentence_embeddings = cls_pooling(model_output, encoded_input['attention_mask'])
+sentence_embeddings = cls_pooling(model_output, encoded_input["attention_mask"])
 
 print("Sentence embeddings:")
 print(sentence_embeddings)
 ```
 
-
-
-## Evaluation Results
-
-<!--- Describe how your model was evaluated -->
-
-For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
-
-
 ## Training
 The model was trained with the parameters:
 
@@ -96,7 +87,6 @@ Parameters of the fit()-Method:
 {
     "epochs": 1,
     "evaluation_steps": 0,
-    "evaluator": "NoneType",
     "max_grad_norm": 1,
     "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
     "optimizer_params": {
@@ -120,4 +110,16 @@ SentenceTransformer(
 
 ## Citing & Authors
 
-<!--- Describe where people can find more information -->
+If you use this code in your research, please use the following BibTeX entry.
+
+```BibTeX
+@misc{louisbrulenaudet2023,
+  author = {Louis Brulé Naudet},
+  title = {Transformer-based Denoising AutoEncoder for tax practice},
+  year = {2023}
+}
+```
+
+## Feedback
+
+If you have any feedback, please reach out at [[email protected]](mailto:[email protected]).
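As an illustration of the semantic-search use case mentioned in the updated card, a minimal sketch using the library's cosine-similarity utility; the query and corpus strings are invented examples:

```python
from sentence_transformers import SentenceTransformer, util

# Minimal semantic-search sketch; corpus and query are invented examples.
model = SentenceTransformer("louisbrulenaudet/tsdae-lemone-mbert-base")

corpus = [
    "Capital gains on the sale of securities are subject to income tax.",
    "A commercial lease is concluded for a minimum term of nine years.",
]
query = "How are capital gains on securities taxed?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus entries by cosine similarity to the query.
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
best = int(scores.argmax())
print(corpus[best], float(scores[best]))
```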