bhavnicksm committed on
Commit 9355d0a · 1 Parent(s): 9e767a6

Update README.md

Files changed (1)
  1. README.md +95 -0
README.md CHANGED
@@ -1,3 +1,98 @@
1
  ---
2
  license: mit
3
  ---
1
  ---
2
+ base_model: baai/bge-base-en-v1.5
3
+ language:
4
+ - en
5
+ library_name: model2vec
6
  license: mit
7
+ model_name: brown-beetle-base-v0.1
8
+ tags:
9
+ - embeddings
10
+ - static-embeddings
11
+ - sentence-transformers
12
  ---
13
+ # 🪲 brown-beetle-base-v0.1 Model Card
14
+
15
+ <div align="center">
16
+ <img width="75%" alt="Beetle logo" src="./assets/beetle_logo.png">
17
+ </div>
18
+
19
+ > [!TIP]
20
+ > Beetles are some of the most diverse and interesting creatures on Earth. They are found in every environment, from the deepest oceans to the highest mountains. They are also known for their ability to adapt to a wide range of habitats and lifestyles. They are small, fast and powerful!
21
+
22
+ The beetle series of models is intended both as a good starting point for static embedding training (via TokenLearn or fine-tuning) and as a set of decent static embedding models in their own right. Each beetle model is meant to improve on the original **M2V_base_output** model in some way; that is the bar we set for each model (except the brown beetle series, which reproduces the original model).
23
+
24
+ This model has been distilled from `baai/bge-base-en-v1.5` with PCA applied but without reducing dimensionality, so the embeddings keep the same size as the original model (768 dimensions). Zipf weighting is not applied.
25
+
26
+ > [!NOTE]
27
+ > The brown beetle series is provided for convenience, so that you can load and use the model without having to run the distillation yourself, though it is quick to reproduce. If you want the original model from the Minish Lab team, use the **M2V_base_output** model.
28
+
29
+ ## Version Information
30
+
31
+ - **brown-beetle-base-v0**: The original model, without using PCA or Zipf. The lack of PCA and Zipf also makes this a decent model for further training.
32
+ - **brown-beetle-base-v0.1**: The original model, with PCA but of the same size as the original model. This model is great if you want to experiment with Zipf or other weighting methods.
33
+ - **brown-beetle-base-v1**: The original model, with PCA and Zipf.
34
+ - **brown-beetle-small-v1**: A smaller version of the original model, with PCA and Zipf. Equivalent to **M2V_base_output**.
35
+ - **brown-beetle-tiny-v1**: A tiny version of the original model, with PCA and Zipf.
36
+
37
+ ## Installation
38
+
39
+ Install model2vec using pip:
40
+
41
+ ```bash
42
+ pip install model2vec
43
+ ```
44
+
45
+ ## Usage
46
+
47
+ Load this model using the `from_pretrained` method:
48
+
49
+ ```python
50
+ from model2vec import StaticModel
51
+
52
+ # Load a pretrained Model2Vec model
53
+ model = StaticModel.from_pretrained("bhavnicksm/brown-beetle-base-v0.1")
54
+
55
+ # Compute text embeddings
56
+ embeddings = model.encode(["Example sentence"])
57
+ ```
58
+
59
+ Read more about the Model2Vec library [here](https://github.com/MinishLab/model2vec).
60
+
61
+ ## Reproduce this model
62
+
63
+ To reproduce this model, install `model2vec` with the `distill` extra (`pip install model2vec[distill]`) and run the following code:
64
+
65
+ ```python
66
+ from model2vec.distill import distill
67
+
68
+ # Distill the model
69
+ m2v_model = distill(
70
+ model_name="BAAI/bge-base-en-v1.5",
71
+ pca_dims=768,
72
+ apply_zipf=False,
73
+ )
74
+
75
+ # Save the model
76
+ m2v_model.save_pretrained("brown-beetle-base-v0.1")
77
+ ```
78
+
79
+ ## Comparison with other models
80
+
81
+ Coming soon...
82
+
83
+ ## Acknowledgements
84
+
85
+ This model is made using the [Model2Vec](https://github.com/MinishLab/model2vec) library. Credit goes to the [Minish Lab](https://github.com/MinishLab) team for developing this library.
86
+
87
+ ## Citation
88
+
89
+ Please cite the [Model2Vec repository](https://github.com/MinishLab/model2vec) if you use this model in your work.
90
+
91
+ ```bibtex
92
+ @software{minishlab2024model2vec,
93
+ author = {Stephan Tulkens and Thomas van Dongen},
94
+ title = {Model2Vec: Turn any Sentence Transformer into a Small Fast Model},
95
+ year = {2024},
96
+ url = {https://github.com/MinishLab/model2vec},
97
+ }
98
+ ```
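
The version notes in the README above recommend `brown-beetle-base-v0.1` as a base for experimenting with Zipf or other weighting methods. As a rough illustration of what such an experiment could look like, the sketch below scales each row of a token embedding table by `log(1 + rank)`. It assumes that rows are ordered from most to least frequent token and that a rank-based log weight is a reasonable stand-in for Zipf weighting; the helper names `zipf_weights` and `apply_weights` and the toy table are hypothetical and are not part of the model2vec API.

```python
import math


def zipf_weights(vocab_size):
    """Return one weight per token row: log(1 + rank), with rank 0 the most frequent token.

    Assumption: the vocabulary is sorted by descending frequency.
    """
    return [math.log(1 + rank) for rank in range(vocab_size)]


def apply_weights(embeddings, weights):
    """Scale every embedding row by its weight and return the new rows."""
    return [[w * x for x in row] for row, w in zip(embeddings, weights)]


# Toy 3-token, 2-dimensional table standing in for real embeddings (made-up values).
toy_embeddings = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
weighted = apply_weights(toy_embeddings, zipf_weights(len(toy_embeddings)))
```

Under this scheme the first row is multiplied by `log(1) = 0`, so the most frequent entry contributes nothing, and rarer tokens receive progressively larger weights. Other weighting methods can be tried by swapping out `zipf_weights` while keeping `apply_weights` unchanged.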