Add Sentence Transformers integration
Hello @yliu279, @yeliu918 and co-authors,
Pull Request overview
- Add Sentence Transformers compatibility
  - Add `modules.json` with a Transformer and Pooling module
  - The Pooling is configured to use last token pooling
  - Set the maximum sequence length to 32768 via the `sentence_bert_config.json`.
  - In `modeling_gemma2.py`, add a `forward` that simply forwards to the `self.model`.
- Add tags for `transformers`, `sentence-transformers`, `code`, and `retrieval` for better discoverability.
- Add outputs for the code snippets
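For readers unfamiliar with last token pooling, here is a minimal pure-Python sketch of the idea: for each sequence, take the embedding of the last token that the attention mask marks as real. The function name `last_token_pool` and the toy data are illustrative assumptions, not code from this PR or from Sentence Transformers.

```python
from typing import List


def last_token_pool(
    token_embeddings: List[List[List[float]]],
    attention_mask: List[List[int]],
) -> List[List[float]]:
    """Return, per sequence, the embedding of the last attended token."""
    pooled = []
    for embeddings, mask in zip(token_embeddings, attention_mask):
        # Position of the last token whose mask value is 1.
        # Raises ValueError if a sequence has no attended tokens at all.
        last_index = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(embeddings[last_index])
    return pooled


# Two toy sequences of length 3 with 2-dimensional token embeddings.
token_embeddings = [
    [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]],  # right-padded: last real token at index 1
    [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]],  # no padding: last real token at index 2
]
attention_mask = [[1, 1, 0], [1, 1, 1]]

pooled = last_token_pool(token_embeddings, attention_mask)
```

In practice the Pooling module does this on tensors; the sketch only shows which token is selected.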
Details
I've added Sentence Transformers compatibility. Despite the custom modeling code, it's still rather easy to add this support as ST relies on `transformers` in a simple way. The code snippet uses the `prompt` argument in `model.encode` to add the prompts, although this can also just be done by prepending the prompt to each query.
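A rough sketch of the two options, assuming a model object whose `encode` accepts a `prompt` keyword as `SentenceTransformer.encode` does. The helper names and the prompt text are hypothetical placeholders; the real query prompt is the one in the model card.

```python
from typing import List, Sequence

# Hypothetical instruction prompt, for illustration only.
QUERY_PROMPT = "Instruct: retrieve relevant code\nQuery: "


def prepend_prompt(queries: Sequence[str], prompt: str) -> List[str]:
    """Manually put the prompt in front of every query."""
    return [prompt + query for query in queries]


def encode_queries(model, queries, prompt=QUERY_PROMPT, use_prompt_argument=True):
    """Encode queries either via the `prompt` argument or by manual prepending."""
    if use_prompt_argument:
        # Option 1: let the model's encode() prepend the prompt.
        return model.encode(list(queries), prompt=prompt)
    # Option 2: prepend the prompt ourselves and encode the resulting strings.
    return model.encode(prepend_prompt(queries, prompt))
```

With `sentence-transformers` installed, `model` would be a loaded `SentenceTransformer` instance; both options should feed the same text to the tokenizer.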
Also, I saw that all files are saved with `git-lfs`. Because of this, it seems like the `modeling_gemma2.py` changes are very big, but it's only an addition of:

    def forward(self, **kwargs):
        return self.model(**kwargs)
Simple as that!
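To illustrate why this is enough, here is a toy sketch of the delegation pattern. Both classes are hypothetical stand-ins, not the real classes in `modeling_gemma2.py`; the assumption is that the caller passes tokenizer outputs such as `input_ids` and `attention_mask` as keyword arguments.

```python
class InnerModel:
    """Stand-in for the wrapped base model (hypothetical)."""

    def __call__(self, **kwargs):
        # Echo the keyword names received so the delegation is visible.
        return {"received": sorted(kwargs)}


class EmbeddingModel:
    """Stand-in for the outer model class (hypothetical name)."""

    def __init__(self):
        self.model = InnerModel()

    def forward(self, **kwargs):
        # Pass every keyword argument straight through to the inner model.
        return self.model(**kwargs)


outputs = EmbeddingModel().forward(
    input_ids=[[1, 2, 3]],
    attention_mask=[[1, 1, 1]],
)
```

The outer class adds no logic of its own here; it only exposes the inner model's outputs under the `forward` name that callers expect.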
Also, you might be able to increase visibility of this model by adding it to your Salesforce SFR-Embedding collection: https://huggingface.co./collections/Salesforce/sfr-embedding-models-66abe671200408925487b6c8
- Tom Aarsen
I also noticed that almost all files in the repository are stored under git-lfs (large file storage), including .json, .md, and .py files. I believe this should not be used for files that are this small, but only for e.g. the model files. It prevents people from looking at the README.md or configuration files without downloading.
You can remove the following lines from `.gitattributes`: https://huggingface.co./Salesforce/SFR-Embedding-Code-2B_R/blob/60bd9076055a6658d7ad2e6855cb558b856108cd/.gitattributes#L36-L40
And then run
git add --renormalize *.md
git add --renormalize *.json
git add --renormalize *.py
This makes these files show up in "full" again. Please let me know your thoughts; I can try to include that in this pull request as well.
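As a sketch of the `.gitattributes` cleanup, the snippet below filters out LFS rules for small text files. The exact lines in the linked `.gitattributes` are not reproduced here, so the example rules and the helper name `drop_lfs_rules` are assumptions for illustration.

```python
from typing import Iterable, List

# Patterns for small text files that arguably should not be tracked by LFS.
SMALL_FILE_PATTERNS = ("*.md", "*.json", "*.py")


def drop_lfs_rules(lines: Iterable[str], patterns=SMALL_FILE_PATTERNS) -> List[str]:
    """Keep every .gitattributes line except LFS rules for the given patterns."""
    kept = []
    for line in lines:
        fields = line.split()
        is_lfs_rule = bool(fields) and fields[0] in patterns and "filter=lfs" in fields
        if not is_lfs_rule:
            kept.append(line)
    return kept


# Assumed example content; the real file may differ.
example = [
    "*.safetensors filter=lfs diff=lfs merge=lfs -text",
    "*.md filter=lfs diff=lfs merge=lfs -text",
    "*.json filter=lfs diff=lfs merge=lfs -text",
    "*.py filter=lfs diff=lfs merge=lfs -text",
]
cleaned = drop_lfs_rules(example)
```

After writing the cleaned lines back to `.gitattributes`, the `git add --renormalize` commands above would restage the affected files, and the result still needs to be committed before pushing.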
- Tom Aarsen
Hi @tomaarsen ,
Thanks for sharing this information. I’ve tried multiple times to remove the git-lfs from the file, but it hasn’t worked. I followed the instructions and can see the normal file locally, not the LFS file. However, I can’t push it since there are no changes, and it says "Everything up-to-date." Do you know what I should do in this case?
Thanks,
Ye