Model Card for CodeCSE
CodeCSE is a simple pre-trained model that maps code and comment sentences into a shared embedding space using contrastive learning. It was pre-trained on the CodeSearchNet dataset.
To use this pretrained model, please clone the CodeCSE repository (https://github.com/emu-se/CodeCSE) to get GraphCodeBERTForCL and its other dependencies. Detailed instructions are listed in the repository's README.md. Overall, you will need:
- GraphCodeBERT (CodeCSE uses GraphCodeBERT's input format for code)
- GraphCodeBERTForCL defined in codecse/codecse
Inference example
NL input example: example_nl.json
```json
{
  "original_string": "",
  "docstring_tokens": ["Save", "model", "to", "a", "pickle", "located", "at", "path"],
  "url": "https://github.com/openai/baselines/blob/3301089b48c42b87b396e246ea3f56fa4bfc9678/baselines/deepq/deepq.py#L55-L72"
}
```
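For reference, an example file in this format can be read with the standard `json` module. This is only an illustrative sketch of what a loader like `load_example` might do, not the repository's actual implementation (the file name `example_nl.json` follows the example above):

```python
import json

def load_nl_example(path):
    # Read one NL example (docstring tokens plus metadata) from a JSON file.
    with open(path) as f:
        return json.load(f)

# Write the example shown above so this sketch is self-contained.
example = {
    "original_string": "",
    "docstring_tokens": ["Save", "model", "to", "a", "pickle", "located", "at", "path"],
    "url": "https://github.com/openai/baselines/blob/3301089b48c42b87b396e246ea3f56fa4bfc9678/baselines/deepq/deepq.py#L55-L72",
}
with open("example_nl.json", "w") as f:
    json.dump(example, f)

nl_json = load_nl_example("example_nl.json")
print(" ".join(nl_json["docstring_tokens"]))  # Save model to a pickle located at path
```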
Code snippet to get the embedding of an NL document (link to complete code):
```python
nl_json = load_example("example_nl.json")
batch = prepare_inputs(nl_json, tokenizer, args)
nl_inputs = batch[3]  # natural-language input ids
with torch.no_grad():
    nl_vec = model(input_ids=nl_inputs, sent_emb="nl")[1]
```
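An NL embedding like `nl_vec` is typically compared against code embeddings to rank search candidates by cosine similarity. A minimal, self-contained sketch using placeholder vectors (not real model outputs, whose names here are hypothetical):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Placeholder embeddings standing in for nl_vec and candidate code vectors.
nl_vec = [0.9, 0.1, 0.3]
code_vecs = {
    "save_model": [0.8, 0.2, 0.4],
    "load_data": [-0.1, 0.9, 0.0],
}

# Rank code candidates by similarity to the NL query embedding.
ranked = sorted(code_vecs, key=lambda k: cosine_similarity(nl_vec, code_vecs[k]),
                reverse=True)
print(ranked[0])  # save_model
```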