SmartBERT V3 CodeBERT

SmartBERT

Overview

SmartBERT V3 is a pre-trained programming language model, initialized with CodeBERT-base-mlm. It has been further trained on SmartBERT V2 with an additional 64,000 smart contracts, to enhance its robustness in representing smart contract code at the function level.

  • Training Data: Trained on a total of 80,000 smart contracts, including 16,000 from SmartBERT V2 and 64,000 (starts from 30001) new contracts.
  • Hardware: Utilized 2 Nvidia A100 80G GPUs.
  • Training Duration: Over 30 hours.
  • Evaluation Data: Evaluated on 1,500 (starts from 96425) smart contracts.

Usage

from transformers import RobertaTokenizer, RobertaForMaskedLM, pipeline

model = RobertaForMaskedLM.from_pretrained('web3se/SmartBERT-v3')
tokenizer = RobertaTokenizer.from_pretrained('web3se/SmartBERT-v3')

code_example = "function totalSupply() external view <mask> (uint256);"
fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)

outputs = fill_mask(code_example)
print(outputs)

Preprocessing

All newline (\n) and tab (\t) characters in the function code were replaced with a single space to ensure consistency in the input data format.

Base Model

Training Setup

training_args = TrainingArguments(
  output_dir=OUTPUT_DIR,
  overwrite_output_dir=True,
  num_train_epochs=20,
  per_device_train_batch_size=64,
  save_steps=10000,
  save_total_limit=2,
  evaluation_strategy="steps",
  eval_steps=10000,
  resume_from_checkpoint=checkpoint
)

How to Use

To train and deploy the SmartBERT V3 model for Web API services, please refer to our GitHub repository: web3se-lab/SmartBERT.

Contributors

Sponsors

Downloads last month
11
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for web3se/SmartBERT-v3

Finetuned
(16)
this model