Introduction

We present a fine-tuned model for formal verification tasks. It is fine-tuned on five formal specification languages (Coq, Dafny, Lean4, ACSL, and TLA) across six formal-verification-related tasks:

  • Requirement Analysis: given the requirements and a description of the verification or modeling goals, decompose the goal into detailed verification steps

  • Proof/Model Generation: given the requirements and a description of the verification or modeling goals, write formal proofs or models that can be checked by a verifier or model checker

  • Proof Segment Generation: generate individual segments of a proof or model

  • Proof Completion: complete a given partial proof or model

  • Proof Infilling: fill in the missing middle of a given incomplete proof or model

  • Code 2 Proof: given the code under verification, generate the proof together with its specifications (currently supported only for ACSL, whose specifications take the form of code annotations); a prompt sketch follows this list
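For instance, a Code 2 Proof query pairs an instruction with the C code under verification and asks for ACSL-annotated output. Below is a minimal sketch of such a prompt; the instruction wording and the max_int function are illustrative assumptions, not the exact prompt format used in training (see the Quickstart for a complete inference example):

# Hypothetical Code 2 Proof prompt for ACSL (illustrative only)
instruct = """
Generate ACSL specifications for the given C code so that it can be verified.
You only need to return the annotated code without explanation.
"""

input_text = """
int max_int(int a, int b) {
    return a > b ? a : b;
}
"""

messages = [{"role": "user", "content": f"{instruct}\n{input_text}"}]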

Application Scenario

Supported Formal Specification Languages

Data Preparation Pipeline

Quickstart

The snippet below uses apply_chat_template to show how to load the tokenizer and model and how to run inference on an FM-Bench-style task.

from transformers import AutoModelForCausalLM, AutoTokenizer

instruct = """
Translate the given requirement using TLA's syntax and semantics.
You only need to return the TLA formal specification without explanation.
"""

input_text = """
An operation `LM_Inner_Rsp(p)` that represents a response process for a given parameter `p`. It satisfies the following conditions:
  - The control state `octl[p]` is equal to `\"done\"`.
  - The `Reply(p, obuf[p], memInt, memInt')` operation is executed.
  - The control state `octl` is updated by setting the `p` index of `octl` to `\"rdy\"`.
  - The variables `omem` and `obuf` remain unchanged.
"""

model_name = "fm-universe/llama3.1-8b-ultrachat-fma"

# Load the model with automatic dtype selection and device placement
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [{"role": "user", "content": f"{instruct}\n{input_text}"}]

# Render the chat template to text, then tokenize
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=4096)
# Strip the prompt tokens, keeping only the newly generated ones
generated_ids = [
    output_ids[len(input_ids) :]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
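If deterministic output is preferred, sampling can be disabled; the variant below uses the standard transformers do_sample flag (a generic Hugging Face generation option, not a model-specific requirement):

# Greedy decoding: deterministic output, no sampling
generated_ids = model.generate(
    **model_inputs, max_new_tokens=4096, do_sample=False
)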

Example of Offline Inference

vLLM supports offline inference; the example below performs the same TLA translation task.

from vllm import LLM, SamplingParams

instruct = """
Translate the given requirement using TLA's syntax and semantics.
You only need to return the TLA formal specification without explanation.
"""

input_text = """
An operation `LM_Inner_Rsp(p)` that represents a response process for a given parameter `p`. It satisfies the following conditions:
  - The control state `octl[p]` is equal to `\"done\"`.
  - The `Reply(p, obuf[p], memInt, memInt')` operation is executed.
  - The control state `octl` is updated by setting the `p` index of `octl` to `\"rdy\"`.
  - The variables `omem` and `obuf` remain unchanged.
"""

model_name = "fm-universe/llama3.1-8b-ultrachat-fma"

# Greedy decoding; max_tokens caps the number of generated tokens
greedy_sampling = SamplingParams(temperature=0, max_tokens=4096)

# Load the model
llm = LLM(
    model=model_name,
    tensor_parallel_size=1,
    max_model_len=4096,
    enable_chunked_prefill=True,
    # quantization="fp8",  # FP8 weight quantization can reduce memory usage
)

# Prepare chat messages
chat_message = [{"role": "user", "content": f"{instruct}\n{input_text}"}]

# Inference
responses = llm.chat(chat_message, greedy_sampling, use_tqdm=True)

print(responses[0].outputs[0].text)
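Recent vLLM versions also accept a list of conversations in llm.chat, so several requirements can be translated in one batched call. A minimal sketch, assuming such a version (the repeated input_text is a stand-in for distinct requirements):

# Batched chat: one conversation per requirement
conversations = [
    [{"role": "user", "content": f"{instruct}\n{q}"}]
    for q in (input_text, input_text)  # stand-ins; replace with real requirements
]
responses = llm.chat(conversations, greedy_sampling, use_tqdm=True)
for r in responses:
    print(r.outputs[0].text)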

Citation

@misc{fmbench25jialun,
      title={From Informal to Formal--Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs}, 
      author={Jialun Cao and Yaojie Lu and Meiziniu Li and Haoyang Ma and Haokun Li and Mengda He and Cheng Wen and Le Sun and Hongyu Zhang and Shengchao Qin and Shing-Chi Cheung and Cong Tian},
      year={2025},
      eprint={2501.16207},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2501.16207}, 
}