Model Card for SPARK-mini-base
SPARK-mini-base is a 3.8B-parameter, domain-specific language model trained on an extensive dataset curated from documents generated by the nuclear power industry.
The model was developed by continued pretraining of Microsoft's Phi-3-mini-4k-instruct on over 35B tokens of high-quality data curated from millions of public documents originating within the nuclear power domain. SPARK-mini-base was trained by Nuclearn AI and is released as a research artifact, demonstration tool, and domain-specific base LLM for further fine-tuning by downstream practitioners working within or tangential to the nuclear industry.
SPARK-mini-base is trained with a next-token prediction objective and without any alignment - it requires multi-shot prompting to respond properly. An instruction-tuned version is available at SPARK-mini-instruct.
Uses
SPARK-mini-base is a base LLM with no alignment process (SFT, RLHF, etc.) applied and, like other base models, it must be multi-shot prompted for adequate performance. For a model with instruction-based alignment, please see SPARK-mini-instruct.
Nuclearn targets a few specific use cases with this open-source model release:
- Accelerating the work of technical staff at national research labs and regulatory agencies by providing a domain-specific language model from which further use cases can be fine-tuned.
- Improving the performance of systems deployed in the nuclear industry that currently use language models as feature extractors or model trunks in predictive AI systems.
- Accessibility for practitioners without hardware accelerators or cloud connectivity.
Direct Use
SPARK-mini-base is a base model without alignment, so multi-shot prompting is required. Prompting should follow the techniques applicable to other unaligned base language models; see the Hugging Face prompting docs.
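For example, a short few-shot prompt that demonstrates the desired question-answer pattern is usually enough to steer the completion. A minimal sketch (the Q/A pairs below are illustrative placeholders, not drawn from the training data):

# Minimal few-shot prompt sketch for an unaligned base model.
# The Q/A pairs are illustrative placeholders, not dataset content.
few_shot_prompt = """Q: What does RCS stand for?
A: Reactor Coolant System.

Q: What does LOCA stand for?
A: Loss-of-coolant accident.

Q: What does ECCS stand for?
A:"""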
SPARK-mini-base is trained with 'prompt pre-training', as demonstrated in Galactica: A Large Language Model for Science, to provide steerability along dimensions important to end users.
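The exact steering prompts used during training are not documented in this card; the hypothetical prefix below only illustrates the general idea of prompt pre-training, where a short descriptive prefix shifts the style of the completion:

# Hypothetical steering prefix - an assumption for illustration,
# not a documented SPARK-mini-base control string.
prompt = (
    "Document type: system description\n\n"
    "The ECCS is"
)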
License
License: CC-BY-NC, with the exceptions below for unrestricted use.
The license permits unrestricted use by a limited set of commercial entities, including:
- Operating nuclear utilities
- Regulatory bodies (commercial or government)
- Research labs and research-focused groups (e.g., national laboratories and electric-power-specific research groups)
Bias, Risks, and Limitations
- This model has been trained extensively on nuclear power-related information but, like every language model, still makes factual and logical mistakes.
- The model should not be used for production use cases without further training or applicable guardrails.
- Intentional bias has been trained into the model for steerability.
- The base model is trained without text formatting. Further fine-tuning will be needed for formatted responses (see SPARK-mini-instruct).
How to Get Started with the Model
# Requires transformers 4.41 for Phi3 compatibility
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "nuclearnai/SPARK-mini-base"
model = AutoModelForCausalLM.from_pretrained(
    model_name
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(
    model_name
)

# Generate using min_p sampling
prompt = """The ECCS is"""

# Note that no chat template is used for base model
input_ids = tokenizer.encode(
    prompt,
    return_tensors="pt",
    add_special_tokens=False,
).to("cuda")

output = model.generate(
    input_ids=input_ids,
    min_p=0.2,
    temperature=0.7,
    do_sample=True,
    max_new_tokens=100,
)
print(tokenizer.decode(output[0], skip_special_tokens=False))
Output:
The ECCS is designed to cool the reactor core and to provide additional shutdown capability following initiation of the following accident conditions: 1. Loss-of-coolant accident (LOCA) including a pipe break or a spurious relief or safety valve opening in the RCS which would result in a discharge larger than that which could be made up by the normal make-up system. 2. Loss-of-secondary-coolant accident including a pipe
Training Details
Training Data
All training data for SPARK-mini-base was obtained from publicly available sources, but the dataset itself is not being released.
Specific details on the training data, or access to it, may be made available on a case-by-case basis by contacting Nuclearn at [email protected].
Training Procedure
The training procedure follows best practices for continued pretraining of base LLMs.
The model was trained in bf16 using DeepSpeed ZeRO Stage 3 on a multi-node, private A100 server cluster.
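For readers reproducing a similar setup with the Hugging Face Trainer, a minimal ZeRO Stage 3 + bf16 configuration might look like the sketch below; all values are assumptions for illustration, not Nuclearn's actual settings:

# Illustrative DeepSpeed ZeRO-3 + bf16 config (assumed values, not the
# actual configuration used to train SPARK-mini-base).
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}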
Evaluation
SPARK-mini-base was evaluated on a set of private benchmarks created specifically to test nuclear industry knowledge.
Completions (HellaSwag for Nuclear)
- Modeled after the HellaSwag benchmark
- A variety of completions of complex nuclear plant operational scenarios and fact passages.
Multiple Choice QA (MMLU for Nuclear)
- Modeled after the MMLU benchmark
- Multiple-choice question answering on nuclear plant operations, systems, engineering, etc.
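The benchmarks themselves are private, but multiple-choice questions of this kind are commonly scored by comparing the model's log-likelihood of each candidate answer. A sketch of that standard approach (function and variable names are ours, and it assumes the question's tokenization is a prefix of the full sequence's tokenization):

import torch

def score_choice(model, tokenizer, question, choice):
    # Sum the log-probabilities of the answer tokens given the question.
    q_len = tokenizer.encode(
        question, return_tensors="pt", add_special_tokens=False
    ).shape[1]
    full_ids = tokenizer.encode(
        question + choice, return_tensors="pt", add_special_tokens=False
    ).to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    # Logits at position i predict the token at position i + 1.
    positions = torch.arange(q_len - 1, full_ids.shape[1] - 1, device=full_ids.device)
    choice_ids = full_ids[0, q_len:]
    return log_probs[positions, choice_ids].sum().item()

# The highest-scoring choice is taken as the model's answer, e.g.:
# best = max(choices, key=lambda c: score_choice(model, tokenizer, question, c))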
Environmental Impact
- Hardware Type: A100-80GB SXM4
- Cloud Provider: Nuclearn Training Cluster
Model Architecture and Objective
SPARK-mini-base is based on the Phi-3 architecture.
Compute Infrastructure
SPARK-mini-base is trained on the Nuclearn Training Cluster - an A100-80GB server cluster with 800Gb/s InfiniBand connectivity.
Model Card Authors
- Bradley Fox, Nuclearn Inc
- Jerrold Vincent, Nuclearn Inc
- Nate Irby, Nuclearn Inc