Model Card for SPARK-mini-base
SPARK-mini-base is a 3.8B-parameter, domain-specific language model trained on an extensive dataset curated from documents generated by the nuclear power industry.
The model was developed by continued pretraining of Microsoft's Phi-3-mini-4k-instruct on over 35B tokens of high-quality data curated from millions of public documents originating within the nuclear power domain. SPARK-mini-base was trained by Nuclearn AI and is released as a research artifact, demonstration tool, and domain-specific base LLM for further fine-tuning by downstream practitioners working within or tangential to the nuclear industry.
SPARK-mini-base is trained with a next-token prediction objective and without any alignment - it requires multi-shot prompting to respond properly. An instruction-tuned version is available at SPARK-mini-instruct.
Uses
SPARK-mini-base is a base LLM with no alignment process (SFT, RLHF, etc.) applied and, like other base models, it must be multi-shot prompted for adequate performance. For a model with instruction-based alignment, please see SPARK-mini-instruct.
Nuclearn targets a few specific use cases with this open-source model release:
- Accelerating the work of technical staff at national research labs and regulatory agencies by providing a domain-specific language model from which further use cases can be fine-tuned.
- Improving the performance of systems deployed in the nuclear industry that currently use language models as feature extractors or model trunks in predictive AI systems.
- Accessibility for practitioners without hardware accelerators or cloud connectivity.
Direct Use
SPARK-mini-base is a base model without alignment, so multi-shot prompting is required. Prompting should follow the techniques applicable to other unaligned base language models; see the Hugging Face prompting docs.
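For example, a short few-shot prompt that demonstrates the desired question-answer pattern is usually enough to steer the completion. A minimal sketch (the Q/A pairs below are illustrative placeholders, not drawn from the training data):

# Minimal few-shot prompt sketch for an unaligned base model.
# The Q/A pairs are illustrative placeholders, not dataset content.
few_shot_prompt = """Q: What does RCS stand for?
A: Reactor Coolant System.

Q: What does LOCA stand for?
A: Loss-of-coolant accident.

Q: What does ECCS stand for?
A:"""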
SPARK-mini-base is trained with 'prompt pre-training', as demonstrated in Galactica: A Large Language Model for Science, to provide steerability along dimensions important to end users.
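The exact steering prompts used during training are not documented in this card; the hypothetical prefix below only illustrates the general idea of prompt pre-training, where a short descriptive prefix shifts the style of the completion:

# Hypothetical steering prefix - an assumption for illustration,
# not a documented SPARK-mini-base control string.
prompt = (
    "Document type: system description\n\n"
    "The ECCS is"
)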
License
License: CC-BY-NC, with the exceptions below for unrestricted use.
The license permits unrestricted use by a limited set of commercial entities, including:
- Operating nuclear utilities
- Regulatory bodies (commercial or government)
- Research labs and research-focused groups (e.g., national laboratories and electric-power-specific research groups)
Bias, Risks, and Limitations
- This model has been trained extensively on nuclear power-related information but, like every language model, still makes factual and logical mistakes.
- The model should not be used for production use cases without further training or applicable guardrails.
- Intentional bias has been trained into the model for steerability.
- The base model is trained without text formatting. Further fine-tuning will be needed for formatted responses (see SPARK-mini-instruct).
How to Get Started with the Model
# Requires transformers 4.41 for Phi3 compatibility
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "nuclearnai/SPARK-mini-base"
model = AutoModelForCausalLM.from_pretrained(
    model_name
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(
    model_name
)

# Generate using min_p sampling
prompt = """The ECCS is"""

# Note that no chat template is used for base model
input_ids = tokenizer.encode(
    prompt,
    return_tensors="pt",
    add_special_tokens=False,
).to("cuda")

output = model.generate(
    input_ids=input_ids,
    min_p=0.2,
    temperature=0.7,
    do_sample=True,
    max_new_tokens=100,
)
print(tokenizer.decode(output[0], skip_special_tokens=False))
Output:
The ECCS is designed to cool the reactor core and to provide additional shutdown capability following initiation of the following accident conditions: 1. Loss-of-coolant accident (LOCA) including a pipe break or a spurious relief or safety valve opening in the RCS which would result in a discharge larger than that which could be made up by the normal make-up system. 2. Loss-of-secondary-coolant accident including a pipe
Training Details
Training Data
All training data for SPARK-mini-base was obtained from publicly available sources, but the dataset itself is not being released.
Specific details on the training data, or access to it, may be made available on a case-by-case basis by contacting Nuclearn at [email protected].
Training Procedure
The training procedure follows best practices for continued pretraining of base LLMs.
The model was trained in bf16 using DeepSpeed ZeRO Stage 3 on a multi-node, private A100 server cluster.
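For readers reproducing a similar setup with the Hugging Face Trainer, a minimal ZeRO Stage 3 + bf16 configuration might look like the sketch below; all values are assumptions for illustration, not Nuclearn's actual settings:

# Illustrative DeepSpeed ZeRO-3 + bf16 config (assumed values, not the
# actual configuration used to train SPARK-mini-base).
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}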
Evaluation
SPARK-mini-base was evaluated on a set of private benchmarks created specifically to test nuclear industry knowledge.
Completions (HellaSwag for Nuclear)
- Modeled after the HellaSwag benchmark
- A variety of completions of complex nuclear plant operational scenarios and fact passages.
Multiple Choice QA (MMLU for Nuclear)
- Modeled after the MMLU benchmark
- Multiple-choice question answering on nuclear plant operations, systems, engineering, etc.
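The benchmarks themselves are private, but multiple-choice questions of this kind are commonly scored by comparing the model's log-likelihood of each candidate answer. A sketch of that standard approach (function and variable names are ours, and it assumes the question's tokenization is a prefix of the full sequence's tokenization):

import torch

def score_choice(model, tokenizer, question, choice):
    # Sum the log-probabilities of the answer tokens given the question.
    q_len = tokenizer.encode(
        question, return_tensors="pt", add_special_tokens=False
    ).shape[1]
    full_ids = tokenizer.encode(
        question + choice, return_tensors="pt", add_special_tokens=False
    ).to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    # Logits at position i predict the token at position i + 1.
    positions = torch.arange(q_len - 1, full_ids.shape[1] - 1, device=full_ids.device)
    choice_ids = full_ids[0, q_len:]
    return log_probs[positions, choice_ids].sum().item()

# The highest-scoring choice is taken as the model's answer, e.g.:
# best = max(choices, key=lambda c: score_choice(model, tokenizer, question, c))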
Environmental Impact
- Hardware Type: A100-80GB SXM4
- Cloud Provider: Nuclearn Training Cluster
Model Architecture and Objective
SPARK-mini-base is based on the Phi-3 architecture.
Compute Infrastructure
SPARK-mini-base is trained on the Nuclearn Training Cluster - an A100-80GB server cluster with 800Gb/s InfiniBand connectivity.
Model Card Authors
- Bradley Fox, Nuclearn Inc
- Jerrold Vincent, Nuclearn Inc
- Nate Irby, Nuclearn Inc