
# Model Card for mlpf-clic-clusters-v1.9.0

This model reconstructs particles in a detector, based on the tracks and calorimeter clusters recorded by the detector.

## Model Details

The performance is measured with respect to generator-level jets and MET computed from Pythia particles, i.e. the truth-level jets and MET.
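
As an illustration of the MET quantity referred to above, here is a minimal sketch that assumes MET is the magnitude of the negative vector sum of the transverse momenta of the selected particles. The function name `compute_met` and the toy inputs are hypothetical; which particles enter the sum in the actual evaluation is not specified here.

```python
import numpy as np


def compute_met(pt, phi):
    """Magnitude of the negative vector sum of transverse momenta.

    Hypothetical helper for illustration; not taken from the particleflow code.
    """
    pt = np.asarray(pt, dtype=float)
    phi = np.asarray(phi, dtype=float)
    # Sum the x and y components of the transverse momenta.
    px = np.sum(pt * np.cos(phi))
    py = np.sum(pt * np.sin(phi))
    # The sign flip does not change the magnitude.
    return float(np.hypot(px, py))


# Two back-to-back particles balance each other.
met_balanced = compute_met([10.0, 10.0], [0.0, np.pi])
# A third particle at phi = pi/2 leaves its transverse momentum unbalanced.
met_unbalanced = compute_met([10.0, 10.0, 5.0], [0.0, np.pi, np.pi / 2])
```

The same function can be applied to truth-level particles and to reconstructed particles, so that the two MET values can be compared event by event.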

<details>
  <summary>Jet performance</summary>
  
  <img src="plots_checkpoint-26-2.004527/clic_edm_ttbar_pf/jet_response_iqr_over_med_pt.png" alt="ttbar jet resolution" width="300"/>
  <img src="plots_checkpoint-26-2.004527/clic_edm_qq_pf/jet_response_iqr_over_med_pt.png" alt="qq jet resolution" width="300"/>
  <img src="plots_checkpoint-26-2.004527/clic_edm_ww_fullhad_pf/jet_response_iqr_over_med_pt.png" alt="WW fully hadronic jet resolution" width="300"/>

</details>

<details>
  <summary>MET performance</summary>
  
  <img src="plots_checkpoint-26-2.004527/clic_edm_ttbar_pf/met_response_iqr_over_med.png" alt="ttbar MET resolution" width="300"/>
  <img src="plots_checkpoint-26-2.004527/clic_edm_qq_pf/met_response_iqr_over_med.png" alt="qq MET resolution" width="300"/>
  <img src="plots_checkpoint-26-2.004527/clic_edm_ww_fullhad_pf/met_response_iqr_over_med.png" alt="WW fully hadronic MET resolution" width="300"/>

</details>
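
The plot file names (`jet_response_iqr_over_med_pt`, `met_response_iqr_over_med`) suggest that the resolution is summarized as the interquartile range of the response divided by its median. The sketch below assumes the response is the ratio of a reconstructed quantity to its truth-level counterpart for already matched objects; the function name and the toy values are hypothetical, and the matching and binning used for the actual plots are not shown.

```python
import numpy as np


def response_iqr_over_median(reco, gen):
    """Interquartile range of reco/gen divided by its median.

    Hypothetical helper; assumes reco[i] and gen[i] are already matched.
    """
    reco = np.asarray(reco, dtype=float)
    gen = np.asarray(gen, dtype=float)
    response = reco / gen
    q25, q50, q75 = np.percentile(response, [25, 50, 75])
    return (q75 - q25) / q50


# Toy example: five matched jets with the same truth-level pT.
gen_pt = np.array([50.0, 50.0, 50.0, 50.0, 50.0])
reco_pt = np.array([40.0, 45.0, 50.0, 55.0, 60.0])
resolution = response_iqr_over_median(reco_pt, gen_pt)
```

Compared with a standard deviation, a quantile-based width of this kind is less sensitive to outliers in the tails of the response distribution.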

### Model Description

- **Developed by:** Joosep Pata, Eric Wulff, Farouk Mokhtar, Mengke Zhang, David Southwick, Maria Girone, Javier Duarte, Michael Kagan
- **Model type:** transformer
- **License:** Apache License

### Model Sources

- **Repository:** https://github.com/jpata/particleflow/releases/tag/v1.9.0

## Uses
### Direct Use

This model may be used to study the physics and computational performance of ML-based reconstruction in simulation.

### Out-of-Scope Use

This model is not intended for physics measurements on real data. 

## Bias, Risks, and Limitations

The model has only been trained on simulation data and has not been validated against real data.
The model has not been peer reviewed or published in a peer-reviewed journal.

## How to Get Started with the Model

Use the code below to get started with the model.

```bash
#get the code
git clone https://github.com/jpata/particleflow
cd particleflow
git checkout v1.9.0

#get the models
git clone https://huggingface.co/jpata/particleflow models
```

## Training Details
The model was trained on 8x MI250X for 26 epochs over ~3 days.
Training was resumed twice from a checkpoint because of the 24h job time limit.

### Training Data
The following datasets were used:
```
/eos/user/j/jpata/mlpf/tensorflow_datasets/clic/clic_edm_qq_pf/2.2.0
/eos/user/j/jpata/mlpf/tensorflow_datasets/clic/clic_edm_ttbar_pf/2.2.0
/eos/user/j/jpata/mlpf/tensorflow_datasets/clic/clic_edm_ww_fullhad_pf/2.2.0
```

The truth and target definition was updated in [jpata/particleflow#345](https://github.com/jpata/particleflow/pull/345) with respect to [Pata, J., Wulff, E., Mokhtar, F. et al. Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors. Commun Phys 7, 124 (2024)](https://doi.org/10.1038/s42005-024-01599-5).

In particular, target particles for MLPF reconstruction are based on `status=1` particles.
For non-interacting `status=1` particles, nearby (dR<0.2) interacting `status=0` particles are used instead.
Note that truth and target jets are defined in the center-of-mass frame, whereas PF particles are defined in the lab frame; see https://github.com/key4hep/k4geo/issues/399#issuecomment-2381714391.
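
The dR<0.2 criterion above can be illustrated with a minimal sketch. It assumes dR is the usual angular distance, the square root of the sum of squared differences in pseudorapidity and azimuthal angle, with the azimuthal difference wrapped into [-pi, pi). The function names and the `interacting` flag are hypothetical stand-ins; the actual target definition is the one implemented in jpata/particleflow#345.

```python
import numpy as np


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the azimuthal difference wrapped to [-pi, pi)."""
    deta = eta1 - eta2
    dphi = (phi1 - phi2 + np.pi) % (2 * np.pi) - np.pi
    return np.hypot(deta, dphi)


def nearby_replacements(eta, phi, status, interacting, max_dr=0.2):
    """Map each non-interacting status=1 particle to nearby interacting status=0 ones.

    Hypothetical illustration; `interacting` is an assumed per-particle flag.
    """
    eta = np.asarray(eta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    status = np.asarray(status)
    interacting = np.asarray(interacting, dtype=bool)

    # Candidate replacements: interacting status=0 particles.
    candidates = np.flatnonzero((status == 0) & interacting)
    matches = {}
    for i in np.flatnonzero((status == 1) & ~interacting):
        dr = delta_r(eta[i], phi[i], eta[candidates], phi[candidates])
        matches[int(i)] = [int(j) for j in candidates[dr < max_dr]]
    return matches


# Toy event: particle 0 is a non-interacting status=1 particle near the phi
# wrap-around; particle 1 is a close interacting status=0 particle.
eta = [0.0, 0.05, 1.0, 1.0]
phi = [3.1, -3.1, 0.0, 0.0]
status = [1, 0, 0, 1]
interacting = [False, True, True, True]
matches = nearby_replacements(eta, phi, status, interacting)
```

In this toy event only particle 1 lies within dR<0.2 of particle 0, while the interacting `status=1` particle 3 is left untouched. The wrap-around handling matters because two particles at phi = 3.1 and phi = -3.1 are close in angle even though their raw difference is large.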

The datasets were generated using Key4HEP with the following scripts:
- https://github.com/HEP-KBFI/key4hep-sim/releases/tag/v1.0.0
- https://github.com/HEP-KBFI/key4hep-sim/blob/v1.0.0/clic/run_sim.sh

### Training Procedure

<details>
  <summary>Training script</summary>
  
```bash
#!/bin/bash
#SBATCH --job-name=mlpf-train
#SBATCH --account=project_465000301
#SBATCH --time=1-00:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=200G
#SBATCH --gpus-per-task=8
#SBATCH --partition=standard-g
#SBATCH --no-requeue
#SBATCH -o logs/slurm-%x-%j-%N.out

cd /scratch/project_465000301/particleflow

module load LUMI/24.03 partition/G

export IMG=/scratch/project_465000301/pytorch-rocm6.2.simg
export PYTHONPATH=hep_tfds
export TFDS_DATA_DIR=/scratch/project_465000301/tensorflow_datasets
#export MIOPEN_DISABLE_CACHE=true
export MIOPEN_USER_DB_PATH=/tmp/${USER}-${SLURM_JOB_ID}-miopen-cache
export MIOPEN_CUSTOM_CACHE_DIR=${MIOPEN_USER_DB_PATH}
export TF_CPP_MAX_VLOG_LEVEL=-1 #to suppress ROCm fusion is enabled messages
export ROCM_PATH=/opt/rocm
#export NCCL_DEBUG=INFO
#export MIOPEN_ENABLE_LOGGING=1
#export MIOPEN_ENABLE_LOGGING_CMD=1
#export MIOPEN_LOG_LEVEL=4
export KERAS_BACKEND=torch

env

#PyTorch training
singularity exec \
    --rocm \
    -B /scratch/project_465000301 \
    -B /tmp \
    --env LD_LIBRARY_PATH=/opt/rocm/lib/ \
    --env CUDA_VISIBLE_DEVICES=$ROCR_VISIBLE_DEVICES \
     $IMG python3 mlpf/pyg_pipeline.py --dataset clic --gpus 8 \
     --data-dir $TFDS_DATA_DIR --config parameters/pytorch/pyg-clic.yaml \
     --train --gpu-batch-multiplier 128 --num-workers 8 --prefetch-factor 100 --checkpoint-freq 1 --conv-type attention --dtype bfloat16 --lr 0.0001 --num-epochs 30
```

</details>

## Evaluation

<details>
  <summary>Evaluation script</summary>
  
```bash
#!/bin/bash
#SBATCH --partition gpu
#SBATCH --gres gpu:mig:1
#SBATCH --mem-per-gpu 200G
#SBATCH -o logs/slurm-%x-%j-%N.out

IMG=/home/software/singularity/pytorch.simg:2024-08-18
cd ~/particleflow

WEIGHTS=models/clic/clusters/v1.9.0/checkpoints/checkpoint-26-2.004527.pth
singularity exec -B /scratch/persistent --nv \
     --env PYTHONPATH=hep_tfds \
     --env KERAS_BACKEND=torch \
     $IMG  python3 mlpf/pyg_pipeline.py --dataset clic --gpus 1 \
     --data-dir /scratch/persistent/joosep/tensorflow_datasets --config parameters/pytorch/pyg-clic.yaml \
     --test --make-plots --gpu-batch-multiplier 100 --load $WEIGHTS --dtype bfloat16 --prefetch-factor 10 --num-workers 8 --ntest 50000
```

</details>

## Citation

## Glossary

- PF: particle flow reconstruction
- MLPF: machine learning for particle flow
- CLIC: Compact Linear Collider

## Model Card Contact

Joosep Pata, [email protected]