# Model Card for mlpf-clic-clusters-v1.9.0

This model reconstructs particles in a detector, based on the tracks and calorimeter clusters recorded by the detector.

## Model Details

The performance is measured with respect to generator-level jets and MET computed from Pythia particles, i.e. the truth-level jets and MET.

<details>
  <summary>Jet performance</summary>
  
  <img src="plots_checkpoint-26-2.004527/clic_edm_ttbar_pf/jet_response_iqr_over_med_pt.png" alt="ttbar jet resolution" width="300"/>
  <img src="plots_checkpoint-26-2.004527/clic_edm_qq_pf/jet_response_iqr_over_med_pt.png" alt="qq jet resolution" width="300"/>
  <img src="plots_checkpoint-26-2.004527/clic_edm_ww_fullhad_pf/jet_response_iqr_over_med_pt.png" alt="WW fully hadronic jet resolution" width="300"/>

</details>

<details>
  <summary>MET performance</summary>
  
  <img src="plots_checkpoint-26-2.004527/clic_edm_ttbar_pf/met_response_iqr_over_med.png" alt="ttbar MET resolution" width="300"/>
  <img src="plots_checkpoint-26-2.004527/clic_edm_qq_pf/met_response_iqr_over_med.png" alt="qq MET resolution" width="300"/>
  <img src="plots_checkpoint-26-2.004527/clic_edm_ww_fullhad_pf/met_response_iqr_over_med.png" alt="WW fully hadronic MET resolution" width="300"/>

</details>
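
The plot file names above (`jet_response_iqr_over_med_pt`, `met_response_iqr_over_med`) suggest that resolution is summarized as the interquartile range (IQR) of the response divided by its median. The following is a minimal sketch of such a metric, assuming the response is defined as reconstructed over truth-level value; the function name and the toy numbers are hypothetical and are not taken from the particleflow code, whose exact definition and binning may differ.

```python
import statistics


def response_iqr_over_median(reco_values, gen_values):
    """Return (median, IQR, IQR / median) of the response reco / gen.

    Assumption: response = reconstructed value / truth-level value.
    Entries with a non-positive truth value are skipped.
    """
    response = [r / g for r, g in zip(reco_values, gen_values) if g > 0]
    # Quartiles with linear interpolation between the sample min and max.
    q1, median, q3 = statistics.quantiles(response, n=4, method="inclusive")
    iqr = q3 - q1
    return median, iqr, iqr / median


# Toy jet pT values in GeV, made up for illustration only.
gen_pt = [20.0, 40.0, 60.0, 80.0, 100.0]
reco_pt = [18.0, 40.0, 63.0, 80.0, 110.0]

median, iqr, iqr_over_median = response_iqr_over_median(reco_pt, gen_pt)
print(f"median={median:.3f} IQR={iqr:.3f} IQR/median={iqr_over_median:.3f}")
```

A median close to 1 indicates an unbiased energy scale, while a smaller IQR/median indicates better resolution. The IQR is less sensitive to tails in the response distribution than the standard deviation.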

### Model Description

- **Developed by:** Joosep Pata, Eric Wulff, Farouk Mokhtar, Mengke Zhang, David Southwick, Maria Girone, Javier Duarte, Michael Kagan
- **Model type:** transformer
- **License:** Apache License

### Model Sources

- **Repository:** https://github.com/jpata/particleflow/releases/tag/v1.9.0

## Uses
### Direct Use

This model may be used to study the physics and computational performance of ML-based reconstruction in simulation.

### Out-of-Scope Use

This model is not intended for physics measurements on real data. 

## Bias, Risks, and Limitations

The model has only been trained on simulation data and has not been validated against real data.
The model has not been peer reviewed or published in a peer-reviewed journal.

## How to Get Started with the Model

Use the code below to get started with the model.

```bash
#get the code
git clone https://github.com/jpata/particleflow
cd particleflow
git checkout v1.9.0

#get the models
git clone https://huggingface.co/jpata/particleflow models
```

## Training Details
Trained on 8x MI250X for 26 epochs over ~3 days.
The training was resumed twice from a checkpoint because of the 24-hour job time limit.
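
Resuming after a time limit requires picking the most recent checkpoint to load. The sketch below selects it by epoch number, assuming checkpoint files follow the `checkpoint-<epoch>-<loss>.pth` pattern inferred from the single weight file named in this card (`checkpoint-26-2.004527.pth`). The helper name and the other file names are hypothetical, and the pipeline's own resume logic may work differently.

```python
import re

# Assumed naming pattern: checkpoint-<epoch>-<validation loss>.pth
CHECKPOINT_RE = re.compile(r"checkpoint-(\d+)-(\d+\.\d+)\.pth$")


def latest_checkpoint(filenames):
    """Return (epoch, filename) for the highest-epoch checkpoint, or None."""
    best = None
    for name in filenames:
        match = CHECKPOINT_RE.search(name)
        if match is None:
            continue  # not a checkpoint file, ignore it
        epoch = int(match.group(1))
        if best is None or epoch > best[0]:
            best = (epoch, name)
    return best


# Only the epoch-26 name comes from this card; the others are made up.
files = [
    "checkpoint-09-2.100000.pth",
    "checkpoint-26-2.004527.pth",
    "notes.txt",
]
print(latest_checkpoint(files))
```

The selected file would then be passed to the pipeline through the `--load` option, as in the evaluation script in this card.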

### Training Data
The following datasets were used:
```
/eos/user/j/jpata/mlpf/tensorflow_datasets/clic/clic_edm_qq_pf/2.2.0
/eos/user/j/jpata/mlpf/tensorflow_datasets/clic/clic_edm_ttbar_pf/2.2.0
/eos/user/j/jpata/mlpf/tensorflow_datasets/clic/clic_edm_ww_fullhad_pf/2.2.0
```

The truth and target definition was updated in [jpata/particleflow#345](https://github.com/jpata/particleflow/pull/345) with respect to [Pata, J., Wulff, E., Mokhtar, F. et al. Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors. Commun Phys 7, 124 (2024)](https://doi.org/10.1038/s42005-024-01599-5).

In particular, target particles for MLPF reconstruction are based on `status=1` particles.
For non-interacting `status=1` particles, nearby (dR<0.2) interacting `status=0` particles are used instead.
Note that truth and target jets are defined in the center-of-mass frame, whereas PF particles are defined in the lab frame: https://github.com/key4hep/k4geo/issues/399#issuecomment-2381714391.
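
The dR<0.2 criterion above can be illustrated with the standard angular distance dR = sqrt(dη² + dφ²), where the azimuthal difference is wrapped into [-π, π). This is a minimal sketch, assuming particles are given as (η, φ) pairs; the function names are hypothetical, and the actual target definition in jpata/particleflow#345 may apply additional conditions.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane with phi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def nearby_interacting(non_interacting, interacting, max_dr=0.2):
    """Return indices of interacting particles within max_dr of a particle.

    non_interacting: (eta, phi) of a non-interacting status=1 particle.
    interacting: list of (eta, phi) for interacting status=0 particles.
    """
    eta0, phi0 = non_interacting
    return [
        index
        for index, (eta, phi) in enumerate(interacting)
        if delta_r(eta0, phi0, eta, phi) < max_dr
    ]


# Toy coordinates, made up for illustration. The first candidate sits just
# across the phi = +-pi boundary and is still close once phi is wrapped.
particle = (0.0, 3.1)
candidates = [(0.05, -3.1), (0.5, 3.1), (0.0, 3.0)]
print(nearby_interacting(particle, candidates))
```

Wrapping the azimuthal difference matters near φ = ±π, where a naive subtraction would report a separation of about 2π for two almost collinear particles.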

The datasets were generated using Key4HEP with the following scripts:
- https://github.com/HEP-KBFI/key4hep-sim/releases/tag/v1.0.0
- https://github.com/HEP-KBFI/key4hep-sim/blob/v1.0.0/clic/run_sim.sh

### Training Procedure

<details>
  <summary>Training script</summary>
  
```bash
#!/bin/bash
#SBATCH --job-name=mlpf-train
#SBATCH --account=project_465000301
#SBATCH --time=1-00:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=200G
#SBATCH --gpus-per-task=8
#SBATCH --partition=standard-g
#SBATCH --no-requeue
#SBATCH -o logs/slurm-%x-%j-%N.out

cd /scratch/project_465000301/particleflow

module load LUMI/24.03 partition/G

export IMG=/scratch/project_465000301/pytorch-rocm6.2.simg
export PYTHONPATH=hep_tfds
export TFDS_DATA_DIR=/scratch/project_465000301/tensorflow_datasets
#export MIOPEN_DISABLE_CACHE=true
export MIOPEN_USER_DB_PATH=/tmp/${USER}-${SLURM_JOB_ID}-miopen-cache
export MIOPEN_CUSTOM_CACHE_DIR=${MIOPEN_USER_DB_PATH}
export TF_CPP_MAX_VLOG_LEVEL=-1 #to suppress ROCm fusion is enabled messages
export ROCM_PATH=/opt/rocm
#export NCCL_DEBUG=INFO
#export MIOPEN_ENABLE_LOGGING=1
#export MIOPEN_ENABLE_LOGGING_CMD=1
#export MIOPEN_LOG_LEVEL=4
export KERAS_BACKEND=torch

env

#PyTorch training
singularity exec \
    --rocm \
    -B /scratch/project_465000301 \
    -B /tmp \
    --env LD_LIBRARY_PATH=/opt/rocm/lib/ \
    --env CUDA_VISIBLE_DEVICES=$ROCR_VISIBLE_DEVICES \
     $IMG python3 mlpf/pyg_pipeline.py --dataset clic --gpus 8 \
     --data-dir $TFDS_DATA_DIR --config parameters/pytorch/pyg-clic.yaml \
     --train --gpu-batch-multiplier 128 --num-workers 8 --prefetch-factor 100 --checkpoint-freq 1 --conv-type attention --dtype bfloat16 --lr 0.0001 --num-epochs 30
```

</details>

## Evaluation

<details>
  <summary>Evaluation script</summary>
  
```bash
#!/bin/bash
#SBATCH --partition gpu
#SBATCH --gres gpu:mig:1
#SBATCH --mem-per-gpu 200G
#SBATCH -o logs/slurm-%x-%j-%N.out

IMG=/home/software/singularity/pytorch.simg:2024-08-18
cd ~/particleflow

WEIGHTS=models/clic/clusters/v1.9.0/checkpoints/checkpoint-26-2.004527.pth
singularity exec -B /scratch/persistent --nv \
     --env PYTHONPATH=hep_tfds \
     --env KERAS_BACKEND=torch \
     $IMG  python3 mlpf/pyg_pipeline.py --dataset clic --gpus 1 \
     --data-dir /scratch/persistent/joosep/tensorflow_datasets --config parameters/pytorch/pyg-clic.yaml \
     --test --make-plots --gpu-batch-multiplier 100 --load $WEIGHTS --dtype bfloat16 --prefetch-factor 10 --num-workers 8 --ntest 50000
```

</details>

## Citation

## Glossary

- PF: particle flow reconstruction
- MLPF: machine learning for particle flow
- CLIC: Compact Linear Collider

## Model Card Contact

Joosep Pata, [email protected]