# Model Card for mlpf-clic-clusters-v2.0.0
This model reconstructs particles in a detector, based on the tracks and calorimeter clusters recorded by the detector.
## Model Details
The performance is measured with respect to generator-level jets and MET computed from Pythia particles, i.e. the truth-level jets and MET.
<details>
<summary>Jet performance</summary>
<img src="plots_checkpoint-29-1.901667/clic_edm_ttbar_pf/jet_response_iqr_over_med_pt.png" alt="ttbar jet resolution" width="300"/>
<img src="plots_checkpoint-29-1.901667/clic_edm_qq_pf/jet_response_iqr_over_med_pt.png" alt="qq jet resolution" width="300"/>
<img src="plots_checkpoint-29-1.901667/clic_edm_ww_fullhad_pf/jet_response_iqr_over_med_pt.png" alt="WW fully hadronic jet resolution" width="300"/>
</details>
<details>
<summary>MET performance</summary>
<img src="plots_checkpoint-29-1.901667/clic_edm_ttbar_pf/met_response_iqr_over_med.png" alt="ttbar MET resolution" width="300"/>
<img src="plots_checkpoint-29-1.901667/clic_edm_qq_pf/met_response_iqr_over_med.png" alt="qq MET resolution" width="300"/>
<img src="plots_checkpoint-29-1.901667/clic_edm_ww_fullhad_pf/met_response_iqr_over_med.png" alt="WW fully hadronic MET resolution" width="300"/>
</details>
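The plot file names above refer to the interquartile range (IQR) of the response divided by its median. The following is a minimal sketch of that quantity, assuming the response is defined as the ratio of reconstructed to generator-level transverse momentum. The function names and toy numbers are hypothetical and are not taken from the particleflow code.

```python
import statistics


def jet_response(reco_pt, gen_pt):
    """Per-jet response, assumed here to be reco pT / gen pT for matched jets."""
    return [r / g for r, g in zip(reco_pt, gen_pt) if g > 0]


def iqr_over_median(values):
    """Interquartile range of the sample divided by its median."""
    # method="inclusive" interpolates linearly between the sample points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / statistics.median(values)


# Toy matched jets (illustrative values only, not model output)
gen_pt = [50.0, 60.0, 80.0, 100.0, 120.0]
reco_pt = [45.0, 60.0, 84.0, 95.0, 132.0]

response = jet_response(reco_pt, gen_pt)
resolution = iqr_over_median(response)
print(f"median response = {statistics.median(response):.3f}")
print(f"IQR / median    = {resolution:.3f}")
```

A smaller IQR over median means a narrower response distribution. Unlike a standard deviation, it is not dominated by the tails of the distribution.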
### Model Description
- **Developed by:** Joosep Pata, Eric Wulff, Farouk Mokhtar, Mengke Zhang, David Southwick, Maria Girone, Javier Duarte, Michael Kagan
- **Model type:** transformer
- **License:** Apache License
### Model Sources
- **Repository:** https://github.com/jpata/particleflow/releases/tag/v2.0.0
## Uses
### Direct Use
This model may be used to study the physics and computational performance of ML-based reconstruction in simulation.
### Out-of-Scope Use
This model is not intended for physics measurements on real data.
## Bias, Risks, and Limitations
The model has only been trained on simulation data and has not been validated against real data.
The model has not been peer reviewed or published in a peer-reviewed journal.
## How to Get Started with the Model
Use the code below to get started with the model.
```bash
#get the code
git clone https://github.com/jpata/particleflow
cd particleflow
git checkout v2.0.0
#get the models
git clone https://huggingface.co/jpata/particleflow models
```
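After cloning, the checkpoint used in the evaluation section is expected under `models/clic/clusters/v2.0.0/checkpoints/`. The sketch below picks the checkpoint with the highest epoch number from such a directory. It assumes that file names follow the pattern `checkpoint-<epoch>-<value>.pth`, as in `checkpoint-29-1.901667.pth`. Reading the first number as the epoch and the second as a loss value is an assumption, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming scheme: checkpoint-<epoch>-<value>.pth
CKPT_PATTERN = re.compile(r"^checkpoint-(\d+)-(\d+(?:\.\d+)?)\.pth$")


def parse_checkpoint_name(name):
    """Return (epoch, value) parsed from a checkpoint file name, or None."""
    match = CKPT_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def latest_checkpoint(directory):
    """Return the path of the checkpoint with the highest epoch, or None."""
    best_path, best_epoch = None, -1
    for path in Path(directory).iterdir():
        parsed = parse_checkpoint_name(path.name)
        if parsed is not None and parsed[0] > best_epoch:
            best_path, best_epoch = path, parsed[0]
    return best_path


# Path taken from the evaluation script in this card
ckpt_dir = Path("models/clic/clusters/v2.0.0/checkpoints")
if ckpt_dir.is_dir():
    print(latest_checkpoint(ckpt_dir))
```

Comparing the epoch as an integer avoids the lexicographic trap where `checkpoint-9` would sort after `checkpoint-29`.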
## Training Details
Trained on 8x MI250X for 26 epochs over ~3 days.
The training was continued from a checkpoint due to the 24h time limit.
### Training Data
The following datasets were used:
```
/eos/user/j/jpata/mlpf/tensorflow_datasets/clic/clic_edm_qq_pf/2.3.0
/eos/user/j/jpata/mlpf/tensorflow_datasets/clic/clic_edm_ttbar_pf/2.3.0
/eos/user/j/jpata/mlpf/tensorflow_datasets/clic/clic_edm_ww_fullhad_pf/2.3.0
```
The datasets have an updated truth and target definition, introduced in [jpata/particleflow#352](https://github.com/jpata/particleflow/pull/352), with respect to [Pata, J., Wulff, E., Mokhtar, F. et al. Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors. Commun Phys 7, 124 (2024)](https://doi.org/10.1038/s42005-024-01599-5).
In particular, target particles for MLPF reconstruction are based on status=1 particles.
For non-interacting status=1 particles, their direct children that are interacting status=0 particles are used instead.
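The selection rule above can be sketched as follows. This is an illustration only, not the code from the pull request: the `GenParticle` class, the `interacting` flag, and the reading of status=0 as "created during detector simulation" are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class GenParticle:
    idx: int
    status: int        # 1: stable generator particle; 0: assumed to be created in simulation
    interacting: bool  # hypothetical flag: the particle left a signal in the detector
    children: list = field(default_factory=list)  # indices of direct children


def select_targets(particles):
    """Return indices of target particles following the rule described above."""
    by_idx = {p.idx: p for p in particles}
    targets = []
    for p in particles:
        if p.status != 1:
            continue
        if p.interacting:
            # An interacting status=1 particle is itself a target
            targets.append(p.idx)
        else:
            # Otherwise fall back to its direct children that are
            # interacting status=0 particles
            targets.extend(
                c for c in p.children
                if by_idx[c].status == 0 and by_idx[c].interacting
            )
    return targets


particles = [
    GenParticle(0, status=1, interacting=True, children=[4]),
    GenParticle(1, status=1, interacting=False, children=[2, 3]),
    GenParticle(2, status=0, interacting=True),
    GenParticle(3, status=0, interacting=False),
    GenParticle(4, status=0, interacting=True),
]
targets = select_targets(particles)
print(targets)
```

In this toy event, particle 0 is kept directly, particle 1 is replaced by its interacting child 2, and particle 4 is not selected because its parent is already a target.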
The datasets were generated using Key4HEP with the following scripts:
- https://github.com/HEP-KBFI/key4hep-sim/releases/tag/v1.0.0
- https://github.com/HEP-KBFI/key4hep-sim/blob/v1.0.0/clic/run_sim.sh
## Training Procedure
```bash
#!/bin/bash
#SBATCH --job-name=mlpf-train
#SBATCH --account=project_465000301
#SBATCH --time=3-00:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=200G
#SBATCH --gpus-per-task=8
#SBATCH --partition=small-g
#SBATCH --no-requeue
#SBATCH -o logs/slurm-%x-%j-%N.out
cd /scratch/project_465000301/particleflow
module load LUMI/24.03 partition/G
export IMG=/scratch/project_465000301/pytorch-rocm6.2.simg
export PYTHONPATH=`pwd`
export TFDS_DATA_DIR=/scratch/project_465000301/tensorflow_datasets
#export MIOPEN_DISABLE_CACHE=true
export MIOPEN_USER_DB_PATH=/tmp/${USER}-${SLURM_JOB_ID}-miopen-cache
export MIOPEN_CUSTOM_CACHE_DIR=${MIOPEN_USER_DB_PATH}
export TF_CPP_MAX_VLOG_LEVEL=-1 #to suppress ROCm fusion is enabled messages
export ROCM_PATH=/opt/rocm
#export NCCL_DEBUG=INFO
#export MIOPEN_ENABLE_LOGGING=1
#export MIOPEN_ENABLE_LOGGING_CMD=1
#export MIOPEN_LOG_LEVEL=4
export KERAS_BACKEND=torch
env
#PyTorch training
singularity exec \
--rocm \
-B /scratch/project_465000301 \
-B /tmp \
--env LD_LIBRARY_PATH=/opt/rocm/lib/ \
--env CUDA_VISIBLE_DEVICES=$ROCR_VISIBLE_DEVICES \
$IMG python3 mlpf/pipeline.py --dataset clic --gpus 8 \
--data-dir $TFDS_DATA_DIR --config parameters/pytorch/pyg-clic.yaml \
--train --gpu-batch-multiplier 128 --num-workers 8 --prefetch-factor 100 --checkpoint-freq 1 --conv-type attention --dtype bfloat16 --lr 0.0001 --num-epochs 30
```
## Evaluation
```bash
#!/bin/bash
#SBATCH --partition gpu
#SBATCH --gres gpu:mig:1
#SBATCH --mem-per-gpu 200G
#SBATCH -o logs/slurm-%x-%j-%N.out
IMG=/home/software/singularity/pytorch.simg:2024-08-18
cd ~/particleflow
WEIGHTS=models/clic/clusters/v2.0.0/checkpoints/checkpoint-29-1.901667.pth
singularity exec -B /scratch/persistent --nv \
--env PYTHONPATH=`pwd` \
--env KERAS_BACKEND=torch \
$IMG python3 mlpf/pyg_pipeline.py --dataset clic --gpus 1 \
--data-dir /scratch/persistent/joosep/tensorflow_datasets --config parameters/pytorch/pyg-clic.yaml \
--test --make-plots --gpu-batch-multiplier 100 --load $WEIGHTS --dtype bfloat16 --prefetch-factor 10 --num-workers 8
```
## Citation
## Glossary
- PF: particle flow reconstruction
- MLPF: machine learning for particle flow
- CLIC: Compact Linear Collider
## Model Card Contact
Joosep Pata, [email protected]
## Full outputs
```
/local/joosep/mlpf/results/clic/pyg-clic_20241011_102451_167094
```