
Hm.JetscapeMl

Welcome to Hm.JetscapeMl! This repository contains code and resources related to utilizing machine learning techniques for analyzing Jetscape simulation data.

Table of Contents

  • Introduction
  • Dataset
  • Installation
  • Repository Guideline
  • Usage
  • Contributing
  • License

Hm.JetscapeMl is designed to extract valuable insights and patterns from Jetscape simulation data using modern machine learning techniques. The dataset and accompanying scripts provide a comprehensive framework for conducting machine learning experiments on this data.

Dataset

ML-JET is a dataset for parameter classification in heavy-ion collisions using jet images. It is hosted on Kaggle: ML-Jet Dataset (https://www.kaggle.com/datasets/haydarmehryar/ml-jet).

The ML-JET dataset is designed as a comprehensive benchmark for machine learning applications in relativistic heavy-ion collisions. It facilitates the study and prediction of energy loss mechanisms in high-energy particle physics, focusing in particular on the initial parton virtuality and the strong coupling constant, denoted $Q_0$ and $\alpha_s$, respectively.

Purpose and Scope

The primary aim of the ML-JET dataset is to support the development and evaluation of machine learning models for high-energy physics that can classify and predict jet event parameters under different physical conditions in a quark-gluon plasma (QGP). It provides a rich collection of simulated jet images, which are pivotal for understanding the dynamics of parton energy loss in such environments. The dataset emphasizes the connection between energy loss and the quantum chromodynamics (QCD) parameters $Q_0$ and $\alpha_s$, which are critical for characterizing the scattering and splitting behavior of partons as they traverse the medium.

Data Generation and Features

The dataset was generated using the JETSCAPE framework (https://jetscape.org/), a sophisticated tool for simulating jet events in high-energy collisions.

Dataset Composition and Labeling

The ML-JET dataset comprises 10.8 million images, each with a resolution of 32 × 32 pixels, representing Pb-Pb collision events. The jet observables used in the dataset-building process are: (a) $p_T$: transverse momentum, (b) $\phi$: azimuthal angle, and (c) $\eta$: pseudorapidity of the emitted thermal particles. Each event has three coordinates, as follows:

  • The $x$ axis represents the pseudorapidity $\eta$, in the range $[-\pi,\pi]$,
  • The $y$ axis represents the azimuthal angle $\phi$, in the range $[-\pi,\pi]$,
  • The $z$ axis represents $\Sigma p_T$, the sum of the $p_T$ values in each mesh cell.
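The binning described above can be sketched as a 2-D weighted histogram. The function name `event_to_image` and the toy input are illustrative assumptions, not code from the repository:

```python
import numpy as np

def event_to_image(eta, phi, pt, bins=32, lim=np.pi):
    """Bin particles into a bins x bins grid, summing p_T per cell.

    eta, phi, pt are equal-length 1-D arrays; both axes span [-pi, pi].
    """
    image, _, _ = np.histogram2d(
        eta, phi,
        bins=bins,
        range=[[-lim, lim], [-lim, lim]],
        weights=pt,  # z value: sum of p_T in each mesh cell
    )
    return image

# Toy event: 100 particles with random coordinates and momenta
rng = np.random.default_rng(0)
img = event_to_image(
    rng.uniform(-np.pi, np.pi, 100),
    rng.uniform(-np.pi, np.pi, 100),
    rng.exponential(1.0, 100),
)
print(img.shape)  # (32, 32)
```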

Each image is labeled with its corresponding energy loss module (MATTER or MATTER-LBT), the strong coupling constant $\alpha_s$, and the virtuality separation scale $Q_0$. In the image below, ten sample 2-D events are shown with their corresponding parameters.

"sample 2-d events"

A point cloud representation of a sample event is shown in the image below: "sample pointcloud event"

  • Configurations (01 to 09): Nine distinct configurations corresponding to different combinations of physical parameters.

  • Strong Coupling Constant ($\alpha_s$): The simulations include $\alpha_s$ values of 0.2, 0.3, and 0.4.

  • Virtuality Separation Scale ($Q_0$): The dataset includes $Q_0$ values of 1, 1.5, 2.0, and 2.5.

  • Energy Loss Modules:

    • MATTER: Handles the initial parton showering and energy loss.
    • MATTER-LBT: Incorporates medium-induced scattering and gluon radiation at lower virtualities.
  • Dataset Size: varies across files; the released files contain 10.8 million, 1 million, 100k, 10k, and 1k images at 32 × 32 pixel resolution.

  • Dataset Format:

DataColumn(name="dataset_x", description="32x32 pixel jet images.", data_type="image", shape=(32, 32)),
DataColumn(name="dataset_y", description="Associated labels including energy loss module, alpha_s, and Q_0.", data_type="numeric", shape=(3,)),
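Assuming the label order documented above (energy loss module, $\alpha_s$, $Q_0$), a loaded `dataset_y` array could be unpacked column by column. The "MMAT"/"MLBT" strings and the example values here are illustrative only:

```python
import numpy as np

# Hypothetical label rows in the documented order:
# [energy loss module, alpha_s, Q_0]
dataset_y = np.array([
    ["MMAT", 0.2, 1.0],
    ["MLBT", 0.4, 2.5],
], dtype=object)

eloss = dataset_y[:, 0]                   # energy loss module label
alpha_s = dataset_y[:, 1].astype(float)   # strong coupling constant
q0 = dataset_y[:, 2].astype(float)        # virtuality separation scale
print(eloss, alpha_s, q0)
```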

Intended Use and Applications

This dataset is intended for researchers and practitioners in both machine learning and high-energy physics. It provides a robust platform for developing models that can classify or predict event parameters in particle collisions, aiding in the deeper understanding of QGP properties and behavior. Possible applications include:

  • Training deep learning models for medium parameter classification.
  • Evaluating the impact of different $Q_0$ and $\alpha_s$ values on parton energy loss.
  • Benchmarking novel machine learning algorithms in the context of high-energy physics.

Compliance with FAIR Standards

The ML-JET dataset adheres to the principles of FAIR (Findable, Accessible, Interoperable, Reusable) data. It is publicly available through Kaggle (https://www.kaggle.com/datasets/haydarmehryar/ml-jet) and GitHub (https://github.com/hmehryar/Hm.JetscapeMl), with comprehensive documentation and metadata provided to facilitate its use and integration into various research workflows.

Installation

To get started with Hm.JetscapeMl, follow these steps:

  1. Clone this repository to your local machine:
git clone https://github.com/hmehryar/Hm.JetscapeMl.git
  2. Navigate to the repository's directory:
cd Hm.JetscapeMl
  3. Reading the dataset: to read and utilize the dataset, use any tool or library compatible with the pickle format. Here is how the dataset can be accessed using Python's pickle library:
import pickle

dataset_file_name = "ml_jet_dataset.pkl"
try:
    with open(dataset_file_name, 'rb') as dataset_file:
        # latin1 encoding keeps pickles written under Python 2 readable
        dataset_x, dataset_y = pickle.load(dataset_file, encoding='latin1')
        print("dataset_x:", type(dataset_x), dataset_x.size, dataset_x.shape)
        print("dataset_y:", type(dataset_y), dataset_y.size, dataset_y.shape)
except (pickle.UnpicklingError, FileNotFoundError) as e:
    print("Error while loading the pickle file:", e)

Repository Guideline

Rebuilding/Expanding the Dataset

All step-by-step instructions and related code for building the ML-JET dataset can be found in the jet_ml_dataset_builder directory.

Applying Machine Learning (ML) & Neural Network (NN) Architectures

Neural Networks

MNIST Net

MNIST Net (LeCun et al., 1998), more commonly known as LeNet, was initially devised for handwritten digit recognition and leverages insights into 2D shape invariances through local connection patterns and weight constraints. It takes an image input. With 4 layers, including convolutional and fully connected layers, MNIST Net has 96,445 trainable parameters. The model implementation can be found in the MNIST Net directory.

VGG16 Net

VGG16 Net (Simonyan and Zisserman, 2014) is renowned for its remarkable performance in image recognition tasks. It takes an image input and comprises 16 layers, with 4 convolutional and fully connected blocks, totaling 15,676,673 trainable parameters. The model implementation can be found in the VGG16 Net directory.

Point Net

PointNet (Qi et al., 2017) introduces a novel approach to processing point cloud data, making it uniquely suited to our jet event image classification task. Unlike conventional CNNs that operate on structured grid-like data, PointNet directly consumes unordered point sets. The model implementation can be found in the Point Net directory.
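One way to feed the jet images to PointNet is to turn each nonzero cell into an (eta, phi, sum-of-p_T) point. This conversion is a sketch of the idea under that assumption, not the repository's implementation:

```python
import numpy as np

def image_to_point_cloud(image, lim=np.pi):
    """Convert a square jet image into an (N, 3) unordered point set.

    Each nonzero cell becomes one point (eta, phi, sum_pT), the kind
    of input PointNet consumes directly.
    """
    n = image.shape[0]
    # cell-center coordinates along each axis of [-lim, lim]
    centers = np.linspace(-lim, lim, n, endpoint=False) + lim / n
    ii, jj = np.nonzero(image)
    return np.stack([centers[ii], centers[jj], image[ii, jj]], axis=1)

img = np.zeros((32, 32))
img[3, 5] = 2.5    # two toy hits
img[10, 20] = 1.0
cloud = image_to_point_cloud(img)
print(cloud.shape)  # (2, 3)
```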

Traditional Machine Learning

Implementations of all of the following methods can be found in the ML models directory.

Logistic Regression

  • The logistic regression model is trained for binary classification on the first label column.
  • Predictions and evaluation are performed on the binary labels.
Decision Tree

This code uses DecisionTreeClassifier instead of LogisticRegression. The structure is similar: extract the first column for binary classification, split the dataset, flatten the images, initialize the model, train the model, make predictions, and evaluate the accuracy.

Support Vector Machine (SVM)

This code uses LinearSVC instead of LogisticRegression or DecisionTreeClassifier. The structure remains similar: extract the first column for binary classification, split the dataset, flatten the images, initialize the model, train the model, make predictions, and evaluate the accuracy.

K-Nearest Neighbors (KNN)

Adjust the k_neighbors parameter based on your requirements. The structure is similar to the previous examples: extract the first column for binary classification, split the dataset, flatten the images, initialize the model, train the model, make predictions, and evaluate the accuracy.

Random Forest

This code uses RandomForestClassifier from scikit-learn. The structure is similar to the previous examples: extract the first column for binary classification, split the dataset, flatten the images, initialize the model, train the model, make predictions, and evaluate the accuracy.
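The shared pipeline described in the sections above (extract the first label column, split the dataset, flatten the images, train, predict, evaluate) can be sketched as follows. The data here are random stand-ins for the real dataset, and any of the listed scikit-learn classifiers can be swapped in for LogisticRegression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy stand-ins for the real data: 200 fake 32x32 images and label rows.
rng = np.random.default_rng(0)
dataset_x = rng.random((200, 32, 32))
dataset_y = rng.integers(0, 2, size=(200, 3))

y = dataset_y[:, 0]                        # first column: binary energy-loss label
X = dataset_x.reshape(len(dataset_x), -1)  # flatten 32x32 images to 1024 features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Swap in DecisionTreeClassifier, LinearSVC, KNeighborsClassifier,
# or RandomForestClassifier here without changing the rest.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {acc:.2f}")
```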

jet_ml_dataset_builder_by_size For building a new dataset of a different size from the original dataset

jet_ml_diffusion_model Diffusion model implementation for generating events from parameters

jet_ml_mnist_net MNIST Net implementation of a binary $eloss$ classifier for each of the 9 configurations

jet_ml_models Python classes for each implemented model (for now only PointNet has its own class)

jet_ml_models_notebooks Binary classification implementations of Decision Tree, KNN, Random Forest, SVM, and Logistic Regression for $eloss$; the code in this directory is explained in more detail in the Traditional Machine Learning section

jet_ml_pointnet Several PointNet implementations for binary classification of $eloss$, a single notebook that can train different types of classifier as desired, and a sample TensorFlow GPU implementation

jet_ml_pointnet_alpha_s A PointNet classifier specifically for $\alpha_s$

jet_ml_pointnet_eloss A PointNet binary classifier specifically for $eloss$

jet_ml_sample_events Sample events from each dataset configuration, in 2-D and point cloud form

jet_ml_synthesis_model_vgg16 VGG16 Net implementation; the user can choose $eloss$, $\alpha_s$, or $Q_0$ as the parameter to train the classifier on

jet_ml_validation_calculator Loads a trained model and computes its confusion matrix and accuracy on the loaded dataset

jet_ml_vgg16_model_cnn VGG16 Net implementation of a binary $eloss$ classifier for each of the 9 configurations

Usage

Once you have the repository set up and the dependencies installed, you can start utilizing the project:

  1. Data Preprocessing: Use the provided scripts to preprocess and prepare Jetscape simulation data for analysis.
  2. Machine Learning Models: Explore the models directory to find pre-implemented machine learning models tailored for analyzing Jetscape data.
  3. Example Notebooks: Check out the notebooks directory for example Jupyter notebooks that demonstrate how to use the machine learning models with Jetscape data.
  4. Customization: Feel free to customize and extend the code to suit your specific needs and experiments.

Contributing

Contributions are welcome and encouraged! If you'd like to contribute to Hm.JetscapeMl, follow these steps:

  1. Fork the repository to your GitHub account.
  2. Create a new branch from the main branch for your changes.
  3. Make your changes and commit them with descriptive commit messages.
  4. Push your changes to your forked repository.
  5. Open a pull request (PR) to the original repository, describing the changes you've made.

Please ensure your contributions adhere to the project's coding standards and follow best practices.

License

This project is licensed under the MIT License.

Hm.JetscapeMl's People

Contributors

  • hmehryar

Watchers

  • Loren Schwiebert

hm.jetscapeml's Issues

Make a manuscript out of the current works

Shall be in LaTeX.
Check the NeurIPS standard LaTeX template.

  • Check the deadline for NeurIPS 2023

Material sources: Colab notes, Chanwook's presentation, Amit's presentation, and the meeting manuscript.

  • 9 configurations with different $\alpha_s$ and $Q_0$ values have been created
  • Add simulation result diagrams
  • Add a flow chart of how the dataset is built
  • Add/create a network design figure for each deep model
  • Come up with an article structure based on Loren's meeting notes and other NeurIPS articles

Make a presentation out of the current works

Material can include Colab comments, Chanwook's pictures, Amit's pictures, and simulation/training methods and results.

I already started this issue months ago, but it has not been in progress and the slides have not been updated with comments and changes.

Estimating training time on the grid, based on a sample epoch run on the grid

Ratio | Cores | Mem (GB) | 1-epoch runtime (h) | 50-epoch runtime (h) | Environment
----- | ----- | -------- | ------------------- | -------------------- | -----------
1 | 16 | 64 | 14 | 700 |
4 | 64 | 256 | ~3.5–4 | 200 |
8 | 128 | 512 | ~1.75–2 | 100 |
1 GPU | 16 | 64 | ~5.25–5.5 | 275 |
1 GPU (gres:gpu:tesla) | 16 | 64 | ~4.5–5.5 | 275 |
1 GPU (gres:gpu:tesla) | 16 | 64 | ~0.1 | 5 | tensor_gpu_env
1 GPU (gres:gpu:tesla) | 16 | 128 | ~0.01 | 0.5 | tensor_gpu_env
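The 50-epoch column is a linear extrapolation of the single-epoch runtime; a quick check of the arithmetic (labels are illustrative):

```python
# Scale single-epoch runtimes (hours) to a 50-epoch estimate.
one_epoch_hours = {"16 cores": 14, "64 cores": 4, "128 cores": 2, "1 GPU": 5.5}
fifty_epochs = {k: v * 50 for k, v in one_epoch_hours.items()}
print(fifty_epochs["16 cores"])  # 700
```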

Implementing a single-file analyzer

It should analyze just a single final-state hadron file.
It should get its input parameters from a shell file.
The shell file should execute two lines, one for the MATTER and one for the LBT final-state hadron file.
At the end, the results should be stored in a dataset pkl file.

dataset builder: Step #3: implementing an event splitter for storing and loading event chunks in small files

jetscape-ml-tensorflow-nn-dataset-builder-single-file-event-splitter.ipynb

  1. Split the valid event row data into multiple arrays
  2. Store the multiple arrays in the file system
  3. Load the n files in n separate jobs
    1. Build the final valid matrix/image of the events in each job
    2. Store the final valid matrix/image of the events in the file system
  4. Merge all stored image chunks into one final file for the specified category, e.g. MATTER
  5. Finally, repeat steps 1 to 4 for the second category, e.g. LBT
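The split/store/merge steps above can be sketched as follows. The chunk file naming and the two helper functions are hypothetical, not the notebook's actual code:

```python
import pickle
from pathlib import Path
from tempfile import TemporaryDirectory

import numpy as np

def split_events(events, n_chunks, out_dir):
    """Split event rows into n_chunks arrays and pickle each one (steps 1-2)."""
    paths = []
    for i, chunk in enumerate(np.array_split(events, n_chunks)):
        path = Path(out_dir) / f"events_chunk_{i:03d}.pkl"
        with open(path, "wb") as f:
            pickle.dump(chunk, f)
        paths.append(path)
    return paths

def merge_chunks(paths):
    """Load every stored chunk and merge back into one array (step 4)."""
    chunks = []
    for path in paths:
        with open(path, "rb") as f:
            chunks.append(pickle.load(f))
    return np.concatenate(chunks)

events = np.arange(100 * 4).reshape(100, 4)  # 100 toy event rows
with TemporaryDirectory() as tmp:
    paths = split_events(events, n_chunks=8, out_dir=tmp)
    merged = merge_chunks(paths)
print(np.array_equal(merged, events))  # True
```

In practice each chunk file would be processed by its own grid job before merging, which is what makes the split worthwhile.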

Dataset Builder: Creating the sample events' image grid

Origin: Research.WeeklyMeeting.WithLoren
Description: Implement a small piece of code for saving plots as an image file. Building on the existing dataset-builder code, create an image grid from a given dataset that:
  • shows axes from -pi to pi
  • uses a color map
  • labels each row as MMAT (in-medium MATTER) or MLBT (in-medium LBT)
  • shows 20 images in the grid
  • is high resolution, for future reference

Something better than the current Fig. SampleDataFeaturesClasses.
