OSDL-ML

Machine learning training code for the four-top-quark production search in the opposite-sign dilepton (OSDL) channel.

Warning

The neural networks generated by the code in this repository were trained using elmu and mumu data only. New networks must be trained once elel and 2018 MC are available.

OSDL_keras_*.py are the Python scripts used to train the neural networks. The latest version, v4, provides options to train with different training datasets, different activation functions (ReLU or tanh), and different DeepJetB input sets (the DeepJetB discriminators of the four highest-pT jets, or the four highest DeepJetB discriminators in the event). The documentation below is based on this version of the script.
OSDL_keras_v4_AUC.py is the Python script used to calculate the area under the ROC curve for each network generated with OSDL_keras_v4.py.
pd_convert.py converts the ntuple data into a pandas DataFrame for easier handling. It also calculates the sphericity variable for each event.
final_checks_v4.ipynb is a Jupyter notebook containing the code used to calculate the variable ranking based on first-order Taylor coefficients.
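
For orientation, the area under the ROC curve equals the probability that a randomly chosen signal event receives a higher network score than a randomly chosen background event. The block below is a minimal pure-Python sketch of that quantity. It is not the implementation used in OSDL_keras_v4_AUC.py, and the function name roc_auc and the toy scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    signal = [s for y, s in zip(labels, scores) if y == 1]
    background = [s for y, s in zip(labels, scores) if y == 0]
    if not signal or not background:
        raise ValueError("need at least one signal and one background event")
    wins = 0.0
    for s in signal:
        for b in background:
            if s > b:
                wins += 1.0   # signal ranked above background
            elif s == b:
                wins += 0.5   # tie: counts as half a win
    return wins / (len(signal) * len(background))


# Toy example (made-up scores): 1 = signal, 0 = background.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise loop scales quadratically with the number of events, so for realistic sample sizes a library routine such as sklearn.metrics.roc_auc_score is likely the better choice.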

Neural network structure

All neural networks use the same architecture:

BatchNormalisation layer
Dropout layer (with specified dropout probability, see below)
3 hidden layers with 50 neurons each, using the ReLU or tanh activation function (see below)
1 output layer with sigmoid activation function
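
The layer stack above can be sketched in Keras roughly as follows. This is an illustration under stated assumptions, not the code in OSDL_keras_v4.py: the helper names layer_spec and build_model are hypothetical, and n_inputs=18 is simply the count of variables listed under "Input variables", assuming each is a single number. The loss function and optimizer are not stated in this README, so the sketch leaves the model uncompiled.

```python
def layer_spec(activation="relu", dropout=0.2):
    """Describe the layer stack as (Keras layer name, keyword arguments) pairs."""
    if activation not in ("relu", "tanh"):
        raise ValueError("activation must be 'relu' or 'tanh'")
    spec = [
        ("BatchNormalization", {}),
        ("Dropout", {"rate": dropout}),
    ]
    # Three hidden layers with 50 neurons each.
    spec += [("Dense", {"units": 50, "activation": activation}) for _ in range(3)]
    # Single sigmoid output: signal-versus-background score in [0, 1].
    spec.append(("Dense", {"units": 1, "activation": "sigmoid"}))
    return spec


def build_model(n_inputs, activation="relu", dropout=0.2):
    """Instantiate the stack with Keras (imported lazily; requires TensorFlow)."""
    from tensorflow import keras

    model = keras.Sequential()
    model.add(keras.Input(shape=(n_inputs,)))
    for name, kwargs in layer_spec(activation, dropout):
        model.add(getattr(keras.layers, name)(**kwargs))
    return model


# Inspect the stack without needing TensorFlow installed.
spec = layer_spec(activation="tanh", dropout=0.2)
# With TensorFlow available: model = build_model(n_inputs=18, activation="tanh")
```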

Input variables

pT of each lepton (1 variable × 2 leptons)
Four highest jet pT in each event
DeepJetB discriminator of each corresponding jet (unsorted), or the four highest DeepJetB discriminators among the jets in each event (sorted)
HT (scalar sum of jet transverse momenta)
HTb (HT excluding the first two b-jets)
HTRat (ratio of the pT of the first two b-jets to HT)
HTH (ratio of HT to the scalar sum of jet momenta)
nMediumDeepJetB (number of medium DeepJetB jets)
nFTAJet (number of jets)
sphericity (event sphericity)
isElMu (lepton decay channel indicator)
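
The difference between the unsorted and sorted DeepJetB inputs can be illustrated with a short sketch. It assumes each jet is given as a (pT, DeepJetB discriminator) pair and that every event has at least four jets. The function name jet_features, the dictionary keys, and the toy values are hypothetical and are not taken from pd_convert.py.

```python
def jet_features(jets, sorted_deepjet=False):
    """Build jet-related inputs from a list of (pt, deepjet_b) pairs."""
    if len(jets) < 4:
        raise ValueError("expected at least four jets")
    by_pt = sorted(jets, key=lambda j: j[0], reverse=True)
    leading = by_pt[:4]
    jet_pt = [pt for pt, _ in leading]
    if sorted_deepjet:
        # Sorted option: the four highest discriminator values among all jets.
        deepjet = sorted((d for _, d in jets), reverse=True)[:4]
    else:
        # Unsorted option: discriminators of the four highest-pT jets, in pT order.
        deepjet = [d for _, d in leading]
    ht = sum(pt for pt, _ in jets)  # HT: scalar sum of jet pT
    return {"jet_pt": jet_pt, "deepjet_b": deepjet, "HT": ht}


# Toy event with five jets (made-up values).
jets = [(120.0, 0.10), (95.0, 0.98), (60.0, 0.35), (45.0, 0.05), (30.0, 0.91)]
unsorted_inputs = jet_features(jets)
sorted_inputs = jet_features(jets, sorted_deepjet=True)
```

In this toy event the fifth jet has a high discriminator but low pT, so it contributes to the sorted inputs and is absent from the unsorted ones.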

Usage

python3 OSDL_keras_v4.py <mode> <variant> <dropout>

mode specifies network configuration mode.
- 1: normal mode - using ReLU activation and unsorted DeepJetB
- 2: tanh mode - using tanh activation and unsorted DeepJetB
- 3: sorted mode - using ReLU activation and sorted DeepJetB
- 4: sorted_tanh mode - using tanh activation and sorted DeepJetB
variant specifies training dataset variant.
- 1: train with combined (elmu + mumu) data
- 2: train only with elmu data
- 3: train only with mumu data
dropout specifies the dropout probability (default: 0.2).
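
The mapping from command-line arguments to a network configuration can be summarised as in the sketch below. It assumes that dropout may be omitted (falling back to 0.2); the function name parse_training_args, the dictionary keys, and the validation rules are hypothetical and may differ from what OSDL_keras_v4.py actually does.

```python
# mode number -> (mode name, activation, use sorted DeepJetB inputs)
MODES = {
    1: ("normal", "relu", False),
    2: ("tanh", "tanh", False),
    3: ("sorted", "relu", True),
    4: ("sorted_tanh", "tanh", True),
}
# variant number -> training dataset
VARIANTS = {1: "combined", 2: "elmu", 3: "mumu"}


def parse_training_args(argv):
    """Interpret [mode, variant, dropout] given as strings (dropout optional)."""
    if len(argv) not in (2, 3):
        raise ValueError("usage: <mode> <variant> [dropout]")
    mode, variant = int(argv[0]), int(argv[1])
    if mode not in MODES or variant not in VARIANTS:
        raise ValueError("mode must be 1-4 and variant must be 1-3")
    dropout = float(argv[2]) if len(argv) == 3 else 0.2
    if not 0.0 <= dropout < 1.0:
        raise ValueError("dropout must be in [0, 1)")
    name, activation, sorted_deepjet = MODES[mode]
    return {
        "mode": name,
        "activation": activation,
        "sorted_deepjet": sorted_deepjet,
        "dataset": VARIANTS[variant],
        "dropout": dropout,
    }


# Equivalent of: python3 OSDL_keras_v4.py 3 2
config = parse_training_args(["3", "2"])
```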

python3 OSDL_keras_v4_AUC.py <model_path> <train_variant> <train_mode>

model_path specifies path to the model in .hdf5 file format.
train_variant specifies the dataset variant the model was trained with.
- 1: trained with combined (elmu + mumu) data
- 2: trained only with elmu data
- 3: trained only with mumu data
train_mode specifies network configuration mode.
- 1: normal mode
- 2: sorted mode (differs from OSDL_keras_v4.py, where 2 is tanh mode)
- 3: tanh mode (differs from OSDL_keras_v4.py, where 3 is sorted mode)
- 4: sorted_tanh mode
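
Because mode numbers 2 and 3 are swapped between the two scripts, a small lookup can help avoid evaluating a model under the wrong configuration. The sketch below derives the translation from the two mode lists in this README; the names TRAIN_MODES, AUC_MODES, and auc_mode_for are hypothetical helpers that are not part of either script.

```python
# Mode numbering used by OSDL_keras_v4.py (training).
TRAIN_MODES = {1: "normal", 2: "tanh", 3: "sorted", 4: "sorted_tanh"}
# Mode numbering used by OSDL_keras_v4_AUC.py (evaluation).
AUC_MODES = {1: "normal", 2: "sorted", 3: "tanh", 4: "sorted_tanh"}


def auc_mode_for(train_mode):
    """Return the train_mode number for the AUC script given the training mode."""
    name = TRAIN_MODES[train_mode]
    for number, auc_name in AUC_MODES.items():
        if auc_name == name:
            return number
    raise ValueError("no matching AUC mode for " + name)


# A network trained in tanh mode (2) must be evaluated with train_mode 3.
tanh_auc_mode = auc_mode_for(2)
```

For example, a network trained with python3 OSDL_keras_v4.py 2 1 would be evaluated with python3 OSDL_keras_v4_AUC.py <model_path> 1 3.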

Both scripts can be run as batch jobs on lxplus.
