Machine learning training code for the four top quark production search in the opposite-sign dilepton (OSDL) channel
The neural networks generated by the code in this repository are trained using elmu and mumu data only; new networks must be trained once elel and 2018 MC are available.
- OSDL_keras_*.py is the Python script used to train the neural networks. The latest version is v4, which contains options to train with different training datasets, different activation functions (ReLU and tanh), and different sets of DeepJetB inputs (the DeepJetB discriminators of the four highest-pT jets, or the four highest DeepJetB discriminators). The following documentation is based on this version of the script.
- OSDL_keras_v4_AUC.py is the Python script used to calculate the area under the ROC curve for each network generated with OSDL_keras_v4.py.
- pd_convert.py converts the data from ntuple form into a pandas dataframe for easier handling. It also calculates the sphericity variable for each event.
- final_checks_v4.ipynb is a Jupyter notebook containing the code used to calculate the variable ranking based on first-order Taylor coefficients.
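To illustrate the sphericity variable that pd_convert.py adds, the sketch below uses the standard event-shape definition based on the eigenvalues of the normalised momentum tensor. It is not taken from pd_convert.py: the function name `sphericity`, the input layout, and the choice of which objects enter the sum (jets only, or jets plus leptons) are assumptions.

```python
import numpy as np


def sphericity(momenta):
    """Sphericity of a set of 3-momenta given as an (N, 3) array of (px, py, pz).

    Assumption: the standard definition S = 3/2 * (lambda2 + lambda3), where
    lambda1 >= lambda2 >= lambda3 are the eigenvalues of the normalised
    momentum tensor. The definition used in pd_convert.py may differ.
    """
    p = np.asarray(momenta, dtype=float)
    # Normalised momentum tensor: sum_i p_i^a p_i^b / sum_i |p_i|^2
    tensor = p.T @ p / np.sum(p * p)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order,
    # so the first two entries are the two smallest eigenvalues.
    eigenvalues = np.linalg.eigvalsh(tensor)
    return 1.5 * (eigenvalues[0] + eigenvalues[1])
```

With this definition, a perfectly back-to-back event gives a value of 0 and a fully isotropic event gives a value of 1.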
All neural networks will use the same architecture as follows:
- BatchNormalisation layer
- Dropout layer (with specified dropout probability, see below)
- 3 hidden layers with 50 neurons, using ReLU or tanh activation function (see below)
- 1 output layer with sigmoid activation function
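The layer stack listed above could be written in Keras roughly as follows. This is a minimal sketch, not the code from OSDL_keras_v4.py: the helper names `layer_spec` and `build_model`, the reading of "50 neurons" as 50 per hidden layer, and the loss, optimizer and metric are all assumptions.

```python
def layer_spec(activation="relu", dropout=0.2, n_hidden=3, n_neurons=50):
    """Return the layer stack described above as (Keras layer name, kwargs) pairs."""
    if activation not in ("relu", "tanh"):
        raise ValueError("activation must be 'relu' or 'tanh'")
    spec = [("BatchNormalization", {}), ("Dropout", {"rate": dropout})]
    spec += [("Dense", {"units": n_neurons, "activation": activation})
             for _ in range(n_hidden)]
    spec.append(("Dense", {"units": 1, "activation": "sigmoid"}))
    return spec


def build_model(n_inputs, activation="relu", dropout=0.2):
    """Build a Keras model from layer_spec (TensorFlow is imported lazily)."""
    from tensorflow import keras  # only needed when a model is actually built

    model = keras.Sequential([keras.Input(shape=(n_inputs,))])
    for name, kwargs in layer_spec(activation, dropout):
        model.add(getattr(keras.layers, name)(**kwargs))
    # Assumption: loss, optimizer and metric are illustrative choices only.
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=[keras.metrics.AUC()])
    return model
```

Keeping the layer description separate from the model construction makes it easy to inspect or log the configuration without importing TensorFlow.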
The networks take the following input variables:
- pT of each lepton (1 variable × 2 leptons)
- Four highest jet pT in each event
- DeepJetB discriminator of each corresponding jet (unsorted) or the four highest DeepJetB discriminators among the jets in each event (sorted)
- HT (scalar sum of jet transverse momentum)
- HTb (HT excluding the first two b-jets)
- HTRat (ratio of pT from the first two b-jets over HT)
- HTH (ratio of HT over the scalar sum of jet momentum)
- nMediumDeepJetB: number of medium DeepJetB jets
- nFTAJet: number of jets
- sphericity: sphericity
- isElMu: lepton decay channel indicator
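The difference between the unsorted and sorted DeepJetB inputs can be sketched as below. This is only an illustration of the two selections described above; the function name `select_deepjet_inputs` and the representation of jets as `(pt, deepjetb)` pairs are hypothetical, and events with fewer than four jets are not handled.

```python
def select_deepjet_inputs(jets, sorted_mode=False, n_jets=4):
    """Pick the jet pT and DeepJetB inputs for one event.

    jets: list of (pt, deepjetb) pairs (hypothetical layout).
    Returns (jet_pts, deepjet_values), each with n_jets entries.
    """
    # The jet pT inputs are always the n_jets highest-pT jets.
    by_pt = sorted(jets, key=lambda jet: jet[0], reverse=True)[:n_jets]
    jet_pts = [pt for pt, _ in by_pt]
    if sorted_mode:
        # Sorted: the n_jets highest discriminator values among all jets.
        deepjet_values = sorted((b for _, b in jets), reverse=True)[:n_jets]
    else:
        # Unsorted: the discriminators of the same highest-pT jets, in pT order.
        deepjet_values = [b for _, b in by_pt]
    return jet_pts, deepjet_values


jets = [(120.0, 0.10), (95.0, 0.92), (60.0, 0.05), (45.0, 0.30), (30.0, 0.99)]
unsorted_inputs = select_deepjet_inputs(jets)
sorted_inputs = select_deepjet_inputs(jets, sorted_mode=True)
```

In this example the fifth jet has the highest discriminator value but the lowest pT, so it contributes only in the sorted selection.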
python3 OSDL_keras_v4.py <mode> <variant> <dropout>
- mode specifies the network configuration mode.
  - 1: normal mode - using ReLU activation and unsorted DeepJetB
  - 2: tanh mode - using tanh activation and unsorted DeepJetB
  - 3: sorted mode - using ReLU activation and sorted DeepJetB
  - 4: sorted_tanh mode - using tanh activation and sorted DeepJetB
- variant specifies the training dataset variant.
  - 1: train with combined (elmu + mumu) data
  - 2: train only with elmu data
  - 3: train only with mumu data
- dropout specifies the dropout probability (default value is 0.2).
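The mapping from the three command-line arguments to a training configuration can be summarised with a small lookup, sketched below. This is not the argument handling from OSDL_keras_v4.py: the names `MODES`, `VARIANTS` and `parse_training_args` are hypothetical, and treating `dropout` as optional is an assumption based on the stated default.

```python
# mode number -> (activation function, use sorted DeepJetB inputs)
MODES = {
    1: ("relu", False),  # normal
    2: ("tanh", False),  # tanh
    3: ("relu", True),   # sorted
    4: ("tanh", True),   # sorted_tanh
}
# variant number -> channels used for training
VARIANTS = {1: ("elmu", "mumu"), 2: ("elmu",), 3: ("mumu",)}


def parse_training_args(argv):
    """argv: [mode, variant] or [mode, variant, dropout], all as strings."""
    if len(argv) not in (2, 3):
        raise SystemExit("usage: OSDL_keras_v4.py <mode> <variant> <dropout>")
    activation, sorted_deepjet = MODES[int(argv[0])]
    channels = VARIANTS[int(argv[1])]
    # Assumption: dropout falls back to the documented default when omitted.
    dropout = float(argv[2]) if len(argv) == 3 else 0.2
    if not 0.0 <= dropout < 1.0:
        raise ValueError("dropout must be in [0, 1)")
    return {"activation": activation, "sorted_deepjet": sorted_deepjet,
            "channels": channels, "dropout": dropout}


config = parse_training_args(["4", "2", "0.3"])
```

Here `config` describes a tanh network with sorted DeepJetB inputs, trained on elmu data only with a dropout probability of 0.3.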
python3 OSDL_keras_v4_AUC.py <model_path> <train_variant> <train_mode>
- model_path specifies the path to the model in .hdf5 file format.
- train_variant specifies the training dataset variant.
  - 1: train with combined (elmu + mumu) data
  - 2: train only with elmu data
  - 3: train only with mumu data
- train_mode specifies the network configuration mode.
  - 1: normal mode
  - 2: sorted mode (differs from OSDL_keras_v4.py!)
  - 3: tanh mode (differs from OSDL_keras_v4.py!)
  - 4: sorted_tanh mode
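Because modes 2 and 3 are swapped between the two scripts, it is easy to evaluate a network with the wrong configuration. A small helper such as the one below could translate the mode number used for training into the one expected by OSDL_keras_v4_AUC.py. The names `TRAIN_MODES`, `AUC_MODES` and `auc_mode_for` are hypothetical and not part of either script.

```python
# Mode numbering of OSDL_keras_v4.py (training script)
TRAIN_MODES = {1: "normal", 2: "tanh", 3: "sorted", 4: "sorted_tanh"}
# Mode numbering of OSDL_keras_v4_AUC.py (AUC script)
AUC_MODES = {1: "normal", 2: "sorted", 3: "tanh", 4: "sorted_tanh"}


def auc_mode_for(train_mode):
    """Return the AUC-script mode number matching a training-script mode number."""
    name = TRAIN_MODES[train_mode]
    return next(number for number, label in AUC_MODES.items() if label == name)


# A network trained with mode 2 (tanh) must be evaluated with train_mode 3.
tanh_auc_mode = auc_mode_for(2)
```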
Both scripts can be run as batch jobs on lxplus.