supernova15 / flavours-of-physics

This project forked from gramolin/flavours-of-physics

Cloned from Second-ranked solution to the Kaggle "Flavours of Physics" competition

License: MIT License

Python 100.00%

Kaggle's Flavours of Physics: the second-ranked solution

This solution ranked second on the Private Leaderboard of the Kaggle "Flavours of Physics: Finding τ → μμμ" competition. The model is based on gradient boosting and is implemented in Python with the XGBoost library. It is a combination of two XGBoost classifiers (boosters) trained on different sets of features. The first booster is an ensemble of 200 decision trees trained mostly on geometric features (such as impact parameters and track isolation variables). The second booster consists of 100 trees trained on purely kinematic features. The final prediction is a weighted average of the probabilities predicted by the individual classifiers (with a weight of 0.78 assigned to the first booster). Combining two independent classifiers allows us to easily pass the correlation test. To pass the agreement test, the only thing needed is to exclude SPDhits from the features used in the training process.
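
The weighted average described above can be sketched as follows. This is a minimal illustration rather than the repository's code: the name blend_predictions and its arguments are hypothetical, only the weight 0.78 comes from the text, and the second booster is assumed to receive the complementary weight 0.22.

```python
def blend_predictions(p1, p2, w1=0.78):
    """Weighted average of two sequences of signal probabilities.

    p1, p2: per-event probabilities from the first and second booster.
    w1:     weight of the first booster; the second gets 1 - w1 (assumption).
    """
    if len(p1) != len(p2):
        raise ValueError("both boosters must score the same events")
    return [w1 * a + (1.0 - w1) * b for a, b in zip(p1, p2)]


# Two made-up events scored by both boosters.
blended = blend_predictions([0.9, 0.2], [0.5, 0.4])
```

With these made-up inputs the first event gets 0.78 × 0.9 + 0.22 × 0.5, so the first booster dominates the result.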

Dependencies

  • The XGBoost library should be installed
  • The Python packages numpy and pandas are required, along with the standard-library csv module
  • The training and test datasets (the files training.csv and test.csv) can be downloaded from here

How to generate the solution

  1. Put the data files training.csv and test.csv in the data directory.
  2. To train the XGBoost classifiers, run python train.py. The trained boosters will be saved in the files bst1.model and bst2.model, so you can make predictions on new datasets without re-training the model.
  3. To make a prediction, run python predict.py. Results will be written to submission.csv.
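
As a rough sketch of the last step, the helper below writes per-event predictions to a CSV file using the csv module listed under Dependencies. It is not taken from predict.py: the function name write_submission and the column names id and prediction are assumptions made for illustration.

```python
import csv


def write_submission(ids, predictions, path="submission.csv"):
    """Write one (id, prediction) row per event, preceded by a header row."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "prediction"])  # assumed column names
        writer.writerows(zip(ids, predictions))
```

Calling write_submission(ids, blended_probabilities) would then produce a file with one line per test event.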

Feature engineering

Some new features were designed in addition to the original ones. The original feature SPDhits was not used since it prevents passing the agreement test. Lists of the features used to train each booster are provided below.

Features for the first booster

  • Original features: FlightDistance, FlightDistanceError, LifeTime, IP, IPSig, VertexChi2, dira, pt, DOCAone, DOCAtwo, DOCAthree, IP_p0p2, IP_p1p2, isolationa, isolationb, isolationc, isolationd, isolatione, isolationf, iso, CDF1, CDF2, CDF3, ISO_SumBDT, p0_IsoBDT, p1_IsoBDT, p2_IsoBDT, p0_track_Chi2Dof, p1_track_Chi2Dof, p2_track_Chi2Dof, p0_IP, p0_IPSig, p1_IP, p1_IPSig, p2_IP, p2_IPSig.

  • New features:

    • E is the full energy of the mother particle calculated assuming that the final-state particles p0, p1, and p2 are muons (E = E0 + E1 + E2).
    • FlightDistanceSig is the ratio (FlightDistance / FlightDistanceError).
    • DOCA_sum is the sum (DOCAone + DOCAtwo + DOCAthree).
    • isolation_sum is the sum (isolationa + isolationb + isolationc + isolationd + isolatione + isolationf).
    • IsoBDT_sum is the sum (p0_IsoBDT + p1_IsoBDT + p2_IsoBDT).
    • track_Chi2Dof is calculated as sqrt[(p0_track_Chi2Dof – 1)^2 + (p1_track_Chi2Dof – 1)^2 + (p2_track_Chi2Dof – 1)^2].
    • IP_sum is the sum (p0_IP + p1_IP + p2_IP).
    • IPSig_sum is the sum (p0_IPSig + p1_IPSig + p2_IPSig).
    • CDF_sum is the sum (CDF1 + CDF2 + CDF3).
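
The derived features listed above, except E (which needs the particle momenta), could be computed along these lines. This is a sketch, not the repository's implementation: the function name add_geometric_features is hypothetical, and the code uses only arithmetic operators so that it runs on a plain dict of floats and should also apply column-wise to a pandas DataFrame.

```python
def add_geometric_features(d):
    """Return the derived geometric features for a mapping of column values."""
    new = {}
    new["FlightDistanceSig"] = d["FlightDistance"] / d["FlightDistanceError"]
    new["DOCA_sum"] = d["DOCAone"] + d["DOCAtwo"] + d["DOCAthree"]
    new["isolation_sum"] = sum(d["isolation" + s] for s in "abcdef")
    new["IsoBDT_sum"] = d["p0_IsoBDT"] + d["p1_IsoBDT"] + d["p2_IsoBDT"]
    # Distance of the three track chi2/dof values from the ideal value 1.
    new["track_Chi2Dof"] = sum(
        (d[f"p{i}_track_Chi2Dof"] - 1.0) ** 2 for i in range(3)
    ) ** 0.5
    new["IP_sum"] = d["p0_IP"] + d["p1_IP"] + d["p2_IP"]
    new["IPSig_sum"] = d["p0_IPSig"] + d["p1_IPSig"] + d["p2_IPSig"]
    new["CDF_sum"] = d["CDF1"] + d["CDF2"] + d["CDF3"]
    return new


# A made-up event, for illustration only.
event = {
    "FlightDistance": 10.0, "FlightDistanceError": 2.0,
    "DOCAone": 0.1, "DOCAtwo": 0.2, "DOCAthree": 0.3,
    "isolationa": 1.0, "isolationb": 2.0, "isolationc": 3.0,
    "isolationd": 4.0, "isolatione": 5.0, "isolationf": 6.0,
    "p0_IsoBDT": -0.5, "p1_IsoBDT": -0.25, "p2_IsoBDT": -0.25,
    "p0_track_Chi2Dof": 1.0, "p1_track_Chi2Dof": 4.0, "p2_track_Chi2Dof": 5.0,
    "p0_IP": 0.5, "p1_IP": 0.25, "p2_IP": 0.25,
    "p0_IPSig": 1.0, "p1_IPSig": 2.0, "p2_IPSig": 3.0,
    "CDF1": 0.5, "CDF2": 0.25, "CDF3": 0.25,
}
geo = add_geometric_features(event)
```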

Features for the second booster

  • Original features: dira, pt, p0_pt, p0_p, p0_eta, p1_pt, p1_p, p1_eta, p2_pt, p2_p, p2_eta.

  • New features:

    • E is the full energy of the mother particle calculated assuming that the final-state particles p0, p1, and p2 are muons (E = E0 + E1 + E2).
    • pz is the longitudinal momentum of the mother particle.
    • beta is the relativistic beta of the mother particle (beta = v / c).
    • gamma is the relativistic gamma of the mother particle (gamma = 1 / sqrt(1 – beta^2)).
    • beta_gamma is beta×gamma calculated as FlightDistance / (LifeTime×c), where c is the speed of light.
    • Delta_E is the difference between energies of the mother particle calculated in two different ways.
    • Delta_M is the difference between masses of the mother particle calculated in two different ways.
    • flag_M equals 1 if the mass of the mother particle is close to the tau mass and 0 otherwise.
    • E0 is the full energy of the particle p0 calculated as E0 = sqrt[(m_mu)^2 + (p0_p)^2], where m_mu is the muon mass.
    • E1 is the full energy of the particle p1 calculated as E1 = sqrt[(m_mu)^2 + (p1_p)^2], where m_mu is the muon mass.
    • E2 is the full energy of the particle p2 calculated as E2 = sqrt[(m_mu)^2 + (p2_p)^2], where m_mu is the muon mass.
    • E0_ratio is the ratio (E0 / E).
    • E1_ratio is the ratio (E1 / E).
    • E2_ratio is the ratio (E2 / E).
    • p0_pt_ratio is the ratio (p0_pt / pt).
    • p1_pt_ratio is the ratio (p1_pt / pt).
    • p2_pt_ratio is the ratio (p2_pt / pt).
    • eta_01 is the difference (p0_eta – p1_eta).
    • eta_02 is the difference (p0_eta – p2_eta).
    • eta_12 is the difference (p1_eta – p2_eta).
    • t_coll is calculated as (p0_pt + p1_pt + p2_pt) / pt (this equals unity if the final-state particles p0, p1, and p2 are collinear in the transverse plane).
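
Most of the kinematic features above follow directly from the stated formulas and could be sketched as below. Several points are assumptions rather than facts from the repository: the function name add_kinematic_features, the units (momenta in MeV/c, FlightDistance in mm, LifeTime in ns), and the way gamma and beta are obtained from beta_gamma through the identity gamma^2 = 1 + (beta×gamma)^2. The features pz, Delta_E, Delta_M, and flag_M are left out because the list does not give their formulas.

```python
M_MU = 105.658   # muon mass, approx., in MeV/c^2 (assumes momenta in MeV/c)
C = 299.792458   # speed of light in mm/ns (assumes FlightDistance in mm, LifeTime in ns)


def add_kinematic_features(d):
    """Return derived kinematic features for a mapping of column values."""
    new = {}
    for i in range(3):
        # Energy of each final-state particle under the muon mass hypothesis.
        new[f"E{i}"] = (M_MU ** 2 + d[f"p{i}_p"] ** 2) ** 0.5
    new["E"] = new["E0"] + new["E1"] + new["E2"]
    for i in range(3):
        new[f"E{i}_ratio"] = new[f"E{i}"] / new["E"]
        new[f"p{i}_pt_ratio"] = d[f"p{i}_pt"] / d["pt"]
    new["eta_01"] = d["p0_eta"] - d["p1_eta"]
    new["eta_02"] = d["p0_eta"] - d["p2_eta"]
    new["eta_12"] = d["p1_eta"] - d["p2_eta"]
    new["t_coll"] = (d["p0_pt"] + d["p1_pt"] + d["p2_pt"]) / d["pt"]
    new["beta_gamma"] = d["FlightDistance"] / (d["LifeTime"] * C)
    # Assumed derivation: gamma^2 = 1 + (beta*gamma)^2, then beta = beta_gamma / gamma.
    new["gamma"] = (1.0 + new["beta_gamma"] ** 2) ** 0.5
    new["beta"] = new["beta_gamma"] / new["gamma"]
    return new


# A made-up event whose three tracks are collinear in the transverse plane.
event_kin = {
    "p0_p": 10000.0, "p1_p": 20000.0, "p2_p": 30000.0,
    "p0_pt": 1000.0, "p1_pt": 2000.0, "p2_pt": 3000.0, "pt": 6000.0,
    "p0_eta": 3.0, "p1_eta": 2.5, "p2_eta": 2.0,
    "FlightDistance": 299.792458, "LifeTime": 1.0,
}
kin = add_kinematic_features(event_kin)
```

In this example kin["t_coll"] comes out as 1, matching the collinear case described in the list, and the three energy ratios add up to 1 by construction.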
