This repository contains the code and machine learning models developed to separate accreted and in-situ stars in Milky Way-like galaxies. These are defined here as disc galaxies (co-rotational parameter κ_co > 0.5) hosted by a dark matter halo of mass M_200 ~ 10¹² M☉.
The available models are:
- multilayer perceptron (MLP);
- MLP with galaxy features;
- transformational machine learning (TML), based on an ensemble of MLPs;
- XGBoost.
Each model is available in two versions: one trained on ARTEMIS data and one trained on Auriga data.
The code used to train and optimise the models is provided in the codes folder for reference.
Compared to traditional selection criteria, machine learning models provide:
- a more realistic description of the distribution of accreted stars in the galaxy;
- higher purity in the retrieved sample of accreted stars;
- an adjustable classification threshold, which lets the user decide whether to prioritise the purity or the completeness of the accreted sample.
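To make the purity/completeness trade-off concrete, the sketch below computes both quantities for the accreted class while sweeping the threshold over a few toy probabilities (made-up numbers, not output of the models in this repository). It assumes a star is labelled accreted when its predicted probability is greater than or equal to the threshold; the comparison used inside the models may differ, and `purity_completeness` is a hypothetical helper.

```python
import numpy as np


def purity_completeness(y_true, y_pred):
    """Purity and completeness of the predicted accreted sample (label 1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    true_positives = np.sum((y_pred == 1) & (y_true == 1))
    n_selected = np.sum(y_pred == 1)
    n_accreted = np.sum(y_true == 1)
    # Purity: fraction of the selected stars that are truly accreted.
    purity = true_positives / n_selected if n_selected > 0 else float("nan")
    # Completeness: fraction of all truly accreted stars that were selected.
    completeness = true_positives / n_accreted if n_accreted > 0 else float("nan")
    return float(purity), float(completeness)


# Toy values for six stars (not model output): probability of being accreted
# and the true label (1 = accreted, 0 = in-situ).
p_accreted = np.array([0.95, 0.80, 0.60, 0.40, 0.20, 0.05])
labels = np.array([1, 1, 0, 1, 0, 0])

for threshold in (0.3, 0.5, 0.7):
    predicted = (p_accreted >= threshold).astype(int)
    purity, completeness = purity_completeness(labels, predicted)
    print(f"threshold={threshold:.1f}  purity={purity:.2f}  completeness={completeness:.2f}")
# threshold=0.3  purity=0.75  completeness=1.00
# threshold=0.5  purity=0.67  completeness=0.67
# threshold=0.7  purity=1.00  completeness=0.67
```

In this toy case, raising the threshold from 0.3 to 0.7 increases the purity of the accreted sample from 0.75 to 1.00, while its completeness drops from 1.00 to 0.67.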
The performance of the models may vary depending on the data they are applied to. The MLP trained on ARTEMIS data has been shown to be the most robust in terms of performance.
The following packages are required to use the models:
- joblib (recommended version: 1.2.0)
- numpy (recommended version: 1.24.3)
- tensorflow (recommended version: 2.13.0)
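A quick way to compare an existing environment against the recommended versions is sketched below. It relies only on the standard-library `importlib.metadata` module; the helper `check_versions` is hypothetical and not part of this repository.

```python
from importlib import metadata

# Recommended versions listed in this README.
RECOMMENDED = {"joblib": "1.2.0", "numpy": "1.24.3", "tensorflow": "2.13.0"}


def check_versions(recommended=RECOMMENDED):
    """Return {package: (installed version or None, recommended version)} for mismatches."""
    mismatches = {}
    for package, wanted in recommended.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != wanted:
            mismatches[package] = (installed, wanted)
    return mismatches


if __name__ == "__main__":
    for package, (installed, wanted) in check_versions().items():
        print(f"{package}: installed {installed}, recommended {wanted}")
```

An empty result means every package matches its recommended version exactly.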
- Clone this repository:
  ```bash
  git clone https://github.com/ariasant/ML-accreted-vs-insitu.git
  ```
- OPTIONAL: add the path of the repository to your Python path.
  For example, on macOS and Linux, add this line to your .bashrc file:
  ```bash
  export PYTHONPATH="${PYTHONPATH}:/path/to/this/repository"
  ```
- Create a virtual environment with all the dependencies.
  For example, using conda:
  ```bash
  conda create -n foo python=3.9.16 joblib=1.2.0 numpy=1.24.3 tensorflow=2.13.0
  ```
- Activate the environment.
  For example, using conda:
  ```bash
  conda activate foo
  ```
- Import the model you want to use.
  Inside your favourite Python development environment:
  ```python
  from models import MLP_ARTEMIS
  model = MLP_ARTEMIS()
  ```
- Load data.
  Define your input data as a NumPy array, following the feature order and units specified in the documentation.
- Get predictions:
  ```python
  predictions = model.predict(previously_loaded_data, threshold=0.5)
  ```
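The "Load data" step can be sketched as follows for the models trained on stellar properties only. Only the column order comes from this documentation; the dictionary keys, the placeholder values and the helper `build_stellar_input` are hypothetical, and the inputs are assumed to be already converted to the documented units.

```python
import numpy as np

# Column order expected by the models trained on stellar properties only:
# R, z, vθ, σ, [Fe/H], [α/Fe], M_G, BP-RP. The key names are hypothetical.
STELLAR_FEATURES = ("R", "z", "v_theta", "sigma", "fe_h", "alpha_fe", "M_G", "bp_rp")


def build_stellar_input(catalogue):
    """Stack per-star columns into an (N, 8) array in the documented order.

    `catalogue` maps every name in STELLAR_FEATURES to a 1D array of length N,
    already expressed in the units given in the feature table.
    """
    # float32 is the default floating-point type of Keras models.
    columns = [np.asarray(catalogue[name], dtype=np.float32) for name in STELLAR_FEATURES]
    x = np.column_stack(columns)
    # NaN or infinite entries would propagate through the networks, so reject them early.
    if not np.all(np.isfinite(x)):
        raise ValueError("input contains NaN or infinite values")
    return x


rng = np.random.default_rng(0)
catalogue = {name: rng.normal(size=100) for name in STELLAR_FEATURES}  # placeholder values
x = build_stellar_input(catalogue)
print(x.shape)  # (100, 8)
# predictions = model.predict(x, threshold=0.5)
```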
The machine learning models were trained on specific vectors of stellar properties of the star particles in the ARTEMIS and Auriga simulations. When the models are applied to new data, the input features must be provided in the order and units specified in the documentation of each model.
Feature | Description | Units |
---|---|---|
Stellar properties | | |
R | Galactocentric radius in the plane of the disc. | kpc |
z | Distance from the plane of the disc. | kpc |
vθ | Velocity in the direction of disc rotation. | km s⁻¹ |
σ | Dispersion velocity in the plane perpendicular to the disc. | km s⁻¹ |
[Fe/H] | Iron-to-hydrogen abundance. | - |
[α/Fe] | α-to-iron abundance. | - |
M_G | Absolute magnitude in the Gaia G passband. | - |
BP-RP | Colour from the Gaia G_BP and G_RP passbands. | - |
Galaxy properties | | |
κ_co | Co-rotational parameter. | - |
M* | Total stellar mass within a 30 kpc radius. | 10¹⁰ M☉ |
r_1/2 | Half-radius. | kpc |
vθ_max | Maximum rotational velocity within a 30 kpc radius. | km s⁻¹ |
〈[Fe/H]〉 | Average iron-to-hydrogen abundance. | - |
〈[α/H]〉 | Average α-to-hydrogen abundance. | - |
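The disc-frame quantities R, z and vθ can be derived from Cartesian simulation output, as in the minimal sketch below. It assumes positions and velocities are already centred on the galaxy (in kpc and km s⁻¹) and defines the disc plane as perpendicular to the total stellar angular momentum; the function names are hypothetical, and this is not necessarily the procedure used to build the training data. σ is left out because its exact definition is not spelled out in the table.

```python
import numpy as np


def align_with_disc(pos, vel, mass):
    """Rotate coordinates so that the total stellar angular momentum points along +z.

    pos, vel: (N, 3) arrays centred on the galaxy (kpc, km/s); mass: (N,) array.
    """
    ang_mom = np.sum(mass[:, None] * np.cross(pos, vel), axis=0)
    z_hat = ang_mom / np.linalg.norm(ang_mom)
    # Any reference vector not parallel to z_hat can define the new x axis.
    ref = np.array([1.0, 0.0, 0.0]) if abs(z_hat[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x_hat = np.cross(ref, z_hat)
    x_hat /= np.linalg.norm(x_hat)
    y_hat = np.cross(z_hat, x_hat)  # completes a right-handed basis
    rotation = np.vstack([x_hat, y_hat, z_hat])  # rows are the new basis vectors
    return pos @ rotation.T, vel @ rotation.T


def disc_coordinates(pos, vel):
    """R, z and vθ for coordinates in which the disc lies in the x-y plane."""
    x, y, z = pos.T
    vx, vy, _ = vel.T
    R = np.hypot(x, y)
    # vθ = L_z / R is positive for stars rotating in the same sense as the disc;
    # stars exactly on the axis (R = 0) are given vθ = 0.
    v_theta = np.divide(x * vy - y * vx, R, out=np.zeros_like(R), where=R > 0)
    return R, z, v_theta


# Toy face-on disc: 8 stars on a ring of radius 8 kpc rotating at 200 km/s.
phi = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
pos = np.column_stack([8.0 * np.cos(phi), 8.0 * np.sin(phi), np.zeros_like(phi)])
vel = np.column_stack([-200.0 * np.sin(phi), 200.0 * np.cos(phi), np.zeros_like(phi)])
mass = np.ones_like(phi)
R, z, v_theta = disc_coordinates(*align_with_disc(pos, vel, mass))
```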
The MLP models are subclasses of `tensorflow.keras.Model`. They have been trained on stellar properties only. Method:
`.predict(x, threshold=None)`
Args | Description |
---|---|
x | NumPy array of N stars with shape (N,8). Order: R, z, vθ, σ, [Fe/H], [α/Fe], M_G, BP-RP |
threshold | None, or a float between 0 and 1: the classification threshold used to separate accreted and in-situ stars. |
Returns | NumPy array of class predictions: 0 -> in-situ, 1 -> accreted. If |
The MLP models with galaxy features are subclasses of `tensorflow.keras.Model`. They have been trained on both stellar properties and global properties of the host galaxies. Method:
`.predict(x, threshold=None)`
Args | Description |
---|---|
x | NumPy array of N stars with shape (N,13). Order: R, z, vθ, σ, [Fe/H], [α/Fe], M_G, BP-RP, M*, r_1/2, vθ_max, 〈[Fe/H]〉, 〈[α/H]〉 |
threshold | None, or a float between 0 and 1: the classification threshold used to separate accreted and in-situ stars. |
Returns | NumPy array of class predictions: 0 -> in-situ, 1 -> accreted. If |
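Because the galaxy properties take a single value per galaxy, they have to be repeated for every star before being appended to the stellar features. A minimal sketch, assuming the 13-column order listed for x above (which does not include κ_co) and a total stellar mass supplied in M☉; `add_galaxy_features` is a hypothetical helper and the numbers are placeholders.

```python
import numpy as np


def add_galaxy_features(x_stars, m_star_msun, r_half_kpc, v_theta_max_kms,
                        mean_fe_h, mean_alpha_h):
    """Append host-galaxy properties to each star, returning an (N, 13) array.

    Galaxy columns follow the order M*, r_1/2, vθ_max, <[Fe/H]>, <[α/H]>.
    """
    x_stars = np.asarray(x_stars, dtype=np.float32)
    if x_stars.ndim != 2 or x_stars.shape[1] != 8:
        raise ValueError("expected stellar features with shape (N, 8)")
    galaxy = np.array(
        [
            m_star_msun / 1e10,  # M* expressed in units of 10^10 M_sun
            r_half_kpc,
            v_theta_max_kms,
            mean_fe_h,
            mean_alpha_h,
        ],
        dtype=np.float32,
    )
    # One row of galaxy values, repeated (as a read-only view) for all N stars.
    galaxy_block = np.broadcast_to(galaxy, (x_stars.shape[0], galaxy.size))
    return np.hstack([x_stars, galaxy_block])


# Placeholder values for illustration only.
x_stars = np.zeros((4, 8), dtype=np.float32)
x_full = add_galaxy_features(x_stars, m_star_msun=5e10, r_half_kpc=3.0,
                             v_theta_max_kms=220.0, mean_fe_h=-0.2, mean_alpha_h=0.1)
print(x_full.shape)  # (4, 13)
```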
The TML models are subclasses of `tensorflow.keras.Model`. They are built on an ensemble of MLPs, each of which is itself a `tensorflow.keras.Model`. During prediction, x is passed to each MLP in the ensemble, and the resulting set of predictions is used as the input to a final MLP. Method:
`.predict(x, threshold=None)`
Args | Description |
---|---|
x | NumPy array of N stars with shape (N,8). Order: R, z, vθ, σ, [Fe/H], [α/Fe], M_G, BP-RP |
threshold | None, or a float between 0 and 1: the classification threshold used to separate accreted and in-situ stars. |
Returns | NumPy array of class predictions: 0 -> in-situ, 1 -> accreted. If |
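The prediction step described for the TML models is a form of stacking. The NumPy sketch below illustrates only the data flow: the `TinyMLP` class, its random untrained weights and the layer sizes are hypothetical stand-ins for the trained Keras models, and each ensemble member is assumed to output a single probability per star.

```python
import numpy as np

rng = np.random.default_rng(42)


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


class TinyMLP:
    """Hypothetical one-hidden-layer MLP with random weights, for illustration only."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(scale=0.5, size=(n_hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, x):
        hidden = np.maximum(0.0, x @ self.w1 + self.b1)  # ReLU layer
        return sigmoid(hidden @ self.w2 + self.b2)  # shape (N, 1), values in (0, 1)


def stacked_predict(x, base_models, meta_model):
    # 1) Pass x through every MLP in the ensemble.
    base_outputs = [model(x) for model in base_models]  # K arrays of shape (N, 1)
    # 2) Use the ensemble of predictions as the input features of a final MLP.
    meta_input = np.hstack(base_outputs)  # shape (N, K)
    return meta_model(meta_input)[:, 0]  # shape (N,)


n_features, n_members = 8, 5
ensemble = [TinyMLP(n_features, 16, rng) for _ in range(n_members)]
meta = TinyMLP(n_members, 8, rng)

x = rng.normal(size=(10, n_features))
p_accreted = stacked_predict(x, ensemble, meta)
print(p_accreted.shape)  # (10,)
```

Stacking lets the final MLP learn how much to trust each ensemble member, instead of simply averaging their outputs.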
The XGBoost models are trained instances of `xgboost.XGBClassifier`. Method:
`.predict(x, threshold=None)`
Args | Description |
---|---|
x | NumPy array of N stars with shape (N,8). Order: R, z, vθ, σ, [Fe/H], [α/Fe], M_G, BP-RP |
threshold | None, or a float between 0 and 1: the classification threshold used to separate accreted and in-situ stars. |
Returns | NumPy array of class predictions: 0 -> in-situ, 1 -> accreted. If |
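Since `XGBClassifier.predict_proba` returns one column per class, a threshold can be applied to the probability of the accreted class (column 1). The helper below is a hypothetical sketch of such a wrapper, not the implementation in this repository; returning probabilities when `threshold` is None and using `>=` for the comparison are assumptions. A stub classifier stands in for a trained model so the example runs without xgboost.

```python
import numpy as np


def predict_with_threshold(classifier, x, threshold=None):
    """Hypothetical helper: probabilities if threshold is None, else 0/1 classes.

    Works with any classifier exposing a scikit-learn style predict_proba,
    such as xgboost.XGBClassifier, whose column 1 is P(accreted).
    """
    p_accreted = classifier.predict_proba(x)[:, 1]
    if threshold is None:
        return p_accreted
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return (p_accreted >= threshold).astype(int)


class StubClassifier:
    """Stand-in for a trained XGBClassifier, returning fixed probabilities."""

    def predict_proba(self, x):
        p = np.array([0.1, 0.45, 0.55, 0.9])[: len(x)]
        return np.column_stack([1.0 - p, p])


clf = StubClassifier()
x = np.zeros((4, 8))
print(predict_with_threshold(clf, x, threshold=0.5))  # [0 0 1 1]
print(predict_with_threshold(clf, x, threshold=0.3))  # [0 1 1 1]
```

Lowering the threshold from 0.5 to 0.3 moves the star with probability 0.45 into the accreted sample, favouring completeness over purity.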