tallamjr / astronet

Efficient Deep Learning for Real-time Classification of Astronomical Transients and Multivariate Time-series

License: Apache License 2.0

Jupyter Notebook 99.25% Python 0.32% Shell 0.04% PureBasic 0.39% TeX 0.01%
astroinformatics time-series transformers deep-learning deep-compression depthwise-separable-convolutions real-time tflite efficient-deep-learning time-series-classification

astronet's Introduction

@tallamjr

About Me

My name is Tarek

I am a final year PhD student at the Centre for Data Intensive Science at University College London (UCL). My research focus is on the development of efficient Deep Learning algorithms for real-time classification of Astronomical transient events.

Besides my doctoral research, I enjoy exploring how the latest techniques in statistical signal processing and probabilistic machine learning can be used for Learned Image Reconstruction and Learned Image Compression 🗜️ in
Embedded Systems for Embedded Machine Learning (#TinyML) 📱

astronet's People

Contributors

tallamjr

astronet's Issues

Update results and analysis with hyperparameters tuned on validation set instead of test

It was discovered that the previous analysis was optimised against x_test and y_test; this should have been the validation set.

This was updated with:

commit ca18a44d83ac13b9085619570dd7cf0ae04c64e2 (origin/issues/16/nbval, issues/16/nbval)
Author: Tarek Allam <[email protected]>
Date:   Wed Oct 14 18:37:24 2020 +0100

    Change objective function to maximise acc on val

    Previous was maximising objective function on test data which was
    causing overfitting

            modified:   astronet/t2/opt/hypertrain.py
diff --git a/astronet/t2/opt/hypertrain.py b/astronet/t2/opt/hypertrain.py
index ff11057..4163e3a 100644
--- a/astronet/t2/opt/hypertrain.py
+++ b/astronet/t2/opt/hypertrain.py
@@ -117,8 +117,7 @@ class Objective(object):
         model.summary(print_fn=logging.info)

         # Evaluate the model accuracy on the validation set.
-        # score = model.evaluate(X_val, y_val, verbose=0)
-        score = model.evaluate(X_test, y_test, verbose=0)
+        score = model.evaluate(X_val, y_val, verbose=0)
         return score[1]

But the results in **/results.json should have also been updated.

For this issue, the previous results should be wiped, and new analysis run with the changes brought in from ca18a44d8

Refactor loading of WISDM dataset

The loading and preprocessing of the WISDM dataset is repeated across the codebase.

This should only take place in a single place and then be loaded as needed.

Clean codebase

This would be a major refactor and would be linked to #24

Should be completed before release

[BENCHMARK] Compare results to `dl-4-tsc` on the MTS dataset

dl-4-tsc by Fawaz et al. compares deep learning classifiers on 128 univariate datasets and 12 multivariate datasets.

It would be good in the first instance to be able to compare results produced here to the multivariate case, to gauge performance on simplified multivariate data.

Shown below is a snapshot of the results in the repository as of hfawaz/dl-4-tsc@3ee62e1:

The following table contains the averaged accuracy over 10 runs of each implemented model on the MTS archive, with the
standard deviation between parentheses.

Datasets MLP FCN ResNet Encoder MCNN t-LeNet MCDCNN Time-CNN TWIESN
AUSLAN 93.3(0.5) 97.5(0.4) 97.4(0.3) 93.8(0.5) 1.1(0.0) 1.1(0.0) 85.4(2.7) 72.6(3.5) 72.4(1.6)
ArabicDigits 96.9(0.2) 99.4(0.1) 99.6(0.1) 98.1(0.1) 10.0(0.0) 10.0(0.0) 95.9(0.2) 95.8(0.3) 85.3(1.4)
CMUsubject16 60.0(16.9) 100.0(0.0) 99.7(1.1) 98.3(2.4) 53.1(4.4) 51.0(5.3) 51.4(5.0) 97.6(1.7) 89.3(6.8)
CharacterTrajectories 96.9(0.2) 99.0(0.1) 99.0(0.2) 97.1(0.2) 5.4(0.8) 6.7(0.0) 93.8(1.7) 96.0(0.8) 92.0(1.3)
ECG 74.8(16.2) 87.2(1.2) 86.7(1.3) 87.2(0.8) 67.0(0.0) 67.0(0.0) 50.0(17.9) 84.1(1.7) 73.7(2.3)
JapaneseVowels 97.6(0.2) 99.3(0.2) 99.2(0.3) 97.6(0.6) 9.2(2.5) 23.8(0.0) 94.4(1.4) 95.6(1.0) 96.5(0.7)
KickvsPunch 61.0(12.9) 54.0(13.5) 51.0(8.8) 61.0(9.9) 54.0(9.7) 50.0(10.5) 56.0(8.4) 62.0(6.3) 67.0(14.2)
Libras 78.0(1.0) 96.4(0.7) 95.4(1.1) 78.3(0.9) 6.7(0.0) 6.7(0.0) 65.1(3.9) 63.7(3.3) 79.4(1.3)
NetFlow 55.0(26.1) 89.1(0.4) 62.7(23.4) 77.7(0.5) 77.9(0.0) 72.3(17.6) 63.0(18.2) 89.0(0.9) 94.5(0.4)
UWave 90.1(0.3) 93.4(0.3) 92.6(0.4) 90.8(0.4) 12.5(0.0) 12.5(0.0) 84.5(1.6) 85.9(0.7) 75.4(6.3)
Wafer 89.4(0.0) 98.2(0.5) 98.9(0.4) 98.6(0.2) 89.4(0.0) 89.4(0.0) 65.8(38.1) 94.8(2.1) 94.9(0.6)
WalkvsRun 70.0(15.8) 100.0(0.0) 100.0(0.0) 100.0(0.0) 75.0(0.0) 60.0(24.2) 45.0(25.8) 100.0(0.0) 94.4(9.1)
Average_Rank 5.208333 2.000000 2.875000 3.041667 7.583333 8.000000 6.833333 4.625000 4.833333
Wins 0 5 3 0 0 0 0 0 2

The MTS data has been downloaded from: http://www.mustafabaydogan.com/files/viewcategory/20-data-sets.html and then processed using dl-4-tsc/utils/utils.py with this change:

diff --git a/utils/utils.py b/utils/utils.py
index 0ef692b..c0ae7ab 100755
--- a/utils/utils.py
+++ b/utils/utils.py
@@ -219,8 +219,8 @@ def transform_to_same_length(x, n_var, max_length):
 
 
 def transform_mts_to_ucr_format():
-    mts_root_dir = '/mnt/Other/mtsdata/'
-    mts_out_dir = '/mnt/nfs/casimir/archives/mts_archive/'
+    mts_root_dir = '/Users/tallamjr/github/tallamjr/origin/astronet/data/mtsdata/'
+    mts_out_dir = '/Users/tallamjr/github/tallamjr/origin/astronet/data/transformed-mtsdata/'
     for dataset_name in MTS_DATASET_NAMES:
         # print('dataset_name',dataset_name)

Then by running: $ python main.py transform_mts_to_ucr_format
N.B. Empty folders with the names of the datasets found in mtsdata need to be created first in transformed-mtsdata before running the above command.

With this inclusion, it may be desirable to refactor how one loads data into astronet.t2.train.py and astronet.t2.opt.hypertrain.py, as there will be many more datasets to list in an if/else block; this may be better served in astronet.t2.utils.py.

Add test coverage

Currently there are no unit tests. This must be fixed.

It would also be desirable to include pytest coverage reporting to gauge how much of the codebase is covered.

Complete `evaluate.py` file

This file should be able to do the following:

  • Load a saved model and evaluate on test data.
  • Save performance plots and confusion matrices as images with corresponding model ID
  • ...
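A minimal sketch of the first two items (the paths, helper names and the assumption that y_test is one-hot encoded are all illustrative, not taken from the codebase):

import json
from pathlib import Path

from sklearn.metrics import ConfusionMatrixDisplay
from tensorflow import keras

def evaluate_model(model_id, X_test, y_test, results_dir="results"):
    # Load a previously saved model by its ID (directory layout assumed).
    model = keras.models.load_model(f"models/{model_id}")

    # Evaluate on the held-out test data (assumes the model was compiled with an accuracy metric).
    loss, acc = model.evaluate(X_test, y_test, verbose=0)

    # Save a confusion matrix image tagged with the corresponding model ID.
    y_pred = model.predict(X_test).argmax(axis=1)
    disp = ConfusionMatrixDisplay.from_predictions(y_test.argmax(axis=1), y_pred)
    Path(results_dir).mkdir(exist_ok=True)
    disp.figure_.savefig(f"{results_dir}/confusion-{model_id}.png")

    # Persist the scalar metrics alongside the plots.
    with open(f"{results_dir}/metrics-{model_id}.json", "w") as f:
        json.dump({"loss": float(loss), "accuracy": float(acc)}, f)

    return loss, acc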

Determine optimal mini batch size

If the size of the training set is not evenly divisible by the batch size, then the final batch will contain fewer items, which can affect performance.

There should exist a formula to choose a batch size that maximises the number of samples in the final batch.
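One simple way to approach this (an assumption, not an existing utility in the codebase) is to scan candidate batch sizes and pick the one whose final batch is proportionally fullest:

def best_batch_size(n_samples, candidates=(32, 64, 128, 256, 512)):
    """Pick the candidate batch size whose final batch is proportionally fullest.

    A remainder of 0 means every batch is completely full; otherwise prefer
    the candidate whose last batch carries the largest fraction of a full batch.
    """
    def final_batch_fraction(b):
        remainder = n_samples % b
        return 1.0 if remainder == 0 else remainder / b

    return max(candidates, key=final_batch_fraction)

# e.g. choose a batch size for a training set of 2991 samples
batch_size = best_batch_size(2991)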

Refactor how to load datasets

Linked to #52: with the inclusion of more datasets, it may be desirable to refactor how one loads data into astronet.t2.train.py and astronet.t2.opt.hypertrain.py, as there will be many more datasets to list in an if/else block; this may be better served in astronet.t2.utils.py.

Something like load_dataset(<name-of-dataset>) might be cleaner, considering they will all return the same form: X_train, y_train, X_test, y_test.
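A hypothetical sketch of such a dispatcher living in astronet.t2.utils (the loader names other than load_wisdm_2010 are assumptions):

def load_dataset(name):
    """Return (X_train, y_train, X_test, y_test) for a named dataset.

    Keeps the dataset selection logic in one place instead of repeating an
    if/else block in train.py and opt/hypertrain.py.
    """
    loaders = {
        "wisdm_2010": load_wisdm_2010,   # existing loader
        "wisdm_2019": load_wisdm_2019,   # assumed name
        "plasticc": load_plasticc,       # assumed name
    }
    try:
        return loaders[name]()
    except KeyError:
        raise ValueError(f"Unknown dataset: {name!r}. Options are: {list(loaders)}")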

Create dataset for full set of transients in PLAsTiCC dataset

The functionality to create the dataset for Supernovae-only events is already in place. To be able to compare metrics with avocado and others, it would be desirable to test the classifier on the full range of transient events that occur in the PLAsTiCC dataset (all 18 classes).

This would essentially just involve skipping the df = filter_dataframe_only_supernova("../data/plasticc/train_subset.txt", data) function call that reduces the data to only contain Supernovae samples.

Reference to other files within code is all relative to the __file__ being run

Currently, all references to other files or data from within the code are made using pathlib.Path and are relative to the __file__ being run.

It would be desirable to refer to data and other files from an absolute position relative to the main astronet repository. This would allow for more flexibility should a file change and would also make file paths more readable.

An example change that would be required can be seen in 3108fd shown below:

diff --git a/astronet/t2/constants.py b/astronet/t2/constants.py
index 4239421..2d587b7 100644
--- a/astronet/t2/constants.py
+++ b/astronet/t2/constants.py
@@ -1,3 +1,5 @@
+from pathlib import Path
+
 # Central passbands wavelengths
 pb_wavelengths = {
     "lsstu": 3685.0,
@@ -17,3 +19,5 @@ pb_colors = {
     "lsstz": "#ff7f00",  # Orange: https://www.color-hex.com/color/ff7f00
     "lssty": "#e3c530",  # Yellow: https://www.color-hex.com/color/e3c530
 }
+
+astronet_working_directory = f"{Path(__file__).absolute().parent.parent.parent}"
diff --git a/astronet/t2/tests/func/test_gp_interpolation.py b/astronet/t2/tests/func/test_gp_interpolation.py
index 8991e2e..7fa2003 100644
--- a/astronet/t2/tests/func/test_gp_interpolation.py
+++ b/astronet/t2/tests/func/test_gp_interpolation.py
@@ -2,9 +2,7 @@ import numpy as np
 import pandas as pd
 import pytest

-from pathlib import Path
-
-from astronet.t2.constants import pb_wavelengths
+from astronet.t2.constants import pb_wavelengths, astronet_working_directory as asnwd
 from astronet.t2.preprocess import predict_2d_gp, fit_2d_gp
 from astronet.t2.utils import __transient_trim, __filter_dataframe_only_supernova, __remap_filters

@@ -12,7 +10,7 @@ from astronet.t2.utils import __transient_trim, __filter_dataframe_only_supernov
 def test_plasticc_gp_interpolation():

     data = pd.read_csv(
-        f"{Path(__file__).absolute().parent.parent.parent}/data/plasticc/training_set.csv",
+        f"{asnwd}/data/plasticc/training_set.csv",
         sep=",",
     )
     data = __remap_filters(df=data)
@@ -26,7 +24,7 @@ def test_plasticc_gp_interpolation():

     assert data.shape == (1421705, 6)
     df = __filter_dataframe_only_supernova(
-        f"{Path(__file__).absolute().parent.parent.parent}/data/plasticc/train_subset.txt",
+        f"{asnwd}/data/plasticc/train_subset.txt",

Re-implement GPs

Since it is hoped that astronet:t2 will be deployed into FINK for real-time use in production, a faster, more efficient way of pre-processing the light curve might be needed.

  • Investigate use of numpyro/jax for faster GP interpolation
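As a rough feel for what a jax-based approach could look like, below is a minimal sketch of GP interpolation for a single passband with a fixed squared-exponential kernel. This is only illustrative: the current pipeline fits a 2-D GP across time and wavelength and optimises the kernel hyperparameters, neither of which is done here.

import jax.numpy as jnp
from jax import jit

def rbf_kernel(x1, x2, lengthscale=20.0, variance=1.0):
    # Squared-exponential kernel over observation times (days); hyperparameters fixed for illustration.
    d = x1[:, None] - x2[None, :]
    return variance * jnp.exp(-0.5 * (d / lengthscale) ** 2)

@jit
def gp_interpolate(t_obs, flux_obs, flux_err, t_grid):
    """Posterior mean and variance of a single-passband light curve on t_grid."""
    K = rbf_kernel(t_obs, t_obs) + jnp.diag(flux_err ** 2)
    K_s = rbf_kernel(t_grid, t_obs)

    # Cholesky solve rather than an explicit inverse, for numerical stability.
    L = jnp.linalg.cholesky(K)
    alpha = jnp.linalg.solve(L.T, jnp.linalg.solve(L, flux_obs))

    mean = K_s @ alpha
    v = jnp.linalg.solve(L, K_s.T)
    var = jnp.diag(rbf_kernel(t_grid, t_grid)) - jnp.sum(v ** 2, axis=0)
    return mean, var

jax.vmap could then map this over the six passbands, and numpyro would come in if the kernel hyperparameters are to be inferred rather than fixed.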

[MAJOR] Implement snX: Supernova Xception

snX, aka Supernova Xception, is an adaptation of the Xception: Deep Learning with Depthwise Separable Convolutions architecture by Francois Chollet.

Motivation

As of writing, the state of the art for time-series classification using deep learning can be found in the review paper by Fawaz et al.: Deep learning for time series classification: a review, with InceptionTime leading the way for at least univariate time-series [uvts] (it is still to be determined whether SOTA is also achieved in the multivariate setting).

With regards to InceptionTime, this architecture has been shown to be favourable for multivariate time series [mvts] classification:

... researchers started investigating these complex machine learning models for TSC (Wang et al.,
2017; Cui et al., 2016; Ismail Fawaz et al., 2019a). Precisely, Convolutional Neural Networks
(CNNs) have showed promising results for TSC

...
Given an input MTS, a convolutional layer consists of sliding one-dimensional filters over the time
series, thus enabling the network to extract non-linear discriminant features that are
time-invariant and useful for classification. By cascading multiple layers, the network is able to
further extract hierarchical features that should in theory improve the network’s prediction.

It is recommended to read in detail section 2.2, Deep learning for time series classification, from which the above quotes are taken, for further discussion of why a convolutional architecture is desirable for time-series and which architectures have previously been tried, such as Multi-scale Convolutional Neural Networks (MCNN) (Cui et al., 2016) and Time LeNet (Le Guennec et al., 2016), as well as "Fully Convolutional Neural Networks (FCNs) were shown to achieve
great performance without the need to add pooling layers to reduce the input data’s dimensionality (Wang et al., 2017)."

The above works laid the foundations for applying convolutional neural networks for UVTS and MVTS data classification.

Fawaz et al. naturally take this further with the application of the Inception module, inspired by Szegedy et al. (2015), with modifications specific to time series. Fawaz notes that this method has actually already been applied to Supernova classification:

Inception model was used for Supernovae classification using the light flux of a region in space as
an input MTS for the network (Brunel et al., 2019). However, the authors limited the conception
of their Inception architecture to the one proposed by Google for ImageNet (Szegedy et al., 2017).

This is believed to be Inception-V3, as there do not seem to be skip connections in Brunel et al.'s work, found here: https://github.com/Anzzy30/SupernovaeClassification . See https://astro-informatics.slack.com/archives/D1E1A4JJH/p1593101980010600 for more discussion.

In our work, we explore much larger filters than any previously proposed network for TSC in order
to reach state-of-the-art performance on the UCR benchmark.

Xception Network, Francois Chollet

As can be seen from the progression above, a trend has emerged between developments in architectures for Deep Computer Vision classification and their successful application, and adaptation, to time-series classification. It seems plausible that the successor to Inception-v4 should yield improved results for time-series classification also.

The intuition behind this relates to the fact that images are simply 2D signals with 3 channels (RGB), and a time-series is simply a 1D signal with M channels (or features/dimensions). As the translation between 1D and ND signals is straightforward in normal signal processing, it is natural to see how improvements in 2D signal processing of images can be translated to 1D signals also.

Francois Chollet describes The Inception hypothesis in section 1.1 as

A convolution layer attempts to learn filters in a 3D space, with 2 spatial dimensions (width and
height) and a channel dimension; thus a single convolution kernel is tasked with simultaneously
mapping cross-channel correlations and spatial correlations. This idea behind the Inception module
is to make this process easier and more efficient by explicitly factoring it into a series of
operations that would independently look at cross-channel correlations and at spatial correlations.
More precisely, the typical Inception module first looks at crosschannel correlations via a set of
1x1 convolutions, mapping the input data into 3 or 4 separate spaces that are smaller than the
original input space, and then maps all correlations in these smaller 3D spaces, via regular 3x3 or
5x5 convolutions

In effect, the fundamental hypothesis behind Inception is that cross-channel correlations
and spatial correlations are sufficiently decoupled that it is
preferable not to map them jointly

...
In effect, we make the following hypothesis: that the mapping of cross-channels correlations and spatial correlations
in the feature maps of convolutional neural networks can be
entirely decoupled.

It is felt that this form is advantageous for our photometric time series data since one can obtain feature maps of each signal in each passband independently, with cross-channel and temporal correlations decoupled.

Xception is attractive since it maps the spatial (or in our case temporal) correlations for each output channel separately, and then performs a 1x1 depthwise convolution to capture cross-channel correlation -- "An Intuitive Guide to Deep Network Architectures"

Next Steps

Previously I have implemented the 2D Xception network here: https://github.com/tallamjr/dnn . It should be straightforward to copy this implementation and adapt it for the 1D setting.
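As a flavour of what the 1D adaptation could look like, here is a minimal sketch of an Xception-style block using Keras' depthwise-separable 1D convolutions; the filter counts and pooling choices are placeholders rather than the eventual snX design:

import tensorflow as tf
from tensorflow.keras import layers

def separable_conv_block_1d(x, filters, kernel_size=3):
    """Xception-style block adapted to 1D, operating on (batch, timesteps, channels)."""
    # Residual branch: 1x1 convolution with stride 2 to match the main branch's output shape.
    residual = layers.Conv1D(filters, 1, strides=2, padding="same")(x)

    # Main branch: two depthwise-separable convolutions followed by max pooling.
    x = layers.SeparableConv1D(filters, kernel_size, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.SeparableConv1D(filters, kernel_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling1D(pool_size=3, strides=2, padding="same")(x)

    return layers.Add()([x, residual])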

The implementation should live in a separate folder snX which would have the same structure as t2 but with several files "pulled out" into a layer above for DRY principles; such as:

  • constants.py
  • evaluate.py
  • metrics.py
  • preprocess.py
  • utils.py
  • visualise_data.py
  • visualise_results.py

Also, it would be desirable at this stage to clean out unused files such as: opt/optimise.py and opt/somefile.py

Balance classes with GP augmentation

It would be desirable to compare results from balanced and (currently unbalanced) data. Code by Alves et al. would be useful for this, as there is work that utilises redshift information for this augmentation.

Broker integration

For the final phase of astronet it is hoped to put the best performing model/architecture in production in a real-time setting.

This will be as a science module within the FINK broker. There are tutorials on how to incorporate custom code into FINK here.

On the face of things, this should be "straightforward", with the majority of code re-writing coming in the form of migrating processing from pandas to Spark, which with pyspark's pandas API should be trivial.

From the avro schema, it seems like the columns of interest might be (see here for human readable description):

# utility from fink-science
from fink_science.utilities import concat_col

# user-defined function from the current folder
from processor import deltamaglatest
# Fink receives data as Avro. However, the internal processing makes use of Parquet files. We provide here alert data as Parquet: it contains original alert data from ZTF and some added values from Fink:
# Load the data into a Spark DataFrame
df = spark.read.format('parquet').load('sample.parquet')
df.printSchema()
root
 |-- candid: long (nullable = true)
 |-- schemavsn: string (nullable = true)
 |-- publisher: string (nullable = true)
 |-- objectId: string (nullable = true)
 |-- candidate: struct (nullable = true)
 |    |-- jd: double (nullable = true)
 |    |-- sgmag1: float (nullable = true)
 |    |-- srmag1: float (nullable = true)
 |    |-- simag1: float (nullable = true)
 |    |-- szmag1: float (nullable = true)

See the avro spec for details (helpful notebook about the alerts here)

Refactor `def train_val_test_split(df, cols):` to be more configurable

At the moment, the train, validation and test sets are determined by hard-coded amounts inside astronet.t2.utils (around line 70):

...
def train_val_test_split(df, cols):

    features = df[cols]
    # column_indices = {name: i for i, name in enumerate(features.columns)}

    n = len(df)
    df_train = df[0 : int(n * 0.8)].copy()
    df_val = df[int(n * 0.8) : int(n * 0.95)].copy()
    df_test = df[int(n * 0.95) :].copy()

    num_features = features.shape[1]

    return df_train, df_val, df_test, num_features

It would be desirable to have this determined by parameters in the function call.
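A sketch of the parameterised version, with defaults chosen to reproduce the current hard-coded 80/15/5 split:

def train_val_test_split(df, cols, val_frac=0.15, test_frac=0.05):
    """Split df chronologically into train/val/test with configurable fractions."""
    features = df[cols]
    n = len(df)

    train_frac = 1.0 - val_frac - test_frac
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))

    df_train = df[:train_end].copy()
    df_val = df[train_end:val_end].copy()
    df_test = df[val_end:].copy()

    num_features = features.shape[1]

    return df_train, df_val, df_test, num_features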

This will link to #44 here where one would like to compare against benchmark results on the data. To have a fair comparison, the same dataset and data split should be used.

Another major motivation of this is to allow for optimisation via hyperparameter tuning to be done using a hold-out validation set and keras.callbacks; then when it comes to training, to train on the full "training" set, i.e. train + validation (still leaving the test set out completely until evaluation afterwards with astronet.t2.evaluate.py).

Linked issues:

Refs:

Add functionality to compare results against Fawaz et al.

In "Deep learning for time series classification: a review" they compare against several multivariate time series benchmark datasets. It would be desirable to be able to load in these same datasets and compare results for the different deep learning approaches in astronet

The results table can be found at https://github.com/hfawaz/dl-4-tsc/

Ideally, if one can load in the data in a similar manner to wisdm or plasticc, whereby a resulting X_train, y_train, X_test, y_test dataset is loaded, then it should be straightforward to input into T2Model (and others developed later).
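A hedged sketch of what such a loader could look like, assuming the dl-4-tsc transform step writes x_train.npy, y_train.npy, x_test.npy and y_test.npy into each dataset folder (the actual filenames it produces should be checked):

import numpy as np

def load_mts(dataset_name, root="data/transformed-mtsdata"):
    """Load one of the transformed MTS archive datasets in the same
    (X_train, y_train, X_test, y_test) form as the wisdm and plasticc loaders."""
    d = f"{root}/{dataset_name}"
    X_train = np.load(f"{d}/x_train.npy")
    y_train = np.load(f"{d}/y_train.npy")
    X_test = np.load(f"{d}/x_test.npy")
    y_test = np.load(f"{d}/y_test.npy")

    # Labels may still need re-encoding (e.g. one-hot) before being fed to T2Model.
    return X_train, y_train, X_test, y_test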

In addition to the above MTS, it would be interesting to be able to compare the FORD dataset described here: https://keras.io/examples/timeseries/timeseries_classification_from_scratch/#build-a-model

Finally, this should link in with #39 which could allow for a table of results to be updated when comparing results

Visualise Self-Attention weights response

Conduct ablation studies

As explained here, ablation studies are an important aspect of empirical deep learning.

Therefore, it would be desirable to explore the effect of the following modifications to the model:

  • With/without redshift : Done in #73
  • With/without positional encoding
  • With/without GAP
  • With/without FC layer before softmax : removed anyway

[FEATURE] Add code for running analysis on PLAsTiCC dataset

This would involve addition of a load_spcc() function, similar to load_wisdm_2010() at the moment in utils.py

Additional conditional checks will need to be updated in various files to see which function to use when loading a dataset

A large component of this would be developing the correct processing steps required for getting the data into the right format; it should return the same outputs as the load_wisdm_2010() function: X_train, y_train, etc.

Update

It was determined that initially working with PLAsTiCC data was easier for experimentation and prototyping, so this issue has moved to working with that dataset instead for now.

Run without redshift

To be able to compare usefulness of redshift, we should do a run without redshift features

Run `atx` on Myriad

Bring atx to be in-line with the code for t2 such that we can run on Myriad and get results

Add Codecov report via Github actions

The code coverage report does not show within the badge on the README.md

It seems that all that is required is to publish the report using a GitHub Action: https://github.com/codecov/codecov-action#usage

If required, one can obtain a token from: https://app.codecov.io/gh/tallamjr/astronet/settings , although from the settings page:

The token below is used exclusively for uploading coverage reports.
Note: Token not required for repositories uploading from Travis, CircleCI, AppVeyor, Azure Pipelines or GitHub Actions.

Add callbacks to monitor training progress and prevent overfitting

Currently, hyperparameter optimisation and training take place for a given number of epochs; it would be desirable to implement an early stopping criterion for when the model is seen to be overfitting, or progress has plateaued.

tensorflow.keras.callbacks seem to be a good fit for this.
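For example, a minimal combination of early stopping, learning-rate reduction and checkpointing (monitored quantities and patience values are placeholders, and model, X_train etc. are assumed to be defined as in train.py):

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

callbacks = [
    # Stop if validation loss has not improved for 10 epochs and restore the best weights.
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    # Reduce the learning rate when progress has plateaued.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5, min_lr=1e-6),
    # Keep the best model seen so far on disk.
    ModelCheckpoint("model-best.h5", monitor="val_loss", save_best_only=True),
]

model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=epochs,
    batch_size=batch_size,
    callbacks=callbacks,
)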

See Coursera videos for details on how to implement

Complete `visuals.py`

Complete file to make plots and save figures relating to ROCs and confusion matrices

Set up CI pipeline

There should be checks for consistent behaviour and a continuous integration of new/merged code.

The preferred workflow would be to use GitHub Actions.

Custom loss function error: TypeError: __init__() takes from 1 to 2 positional arguments but 3 were given

This is well summarised here tensorflow/tensorflow#45079 where someone is trying to do essentially the same as is done in astronet, i.e. define a custom loss function and optimise for it.

When running a test function such as:

$ python astronet/t2/train.py --dataset "plasticc" --epoch 1 --batch-size 256

the following error occurs from commit 403fcdd onwards (discovered with git bisect and the test defined above):

(astronet) 15:19:52 ✔ ~/github/tallamjr/origin/astronet (issues/53/load-datasets) :: python astronet/t2/train.py --dataset "plasticc" --epoch 1 --batch-size 256
[20-12-18 15:20:15] {train.py:28} INFO - =======================================================================================================================================
[20-12-18 15:20:15] {train.py:29} INFO - File Path: /Users/tallamjr/github/tallamjr/origin/astronet/astronet/t2/train.py
[20-12-18 15:20:15] {train.py:30} INFO - Parent of Directory Path: /Users/tallamjr/github/tallamjr/origin
(2991, 100, 6) (2991, 3)
[20-12-18 15:20:17] {train.py:54} INFO - None
(256, 100, 6)
Traceback (most recent call last):
  File "astronet/t2/train.py", line 191, in <module>
    training()
  File "astronet/t2/train.py", line 126, in __call__
    mode="min",
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1103, in fit
    tmp_logs = self.train_function(iterator)
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 808, in train_function
    return step_function(self, iterator)
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 798, in step_function
    outputs = model.distribute_strategy.run(run_step, args=(data,))
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1259, in run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2731, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 3420, in _call_for_each_replica
    return fn(*args, **kwargs)
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 572, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 791, in run_step
    outputs = model.train_step(data)
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 759, in train_step
    y, y_pred, sample_weight, regularization_losses=self.losses)
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/keras/engine/compile_utils.py", line 204, in __call__
    loss_value = loss_obj(y_t, y_p, sample_weight=sw)
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/keras/losses.py", line 152, in __call__
    losses = call_fn(y_true, y_pred)
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/keras/losses.py", line 256, in call
    return ag_fn(y_true, y_pred, **self._fn_kwargs)
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 667, in wrapper
    return converted_call(f, args, kwargs, options=options)
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 393, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/usr/local/anaconda3/envs/astronet/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 478, in _call_unconverted
    return f(*args, **kwargs)
TypeError: __init__() takes from 1 to 2 positional arguments but 3 were given

As this is so closely related to the tensorflow issue linked above, and many avenues have been exhausted, it is felt best to hold off until there are further updates in the thread or a solution is posted.

Expand hyperparameter space of `t2` model

While there is optimisation over a considerable number of parameters, there could still be room to add further parameters to optimise over. These include: num_layers (currently set to 1), the number of neurons in the final FC layer (currently set to 20), and the drop rate in Dropout() (currently set to 0.1).
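A sketch of how the extra parameters could be exposed to Optuna inside the existing Objective (the ranges are placeholders, and the new T2Model constructor arguments are assumptions that would need to be added to the model):

# Inside Objective.__call__(self, trial), alongside the existing suggestions:
num_layers = trial.suggest_int("num_layers", 1, 4)                      # stacked encoder blocks
fc_neurons = trial.suggest_categorical("fc_neurons", [16, 20, 32, 64])  # final FC layer width
droprate = trial.suggest_categorical("droprate", [0.1, 0.2, 0.3, 0.4])  # Dropout() rate

model = T2Model(
    input_dim=input_shape,
    embed_dim=embed_dim,
    num_heads=num_heads,
    ff_dim=ff_dim,
    num_classes=num_classes,
    # hypothetical new constructor arguments:
    num_layers=num_layers,
    fc_neurons=fc_neurons,
    droprate=droprate,
)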

Update docstrings

Many functions are lacking full documentation with input/output formats and examples.

To search the outstanding files, run $ ag "TODO" at the top level of astronet

Clean CAM code

Currently the code producing the CAM plots is all in notebooks; this should be moved to Python files and correctly refactored.

[feature/20/plasticc] Investigate options for Gaussian Process regression for PLAsTiCC data

Linked to feature #20

SPCC/PLAsTiCC data is irregularly sampled, but a regular sampling of points is required for the t2 algorithm to work best. The goal would be to use Gaussian process regression to interpolate between the points in 6 dimensions.

GPFlow seems to be a good candidate for this

Update

It was determined that initially working with PLAsTiCC data was easier for experimentation and prototyping, so this issue has moved to working with that dataset instead for now.

[MAJOR] Revisit model architecture

It is time to revisit the architecture to ensure there is a clear understanding of what transformations are taking place with respect to multi-variate time series data and what analogies one can draw in relation to the Transformer architecture that is used for sequence modelling of speech.

It was previously thought that by windowing the input sequence, it would be these windows that would be attending to one another, but this was mistaken. It is each time step, or each item at each time step that attends to every other item in the sequence, including itself.


It was also previously thought that by convolving the inputs of input shape (BATCH_SIZE, timesteps, num_features), this would "preserve temporal information". The (mistaken) reasoning behind this came from reading the "Hands-On ML" book:

Keras offers a TimeDistributed layer ... it wraps any layer (e.g., a Dense layer) and applies it at every time step of its input sequence. It does this efficiently, by reshaping the inputs so that each time step is treated as a separate instance (i.e., it reshapes the inputs from [batch size, time steps, input dimensions] to [batch size × time steps, input dimensions];

The Dense layer actually supports sequences as inputs (and even higher-dimensional inputs): it handles them just like TimeDistributed(Dense(...)), meaning it is applied to the last input dimension only (independently across all time steps). Thus, we could replace the last layer with just Dense(10). For the sake of clarity, however, we will keep using TimeDistributed(Dense(10)) because it makes it clear that the Dense layer is applied independently at each time step and that the model will output a sequence, not just a single vector.

  • Note that a TimeDistributed(Dense(n)) layer is equivalent to a Conv1D(n, filter_size=1) layer.

What became apparent is that just because the convolution is applied at each time-step individually, this has no bearing on temporal information being preserved; in fact, in the "Attention Is All You Need" paper, the authors state:

3.5 Positional Encoding
Since our model contains no recurrence and no convolution, in order for the model to make use of the
order of the sequence, we must inject some information about the relative or absolute position of the tokens in the sequence. To this end, we add "positional encodings" to the input embeddings at the
bottoms of the encoder and decoder stacks. The positional encodings have the same dimension dmodel
as the embeddings, so that the two can be summed.

Therefore, it is believed that Positional Encoding is required to "bring back" the temporal information that is originally present.

Whilst the Transformer architecture is used for input sequences of words (sentences), the analogy can be drawn to astrophysical transients in the sense of a light curve being a sentence, and the 6-D observations at each time step being equivalent to words. Considering the EncodingLayer only for now, the encoder takes as input a batch of sentences/light-curves represented as sequences of word IDs/6-D observations (the input shape is [batch size, max input sentence length]), and it encodes each word/6-D observation into a 512-dimensional/d-model representation (so the encoder’s output shape is [batch size, max input sentence length, d-model]).


So, for our model, it will take in a full light curve, consisting of N-timesteps for each object. It will then apply a convolutional embedding to each timestep to transform the data from [batch size, N-timesteps, 6-D] --> [batch size, N-timesteps, d-model]. From here, a positional encoding will be calculated using trigonometric functions to determine the position of each observation in the sequence. These are then summed together to produce an input of shape [batch size, max input sentence length (==N-timesteps), d-model]. At this point, the EncodingLayer will process this input through the multi-head self-attention layers as well as other layers.


Going forward, the items that are required are to first implement a PositionalEncoding class. Following this, a refactor of the architecture as a whole will need to be looked at. Furthermore, the plasticc data preprocessing that created the windowing should be looked into; this should be revised to 100 (N x GPs) where an input is a whole sequence, i.e. a single light-curve for a single object.

TODO

  • PositionalEncoding class
  • Refactor model.py drawing examples from Hands-On book and tensorflow documentation: https://www.tensorflow.org/tutorials/text/transformer
  • Ensure parquet file is correct with "appropriate" windowing taking place. Perhaps reduce number of GPs (investigate)

Refs:

AttributeError: module 'tensorflow._api.v2.experimental' has no attribute 'numpy'

It seems that tensorflow v2.3.1 no longer works for the experimental numpy api. This may have actually always been the case and v2.4.0 was being used all along locally. Regardless, when working in v2.3.1 environment, the following error is thrown when attempting to use the api:

    def test_numpy_average():

        numpy_API = np.average(range(1, 11), weights=range(10, 0, -1))

>       tensorflow_API = tf.experimental.numpy.average(range(1, 11), weights=range(10, 0, -1))
E       AttributeError: module 'tensorflow._api.v2.experimental' has no attribute 'numpy'

numpy_API  = 4.0

astronet/t2/tests/int/test_numpy_api.py:9: AttributeError
================================================= slowest durations ==================================================

(3 durations < 0.005s hidden.  Use -vv to show these durations.)
============================================== short test summary info ===============================================
FAILED astronet/t2/tests/int/test_numpy_api.py::test_numpy_average - AttributeError: module 'tensorflow._api.v2.exp...
================================================= 1 failed in 4.65s ==================================================

This test has been tried on v2.4.0 and it passes:

Successfully installed tensorflow-2.4.0 tensorflow-estimator-2.4.0
(astronet) 13:47:29~/github/tallamjr/origin/astronet (issues/53/load-datasets) :: pytest astronet/t2/tests/int/test_numpy_api.py
================================================ test session starts =================================================
platform darwin -- Python 3.7.8, pytest-6.1.2, py-1.9.0, pluggy-0.13.1 -- /usr/local/anaconda3/envs/astronet/bin/python
cachedir: .pytest_cache
rootdir: /Users/tallamjr/github/tallamjr/origin/astronet, configfile: pytest.ini
plugins: nbval-0.9.6, cov-2.10.1, subtests-0.3.2
collected 1 item

astronet/t2/tests/int/test_numpy_api.py::test_numpy_average PASSED

================================================= slowest durations ==================================================

(3 durations < 0.005s hidden.  Use -vv to show these durations.)
================================================= 1 passed in 14.92s =================================================

so a solution is to upgrade requirements.txt to v2.4.0; however, when one does this on Hypatia, another kind of error is thrown, described here tensorflow/tensorflow#45744 .

As of this tensorflow/tensorflow#45744 (comment) there seems to be a fix in the pipeline, so on Hypatia the plan is to revert back to v2.3.1 until the Illegal instruction error is fixed, and then bump tensorflow to v2.4.0. The knock-on effect is that no PLAsTiCC analysis can be run until then.

Stack Encoding Layers to be in line with Vaswani paper

In Vaswani et al., the encoder blocks are stacked x N, where N = 6 in the paper.

This can be seen in Figure 1 of the paper.

In the Tensorflow documentation and guides on Transformers they implement this as follows:

class Encoder(tf.keras.layers.Layer):
  def __init__(self, num_layers, d_model, num_heads, dff, input_vocab_size,
               maximum_position_encoding, rate=0.1):
    super(Encoder, self).__init__()

    self.d_model = d_model
    self.num_layers = num_layers

    self.embedding = tf.keras.layers.Embedding(input_vocab_size, d_model)
    self.pos_encoding = positional_encoding(maximum_position_encoding, 
                                            self.d_model)


    self.enc_layers = [EncoderLayer(d_model, num_heads, dff, rate) 
                       for _ in range(num_layers)]

    self.dropout = tf.keras.layers.Dropout(rate)

  def call(self, x, training, mask):

    seq_len = tf.shape(x)[1]

    # adding embedding and position encoding.
    x = self.embedding(x)  # (batch_size, input_seq_len, d_model)
    x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
    x += self.pos_encoding[:, :seq_len, :]

    x = self.dropout(x, training=training)

    for i in range(self.num_layers):
      x = self.enc_layers[i](x, training, mask)

    return x  # (batch_size, input_seq_len, d_model)

Notice the lines:

    self.enc_layers = [EncoderLayer(d_model, num_heads, dff, rate) 
                       for _ in range(num_layers)]

and

    for i in range(self.num_layers):
      x = self.enc_layers[i](x, training, mask)

    return x  # (batch_size, input_seq_len, d_model)

This could perhaps be used with TransformerBlock in place of EncoderLayer
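A sketch of what that could look like inside T2Model, stacking num_layers copies of the existing TransformerBlock (the constructor change and the presence of pos_encoding are assumptions based on #35):

# In T2Model.__init__ (sketch): build a list of encoder blocks rather than a single one.
self.num_layers = num_layers
self.encoder_layers = [
    TransformerBlock(self.embed_dim, self.num_heads, self.ff_dim)
    for _ in range(self.num_layers)
]

# In T2Model.call (sketch): pass the embedded input through each block in turn.
x = self.embedding(inputs)
x = self.pos_encoding(x)
for encoder in self.encoder_layers:
    x = encoder(x)
x = self.pooling(x)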

Re-train PLAsTiCC model with 'weighted' Log-Loss metric and loss for optimisation

Previous models have been optimised using the following line in astronet/t2/opt/hypertrain.py:118

        # Evaluate the model accuracy on the validation set.
        score = model.evaluate(X_val, y_val, verbose=0)
        return score[1]

However, it would be desirable to optimise this using the weighted log-loss metric defined by A. I. Malz et al.

An example implementation of this can be found in astrorapids or perhaps with sklearn.metrics.log_loss akin to:

import sys

import numpy as np

def plasticc_log_loss(y_true, probs):
    """Implementation of weighted log loss used for the Kaggle challenge.

    Parameters
    ----------
    y_true: np.array of shape (# samples,)
        Array of the true classes
    probs : np.array of shape (# samples, # features)
        Class probabilities for each sample. The order of the classes corresponds to
        that in the attribute `classes_` of the classifier used.

    Returns
    -------
    float
        Weighted log loss used for the Kaggle challenge
    """
    predictions = probs.copy()
    labels = np.unique(y_true) # assumes the probabilities are also ordered in the same way
    weights_dict = {6:1/18, 15:1/9, 16:1/18, 42:1/18, 52:1/18, 53:1/18, 62:1/18, 64:1/9,
                    65:1/18, 67:1/18, 88:1/18, 90:1/18, 92:1/18, 95:1/18, 99:1/19,
                    1:1/18, 2:1/18, 3:1/18}

    # sanitize predictions
    epsilon = sys.float_info.epsilon  # this is machine dependent but essentially prevents log(0)
    predictions = np.clip(predictions, epsilon, 1.0 - epsilon)
    predictions = predictions / np.sum(predictions, axis=1)[:, np.newaxis]

    predictions = np.log(predictions) # logarithm because we want a log loss
    class_logloss, weights = [], [] # initialize the classes logloss and weights
    for i in range(np.shape(predictions)[1]): # run for each class
        current_label = labels[i]
        result = np.average(predictions[y_true==current_label, i]) # only those events are from that class
        class_logloss.append(result)
        weights.append(weights_dict[current_label])
    return -1 * np.average(class_logloss, weights=weights)

Then, the above code in astronet/t2/opt/hypertrain.py:118 would become something like the following:

        # Evaluate weighted Log Loss on the validation set.
        probs = model.predict(X_val)
        logloss = plasticc_log_loss(y_val, probs)
        return -logloss  # Minus since we want to maximise the objective

And then in astronet/t2/train.py the model.compile( .. ) would need something like:

        model.compile(
            loss="categorical_crossentropy", optimizer=optimizers.Adam(lr=lr), metrics=[plasticc_log_loss]
        )
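Note that the numpy implementation above operates on arrays, so it cannot be dropped straight into model.compile's compiled training loop, which hands the metric tensors. A hedged sketch of a tensor version of the per-class-weighted log loss is below; class_weights is assumed to be a 1-D array aligned with the one-hot column order of y_true, and the per-class averaging means it is best evaluated over a full validation set rather than per batch (classes absent from a batch contribute nothing):

import tensorflow as tf

def make_weighted_log_loss(class_weights):
    """Return a Keras-compatible weighted log loss (sketch)."""
    w = tf.constant(class_weights, dtype=tf.float32)
    eps = tf.keras.backend.epsilon()

    def weighted_log_loss(y_true, y_pred):
        y_true = tf.cast(y_true, tf.float32)
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        y_pred = y_pred / tf.reduce_sum(y_pred, axis=-1, keepdims=True)

        # Log-probability assigned to the true class of each sample.
        log_p = tf.reduce_sum(y_true * tf.math.log(y_pred), axis=-1)

        # Average the log-probabilities within each class, then take the weighted mean.
        class_counts = tf.reduce_sum(y_true, axis=0)
        class_sums = tf.reduce_sum(y_true * log_p[:, tf.newaxis], axis=0)
        class_means = class_sums / tf.maximum(class_counts, 1.0)

        return -tf.reduce_sum(w * class_means) / tf.reduce_sum(w)

    return weighted_log_loss

This could then be passed to model.compile as a metric, and used as the Optuna objective in place of accuracy.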

[MR/35] Implement `PositionalEncoding` class

As discussed in #35 , it seems a PositionalEncoding class is required to restore temporal information to the input sequence.

From Hands On ML book:

The positional embeddings are simply dense vectors (much like word embeddings) that represent the position of a word in the sentence. The nth positional embedding is added to the word embedding of the nth word in each sentence. This gives the model access to each word’s position, which is needed because the Multi-Head Attention layers do not consider the order or the position of the words; they only look at their relationships. Since all the other layers are
time-distributed, they have no way of knowing the position of each word (either relative or absolute). Obviously, the relative and absolute word positions are important, so we need to give this information to the Transformer somehow, and positional embeddings are a good way to do this.

Examples can be found at https://www.tensorflow.org/tutorials/text/transformer which uses functions akin to:

def get_angles(pos, i, d_model):
  angle_rates = 1 / np.power(10000, (2 * (i//2)) / np.float32(d_model))
  return pos * angle_rates

def positional_encoding(position, d_model):
  angle_rads = get_angles(np.arange(position)[:, np.newaxis],
                          np.arange(d_model)[np.newaxis, :],
                          d_model)

  # apply sin to even indices in the array; 2i
  angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])

  # apply cos to odd indices in the array; 2i+1
  angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])

  pos_encoding = angle_rads[np.newaxis, ...]

  return tf.cast(pos_encoding, dtype=tf.float32)

Which is then used later in an EncodingLayer like so:

class Encoder(tf.keras.layers.Layer):
  def __init__(self, num_layers, d_model, num_heads, dff, input_vocab_size,
               maximum_position_encoding, rate=0.1):
    super(Encoder, self).__init__()

    self.d_model = d_model
    self.num_layers = num_layers

    self.embedding = tf.keras.layers.Embedding(input_vocab_size, d_model)
    self.pos_encoding = positional_encoding(maximum_position_encoding, 
                                            self.d_model)


    self.enc_layers = [EncoderLayer(d_model, num_heads, dff, rate) 
                       for _ in range(num_layers)]

    self.dropout = tf.keras.layers.Dropout(rate)

  def call(self, x, training, mask):

    seq_len = tf.shape(x)[1]

    # adding embedding and position encoding.
    x = self.embedding(x)  # (batch_size, input_seq_len, d_model)
    x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
    x += self.pos_encoding[:, :seq_len, :]

    x = self.dropout(x, training=training)

    for i in range(self.num_layers):
      x = self.enc_layers[i](x, training, mask)

    return x  # (batch_size, input_seq_len, d_model)

OR, in Hands-On ML book (pg 558), a PositionalEncoding class is defined like:

import numpy as np
import tensorflow as tf
from tensorflow import keras

class PositionalEncoding(keras.layers.Layer):
    def __init__(self, max_steps, max_dims, dtype=tf.float32, **kwargs):
        super().__init__(dtype=dtype, **kwargs)
        if max_dims % 2 == 1:
            max_dims += 1  # max_dims must be even
        p, i = np.meshgrid(np.arange(max_steps), np.arange(max_dims // 2))
        pos_emb = np.empty((1, max_steps, max_dims))
        pos_emb[0, :, ::2] = np.sin(p / 10000 ** (2 * i / max_dims)).T
        pos_emb[0, :, 1::2] = np.cos(p / 10000 ** (2 * i / max_dims)).T
        self.positional_embedding = tf.constant(pos_emb.astype(self.dtype))

    def call(self, inputs):
        shape = tf.shape(inputs)
        return inputs + self.positional_embedding[:, : shape[-2], : shape[-1]]

To be used elsewhere as:

positional_encoding = PositionalEncoding(max_steps, max_dims=embed_size) 
encoder_in = positional_encoding(conv_embeddings)

With the diff of model.py perhaps something like:

diff --git a/astronet/t2/model.py b/astronet/t2/model.py
index 50eb080..89351a4 100644
--- a/astronet/t2/model.py
+++ b/astronet/t2/model.py
@@ -2,7 +2,7 @@ import tensorflow as tf
 from tensorflow import keras
 from tensorflow.keras import layers
 
-from astronet.t2.transformer import ConvEmbedding, TransformerBlock
+from astronet.t2.transformer import ConvEmbedding, PositionalEncoding, TransformerBlock
 
 
 class T2Model(keras.Model):
@@ -22,6 +22,7 @@ class T2Model(keras.Model):
         self.num_classes    = num_classes
 
         self.embedding      = ConvEmbedding(num_filters=self.num_filters, input_shape=input_dim)
+        self.pos_encoding   = PositionalEncoding(max_steps=input_dim[1], max_dims=embed_dim)
         self.encoder        = TransformerBlock(self.embed_dim, self.num_heads, self.ff_dim)
         self.pooling        = layers.GlobalAveragePooling1D()
         self.dropout1       = layers.Dropout(0.1)
@@ -32,6 +33,7 @@ class T2Model(keras.Model):
     def call(self, inputs, training=None):
 
         x = self.embedding(inputs)
+        x = self.pos_encoding(x)
         x = self.encoder(x)
         x = self.pooling(x)
         if training:

Include `pool_size` as hyperparameter to optimise over

Currently the pool_size configuration in some layers is set to self.kernel_size, as this was previously the same setting of 3 for both.

It might be worth investigating the effect of allowing this to vary and to be optimised for.

The code for adding something like this could look like:

        pool_size = trial.suggest_categorical("pool_size", [3, 6, 12, 24, 48, 96])  # --> Pooling width
...
        model = SNXModel(
            num_classes=num_classes,
            kernel_size=kernel_size,
            pool_size=pool_size
        )

With modifications made to snX/layers.py file accordingly

Revisit implementation of Multi-head attention

Currently there are two implementations of Multi-head attention. The one in use at the moment can be found in astronet.t2.attention.py with the other found in astronet.t2.multihead_attention.py

The current one does not use masking whilst the other does; it is not certain whether this is something that is required for the Supernova set-up we are going for.

There exist unit tests for the astronet.t2.multihead_attention.py implementation, but not for the one that is currently in use; this should be addressed. The astronet.t2.multihead_attention.py implementation also returns outputs and attention_weights.

Next steps would be to implement tests for the one in use, and compare implementations to decide which is best.

Refs:

Evaluation notebook

Whilst there exists code to visualise results in astronet.t2.visualise_results.py it would be useful to have a notebook that allows for visualisation of results and functionality for visualising weights and the response of the attention mechanism.

[BENCHMARK] Compare results to `snmachine`

To gauge how well the model is holding up compared to other methods, this issue is for determining the items that need to be addressed in order to have a clear and fair comparison.

One item is to ensure that the same objects are being considered in the train and test sets. This will need to be checked and can be implemented either in tandem with or after #43.

The goal would be to compare with the results table from snmachine, specifically LL_test, with no redshift (z) information.

Have a pipeline notebook - `T2-Analysis-Pipeline.ipynb`

It would be desirable to have a pipeline notebook that can be run in conjunction with CI checks (related to #4 ) where the results can be cross-checked.

This would also be useful to demonstrate how the pipeline is run for others and to ensure reproducibility.
