henrysky / astronn

188 stars · 9 watchers · 53 forks · 181.3 MB

Deep Learning for Astronomers with Tensorflow

Home Page: http://astronn.readthedocs.io/

License: MIT License

Languages: Python 99.98%, HTML 0.02%
Topics: tensorflow, neural-network, python, astronomy, astrophysics, science, neural-networks

astronn's People

Contributors: dependabot[bot], henrysky, nolankoblischke, richardscottoz


astronn's Issues

DR16 astroNN catalog of distances produces incorrect parsec values for Md and Mg stars

System information

  • Have I written custom code?:
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64):
  • astroNN (Build or Version):
  • Did you try the latest astroNN commit?:
  • TensorFlow installed from (source or binary, official build?):
  • TensorFlow version:
  • Python version:
  • CUDA & cuDNN version (if applicable):
  • GPU model and memory (if applicable):
  • Exact command/script to reproduce (if applicable):

Describe the problem

astroNN Gaia DR2 parallax zero-point offset with deep learning

Gaia DR2 calculates it as −0.029 mas.
Sloan Digital Sky Survey APOGEE calculates it as −0.0523 mas.
Modified parallax = parallax - zero point offset
Data model: apogee_astroNN provides spectro-photometric deep learning parsec distances.
Distances in parsecs to the Orion Nebula for star classes BA, Fd, GKd and GKg agree fairly well, but astroNN appears to produce 4-5 times larger distances for Md and Mg stars.

Parsecs calculated with the parallax zero-point offset options (a sketch of the computation follows the list):

  • Parsec - no offset
  • Dist - Apogee Deep Learning
  • DistApogee - use Apogee offset
  • DistGaia - use Gaia offset
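
For reference, a minimal sketch (mine, not the catalog's actual code) of how such distance columns are computed from a parallax in mas:

import numpy as np

def parallax_to_distance_pc(parallax_mas, zero_point_mas=0.0):
    # Modified parallax = parallax - zero-point offset,
    # then d [pc] = 1000 / parallax [mas]
    return 1000.0 / (parallax_mas - zero_point_mas)

parallax = np.array([2.5, 0.4])  # hypothetical parallaxes in mas
print(parallax_to_distance_pc(parallax))           # Parsec: no offset
print(parallax_to_distance_pc(parallax, -0.029))   # DistGaia: Gaia DR2 offset
print(parallax_to_distance_pc(parallax, -0.0523))  # DistApogee: APOGEE offset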

[Five screenshots comparing the distance columns were attached here.]

Source code / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.

Suggestion

Optional, if you have any idea how to fix the issue

Current .h5 dataset loading mechanism is problematic

Currently, this is viewed as a low-priority, performance-related issue. It probably won't be fixed in the near future.

System information

  • Have I written custom code?: Irrelevant
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64): Irrelevant
  • astroNN (Build or Version): commit 29fde34
  • TensorFlow installed from (source or binary, official build?): Irrelevant
  • TensorFlow version: Irrelevant
  • Keras version: Irrelevant
  • Python version: Irrelevant
  • CUDA/cuDNN version (Only necessary if you are using Tensorflow-gpu): Irrelevant
  • GPU model and memory (Only necessary if you are using Tensorflow-gpu): Irrelevant
  • Exact command/script to reproduce (optional): Irrelevant

Describe the problem

The current .h5 dataset loading mechanism is problematic because astroNN loads the whole dataset into memory regardless of its size. This will eventually become a serious problem when the dataset is too big for the available memory (it is already a minor problem when loading the ~12 GB APOGEE training data on my 16 GB RAM laptop and desktop).

Source code / logs

Irrelevant

Suggestion

The neural network/data generator should talk to H5Loader directly, instead of H5Loader loading the whole dataset into memory and handing it to the neural network/data generator.
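
For illustration, a minimal sketch (an assumption about one possible design, not astroNN's actual H5Loader API) of a generator that slices batches from the .h5 file on demand:

import h5py

def h5_batch_generator(path, x_key, y_key, batch_size=64):
    # h5py reads only the requested slice from disk, so memory use
    # stays at roughly one batch instead of the whole dataset.
    with h5py.File(path, 'r') as f:
        n = f[x_key].shape[0]
        while True:
            for start in range(0, n, batch_size):
                stop = min(start + batch_size, n)
                yield f[x_key][start:stop], f[y_key][start:stop]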

Weird errors raised by running the new accelerated BNN test() method

System information

  • Have I written custom code?: Irrelevant
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64): Win10 v1706 x64, CentOS 7.4
  • astroNN (Build or Version): 0.9.2.8dev
  • Did you try the latest astroNN commit?: Yes
  • TensorFlow installed from (source or binary, official build?): official GPU build for Windows and CPU for CentOS
  • TensorFlow version: 1.7.0-rc1 for Windows, 1.7.0 for CentOS
  • Keras version: 2.1.5
  • Python version: 3.6
  • CUDA/cuDNN version (Only necessary if you are using Tensorflow-gpu): 9.0/7.0
  • GPU model and memory (Only necessary if you are using Tensorflow-gpu): Irrelevant
  • Exact command/script to reproduce (optional): Running BNN test() multiple times in a row

Describe the problem

Running the BNN test() method multiple times in a row (around the 7th time?) raises a weird error complaining that a shape or dimension is not right. It can be reproduced on both CPU and GPU, on my Windows machine and on the astro department Linux server.

This bug was initially discovered while doing the open/globular clusters benchmark, because I need to run the BNN test() method for every cluster but was stopped by this bug.

Source code / logs

Variation 1:

ValueError                                Traceback (most recent call last)
<ipython-input-7-c36beede380f> in <module>()
     57         print(np.sum(np.isnan(spec)))
     58     print(name, ' and number of stars: ', indices.shape[0])
---> 59     pred, pred_var = bcnn.test(spec[1:])
     60     means = np.mean(pred, axis=0)
     61     mad_stds = mad_std(pred, axis=0)

d:\university\ast425\astronn\astroNN\models\BayesianCNNBase.py in test(self, input_data, inputs_err)
    210                                                                                           inputs_err[data_gen_shape:])
    211             remainder_result = np.asarray(new.predict_generator(remainder_generator, steps=1))
--> 212             result = np.concatenate((result, remainder_result))
    213 
    214         if result.ndim < 3:  # in case only 1 test data point, in such case we need to add a dimension

ValueError: all the input arrays must have same number of dimensions

Variation 2:

ValueError                                Traceback (most recent call last)
<ipython-input-8-b4056e9283f7> in <module>()
     55         print(np.sum(np.isnan(spec)))
     56     print(name, ' and number of stars: ', indices.shape[0])
---> 57     pred, pred_var = bcnn.test(spec[1:])
     58     means = np.mean(pred, axis=0)
     59     mad_stds = mad_std(pred, axis=0)

d:\university\ast425\astronn\astroNN\models\BayesianCNNBase.py in test(self, input_data, inputs_err)
    218 
    219         predictions = result[:, :half_first_dim, 0]  # mean prediction
--> 220         mc_dropout_uncertainty = result[:, :half_first_dim, 1] * (self.labels_std ** 2)  # model uncertainty
    221         predictions_var = np.exp(result[:, half_first_dim:, 0]) * (self.labels_std ** 2)  # predictive uncertainty
    222 

ValueError: operands could not be broadcast together with shapes (1,5075) (25,) 

Suggestion

The cause is unknown, but the BNN test_old() method is unaffected.
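
One defensive workaround (purely a guess, since the cause was unknown) would be to normalize array ranks before the failing np.concatenate call in test():

import numpy as np

def safe_concat(result, remainder_result):
    # Pad the lower-rank array with trailing axes so np.concatenate
    # does not fail with "all the input arrays must have same number
    # of dimensions" (Variation 1 above).
    while remainder_result.ndim < result.ndim:
        remainder_result = remainder_result[..., np.newaxis]
    while result.ndim < remainder_result.ndim:
        result = result[..., np.newaxis]
    return np.concatenate((result, remainder_result))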

Parallel odeint integration w.r.t. func or parameters

If I have an ODE function for example like this:

import tensorflow as tf

class ODE(object):
    def __init__(self, k1, k2):
        self.k1, self.k2 = k1, k2

    def __call__(self, y, t):
        # Two-state system: y[0] <-> y[1] with rate constants k1, k2
        d_1 = -self.k1 * y[0] + self.k2 * y[1]
        d_2 = self.k1 * y[0] - self.k2 * y[1]
        return tf.stack([d_1, d_2])

ode_func = ODE(3., 5.)

And if I now would like to do this in parallel over k1, k2, would this be the way to do it?

import numpy as np
import tensorflow as tf
from astroNN.shared.nn_tools import cpu_fallback, gpu_memory_manage
from astroNN.neuralode import odeint

class ODE(object):
    def __init__(self, k1, k2):
        self.k1, self.k2 = k1, k2
        self.size = len(k1)

    def __call__(self, y, t):
        # Five independent systems stacked into one state vector of length 2 * size
        d_1 = -self.k1 * y[:self.size] + self.k2 * y[self.size:]
        d_2 = self.k1 * y[:self.size] - self.k2 * y[self.size:]
        return tf.concat([d_1, d_2], axis=0)

cpu_fallback()
gpu_memory_manage()

k1 = tf.constant(np.arange(1., 6.), dtype=tf.float64)
k2 = tf.constant(np.arange(1., 6.)[::-1], dtype=tf.float64)

NUM_SAMPLES = 100
y_init = tf.concat([np.ones(5, dtype=np.float64), np.zeros(5, dtype=np.float64)], axis=0)
t = tf.constant(np.linspace(0., 10., num=NUM_SAMPLES), dtype=tf.float64)
f = ODE(k1, k2)
y = odeint(f, y_init, t, precision=tf.float64)

Issue loading the Galaxy10 dataset

Thank you for this lovely library first and foremost.

I am trying to access the Galaxy10 DECals dataset (as opposed to the SDSS one) without using the h5 reader, as I want to use it as a Colab demo.

I've run both ! pip install astroNN and tried cloning directly into the Colab, following your instructions on this commit: 9dcd394

Despite that, load_galaxy10 still seems to load the SDSS dataset and not the DECals one. Do you have any guidance?

I've looked at your code and I can't see why it's loading the old dataset.

Maybe the issue is in imports?

# Import statements

from astroNN.datasets import load_galaxy10
from tensorflow.keras import utils

# To load images and labels (will download automatically the first time)
# labels correspond to galaxy classes as specified by Galaxy Zoo
images, labels = load_galaxy10()
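
One quick check (a suggestion of mine, not from the thread) is to list the default download folder to see which file was actually fetched; the DECals set is hosted as Galaxy10_DECals.h5 and the SDSS one as Galaxy10.h5:

import os

datasets_dir = os.path.expanduser('~/.astroNN/datasets')
# Expect Galaxy10_DECals.h5 for the DECals set, Galaxy10.h5 for SDSS
print(os.listdir(datasets_dir))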

Thank you so much for your help!

Transfer learning & Fine-tuning

Hi, Henry. I've got a well-trained astroNN model, but I want to do some transfer learning to make it adaptable to another survey. What I've done is remove the top dense layer of the base model and build a new dense layer, but now it can only be treated like an ordinary Keras model. By the way, the base model itself is a custom model under the parent class ''BayesianCNNBase''.
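
For reference, a plain-Keras sketch of the approach described above (hypothetical file path, layer indices and label count; not astroNN's API):

import tensorflow as tf

# Load the trained base network (hypothetical file path)
base_model = tf.keras.models.load_model('astroNN_base_model.h5')

# Freeze every layer of the base network
for layer in base_model.layers:
    layer.trainable = False

# Attach a new dense head for the target survey's labels
x = base_model.layers[-2].output          # output of the last hidden layer
new_head = tf.keras.layers.Dense(3, name='new_head')(x)  # 3 labels, hypothetical
transfer_model = tf.keras.Model(inputs=base_model.input, outputs=new_head)

transfer_model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss='mse')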

I'm wondering:

  1. What should I do if I want to build a new astroNN model on top of an astroNN base model? Should I build a new class, say ''transfer_model'', under ''BayesianCNNBase'' and load the base model in my new def model() function?
  2. How can I do the fine-tuning step (fit_on_batch seems not enough)?

Thank you!

tensorflow 2.4.1

Hello, thank you for your work!
Does astroNN work with tensorflow 2.4.1?
Whenever I import a module I get

cannot import name 'get_default_session' from 'tensorflow'

For example I am trying to do

from astroNN.models.apogee_models import ApogeeBCNN

thank you in advance, Lucia

Galaxy-10 missing images

I was considering doing a few demos with the Galaxy10 dataset but noticed that the Galaxy10.h5 file linked here has 21785 images, not the 25753 stated on the webpage. Is this a typo, or are some images missing?

Thanks for assembling this fun toy dataset :)

ApogeeBCNN() dimensions

Hello and thank you for sharing your work.
I want to classify images with color depth using a Bayesian neural network.
However, with this model I am getting a dimensions error:

Input 0 of layer max_pooling1d_13 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 75, 75, 3)

My input is a dataset loaded with

training_dataset = tf.keras.preprocessing.image_dataset_from_directory(...)

and converted to tensors with

images, labels = next(iter(training_dataset))

so I am trying to train the model with

bcnn_net = ApogeeBCNN()
bcnn_net.fit(images, labels)

Why am I getting this error? Is there a specific way to pass the data?

Thank you, Lucia

Loading Galaxy10 dataset

"To load images and labels (will download automatically at the first time)"
"# First time downloading location will be ~/.astroNN/datasets/"
images, labels = load_galaxy10()

Trying to load the galaxy10 dataset using astroNN but i am getting the following error:
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate(_ssl.c:1131)>

Does anyone know why this is? Thanks in advance.
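
A common cause (an assumption, not confirmed in this thread) is a Python installation that cannot find the system's SSL certificates; a widely used, if insecure, workaround is to disable verification for the one-off download:

import ssl

# WARNING: disables certificate verification for every urllib request in
# this process; acceptable only for a one-off dataset download.
ssl._create_default_https_context = ssl._create_unverified_context

from astroNN.datasets import load_galaxy10
images, labels = load_galaxy10()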

Keras's fit_generator fails when use_multiprocessing=True on Windows only

System information

  • Have I written custom code?: Nope
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64): Windows 10 v1709 x64
  • astroNN (Build or Version): commit b27d557
  • TensorFlow installed from (source or binary, official build?): official py36 build
  • TensorFlow version: 1.5.0-rc1
  • Keras version: 2.1.3
  • Python version: 3.6.3
  • CUDA/cuDNN version (Only necessary if you are using Tensorflow-gpu): CUDA 9.0, cuDNN 7.0
  • GPU model and memory (Only necessary if you are using Tensorflow-gpu): GTX 1060 6GB
  • Exact command/script to reproduce (optional): use_multiprocessing=True in fit_generator

Describe the problem

astroNN's generator is already thread-safe.

It is a known issue on Windows caused by Python; it will probably work on Linux/macOS.

So far the only consequence is that the CPU can't generate data fast enough for a fast GPU (GTX 970 or above, with at least a 4-thread CPU).

This is only relevant when you are using BCNN with GPU training.

Link: matterport/Mask_RCNN#13
Link: keras-team/keras#6582

Source code / logs

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-17f261cd711f> in <module>()
      2 bcnn = Apogee_BCNN()
      3 bcnn.max_epochs = 75
----> 4 bcnn.train(x,y,x_err,y_err)

d:\university\ast425\astronn\astroNN\models\Apogee_BCNN.py in train(self, input_data, labels, inputs_err, labels_err)
    111                                        validation_steps=self.val_num // self.batch_size,
    112                                        epochs=self.max_epochs, verbose=2, workers=os.cpu_count(),
--> 113                                        callbacks=[reduce_lr, csv_logger], use_multiprocessing=True)
    114 
    115         # Call the post training checklist to save parameters

~\Anaconda3\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name +
     90                               '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

~\Anaconda3\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   2097                             val_enqueuer = GeneratorEnqueuer(validation_data,
   2098                                                              use_multiprocessing=use_multiprocessing,
-> 2099                                                              wait_time=wait_time)
   2100                         val_enqueuer.start(workers=workers, max_queue_size=max_queue_size)
   2101                         validation_generator = val_enqueuer.get()

Suggestion

Detect the user's OS and enable multiprocessing in fit_generator on macOS and Linux only.
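
A minimal sketch of that suggestion (my wording, not the actual astroNN code):

import os
import platform

# Multiprocessing data generation deadlocks on Windows (no fork()),
# so only enable it on Linux/macOS as suggested above.
use_multiprocessing = platform.system() in ('Linux', 'Darwin')
workers = os.cpu_count() if use_multiprocessing else 1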

Complete Tensorflow support without installing Keras separately

System information

  • Have I written custom code?: Irrelevant
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 or Windows 10 v1709 x64): Irrelevant
  • astroNN (Build or Version): Irrelevant
  • Did you try the latest astroNN commit?: Irrelevant
  • TensorFlow installed from (source or binary, official build?): Irrelevant
  • TensorFlow version: >=1.5.0
  • Keras version: Irrelevant
  • Python version: >=3.5
  • CUDA/cuDNN version (Only necessary if you are using Tensorflow-gpu): Irrelevant
  • GPU model and memory (Only necessary if you are using Tensorflow-gpu): Irrelevant
  • Exact command/script to reproduce (optional): Irrelevant

Describe the problem

Since TensorFlow 1.5.0, Keras is an official part of the TensorFlow API (tensorflow.keras). astroNN should support both keras and tensorflow.keras.

What is done?

  • Loss functions are all written in tensorflow

What is not done?

  • Layers and CallBacks are all written with keras
  • Models and training process are all written with keras
  • Session management is currently done with keras
  • astroNN's configuration file

Source code / logs

A relevant discussion on the Keras GitHub

Suggestion

  • Configuration file (let users choose keras or tensorflow.keras); a sketch follows below
  • Should the default configuration point to keras or tensorflow.keras?
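
A sketch of what such a configuration switch could look like (hypothetical file name and keys, not astroNN's actual config format):

import configparser

config = configparser.ConfigParser()
config.read('config.ini')  # hypothetical astroNN configuration file
backend = config.get('Backend', 'keras', fallback='tensorflow.keras')

if backend == 'tensorflow.keras':
    from tensorflow import keras
else:
    import keras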

ODE example on tensorflow 2.2.0

When I run the odeint example on TensorFlow 2.2.0 I get the error:

  File "C:\Users\jhsmi\pp\astroNN\astroNN\neuralode\dop853.py", line 177, in dopri853core
    if tf.equal(hmax, 0.0):
  File "C:\Users\jhsmi\Miniconda3\envs\py37_tf_dev\lib\site-packages\tensorflow\python\framework\ops.py", line 778, in __bool__
    self._disallow_bool_casting()
  File "C:\Users\jhsmi\Miniconda3\envs\py37_tf_dev\lib\site-packages\tensorflow\python\framework\ops.py", line 545, in _disallow_bool_casting
    "using a `tf.Tensor` as a Python `bool`")
  File "C:\Users\jhsmi\Miniconda3\envs\py37_tf_dev\lib\site-packages\tensorflow\python\framework\ops.py", line 532, in _disallow_when_autograph_enabled
    " decorating it directly with @tf.function.".format(task))
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed: AutoGraph did not convert this function. Try decorating it directly with @tf.function.

It works fine for me on TF 2.1.0
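
For context, the failing pattern is a Python if on a tensor inside a graph; a graph-safe version (a generic TF sketch, not the actual dop853.py fix) uses tf.cond:

import tensorflow as tf

@tf.function
def clamp_hmax(hmax):
    # `if tf.equal(hmax, 0.0):` is disallowed in graph mode;
    # tf.cond expresses the same branch as graph ops.
    return tf.cond(tf.equal(hmax, 0.0),
                   lambda: tf.constant(1.0, dtype=hmax.dtype),
                   lambda: hmax)

print(clamp_hmax(tf.constant(0.0)))  # -> 1.0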

Bugs in 3 of the demo_tutorial/NN_uncertainty_analysis

System information

These introductory examples are buggy, and as a beginner in deep learning it is not obvious to me how to correct even simple bugs.

Those notebooks are very old and are not working anymore.

  • OS Platform and Distribution: macOS Big Sur (but the same on Binder)

  • astroNN (Build or Version): master

  • Did you try the latest astroNN commit?: I have done git clone from master

  • TensorFlow installed from (source or binary, official build?): pip install

  • TensorFlow version: tensorflow 2.12.0

  • Python version: Python 3.9.16

  • Exact command/script to reproduce (if applicable):

Describe the problem

Describe the problem clearly here. Be sure to describe here why it's a bug in astroNN (instead of Tensorflow's problem) or a feature request.

Among the 4 examples

  • Uncertainty_Demo_MNIST.ipynb --> OK
  • Uncertainty_Demo_quad.ipynb --> Does not work
  • Uncertainty_Demo_x_sinx.ipynb --> Does not work
  • Uncertainty_Demo_x_sinx_tfp.ipynb --> Does not work

After a minor numpy format correction in Uncertainty_Demo_quad.ipynb, I found that the generator generate_train_batch(x, y, y_err) is not accepted by model.fit(); moreover, the proposed model.fit_generator() is not accepted anymore by TensorFlow.

In the section "Third, use a single model to get both epistemic and aleatoric uncertainty with variational inference":

I tried to skip the generator by providing the data directly, without involving any generator, but the data format was not accepted.

the_in, the_out = next(generator)
model.fit(the_in, the_out, epochs=20, max_queue_size=20, verbose=0,
          steps_per_epoch=x.shape[0] // batch_size)

I don't have deep enough knowledge of TensorFlow to understand the data format error.

TypeError: You are passing KerasTensor(type_spec=TensorSpec(shape=(), dtype=tf.float32, name=None), name='Placeholder:0', description="created by layer 'tf.cast_2'"), an intermediate Keras symbolic input/output, to a TF API that does not allow registering custom dispatchers, such as `tf.cond`, `tf.function`, gradient tapes, or `tf.map_fn`. Keras Functional model construction only supports TF API calls that *do* support dispatching, such as `tf.math.add` or `tf.reshape`. Other APIs cannot be called directly on symbolic Keras inputs/outputs. You can work around this limitation by putting the operation in a custom Keras layer `call` and calling that layer on this symbolic input/output.
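
One possible modernization (an assumption, not a verified fix for these notebooks) is to wrap the Python generator in a tf.data.Dataset so that model.fit accepts it in place of the removed fit_generator:

import tensorflow as tf

# Shapes and dtypes here are hypothetical; they must match what
# generate_train_batch(x, y, y_err) actually yields.
dataset = tf.data.Dataset.from_generator(
    lambda: generate_train_batch(x, y, y_err),
    output_signature=(
        tf.TensorSpec(shape=(None, 1), dtype=tf.float32),  # inputs
        tf.TensorSpec(shape=(None, 1), dtype=tf.float32),  # labels
    ),
)
model.fit(dataset, epochs=20, steps_per_epoch=x.shape[0] // batch_size)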

I hope you can quickly fix these simple examples so that I can start from a simple working one.

Many thanks.

Can not reproduce results of Uncertainty_Demo_MNIST.ipynb

Hi, thanks for sharing this great implementation on GitHub! Nice work.

I ran your notebook Uncertainty_Demo_MNIST.ipynb.
However, I cannot reproduce the results shown in the notebook output; the losses I get are all nan.

Could you suggest why?

The output I got from the second cell (Train the neural network on MNIST training set):

Number of Training Data: 54000, Number of Validation Data: 6000
====Message from Normalizer====
You selected mode: 255
Featurewise Center: False
Datawise Center: False
Featurewise std Center: False
Datawise std Center: False
====Message ends====
====Message from Normalizer====
You selected mode: 0
Featurewise Center: False
Datawise Center: False
Featurewise std Center: False
Datawise std Center: False
====Message ends====
Sorry but there is a known issue of the loss not handling loss correctly. I will fix it in May-- Henry 19 April 2018
Epoch 1/5
 - 163s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0980 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.0991
Epoch 2/5
 - 159s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0987 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1047

Epoch 00002: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.
Epoch 3/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.1001 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.0971

Epoch 00003: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.
Epoch 4/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0967 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1008

Epoch 00004: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.
Epoch 5/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0998 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1003

Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.
Completed Training, 794.97s in total

Thanks!
