
tensorflow / adanet

Stars: 3.5K · Watchers: 173 · Forks: 532 · Size: 2.51 MB

Fast and flexible AutoML with learning guarantees.

Home Page: https://adanet.readthedocs.io

License: Apache License 2.0

Languages: Python 34.98% · Jupyter Notebook 63.96% · Shell 0.10% · Starlark 0.96%
Topics: automl, tensorflow, learning-theory, deep-learning, neural-architecture-search, gpu, machine-learning, ensemble, tpu, python

adanet's Introduction


Documentation

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.

TensorFlow was originally developed by researchers and engineers working within the Machine Intelligence team at Google Brain to conduct research in machine learning and neural networks. However, the framework is versatile enough to be used in other areas as well.

TensorFlow provides stable Python and C++ APIs, as well as a non-guaranteed backward compatible API for other languages.

Keep up-to-date with release announcements and security updates by subscribing to [email protected]. See all the mailing lists.

Install

See the TensorFlow install guide for the pip package, to enable GPU support, use a Docker container, and build from source.

To install the current release, which includes support for CUDA-enabled GPU cards (Ubuntu and Windows):

$ pip install tensorflow

Other devices (DirectX and MacOS-metal) are supported using Device plugins.

A smaller CPU-only package is also available:

$ pip install tensorflow-cpu

To update TensorFlow to the latest version, add the --upgrade flag to the above commands.

Nightly binaries are available for testing using the tf-nightly and tf-nightly-cpu packages on PyPI.
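For example (a quick sketch using the package names above; the --upgrade flag works the same way for the CPU-only and nightly packages):

$ pip install --upgrade tensorflow
$ pip install tf-nightly  # or tf-nightly-cpu for the CPU-only nightly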

Try your first TensorFlow program

$ python
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'

For more examples, see the TensorFlow tutorials.

Contribution guidelines

If you want to contribute to TensorFlow, be sure to review the contribution guidelines. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.

We use GitHub issues for tracking requests and bugs; please see the TensorFlow Forum for general questions and discussion, and direct specific questions to Stack Overflow.

The TensorFlow project strives to abide by generally accepted best practices in open-source software development.

Patching guidelines

Follow these steps to patch a specific version of TensorFlow, for example, to apply fixes to bugs or security vulnerabilities:

  • Clone the TensorFlow repo and switch to the corresponding branch for your desired TensorFlow version, for example, branch r2.8 for version 2.8.
  • Apply (that is, cherry-pick) the desired changes and resolve any code conflicts.
  • Run TensorFlow tests and ensure they pass.
  • Build the TensorFlow pip package from source.
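In git terms, the first two steps look roughly like this (a sketch; the branch name follows the r2.8 example above, and the commit hash is a placeholder):

$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout r2.8
$ git cherry-pick <commit-hash>  # apply the desired fix, then resolve any conflicts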

Continuous build status

You can find more community-supported platforms and configurations in the TensorFlow SIG Build community builds table.

Official Builds

Build Type Status Artifacts
Linux CPU Status PyPI
Linux GPU Status PyPI
Linux XLA Status TBA
macOS Status PyPI
Windows CPU Status PyPI
Windows GPU Status PyPI
Android Status Download
Raspberry Pi 0 and 1 Status Py3
Raspberry Pi 2 and 3 Status Py3
Libtensorflow MacOS CPU Status Temporarily Unavailable Nightly Binary Official GCS
Libtensorflow Linux CPU Status Temporarily Unavailable Nightly Binary Official GCS
Libtensorflow Linux GPU Status Temporarily Unavailable Nightly Binary Official GCS
Libtensorflow Windows CPU Status Temporarily Unavailable Nightly Binary Official GCS
Libtensorflow Windows GPU Status Temporarily Unavailable Nightly Binary Official GCS

Resources

Learn more about the TensorFlow community and how to contribute.

Courses

License

Apache License 2.0

adanet's People

Contributors

achoum, arkanath, brettkoonce, captainpete, chamorajg, csvillalta, cweill, eugenhotaj, eustomaqua, ghassenj, github30, hanna-maz, lc0, majestickhan, mihaimaruseac, mistobaan, sararob, scottyak, shawpan, smallyolk2024, vlejd, ziy, zjost


adanet's Issues

[Question] Custom loss function

How do you use a custom loss function for regression? An example would be best.
For example, the loss function is

\sum_{i=1}^{n} -\log(\Phi(y_i - f(x_i)))
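One possible direction (a sketch, not a maintainers' answer; it assumes \Phi is the standard normal CDF and the TF 1.x head API): tf.contrib.estimator.regression_head accepts a loss_fn, so the per-example loss can be supplied directly and the resulting head passed to adanet.Estimator(head=head, ...):

import tensorflow as tf

def custom_loss(labels, logits):
  # Per-example loss: -log(Phi(y - f(x))), with Phi the standard normal CDF.
  normal = tf.distributions.Normal(loc=0., scale=1.)
  return -tf.log(normal.cdf(labels - logits))

head = tf.contrib.estimator.regression_head(
    loss_reduction=tf.losses.Reduction.SUM_OVER_BATCH_SIZE,
    loss_fn=custom_loss)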

Problem with adanet_objective.ipynb

Hi,

I wanted to run the adanet_objective.ipynb notebook with TensorFlow 1.12.
This part throws an error:

def ensemble_architecture(result):
  architecture = result["architecture/adanet/ensembles"]
  summary_proto = tf.summary.Summary.FromString(architecture)  # <-- Error here
  return summary_proto.value[0].tensor.string_val[0]

There doesn't seem to be a "FromString" method. I found it in very old API documentation, but I couldn't find a replacement in the newer versions. Can somebody please tell me how to replace this line?

Thanks!
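One possible replacement (a sketch, untested here): FromString is a classmethod generated on every protobuf message class, so it can be missing from the TF API docs yet still exist; importing the Summary proto directly sidesteps the tf.summary alias:

from tensorflow.core.framework import summary_pb2

summary_proto = summary_pb2.Summary.FromString(architecture)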

Using LSTM with AdaNet

I can easily understand why DNNs and CNNs apply well to AdaNet, since all subnetworks have the same architecture and hyperparameters. However, I am curious whether it is possible to apply LSTMs to AdaNet. I cannot fully comprehend how to incorporate tf.nn.dynamic_rnn(), because when we have multiple LSTM cells it would be as follows:

outputs, states = tf.nn.dynamic_rnn([cell1, cell2,..., cell_n], inputs = X, swap_memory = False, time_major = False, dtype = tf.float32)

Can you provide any insight about that? Thank you so much!
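For what it's worth, a sketch assuming the TF 1.x APIs: tf.nn.dynamic_rnn takes a single RNN cell rather than a list, so a stack of LSTM cells would typically be wrapped in a MultiRNNCell first (the cell sizes and input shape below are hypothetical):

import tensorflow as tf

# dynamic_rnn expects one cell, so wrap the stack in a MultiRNNCell.
cells = [tf.nn.rnn_cell.LSTMCell(num_units=64) for _ in range(3)]
stacked_cell = tf.nn.rnn_cell.MultiRNNCell(cells)

X = tf.placeholder(tf.float32, [None, None, 32])  # [batch, time, features]
outputs, states = tf.nn.dynamic_rnn(
    stacked_cell, inputs=X, swap_memory=False, time_major=False,
    dtype=tf.float32)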

while running tutorials, TypeError: 'NoneType' object is not iterable

Edit: GPU version, with CUDA 9.0

Have I written custom code: no
OS Platform and Distribution : win10, x64
TensorFlow installed from (source or binary): pip binary
TensorFlow version (use command below): 1.8  (gpu version)
Cuda version: 9.0
Cudnn: 7.1
Python version: 3.5.2

When I run the two tutorials, both end with the same issue:

INFO:tensorflow:Saving dict for global step 5000: accuracy = 0.8413, average_loss = 0.46480885, global_step = 5000, loss = 0.46422938

TypeError Traceback (most recent call last)
in ()
6 eval_spec=tf.estimator.EvalSpec(
7 input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE),
----> 8 steps=None))

TypeError: 'NoneType' object is not iterable

I then tried the solution mentioned in "Fix invalid UTF-8 encoding with Python 3", but it stayed the same.

PS: I intended to try Python 2, but there was no Python 2 build of TensorFlow 1.8 for the Windows platform.

Change dataset and caused NaN loss

Hi, while following the tutorial example in SimpleDnn, I changed the dataset to one that, after one-hot processing, has 43 features and 11 labels. However, I ran into the problem 'NaN loss during training'.
I checked the labels, which run from 1-11 and don't include any zero. Does anyone have the same problem?
Besides, when AdaNet goes to the second layer, the report becomes:
INFO: tensorflow: Report materialization [1000/??]
What could this problem be?
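One thing worth checking (a guess, not a confirmed diagnosis): TensorFlow's multi-class heads expect integer class IDs in the range [0, n_classes), so labels running 1-11 with 11 classes put the value 11 out of range, which can surface as NaN losses. Shifting the labels before training may help:

labels = labels - 1  # map class IDs 1..11 into the expected range 0..10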

How-to serve

How-to question: I've been testing and learning with the "adanet_objective" sample. How do you use the recommended model to run predictions on samples and eventually serve it on a live data feed?
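A minimal sketch of the standard Estimator export path (assuming the feature_columns from the tutorial; TF ≥1.13 uses export_saved_model, older versions export_savedmodel):

import tensorflow as tf

# Build a serving input receiver from the feature columns used in training.
feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
serving_input_fn = (
    tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec))

# Writes a SavedModel directory that TensorFlow Serving can load.
estimator.export_saved_model("/tmp/adanet_export", serving_input_fn)

For one-off predictions, estimator.predict(input_fn=...) works directly on the trained estimator.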

Can multi_head support be added?

For multi-objective learning, one may want to use multi_head to define the "logits"; AdaNet uses a head to incrementally construct the network.
Can I use multi_head in AdaNet, or might support be added in the future?
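For reference, the head in question would be built roughly like this (a sketch of the TF 1.x contrib API; the task names are hypothetical, and whether adanet.Estimator accepts such a head is exactly the open question here):

import tensorflow as tf

# Combine two objectives into a single head.
head = tf.contrib.estimator.multi_head([
    tf.contrib.estimator.binary_classification_head(name="click"),
    tf.contrib.estimator.regression_head(name="rating"),
])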

Adanet in Keras gives ValueError: Tensor tensor_object is not found in checkpoint after warm-starting variable

  • Implemented AdaNet in Keras, building dense layers and setting up kernel_initializer.
  • Training and evaluation succeeded, and the best_candidate moved on to the next iteration in iteration 1, but before moving to the next iteration the "warm-starting variable" section gives the error below.

Error
ValueError: Tensor adanet/iteration_0/ensemble_t0_2_layer_dnn/weighted_subnetwork_0/subnetwork/dense_1/kernel is not found in /home/ec2-user/SageMaker/logs/keras_ada_logs/model.ckpt-500 checkpoint {'adanet/iteration_0/train_op/is_over/is_over_var_fn/is_over_var': [], 'adanet/iteration_0/step': [], 'adanet/iteration_0/ensemble_t0_2_layer_dnn/weighted_subnetwork_0/subnetwork/dense_3/kernel': [99, 50] . . . }

Error Stacktrace

INFO:tensorflow:Warm-starting from: ('/home/ec2-user/SageMaker/logs/keras_ada_logs/model.ckpt-500',)
INFO:tensorflow:Warm-starting variable: global_step; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: adanet/iteration_0/step; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: adanet/iteration_0/ensemble_t0_2_layer_dnn/weighted_subnetwork_0/subnetwork/dense_1/kernel; prev_var_name: Unchanged

ValueError Traceback (most recent call last)
in ()
1 #train and evaluate on default parameters
2
----> 3 results, _ = train_and_evaluate()

in train_and_evaluate(learn_mixture_weights, adanet_lambda)
51 steps=None)

---> 53 return tf.estimator.train_and_evaluate(AdaNetEstimator, train_spec, eval_spec)
~/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in train_and_evaluate(estimator, train_spec, eval_spec)
469 '(with task id 0). Given task id {}'.format(config.task_id))

--> 471 return executor.run()
~/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in run(self)
608 config.task_type != run_config_lib.TaskType.EVALUATOR):
609 logging.info('Running training and evaluation locally (non-distributed).')
--> 610 return self.run_local()
611
612 # Distributed case.

~/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in run_local(self)
709 max_steps=self._train_spec.max_steps,
710 hooks=train_hooks,
--> 711 saving_listeners=saving_listeners)
712
713 eval_result = listener_for_eval.eval_result or _EvalResult(

~/anaconda3/envs/python3/lib/python3.6/site-packages/adanet/core/estimator.py in train(self, input_fn, hooks, steps, max_steps, saving_listeners)
499 # in order to use them when preparing the next iteration.
500 self._train_hooks = hooks or ()
--> 501 self._prepare_next_iteration(input_fn)
502
503 # This inner loop serves mainly for synchronizing the workers with the

~/anaconda3/envs/python3/lib/python3.6/site-packages/adanet/core/estimator.py in _prepare_next_iteration(self, train_input_fn)
618 params[self._Keys.INCREMENT_ITERATION] = True
619 self._call_adanet_model_fn(train_input_fn, tf.estimator.ModeKeys.TRAIN,
--> 620 params)
621
622 def _architecture_filename(self, iteration_number):

~/anaconda3/envs/python3/lib/python3.6/site-packages/adanet/core/estimator.py in _call_adanet_model_fn(self, input_fn, mode, params)
578 tf.train.get_or_create_global_step()
579 features, labels = input_fn()
--> 580 self._adanet_model_fn(features, labels, mode, params)
581
582 def _prepare_next_iteration(self, train_input_fn):

~/anaconda3/envs/python3/lib/python3.6/site-packages/adanet/core/estimator.py in _adanet_model_fn(self, features, labels, mode, params)
1102 if self._Keys.INCREMENT_ITERATION in params:
1103 latest_checkpoint = tf.train.latest_checkpoint(self.model_dir)
-> 1104 tf.train.warm_start(latest_checkpoint, vars_to_warm_start=[".*"])
1105 previous_ensemble_reports, all_reports = [], []
1106 if self._report_materializer:

~/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow/python/training/warm_starting_util.py in warm_start(ckpt_to_initialize_from, vars_to_warm_start, var_name_to_vocab_info, var_name_to_prev_var_name)
461 if len(variable) == 1:
462 variable = variable[0]
--> 463 _warm_start_var(variable, ckpt_to_initialize_from, prev_var_name)
464
465 prev_var_name_not_used = set(

~/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow/python/training/warm_starting_util.py in _warm_start_var(var, prev_ckpt, prev_tensor_name)
179 # Assume tensor name remains the same.
180 prev_tensor_name = current_var_name
--> 181 checkpoint_utils.init_from_checkpoint(prev_ckpt, {prev_tensor_name: var})

~/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py in init_from_checkpoint(ckpt_dir_or_file, assignment_map)
185 else:
186 distribution_strategy_context.get_tower_context().merge_call(
--> 187 _init_from_checkpoint, ckpt_dir_or_file, assignment_map)

~/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow/python/training/distribute.py in merge_call(self, merge_fn, *args, **kwargs)
1051 """
1052 require_tower_context(self)
-> 1053 return self._merge_call(merge_fn, *args, **kwargs)
1054
1055 def _merge_call(self, merge_fn, *args, **kwargs):

~/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow/python/training/distribute.py in _merge_call(self, merge_fn, *args, **kwargs)
1059 self._distribution_strategy))
1060 try:
-> 1061 return merge_fn(self._distribution_strategy, *args, **kwargs)
1062 finally:
1063 _pop_per_thread_mode()

~/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py in _init_from_checkpoint(_, ckpt_dir_or_file, assignment_map)
214 if tensor_name_in_ckpt not in variable_map:
215 raise ValueError("Tensor %s is not found in %s checkpoint %s" % (
--> 216 tensor_name_in_ckpt, ckpt_dir_or_file, variable_map
217 ))
218 if _is_variable(var):

Training takes > 1 day on Boston Housing example using 8 GPU machine

Using tf 1.9.0 and running the example notebook. I think the problem is in the eval spec definition, which has this code section:

eval_spec = tf.estimator.EvalSpec(
      input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE),
      steps=None,
      start_delay_secs=1,
      throttle_secs=1,
  )

This seems to cause an evaluation every second and leads to an enormous tf.events file (>20 GB).
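One plausible mitigation (a sketch; the same EvalSpec as above with a longer throttle): throttle_secs is the minimum interval between evaluations, so raising it makes evaluation far less frequent:

eval_spec = tf.estimator.EvalSpec(
      input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE),
      steps=None,
      start_delay_secs=1,
      throttle_secs=600,  # evaluate at most once every 10 minutes
  )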

[Question]distrubute training

Hi,
I have used two GPUs for distributed training, using a custom subnetwork. It works well on a single GPU, but there is an error when using multiple GPUs via distribution = tf.contrib.distribute.MirroredStrategy():

2018-12-29 03:48:10.474563: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 14874 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:2d:00.0, compute capability: 7.0)
2018-12-29 03:48:10.474844: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:1 with 14874 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:32:00.0, compute capability: 7.0)
INFO:tensorflow:A GPU is available on the machine, consider using NCHW data format for increased speed on GPU.
INFO:tensorflow:Error reported to Coordinator: No variables to optimize.
Traceback (most recent call last):
  File "/root/anaconda3/envs/tianxh_py36/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/root/anaconda3/envs/tianxh_py36/lib/python3.6/site-packages/tensorflow/contrib/distribute/python/mirrored_strategy.py", line 797, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/root/anaconda3/envs/tianxh_py36/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/tianxh/AdaNet/adanet/core/estimator.py", line 1121, in _adanet_model_fn
    previous_ensemble_spec=previous_ensemble_spec)
  File "/home/tianxh/AdaNet/adanet/core/iteration.py", line 257, in build_iteration
    labels=labels)
  File "/home/tianxh/AdaNet/adanet/core/ensemble.py", line 526, in append_new_subnetwork
    previous_ensemble_spec=ensemble_spec)
  File "/home/tianxh/AdaNet/adanet/core/ensemble.py", line 663, in _build_ensemble_spec
    previous_ensemble=previous_ensemble))
  File "tianxh_test.py", line 301, in build_subnetwork_train_op
    return optimizer.minimize(loss, var_list=var_list)
  File "/root/anaconda3/envs/tianxh_py36/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 400, in minimize
    grad_loss=grad_loss)
  File "/root/anaconda3/envs/tianxh_py36/lib/python3.6/site-packages/tensorflow/contrib/estimator/python/estimator/extenders.py", line 303, in compute_gradients
    return self._optimizer.compute_gradients(*args, **kwargs)
  File "/root/anaconda3/envs/tianxh_py36/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 513, in compute_gradients
    raise ValueError("No variables to optimize.")

It seems that the first thread is OK, but the second thread can't get the variables through tf.trainable_variables(). var_list comes from adanet/core/ensemble.py, line 495:

var_list = tf.trainable_variables()

Could you please give some advice?
Thanks

Include as sample the paper examples

It would be interesting to include, as samples, all the use cases described in the paper.
That would make the paper's results easy to reproduce.
It would also contribute some diverse and more complex examples.
The Criteo example, where hyperparameter tuning was run, would be a good one to start with, particularly because of the embedding step.

WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.

Hello,
I am using tf 1.11 and tft 0.11 with AdaNet 0.4 (but it also happened with 0.3). I am getting the warning:

WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.

A bit of searching took me to:
https://stackoverflow.com/questions/50850258/tensorflow-estimator-switching-to-careful-interpolation-to-get-the-correct-pr-a/51288757

I have tried to implement this by calling it on the object obtained from adanet.Estimator, but I receive an error saying the estimator object cannot be called directly.

Any suggestions on how to solve the problem or avoid it in the first place?
I am using a modification of the adanet_objective example, changed only to use other data (the Criteo dataset), so I don't really understand why the warning appears; it doesn't when I run the original example.

Combination of subnetworks

In Google's paper, "each unit in layer k of the subnetwork may have connections to existing units in layer k-1 of AdaNet", but the GIF shows differently. How exactly do the subnetworks combine?

Saving model fails

Since the example code did not actually save the model, I added the code below to do so. However, it fails with "TypeError: Value passed to parameter 'dense_defaults' has DataType float64 not in list of allowed values: float32, int64, string". The available "allowed values" for x_in fail with various other issues. How does one actually save the model to SavedModel format?

x_in = tf.feature_column.numeric_column(FEATURES_KEY, dtype=tf.float64)
feature_columns = [x_in]
feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
export_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)

print("Exporting model")
estimator.export_saved_model(export_dir, export_input_fn)
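A hedged guess at the dtype error (untested): tf.parse_example, which the parsing serving input receiver builds on, only accepts float32, int64, and string defaults, so declaring the column as float32 and casting the input data to match may be the path of least resistance:

x_in = tf.feature_column.numeric_column(FEATURES_KEY, dtype=tf.float32)
# ...and in the input_fn, cast the features: features = tf.cast(features, tf.float32)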

Test failed

After installing Bazel, I executed the bazel test -c opt //... command and got failing tests.

external/com_google_protobuf/python: warning: directory does not exist.
FAIL: //adanet/core:estimator_test (shard 3 of 20) (see /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_3_of_20/test.log)
FAIL: //adanet/core:estimator_test (shard 4 of 20) (see /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_4_of_20/test.log)
FAIL: //adanet/core:estimator_test (shard 6 of 20) (see /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_6_of_20/test.log)
FAIL: //adanet/core:estimator_test (shard 12 of 20) (see /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_12_of_20/test.log)
FAIL: //adanet/core:estimator_test (shard 8 of 20) (see /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_8_of_20/test.log)
FAIL: //adanet/core:estimator_test (shard 14 of 20) (see /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_14_of_20/test.log)
FAIL: //adanet/core:estimator_test (shard 7 of 20) (see /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_7_of_20/test.log)
FAIL: //adanet/core:estimator_test (shard 5 of 20) (see /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_5_of_20/test.log)

FAILED: //adanet/core:estimator_test (Summary)
      /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_3_of_20/test.log
      /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_4_of_20/test.log
      /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_6_of_20/test.log
      /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_12_of_20/test.log
      /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_8_of_20/test.log
      /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_14_of_20/test.log
      /home/zh/.cache/bazel/_bazel_zh/a7a1e19f4c4409a4f7aaae78d6eca6b2/execroot/org_adanet/bazel-out/k8-opt/testlogs/adanet/core/estimator_test/shard_7_of_20/test.log

My TensorFlow version is 1.7. What is the problem?

[Question] about nasnet-a used as subnetwork in adanet

@cweill
Hi Weill,
I saw an article on the Google AI blog: https://ai.googleblog.com/2018/10/introducing-adanet-fast-and-flexible.html
The article mentions using NASNet-A as the subnetwork; after 8 AdaNet iterations it reaches an error rate of 2.3% on CIFAR-10, with fewer parameters at the same time. I would like to ask two questions:
1. Do you use the entire NASNet-A architecture (for example, N=6 and F=32) as the subnetwork, or just the normal cell, or something else?
2. How do the subnetworks combine with each other?

Thanks !

Failed to get device properties, error code: 30

When I use a GTX 1050 Ti to train AdaNet, this error appears:

2018-12-27 13:21:33.808781: E tensorflow/core/grappler/clusters/utils.cc:83] Failed to get device properties, error code: 30

Here is my code:

import adanet
import tensorflow as tf
import data_input
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

feature_columns = [tf.feature_column.numeric_column('features', shape=[224, 224, 3])]

head = tf.contrib.estimator.binary_classification_head()

estimator = adanet.AutoEnsembleEstimator(
    head=head,
    candidate_pool=[
        tf.contrib.estimator.LinearEstimator(
            head=head,
            feature_columns=feature_columns,
            optimizer=tf.train.AdamOptimizer(0.01)),
        tf.contrib.estimator.DNNEstimator(
            head=head,
            feature_columns=feature_columns,
            optimizer=tf.train.AdamOptimizer(0.01),
            hidden_units=[1000, 500, 100]),
        tf.contrib.estimator.DNNEstimator(
            head=head,
            feature_columns=feature_columns,
            optimizer=tf.train.AdamOptimizer(0.01),
            hidden_units=[500])
    ],
    max_iteration_steps=1000,
    model_dir=r"C:\Users\12420\Desktop\Chest X-Ray Automatic Identification\result")

estimator.train(input_fn=data_input.input_fn('training', 5), max_steps=10000)

How can I solve this problem? I'd appreciate it very much if you can help me with this!

[Question] How to retrieve the detailed ensemble architecture?

Hi,
I'd like to retrieve, during evaluation, the detailed architecture of the ensemble and its subnetworks.

Using

def ensemble_architecture(result):
  """Extracts the ensemble architecture from evaluation results."""

  architecture = result["architecture/adanet/ensembles"]
  # The architecture is a serialized Summary proto for TensorBoard.
  summary_proto = tf.summary.Summary.FromString(architecture)
  return summary_proto.value[0].tensor.string_val[0]

It gives some hints, like Architecture: b"| b'simple_cnn' | b'simple_cnn' |", but that's not very detailed.
I'd like something like model.summary() in Keras.

Is there a way to get the detailed architecture?

Thanks !

Keras API

As mentioned in the discussion on Hacker News, native support for Keras's APIs (including layer support) would make implementing AdaNet a lot easier for Keras-based projects (via either tf.keras or external Keras).

TPU support for Estimator

Hi

Is it possible to write AdaNet programs that utilize TPUs? Are there corresponding AdaNet APIs?

Thanks for any help!

[Question] How to decide the value of the "complexity" param in Subnetwork?

The paper says to use the "standard deviations of the outputs of the last hidden layer on the training data as surrogate for Rademacher complexities",
but the customizing_adanet.ipynb example uses tf.constant(1). Is that choice of value robust for the conclusions?
Or should the value be set from the standard deviations?
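If one wanted to follow the paper's suggestion literally, a sketch of the surrogate computed from the last hidden layer might look like this (complexity_from_last_layer is a hypothetical helper, not part of the AdaNet API; its result would be returned as the Subnetwork's complexity):

import tensorflow as tf

def complexity_from_last_layer(last_layer):
  # Standard deviation of the last hidden layer's activations on the
  # training batch, as a surrogate for Rademacher complexity (per the paper).
  mean = tf.reduce_mean(last_layer)
  return tf.sqrt(tf.reduce_mean(tf.square(last_layer - mean)))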

use the demo code train a new dataset (got some problems )

I ran into some problems with the AdaNet API when I used customizing_adanet.ipynb from
adanet/adanet/examples/tutorials/ to train on a new dataset. I got an error like this:

InvalidArgumentError: ValueError: generator yielded an element of shape (150, 150) where an element of shape () was expected.
Traceback (most recent call last):

File "D:\Program Files\Anaconda3-5.2\lib\site-packages\tensorflow\python\ops\script_ops.py", line 206, in call
ret = func(*args)

File "D:\Program Files\Anaconda3-5.2\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 454, in generator_py_func
"of shape %s was expected." % (ret_array.shape, expected_shape))

ValueError: generator yielded an element of shape (150, 150) where an element of shape () was expected.

[[{{node PyFunc}} = PyFuncTin=[DT_INT64], Tout=[DT_FLOAT, DT_INT32], token="pyfunc_184", _device="/device:CPU:0"]]
[[{{node IteratorGetNext}} = IteratorGetNextoutput_shapes=[[?,150,150,1], [?]], output_types=[DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

The old data is fashion_mnist, whose images are 28x28; my dataset is 150x150. I've tried lots of ways to adjust the parameters or change the demo code, but it still doesn't work, which is so disappointing! How can I make it work? I'd appreciate it very much if someone could do me a favor!
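The message suggests the input pipeline still declares the old element shapes; a sketch of the knob that may need updating (hypothetical, mirroring the tutorial's generator-based input_fn, with a stand-in generator):

import numpy as np
import tensorflow as tf

def generator():
  # Stand-in for the tutorial's generator over (image, label) pairs.
  yield np.zeros((150, 150), np.float32), 0

# output_shapes must match what the generator actually yields: here a
# 150x150 image and a scalar label (an assumption based on the error).
dataset = tf.data.Dataset.from_generator(
    generator,
    output_types=(tf.float32, tf.int32),
    output_shapes=((150, 150), ()))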

error when running tutorial examples with adanet

I tried to run the tutorial examples with AdaNet, and they ended with the following error:

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
yield
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/distribute/python/mirrored_strategy.py", line 795, in run
self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/adanet/core/estimator.py", line 1109, in _adanet_model_fn
previous_ensemble_spec=previous_ensemble_spec)
File "/usr/local/lib/python2.7/dist-packages/adanet/core/iteration.py", line 256, in build_iteration
labels=labels)
File "/usr/local/lib/python2.7/dist-packages/adanet/core/ensemble.py", line 515, in append_new_subnetwork
previous_ensemble_spec=ensemble_spec)
File "/usr/local/lib/python2.7/dist-packages/adanet/core/ensemble.py", line 558, in _build_ensemble_spec
summary))
File "/usr/local/lib/python2.7/dist-packages/adanet/core/ensemble.py", line 875, in _adanet_weighted_ensemble_logits
weighted_subnetworks, bias, summary)
File "/usr/local/lib/python2.7/dist-packages/adanet/core/ensemble.py", line 911, in _adanet_weighted_ensemble_logits_helper
ensemble_complexity_regularization)
File "/usr/local/lib/python2.7/dist-packages/adanet/core/tpu_estimator.py", line 43, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/adanet/core/summary.py", line 272, in scalar
collections=[self._TMP_COLLECTION_NAME])
File "/usr/local/lib/python2.7/dist-packages/adanet/core/tpu_estimator.py", line 43, in _fn
return fn(*args, **kwargs)
TypeError: scalar() got an unexpected keyword argument 'collections'

May I have your help? Thanks in advance!

Add distributed candidate evaluation support

Hello,
I am running AdaNet 0.5.0 on GCP with runtime version 1.10.
I am using a CPU configuration with multiple nodes.
The training phase is very fast, but it gets totally slowed down by the evaluations. The evaluations don't seem to take advantage of the multiple nodes, and the logs are flooded with "Waiting for chief to finish" messages coming from the workers, generated by the AdaNet Estimator.
I think support for the evaluation phase to use the multiple nodes should be added, and that it should be a priority change: not only are the nodes unused, you also keep paying for them.
Is that feasible?
Thanks in advance
Jose

[Announcement] NeurIPS 2018 meetup

If you're attending NeurIPS '18 in Montreal this week and want to meet up, I'll be here until Saturday after the workshops. I'm also hosting an informal AutoML meetup at the NeurIPS Google AI booth at 1pm on Thursday, December 6th, 2018.

I'm happy to learn about your ideas, use-cases, and feedback.

[Question] How to add custom evaluation metrics?

How do you add custom metrics in AdaNet? The TensorFlow API gives the following error:

def custom_metrics(features, labels, predictions):
    return {
        'customMetric': 0
    }
estimator = get_adanet_model()
estimator = tf.contrib.estimator.add_metrics(estimator, custom_metrics)

Error:

UserWarning: The adanet.Estimator's model_fn should not be called directly in TRAIN mode, because its behavior is undefined outside the context of its train method.
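One avenue worth checking against the installed version's API docs (a sketch, not confirmed for every release; head and subnetwork_generator stand in for the objects from the tutorials): adanet.Estimator exposes a metric_fn constructor argument, which avoids wrapping the estimator after construction:

import adanet
import tensorflow as tf

def custom_metrics(labels, predictions):
  # Hypothetical metric; AdaNet merges it into the evaluation metrics.
  return {"customMetric": tf.metrics.mean(predictions["predictions"])}

estimator = adanet.Estimator(
    head=head,
    subnetwork_generator=subnetwork_generator,
    max_iteration_steps=1000,
    metric_fn=custom_metrics)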

Adanet problem

I hit a problem when I ran this code:

TypeError Traceback (most recent call last)
in ()
----> 1 results, _ = train_and_evaluate()
2 print("Loss:", results["average_loss"])
3 print("Architecture:", ensemble_architecture(results))

in train_and_evaluate(learn_mixture_weights, adanet_lambda)
24 optimizer=tf.train.RMSPropOptimizer(learning_rate= 0.001),
25 learn_mixture_weights=learn_mixture_weights,
---> 26 seed=42),
27
28 # Lambda is the strength of complexity regularization. A larger

TypeError: object() takes no parameters

MS coco data set for adanet

Hi,
Has anyone tried to train on MS COCO or other real datasets with this network?
Just interested to see if this can actually solve such a problem.

ImportError: cannot import name 'report_pb2'

I'm using WinPython 3.6; bazel test -c opt returns no error, and I can install the pip package.

But when I try to import adanet I get "ImportError: cannot import name 'report_pb2'"

Full error message is:

Traceback (most recent call last):
File "", line 1, in
File "C:\Users\Michael\Desktop\WinPython3-6GPU\examples\Sonstiges\adanet\adanet_init_.py", line 22, in
from adanet.core import Ensemble
File "C:\Users\Michael\Desktop\WinPython3-6GPU\examples\Sonstiges\adanet\adanet\core_init_.py", line 25, in
from adanet.core.estimator import Estimator
File "C:\Users\Michael\Desktop\WinPython3-6GPU\examples\Sonstiges\adanet\adanet\core\estimator.py", line 33, in
from adanet.core.report_accessor import _ReportAccessor
File "C:\Users\Michael\Desktop\WinPython3-6GPU\examples\Sonstiges\adanet\adanet\core\report_accessor.py", line 24, in
from adanet.core import report_pb2 as report_proto
ImportError: cannot import name 'report_pb2'

Any idea why?

bazel build not working

bazel test -c opt //...
ERROR: $HOME/.cache/bazel/_bazel_fmilo/9c8a25ad1f3a0959bd40f01be01ceda5/external/com_google_protobuf/BUILD:645:1: no such target '//external:python_headers': target 'python_headers' not declared in package 'external' defined by $HOME/workspace/adanet/WORKSPACE and referenced by '@com_google_protobuf//:python/google/protobuf/internal/_api_implementation.so'
ERROR: Analysis of target '//adanet/core:report_accessor' failed; build aborted: Analysis failed
INFO: Elapsed time: 0.293s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
FAILED: Build did NOT complete successfully (0 packages loaded)

Ubuntu 14.04

bazel version
Build label: 0.18.1rc4
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Mon Oct 29 09:54:52 2018 (1540806892)
Build timestamp: 1540806892
Build timestamp as int: 1540806892

ValueError: Tensor adanet/iteration_0/ensemble_t0_3_self_built_cnn/weighted_subnetwork_0/subnetwork/batch_normalization_2/moving_mean is not found in /data4/data/adanet/modelZoo/2018_12_10_11_23/model.ckpt-7500 checkpoint

Hello,
I am using tf 1.12.0 and Python 2.7.15. I ran my own custom-built subnetwork on the Fashion-MNIST dataset, but I am getting this error:

Traceback (most recent call last):
File "class_builtCNN.py", line 39, in
steps=None))
File "/home/guost/anaconda2/lib/python2.7/site-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
return executor.run()
File "/home/guost/anaconda2/lib/python2.7/site-packages/tensorflow/python/estimator/training.py", line 610, in run
return self.run_local()
File "/home/guost/anaconda2/lib/python2.7/site-packages/tensorflow/python/estimator/training.py", line 711, in run_local
saving_listeners=saving_listeners)
File "/home/guost/Adanet/Newversion/1206/adanet-master/adanet/core/estimator.py", line 499, in train
self._prepare_next_iteration(input_fn)
File "/home/guost/Adanet/Newversion/1206/adanet-master/adanet/core/estimator.py", line 618, in _prepare_next_iteration
params)
File "/home/guost/Adanet/Newversion/1206/adanet-master/adanet/core/estimator.py", line 578, in _call_adanet_model_fn
self._adanet_model_fn(features, labels, mode, params)
File "/home/guost/Adanet/Newversion/1206/adanet-master/adanet/core/estimator.py", line 1093, in _adanet_model_fn
tf.train.warm_start(latest_checkpoint, vars_to_warm_start=[".*"])
File "/home/guost/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/warm_starting_util.py", line 463, in warm_start
_warm_start_var(variable, ckpt_to_initialize_from, prev_var_name)
File "/home/guost/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/warm_starting_util.py", line 181, in _warm_start_var
checkpoint_utils.init_from_checkpoint(prev_ckpt, {prev_tensor_name: var})
File "/home/guost/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 187, in init_from_checkpoint
_init_from_checkpoint, ckpt_dir_or_file, assignment_map)
File "/home/guost/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/distribute.py", line 1053, in merge_call
return self._merge_call(merge_fn, *args, **kwargs)
File "/home/guost/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/distribute.py", line 1061, in _merge_call
return merge_fn(self._distribution_strategy, *args, **kwargs)
File "/home/guost/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 216, in _init_from_checkpoint
tensor_name_in_ckpt, ckpt_dir_or_file, variable_map
ValueError: Tensor adanet/iteration_0/ensemble_t0_3_self_built_cnn/weighted_subnetwork_0/subnetwork/batch_normalization_2/moving_mean is not found in /data4/data/adanet/modelZoo/2018_12_10_11_23/model.ckpt-7500 checkpoint

Any suggestions on how to solve the problem or avoid it in the first place?
Thank you so much
@cweill

`from adanet.examples import simple_dnn` raises exception

Stack trace when installing from PyPI. See #3.

In the customizing_adanet notebook, when doing !pip install adanet and then from adanet.examples import simple_dnn:

ImportError Traceback (most recent call last)
<ipython-input-2-ef5cc5f68dee> in <module>()
      6 
      7 import adanet
----> 8 from adanet.examples import simple_dnn
      9 import tensorflow as tf
     10 

ImportError: No module named examples

Customizing_adanet tutorial error

In the customizing_adanet notebook, when I ran the tutorial, I got the following error at the end:

TypeError                                 Traceback (most recent call last)
<ipython-input-10-9f72d27cf659> in <module>
     26     eval_spec=tf.estimator.EvalSpec(
     27         input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE),
---> 28         steps=None))
     29 print("Accuracy:", results["accuracy"])
     30 print("Loss:", results["average_loss"])

~/anaconda3/envs/Adanet/lib/python3.5/site-packages/tensorflow/python/estimator/training.py in train_and_evaluate(estimator, train_spec, eval_spec)
    445         '(with task id 0).  Given task id {}'.format(config.task_id))
    446 
--> 447   return executor.run()
    448 
    449 

~/anaconda3/envs/Adanet/lib/python3.5/site-packages/tensorflow/python/estimator/training.py in run(self)
    529         config.task_type != run_config_lib.TaskType.EVALUATOR):
    530       logging.info('Running training and evaluation locally (non-distributed).')
--> 531       return self.run_local()
    532 
    533     # Distributed case.

~/anaconda3/envs/Adanet/lib/python3.5/site-packages/tensorflow/python/estimator/training.py in run_local(self)
    667           input_fn=self._train_spec.input_fn,
    668           max_steps=self._train_spec.max_steps,
--> 669           hooks=train_hooks)
    670 
    671       if not self._continuous_eval_listener.before_eval():

~/anaconda3/envs/Adanet/lib/python3.5/site-packages/adanet/core/estimator.py in train(self, input_fn, hooks, steps, max_steps, saving_listeners)
    440             hooks=hooks,
    441             max_steps=max_steps,
--> 442             saving_listeners=saving_listeners)
    443 
    444         # If training ended because the maximum number of training steps

~/anaconda3/envs/Adanet/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py in train(self, input_fn, hooks, steps, max_steps, saving_listeners)
    364 
    365       saving_listeners = _check_listeners_type(saving_listeners)
--> 366       loss = self._train_model(input_fn, hooks, saving_listeners)
    367       logging.info('Loss for final step: %s.', loss)
    368       return self

~/anaconda3/envs/Adanet/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py in _train_model(self, input_fn, hooks, saving_listeners)
   1117       return self._train_model_distributed(input_fn, hooks, saving_listeners)
   1118     else:
-> 1119       return self._train_model_default(input_fn, hooks, saving_listeners)
   1120 
   1121   def _train_model_default(self, input_fn, hooks, saving_listeners):

~/anaconda3/envs/Adanet/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py in _train_model_default(self, input_fn, hooks, saving_listeners)
   1130       worker_hooks.extend(input_hooks)
   1131       estimator_spec = self._call_model_fn(
-> 1132           features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
   1133       return self._train_with_estimator_spec(estimator_spec, worker_hooks,
   1134                                              hooks, global_step_tensor,

~/anaconda3/envs/Adanet/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py in _call_model_fn(self, features, labels, mode, config)
   1105 
   1106     logging.info('Calling model_fn.')
-> 1107     model_fn_results = self._model_fn(features=features, **kwargs)
   1108     logging.info('Done calling model_fn.')
   1109 

~/anaconda3/envs/Adanet/lib/python3.5/site-packages/adanet/core/estimator.py in _model_fn(self, features, labels, mode, params)
   1044           mode=mode,
   1045           previous_ensemble_summary=previous_ensemble_summary,
-> 1046           previous_ensemble_spec=previous_ensemble_spec)
   1047 
   1048     # Variable which allows us to read the current iteration from a checkpoint.

~/anaconda3/envs/Adanet/lib/python3.5/site-packages/adanet/core/iteration.py in build_iteration(self, iteration_number, subnetwork_builders, features, mode, labels, previous_ensemble_summary, previous_ensemble_spec)
    213             mode=mode,
    214             iteration_step=iteration_step_tensor,
--> 215             labels=labels)
    216         candidate = self._candidate_builder.build_candidate(
    217             ensemble_spec=ensemble_spec,

~/anaconda3/envs/Adanet/lib/python3.5/site-packages/adanet/core/ensemble.py in append_new_subnetwork(self, ensemble_spec, subnetwork_builder, iteration_step, summary, features, mode, labels)
    321               iteration_step=iteration_step,
    322               summary=summary,
--> 323               previous_ensemble=ensemble)
    324           trainable_vars_after = tf.trainable_variables()
    325           var_list = list(

<ipython-input-8-13ccd77121f3> in build_subnetwork(self, features, logits_dimension, training, iteration_step, summary, previous_ensemble)
     25                        previous_ensemble=None):
     26     """See `adanet.subnetwork.Builder`."""
---> 27     images = features.values()[0]
     28     kernel_initializer = tf.keras.initializers.he_normal(seed=self._seed)
     29     x = tf.layers.conv2d(

TypeError: 'dict_values' object does not support indexing

This is your latest code; how should I solve this problem?
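A likely Python 3 explanation (a sketch, untested against the notebook): dict_values does not support indexing in Python 3, so the usual fix is to materialize the view as a list first:

images = list(features.values())[0]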

Colaboratory Notebooks for the tutorials

I noticed that the Jupyter Notebook tutorials don't have corresponding Colaboratory notebooks. It would be nice to have them for quickly getting started and for beginners, since setting up the notebooks locally requires a lot of steps (installing CUDA, updating protoc, updating protobuf, etc.).

Would it be reasonable to add them? If so, I'd be happy to start a PR, if it's logistically possible.

Thank you!

Retrieving the best performed model from the ensemble?

How do you separate the best-performing model from the ensemble and save it separately? Using TensorBoard we can visualize the metrics and find the name of the best-performing subnetwork. How can we save that subnetwork's graph and its corresponding weights?

Multi-output regression

Hi guys, thanks a lot for the cool piece of work!
I'd like to apply AdaNet to a multi-output regression problem.
I started from tutorial 1, trying to apply as few changes as possible.
My understanding is that I probably just need to modify the estimator head.
Unfortunately, just changing the label_dimension parameter does not work.

 estimator = adanet.Estimator(
      head=tf.contrib.estimator.regression_head(
          label_dimension=LABEL_DIMENSION,
          loss_reduction=tf.losses.Reduction.SUM_OVER_BATCH_SIZE), ...

I get that on the very first iteration the model diverges:
ERROR:tensorflow:Model diverged with loss = NaN.

Do you have any clue?

error stack:

WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpsJKglT
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f50ea95ed50>, '_model_dir': '/tmp/tmpsJKglT', '_protocol': None, '_save_checkpoints_steps': 50000, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_tf_random_seed': 42, '_save_summary_steps': 50000, '_device_fn': None, '_experimental_distribute': None, '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_evaluation_master': '', '_eval_distribute': None, '_train_distribute': None, '_master': ''}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 50000 or save_checkpoints_secs None.
INFO:tensorflow:Beginning training AdaNet iteration 0
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2018-11-12 12:56:19.318101: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpsJKglT/model.ckpt.
ERROR:tensorflow:Model diverged with loss = NaN.
Traceback (most recent call last):
  File "test_adanet.py", line 277, in <module>
    results, _ = train_and_evaluate()
  File "test_adanet.py", line 265, in train_and_evaluate
    return tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  File "/opt/miniconda2/envs/python-2.7.10/lib/python2.7/site-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
    return executor.run()
  File "/opt/miniconda2/envs/python-2.7.10/lib/python2.7/site-packages/tensorflow/python/estimator/training.py", line 610, in run
    return self.run_local()
  File "/opt/miniconda2/envs/python-2.7.10/lib/python2.7/site-packages/tensorflow/python/estimator/training.py", line 711, in run_local
    saving_listeners=saving_listeners)
  File "/opt/miniconda2/envs/python-2.7.10/lib/python2.7/site-packages/adanet/core/estimator.py", line 447, in train
    saving_listeners=saving_listeners)
  File "/opt/miniconda2/envs/python-2.7.10/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/opt/miniconda2/envs/python-2.7.10/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/opt/miniconda2/envs/python-2.7.10/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default
    saving_listeners)
  File "/opt/miniconda2/envs/python-2.7.10/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1471, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/opt/miniconda2/envs/python-2.7.10/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 671, in run
    run_metadata=run_metadata)
  File "/opt/miniconda2/envs/python-2.7.10/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1156, in run
    run_metadata=run_metadata)
  File "/opt/miniconda2/envs/python-2.7.10/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
    raise six.reraise(*original_exc_info)
  File "/opt/miniconda2/envs/python-2.7.10/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1240, in run
    return self._sess.run(*args, **kwargs)
  File "/opt/miniconda2/envs/python-2.7.10/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1320, in run
    run_metadata=run_metadata))
  File "/opt/miniconda2/envs/python-2.7.10/lib/python2.7/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 753, in after_run
    raise NanLossDuringTrainingError
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.

Adanet_object model not saved

Well, I was able to complete the execution of "Adanet_object", but I don't see the model saved.
Where are the model and architecture saved?

Test error when running tutorial examples

After pip install adanet, I went to run the tutorial examples and got the following error. What should I do to solve this problem?

WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpp5sv1icl
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpp5sv1icl', '_tf_random_seed': 42, '_save_summary_steps': 50000, '_save_checkpoints_steps': 50000, '_save_checkpoints_secs': None, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x2b8920228208>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after 600 secs (eval_spec.throttle_secs) or training is finished.
INFO:tensorflow:Beginning training AdaNet iteration 0
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.


InternalError Traceback (most recent call last)
in ()
65
66
---> 67 results, _ = train_and_evaluate()
68 print("Loss:", results["average_loss"])
69 print("Architecture:", ensemble_architecture(results))

in train_and_evaluate(learn_mixture_weights, adanet_lambda)
53 input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE),
54 steps=None)
---> 55 return tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
56
57

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in train_and_evaluate(estimator, train_spec, eval_spec)
445 '(with task id 0). Given task id {}'.format(config.task_id))
446
--> 447 return executor.run()
448
449

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in run(self)
529 config.task_type != run_config_lib.TaskType.EVALUATOR):
530 logging.info('Running training and evaluation locally (non-distributed).')
--> 531 return self.run_local()
532
533 # Distributed case.

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in run_local(self)
667 input_fn=self._train_spec.input_fn,
668 max_steps=self._train_spec.max_steps,
--> 669 hooks=train_hooks)
670
671 if not self._continuous_eval_listener.before_eval():

~/anaconda3/lib/python3.6/site-packages/adanet/core/estimator.py in train(self, input_fn, hooks, steps, max_steps, saving_listeners)
440 hooks=hooks,
441 max_steps=max_steps,
--> 442 saving_listeners=saving_listeners)
443
444 # If training ended because the maximum number of training steps

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in train(self, input_fn, hooks, steps, max_steps, saving_listeners)
364
365 saving_listeners = _check_listeners_type(saving_listeners)
--> 366 loss = self._train_model(input_fn, hooks, saving_listeners)
367 logging.info('Loss for final step: %s.', loss)
368 return self

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in _train_model(self, input_fn, hooks, saving_listeners)
1117 return self._train_model_distributed(input_fn, hooks, saving_listeners)
1118 else:
-> 1119 return self._train_model_default(input_fn, hooks, saving_listeners)
1120
1121 def _train_model_default(self, input_fn, hooks, saving_listeners):

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in _train_model_default(self, input_fn, hooks, saving_listeners)
1133 return self._train_with_estimator_spec(estimator_spec, worker_hooks,
1134 hooks, global_step_tensor,
-> 1135 saving_listeners)
1136
1137 def _train_model_distributed(self, input_fn, hooks, saving_listeners):

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in _train_with_estimator_spec(self, estimator_spec, worker_hooks, hooks, global_step_tensor, saving_listeners)
1331 save_summaries_steps=self._config.save_summary_steps,
1332 config=self._session_config,
-> 1333 log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
1334 loss = None
1335 while not mon_sess.should_stop():

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py in MonitoredTrainingSession(master, is_chief, checkpoint_dir, scaffold, hooks, chief_only_hooks, save_checkpoint_secs, save_summaries_steps, save_summaries_secs, config, stop_grace_period_secs, log_step_count_steps, max_wait_secs, save_checkpoint_steps)
413 all_hooks.extend(hooks)
414 return MonitoredSession(session_creator=session_creator, hooks=all_hooks,
--> 415 stop_grace_period_secs=stop_grace_period_secs)
416
417

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py in __init__(self, session_creator, hooks, stop_grace_period_secs)
824 super(MonitoredSession, self).__init__(
825 session_creator, hooks, should_recover=True,
--> 826 stop_grace_period_secs=stop_grace_period_secs)
827
828

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py in __init__(self, session_creator, hooks, should_recover, stop_grace_period_secs)
547 stop_grace_period_secs=stop_grace_period_secs)
548 if should_recover:
--> 549 self._sess = _RecoverableSession(self._coordinated_creator)
550 else:
551 self._sess = self._coordinated_creator.create_session()

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py in __init__(self, sess_creator)
1010 """
1011 self._sess_creator = sess_creator
-> 1012 _WrappedSession.__init__(self, self._create_session())
1013
1014 def _create_session(self):

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py in _create_session(self)
1015 while True:
1016 try:
-> 1017 return self._sess_creator.create_session()
1018 except _PREEMPTION_ERRORS as e:
1019 logging.info('An error was raised while a session was being created. '

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py in create_session(self)
704 """Creates a coordinated session."""
705 # Keep the tf_sess for unit testing.
--> 706 self.tf_sess = self._session_creator.create_session()
707 # We don't want coordinator to suppress any exception.
708 self.coord = coordinator.Coordinator(clean_stop_exception_types=[])

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py in create_session(self)
475 init_op=self._scaffold.init_op,
476 init_feed_dict=self._scaffold.init_feed_dict,
--> 477 init_fn=self._scaffold.init_fn)
478
479

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py in prepare_session(self, master, init_op, saver, checkpoint_dir, checkpoint_filename_with_path, wait_for_checkpoint, max_wait_secs, config, init_feed_dict, init_fn)
279 wait_for_checkpoint=wait_for_checkpoint,
280 max_wait_secs=max_wait_secs,
--> 281 config=config)
282 if not is_loaded_from_checkpoint:
283 if init_op is None and not init_fn and self._local_init_op is None:

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py in _restore_checkpoint(self, master, saver, checkpoint_dir, checkpoint_filename_with_path, wait_for_checkpoint, max_wait_secs, config)
182 """
183 self._target = master
--> 184 sess = session.Session(self._target, graph=self._graph, config=config)
185
186 if checkpoint_dir and checkpoint_filename_with_path:

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in __init__(self, target, graph, config)
1561
1562 """
-> 1563 super(Session, self).__init__(target, graph, config=config)
1564 # NOTE(mrry): Create these on first __enter__ to avoid a reference cycle.
1565 self._default_graph_context_manager = None

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in __init__(self, target, graph, config)
631 if self._created_with_new_api:
632 # pylint: disable=protected-access
--> 633 self._session = tf_session.TF_NewSession(self._graph._c_graph, opts)
634 # pylint: enable=protected-access
635 else:

InternalError: Failed to create session.

What usage scenarios does this project need examples for: CV, NLP, or others?

Compared with other projects such as Auto-Keras, which enrich their example scenarios (text classifiers, image classifiers, and so on), are there preferred scenarios for which AdaNet should add examples?
AdaNet may add many columns "vertically", which widens the network; compared with tabular datasets, CV problems may use very deep structures such as many ResNet connections. This may mean the model-construction "direction" doesn't match intuition in some specific scenes.
Could this cause trouble? Or do you have suggestions for building deep constructions?

adanet for object detection

Hi, thank you for sharing this brilliant AutoML tool! I wish I could apply your code, but most industry tasks are complicated and will not be solvable by classification alone.

When will you provide AdaNet for object detection tasks? Or do you have a plan to?

I find that most popular AutoML tools, such as autokeras, cannot be applied to tasks like object detection or semantic segmentation. Could you please tell me the most significant technical barrier to searching network architectures like Faster R-CNN?

I would appreciate your reply, thanks.

Error running adanet_objective on Windows 7 (Japanese) with Python 3

I tried to run the example on Windows 7 (Japanese):
https://github.com/tensorflow/adanet/blob/master/adanet/examples/tutorials/adanet_objective.ipynb
And this error occurred:

INFO:tensorflow:Encountered end of input during report materialization
INFO:tensorflow:Materialized subnetwork_reports.
Traceback (most recent call last):
File "", line 1, in
File "", line 43, in train_and_evaluate
File "C:\Users\XXXXX\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\estimator\training.py", line 471, in train_and_evaluate
return executor.run()
File "C:\Users\XXXXX\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\estimator\training.py", line 610, in run
return self.run_local()
File "C:\Users\XXXXX\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\estimator\training.py", line 711, in run_local
saving_listeners=saving_listeners)
File "C:\Anaconda3_5_2\lib\site-packages\adanet\core\estimator.py", line 461,in train
self._prepare_next_iteration()
File "C:\Anaconda3_5_2\lib\site-packages\adanet\core\estimator.py", line 569,in _prepare_next_iteration
tf.estimator.ModeKeys.EVAL, params)
File "C:\Anaconda3_5_2\lib\site-packages\adanet\core\estimator.py", line 541,in _call_adanet_model_fn
self._model_fn(features, labels, mode, params)
File "C:\Anaconda3_5_2\lib\site-packages\adanet\core\estimator.py", line 1083, in _model_fn
self._materialize_report(current_iteration)
File "C:\Anaconda3_5_2\lib\site-packages\adanet\core\estimator.py", line 752,in _materialize_report
materialized_reports)
File "C:\Anaconda3_5_2\lib\site-packages\adanet\core\report_accessor.py", line 190, in write_iteration_report
materialized_reports),
File "C:\Anaconda3_5_2\lib\site-packages\adanet\core\report_accessor.py", line 150, in _create_iteration_report_pb
iteration_report_pb.subnetwork_reports.extend(subnetwork_report_pb_list)
File "C:\Anaconda3_5_2\lib\site-packages\adanet\core\report_accessor.py", line 137, in _create_subnetwork_report_proto
dictionary=materialized_subnetwork_report.metrics)
File "C:\Anaconda3_5_2\lib\site-packages\adanet\core\report_accessor.py", line 114, in _update_proto_map_from_dict
field[key].string_value = six.u(value)
ValueError: b"\n\x83\x01\n;adanet/iteration_2/ensemble_2_layer_dnn/architecture/adanetB:\x08\x07\x12\x00B4| b'1_layer_dnn' | b'2_layer_dnn' | b'2_layer_dnn' |J\x08\n\x06\n\x04text" has type str, but isn't valid UTF-8 encoding. Non-UTF-8 strings must be converted to unicode objects before being added.
