Comments (7)
First of all, thank you for your interest in our work and our challenge!
Regarding the error, this looks like a very generic TensorFlow error. It could be caused by an internal TensorFlow problem or by a bug in the iGNNition framework implementation. To help us reproduce it, could you please specify the following:
- Operating System
- Python Version (looks like you are using 3.7)
- Tensorflow Version
- CUDA Version
- Example you are trying to run
- Does the example run correctly without the GPU flag enabled?
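Most of the details requested above can be collected with a short script. The following is only a minimal sketch: the helper names `package_version` and `environment_report` are hypothetical, and the set of packages it queries (including NumPy) is an assumption about what is useful to report. It does not detect the CUDA version, which can usually be read separately with `nvcc --version`.

```python
import platform

try:
    from importlib import metadata  # available from Python 3.8
except ImportError:  # Python 3.7, as used in this thread
    metadata = None


def package_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    try:
        import pkg_resources  # fallback shipped with setuptools
        return pkg_resources.get_distribution(name).version
    except Exception:
        return None


def environment_report():
    """Collect OS, Python and package versions into a plain dict."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "tensorflow": package_version("tensorflow"),
        "tensorflow-gpu": package_version("tensorflow-gpu"),
        "numpy": package_version("numpy"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Reading versions from the package metadata rather than importing TensorFlow means the report can still be produced when the TensorFlow import itself fails.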
from gnnetworkingchallenge.
I am using Ubuntu 18.04 with Python 3.7, tensorflow-gpu==2.1.0 and CUDA 10.1. The example I'm trying to run is the 2020 TensorFlow baseline. I tried running it without the GPU, but the same errors still occurred. If it is not too presumptuous a request, could you take a look at my logs when you have some free time?
Thanks a lot!
from gnnetworkingchallenge.
Try removing your current TensorFlow installation and installing it directly with:
pip install tensorflow==2.1.0
(which already includes tensorflow-gpu)
The tensorflow-gpu package seems to have some problems with the XLA instructions, which appears to be your issue.
If this does not solve the problem, please upload the log file and I will take a look at it.
from gnnetworkingchallenge.
Thank you very much for your reply! I tried tensorflow==2.1.0, but the errors still occurred. Here are the logs:
(test) yj@DL:/data/yj/Documents/2020test/code$ python3 main.py
2021-06-15 16:39:39.876851: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-10.1/lib64
2021-06-15 16:39:39.876954: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-10.1/lib64
2021-06-15 16:39:39.876968: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
INFO:tensorflow:Using config: {'_model_dir': '../logs/model_log', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
2021-06-15 16:39:40.821831: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-06-15 16:39:40.854148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.582GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-06-15 16:39:40.855742: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 1 with properties:
pciBusID: 0000:82:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.582GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-06-15 16:39:40.856123: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-06-15 16:39:40.858860: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-06-15 16:39:40.860338: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-06-15 16:39:40.860725: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-06-15 16:39:40.862982: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-06-15 16:39:40.864523: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-06-15 16:39:40.869364: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-06-15 16:39:40.872986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0, 1
INFO:tensorflow:Calling model_fn.
/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/framework/indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Traceback (most recent call last):
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 2326, in get_attr
c_api.TF_OperationGetAttrValueProto(self._c_op, name, buf)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Operation 'route_net_model/UnsortedSegmentSum_6' has no attr named '_XlaCompile'.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/ops/gradients_util.py", line 331, in _MaybeCompile
xla_compile = op.get_attr("_XlaCompile")
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 2330, in get_attr
raise ValueError(str(e))
ValueError: Operation 'route_net_model/UnsortedSegmentSum_6' has no attr named '_XlaCompile'.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 184, in <module>
model_dir=config['DIRECTORIES']['logs'])
File "main.py", line 64, in train_and_evaluate
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
return executor.run()
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
return self.run_local()
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
saving_listeners=saving_listeners)
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 374, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1164, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1194, in _train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1152, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/data/yj/Documents/2020test/code/routenet_model.py", line 256, in model_fn
grads = tf.gradients(total_loss, model.trainable_variables)
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/ops/gradients_impl.py", line 274, in gradients_v2
unconnected_gradients)
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/ops/gradients_util.py", line 669, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/ops/gradients_util.py", line 336, in _MaybeCompile
return grad_fn() # Exit early
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/ops/gradients_util.py", line 669, in <lambda>
lambda: grad_fn(op, *out_grads))
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py", line 476, in _UnsortedSegmentSumGrad
return _GatherDropNegatives(grad, op.inputs[1])[0], None, None
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py", line 444, in _GatherDropNegatives
dtype=is_positive_shape.dtype)],
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py", line 2659, in ones
output = _constant_if_small(one, shape, dtype, name)
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py", line 2391, in _constant_if_small
if np.prod(shape) < 1000:
File "<array_function internals>", line 6, in prod
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 3031, in prod
keepdims=keepdims, initial=initial, where=where)
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
File "/data/yj/yes/envs/test/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 728, in array
" array.".format(self.name))
NotImplementedError: Cannot convert a symbolic Tensor (gradients/route_net_model/UnsortedSegmentSum_6_grad/sub:0) to a numpy array.
from gnnetworkingchallenge.
Everything looks fine to me...
I would now suggest two things:
- Create a new virtual environment with TensorFlow 2.3 (I checked, and the code is compatible with it).
- Make sure all the TensorFlow dependencies are satisfied. The problem may lie in the NumPy installation, judging by this line:
Cannot convert a symbolic Tensor (gradients/route_net_model/UnsortedSegmentSum_6_grad/sub:0) to a numpy array.
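One way to act on the dependency check is to compare the installed TensorFlow and NumPy versions. The sketch below rests on an assumption that is not confirmed anywhere in this thread: that NumPy 1.20 or newer combined with a TensorFlow release older than 2.5 is the pairing commonly reported for this "Cannot convert a symbolic Tensor" message. The function names and the version strings in the usage lines are hypothetical; the NumPy version actually installed was never reported here.

```python
import re


def version_tuple(version):
    """Turn '2.1.0' or '1.20.3rc1' into a (major, minor) tuple of ints."""
    numbers = re.findall(r"\d+", version)
    return int(numbers[0]), int(numbers[1])


def is_suspect_pair(tf_version, numpy_version):
    """Return True when the pair falls in the assumed problematic range.

    Assumption (not confirmed in this thread): NumPy >= 1.20 together with
    TensorFlow < 2.5 is the combination commonly reported for this error.
    """
    tf_is_old = version_tuple(tf_version) < (2, 5)
    numpy_is_new = version_tuple(numpy_version) >= (1, 20)
    return tf_is_old and numpy_is_new


# Hypothetical version strings, for illustration only.
print(is_suspect_pair("2.1.0", "1.20.3"))  # True
print(is_suspect_pair("2.3.0", "1.18.5"))  # False
```

If the check flags the installed pair, a fresh virtual environment, as suggested above, lets pip resolve a NumPy version that matches the chosen TensorFlow release.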
from gnnetworkingchallenge.
Thanks a lot! Your advice solved my problem completely. I wish you every success in life!
from gnnetworkingchallenge.
Closing this issue as the problem seems to be solved.
from gnnetworkingchallenge.