Comments (6)

MiquelFerriol commented on July 18, 2024

I am still unable to reproduce your issue, even after doing a fresh installation of everything.
Let me give you some hints about what may be causing the problem:

  • Data normalization/standardization: This is probably the simplest fix and the one most likely to work. You can do it by changing the transformation function.
  • Changing activation functions: The ReLU activation function currently in use may be killing some neurons, which can cause the model to learn a fixed distribution.
  • Learning issues: If this only happens 'sometimes', and after continuing to train a saved model, it may be caused by training problems. Adjusting the learning rate, optimizer, decay rate, and so on might also help avoid it.
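
As a rough illustration of the first hint, here is a minimal sketch of z-score standardization in plain Python. The function names (`fit_standardizer`, `standardize`) and the sample values are hypothetical and are not taken from the baseline code; the same idea would have to be adapted to the baseline's transformation function, with the statistics computed on the training set only.

```python
import statistics

def fit_standardizer(values):
    """Return (mean, std) computed from the training values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature would cause a division by zero; fall back to 1.
        std = 1.0
    return mean, std

def standardize(values, mean, std):
    """Shift and scale values to zero mean and unit variance."""
    return [(v - mean) / std for v in values]

# Hypothetical raw feature values on a large scale (illustration only).
train = [10000.0, 25000.0, 40000.0, 100000.0]
mean, std = fit_standardizer(train)
scaled = standardize(train, mean, std)
print(scaled)
```

The mean and standard deviation obtained from the training data should be reused unchanged for validation and test samples, so that all splits are scaled the same way.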

Hope it helps!

from gnnetworkingchallenge.

MiquelFerriol commented on July 18, 2024

Hi @diegocervera,
As you mention, this is weird behavior. Could you please do a fresh run of the baseline and paste the output here?
It should look something like this:

Checking if there is a pretrained model:
Starting training from scratch...

Training the model (as you mention, the loss should decrease):
4000/4000 [==============================] - 365s 90ms/step - loss: 1453.4537 - MAPE: 1453.4537 - val_loss: 64.2551 - val_MAPE: 64.2551

The model is saved:
Epoch 00001: saving model to ../trained_model\01-64.26-64.26

Evaluation is performed:
50/50 [==============================] - 4s 78ms/step - loss: 66.9117 - MAPE: 66.9117

Predictions are printed:
[[-0.03231816] [ 0.08848949] [ 0.01077648] ... [ 0.02286335] [ 0.10745592] [ 0.06697173]]

diegocervera commented on July 18, 2024

@MiquelFerriol, see below the log of the reported behaviour. The results are not consistent, though, and sometimes the predictions do have different values. I'm currently trying to figure out in which scenarios all the predictions become the same.

/home/dc6/anaconda3/envs/sixdegreeschg/bin/python /home/dc6/Documents/Projects/GIT_clones/six_degrees/resources/temp/code/main.py
2021-09-03 11:22:52.663488: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-09-03 11:22:52.663671: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-03 11:22:52.664766: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Found a pretrained model, restoring...
2021-09-03 11:22:52.828575: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-09-03 11:22:52.828589: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2021-09-03 11:22:52.828607: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2021-09-03 11:22:52.836855: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-09-03 11:22:52.856774: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3199980000 Hz
Epoch 1/2
WARNING:tensorflow:AutoGraph could not transform <bound method RouteNetModel.call of <tensorflow.python.eager.function.TfMethodTarget object at 0x7ffaf46ae910>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_14_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_14_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_14_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_13_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_13_grad/Reshape:0", shape=(None, 32), dtype=float32), dense_shape=Tensor("gradients/GatherV2_13_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_12_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_12_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_12_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_11_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_11_grad/Reshape:0", shape=(None, 32), dtype=float32), dense_shape=Tensor("gradients/GatherV2_11_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_10_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_10_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_10_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_9_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_9_grad/Reshape:0", shape=(None, 32), dtype=float32), dense_shape=Tensor("gradients/GatherV2_9_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_8_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_8_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_8_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_7_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_7_grad/Reshape:0", shape=(None, 32), dtype=float32), dense_shape=Tensor("gradients/GatherV2_7_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_6_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_6_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_6_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_5_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_5_grad/Reshape:0", shape=(None, 32), dtype=float32), dense_shape=Tensor("gradients/GatherV2_5_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_4_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_4_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_4_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_3_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_3_grad/Reshape:0", shape=(None, 32), dtype=float32), dense_shape=Tensor("gradients/GatherV2_3_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_2_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_2_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_2_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_1_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_1_grad/Reshape:0", shape=(None, 32), dtype=float32), dense_shape=Tensor("gradients/GatherV2_1_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/dc6/anaconda3/envs/sixdegreeschg/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:435: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_grad/Reshape:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradients/GatherV2_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
   1/4000 [..............................] - ETA: 13:07:42 - loss: 44.4313 - MAPE: 44.4313
2021-09-03 11:23:05.054102: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-09-03 11:23:05.054121: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
   2/4000 [..............................] - ETA: 59:30 - loss: 43.4709 - MAPE: 43.4709
2021-09-03 11:23:05.579088: I tensorflow/core/profiler/lib/profiler_session.cc:71] Profiler session collecting data.
2021-09-03 11:23:05.729546: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2021-09-03 11:23:05.785050: I tensorflow/core/profiler/rpc/client/save_profile.cc:137] Creating directory: ../trained_model/logs/train/plugins/profile/2021_09_03_11_23_05
2021-09-03 11:23:05.812587: I tensorflow/core/profiler/rpc/client/save_profile.cc:143] Dumped gzipped tool data for trace.json.gz to ../trained_model/logs/train/plugins/profile/2021_09_03_11_23_05/dc6-desk.trace.json.gz
2021-09-03 11:23:05.873235: I tensorflow/core/profiler/rpc/client/save_profile.cc:137] Creating directory: ../trained_model/logs/train/plugins/profile/2021_09_03_11_23_05
2021-09-03 11:23:05.878183: I tensorflow/core/profiler/rpc/client/save_profile.cc:143] Dumped gzipped tool data for memory_profile.json.gz to ../trained_model/logs/train/plugins/profile/2021_09_03_11_23_05/dc6-desk.memory_profile.json.gz
2021-09-03 11:23:05.879813: I tensorflow/core/profiler/rpc/client/capture_profile.cc:251] Creating directory: ../trained_model/logs/train/plugins/profile/2021_09_03_11_23_05
Dumped tool data for xplane.pb to ../trained_model/logs/train/plugins/profile/2021_09_03_11_23_05/dc6-desk.xplane.pb
Dumped tool data for overview_page.pb to ../trained_model/logs/train/plugins/profile/2021_09_03_11_23_05/dc6-desk.overview_page.pb
Dumped tool data for input_pipeline.pb to ../trained_model/logs/train/plugins/profile/2021_09_03_11_23_05/dc6-desk.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to ../trained_model/logs/train/plugins/profile/2021_09_03_11_23_05/dc6-desk.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to ../trained_model/logs/train/plugins/profile/2021_09_03_11_23_05/dc6-desk.kernel_stats.pb

4000/4000 [==============================] - 1378s 342ms/step - loss: 44.6780 - MAPE: 44.6780 - val_loss: 46.7430 - val_MAPE: 46.7430

Epoch 00001: saving model to ../trained_model/01-46.74-46.74
Epoch 2/2
4000/4000 [==============================] - 1453s 363ms/step - loss: 45.8965 - MAPE: 45.8965 - val_loss: 47.1610 - val_MAPE: 47.1610

Epoch 00002: saving model to ../trained_model/02-47.16-47.16
50/50 [==============================] - 9s 168ms/step - loss: 48.4529 - MAPE: 48.4529
[[0.08492264]
 [0.08492264]
 [0.08492264]
 ...
 [0.08492264]
 [0.08492264]
 [0.08492264]]
samples: 43500
pred_std: 2.2351742e-08

Process finished with exit code 0
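
The `pred_std` value in the log above is effectively zero, which is what identifies the collapse. A minimal sketch of an automated check along those lines follows; the function name `predictions_collapsed` and the tolerance are assumptions, while the sample values are copied from the two prediction printouts in this thread.

```python
import statistics

def predictions_collapsed(preds, tol=1e-6):
    """Return True when all predictions are (almost) the same value.

    `preds` is a list of single-element rows, mirroring the (n, 1)
    shape printed in the logs. `tol` is an arbitrary threshold.
    """
    flat = [float(row[0]) for row in preds]
    return statistics.pstdev(flat) < tol

# Values from the problematic run: every prediction is identical.
collapsed = [[0.08492264]] * 6

# Values from the maintainer's reference run: predictions vary.
healthy = [[-0.03231816], [0.08848949], [0.01077648],
           [0.02286335], [0.10745592], [0.06697173]]

print(predictions_collapsed(collapsed))
print(predictions_collapsed(healthy))
```

Running a check like this after each evaluation could help narrow down which runs (fresh or restored from a checkpoint) produce the constant output.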

MiquelFerriol commented on July 18, 2024

Some comments:

  • I see you are using TensorFlow 2.5. We tested it and it should work; however, could you try TF 2.4? There may be differences between those versions that cause problems.

  • Keep in mind that if you always use the same training directory, the saved model will be loaded and training will never start from scratch. It is possible that the model is so overfitted (for some reason) that it always outputs the same result.
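
One way to rule out the second point is to give every run its own empty directory, so that no earlier checkpoint can be picked up. The sketch below uses only the Python standard library; `fresh_run_dir` is a hypothetical helper, and how the resulting path is passed to the baseline is not shown in this thread, so that part is left out.

```python
import os
import tempfile
from datetime import datetime

def fresh_run_dir(base_dir):
    """Create and return a new, empty subdirectory for one training run."""
    # The microsecond suffix keeps names unique across quick restarts.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    run_dir = os.path.join(base_dir, f"run-{stamp}")
    # exist_ok=False raises if the directory already exists, so a run
    # can never silently reuse old checkpoints.
    os.makedirs(run_dir, exist_ok=False)
    return run_dir

# A temporary base directory is used here only for illustration.
base = tempfile.mkdtemp()
run_dir = fresh_run_dir(base)
print(run_dir)
```

Pointing the checkpoint location at such a directory forces training from scratch, which makes it easier to tell whether the constant predictions come from a restored model.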

Let's see if we fix this!

diegocervera commented on July 18, 2024

I'm using TF 2.4 now, and so far I haven't seen the issue again. I'll let you know if it happens again. Thanks!

diegocervera commented on July 18, 2024

It happened again, this time when continuing to train a saved model.
