Wonderful! Thanks a lot @Star9daisy for pointing this out!
I am still curious to understand why the slow training only happens on my specific CPU (working well on other CPUs or on my GPU). Could it be because of hardware-dependent JIT or graph-building optimizations?
from keras.
I can make it work with a plain class instance, without subclassing keras.losses.Loss:
from keras import ops


class QuantileLoss:
    def __init__(self, quantile: float = 0.5):
        self.quantile = quantile

    def __call__(self, y_true, y_pred):
        error = y_pred - y_true
        loss = ops.maximum((self.quantile * error), (self.quantile - 1) * error)
        return ops.mean(loss)


model.compile(loss=QuantileLoss(quantile=0.5))
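For reference, the quantity both QuantileLoss versions compute is the pinball (quantile) loss. Below is a minimal NumPy-only sketch; the helper name pinball_loss and its explicit shape check are illustrative additions, not part of Keras. It uses the conventional sign error = y_true - y_pred, so quantile weights under-prediction; the classes in this thread use y_pred - y_true, which swaps the two weights.

```python
import numpy as np


def pinball_loss(y_true, y_pred, quantile=0.5):
    """Mean pinball (quantile) loss. Hypothetical helper for illustration."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    # Refuse mismatched shapes instead of silently broadcasting them
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    error = y_true - y_pred  # positive when the model under-predicts
    return float(np.mean(np.maximum(quantile * error, (quantile - 1) * error)))


y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
print(pinball_loss(y_true, y_pred, quantile=0.5))  # 0.25, i.e. half the MAE
print(pinball_loss(y_true, y_pred, quantile=0.9))  # under-prediction now costs more
```

At quantile=0.5 the pinball loss is exactly half the mean absolute error, so the sign convention makes no difference for the example in this thread.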
Is this the way to go since Keras 3?
The code looks fine, what is the error you encounter?
My code is run by a JupyterLab server (using the latest official Docker images jupyter/tensorflow-notebook and jupyter/pytorch-notebook from jupyter/docker-stacks), and I connect to it via the vscode-jupyter extension.
The crash is caused by the model.fit() call. It happens within a few seconds when using the torch backend, and a bit later with the tensorflow backend (after a few epochs). But there is no explicit error message I can share with you.
According to this link, the root cause could be a buggy installation of TensorFlow/PyTorch due to mixing pip and conda packages (the official Jupyter image installs TensorFlow via pip, while the other packages are installed via mamba/conda).
I reproduced the bug with the latest official tensorflow/tensorflow Docker image and the following code:
Run the official image:
docker run -it --rm tensorflow/tensorflow bash
Install pandas, then copy and run the Python code:
apt-get update && apt-get install vim
pip install pandas
vim test.py # and then copy and save the code below
python test.py
Python code:
import numpy as np
import pandas as pd
from keras.layers import Dense, Input
from keras.models import Model
from keras.losses import Loss
from keras import ops


class QuantileLoss(Loss):
    def __init__(
        self,
        name: str = "quantile",
        quantile: float = 0.5,
        reduction="sum_over_batch_size",
    ) -> None:
        super().__init__(name=name, reduction=reduction)
        self.quantile = quantile

    def call(self, y_true, y_pred):
        error = y_pred - y_true
        loss = ops.maximum((self.quantile * error), (self.quantile - 1) * error)
        return ops.mean(loss)


X = np.random.random((100000, 100))
y = pd.Series(np.random.random((100000,)))

features = Input(shape=(X.shape[1],))
layers = Dense(200, activation="relu")(features)
labels = Dense(1, activation=None)(layers)
model = Model(features, labels)
model.compile(optimizer="adam", loss=QuantileLoss(quantile=0.5))
model.fit(
    X,
    y.to_numpy(),  # Working well with just `y`
    verbose=True,
    epochs=50,
    batch_size=10000,
)
Training time and memory usage differ greatly depending on the type of the y target:
- pd.Series: I get 8 ms/step during training
- np.ndarray: I get 600 ms/step, with high memory usage (which crashes/freezes my laptop)
My code runs on CPU (i7-9750H) / Ubuntu 23.10 / Docker 24.0.5 with Keras 3.0.5 and TensorFlow 2.16.1.
I cannot reproduce with image glcr.b-data.ch/jupyterlab/cuda/python/scipy:3.12.3 (Container: CUDA 12.4.1 + Python 3.12.3).
Cross reference:
Code run on CPU (Intel(R) Xeon(R) Silver 4210R) / GPU (Quadro RTX 4000, Compute Capability 7.5) / Ubuntu 22.04 (Container) with Keras 3.3.3, NumPy 1.26.4 and TensorFlow 2.16.1.
This strange behavior may be CPU-specific. Could you reproduce the bug using only the CPU without CUDA?
> This strange behavior may be CPU-specific. Could you reproduce the bug using only the CPU without CUDA?
No. I cannot reproduce with image glcr.b-data.ch/jupyterlab/python/scipy:3.12.3 (Container: Python 3.12.3) on Debian 12 (bookworm) using Docker 26.1.0 either:
Cross reference:
Code run on CPU (Intel(R) Xeon(R) Silver 4210R) / Ubuntu 22.04 (Container) with Keras 3.3.3, NumPy 1.26.4 and TensorFlow 2.16.1.
Thanks @benz0li for testing it!
@sachinprasadhs: Now that we know that this issue is not easily reproducible, is there something else I should look at and/or test to better diagnose the issue?
> Thanks @benz0li for testing it!
P.S.: On my machine, I cannot reproduce the bug with the latest tensorflow/tensorflow image (using the CPU) either.
> is there something else I should look at and/or test to better diagnose the issue?
Yes: the output of python test.py, i.e. the log files. (Optional: use the latest versions of Docker, NumPy, Keras and pandas.)
I confirm that my issue happens on CPU with the latest versions of TensorFlow 2.16.1 / Keras 3.3.3 / NumPy 1.26.4 / pandas 2.2.2. It only happens when using my CPU (it works well on my GPU with the tensorflow/tensorflow:latest-gpu image).
Running test.py with a np.ndarray for the y target:
root@0a0414c2c84b:/# python test.py
2024-05-07 22:02:20.541375: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Epoch 1/50
2024-05-07 22:02:23.140812: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
2024-05-07 22:02:23.300856: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
2024-05-07 22:02:23.451243: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
2024-05-07 22:02:23.637732: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
2024-05-07 22:02:23.816659: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
10/10 ━━━━━━━━━━━━━━━━━━━━ 8s 663ms/step - loss: 0.1569
Epoch 2/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 7s 645ms/step - loss: 0.1397
Epoch 3/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 7s 657ms/step - loss: 0.1344
Epoch 4/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 7s 663ms/step - loss: 0.1313
Epoch 5/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 7s 658ms/step - loss: 0.1293
[...]
Running test.py with a pd.Series for the y target:
root@0a0414c2c84b:/# python test.py
2024-05-07 22:04:24.869910: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Epoch 1/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.1597
Epoch 2/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.1419
Epoch 3/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.1360
Epoch 4/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 0.1326
Epoch 5/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.1298
[...]
My Docker version is 24.0.5. I haven't tested it with the latest version of Docker, but I could try next week if necessary.
Hi @mthiboust, I tried to reproduce your issue but failed. However, I did find the solution to your problem: it's all about the tensor shape.
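I printed the shapes of y_true, y_pred and loss in QuantileLoss.call for three versions of the loss (#1 the keras.losses.Loss subclass, #2 the plain class, #3 a plain function), and compiled the model with run_eagerly=True so that the intermediate output is printed correctly: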
y is a numpy array of shape (100000,):
[#1 QuantileLoss(Loss)]
(10000,) (10000, 1) (10000, 10000)
[#2 QuantileLoss]
(10000, 1) (10000, 1) (10000, 1)
[#3 quantile_loss_fn]
(10000, 1) (10000, 1) (10000, 1)
y is a pandas series of shape (100000,):
[#1 QuantileLoss(Loss)]
(10000, 1) (10000, 1) (10000, 1)
[#2 QuantileLoss]
(10000, 1) (10000, 1) (10000, 1)
[#3 quantile_loss_fn]
(10000, 1) (10000, 1) (10000, 1)
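The (10000,) versus (10000, 1) pattern in case #1 is ordinary NumPy-style broadcasting, which keras.ops follows here as the printed shapes show. A small NumPy sketch (a batch of 4 instead of 10000, with values chosen for illustration) shows both the shape blow-up and the fact that the broadcast error is a different quantity, not just a slower one:

```python
import numpy as np

batch = 4  # small stand-in for batch_size=10000
y_true_1d = np.arange(batch, dtype=np.float32)  # like the NumPy target: shape (batch,)
y_true_2d = y_true_1d.reshape(-1, 1)            # same values with a trailing axis
y_pred = y_true_2d.copy()                        # Dense(1)-style output, here perfect predictions

err_broadcast = y_pred - y_true_1d  # (4, 1) - (4,) -> (4, 4): every pred minus every target
err_aligned = y_pred - y_true_2d    # (4, 1) - (4, 1) -> (4, 1): element-wise, as intended

print(err_broadcast.shape, err_aligned.shape)  # (4, 4) (4, 1)
print(float(np.abs(err_broadcast).mean()))     # 1.25, despite perfect predictions
print(float(np.abs(err_aligned).mean()))       # 0.0
```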
Now we can see that the only "strange" case happens when y_true is subtracted from y_pred in the custom loss that subclasses keras.losses.Loss: (10000, 1) - (10000,) broadcasts to (10000, 10000). This is why the training process is much slower, and I guess this large tensor is probably what crashed your Jupyter kernel.
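The size of that tensor also lines up with the allocation warnings in the np.ndarray log. Assuming the default float32 dtype (4 bytes per element), one (10000, 10000) intermediate compared with the intended (10000, 1) one takes:

```latex
10\,000 \times 10\,000 \times 4\ \text{bytes} = 4 \times 10^{8}\ \text{bytes} \approx 381\ \text{MiB},
\qquad
10\,000 \times 1 \times 4\ \text{bytes} = 4 \times 10^{4}\ \text{bytes}.
```

4 × 10^8 is exactly the size in the repeated "Allocation of 400000000 exceeds 10% of free system memory" warnings from cpu_allocator_impl.cc.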
So all you need to do is ensure that y_true has the same rank as y_pred. For example, you could use ops.squeeze: error = ops.squeeze(y_pred) - ops.squeeze(y_true). This way the speed will be the same.
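A NumPy-only sketch of the suggested fix, with np.squeeze standing in for ops.squeeze and the thread's y_pred - y_true sign kept. It also shows an alternative that reshapes y_true to the shape of y_pred instead. The function names are hypothetical, and keras.ops.reshape / keras.ops.shape are assumed to be the drop-in equivalents inside a real Keras loss:

```python
import numpy as np


def quantile_loss_squeezed(y_true, y_pred, quantile=0.5):
    # Suggested fix: drop size-1 axes so both operands have shape (batch,)
    error = np.squeeze(y_pred) - np.squeeze(y_true)
    return float(np.mean(np.maximum(quantile * error, (quantile - 1) * error)))


def quantile_loss_reshaped(y_true, y_pred, quantile=0.5):
    # Alternative: give y_true exactly the shape of y_pred before subtracting
    error = y_pred - np.reshape(y_true, np.shape(y_pred))
    return float(np.mean(np.maximum(quantile * error, (quantile - 1) * error)))


rng = np.random.default_rng(0)
y_true = rng.random(8)       # shape (8,), like the NumPy target
y_pred = rng.random((8, 1))  # shape (8, 1), like the Dense(1) output

print((y_pred - y_true).shape)  # (8, 8): the accidental broadcast
print(quantile_loss_squeezed(y_true, y_pred))
print(quantile_loss_reshaped(y_true, y_pred))  # matches the squeezed version
```

The reshape variant assumes y_true has exactly as many elements as y_pred, which holds for a single-output regression like this one.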
Check out this colab notebook here:
https://colab.research.google.com/drive/1tMDC0repnsJ8Z3R_sn-Ef1MCxkduBrhJ?usp=sharing
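Alternatively, the mismatch could presumably be avoided on the data side by giving the NumPy target an explicit trailing axis before fitting, e.g. y.to_numpy().reshape(-1, 1) in test.py; this variant was not tried in the thread. A NumPy-only check of the shapes involved, using np.broadcast_shapes so the (10000, 10000) case is computed without allocating it:

```python
import numpy as np

n_samples, batch_size = 100_000, 10_000
y = np.random.random((n_samples,))  # same shape as y.to_numpy() in test.py
y_2d = y.reshape(-1, 1)             # (100000, 1): one column, like the Dense(1) output

pred_shape = (batch_size, 1)        # per-batch output of Dense(1)
print(np.broadcast_shapes(pred_shape, y[:batch_size].shape))     # (10000, 10000)
print(np.broadcast_shapes(pred_shape, y_2d[:batch_size].shape))  # (10000, 1)
```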
@mthiboust My pleasure, glad to see that it solved your problem 😄.
By the way, I don't quite understand why the slow training would only happen on your CPU... I've tried it on Colab and on my local servers, and all of them show that the (10000, 10000) case is slower than the (10000, 1) case. Since the former is a much larger array, it is reasonable that your CPU takes longer to process it.
Ah, I may have misunderstood what you meant by "I tried to reproduce your issue but failed" in your previous message. It makes sense if you also see the slowdown.