
Comments (16)

mthiboust commented on May 18, 2024

Wonderful! Thanks a lot @Star9daisy for pointing this out!

I am still curious to understand why the slow training only happens on my specific CPU (it works fine on other CPUs and on my GPU). Could it be because of hardware-dependent JIT or graph-building optimizations?


mthiboust commented on May 18, 2024

I can make it work with a basic class instance without subclassing keras.losses.Loss:

from keras import ops

class QuantileLoss:
    def __init__(self, quantile: float = 0.5):
        self.quantile = quantile

    def __call__(self, y_true, y_pred):
        error = y_pred - y_true
        loss = ops.maximum((self.quantile * error), (self.quantile - 1) * error)
        return ops.mean(loss)

model.compile(loss=QuantileLoss(quantile=0.5))

Is this the way to go since Keras 3?


fchollet commented on May 18, 2024

The code looks fine; what error do you encounter?


mthiboust commented on May 18, 2024

My code is run by a JupyterLab server (using the latest official Docker images jupyter/tensorflow-notebook and jupyter/pytorch-notebook from jupyter/docker-stacks), and I connect to it via the vscode-jupyter extension.

The crash is caused by the model.fit() call. It happens within a few seconds when using the torch backend, and a bit later with the tensorflow backend (after a few epochs). But there is no explicit error message I can share with you.

According to this link, the root cause could be a buggy installation of TensorFlow/PyTorch due to mixing pip and conda packages (the official Jupyter image installs TensorFlow via pip while the other packages are installed via mamba/conda).


mthiboust commented on May 18, 2024

I reproduced the bug with the latest official tensorflow/tensorflow Docker image, using the following code:

Run the official image:

docker run -it --rm tensorflow/tensorflow bash

Install pandas, then copy and run the Python code:

apt-get update && apt-get install vim
pip install pandas
vim test.py # and then copy and save the code below
python test.py

Python code:

import numpy as np
import pandas as pd

from keras.layers import Dense, Input
from keras.models import Model
from keras.losses import Loss
from keras import ops

class QuantileLoss(Loss):
    def __init__(
        self,
        name: str = "quantile",
        quantile: float = 0.5,
        reduction="sum_over_batch_size",
    ) -> None:
        super().__init__(name=name, reduction=reduction)
        self.quantile = quantile

    def call(self, y_true, y_pred):
        error = y_pred - y_true
        loss = ops.maximum((self.quantile * error), (self.quantile - 1) * error)
        return ops.mean(loss)


X = np.random.random((100000, 100))
y = pd.Series(np.random.random((100000,)))

features = Input(shape=(X.shape[1],))
layers = Dense(200, activation="relu")(features)
labels = Dense(1, activation=None)(layers)

model = Model(features, labels)

model.compile(optimizer="adam", loss=QuantileLoss(quantile=0.5))

model.fit(
    X,
    y.to_numpy(), # Working well with just `y`
    verbose=True,
    epochs=50,
    batch_size=10000,
)

Training time and memory usage are very different depending on the type of the y target:

  1. pd.Series: I get 8 ms/step during training
  2. np.ndarray: I get 600 ms/step with high memory usage (which crashes/freezes my laptop)

My code runs on CPU (i7-9750H) / Ubuntu 23.10 / Docker 24.0.5 with Keras 3.0.5 and Tensorflow 2.16.1


benz0li commented on May 18, 2024

I cannot reproduce with image glcr.b-data.ch/jupyterlab/cuda/python/scipy:3.12.3 (Container: CUDA 12.4.1 + Python 3.12.3).

Cross reference:

Code run on CPU (Intel(R) Xeon(R) Silver 4210R) / GPU (Quadro RTX 4000, Compute Capability 7.5) / Ubuntu 22.04 (Container) with Keras 3.3.3, Numpy 1.26.4 and Tensorflow 2.16.1.


mthiboust commented on May 18, 2024

This strange behavior may be CPU-specific. Could you reproduce the bug using only the CPU without CUDA?


benz0li commented on May 18, 2024

This strange behavior may be CPU-specific. Could you reproduce the bug using only the CPU without CUDA?

No. I cannot reproduce with image glcr.b-data.ch/jupyterlab/python/scipy:3.12.3 (Container: Python 3.12.3) on Debian 12 (bookworm) using Docker 26.1.0 either:

Cross reference:

Code run on CPU (Intel(R) Xeon(R) Silver 4210R) / Ubuntu 22.04 (Container) with Keras 3.3.3, Numpy 1.26.4 and Tensorflow 2.16.1.


mthiboust commented on May 18, 2024

Thanks @benz0li for testing it!

@sachinprasadhs: Now that we know this issue is not easily reproducible, is there something else I should look at and/or test to better diagnose it?


benz0li commented on May 18, 2024

Thanks @benz0li for testing it!

P.S.: On my machine, I cannot reproduce the bug with the latest tensorflow/tensorflow (using CPU) either.


benz0li commented on May 18, 2024

is there something else I should look at and/or test to better diagnose the issue?

Yes: the output of python test.py, i.e. the log files. (Optional: use the latest versions of Docker, NumPy, Keras and Pandas.)


mthiboust commented on May 18, 2024

I confirm that my issue happens on CPU with the latest versions of TensorFlow 2.16.1 / Keras 3.3.3 / NumPy 1.26.4 / Pandas 2.2.2. It only happens when using my CPU (it works fine on my GPU with the tensorflow/tensorflow:latest-gpu image).

Running test.py with a np.ndarray for the y target:

root@0a0414c2c84b:/# python test.py
2024-05-07 22:02:20.541375: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Epoch 1/50
2024-05-07 22:02:23.140812: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
2024-05-07 22:02:23.300856: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
2024-05-07 22:02:23.451243: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
2024-05-07 22:02:23.637732: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
2024-05-07 22:02:23.816659: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 400000000 exceeds 10% of free system memory.
10/10 ━━━━━━━━━━━━━━━━━━━━ 8s 663ms/step - loss: 0.1569
Epoch 2/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 7s 645ms/step - loss: 0.1397
Epoch 3/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 7s 657ms/step - loss: 0.1344
Epoch 4/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 7s 663ms/step - loss: 0.1313
Epoch 5/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 7s 658ms/step - loss: 0.1293
[...]

Running test.py with a pd.Series for the y target:

root@0a0414c2c84b:/# python test.py
2024-05-07 22:04:24.869910: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Epoch 1/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.1597  
Epoch 2/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.1419 
Epoch 3/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.1360 
Epoch 4/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 0.1326 
Epoch 5/50
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.1298
[...]

My Docker version is 24.0.5. I haven't tested with the latest version of Docker, but I could try it next week if necessary.


Star9daisy commented on May 18, 2024

Hi @mthiboust, I tried to reproduce your issue but failed. However, I did find the solution to your problem -- it's all about the tensor shapes.

I print out the shapes of y_true, y_pred, and loss in QuantileLoss.call and compile the model with run_eagerly=True so that the intermediate outputs display correctly.
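
Roughly like this -- a minimal sketch based on the QuantileLoss subclass from your reproduction script (the exact prints in the notebook linked below may differ):

from keras import ops
from keras.losses import Loss

class QuantileLoss(Loss):
    def __init__(self, name="quantile", quantile=0.5, reduction="sum_over_batch_size"):
        super().__init__(name=name, reduction=reduction)
        self.quantile = quantile

    def call(self, y_true, y_pred):
        error = y_pred - y_true
        loss = ops.maximum((self.quantile * error), (self.quantile - 1) * error)
        # Print the shapes seen inside the loss; run_eagerly=True below makes
        # this run on every batch instead of only during graph tracing.
        print(y_true.shape, y_pred.shape, loss.shape)
        return ops.mean(loss)

model.compile(optimizer="adam", loss=QuantileLoss(quantile=0.5), run_eagerly=True)

Each line below shows the shapes of y_true, y_pred, and loss; the three blocks correspond to the subclassed Loss ([#1]), a plain class ([#2]), and a plain function ([#3]) version of the loss.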

y is a numpy array of shape (100000,):

[#1 QuantileLoss(Loss)]
(10000,) (10000, 1) (10000, 10000)

[#2 QuantileLoss]
(10000, 1) (10000, 1) (10000, 1)

[#3 quantile_loss_fn]
(10000, 1) (10000, 1) (10000, 1)

y is a pandas series of shape (100000,):

[#1 QuantileLoss(Loss)]
(10000, 1) (10000, 1) (10000, 1)

[#2 QuantileLoss]
(10000, 1) (10000, 1) (10000, 1)

[#3 quantile_loss_fn]
(10000, 1) (10000, 1) (10000, 1)

Now we can see that the only "strange" case happens when you subtract y_true from y_pred in the custom loss that subclasses keras.losses.Loss -- broadcasting (10000,) against (10000, 1) yields a (10000, 10000) tensor. This is why the training process is much slower, and I guess it is this large tensor that crashed your Jupyter kernel.
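
You can see the same blow-up with plain NumPy, independent of Keras (a quick sketch of the broadcasting at work):

import numpy as np

# A (10000, 1) array minus a (10000,) array broadcasts to a full (10000, 10000) matrix
print((np.zeros((10000, 1)) - np.zeros((10000,))).shape)  # (10000, 10000)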

So it is now clear that all you need to do is ensure y_true has the same rank as y_pred. You could use ops.squeeze: error = ops.squeeze(y_pred) - ops.squeeze(y_true). This way the speed will be the same in both cases.
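
For example, a minimal sketch of the fixed call method, assuming the QuantileLoss subclass from your reproduction script:

    def call(self, y_true, y_pred):
        # Squeeze both tensors so a (10000,) target and a (10000, 1) prediction
        # end up with matching shapes instead of broadcasting to (10000, 10000)
        error = ops.squeeze(y_pred) - ops.squeeze(y_true)
        loss = ops.maximum((self.quantile * error), (self.quantile - 1) * error)
        return ops.mean(loss)

Squeezing both sides keeps the shapes aligned whether the target arrives as (batch,) or (batch, 1).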

Check out this colab notebook here:
https://colab.research.google.com/drive/1tMDC0repnsJ8Z3R_sn-Ef1MCxkduBrhJ?usp=sharing



Star9daisy commented on May 18, 2024

@mthiboust My pleasure to see that it solved your problem 😄.

By the way, I don't quite understand why the slow training would only happen on your CPU... I've tried it on Colab and on my local servers, and all of them show that the (10000, 10000) case is slower than the (10000, 1) case. Since the former is a much larger array, it is reasonable that your CPU takes longer to work on it.


mthiboust commented on May 18, 2024

Ah, I may have misunderstood what you meant by "I tried to reproduce your issue but failed" in your previous message. It makes sense if you also see the slowdown.

