
tensorflow / recommenders-addons


Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.

License: Apache License 2.0

Shell 1.20% Python 25.62% Starlark 1.84% Smarty 2.26% C++ 17.68% Dockerfile 1.08% Cuda 46.49% C 3.83%
tensorflow-recommenders-addons sig-recommenders recommender-system tensorflow dynamic-embedding

recommenders-addons's Introduction

TensorFlow Recommenders Addons



TensorFlow Recommenders Addons (TFRA) is a collection of projects for large-scale recommendation systems built on TensorFlow. It introduces Dynamic Embedding Technology to TensorFlow, which makes TensorFlow better suited to training Search, Recommendation, and Advertising models and makes it easier to build, evaluate, and serve sophisticated recommender models. See the approved TensorFlow RFC #313. These contributions are complementary to TensorFlow Core, TensorFlow Recommenders, and related projects.

For Apple silicon (M1), please refer to Apple Silicon Support.

Main Features

  • Makes key-value data structures (dynamic embedding) trainable in TensorFlow
  • Gives better recommendation results than a static embedding mechanism, with no hash conflicts
  • Compatible with all native TensorFlow optimizers and initializers
  • Compatible with the native TensorFlow CheckPoint and SavedModel formats
  • Fully supports training and inference of recommender models on GPUs
  • Supports TF Serving and Triton Inference Server as inference frameworks
  • Supports various key-value implementations as dynamic embedding storage and is easy to extend
  • Supports half-synchronous training based on Horovod
    • Synchronous training for dense weights
    • Asynchronous training for sparse weights
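
To make the "no hash conflicts" point concrete, here is a minimal pure-Python sketch. It is not TFRA code: the bucket count, the ids, and the helper names are hypothetical. A static table folds ids into a fixed number of rows, so distinct ids can share a row, while a key-value table creates one entry per id.

```python
import random

NUM_BUCKETS = 8   # hypothetical size of a static embedding table
DIM = 4           # hypothetical embedding dimension


def new_vector(rng):
    """Return a small random embedding vector."""
    return [rng.uniform(-0.1, 0.1) for _ in range(DIM)]


def static_row(feature_id):
    """Static embedding: the id is folded into a fixed number of rows."""
    return feature_id % NUM_BUCKETS


def dynamic_lookup(table, feature_id, rng):
    """Key-value embedding: each unseen id gets its own entry on first use."""
    if feature_id not in table:
        table[feature_id] = new_vector(rng)
    return table[feature_id]


rng = random.Random(0)
ids = [3, 11, 19, 42]  # 3, 11 and 19 are all congruent to 3 modulo 8

static_rows = [static_row(i) for i in ids]
dynamic_table = {}
for i in ids:
    dynamic_lookup(dynamic_table, i, rng)

print("static rows used:", len(set(static_rows)))  # fewer rows than ids
print("dynamic entries:", len(dynamic_table))      # one entry per id
```

In the static case three of the four ids train the same row, which is the conflict that a key-value store avoids.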

Subpackages

Contributors

TensorFlow Recommenders-Addons depends on public contributions, bug fixes, and documentation. This project exists thanks to all the people and organizations who contribute. [Contribute]



Special thanks to the NVIDIA Merlin Team and the NVIDIA China DevTech Team, who have provided GPU acceleration technology support and code contributions.

Tutorials & Demos

See tutorials and demo for end-to-end examples of each subpackage.

Installation

Stable Builds

TensorFlow Recommenders-Addons is available on PyPI for Linux and macOS. To install the latest version, run the following:

pip install tensorflow-recommenders-addons

By default, the CPU version is installed. To install the GPU version, run the following:

pip install tensorflow-recommenders-addons-gpu

To use TensorFlow Recommenders-Addons:

import tensorflow as tf
import tensorflow_recommenders_addons as tfra

Compatibility with TensorFlow

TensorFlow C++ APIs are not stable, so we can only guarantee compatibility with the TensorFlow version that TensorFlow Recommenders-Addons (TFRA) was built against. TFRA may work with other versions of TensorFlow, but segmentation faults or other crashes are also possible. A warning is emitted if your TensorFlow version does not match the version TFRA was built against.

Additionally, TFRA custom ops registration does not have a stable ABI, so users need a compatible installation of TensorFlow even when the version matches the one we built against. In practice, TFRA custom ops work with pip-installed TensorFlow but will have issues when TensorFlow is compiled differently; conda-installed TensorFlow is a typical example. RFC #133 aims to fix this.
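
As a rough illustration of the version check described above, the following sketch compares an installed TensorFlow version string with the one a wheel was built against and warns on a mismatch. The function names are hypothetical and this is not TFRA's actual check; it only shows the idea.

```python
import re
import warnings


def parse_version(version):
    """Turn '2.8.3' or '2.8.3rc1' into (2, 8, 3), keeping only leading digits."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def check_tf_version(installed, built_against):
    """Return True on a match; otherwise warn and return False."""
    if parse_version(installed) == parse_version(built_against):
        return True
    warnings.warn(
        "TensorFlow %s is installed, but this build targets %s; "
        "custom ops may crash." % (installed, built_against),
        stacklevel=2,
    )
    return False


# Hypothetical values; in practice `installed` would come from tf.__version__.
matches = check_tf_version("2.8.3", "2.8.3")
print("versions match:", matches)
```

A mismatch only warns here, because a different version may still happen to work.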

Compatibility Matrix

GPU is supported by version 0.2.0 and later.

| TFRA | TensorFlow | Compiler | CUDA | CUDNN | Compute Capability | CPU |
|---|---|---|---|---|---|---|
| 0.6.0 | 2.8.3 | GCC 7.3.1 | 11.2 | 8.1 | 6.0, 6.1, 7.0, 7.5, 8.0, 8.6 | x86 |
| 0.6.0 | 2.6.0 | Xcode 13.1 | - | - | - | Apple M1 |
| 0.5.1 | 2.8.3 | GCC 7.3.1 | 11.2 | 8.1 | 6.0, 6.1, 7.0, 7.5, 8.0, 8.6 | x86 |
| 0.5.1 | 2.6.0 | Xcode 13.1 | - | - | - | Apple M1 |
| 0.5.0 | 2.8.3 | GCC 7.3.1 | 11.2 | 8.1 | 6.0, 6.1, 7.0, 7.5, 8.0, 8.6 | x86 |
| 0.5.0 | 2.6.0 | Xcode 13.1 | - | - | - | Apple M1 |
| 0.4.0 | 2.5.1 | GCC 7.3.1 | 11.2 | 8.1 | 6.0, 6.1, 7.0, 7.5, 8.0, 8.6 | x86 |
| 0.4.0 | 2.5.0 | Xcode 13.1 | - | - | - | Apple M1 |
| 0.3.1 | 2.5.1 | GCC 7.3.1 | 11.2 | 8.1 | 6.0, 6.1, 7.0, 7.5, 8.0, 8.6 | x86 |
| 0.2.0 | 2.4.1 | GCC 7.3.1 | 11.0 | 8.0 | 6.0, 6.1, 7.0, 7.5, 8.0 | x86 |
| 0.2.0 | 1.15.2 | GCC 7.3.1 | 10.0 | 7.6 | 6.0, 6.1, 7.0, 7.5 | x86 |
| 0.1.0 | 2.4.1 | GCC 7.3.1 | - | - | - | x86 |

Check nvidia-support-matrix for more details.
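
When pinning versions in a script, the matrix can be turned into a small lookup. The sketch below copies the TFRA, TensorFlow, and CPU columns from the matrix above; the list and function names are hypothetical and not part of TFRA.

```python
# (TFRA version, TensorFlow version, CPU) rows copied from the compatibility matrix.
COMPATIBILITY = [
    ("0.6.0", "2.8.3", "x86"),
    ("0.6.0", "2.6.0", "Apple M1"),
    ("0.5.1", "2.8.3", "x86"),
    ("0.5.1", "2.6.0", "Apple M1"),
    ("0.5.0", "2.8.3", "x86"),
    ("0.5.0", "2.6.0", "Apple M1"),
    ("0.4.0", "2.5.1", "x86"),
    ("0.4.0", "2.5.0", "Apple M1"),
    ("0.3.1", "2.5.1", "x86"),
    ("0.2.0", "2.4.1", "x86"),
    ("0.2.0", "1.15.2", "x86"),
    ("0.1.0", "2.4.1", "x86"),
]


def tensorflow_versions_for(tfra_version, cpu="x86"):
    """Return the TensorFlow versions listed for a TFRA release on a CPU type."""
    return [
        tf_version
        for tfra, tf_version, arch in COMPATIBILITY
        if tfra == tfra_version and arch == cpu
    ]


print(tensorflow_versions_for("0.6.0"))              # x86 build
print(tensorflow_versions_for("0.6.0", "Apple M1"))  # Apple silicon build
print(tensorflow_versions_for("0.2.0"))              # two rows exist for 0.2.0
```

An empty list means the matrix has no row for that combination.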

NOTICE

  • The release packages have a strict version binding relationship with TensorFlow.
  • Due to significant changes in the TensorFlow API, we can only ensure that version 0.2.0 is compatible with TF 1.15.2 on CPU and GPU. There are no official releases for that combination; you can only get it by compiling from source as follows:
PY_VERSION="3.9" \
TF_VERSION="2.15.1" \
TF_NEED_CUDA=1 \
sh .github/workflows/make_wheel_Linux_x86.sh

# .whl file will be created in ./wheelhouse/
  • If you need to work with TensorFlow 1.14.x or older, we suggest you give up, but this doc may help: Extract headers from TensorFlow compiling directory. At the same time, we have found that some ops used by TFRA perform better, so we highly recommend that you update TensorFlow to 2.x.

Installing from Source

For all developers, we recommend you use the development docker containers which are all GPU enabled:

docker pull tfra/dev_container:latest-tf2.15.1-python3.9  # Available tensorflow and python combinations can be found [here](https://www.tensorflow.org/install/source#linux)
docker run --privileged --gpus all -it --rm -v $(pwd):$(pwd) tfra/dev_container:latest-tf2.15.1-python3.9

CPU Only

You can also install from source. This requires the Bazel build system (version == 5.1.1). Please install TensorFlow on the build machine first: the build needs to know the TensorFlow version and locate its headers from the installed TensorFlow.

export TF_VERSION="2.15.1"  # "2.11.0" is well tested.
pip install tensorflow==$TF_VERSION

git clone https://github.com/tensorflow/recommenders-addons.git
cd recommenders-addons

# This script links project with TensorFlow dependency
python configure.py

bazel build --enable_runfiles build_pip_pkg
bazel-bin/build_pip_pkg artifacts

pip install artifacts/tensorflow_recommenders_addons-*.whl

GPU Support

Only TF_NEED_CUDA=1 is required and other environment variables are optional:

export TF_VERSION="2.15.1"  # "2.11.0" is well tested.
export PY_VERSION="3.9" 
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=12.2 # nvcc --version to check version
export TF_CUDNN_VERSION=8.9 # print("cuDNN version:", tf.sysconfig.get_build_info()["cudnn_version"])
export CUDA_TOOLKIT_PATH="/usr/local/cuda"
export CUDNN_INSTALL_PATH="/usr/lib/x86_64-linux-gnu"

python configure.py

And then build the pip package and install:

bazel build --enable_runfiles build_pip_pkg
bazel-bin/build_pip_pkg artifacts
pip install artifacts/tensorflow_recommenders_addons_gpu-*.whl

To run the unit tests:

cp -f ./bazel-bin/tensorflow_recommenders_addons/dynamic_embedding/core/*.so ./tensorflow_recommenders_addons/dynamic_embedding/core/
pip install pytest
python tensorflow_recommenders_addons/tests/run_all_test.py
# and run pytest such as
pytest -s tensorflow_recommenders_addons/dynamic_embedding/python/kernel_tests/hkv_hashtable_ops_test.py

Apple Silicon Support

Requirements:

  • macOS 12.0.0+
  • Python 3.9
  • tensorflow-macos 2.9.0
  • bazel 5.1.1

The natively supported TensorFlow is maintained by Apple. Please see the instructions in Get started with tensorflow-metal to install TensorFlow on Apple silicon devices.

Install TFRA on Apple Silicon via PIP

python -m pip install tensorflow-recommenders-addons --no-deps

Build TFRA on Apple Silicon from Source

# Install bazelisk
brew install bazelisk

# Build wheel from source
PY_VERSION=3.9.0 TF_VERSION=2.9.0 TF_NEED_CUDA="0" sh .github/workflows/make_wheel_macOS_arm64.sh

# Install the wheel
python -m pip install --no-deps ./artifacts/*.whl

Known Issues:

The Apple silicon version of TFRA doesn't support:

  • Data type float16
  • Synchronous training based on Horovod
  • save_to_file_system
  • load_from_file_system
  • warm_start_util

save_to_file_system and load_from_file_system are not supported because TFIO is not supported on Apple silicon devices. Horovod and warm_start_util are not supported because the natively supported tensorflow-macos doesn't support V1 TensorFlow networks.

These issues may be fixed in a future release.

Data Type Matrix for tfra.dynamic_embedding.Variable

| Values \ Keys | int64 | int32 | string |
|---|---|---|---|
| float | CPU, GPU | CPU, GPU | CPU |
| half | CPU, GPU | - | CPU |
| int32 | CPU, GPU | CPU | CPU |
| int8 | CPU, GPU | - | CPU |
| int64 | CPU | - | CPU |
| double | CPU, CPU | CPU | CPU |
| bool | - | - | CPU |
| string | CPU | - | - |

To use GPU by tfra.dynamic_embedding.Variable

tfra.dynamic_embedding.Variable ignores TensorFlow's device placement mechanism, so you should explicitly place it on GPUs through the devices argument:

import tensorflow as tf
import tensorflow_recommenders_addons as tfra

de = tfra.dynamic_embedding.get_variable("VariableOnGpu",
                                         devices=["/job:ps/task:0/GPU:0", ],
                                         # ...
                                         )

Usage restrictions on GPU

  • Only works on NVIDIA GPUs with CUDA compute capability 6.0 or higher.
  • To limit the size of the .whl file, dim currently must be less than or equal to 200. If you need a larger dim, please submit an issue.
  • Only the dynamic_embedding APIs and related ops support running on GPU.
  • Because GPU HashTables manage GPU memory independently, TensorFlow should be configured to allow GPU memory growth as follows:
sess_config.gpu_options.allow_growth = True
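
The line above uses a TF1-style session config. In TF 2.x code without a session, the same setting can be requested through `tf.config.experimental.set_memory_growth`. The helper below is a minimal sketch; its name is hypothetical, and the import is guarded so that it also runs where TensorFlow is not installed.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand.

    Returns the number of GPUs that were configured (0 if TensorFlow is
    not installed or no GPU is visible).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return 0  # TensorFlow is not available in this environment
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # This must run before any op has initialized the GPU.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


configured = enable_gpu_memory_growth()
print("GPUs configured for memory growth:", configured)
```

Call it once at program start, before building any model or table.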

Inference

With TensorFlow Serving

Compatibility Matrix

| TFRA | TensorFlow | Serving branch | Compiler | CUDA | CUDNN | Compute Capability |
|---|---|---|---|---|---|---|
| 0.6.0 | 2.8.3 | r2.8 | GCC 7.3.1 | 11.2 | 8.1 | 6.0, 6.1, 7.0, 7.5, 8.0, 8.6 |
| 0.5.1 | 2.8.3 | r2.8 | GCC 7.3.1 | 11.2 | 8.1 | 6.0, 6.1, 7.0, 7.5, 8.0, 8.6 |
| 0.5.0 | 2.8.3 | r2.8 | GCC 7.3.1 | 11.2 | 8.1 | 6.0, 6.1, 7.0, 7.5, 8.0, 8.6 |
| 0.4.0 | 2.5.1 | r2.5 | GCC 7.3.1 | 11.2 | 8.1 | 6.0, 6.1, 7.0, 7.5, 8.0, 8.6 |
| 0.3.1 | 2.5.1 | r2.5 | GCC 7.3.1 | 11.2 | 8.1 | 6.0, 6.1, 7.0, 7.5, 8.0, 8.6 |
| 0.2.0 | 2.4.1 | r2.4 | GCC 7.3.1 | 11.0 | 8.0 | 6.0, 6.1, 7.0, 7.5, 8.0 |
| 0.2.0 | 1.15.2 | r1.15 | GCC 7.3.1 | 10.0 | 7.6 | 6.0, 6.1, 7.0, 7.5 |
| 0.1.0 | 2.4.1 | r2.4 | GCC 7.3.1 | - | - | - |

Serve TFRA-enabled models through custom ops in TensorFlow Serving.

## Set this if GPU ops should be enabled
export SERVING_WITH_GPU=1

## Specify the TFRA branch
export TFRA_BRANCH="master" # `master` and `r0.6` are available.

## Create a workspace; change the directory as you prefer.
export TFRA_SERVING_WORKSPACE=~/tfra_serving_workspace/
mkdir -p $TFRA_SERVING_WORKSPACE && cd $TFRA_SERVING_WORKSPACE

## Clone the release branches of serving and TFRA according to `Compatibility Matrix`.
git clone -b r2.8 https://github.com/tensorflow/serving.git
git clone -b $TFRA_BRANCH https://github.com/tensorflow/recommenders-addons.git

## Run config shell script
cd $TFRA_SERVING_WORKSPACE/recommenders-addons/tools
bash config_tfserving.sh $TFRA_BRANCH $TFRA_SERVING_WORKSPACE/serving $SERVING_WITH_GPU

## Build serving with TFRA OPs.
cd $TFRA_SERVING_WORKSPACE/serving
./tools/run_in_docker.sh bazel build tensorflow_serving/model_servers:tensorflow_model_server

For more detail, please refer to the shell script ./tools/config_tfserving.sh.

NOTICE

With Triton

When building the custom operations shared library, it is important to use the same version of TensorFlow as is being used in Triton. You can find the TensorFlow version in the Triton Release Notes. A simple way to ensure you are using the correct version of TensorFlow is to use the NGC TensorFlow container corresponding to the Triton container. For example, if you are using the 22.05 version of Triton, use the 22.05 version of the TensorFlow container.

docker pull nvcr.io/nvidia/tritonserver:22.05-py3

export TFRA_BRANCH="master"
git clone -b $TFRA_BRANCH https://github.com/tensorflow/recommenders-addons.git
cd recommenders-addons

python configure.py
bazel build //tensorflow_recommenders_addons/dynamic_embedding/core:_cuckoo_hashtable_ops.so  # bazel 5.1.1 is well tested
mkdir /tmp/so
# You can also use the .so file from the pip-installed package at "(PYTHONPATH)/site-packages/tensorflow_recommenders_addons/dynamic_embedding/core/_cuckoo_hashtable_ops.so"
cp bazel-bin/tensorflow_recommenders_addons/dynamic_embedding/core/_cuckoo_hashtable_ops.so /tmp/so

# TFRA saved_model directory: "/models/model_repository"
docker run --net=host -v /models/model_repository:/models nvcr.io/nvidia/tritonserver:22.05-py3 bash -c \
  "LD_PRELOAD=/tmp/so/_cuckoo_hashtable_ops.so:${LD_PRELOAD} tritonserver --model-repository=/models/ --backend-config=tensorflow,version=2 --strict-model-config=false"

NOTICE

  • The LD_PRELOAD and backend-config settings above must be set because the default backend is tf1.

Community

Acknowledgment

We are very grateful to the maintainers of tensorflow/addons: we borrowed a lot of code from tensorflow/addons to build our workflow and documentation system. We also want to thank the Google team members who have helped with CI setup and reviews!

License

Apache License 2.0

recommenders-addons's People

Contributors

a6802739, acmore, alionkun, bashimao, candyzone, ccsquare, dakabang, fuhailin, funsimple, hxfxjun, jq, lifann, lingelin, lixiang-repo, luliyucoordinate, mofheka, mr-nineteen, nov11, nrailg, poinwater, pwzer, qqsun8819, rhdong, smilingday, thorneliu, thushv89, tracyxzh001, wangshengguang, xiangzez, yuanqingsunny

recommenders-addons's Issues

Problem with MirroredStrategy (Single Machine Multi GPU) on demo/movielens-100k-estimator

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    • CentOS 7
  • TensorFlow version and how it was installed (source or binary):
    • $ pip list | grep tensorflow
      tensorflow 2.4.1
      tensorflow-addons 0.13.0
      tensorflow-datasets 4.4.0
      tensorflow-estimator 2.4.0
      tensorflow-gpu 2.4.1
      tensorflow-metadata 1.2.0
      tensorflow-recommenders-addons 0.2.0
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary):
    • see above
  • Python version:
    • Python 3.6.8
  • Is GPU used? (yes/no):
    • yes

Describe the bug
Hi,

I was able to run the demo (demo/movielens-100k-estimator) as it is, with no distributed strategy.

However, when I tried to make it compatible with tf.distribute.MirroredStrategy() on GPU, the program would not run to completion.

Code to reproduce the issue
I changed only 3 places in the code:

  • the loss function (in func def model_fn()):
loss_obj = tf.keras.losses.MeanSquaredError(reduction=tf.keras.losses.Reduction.NONE)
loss = tf.reduce_sum(loss_obj(labels, predictions)) * (1. / GLOBAL_BATCH_SIZE)
# GLOBAL_BATCH_SIZE = 256
  • adding the distribute strategy in Estimator run_config (in func def train(model_dir, ps_num)):
def train(model_dir, ps_num):
  mirrored_strategy = tf.distribute.MirroredStrategy()
  run_config = tf.estimator.RunConfig(log_step_count_steps=100,
                                      save_summary_steps=100,
                                      save_checkpoints_steps=100,
                                      save_checkpoints_secs=None,
                                      keep_checkpoint_max=2,
                                      train_distribute=mirrored_strategy,
                                      eval_distribute=mirrored_strategy)
  • I removed the TF_CONFIG settings in main, as they are not needed with MirroredStrategy
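
For readers unfamiliar with the loss change in the first item, the arithmetic can be checked without TensorFlow. The sketch below uses a hypothetical batch of 8 examples on 2 replicas (the issue itself uses GLOBAL_BATCH_SIZE = 256): each replica sums its per-example losses and divides by the global batch size, so that adding the replica losses gives the global mean.

```python
GLOBAL_BATCH_SIZE = 8   # hypothetical; the issue uses 256
NUM_REPLICAS = 2        # hypothetical number of GPUs

labels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
predictions = [1.5, 2.0, 2.0, 4.0, 6.0, 6.0, 7.5, 9.0]

# Unreduced squared error per example, like Reduction.NONE.
per_example = [(y - p) ** 2 for y, p in zip(labels, predictions)]
global_mean = sum(per_example) / GLOBAL_BATCH_SIZE

# Each replica sees a slice of the batch but divides by the GLOBAL size.
per_replica = GLOBAL_BATCH_SIZE // NUM_REPLICAS
replica_losses = [
    sum(per_example[r * per_replica:(r + 1) * per_replica]) / GLOBAL_BATCH_SIZE
    for r in range(NUM_REPLICAS)
]

print("global mean squared error:", global_mean)
print("sum of replica losses:", sum(replica_losses))
```

Dividing by the per-replica batch size instead would make the summed loss NUM_REPLICAS times too large.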

The log shows a dimension mismatch error in the AddN op:
(from my own debugging, I guess the problem is in the optimizer's apply gradients step)

.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1856, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimension 0 in both shapes must be equal, but are 64 and 256. Shapes are [64,256] and [256,64].
        From merging shape 0 with other shapes. for '{{node AddN_2}} = AddN[N=2, T=DT_FLOAT](gradients/dense/MatMul_grad/tuple/control_dependency_1, replica_1/gradients/replica_1/dense_4/MatMul_grad/tuple/control_dependency_1)' with input shapes: [64,256], [256,64].


saved_model_cli run on the demo throws an error

I tried to test the model exported by the demo with saved_model_cli:

saved_model_cli run --dir ./export_dir/1625044085/ --tag_set serve --signature_def serving_default --input_exprs 'user_id=np.random.rand(1,100).astype(np.int);movie_id=np.random.rand(1,100).astype(np.int);user_rating=np.random.rand(1,100).astype(np.float)'

ValueError: Tensor embedding_lookup/TrainableWrapper is not found in /data/increment_model checkpoint

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • TensorFlow version and how it was installed (source or binary): docker img 2.4.1 tensorflow-gpu
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): binary
  • Python version: 3.6.9
  • Is GPU used? (yes/no): yes

Describe the bug

I was training in incremental mode. The first training did not report an error. During the second training, an error was reported when the checkpoint of the model was loaded.

Traceback (most recent call last):
File "./main.py", line 29, in
tf.app.run(main)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "./main.py", line 19, in main
model.train()
File "/**************/model.py", line ***, in train
self.estimator.train(self.train_data, steps=self.training_steps)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 349, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1175, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1208, in _train_model_default
saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1388, in _train_with_estimator_spec
tf.compat.v1.train.warm_start(*self._warm_start_settings)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/warm_starting_util.py", line 532, in warm_start
checkpoint_utils.init_from_checkpoint(ckpt_to_initialize_from, vocabless_vars)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 306, in init_from_checkpoint
init_from_checkpoint_fn)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 2941, in merge_call
return self._merge_call(merge_fn, args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 2948, in _merge_call
return merge_fn(self._strategy, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 572, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 301, in
ckpt_dir_or_file, assignment_map)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 334, in _init_from_checkpoint
tensor_name_in_ckpt, ckpt_dir_or_file, variable_map
ValueError: Tensor embedding_lookup/TrainableWrapper is not found in /data/increment_model checkpoint

EmbeddingVariable support shard

Recommendation models are usually very big, say 1T. A single embedding variable is too small to hold that many parameters, so it would be a good idea to extend the embedding variable to a partitioned variable.

https://github.com/tensorflow/recommenders-addons/blob/master/tensorflow_recommenders_addons/embedding_variable/python/ops/embedding_variable_ops.py
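
To illustrate what partitioning could look like, here is a minimal pure-Python sketch that routes each key to a shard with a modulo. The class and its methods are hypothetical; this is not the EmbeddingVariable API, only the idea behind the request.

```python
class ShardedEmbedding:
    """Toy key-value embedding split across several shards (hypothetical)."""

    def __init__(self, num_shards, dim):
        self.num_shards = num_shards
        self.dim = dim
        # In a real system each shard would live on a different device or server.
        self.shards = [{} for _ in range(num_shards)]

    def shard_of(self, key):
        """Pick the shard that owns this key."""
        return key % self.num_shards

    def lookup(self, key):
        """Return the vector for a key, creating a zero vector on first use."""
        shard = self.shards[self.shard_of(key)]
        if key not in shard:
            shard[key] = [0.0] * self.dim
        return shard[key]


table = ShardedEmbedding(num_shards=4, dim=8)
for key in range(10):
    table.lookup(key)

sizes = [len(shard) for shard in table.shards]
print("entries per shard:", sizes)
```

Each shard holds only a fraction of the keys, so no single variable has to hold the whole table.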

Relevant information

  • Are you willing to contribute it (no):
  • Are you willing to maintain it going forward? (yes):
  • Is there a relevant academic paper? (no, but the tf code itself can serve as a good example):
  • Is there already an implementation in another framework? (no):
  • Was it part of tf.contrib? (no, just in TFRA):

Which API type would this fall under (layer, metric, optimizer, etc.)

Who will benefit with this feature?
Community users will benefit from it.

Any other info.

[BUG] Saving a model in eager mode raises "ValueError: 'Tensor("checkpoint_key:0", shape=(), dtype=string)_table_restore' is not a valid scope name"

Steps to reproduce:
conda env:

  1. Python version: 3
  2. TF-related installed libraries:
tensorflow                     2.4.1
tensorflow-datasets            4.2.0
tensorflow-estimator           2.4.0
tensorflow-metadata            0.29.0
tensorflow-recommenders-addons 0.1.0
  3. Test code:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense

import tensorflow_datasets as tfds
import tensorflow_recommenders_addons as tfra
import tensorflow_recommenders_addons.dynamic_embedding as dynamic_embedding

ratings = tfds.load("movielens/100k-ratings", split="train")

ratings = ratings.map(lambda x: {
    "movie_id": tf.strings.to_number(x["movie_id"], tf.int64),
    "user_id": tf.strings.to_number(x["user_id"], tf.int64),
    "user_rating": x["user_rating"]
})

tf.random.set_seed(2021)
shuffled = ratings.shuffle(100_000, seed=2021, reshuffle_each_iteration=False)

dataset_train = shuffled.take(100_000).batch(256)


class NCFModel(tf.keras.Model):

    def __init__(self):
        super(NCFModel, self).__init__()
        self.embedding_size = 32
        self.d0 = Dense(
            256,
            activation='relu',
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
        self.d1 = Dense(
            64,
            activation='relu',
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
        self.d2 = Dense(
            1,
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
        self.user_embeddings = tfra.dynamic_embedding.get_variable(
            name="user_dynamic_embeddings",
            dim=self.embedding_size,
            initializer=tf.keras.initializers.RandomNormal(-1, 1))
        self.movie_embeddings = tfra.dynamic_embedding.get_variable(
            name="moive_dynamic_embeddings",
            dim=self.embedding_size,
            initializer=tf.keras.initializers.RandomNormal(-1, 1))
        self.loss = tf.keras.losses.MeanSquaredError()

    def call(self, batch):
        movie_id = batch["movie_id"]
        user_id = batch["user_id"]
        rating = batch["user_rating"]

        user_id_val, user_id_idx = tf.unique(user_id)
        user_id_weights = tfra.dynamic_embedding.embedding_lookup(
            params=self.user_embeddings,
            ids=user_id_val,
            name="user-id-weights")
        user_id_weights = tf.gather(user_id_weights, user_id_idx)

        movie_id_val, movie_id_idx = tf.unique(movie_id)
        movie_id_weights = tfra.dynamic_embedding.embedding_lookup(
            params=self.movie_embeddings,
            ids=movie_id_val,
            name="movie-id-weights")
        movie_id_weights = tf.gather(movie_id_weights, movie_id_idx)

        embeddings = tf.concat([user_id_weights, movie_id_weights], axis=1)
        dnn = self.d0(embeddings)
        dnn = self.d1(dnn)
        dnn = self.d2(dnn)
        out = tf.reshape(dnn, shape=[-1])
        loss = self.loss(rating, out)
        return loss


model = NCFModel()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
optimizer = tfra.dynamic_embedding.DynamicEmbeddingOptimizer(optimizer)


def train(epoch=1):
    for i in range(epoch):
        total_loss = np.array([])
        for (_, batch) in enumerate(dataset_train):
            with tf.GradientTape() as tape:
                loss = model(batch)
                total_loss = np.append(total_loss, loss)
            grads = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
        print("epoch:", i, "mean_squared_error:", np.mean(total_loss))


if __name__ == "__main__":
    train(1)

    model.save("models/de")

  4. Error message:
Traceback (most recent call last):
  File "/Users/dxwang/repo/pycharm-repo/tensorflow-2.0-demo/com/opensource/demo/tfra/de-demo.py", line 100, in <module>
    model.save("models/de")
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 2002, in save
    signatures, options, save_traces)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/keras/saving/save.py", line 157, in save_model
    signatures, options, save_traces)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/save.py", line 89, in save
    save_lib.save(model, filepath, signatures, options)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/saved_model/save.py", line 1033, in save
    obj, signatures, options, meta_graph_def)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/saved_model/save.py", line 1198, in _build_meta_graph
    return _build_meta_graph_impl(obj, signatures, options, meta_graph_def)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/saved_model/save.py", line 1147, in _build_meta_graph_impl
    _ = _SaveableView(checkpoint_graph_view, options)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/saved_model/save.py", line 195, in __init__
    self._add_saveable_objects())
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/saved_model/save.py", line 254, in _add_saveable_objects
    saveable_map = saveable_object_util.trace_save_restore_functions(node)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 386, in trace_save_restore_functions
    saveable_factory, object_to_save)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 442, in _trace_save_and_restore_function
    concrete_restore_fn = restore_fn.get_concrete_function()
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 1299, in get_concrete_function
    concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 1205, in _get_concrete_function_garbage_collected
    self._initialize(args, kwargs, add_initializers_to=initializers)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
    *args, **kwds))
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/training/saving/saveable_object_util.py:439 restore_fn  *
        saveable.restore(restored_tensors, restored_shapes=None)
    /Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/cuckoo_hashtable_ops.py:305 restore  *
        with ops.name_scope("%s_table_restore" % self.table_name):
    /Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/framework/ops.py:6507 __enter__
        return self._name_scope.__enter__()
    /Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/contextlib.py:81 __enter__
        return next(self.gen)
    /Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/framework/ops.py:4246 name_scope
        raise ValueError("'%s' is not a valid scope name" % name)

    ValueError: 'Tensor("checkpoint_key:0", shape=(), dtype=string)_table_restore' is not a valid scope name
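
The message suggests that `self.table_name` was a symbolic Tensor rather than a Python string when the restore function was traced, so its repr was formatted into the scope name. The sketch below shows why such a name is rejected. The regular expression is an approximation of the character set TensorFlow accepts in scope names (an assumption; the exact pattern may differ between versions), and the helper name is hypothetical.

```python
import re

# Approximation (assumption) of the characters TensorFlow allows in a scope name.
VALID_SCOPE_NAME = re.compile(r"^[A-Za-z0-9_.\\/>-]*$")


def is_valid_scope_name(name):
    """Return True if every character of the name is in the allowed set."""
    return bool(VALID_SCOPE_NAME.match(name))


# A plain string table name produces an acceptable scope name.
good = "%s_table_restore" % "user_dynamic_embeddings"

# The repr of a symbolic tensor contains quotes, parentheses, spaces, and colons.
bad = "%s_table_restore" % 'Tensor("checkpoint_key:0", shape=(), dtype=string)'

print(good, "->", is_valid_scope_name(good))
print(bad, "->", is_valid_scope_name(bad))
```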

[Bug] Some Redis services do not support the dangerous "KEYS" command

Describe the bug

The "KEYS" command is used in the GetKeyBucketsAndOptimizerParamsWithName function of the DE Redis backend. "KEYS" can cause a lot of trouble, such as serious service delays, and "SCAN" would be a better choice. In fact, "KEYS" is banned in many companies' Redis services.
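
The difference can be sketched in Python. This is an illustration only, not the TFRA backend, which is written in C++. `scan_keys` walks the keyspace in small batches with a cursor, using the same `scan(cursor, match, count)` call shape that the redis-py client offers. `FakeRedis` and the key names are hypothetical stand-ins so the example runs without a server.

```python
import fnmatch


class FakeRedis:
    """Stand-in for a Redis client so the sketch runs without a server."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def scan(self, cursor=0, match=None, count=10):
        """Return (next_cursor, keys) for one batch; cursor 0 means done."""
        window = self._keys[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(self._keys):
            next_cursor = 0
        matched = [
            k for k in window
            if match is None or fnmatch.fnmatchcase(k, match)
        ]
        return next_cursor, matched


def scan_keys(client, pattern, count=100):
    """Collect matching keys with cursor-based SCAN instead of one blocking KEYS."""
    cursor = 0
    found = set()  # SCAN may return a key more than once
    while True:
        cursor, batch = client.scan(cursor=cursor, match=pattern, count=count)
        found.update(batch)
        if cursor == 0:
            break
    return sorted(found)


client = FakeRedis(
    ["user_emb_%d" % i for i in range(5)] + ["movie_emb_%d" % i for i in range(3)]
)
keys = scan_keys(client, "user_emb_*", count=2)
print(keys)
```

Each call touches only a small batch of keys, so the server can interleave other commands between batches.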

Other info / logs

Some users reported this problem to me.

BUG: embedding shape is unknown in custom Keras Model

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS Linux release 7.2
  • TensorFlow version and how it was installed (source or binary): 2.4.1 binary
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): 0.2.0. binary
  • Python version: 3.8.3
  • Is GPU used? (yes/no): no

Describe the bug

The model runs successfully when I set model.compile(run_eagerly=True).
But when I speed up my training step with model.compile(run_eagerly=False), the model raises an exception:

The last dimension of the inputs to Dense should be defined. Found None

The shape of user_id_weights, which is read from EmbeddingVariable, is unknown.
I think the lookup should return a tensor with a correct static shape.

Code to reproduce the issue

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense
import tensorflow_datasets as tfds
import tensorflow_recommenders_addons as tfra
import time

ratings = tfds.load("movielens/100k-ratings", split="train")

ratings = ratings.map(lambda x: {
    "movie_id": tf.strings.to_number(x["movie_id"], tf.int64),
    "user_id": tf.strings.to_number(x["user_id"], tf.int64),
    "user_rating": x["user_rating"]
})

tf.random.set_seed(2021)
shuffled = ratings.shuffle(10000, seed=2021, reshuffle_each_iteration=False)

dataset_train = shuffled.take(10000).batch(256)


class NCFModel(tf.keras.Model):

    def __init__(self):
        super(NCFModel, self).__init__()
        self.embedding_size = 32
        self.d0 = Dense(
            256,
            activation='relu',
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
        self.d1 = Dense(
            64,
            activation='relu',
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
        self.d2 = Dense(
            1,
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
        self.user_embeddings = tfra.embedding_variable.EmbeddingVariable(
            name="user_dynamic_embeddings",
            embedding_dim=self.embedding_size,
            initializer=tf.keras.initializers.RandomNormal(-1, 1),
            ktype=tf.int64)
        self.movie_embeddings = tfra.embedding_variable.EmbeddingVariable(
            name="moive_dynamic_embeddings",
            embedding_dim=self.embedding_size,
            initializer=tf.keras.initializers.RandomNormal(-1, 1),
            ktype=tf.int64)
        self.loss = tf.keras.losses.MeanSquaredError()

    def call(self, batch):
        movie_id = batch["movie_id"]
        user_id = batch["user_id"]
        rating = batch["user_rating"]

        user_id_val = tf.unique(user_id, out_idx=tf.int64)
        user_id_weights = tf.nn.embedding_lookup(
            params=self.user_embeddings,
            ids=user_id_val,
            name="user-id-weights")

        print('----'*10)
        print(user_id_weights.shape)
        print('----'*10)

        movie_id_val = tf.unique(movie_id, out_idx=tf.int64)
        movie_id_weights = tf.nn.embedding_lookup(
            params=self.movie_embeddings,
            ids=movie_id_val,
            name="movie-id-weights")

        embeddings = tf.concat([user_id_weights, movie_id_weights], axis=1)
        dnn = self.d0(embeddings)
        dnn = self.d1(dnn)
        dnn = self.d2(dnn)
        out = tf.reshape(dnn, shape=[-1])
        loss = self.loss(rating, out)
        return loss

    def train_step(self, data):
        with tf.GradientTape() as tape:
            loss = self(data)
        trainable_vars = self.trainable_variables
        grads = tape.gradient(loss, trainable_vars)
        self.optimizer.apply_gradients(zip(grads, trainable_vars))

        return {"loss": self.metrics_loss.result()}

optimizer = tfra.embedding_variable.AdagradOptimizer(learning_rate=0.001)

model = NCFModel()
model.compile(optimizer=optimizer, run_eagerly=False, loss=tf.keras.losses.MeanSquaredError())
model.fit(dataset_train)

Other info / logs

----------------------------------------
<unknown>
----------------------------------------


ValueError: in user code:

    /home/hdp-reader-st/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:805 train_function  *
        return step_function(self, iterator)
    <ipython-input-3-5cd1b54d36de>:52 call  *
        dnn = self.d0(embeddings)
    /home/hdp-reader-st/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py:1008 __call__  **
        self._maybe_build(inputs)
    /home/hdp-reader-st/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py:2710 _maybe_build
        self.build(input_shapes)  # pylint:disable=not-callable
    /home/hdp-reader-st/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/layers/core.py:1182 build
        raise ValueError('The last dimension of the inputs to `Dense` '

    ValueError: The last dimension of the inputs to `Dense` should be defined. Found `None`.

KeyError: "The name 'xxxxxxxx' refers to an Operation not in the graph."

Environment description:

  1. python version : 3
  2. TF related installation library:
tensorflow                     2.4.1
tensorflow-datasets            4.2.0
tensorflow-estimator           2.4.0
tensorflow-metadata            0.29.0
tensorflow-recommenders-addons 0.1.0
  3. Test model address: https://github.com/Mr-Nineteen/models/blob/main/1617949817.zip
  4. test code:
import tensorflow as tf
import tensorflow_recommenders_addons as tfra

m = tf.saved_model.load("1617949817")
  5. Error message:
Traceback (most recent call last):
  File "/Users/dxwang/repo/pycharm-repo/tensorflow-2.0-demo/com/opensource/demo/tftrt/demo.py", line 4, in <module>
    m = tf.saved_model.load("1617949817")
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py", line 859, in load
    return load_internal(export_dir, tags, options)["root"]
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py", line 909, in load_internal
    root = load_v1_in_v2.load(export_dir, tags)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 279, in load
    return loader.load(tags=tags)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 225, in load
    signature=[])
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/eager/wrap_function.py", line 628, in wrap_function
    collections={}),
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/eager/wrap_function.py", line 87, in __call__
    return self.call_with_variable_creator_scope(self._fn)(*args, **kwargs)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/eager/wrap_function.py", line 93, in wrapped
    return fn(*args, **kwargs)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 93, in load_graph
    meta_graph_def)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1485, in _import_meta_graph_with_return_elements
    **kwargs))
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py", line 886, in import_scoped_meta_graph_with_return_elements
    ops.prepend_name_scope(value, scope_to_prepend_to_names))
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3726, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/Users/dxwang/opt/anaconda3/envs/tfra/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3786, in _as_graph_element_locked
    "graph." % repr(name))
KeyError: "The name 'deep_dynamic_embeddings' refers to an Operation not in the graph."

Is there any user guide or example code for restrict_policies?

I'd like to use tfra.dynamic_embedding.FrequencyRestrictPolicy or tfra.dynamic_embedding.TimestampRestrictPolicy. Could you provide some example code showing how to use these two APIs? My other question: these APIs require num_reserved to be specified. Does that mean the model will always have the same number of parameters?
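For readers with the same question, the following is only a conceptual model in plain Python, not TFRA code and not the TFRA API. It assumes that a frequency-based policy counts how often each key is updated and that num_reserved is an upper bound applied whenever the restriction runs. Under that assumption the table can still grow between restrictions and is left untouched while it holds fewer than num_reserved keys. The class FrequencyRestrictSketch and its methods are hypothetical names.

```python
from collections import Counter


class FrequencyRestrictSketch:
    """Conceptual model of a frequency-based restrict policy (not TFRA code)."""

    def __init__(self):
        self.table = {}        # key -> embedding vector
        self.freq = Counter()  # key -> number of times the key was updated

    def update(self, keys, dim=2):
        for k in keys:
            self.table.setdefault(k, [0.0] * dim)  # unseen keys grow the table
            self.freq[k] += 1

    def restrict(self, num_reserved):
        """Keep at most `num_reserved` keys, dropping the least frequent first."""
        if len(self.table) <= num_reserved:
            return  # below the cap: nothing is removed
        keep = {k for k, _ in self.freq.most_common(num_reserved)}
        self.table = {k: v for k, v in self.table.items() if k in keep}
        self.freq = Counter({k: c for k, c in self.freq.items() if k in keep})


emb = FrequencyRestrictSketch()
emb.update([1, 1, 1, 2, 2, 3])  # key 1 seen 3 times, key 2 twice, key 3 once
emb.restrict(num_reserved=2)    # drops key 3, the least frequent
print(sorted(emb.table))
```

A timestamp-based variant would follow the same shape, replacing the counter with a last-update time per key and evicting the oldest keys first.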

can not load file "_ev_ops.so" on TF1.15.2

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Darwin MacBookPro 20.6.0 Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64 x86_64
  • TensorFlow version and how it was installed (source or binary):
    tf 1.15.2 install in conda by pip install
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary):
    tfra 0.2.0 install in conda by pip install
  • Python version:
    3.6.9
  • Is GPU used? (yes/no):
    no

Describe the bug

Installing tfra automatically upgrades TensorFlow to 2.4.1, but my code uses the TF 1.x API, so I reinstalled TensorFlow 1.15.2. When I then run the embedding_variable (EV) demo, an error is raised while tfra is being imported:

Traceback (most recent call last):
  File "/path/ev.py", line 5, in <module>
    import tensorflow_recommenders_addons as tfra
  File "/path/py3/lib/python3.6/site-packages/tensorflow_recommenders_addons/__init__.py", line 31, in <module>
    from tensorflow_recommenders_addons import embedding_variable
  File "/path/py3/lib/python3.6/site-packages/tensorflow_recommenders_addons/embedding_variable/__init__.py", line 1, in <module>
    from tensorflow_recommenders_addons.embedding_variable.python.ops.embedding_variable_ops import *
  File "/path/py3/lib/python3.6/site-packages/tensorflow_recommenders_addons/embedding_variable/python/__init__.py", line 55, in <module>
    gen_ev_ops = _load_library("../core/_ev_ops.so")
  File "/path/py3/lib/python3.6/site-packages/tensorflow_recommenders_addons/embedding_variable/python/__init__.py", line 52, in _load_library
    "{}, from paths: {}\ncaused by: {}".format(filename, filenames, errs))
NotImplementedError: unable to open file: ../core/_ev_ops.so, from paths: ['/path/py3/lib/python3.6/site-packages/tensorflow_recommenders_addons/embedding_variable/python/../core/_ev_ops.so']
caused by: ['dlopen(/path/py3/lib/python3.6/site-packages/tensorflow_recommenders_addons/embedding_variable/python/../core/_ev_ops.so, 6): Library not loaded: @rpath/libtensorflow_framework.2.dylib\n Referenced from: /path/py3/lib/python3.6/site-packages/tensorflow_recommenders_addons/embedding_variable/core/_ev_ops.so\n Reason: image not found']

Code to reproduce the issue
import tensorflow_recommenders_addons as tfra
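
The dlopen message in the traceback says that _ev_ops.so references libtensorflow_framework.2.dylib, which suggests the binary was built against a TensorFlow 2.x runtime. A small, hedged diagnostic sketch follows. It assumes the number in the library file name tracks the TensorFlow major version; the helper names required_framework_major and is_compatible are hypothetical and are not part of TFRA.

```python
import re


def required_framework_major(dlopen_error):
    """Extract N from 'libtensorflow_framework.N.dylib' or '.so.N', else None."""
    m = re.search(r"libtensorflow_framework\.(\d+)\.dylib", dlopen_error)
    if m is None:
        m = re.search(r"libtensorflow_framework\.so\.(\d+)", dlopen_error)
    return int(m.group(1)) if m else None


def is_compatible(dlopen_error, installed_tf_version):
    """True when the required framework major version matches the installed one."""
    required = required_framework_major(dlopen_error)
    installed_major = int(installed_tf_version.split(".")[0])
    # If no framework library is named in the message, nothing can be concluded.
    return required is None or required == installed_major


error = "Library not loaded: @rpath/libtensorflow_framework.2.dylib"
print(required_framework_major(error))   # major version the op library expects
print(is_compatible(error, "1.15.2"))    # installed TF 1.x against a 2.x binary
```

If the check reports a mismatch, the likely options are to install a TensorFlow version matching the prebuilt binary or to build TFRA from source against the installed TensorFlow.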


dynamic_embedding: distributed training and restore

1. Training mode (parameter server training):

devices = []
for ind in range(len(json.loads(os.environ['TF_CONFIG'])['cluster']['ps'])):
    devices.append("/job:ps/task:%s/cpu:0" % ind)

2. Restoring on a single local machine (psnum is the number of parameter servers):

devices = ["/job:localhost/task:0"] * psnum

In both cases, pass the device list when creating the variable:

embeddings = tfra.dynamic_embedding.get_variable(
    name=name,
    dim=dim,
    initializer=initializer,
    devices=devices,
)
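
The two device lists above can be wrapped in small helpers and checked without a cluster. This is a minimal sketch; the function names ps_devices and local_restore_devices and the host names in the sample TF_CONFIG are hypothetical. It only builds the device strings and does not call TFRA.

```python
import json


def ps_devices(tf_config_json):
    """One CPU device string per parameter server listed in TF_CONFIG."""
    cluster = json.loads(tf_config_json)["cluster"]
    return ["/job:ps/task:%d/cpu:0" % i for i in range(len(cluster["ps"]))]


def local_restore_devices(psnum):
    """Keep the same number of shards, but place them all on the local machine."""
    return ["/job:localhost/task:0"] * psnum


# Sample TF_CONFIG with two parameter servers (host names are made up).
tf_config = json.dumps({
    "cluster": {
        "ps": ["ps0.example:2222", "ps1.example:2222"],
        "worker": ["worker0.example:2222"],
    },
    "task": {"type": "worker", "index": 0},
})

train_devices = ps_devices(tf_config)
restore_devices = local_restore_devices(len(train_devices))
print(train_devices)
print(restore_devices)
```

Both lists have the same length, which matches the point of the snippet above: the local restore uses one entry per parameter server that was used during training.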

Bug: Dynamic embedding is not included in model.trainable_variables.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): TF on Yarn.
  • TensorFlow version and how it was installed (source or binary): 2.4.1
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary):binary
  • Python version: 3.6
  • Is GPU used? (yes/no):no

Describe the bug

Using example:
https://github.com/tensorflow/recommenders-addons/blob/master/docs/tutorials/dynamic_embedding_tutorial.ipynb

After running it, I printed all trainable_variables in the model:
self.user_embeddings and self.movie_embeddings are not included in model.trainable_variables, which means tape.gradient(loss, model.trainable_variables) will not compute gradients for these embeddings.

Code to reproduce the issue

As described above.

EmbeddingVariable doesn't work with tf.saved_model.load

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Manjaro
  • TensorFlow version and how it was installed (source or binary): binary
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): binary
  • Python version: 3.7.9
  • Is GPU used? (yes/no): no

Describe the bug

tf.saved_model.load fails to load a saved model containing EmbeddingVariable

Code to reproduce the issue

class MyModel(tf.keras.models.Model):
    def __init__(self):
        super().__init__()
        self.user_embeddings = tfra.embedding_variable.EmbeddingVariable(
            name="user_embeddings",
            ktype=tf.int32,
            embedding_dim=8,
            initializer=tf.keras.initializers.RandomNormal(-1, 1)
        )

model = MyModel()
# ... arbitrary model training code
model.save("export_path", include_optimizer=False) # ok
tf.saved_model.load("export_path") # error!

Other info / logs
Error message

---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py in load_internal(export_dir, tags, options, loader_cls, filters)
    889         loader = loader_cls(object_graph_proto, saved_model_proto, export_dir,
--> 890                             ckpt_options, filters)
    891       except errors.NotFoundError as err:

~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py in __init__(self, object_graph_proto, saved_model_proto, export_dir, ckpt_options, filters)
    160     self._load_all()
--> 161     self._restore_checkpoint()
    162 

~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py in _restore_checkpoint(self)
    486     else:
--> 487       load_status = saver.restore(variables_path, self._checkpoint_options)
    488     load_status.assert_existing_objects_matched()

~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py in restore(self, save_path, options)
   1336     base.CheckpointPosition(
-> 1337         checkpoint=checkpoint, proto_id=0).restore(self._graph_view.root)
   1338 

~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py in restore(self, trackable)
    252         # process deferred restorations for it and its dependencies.
--> 253         restore_ops = trackable._restore_from_checkpoint_position(self)  # pylint: disable=protected-access
    254         if restore_ops:

~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py in _restore_from_checkpoint_position(self, checkpoint_position)
    972         current_position.checkpoint.restore_saveables(
--> 973             tensor_saveables, python_saveables))
    974     return restore_ops

~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/training/tracking/util.py in restore_saveables(self, tensor_saveables, python_saveables)
    307       new_restore_ops = functional_saver.MultiDeviceSaver(
--> 308           validated_saveables).restore(self.save_path_tensor, self.options)
    309       if not context.executing_eagerly():

~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/training/saving/functional_saver.py in restore(self, file_prefix, options)
    344     else:
--> 345       restore_ops = restore_fn()
    346 

~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/training/saving/functional_saver.py in restore_fn()
    320         with ops.device(device):
--> 321           restore_ops.update(saver.restore(file_prefix, options))
    322 

~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/training/saving/functional_saver.py in restore(self, file_prefix, options)
    108       restored_tensors = io_ops.restore_v2(
--> 109           file_prefix, tensor_names, tensor_slices, tensor_dtypes)
    110     structured_restored_tensors = nest.pack_sequence_as(

~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/ops/gen_io_ops.py in restore_v2(prefix, tensor_names, shape_and_slices, dtypes, name)
   1498           prefix, tensor_names, shape_and_slices, dtypes=dtypes, name=name,
-> 1499           ctx=_ctx)
   1500     except _core._SymbolicException:

~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/ops/gen_io_ops.py in restore_v2_eager_fallback(prefix, tensor_names, shape_and_slices, dtypes, name, ctx)
   1536   _result = _execute.execute(b"RestoreV2", len(dtypes), inputs=_inputs_flat,
-> 1537                              attrs=_attrs, ctx=ctx, name=name)
   1538   if _execute.must_record_gradient():

~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:

NotFoundError: Key user_embeddings/.ATTRIBUTES/VARIABLE_VALUE not found in checkpoint [Op:RestoreV2]

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-150-bc5be88f8feb> in <module>
----> 1 tf.saved_model.load("export_path")

~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py in load(export_dir, tags, options)
    857     ValueError: If `tags` don't match a MetaGraph in the SavedModel.
    858   """
--> 859   return load_internal(export_dir, tags, options)["root"]
    860 
    861 

~/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py in load_internal(export_dir, tags, options, loader_cls, filters)
    891       except errors.NotFoundError as err:
    892         raise FileNotFoundError(
--> 893             str(err) + "\n If trying to load on a different device from the "
    894             "computational device, consider using setting the "
    895             "`experimental_io_device` option on tf.saved_model.LoadOptions "

FileNotFoundError: Key user_embeddings/.ATTRIBUTES/VARIABLE_VALUE not found in checkpoint [Op:RestoreV2]
 If trying to load on a different device from the computational device, consider using setting the `experimental_io_device` option on tf.saved_model.LoadOptions to the io_device such as '/job:localhost'.

2021-08-25T15:28:56.666 session_manager.py 436 INFO Waiting for model to be ready. Ready_for_local_init_op: Variables not initialized: deep_scope/embedding/item_deep/vector, deep_scope/embedding/cid_deep/vector, deep_scope/embedding/actor_deep/vector, ready: None

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):Ubuntu 18
  • TensorFlow version and how it was installed (source or binary): 1.15.2
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): source
  • Python version:3.6.9
  • Is GPU used? (yes/no):no

Describe the bug

The chief works normally, but the worker does not; it keeps logging:
2021-08-25T15:28:56.666 session_manager.py 436 INFO Waiting for model to be ready. Ready_for_local_init_op: Variables not initialized: deep_scope/embedding/item_deep/vector, deep_scope/embedding/cid_deep/vector, deep_scope/embedding/actor_deep/vector, ready: None

Code to reproduce the issue

        deep_emb = tfra.embedding_variable.EmbeddingVariable(
            name=deep_scope,
            ktype=tf.int32,
            embedding_dim=emb_dim,
            initializer=tf.keras.initializers.Zeros())

        opt = tf.train.AdagradOptimizer(lr, use_locking=True)


After changing the EV example to use multiple workers, only worker 0 can train; the other workers keep waiting for initialization and cannot train

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • TensorFlow version and how it was installed (source or binary):
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary):
  • Python version:
  • Is GPU used? (yes/no):

python ev-norm-ps.py --ps_hosts=127.0.0.1:9200 --worker_hosts=127.0.0.1:9100,127.0.0.1:9101 --task_index=0 --job_name=ps
python ev-norm-ps.py --ps_hosts=127.0.0.1:9200 --worker_hosts=127.0.0.1:9100,127.0.0.1:9101 --task_index=0 --job_name=worker
python ev-norm-ps.py --ps_hosts=127.0.0.1:9200 --worker_hosts=127.0.0.1:9100,127.0.0.1:9101 --task_index=1 --job_name=worker
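
For reference, the flags in these commands describe a cluster with one parameter server and two workers. The sketch below parses the same flags into a cluster dictionary. The script ev-norm-ps.py itself is not shown in this report, so the function parse_cluster is a hypothetical reconstruction. It also assumes the common TF 1.x convention that worker 0 acts as the chief that initializes variables while other workers wait, which is consistent with the log but not confirmed by the source.

```python
import argparse


def parse_cluster(argv):
    """Parse the launch flags into (cluster, job_name, task_index, is_chief)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--ps_hosts", required=True)
    parser.add_argument("--worker_hosts", required=True)
    parser.add_argument("--task_index", type=int, required=True)
    parser.add_argument("--job_name", choices=["ps", "worker"], required=True)
    args = parser.parse_args(argv)

    # This dictionary has the shape expected by tf.train.ClusterSpec.
    cluster = {
        "ps": args.ps_hosts.split(","),
        "worker": args.worker_hosts.split(","),
    }
    # Assumed convention: worker 0 is the chief and runs initialization.
    is_chief = args.job_name == "worker" and args.task_index == 0
    return cluster, args.job_name, args.task_index, is_chief


flags = [
    "--ps_hosts=127.0.0.1:9200",
    "--worker_hosts=127.0.0.1:9100,127.0.0.1:9101",
    "--task_index=1",
    "--job_name=worker",
]
print(parse_cluster(flags))
```

Under that assumption, the third command (task_index=1) starts a non-chief worker, which is the process that reports "Waiting for model to be ready".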

INFO:tensorflow:Waiting for model to be ready. Ready_for_local_init_op: Variables not initialized: var_dist, var_dist/Adagrad/EmbeddingVariable, ready: None
I0827 15:12:00.834718 4358843904 session_manager.py:436] Waiting for model to be ready. Ready_for_local_init_op: Variables not initialized: var_dist, var_dist/Adagrad/EmbeddingVariable, ready: None

@liutongxuan

Tutorial document "embedding_variable_tutorial.ipynb" raise an error

System information

  • OS Platform: Google Colab / macOS Big Sur 11.2.3
  • TensorFlow version and how it was installed (source or binary): 2.4.1 (Colab), 2.4.0-rc0 (0.a3)
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary):
    The Colab's addon was installed via !pip install -q --upgrade tensorflow-recommenders-addons
    My local system's addon was installed from the binary package.
  • Python version: 3.7.10 (Colab), 3.8.8 (Local)
  • Is GPU used? (yes/no):
    I've tried None, and GPU on Colab. I don't know if the binary one uses a GPU or not.

Describe the bug
The cell after "4. Train the model" raises an exception (the screenshot of the error was not preserved).

My local env: (three screenshots, not preserved)

FYI, I added a cell at the beginning of the ipynb file to install the addons:

!pip install -q --upgrade tensorflow-recommenders-addons
!pip install -q --upgrade tensorflow-datasets

The ipynb file with exception log:
embedding_variable_tutorial.ipynb

Other info / logs
Colab exception message:

InvalidArgumentError Traceback (most recent call last)
in ()
11
12 if __name__=="__main__":
---> 13 train(10)

10 frames
in train(epoch)
4 for (_, batch) in enumerate(dataset_train):
5 with tf.GradientTape() as tape:
----> 6 loss = model(batch)
7 total_loss = np.append(total_loss, loss)
8 grads = tape.gradient(loss, model.trainable_variables)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
1010 with autocast_variable.enable_auto_cast_variables(
1011 self._compute_dtype_object):
-> 1012 outputs = call_fn(inputs, *args, **kwargs)
1013
1014 if self._activity_regularizer:

in call(self, batch)
37 params=self.user_embeddings,
38 ids=user_id_val,
---> 39 name="user-id-weights")
40 user_id_weights = tf.gather(user_id_weights, user_id_idx)
41

/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
199 """Call target, and fall back on dispatchers if there is a TypeError."""
200 try:
--> 201 return target(*args, **kwargs)
202 except (TypeError, ValueError):
203 # Note: convert_to_eager_tensor currently raises a ValueError, not a

/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/embedding_ops.py in embedding_lookup_v2(params, ids, max_norm, name)
392 ValueError: If params is empty.
393 """
--> 394 return embedding_lookup(params, ids, "div", name, max_norm=max_norm)
395
396

/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
199 """Call target, and fall back on dispatchers if there is a TypeError."""
200 try:
--> 201 return target(*args, **kwargs)
202 except (TypeError, ValueError):
203 # Note: convert_to_eager_tensor currently raises a ValueError, not a

/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/embedding_ops.py in embedding_lookup(params, ids, partition_strategy, name, validate_indices, max_norm)
326 name=name,
327 max_norm=max_norm,
--> 328 transform_fn=None)
329
330

/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/embedding_ops.py in _embedding_lookup_and_transform(params, ids, partition_strategy, name, max_norm, transform_fn)
136 with ops.colocate_with(params[0]):
137 result = _clip(
--> 138 array_ops.gather(params[0], ids, name=name), ids, max_norm)
139 if transform_fn:
140 result = transform_fn(result)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
199 """Call target, and fall back on dispatchers if there is a TypeError."""
200 try:
--> 201 return target(*args, **kwargs)
202 except (TypeError, ValueError):
203 # Note: convert_to_eager_tensor currently raises a ValueError, not a

/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/array_ops.py in gather(failed resolving arguments)
4811 # TODO(apassos) find a less bad way of detecting resource variables
4812 # without introducing a circular dependency.
-> 4813 return params.sparse_read(indices, name=name)
4814 except AttributeError:
4815 return gen_array_ops.gather_v2(params, indices, axis, name=name)

/usr/local/lib/python3.7/dist-packages/tensorflow_recommenders_addons/embedding_variable/python/ops/embedding_variable_ops.py in sparse_read(self, indices, name)
425 raise errors_impl.InvalidArgumentError(
426 None, None,
--> 427 "type of indices is not match with EmbeddingVariable key type.")
428 with ops.name_scope("Gather" if name is None else name) as name:
429 resource_variable_ops.variable_accessed(self)

InvalidArgumentError: type of indices is not match with EmbeddingVariable key type.

Local exception message:

InvalidArgumentError Traceback (most recent call last)
in
11
12 if __name__=="__main__":
---> 13 train(10)

in train(epoch)
4 for (_, batch) in enumerate(dataset_train):
5 with tf.GradientTape() as tape:
----> 6 loss = model(batch)
7 total_loss = np.append(total_loss, loss)
8 grads = tape.gradient(loss, model.trainable_variables)

~/miniforge3/envs/tf24/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
1005 with autocast_variable.enable_auto_cast_variables(
1006 self._compute_dtype_object):
-> 1007 outputs = call_fn(inputs, *args, **kwargs)
1008
1009 if self._activity_regularizer:

in call(self, batch)
34
35 user_id_val, user_id_idx = np.unique(user_id, return_inverse=True)
---> 36 user_id_weights = tf.nn.embedding_lookup(
37 params=self.user_embeddings,
38 ids=user_id_val,

~/miniforge3/envs/tf24/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
199 """Call target, and fall back on dispatchers if there is a TypeError."""
200 try:
--> 201 return target(*args, **kwargs)
202 except (TypeError, ValueError):
203 # Note: convert_to_eager_tensor currently raises a ValueError, not a

~/miniforge3/envs/tf24/lib/python3.8/site-packages/tensorflow/python/ops/embedding_ops.py in embedding_lookup_v2(params, ids, max_norm, name)
392 ValueError: If params is empty.
393 """
--> 394 return embedding_lookup(params, ids, "div", name, max_norm=max_norm)
395
396

~/miniforge3/envs/tf24/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
199 """Call target, and fall back on dispatchers if there is a TypeError."""
200 try:
--> 201 return target(*args, **kwargs)
202 except (TypeError, ValueError):
203 # Note: convert_to_eager_tensor currently raises a ValueError, not a

~/miniforge3/envs/tf24/lib/python3.8/site-packages/tensorflow/python/ops/embedding_ops.py in embedding_lookup(params, ids, partition_strategy, name, validate_indices, max_norm)
320 name=name)
321
--> 322 return _embedding_lookup_and_transform(
323 params=params,
324 ids=ids,

~/miniforge3/envs/tf24/lib/python3.8/site-packages/tensorflow/python/ops/embedding_ops.py in _embedding_lookup_and_transform(params, ids, partition_strategy, name, max_norm, transform_fn)
136 with ops.colocate_with(params[0]):
137 result = _clip(
--> 138 array_ops.gather(params[0], ids, name=name), ids, max_norm)
139 if transform_fn:
140 result = transform_fn(result)

~/miniforge3/envs/tf24/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
199 """Call target, and fall back on dispatchers if there is a TypeError."""
200 try:
--> 201 return target(*args, **kwargs)
202 except (TypeError, ValueError):
203 # Note: convert_to_eager_tensor currently raises a ValueError, not a

~/miniforge3/envs/tf24/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py in gather(failed resolving arguments)
4811 # TODO(apassos) find a less bad way of detecting resource variables
4812 # without introducing a circular dependency.
-> 4813 return params.sparse_read(indices, name=name)
4814 except AttributeError:
4815 return gen_array_ops.gather_v2(params, indices, axis, name=name)

~/miniforge3/envs/tf24/lib/python3.8/site-packages/tensorflow_recommenders_addons/embedding_variable/python/ops/embedding_variable_ops.py in sparse_read(self, indices, name)
437 """Reads the value of this variable sparsely, using gather."""
438 if indices.dtype != self._ktype:
--> 439 raise errors_impl.InvalidArgumentError(
440 None, None,
441 "type of indices is not match with EmbeddingVariable key type.")

InvalidArgumentError: type of indices is not match with EmbeddingVariable key type.

BUG: An op outside of the function building code is being passed when using dynamic_embedding in custom Keras Model

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS Linux release 7.2
  • TensorFlow version and how it was installed (source or binary): 2.4.1 binary
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): 0.2.0. binary
  • Python version: 3.8.3
  • Is GPU used? (yes/no): no

Describe the bug

dynamic_embedding raises an exception when I speed up my training step with tf.function or model.compile(run_eagerly=False):

    TypeError: An op outside of the function building code is being passed
    a "Graph" tensor. It is possible to have Graph tensors
    leak out of the function building context by including a
    tf.init_scope in your function building code.
    For example, the following function will fail:
      @tf.function
      def has_init_scope():
        my_constant = tf.constant(1.)
        with tf.init_scope():
          added = my_constant * 2
    The graph tensor has name: ncf_model/user-id-weights/Unique:0

Code to reproduce the issue

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense
import tensorflow_datasets as tfds
import tensorflow_recommenders_addons as tfra
import time

ratings = tfds.load("movielens/100k-ratings", split="train")

ratings = ratings.map(lambda x: {
    "movie_id": tf.strings.to_number(x["movie_id"], tf.int64),
    "user_id": tf.strings.to_number(x["user_id"], tf.int64),
    "user_rating": x["user_rating"]
})

tf.random.set_seed(2021)
shuffled = ratings.shuffle(10000, seed=2021, reshuffle_each_iteration=False)

dataset_train = shuffled.take(10000).batch(256)

class NCFModel(tf.keras.Model):
    def __init__(self):
        super(NCFModel, self).__init__()
        self.embedding_size = 32
        self.d0 = Dense(
            256,
            activation='relu',
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
        self.d1 = Dense(
            64,
            activation='relu',
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
        self.d2 = Dense(
            1,
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
        self.user_embeddings = tfra.dynamic_embedding.get_variable(
            name="user_dynamic_embeddings",
            dim=self.embedding_size,
            initializer=tf.keras.initializers.RandomNormal(-1, 1))
        self.movie_embeddings = tfra.dynamic_embedding.get_variable(
            name="moive_dynamic_embeddings",
            dim=self.embedding_size,
            initializer=tf.keras.initializers.RandomNormal(-1, 1))
        self.loss = tf.keras.losses.MeanSquaredError()
        self.metrics_loss = tf.keras.metrics.Mean()
    
    def call(self, batch):
        movie_id = batch["movie_id"]
        user_id = batch["user_id"]
        rating = batch["user_rating"]

        user_id_weights, user_id_trainable_wrapper = tfra.dynamic_embedding.embedding_lookup_unique(
            params=self.user_embeddings,
            ids=user_id,
            name="user-id-weights",
            return_trainable=True)

        movie_id_weights, movie_id_trainable_wrapper = tfra.dynamic_embedding.embedding_lookup_unique(
            params=self.movie_embeddings,
            ids=movie_id,
            name="movie-id-weights",
            return_trainable=True)

        embeddings = tf.concat([user_id_weights, movie_id_weights], axis=1)
        dnn = self.d0(embeddings)
        dnn = self.d1(dnn)
        dnn = self.d2(dnn)
        out = tf.reshape(dnn, shape=[-1])
        loss = self.loss(rating, out)
        return loss, [user_id_trainable_wrapper, movie_id_trainable_wrapper]

    def train_step(self, data):
        with tf.GradientTape() as tape:
            loss, trainable_wrapper_list = self(data)
        trainable_vars = self.trainable_variables
        grads = tape.gradient(loss, trainable_vars + trainable_wrapper_list)
        self.optimizer.apply_gradients(zip(grads, trainable_vars + trainable_wrapper_list))

        return {"loss": self.metrics_loss.result()}

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
optimizer = tfra.dynamic_embedding.DynamicEmbeddingOptimizer(optimizer)

model = NCFModel()
model.compile(optimizer=optimizer, run_eagerly=False, loss=tf.keras.losses.MeanSquaredError())
model.fit(dataset_train)

Detailed tutorial for building from source

1 Build TFRA
To build TFRA against a different TensorFlow version, update the following files:
1.1 requirements.txt
1.2 tensorflow_recommenders_addons/version.py: MIN_TF_VERSION and MAX_TF_VERSION
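
The role of the two constants in step 1.2 can be pictured with a short sketch. It is not the code in version.py: the helper names, the constant values, and the choice of a range that is inclusive at both ends are assumptions made for illustration.

```python
def parse_version(text):
    """Turn a dotted version such as '2.4.1' into a tuple of ints (2, 4, 1)."""
    return tuple(int(part) for part in text.split("."))


def is_supported(installed, min_version, max_version):
    """Return True when min_version <= installed <= max_version (inclusive)."""
    low = parse_version(min_version)
    high = parse_version(max_version)
    return low <= parse_version(installed) <= high


# Hypothetical values, for illustration only.
MIN_TF_VERSION = "2.4.1"
MAX_TF_VERSION = "2.4.1"

print(is_supported("2.4.1", MIN_TF_VERSION, MAX_TF_VERSION))  # True
print(is_supported("2.5.0", MIN_TF_VERSION, MAX_TF_VERSION))  # False
```

Under this assumption, building against another TensorFlow release means widening the range so that the installed version falls inside it.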

2 Build TF Serving with TFRA
2.1 git clone https://github.com/tensorflow/recommenders-addons.git
2.2 git clone -b r2.4 https://github.com/tensorflow/serving.git
2.3 cp -r recommenders-addons/tensorflow_recommenders_addons serving/
cp -r recommenders-addons/build_deps serving/
2.4 pip install tensorflow=={version}
2.5 cd recommenders-addons/
python configure.py
cat .bazelrc >> ../serving/.bazelrc
cat WORKSPACE >> ../serving/WORKSPACE
Edit ../serving/WORKSPACE (for example with vim) and delete the line workspace(name = "tf_recommenders_addons").
Update tensorflow_recommenders_addons/tensorflow_recommenders_addons.bzl:

[Screenshot, 2021-04-28 3:38:21 PM]

 cd ../

2.6 cd serving
Update tensorflow_serving/model_servers/BUILD:
[Screenshot, 2021-04-28 3:49:48 PM]
cd ../
2.7 Install Docker.
2.8 Upgrade gcc and g++.
2.9 cd serving/
tools/run_in_docker.sh bazel build tensorflow_serving/model_servers:tensorflow_model_server
The build takes about four hours.
2.10 Test:
tools/run_in_docker.sh -o "-p 8501:8501" \
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
--rest_api_port=8501 --model_name=<model_name> --model_base_path=<model_base_path>

`namespace_whitelist` parameter problem

Please verify that these ops should be saved, since they must be available when loading the SavedModel. If loading from Python, you must import the library defining these ops. From C++, link the custom ops to the serving binary. Once you've confirmed this, please add the following namespaces to the namespace_whitelist argument in tf.saved_model.SaveOptions: {'TFRA'}.
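
What the message asks for can be sketched as follows. tf.saved_model.SaveOptions and its namespace_whitelist argument come from the message itself; the helper name save_with_tfra_ops and its parameters are hypothetical, and `model` stands for any object that tf.saved_model.save accepts.

```python
# Namespaces of custom ops that are allowed to be written into the SavedModel.
NAMESPACE_WHITELIST = ["TFRA"]


def save_with_tfra_ops(model, export_dir):
    """Save `model` while allowing ops registered under the TFRA namespace."""
    # Imported here so the sketch can be loaded where TensorFlow is not installed.
    import tensorflow as tf

    options = tf.saved_model.SaveOptions(namespace_whitelist=NAMESPACE_WHITELIST)
    tf.saved_model.save(model, export_dir, options=options)
```

As the message notes, the library that defines these ops must also be available at load time: import tensorflow_recommenders_addons before loading from Python, or link the custom ops into the serving binary.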

Bug: Dynamic Embedding in Estimator doesn't support Warm Start.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): TF on Yarn
  • TensorFlow version and how it was installed (source or binary):2.4.1
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary):binary
  • Python version:3.6
  • Is GPU used? (yes/no):no

Describe the bug

For continuous training, where each model is periodically trained on top of the previously trained one, warm start is important. Dynamic Embedding (DE) in Estimator works well for online streaming training and offline one-shot training, but DE variables cannot be loaded through the warm-start configuration (tf.estimator.WarmStartSettings).

Code to reproduce the issue
Training Code:

import json
import os

import tensorflow as tf
from tensorflow.keras.layers import Dense

import tensorflow_datasets as tfds
import tensorflow_recommenders_addons as tfra

from absl import app
from absl import flags

flags.DEFINE_string('model_dir', "./ckpt", 'model_dir')
flags.DEFINE_string('warm_start_dir', "./old_ckpt", 'warm_start_dir')
flags.DEFINE_string('export_dir', "./export_dir", 'export_dir')
flags.DEFINE_string('mode', "train", 'train or serving')

FLAGS = flags.FLAGS


def input_fn():
  ratings = tfds.load("movielens/100k-ratings", split="train")
  ratings = ratings.map(
      lambda x: {
          "movie_id": tf.strings.to_number(x["movie_id"], tf.int64),
          "user_id": tf.strings.to_number(x["user_id"], tf.int64),
          "user_rating": x["user_rating"]
      })
  shuffled = ratings.shuffle(1_000_000,
                             seed=2021,
                             reshuffle_each_iteration=False)
  dataset = shuffled.batch(256)
  return dataset


def model_fn(features, labels, mode, params):
  embedding_size = 32
  movie_id = features["movie_id"]
  user_id = features["user_id"]
  rating = features["user_rating"]

  is_training = (mode == tf.estimator.ModeKeys.TRAIN)

  if is_training:
    ps_list = [
        "/job:ps/replica:0/task:{}/CPU:0".format(i)
        for i in range(params["ps_num"])
    ]
    initializer = tf.keras.initializers.RandomNormal(-1, 1)
  else:
    ps_list = ["/job:localhost/replica:0/task:0/CPU:0"] * params["ps_num"]
    initializer = tf.keras.initializers.Zeros()

  user_embeddings = tfra.dynamic_embedding.get_variable(
      name="user_dynamic_embeddings",
      dim=embedding_size,
      devices=ps_list,
      initializer=initializer)
  movie_embeddings = tfra.dynamic_embedding.get_variable(
      name="moive_dynamic_embeddings",
      dim=embedding_size,
      devices=ps_list,
      initializer=initializer)

  user_id_val, user_id_idx = tf.unique(tf.concat(user_id, axis=0))
  user_id_weights, user_id_trainable_wrapper = tfra.dynamic_embedding.embedding_lookup(
      params=user_embeddings,
      ids=user_id_val,
      name="user-id-weights",
      return_trainable=True)
  user_id_weights = tf.gather(user_id_weights, user_id_idx)

  movie_id_val, movie_id_idx = tf.unique(tf.concat(movie_id, axis=0))
  movie_id_weights, movie_id_trainable_wrapper = tfra.dynamic_embedding.embedding_lookup(
      params=movie_embeddings,
      ids=movie_id_val,
      name="movie-id-weights",
      return_trainable=True)
  movie_id_weights = tf.gather(movie_id_weights, movie_id_idx)

  embeddings = tf.concat([user_id_weights, movie_id_weights], axis=1)
  d0 = Dense(256,
             activation='relu',
             kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
             bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
  d1 = Dense(64,
             activation='relu',
             kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
             bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
  d2 = Dense(1,
             kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
             bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
  dnn = d0(embeddings)
  dnn = d1(dnn)
  dnn = d2(dnn)
  out = tf.reshape(dnn, shape=[-1])
  loss = tf.keras.losses.MeanSquaredError()(rating, out)
  predictions = {"out": out}

  if mode == tf.estimator.ModeKeys.EVAL:
    eval_metric_ops = {}
    return tf.estimator.EstimatorSpec(mode=mode,
                                      loss=loss,
                                      eval_metric_ops=eval_metric_ops)

  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.001)
    optimizer = tfra.dynamic_embedding.DynamicEmbeddingOptimizer(optimizer)
    train_op = optimizer.minimize(
        loss, global_step=tf.compat.v1.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode=mode,
                                      predictions=predictions,
                                      loss=loss,
                                      train_op=train_op)

  if mode == tf.estimator.ModeKeys.PREDICT:
    predictions_for_net = {"out": out}
    export_outputs = {
        "predict_export_outputs":
            tf.estimator.export.PredictOutput(outputs=predictions_for_net)
    }
    return tf.estimator.EstimatorSpec(mode,
                                      predictions=predictions_for_net,
                                      export_outputs=export_outputs)


def train(model_dir, ps_num):
  model_config = tf.estimator.RunConfig(log_step_count_steps=100,
                                        save_summary_steps=100,
                                        save_checkpoints_steps=100,
                                        save_checkpoints_secs=None,
                                        keep_checkpoint_max=2)

  estimator = tf.estimator.Estimator(model_fn=model_fn,
                                     model_dir=model_dir,
                                     warm_start_from=tf.estimator.WarmStartSettings(ckpt_to_initialize_from=FLAGS.warm_start_dir,
                                                             vars_to_warm_start='.*'),
                                     params={"ps_num": ps_num},
                                     config=model_config)

  train_spec = tf.estimator.TrainSpec(input_fn=input_fn)

  eval_spec = tf.estimator.EvalSpec(input_fn=input_fn)

  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)


def serving_input_receiver_dense_fn():
  input_spec = {
      "movie_id": tf.constant([1], tf.int64),
      "user_id": tf.constant([1], tf.int64),
      "user_rating": tf.constant([1.0], tf.float32)
  }
  return tf.estimator.export.build_raw_serving_input_receiver_fn(input_spec)


def export_for_serving(model_dir, export_dir, ps_num):
  model_config = tf.estimator.RunConfig(log_step_count_steps=100,
                                        save_summary_steps=100,
                                        save_checkpoints_steps=100,
                                        save_checkpoints_secs=None)

  estimator = tf.estimator.Estimator(model_fn=model_fn,
                                     model_dir=model_dir,
                                     params={"ps_num": ps_num},
                                     config=model_config)

  estimator.export_saved_model(export_dir, serving_input_receiver_dense_fn())


def main(argv):
  del argv
  tf_config = json.loads(os.environ.get('TF_CONFIG') or '{}')
  task_name = tf_config.get('task', {}).get('type')
  task_idx = tf_config.get('task', {}).get('index')

  ps_num = len(tf_config["cluster"]["ps"])

  if FLAGS.mode == "train":
    train(FLAGS.model_dir, ps_num)
  if FLAGS.mode == "serving" and task_name == "chief" and int(task_idx) == 0:
    tfra.dynamic_embedding.enable_inference_mode()
    export_for_serving(FLAGS.model_dir, FLAGS.export_dir, ps_num)


if __name__ == "__main__":
  app.run(main)
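
A side note on main() above: it indexes tf_config["cluster"]["ps"] directly, which raises KeyError when TF_CONFIG is unset, for example in a single-machine run. A more defensive parse might look like the sketch below; the helper name parse_tf_config and the sample cluster are hypothetical.

```python
import json
import os


def parse_tf_config(environ=os.environ):
    """Return (task_type, task_index, ps_num) from TF_CONFIG, with safe defaults."""
    tf_config = json.loads(environ.get("TF_CONFIG") or "{}")
    task = tf_config.get("task", {})
    cluster = tf_config.get("cluster", {})
    task_type = task.get("type")
    task_index = int(task.get("index", 0))
    ps_num = len(cluster.get("ps", []))  # 0 when no parameter servers are listed
    return task_type, task_index, ps_num


# Hypothetical cluster description with one chief and two parameter servers.
example = {"TF_CONFIG": json.dumps({
    "cluster": {"chief": ["host0:2222"], "ps": ["host1:2222", "host2:2222"]},
    "task": {"type": "chief", "index": 0},
})}
print(parse_tf_config(example))  # ('chief', 0, 2)
print(parse_tf_config({}))       # (None, 0, 0)
```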

Code to check the variable dimensions in the checkpoint:

import tensorflow as tf

save_path = '/local_path/model.ckpt-123456'
for var in tf.train.list_variables(save_path):
  print(var[0], var[1])  # name, shape

Other info / logs
Example:
First training run, on 7 days of sample data => model M0: embedding shape is (100000, 64).
Second training run, on 1 day of sample data, warm-started from M0 => model M1: embedding shape is actually (2000, 64), but it should probably be something like (101234, 64).
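
The expectation in this example can be illustrated with a plain-Python sketch. It is not TFRA code: the table is modelled as a dict from key to vector, the sizes are toy values, and "warm start" is assumed to mean "start from the old keys, then add or update the keys seen in the new data".

```python
def warm_start(old_table, new_updates):
    """Return a table holding every old key plus the keys trained in the new run."""
    merged = dict(old_table)    # keep all keys learned by the previous model
    merged.update(new_updates)  # overwrite retrained keys, add unseen ones
    return merged


# M0: keys 0..9 learned in the first run (toy size instead of 100000).
m0 = {key: [0.0, 0.0] for key in range(10)}
# The second run touches keys 8..11: two known keys and two new ones.
updates = {key: [1.0, 1.0] for key in range(8, 12)}

m1 = warm_start(m0, updates)
print(len(m1))       # 12: old keys are preserved and new keys are added
print(len(updates))  # 4: the size seen when the old keys are never loaded
```

The second number corresponds to the reported (2000, 64) shape, and the first to the expected shape of roughly (101234, 64).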

can't install

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • TensorFlow version and how it was installed (source or binary): tensorflow 1.15.0 installed by pip
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): can't install
  • Python version: 2.7.12
  • Is GPU used? (yes/no):no

install log:
ERROR: Could not find a version that satisfies the requirement tensorflow-recommenders-addons (from versions: none)
ERROR: No matching distribution found for tensorflow-recommenders-addons
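
pip prints "from versions: none" when no published file matches the running interpreter and platform. The report lists Python 2.7.12, so one thing worth checking is the interpreter version. The sketch below assumes, for illustration only, that the package is published only for Python 3.6 and later; the constant and helper name are hypothetical.

```python
import sys

# Assumed minimum, for illustration; the package's PyPI page has the real value.
ASSUMED_MIN_PYTHON = (3, 6)


def interpreter_is_supported(version_info=sys.version_info,
                             minimum=ASSUMED_MIN_PYTHON):
    """Return True when the interpreter's (major, minor) is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


print(interpreter_is_supported((2, 7, 12)))  # False
print(interpreter_is_supported((3, 8, 3)))   # True
```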

The embedding results from two lookups are different!

System information

  • OS Platform and Distribution (Linux Ubuntu 20.04): Linux Ubuntu 20.04
  • TensorFlow version and how it was installed (source or binary):1.15.2
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary):0.2.0
  • Python version: 3.7.10
  • Is GPU used? (yes/no): no

import numpy as np
import tensorflow_recommenders_addons as tfra
import tensorflow as tf

size = 8
init = tf.random_normal_initializer(0, 0.005)
id = tf.constant([1,2,3,3], dtype=tf.int64)

embedding = tfra.dynamic_embedding.get_variable(
    name = "embedding",
    dim = size,
    initializer=init)

w = tfra.dynamic_embedding.embedding_lookup(
    params=embedding,
    ids = id)

global_initializer = tf.global_variables_initializer()
sess = tf.Session()
sess.run(global_initializer)
print(sess.run(w))
print("="*50)
print(sess.run(w))

[[ 7.42965797e-03 -8.09842968e-05 -1.30431848e-02  7.14901090e-03
  -2.20308080e-03  3.40936449e-03  2.16588937e-03 -4.35747235e-04]
 [-6.44960767e-03  4.09679720e-03  3.10608768e-03  1.27214370e-02
   1.75029470e-03  1.73122180e-03  6.49260217e-03  1.06016046e-03]
 [ 2.64631119e-03  3.40706878e-03  2.43445882e-03  5.32498397e-03
  -3.39744613e-03  3.74298031e-03  4.32566088e-03  3.81937460e-03]
 [ 3.37692373e-03  1.16260266e-02 -1.30036648e-03  1.23203145e-02
  -1.03127500e-02 -2.60319980e-03  9.41354223e-03  2.59222323e-03]]
==================================================
[[-1.68357114e-03 -1.44681951e-03 -1.02912020e-02 -3.63380459e-05
  -7.16862315e-03  1.16874115e-04  3.81298619e-03  7.55132828e-03]
 [ 1.68958784e-03  3.18953977e-03  7.44672492e-04  1.11513855e-02
   2.44748266e-03  1.43962516e-03  1.11982562e-02 -1.13977687e-02]
 [-2.47357949e-03 -7.63357989e-03 -1.76260294e-03 -9.26740468e-04
   3.78448330e-03  6.51640003e-04 -2.95736268e-03  4.61211987e-03]
 [ 4.44535259e-03  8.75408703e-04 -4.96396748e-03  1.84889091e-03
   3.95822618e-03  1.26622617e-02 -5.50796371e-03  2.59953295e-03]]

These two results should be identical, but the two runs return different values for w!
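
One possible explanation, stated here as an assumption rather than as confirmed TFRA behaviour: a read-only lookup fills each missing key with a fresh draw from the initializer and does not store it. The third and fourth rows of each printed result differ even though both ids are 3, which is consistent with that reading. The toy class below is a stand-in written for illustration, not the TFRA implementation.

```python
import random


class ToyDynamicTable:
    """Toy key-value embedding table; a stand-in, not the TFRA implementation."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.store = {}
        self.rng = random.Random(seed)

    def _init_vector(self):
        return [self.rng.gauss(0.0, 0.005) for _ in range(self.dim)]

    def lookup(self, keys):
        """Read only: a missing key gets a fresh initializer draw, never stored."""
        return [self.store[k] if k in self.store else self._init_vector()
                for k in keys]

    def lookup_or_insert(self, keys):
        """A missing key is initialized once and stored, so later reads agree."""
        rows = []
        for k in keys:
            if k not in self.store:
                self.store[k] = self._init_vector()
            rows.append(self.store[k])
        return rows


table = ToyDynamicTable(dim=2)
ids = [1, 2, 3, 3]
print(table.lookup(ids) == table.lookup(ids))                      # False
print(table.lookup_or_insert(ids) == table.lookup_or_insert(ids))  # True
```

Under this assumption, values become stable only once the keys have actually been written to the table, for example by a training step.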

undefined symbol: aio_read

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04
  • TensorFlow version and how it was installed (source or binary): v2.5.1, source
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): source
  • Python version: 3.8.10
  • Is GPU used? (yes/no): no

Describe the bug

root@7c22b3f4c4a6:/workspace# python -c "import tensorflow_recommenders_addons as tfra"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_recommenders_addons/__init__.py", line 30, in <module>
    from tensorflow_recommenders_addons import dynamic_embedding
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_recommenders_addons/dynamic_embedding/__init__.py", line 53, in <module>
    from tensorflow_recommenders_addons.dynamic_embedding.python.ops.redis_table_ops import (
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/redis_table_ops.py", line 37, in <module>
    redis_table_ops = LazySO("dynamic_embedding/core/_redis_table_ops.so").ops
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_recommenders_addons/utils/resource_loader.py", line 102, in ops
    self._ops = tf.load_op_library(get_path_to_datafile(self.relative_path))
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python3.8/dist-packages/tensorflow_recommenders_addons/dynamic_embedding/core/_redis_table_ops.so: undefined symbol: aio_read
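
aio_read is a POSIX asynchronous I/O function that glibc has traditionally shipped in librt, so an undefined symbol at load time suggests that _redis_table_ops.so was linked without that library. A small diagnostic sketch, using only ctypes from the standard library, checks whether the symbol resolves through librt on the current machine; the helper name symbol_available is hypothetical.

```python
import ctypes
import ctypes.util


def symbol_available(symbol, library="rt"):
    """Return True if `symbol` resolves through the named shared library."""
    path = ctypes.util.find_library(library)
    if path is None:
        return False  # the library was not found on this platform
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return False  # the library exists but could not be loaded
    # Attribute access on a CDLL object performs the symbol lookup.
    return hasattr(lib, symbol)


print(symbol_available("aio_read"))
```

If this prints True while the import still fails, the missing piece is probably in how the extension was linked rather than in the system libraries.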

Building embedding_variable got "ev_ops.pic.o: unrecognized relocation"

System information

  • OS Platform and Distribution : CentOS 7
  • TensorFlow version and how it was installed (source or binary): 2.4.1
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): Build from source
  • Python version: 3.6.8
  • Is GPU used? (yes/no): yes. (But not related to this issue)

Describe the bug
I am trying to build TFRA with the GPU operations (#55) from source, with these environment settings:

OS: CentOS 7
gcc: 7.3.1
python: 3.6.8
CUDA: 11.0
CUDNN: 8.2
Tensorflow: 2.4.1

with building commands:

# step 1
TF_NEED_CUDA=1 CUDNN_INSTALL_PATH="/data/dev/packages/cuda" python ./configure.py

I get:

2021-04-29 20:09:16.033041: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0

Configuring TensorFlow Recommenders-Addons to be built from source...
> Building GPU & CPU ops

Build configurations successfully written to .bazelrc :

build --action_env TF_HEADER_DIR="/data/env/py3/lib/python3.6/site-packages/tensorflow/include"
build --action_env TF_SHARED_LIBRARY_DIR="/data/env/py3/lib/python3.6/site-packages/tensorflow"
build --action_env TF_SHARED_LIBRARY_NAME="libtensorflow_framework.so.2"
build --action_env TF_CXX11_ABI_FLAG="0"
build --spawn_strategy=standalone
build --strategy=Genrule=standalone
build -c opt
build --action_env TF_NEED_CUDA="1"
build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda"
build --action_env CUDNN_INSTALL_PATH="/data/dev/packages/cuda"
build --action_env TF_CUDA_VERSION="11.0"
build --action_env TF_CUDNN_VERSION="8"
test --config=cuda
build --config=cuda
build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true
build:cuda --crosstool_top=@local_config_cuda//crosstool:toolchain

It's nice and correct. But when I run bazel build:

# step 2
bazel build --enable_runfiles build_pip_pkg

I get:

INFO: Build options --action_env, --crosstool_top, and --define have changed, discarding analysis cache.
DEBUG: /root/.cache/bazel/_bazel_root/fde3dfbc9e5ac67fd2162d1e4794bd76/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:118:5: 
Auto-Configuration Warning: 'TMP' environment variable is not set, using 'C:\Windows\Temp' as default
WARNING: /root/.cache/bazel/_bazel_root/fde3dfbc9e5ac67fd2162d1e4794bd76/external/local_config_tf/BUILD:11521:1: target 'libtensorflow_framework.so.2' is both a rule and a file; please choose another name for the rule
INFO: Analyzed target //:build_pip_pkg (8 packages loaded, 162 targets configured).
INFO: Found 1 target...
INFO: From Compiling tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc: In instantiation of 'void tensorflow::ev::EVImportOp<TKey, TValue>::Compute(tensorflow::OpKernelContext*) [with TKey = long long int; TValue = float]':
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:626:1:   required from here
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:603:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (size_t i = 0; i < key.NumElements(); ++i) {
                        ~~^~~~~~~~~~~~~~~~~~~
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc: In instantiation of 'void tensorflow::ev::EVImportOp<TKey, TValue>::Compute(tensorflow::OpKernelContext*) [with TKey = int; TValue = float]':
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:626:1:   required from here
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:603:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc: In instantiation of 'void tensorflow::ev::EVExportOp<TKey, TValue>::Compute(tensorflow::OpKernelContext*) [with TKey = long long int; TValue = float]':
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:626:1:   required from here
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:563:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (size_t i = 0; i < total_size; ++i) {
                        ~~^~~~~~~~~~~~
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc: In instantiation of 'void tensorflow::ev::EVExportOp<TKey, TValue>::Compute(tensorflow::OpKernelContext*) [with TKey = int; TValue = float]':
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:626:1:   required from here
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:563:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
ERROR: /data/dev/recommenders-addons/tensorflow_recommenders_addons/embedding_variable/core/BUILD:7:1: Linking of rule '//tensorflow_recommenders_addons/embedding_variable/core:_ev_ops.so' failed (Exit 1)
/usr/bin/ld: bazel-out/k8-opt/bin/tensorflow_recommenders_addons/embedding_variable/core/_objs/_ev_ops.so/0/ev_ops.pic.o: unrecognized relocation (0x2a) in section `.text._ZN10tensorflow2ev9EVShapeOpIiifED2Ev[_ZN10tensorflow2ev9EVShapeOpIiifED5Ev]'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
Target //:build_pip_pkg failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 16.652s, Critical Path: 16.09s
INFO: 8 processes: 8 local.
FAILED: Build did NOT complete successfully

I switched to -std=c++14 instead of c++11 (here), as TensorFlow 2.4.1 requires:

    copts = copts + select({
        "//tensorflow_recommenders_addons:windows": [
            "/DEIGEN_STRONG_INLINE=inline",
            "-DTENSORFLOW_MONOLITHIC_BUILD",
            "/D_USE_MATH_DEFINES",
            "/DPLATFORM_WINDOWS",
            "/DEIGEN_HAS_C99_MATH",
            "/DTENSORFLOW_USE_EIGEN_THREADPOOL",
            "/DEIGEN_AVOID_STL_ARRAY",
            "/Iexternal/gemmlowp",
            "/wd4018",
            "/wd4577",
            "/DNOGDI",
            "/UTF_COMPILE_LIBRARY",
        ],  
        "//conditions:default": ["-pthread", "-std=c++14", D_GLIBCXX_USE_CXX11_ABI],
    })

but I still get the error:

(py3) [root@VM-121-23-centos /data/dev/recommenders-addons]# bazel build --enable_runfiles build_pip_pkg
INFO: Build options --action_env, --crosstool_top, and --define have changed, discarding analysis cache.
DEBUG: /root/.cache/bazel/_bazel_root/fde3dfbc9e5ac67fd2162d1e4794bd76/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:118:5: 
Auto-Configuration Warning: 'TMP' environment variable is not set, using 'C:\Windows\Temp' as default
WARNING: /root/.cache/bazel/_bazel_root/fde3dfbc9e5ac67fd2162d1e4794bd76/external/local_config_tf/BUILD:11521:1: target 'libtensorflow_framework.so.2' is both a rule and a file; please choose another name for the rule
INFO: Analyzed target //:build_pip_pkg (8 packages loaded, 162 targets configured).
INFO: Found 1 target...
INFO: From Compiling tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc: In instantiation of 'void tensorflow::ev::EVImportOp<TKey, TValue>::Compute(tensorflow::OpKernelContext*) [with TKey = long long int; TValue = float]':
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:626:1:   required from here
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:603:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (size_t i = 0; i < key.NumElements(); ++i) {
                        ~~^~~~~~~~~~~~~~~~~~~
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc: In instantiation of 'void tensorflow::ev::EVImportOp<TKey, TValue>::Compute(tensorflow::OpKernelContext*) [with TKey = int; TValue = float]':
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:626:1:   required from here
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:603:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc: In instantiation of 'void tensorflow::ev::EVExportOp<TKey, TValue>::Compute(tensorflow::OpKernelContext*) [with TKey = long long int; TValue = float]':
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:626:1:   required from here
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:563:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (size_t i = 0; i < total_size; ++i) {
                        ~~^~~~~~~~~~~~
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc: In instantiation of 'void tensorflow::ev::EVExportOp<TKey, TValue>::Compute(tensorflow::OpKernelContext*) [with TKey = int; TValue = float]':
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:626:1:   required from here
tensorflow_recommenders_addons/embedding_variable/core/kernels/ev_ops.cc:563:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
ERROR: /data/dev/recommenders-addons/tensorflow_recommenders_addons/embedding_variable/core/BUILD:7:1: Linking of rule '//tensorflow_recommenders_addons/embedding_variable/core:_ev_ops.so' failed (Exit 1)
/usr/bin/ld: bazel-out/k8-opt/bin/tensorflow_recommenders_addons/embedding_variable/core/_objs/_ev_ops.so/0/ev_ops.pic.o: unrecognized relocation (0x2a) in section `.text._ZN10tensorflow2ev9EVShapeOpIiifED2Ev[_ZN10tensorflow2ev9EVShapeOpIiifED5Ev]'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
Target //:build_pip_pkg failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 16.652s, Critical Path: 16.09s
INFO: 8 processes: 8 local.
FAILED: Build did NOT complete successfully

I'm not very familiar with the tfra.embedding_variable module. How can I solve the symbol problem?
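
On x86-64, relocation type 0x2a (42) is R_X86_64_REX_GOTPCRELX. A linker that reports it as unrecognized is usually older than the assembler that produced the object file, which would fit gcc 7.3.1 being combined with the system /usr/bin/ld here. The sketch below parses the first line of `ld --version` and compares it with binutils 2.26, the release assumed here to have introduced support; the sample strings and helper names are hypothetical.

```python
import re

# Assumption: GNU ld understands relocation 0x2a from binutils 2.26 onward.
ASSUMED_MIN_BINUTILS = (2, 26)


def parse_ld_version(first_line):
    """Extract (major, minor) from the first line of `ld --version` output."""
    match = re.search(r"(\d+)\.(\d+)", first_line)
    if match is None:
        raise ValueError("no version number found in: %r" % first_line)
    return int(match.group(1)), int(match.group(2))


def linker_knows_relocation(first_line, minimum=ASSUMED_MIN_BINUTILS):
    """Return True when the reported linker version is at least `minimum`."""
    return parse_ld_version(first_line) >= minimum


# Hypothetical sample strings, not taken from the report.
print(linker_knows_relocation("GNU ld version 2.25.1-22.base.el7"))  # False
print(linker_knows_relocation("GNU ld (GNU Binutils) 2.30"))         # True
```

If the system linker turns out to be too old, using a newer ld that matches the compiler toolchain is the usual direction to investigate.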

TFRA integrates with tensorflow serving 1.15

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • TensorFlow version and how it was installed (source or binary): install tensorflow 1.15.2 from binary
  • TensorFlow Serving version and how it was installed: install tensorflow serving 1.15 from source code
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): source
  • Python version: 3.6
  • Is GPU used? (yes/no): no

Describe the bug

We can't compile TensorFlow Serving 1.15 with TFRA by following the description in the README file.

However, we have successfully compiled TFRA with TensorFlow Serving 1.15 using the following process:

  • download TFRA source code
    git clone https://github.com/tensorflow/recommenders-addons.git
  • download Tensorflow-Serving source code
    git clone -b r1.15 https://github.com/tensorflow/serving.git
  • copy tensorflow_recommenders_addons and build_deps from TFRA to tensorflow-serving
    cp -r recommenders-addons/tensorflow_recommenders_addons serving/
    cp -r recommenders-addons/build_deps serving/
  • install tensorflow
    pip install tensorflow==1.15.2
  • generate .bazelrc for TFRA
    cd recommenders-addons/ && python configure.py
    change TF_HEADER_DIR and FOR_TF_SERVING in .bazelrc to
    TF_HEADER_DIR=/tensorflow-recommenders-addons/build_deps/tf_header/1.15.2/tensorflow
    FOR_TF_SERVING="1"
    so that the .bazelrc file under recommenders-addons looks like this:
    build --action_env TF_HEADER_DIR="/tensorflow-recommenders-addons/build_deps/tf_header/1.15.2/tensorflow"
    build --action_env TF_SHARED_LIBRARY_DIR="/usr/local/lib/python3.6/dist-packages/tensorflow"
    build --action_env TF_SHARED_LIBRARY_NAME="libtensorflow_framework.so.2"
    build --action_env TF_CXX11_ABI_FLAG="0"
    build --action_env TF_VERSION_INTEGER="1152"
    build --action_env FOR_TF_SERVING="1"
    build --spawn_strategy=standalone
    build --strategy=Genrule=standalone
    build -c opt
    build --copt=-mavx
    
  • merge .bazelrc file
    cat .bazelrc >> ../serving/.bazelrc
  • merge WORKSPACE file
    1. delete the first line of WORKSPACE file under recommenders-addons directory
      workspace(name = "tf_recommenders_addons")
    2. merge with WORKSPACE file under serving directory
      cat WORKSPACE >> ../serving/WORKSPACE
  • modify serving/tensorflow_serving/model_servers/BUILD to integrate the TFRA operators
    1. add the op information (tensorflow_text will not be found, so ignore it)
      image
    2. add linkopts to avoid the multiple definition error
      image
  • compile TensorFlow Serving using bazel
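
The WORKSPACE merge step in the list above can be sketched as a small text transformation. This is an illustration, not part of either repository: the function name merge_workspace and the sample file contents are hypothetical, and the regular expression assumes the workspace(...) declaration sits on a single line.

```python
import re

# Matches a `workspace(name = "...")` declaration that occupies a whole line.
WORKSPACE_DECL = re.compile(r'^\s*workspace\s*\(\s*name\s*=\s*"[^"]*"\s*\)\s*$')


def merge_workspace(serving_text, tfra_text):
    """Append the TFRA WORKSPACE to the serving one, minus TFRA's workspace() line."""
    kept = [line for line in tfra_text.splitlines()
            if not WORKSPACE_DECL.match(line)]
    return serving_text.rstrip("\n") + "\n" + "\n".join(kept) + "\n"


# Hypothetical file contents.
serving = 'workspace(name = "tf_serving")\nload("//a:b.bzl", "x")\n'
tfra = 'workspace(name = "tf_recommenders_addons")\nload("//c:d.bzl", "y")\n'
print(merge_workspace(serving, tfra))
```

Dropping the declaration before appending keeps a single workspace() call in the merged file, which is the purpose of the manual delete step.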

We have also built a Docker image and successfully deployed a model trained with TFRA.

Users will be confused by the current documentation. Would it be better to rewrite the section on how TFRA integrates with TensorFlow Serving 1.15, or to provide a Dockerfile for TensorFlow Serving 1.15 with TFRA?

embedding_variable doesn't support estimator

System information

  • OS Platform and Distribution: Linux Ubuntu 18.04.5
  • TensorFlow version and how it was installed (source or binary): 2.4.1 in docker
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): 0.2.0 use pip
  • Python version: 3.6.9
  • Is GPU used? (yes/no): no

Describe the bug

I tried to use dynamic_embedding(de) and embedding_variable(ev) several times and I found that:

  1. de and ev both work fine in eager mode, but currently don't support tf.function, so I could not save model for model serving #104 #105
  2. de supports estimator: https://github.com/tensorflow/recommenders-addons/blob/master/demo/dynamic_embedding/movielens-100k-estimator/movielens-100k-estimator.py
  3. ev works in session mode: https://github.com/tensorflow/recommenders-addons/blob/master/demo/embedding_variable/ev-keras-graph.py

I'd like to integrate de and ev into my framework, which is a model pipeline that includes training and serving. So I tried to use ev in an Estimator, but it failed. Any suggestions?

Code to reproduce the issue

import tensorflow as tf
from tensorflow.keras.layers import Dense
import tensorflow_datasets as tfds
import tensorflow_recommenders_addons as tfra

from absl import app
from absl import flags

flags.DEFINE_string('model_dir', "./ckpt/ev", 'model_dir')
flags.DEFINE_string('export_dir', "./export_dir/ev", 'export_dir')
flags.DEFINE_string('mode', "train", 'train or export')

FLAGS = flags.FLAGS
tf.compat.v1.disable_eager_execution()


def input_fn():
  ratings = tfds.load("movielens/100k-ratings", split="train")
  ratings = ratings.map(
      lambda x: {
          "movie_id": tf.strings.to_number(x["movie_id"], tf.int64),
          "user_id": tf.strings.to_number(x["user_id"], tf.int64),
          "user_rating": x["user_rating"]
      })
  shuffled = ratings.shuffle(1_000_000,
                             seed=2021,
                             reshuffle_each_iteration=False)
  dataset = shuffled.batch(256)
  return dataset


def model_fn(features, labels, mode):
  embedding_size = 32
  movie_id = features["movie_id"]
  user_id = features["user_id"]
  rating = features["user_rating"]

  is_training = (mode == tf.estimator.ModeKeys.TRAIN)

  initializer = tf.keras.initializers.RandomNormal(-1, 1)

  user_embeddings = tfra.embedding_variable.EmbeddingVariable(
      name="user_variable_embeddings",
      embedding_dim=embedding_size,
      ktype=tf.int64,
      initializer=initializer)
  movie_embeddings = tfra.embedding_variable.EmbeddingVariable(
      name="movie_variable_embeddings",
      embedding_dim=embedding_size,
      ktype=tf.int64,
      initializer=initializer)

  user_id_val, user_id_idx = tf.unique(tf.concat(user_id, axis=0))
  user_id_weights = tf.nn.embedding_lookup(
      params=user_embeddings,
      ids=user_id_val,
      name="user-id-weights")
  user_id_weights = tf.gather(user_id_weights, user_id_idx)

  movie_id_val, movie_id_idx = tf.unique(tf.concat(movie_id, axis=0))
  movie_id_weights = tf.nn.embedding_lookup(
      params=movie_embeddings,
      ids=movie_id_val,
      name="movie-id-weights")
  movie_id_weights = tf.gather(movie_id_weights, movie_id_idx)

  embeddings = tf.concat([user_id_weights, movie_id_weights], axis=1)
  d0 = Dense(256,
             activation='relu',
             kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
             bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
  d1 = Dense(64,
             activation='relu',
             kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
             bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
  d2 = Dense(1,
             kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
             bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
  dnn = d0(embeddings)
  dnn = d1(dnn)
  dnn = d2(dnn)
  out = tf.reshape(dnn, shape=[-1])
  loss = tf.keras.losses.MeanSquaredError()(rating, out)
  predictions = {"out": out}

  if mode == tf.estimator.ModeKeys.EVAL:
    eval_metric_ops = {}
    return tf.estimator.EstimatorSpec(mode=mode,
                                      loss=loss,
                                      eval_metric_ops=eval_metric_ops)

  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tfra.embedding_variable.AdamOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
        loss, global_step=tf.compat.v1.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode=mode,
                                      predictions=predictions,
                                      loss=loss,
                                      train_op=train_op)

  if mode == tf.estimator.ModeKeys.PREDICT:
    predictions_for_net = {"out": out}
    export_outputs = {
        "predict_export_outputs":
            tf.estimator.export.PredictOutput(outputs=predictions_for_net)
    }
    return tf.estimator.EstimatorSpec(mode,
                                      predictions=predictions_for_net,
                                      export_outputs=export_outputs)


def train(model_dir):
  print("in eager mode: ", tf.executing_eagerly())
  model_config = tf.estimator.RunConfig(log_step_count_steps=100,
                                        save_summary_steps=100,
                                        save_checkpoints_steps=100,
                                        save_checkpoints_secs=None,
                                        keep_checkpoint_max=2)

  estimator = tf.estimator.Estimator(model_fn=model_fn,
                                     model_dir=model_dir,
                                     config=model_config)

  train_spec = tf.estimator.TrainSpec(input_fn=input_fn)

  eval_spec = tf.estimator.EvalSpec(input_fn=input_fn)

  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)


def serving_input_receiver_dense_fn():
  input_spec = {
      "movie_id": tf.constant([1], tf.int64),
      "user_id": tf.constant([1], tf.int64),
      "user_rating": tf.constant([1.0], tf.float32)
  }
  return tf.estimator.export.build_raw_serving_input_receiver_fn(input_spec)


def export_for_serving(model_dir, export_dir):
  model_config = tf.estimator.RunConfig(log_step_count_steps=100,
                                        save_summary_steps=100,
                                        save_checkpoints_steps=100,
                                        save_checkpoints_secs=None)

  estimator = tf.estimator.Estimator(model_fn=model_fn,
                                     model_dir=model_dir,
                                     config=model_config)

  estimator.export_saved_model(export_dir, serving_input_receiver_dense_fn())


def main(argv):
  del argv
  train(FLAGS.model_dir)

if __name__ == "__main__":
  app.run(main)

Other info / logs
INFO:tensorflow:Create CheckpointSaverHook.
I1028 09:39:51.577457 140539056662336 basic_session_run_hooks.py:546] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I1028 09:39:51.706991 140539056662336 monitored_session.py:246] Graph was finalized.
Traceback (most recent call last):
File "tfra_variable_estimator.py", line 162, in <module>
app.run(main)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "tfra_variable_estimator.py", line 159, in main
train(FLAGS.model_dir)
File "tfra_variable_estimator.py", line 132, in train
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 505, in train_and_evaluate
return executor.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 646, in run
return self.run_local()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 747, in run_local
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 349, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1175, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1208, in _train_model_default
saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1510, in _train_with_estimator_spec
save_graph_def=self._config.checkpoint_save_graph_def) as mon_sess:
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 604, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1038, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 749, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1231, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1236, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 902, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 669, in create_session
init_fn=self._scaffold.init_fn)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/session_manager.py", line 311, in prepare_session
msg))
RuntimeError: Init operations did not make model ready for local_init. Init op: group_deps, init fn: None, error: Variables not initialized: user_variable_embeddings, movie_variable_embeddings, user_variable_embeddings/Adam/EmbeddingVariable, user_variable_embeddings/Adam_1/EmbeddingVariable, movie_variable_embeddings/Adam/EmbeddingVariable, movie_variable_embeddings/Adam_1/EmbeddingVariable

How can I use RestrictPolicy in an estimator to eliminate features in a variable following the `oldest-out-first` rule?

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.6 LTS
  • TensorFlow version and how it was installed (source or binary): 2.5.1
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): source
  • Python version: Python 3.6.9
  • Is GPU used? (yes/no): no

Describe the bug

When I use RestrictPolicy in an estimator, how can I use apply_update and apply_restriction to eliminate features in a variable following the oldest-out-first rule?
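
For readers unfamiliar with the rule, here is a toy, pure-Python model of what an oldest-out-first restriction means. It is only a sketch under stated assumptions: the class `ToyTimestampPolicy`, the `step` argument, and the dict-based table are hypothetical and are not TFRA's implementation. Only the method names `apply_update` and `apply_restriction` are borrowed from the question.

```python
class ToyTimestampPolicy:
    """Toy oldest-out-first policy; a conceptual stand-in, not TFRA code."""

    def __init__(self):
        self.last_seen = {}  # key -> step at which the key was last updated
        self.table = {}      # key -> embedding value (placeholder list here)

    def apply_update(self, keys, step):
        """Record that `keys` were touched at `step` (hypothetical signature)."""
        for k in keys:
            self.last_seen[k] = step
            self.table.setdefault(k, [0.0])

    def apply_restriction(self, num_reserved):
        """Evict the least recently updated keys until `num_reserved` remain."""
        overflow = len(self.table) - num_reserved
        if overflow <= 0:
            return []
        oldest_first = sorted(self.last_seen, key=self.last_seen.get)
        evicted = oldest_first[:overflow]
        for k in evicted:
            del self.table[k]
            del self.last_seen[k]
        return evicted


policy = ToyTimestampPolicy()
policy.apply_update([1, 2, 3], step=0)
policy.apply_update([2, 3], step=1)
policy.apply_update([4], step=2)
policy.apply_update([3], step=3)
evicted = policy.apply_restriction(num_reserved=2)
print(evicted, sorted(policy.table))  # [1, 2] [3, 4]
```

In this model, every training step refreshes the timestamps of the keys it touched, and a periodic restriction call trims the table back to a fixed size by dropping the stalest keys first.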

user_embeddings and movie_embeddings are created in model_fn. How can I get these two embedding variables and call apply_restriction on them?

I tried to use get_variable to retrieve user_embeddings or movie_embeddings in a SessionRunHook, but got ValueError: Variable user_dynamic_embeddings already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?

Can anyone give me some suggestions?

Code to reproduce the issue

import json
import os

import tensorflow as tf
from tensorflow.keras.layers import Dense

import tensorflow_datasets as tfds
import tensorflow_recommenders_addons as tfra

from absl import app
from absl import flags

flags.DEFINE_string('model_dir', "./incrm_ckpt", 'model_dir')
flags.DEFINE_string('export_dir', "./export_dir", 'export_dir')
flags.DEFINE_string('mode', "train", 'train or export')

FLAGS = flags.FLAGS

try:
    _SessionRunHook = tf.estimator.SessionRunHook
except AttributeError:
    try:
        _SessionRunHook = tf.train.SessionRunHook
    except AttributeError:
        _SessionRunHook = None

if _SessionRunHook is not None:
  class UpdateEmbeddingHook(_SessionRunHook):
    def begin(self):
      print(" === begin")
      
    def after_run(self, run_context, run_values):
      print(" === after_run")
      self.user_embeddings = tfra.dynamic_embedding.get_variable(name="user_dynamic_embeddings")
      print(self.user_embeddings)
      keys, tstp = run_context.session.run(self.user_embeddings.restrict_policy.status.export())
      k, v = run_context.session.run(self.user_embeddings.export())
      print(' === keys   :{}\n === tstp  :{}\n === values :\n{}'.format(keys, tstp, v))

def input_fn():
  ratings = tfds.load("movielens/100k-ratings", split="train")
  ratings = ratings.map(
      lambda x: {
          "movie_id": tf.strings.to_number(x["movie_id"], tf.int64),
          "user_id": tf.strings.to_number(x["user_id"], tf.int64),
          "user_rating": x["user_rating"]
      })
  shuffled = ratings.shuffle(1_000_000, seed=2021, reshuffle_each_iteration=False)
  dataset = shuffled.batch(256)
  return dataset

def model_fn(features, labels, mode, params):
  embedding_size = 32
  movie_id = features["movie_id"]
  user_id = features["user_id"]
  rating = features["user_rating"]

  is_training = (mode == tf.estimator.ModeKeys.TRAIN)

  if is_training:
    initializer = tf.keras.initializers.RandomNormal(-1, 1)
  else:
    initializer = tf.keras.initializers.Zeros()

  # from tensorflow.python.ops import variable_scope
  # with variable_scope.variable_scope("test", reuse=variable_scope.AUTO_REUSE):
  #   # Create embedding variable by `tfra.dynamic_embedding` API.
  user_embeddings = tfra.dynamic_embedding.get_variable(
      name="user_dynamic_embeddings",
      dim=embedding_size,
      initializer=initializer,
      trainable=True,
      restrict_policy=tfra.dynamic_embedding.TimestampRestrictPolicy)
  movie_embeddings = tfra.dynamic_embedding.get_variable(
      name="movie_dynamic_embeddings",
      dim=embedding_size,
      initializer=initializer,
      trainable=True,
      restrict_policy=tfra.dynamic_embedding.TimestampRestrictPolicy)

  user_id_val, user_id_idx = tf.unique(tf.concat(user_id, axis=0))
  user_id_weights, user_id_trainable_wrapper = tfra.dynamic_embedding.embedding_lookup(
      params=user_embeddings,
      ids=user_id_val,
      name="user-id-weights",
      return_trainable=True)
  user_id_weights = tf.gather(user_id_weights, user_id_idx)

  movie_id_val, movie_id_idx = tf.unique(tf.concat(movie_id, axis=0))
  movie_id_weights, movie_id_trainable_wrapper = tfra.dynamic_embedding.embedding_lookup(
      params=movie_embeddings,
      ids=movie_id_val,
      name="movie-id-weights",
      return_trainable=True)
  movie_id_weights = tf.gather(movie_id_weights, movie_id_idx)

  embeddings = tf.concat([user_id_weights, movie_id_weights], axis=1)
  d0 = Dense(256,
            activation='relu',
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
  d1 = Dense(64,
            activation='relu',
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
  d2 = Dense(1,
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
  dnn = d0(embeddings)
  dnn = d1(dnn)
  dnn = d2(dnn)
  out = tf.reshape(dnn, shape=[-1])
  loss = tf.keras.losses.MeanSquaredError()(rating, out)
  predictions = {"out": out}

  if mode == tf.estimator.ModeKeys.EVAL:
    eval_metric_ops = {}
    return tf.estimator.EstimatorSpec(mode=mode,
                                      loss=loss,
                                      eval_metric_ops=eval_metric_ops)

  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.001)
    optimizer = tfra.dynamic_embedding.DynamicEmbeddingOptimizer(optimizer)
    train_op = optimizer.minimize(loss, global_step=tf.compat.v1.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode=mode,
                                      predictions=predictions,
                                      loss=loss,
                                      train_op=train_op)

  if mode == tf.estimator.ModeKeys.PREDICT:
    predictions_for_net = {"out": out}
    export_outputs = {
        "predict_export_outputs":
            tf.estimator.export.PredictOutput(outputs=predictions_for_net)
    }
    return tf.estimator.EstimatorSpec(mode,
                                      predictions=predictions_for_net,
                                      export_outputs=export_outputs)


def train(model_dir):
  model_config = tf.estimator.RunConfig(log_step_count_steps=100,
                                        save_summary_steps=100,
                                        save_checkpoints_steps=100,
                                        save_checkpoints_secs=None,
                                        keep_checkpoint_max=2)

  estimator = tf.estimator.Estimator(model_fn=model_fn,
                                     model_dir=model_dir,
                                     config=model_config)

  train_spec = tf.estimator.TrainSpec(input_fn=input_fn)

  eval_spec = tf.estimator.EvalSpec(input_fn=input_fn)

  # tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  updateEmbeddingHook = UpdateEmbeddingHook()
  estimator.train(input_fn=input_fn, hooks=[updateEmbeddingHook])
    


def serving_input_receiver_dense_fn():
  input_spec = {
      "movie_id": tf.constant([1], tf.int64),
      "user_id": tf.constant([1], tf.int64),
      "user_rating": tf.constant([1.0], tf.float32)
  }
  return tf.estimator.export.build_raw_serving_input_receiver_fn(input_spec)


def export_for_serving(model_dir, export_dir):
  model_config = tf.estimator.RunConfig(log_step_count_steps=100,
                                        save_summary_steps=100,
                                        save_checkpoints_steps=100,
                                        save_checkpoints_secs=None)

  estimator = tf.estimator.Estimator(model_fn=model_fn,
                                     model_dir=model_dir,
                                     config=model_config)

  estimator.export_saved_model(export_dir, serving_input_receiver_dense_fn())

def main(argv):
  del argv  # unused
  if FLAGS.mode == "train":
    train(FLAGS.model_dir)
  # `task_name` and `task_idx` were never defined in this script, so the
  # export branch now depends only on the `mode` flag ('train or export').
  if FLAGS.mode == "export":
    tfra.dynamic_embedding.enable_inference_mode()
    export_for_serving(FLAGS.model_dir, FLAGS.export_dir)

if __name__ == "__main__":
  app.run(main)

Other info / logs

I1125 02:36:43.168426 140397124024128 basic_session_run_hooks.py:626] Calling checkpoint listeners after saving checkpoint 0...
2021-11-25 02:36:43.306006: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
 === after_run
Traceback (most recent call last):
  File "movielens-100k-estimator_sgl.py", line 198, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "movielens-100k-estimator_sgl.py", line 191, in main
    train(FLAGS.model_dir)
  File "movielens-100k-estimator_sgl.py", line 163, in train
    estimator.train(input_fn=input_fn, hooks=[updateEmbeddingHook])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 349, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1175, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1208, in _train_model_default
    saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1514, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 779, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1284, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1385, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1370, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1451, in run
    run_metadata=run_metadata))
  File "movielens-100k-estimator_sgl.py", line 40, in after_run
    self.user_embeddings = tfra.dynamic_embedding.get_variable(name="user_dynamic_embeddings")
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_variable.py", line 729, in get_variable
    raise ValueError(err_msg)
ValueError: Variable user_dynamic_embeddings already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?

Can EV support the FTRL optimizer?

Describe the feature and the current behavior/state.

Can EV support the FTRL optimizer?

Relevant information

  • Are you willing to contribute it (yes/no): no
  • Are you willing to maintain it going forward? (yes/no): yes
  • Is there a relevant academic paper? (if so, where):
  • Is there already an implementation in another framework? (if so, where):
  • Was it part of tf.contrib? (if so, where):

Which API type would this fall under (layer, metric, optimizer, etc.)

Who will benefit with this feature?

Any other info.

Integrating tfrs with tfx

I am trying to integrate a tfrs model with tfx, and I see that it's in the scope of recommender-addons ("End-to-end pipeline: how to train continuously, e.g. integrate with platforms like TFX"). However, I have not been able to find any docs or example pipelines to accomplish this. I am hoping to implement best practices, and in particular looking for help integrating the model with tft.

Any help would be much appreciated!

dynamic embedding doesn't work with tf.distribute.experimental.ParameterServerStrategy

System information

  • OS Platform and Distribution: Manjaro
  • TensorFlow version and how it was installed (source or binary): 2.5.1 binary
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): 0.3.0 binary
  • Python version: 3.7.9
  • Is GPU used? (yes/no): no

Describe the bug

dynamic embedding doesn't work with tf.distribute.experimental.ParameterServerStrategy.

Related to issue #167, because dynamic embedding also doesn't work with @tf.function.

Code to reproduce the issue
The code is combined from https://www.tensorflow.org/tutorials/distribute/parameter_server_training and https://github.com/tensorflow/recommenders-addons/blob/master/docs/tutorials/dynamic_embedding_tutorial.ipynb. It works fine when use_de=False.

import os
import multiprocessing

import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders_addons as tfra
import numpy as np
import portpicker

from tensorflow.keras.layers import Dense


class NCFModel(tf.keras.Model):
    def __init__(self, use_de):
        super(NCFModel, self).__init__()
        self.embedding_size = 32
        self.use_de = use_de
        self.d0 = Dense(
            256,
            activation='relu',
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
        self.d1 = Dense(
            64,
            activation='relu',
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
        self.d2 = Dense(
            1,
            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),
            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))
        if use_de:
            self.user_embeddings = tfra.dynamic_embedding.get_variable(
                name="user_dynamic_embeddings",
                dim=self.embedding_size,
                initializer=tf.keras.initializers.RandomNormal(-1, 1),
                key_dtype=tf.int64)
            self.movie_embeddings = tfra.dynamic_embedding.get_variable(
                name="movie_dynamic_embeddings",
                dim=self.embedding_size,
                initializer=tf.keras.initializers.RandomNormal(-1, 1),
                key_dtype=tf.int64)
        else:
            self.user_embeddings = self.add_weight(
                name="user_embeddings",
                shape=(10000, self.embedding_size),
                dtype=tf.float32,
                initializer=tf.keras.initializers.RandomNormal(-1, 1),
                trainable=True,
            )
            self.movie_embeddings = self.add_weight(
                name="movie_embeddings",
                shape=(10000, self.embedding_size),
                dtype=tf.float32,
                initializer=tf.keras.initializers.RandomNormal(-1, 1),
                trainable=True,
            )


    def call(self, batch):
        movie_id = batch["movie_id"]
        user_id = batch["user_id"]
        
        trainable_wrappers = []
        if self.use_de:
            user_id_weights, user_id_trainable_wrapper = tfra.dynamic_embedding.embedding_lookup_unique(
                params=self.user_embeddings,
                ids=user_id,
                name="user-id-weights",
                return_trainable=True

            )
            movie_id_weights, movie_id_trainable_wrapper = tfra.dynamic_embedding.embedding_lookup_unique(
                params=self.movie_embeddings,
                ids=movie_id,
                name="movie-id-weights",
                return_trainable=True
            )
            trainable_wrappers = [user_id_trainable_wrapper, movie_id_trainable_wrapper]
        else:
            user_id_weights = tf.gather(self.user_embeddings, user_id)
            movie_id_weights = tf.gather(self.movie_embeddings, movie_id)

        embeddings = tf.concat([user_id_weights, movie_id_weights], axis=1)
        dnn = self.d0(embeddings)
        dnn = self.d1(dnn)
        dnn = self.d2(dnn)
        out = tf.reshape(dnn, shape=[-1])
        return out, trainable_wrappers


def create_in_process_cluster(num_workers, num_ps):
  """Creates and starts local servers and returns the cluster_resolver."""
  worker_ports = [portpicker.pick_unused_port() for _ in range(num_workers)]
  ps_ports = [portpicker.pick_unused_port() for _ in range(num_ps)]

  cluster_dict = {}
  cluster_dict["worker"] = ["localhost:%s" % port for port in worker_ports]
  if num_ps > 0:
    cluster_dict["ps"] = ["localhost:%s" % port for port in ps_ports]

  cluster_spec = tf.train.ClusterSpec(cluster_dict)

  # Workers need some inter_ops threads to work properly.
  worker_config = tf.compat.v1.ConfigProto()
  if multiprocessing.cpu_count() < num_workers + 1:
    worker_config.inter_op_parallelism_threads = num_workers + 1

  for i in range(num_workers):
    tf.distribute.Server(
        cluster_spec,
        job_name="worker",
        task_index=i,
        config=worker_config,
        protocol="grpc")

  for i in range(num_ps):
    tf.distribute.Server(
        cluster_spec,
        job_name="ps",
        task_index=i,
        protocol="grpc")

  cluster_resolver = tf.distribute.cluster_resolver.SimpleClusterResolver(
      cluster_spec, rpc_layer="grpc")
  return cluster_resolver



os.environ["GRPC_FAIL_FAST"] = "use_caller"

NUM_WORKERS = 2
NUM_PS = 1
cluster_resolver = create_in_process_cluster(NUM_WORKERS, NUM_PS)

strategy = tf.distribute.experimental.ParameterServerStrategy(
    cluster_resolver,
    variable_partitioner=None)

use_de = True  # code works fine if use_de=False
with strategy.scope():
    model = NCFModel(use_de)
    optimizer = tf.keras.optimizers.Adam()
    if use_de:
        optimizer = tfra.dynamic_embedding.DynamicEmbeddingOptimizer(optimizer)

@tf.function
def step_fn(iterator):
    def replica_fn(batch):
        with tf.GradientTape() as tape:
            pred, trainable_wrappers = model(batch, training=True)
            rating = batch['user_rating']
            per_example_loss = (pred - rating)**2
            loss = tf.nn.compute_average_loss(per_example_loss)
        gradients = tape.gradient(loss, model.trainable_variables + trainable_wrappers)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables + trainable_wrappers))
        return loss

    batch_data = next(iterator)
    losses = strategy.run(replica_fn, args=(batch_data,))
    sum_loss = strategy.reduce(tf.distribute.ReduceOp.SUM, losses, axis=None)
    return sum_loss


def get_dataset_fn(input_context):
    global_batch_size = 256
    batch_size = input_context.get_per_replica_batch_size(global_batch_size)
    ratings = tfds.load("movielens/100k-ratings", split="train")
    ratings = ratings.map(lambda x: {
        "movie_id": tf.strings.to_number(x["movie_id"], tf.int64),
        "user_id": tf.strings.to_number(x["user_id"], tf.int64),
        "user_rating": x["user_rating"]
    })
    wid = input_context.input_pipeline_id
    shuffled = ratings.shuffle(100_000, seed=wid, reshuffle_each_iteration=False)
    dataset_train = shuffled.take(100_000).batch(batch_size).repeat()
    return dataset_train


@tf.function
def per_worker_dataset_fn():
    return strategy.distribute_datasets_from_function(get_dataset_fn)


coordinator = tf.distribute.experimental.coordinator.ClusterCoordinator(strategy)
per_worker_dataset = coordinator.create_per_worker_dataset(per_worker_dataset_fn)
per_worker_iterator = iter(per_worker_dataset)
num_epochs = 20
steps_per_epoch = 100
for i in range(num_epochs):
    total_loss = []
    for _ in range(steps_per_epoch):
        remote = coordinator.schedule(step_fn, args=(per_worker_iterator,))
        total_loss.append(remote.fetch())
    coordinator.join()
    print("epoch", i, "loss", np.mean(total_loss))

Other info / logs

Error:

Traceback (most recent call last):
  File "ps_test.py", line 193, in <module>
    remote = coordinator.schedule(step_fn, args=(per_worker_iterator,))
  File "/home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/distribute/coordinator/cluster_coordinator.py", line 1150, in schedule
    remote_value = self._cluster.schedule(fn, args=args, kwargs=kwargs)
  File "/home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/distribute/coordinator/cluster_coordinator.py", line 977, in schedule
    kwargs=kwargs)
  File "/home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/distribute/coordinator/cluster_coordinator.py", line 363, in __init__
    **nest.map_structure(_maybe_as_type_spec, replica_kwargs))
  File "/home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1367, in get_concrete_function
    concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
  File "/home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1273, in _get_concrete_function_garbage_collected
    self._initialize(args, kwargs, add_initializers_to=initializers)
  File "/home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 764, in _initialize
    *args, **kwds))
  File "/home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3050, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3444, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3289, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 999, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 672, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 986, in wrapper
    raise e.ag_error_metadata.to_exception(e)
AttributeError: in user code:

    ps_test.py:156 replica_fn  *
        optimizer.apply_gradients(zip(gradients, model.trainable_variables + trainable_wrappers))
    /home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:636 apply_gradients  **
        self._create_all_weights(var_list)
    /home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:823 _create_all_weights
        self._create_slots(var_list)
    /home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/adam.py:124 _create_slots
        self.add_slot(var, 'm')
    /home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_optimizer.py:177 add_slot
        with strategy.extended.colocate_vars_with(var):
    /home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:2217 colocate_vars_with
        self._validate_colocate_with_variable(colocate_with_variable)
    /home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/distribute/parameter_server_strategy.py:339 _validate_colocate_with_variable
        distribute_utils.validate_colocate(colocate_with_variable, self)
    /home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_utils.py:246 validate_colocate
        _validate_colocate_extended(v, extended)
    /home/npbool/Projects/tfra_test/venv/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_utils.py:226 _validate_colocate_extended
        if variable_strategy.extended is not extended:

    AttributeError: 'NoneType' object has no attribute 'extended'

Python leakage when calling tf.function backward

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS 7
  • TensorFlow version and how it was installed (source or binary): Binary
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): Source
  • Python version: python3.6
  • Is GPU used? (yes/no): no

Describe the bug
When a graph function is wrapped in tf.function in eager mode, a graph tensor leaks out of the function while the backward pass is computed.

Code to reproduce the issue

import tensorflow as tf

from tensorflow.python.training import adam
from tensorflow_recommenders_addons import dynamic_embedding as de

var = de.get_variable('my_var', key_dtype=tf.int64, value_dtype=tf.float32, dim=2, initializer=0.1)
opt = adam.AdamOptimizer(0.1)
opt = de.DynamicEmbeddingOptimizer(opt)

@tf.function
def foo(var, ids):
  embd, tw = de.embedding_lookup(var, ids, return_trainable=True)
  loss = tf.math.reduce_sum(embd)
  return loss, tw

tw_list = []

ids = tf.constant([1,2,3], dtype=tf.int64)

def loss_fn(var, ids):
  loss, tw = foo(var, ids)
  tw_list = [tw]
  return loss

opt.minimize(lambda: loss_fn(var, ids), tw_list)

print(var.export())
print(tw_list)

Running this fails because a graph tensor leaks out of the function. Traceback:

Traceback (most recent call last):
  File "test.py", line 27, in <module>
    opt.minimize(lambda: loss_fn(var, ids), tw_list)
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow/python/training/optimizer.py", line 412, in minimize
    return self.apply_gradients(grads_and_vars, global_step=global_step,
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow/python/training/optimizer.py", line 597, in apply_gradients
    self._create_slots(var_list)
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow/python/training/adam.py", line 138, in _create_slots
    self._zeros_slot(v, "m", self._name)
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_optimizer.py", line 270, in _zeros_slot
    new_slot_variable = de.create_slots(var, 0.0, slot_name, op_name,
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_optimizer.py", line 325, in create_slots
    _, slot_trainable = de.embedding_lookup(
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_ops.py", line 566, in embedding_lookup
    embeddings = array_ops.identity(trainable_)
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 286, in identity
    input = ops.convert_to_tensor(input)
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py", line 163, in wrapped
    return func(*args, **kwargs)
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1540, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1992, in _dense_var_to_tensor
    return var._dense_var_to_tensor(dtype=dtype, name=name, as_ref=as_ref)  # pylint: disable=protected-access
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1393, in _dense_var_to_tensor
    return self.value()
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 565, in value
    return self._read_variable_op()
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_ops.py", line 381, in _read_variable_op
    self.prefetch_values(),
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_ops.py", line 95, in prefetch_values
    self.prefetch_values_op = self.transform(self.params.lookup(self.ids))
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_variable.py", line 534, in lookup
    partition_index = self.partition_fn(keys, self.shard_num)
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow_recommenders_addons/dynamic_embedding/python/ops/dynamic_embedding_variable.py", line 118, in default_partition_fn
    keys_int32 = math_ops.cast(bitwise_ops.bitwise_and(keys_op, mask),
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow/python/ops/gen_bitwise_ops.py", line 69, in bitwise_and
    return bitwise_and_eager_fallback(
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow/python/ops/gen_bitwise_ops.py", line 108, in bitwise_and_eager_fallback
    _result = _execute.execute(b"BitwiseAnd", 1, inputs=_inputs_flat,
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 75, in quick_execute
    raise e
  File "/data/env/py3/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: ids:0

It seems that I cannot create a variable inside the tf.function and use it outside. The tw represents the local trainable variables as a projection onto the sparse domain. If I create tw before the tf.function is defined, then tw has to be passed into the function, which goes against the semantics of the embedding_lookup API.

Is there any method to handle this situation?
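
Separately from the TypeError, one detail of the reproduction may be worth checking: the assignment tw_list = [tw] inside loss_fn binds a new local name, so the module-level tw_list passed to opt.minimize is not changed by it. A minimal plain-Python sketch of that scoping rule follows. The helper names are hypothetical, no TensorFlow is involved, and this is not a fix for the graph-tensor leak.

```python
collected = []  # module-level list, like tw_list in the reproduction


def record_by_assignment(item):
    # Assignment creates a new local name; the module-level list is untouched.
    collected = [item]
    return len(collected)


def record_by_mutation(item):
    # Mutating the existing object is visible to every holder of the list.
    collected.append(item)
    return len(collected)


record_by_assignment("tw")
after_assignment = list(collected)

record_by_mutation("tw")
after_mutation = list(collected)

print(after_assignment, after_mutation)
```

If the wrapper really has to be collected from inside the loss function, mutating the shared list (or returning the wrapper) avoids the shadowing. Whether that alone is enough for the optimizer depends on when it reads the variable list.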

dynamic_embedding does not return the same embeddings for the same id

System information

  • OS Platform and Distribution: Linux Ubuntu 18.04.5
  • TensorFlow version and how it was installed (source or binary): 2.4.1 in docker
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): 0.2.0 use pip
  • Python version: 3.6.9
  • Is GPU used? (yes/no): no

Describe the bug

dynamic_embedding.embedding_lookup does not return the same embeddings for the same id.

import tensorflow as tf
import tensorflow_recommenders_addons as tfra
x = tf.constant([1, 2, 3], dtype=tf.int64)
embedding_dict = tfra.dynamic_embedding.get_variable(
    name="x_embeddings", 
    dim=3, 
    initializer=tf.keras.initializers.RandomNormal(-1, 1))

print(tfra.dynamic_embedding.embedding_lookup(
    params=embedding_dict,
    ids=x,
    name="x_embedding"))
print(tfra.dynamic_embedding.embedding_lookup(
    params=embedding_dict,
    ids=x,
    name="x_embedding"))

The result is:

tf.Tensor( [[-0.6480133 -1.7569897 -1.7143806] [ 0.9366411 0.2751621 -2.882576 ] [-3.657474 -0.5267197 -1.1529406]], shape=(3, 3), dtype=float32)
tf.Tensor( [[-0.48164272 -0.32765645 -2.0944896 ] [-1.5339627 -1.8370364 0.68307126] [-0.36742425 0.06579518 -0.31739855]], shape=(3, 3), dtype=float32)

I also tried EmbeddingVariable in the same manner, and it works fine.

So what's the problem when I use dynamic_embedding?

Thanks for your help
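
One possible explanation, stated as an assumption and not as confirmed library behavior: if a lookup of a key that has never been stored returns fresh initializer output without inserting it, then two lookups made before any training update draw two different random rows. The toy table below is a plain-Python stand-in, not the TFRA API. The class and method names are hypothetical.

```python
import random


class ToyDynamicTable:
    """Toy key-value embedding table; an illustration only, not the TFRA API."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self._rows = {}
        self._rng = random.Random(seed)

    def _init_row(self):
        # Mirrors a random normal initializer with mean -1 and stddev 1.
        return [self._rng.gauss(-1.0, 1.0) for _ in range(self.dim)]

    def lookup(self, ids):
        # Assumption: missing keys get a fresh initializer draw and are NOT stored.
        return [self._rows[i] if i in self._rows else self._init_row() for i in ids]

    def upsert(self, ids, rows):
        # Stands in for the write-back that happens when a training step updates the table.
        for i, row in zip(ids, rows):
            self._rows[i] = list(row)


table = ToyDynamicTable(dim=3)
first = table.lookup([1, 2, 3])
second = table.lookup([1, 2, 3])   # nothing stored yet, so new random rows

table.upsert([1, 2, 3], first)     # simulate one update that stores the rows
third = table.lookup([1, 2, 3])
fourth = table.lookup([1, 2, 3])   # now served from storage
```

Under that assumption, the values only become stable once the keys have been written to the table, for example after an optimizer step or an explicit insert.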

EV is_initialize_op does not work

class VarIsInitializedOp : public OpKernel {
 public:
  explicit VarIsInitializedOp(OpKernelConstruction* c) : OpKernel(c) {}

  void Compute(OpKernelContext* context) override {
    Tensor* output = nullptr;
    OP_REQUIRES_OK(context,
                   context->allocate_output(0, TensorShape({}), &output));
    auto output_tensor = output->tensor<bool, 0>();
    core::RefCountPtr<Var> variable;
    Status s = LookupResource(context, HandleFromInput(context, 0), &variable);
    std::cout << "VarIsInitializedOp: Status" << s.error_message() << std::endl;
    std::cout << "VarIsInitializedOp: Status" << s.ok() << HandleFromInput(context, 0).DebugString() << std::endl;
    if (!s.ok()) {
      output_tensor() = false;
      return;
    }
    mutex_lock ml(*variable->mu());
    output_tensor() = variable->is_initialized;
  }
};

template <typename T, bool use_dynamic_cast>
Status LookupResource(OpKernelContext* ctx, const ResourceHandle& p,
                      T** value) {
  TF_RETURN_IF_ERROR(internal::ValidateDeviceAndType<T>(ctx, p));
  return ctx->resource_manager()->Lookup<T, use_dynamic_cast>(p.container(),
                                                              p.name(), value);
}

template <typename T>
Status LookupResource(OpKernelContext* ctx, const ResourceHandle& p,
                      core::RefCountPtr<T>* value) {
  T* raw_ptr = nullptr;
  TF_RETURN_IF_ERROR(LookupResource<T, false>(ctx, p, &raw_ptr));
  value->reset(raw_ptr);

  return Status::OK();
}

template <typename T>
Status ValidateDeviceAndType(OpKernelContext* ctx, const ResourceHandle& p) {
  TF_RETURN_IF_ERROR(internal::ValidateDevice(ctx, p));
  auto type_index = MakeTypeIndex<T>();
  if (type_index.hash_code() != p.hash_code()) {
    return errors::InvalidArgument(
        "Trying to access resource using the wrong type. Expected ",
        p.maybe_type_name(), " got ", type_index.name());
  }
  return Status::OK();
}

Cannot convert a symbolic Tensor (user_id:0) to a numpy array

Now I'm trying to run through the example provided by the community:

https://github.com/tensorflow/recommenders-addons/blob/master/docs/tutorials/dynamic_embedding_tutorial.ipynb

When I add

tf.saved_model.save(model, "/my_savepath/")

to save my model, the following exception occurs:

NotImplementedError: Cannot convert a symbolic Tensor (user_id:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

It may be caused by the trainable_wrappers.

How can I save my model without using compat.v1?
Please help me, thank you!

Add Support for Apple Silicon (M1)

Hardware info
Machine: MacBook Air
OS: macOS BigSur 11.2.3
Chip: Apple M1 (8cpu, 8gpu)
RAM: 16GB

Software info
Python:3.8
PIP: 21.0.1
Miniforge: 4.10.1
Tensorflow-macos: 2.4.0-rc0 (0.1a3)
Bazel: 4.0.0-homebrew
Homebrew: 3.1.3

Description
I couldn't install the addon on my system with pip install tensorflow-recommenders-addons. I tried a Rosetta x86 terminal, but TensorFlow 2.4.1 can't run under Rosetta, so I compiled the source code with Bazel instead.

To compile the Apple Silicon version, I changed:
./.bazelversion
from 3.1.0
to 4.0.0

./build_deps/build_pip_pkg.sh
from
BUILD_CMD="${BUILD_CMD} --plat-name macosx_10_13_x86_64"
to
BUILD_CMD="${BUILD_CMD} --plat-name macosx_11_0_arm64"

./requirement.txt
delete tensorflow=2.4.1

./tensorflow_recommenders_addons/version.py
from
MIN_TF_VERSION = "2.4.1"
MAX_TF_VERSION = "2.4.1"
to
MIN_TF_VERSION = "2.4.0-rc0"
MAX_TF_VERSION = "2.4.0-rc0"

Compiling Process

git clone https://github.com/tensorflow/recommenders-addons.git
cd recommenders-addons

# This script links project with TensorFlow dependency
python3 ./configure.py

bazel build --enable_runfiles build_pip_pkg
bazel-bin/build_pip_pkg artifacts

pip install artifacts/tensorflow_recommenders_addons-*.whl

Here is the wheel file:
tensorflow_recommenders_addons-0.0.1.dev0-cp38-cp38-macosx_11_0_arm64.whl

The Forked Repo:
tf-recommender-m1
Release Page

Current issue
I still get some warning messages when running the tutorial ipynb files.
The dynamic_embedding_tutorial.ipynb works properly apart from those warnings, but the embedding_variable_tutorial.ipynb raises the following error:
InvalidArgumentError: type of indices is not match with EmbeddingVariable key type.
I also ran the embedding_variable_tutorial.ipynb on Colab and got the same error, so I don't think the problem is specific to my setup.

model size

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • TensorFlow version and how it was installed (source or binary): binary
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): binary
  • Python version: 3.6
  • Is GPU used? (yes/no): yes

Describe the bug

recommenders-addons/demo/movielens-100k-estimator/movielens-100k-estimator.py

I ran the demo code on a single machine with a GPU and found that the variables in the SavedModel saved with tfra are much smaller than those saved with plain tf without tfra. The dataset comes from input_fn, and the rest follows the Estimator framework used in the demo.

Is the size of the final model related to the storage of the estimator?

use tf.keras.layers.Embedding:
2G model.ckpt-1415.data-00000-of-00001
9.8K model.ckpt-1415.index
4.5M model.ckpt-1415.meta

use tfra:
83M model.ckpt-1133.data-00001-of-00002
28K model.ckpt-1133.index
6.5M model.ckpt-1133.meta
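
As a rough way to reason about the gap, and only as a sketch under assumptions: a static embedding table allocates a row for every slot in its vocabulary, while a key-value table holds a key and a row only for ids that were actually stored. The function names and all numbers below are hypothetical and are not derived from the checkpoints above. Note also that the tfra listing shows only shard 00001 of 2 and a different step count, so the two listings are not directly comparable.

```python
def dense_table_bytes(vocab_size, dim, bytes_per_value=4):
    """Static table: every row is allocated, whether or not its id ever appears."""
    return vocab_size * dim * bytes_per_value


def dynamic_table_bytes(num_seen_keys, dim, bytes_per_key=8, bytes_per_value=4):
    """Key-value table: one int64 key plus one float32 row per stored id."""
    return num_seen_keys * (bytes_per_key + dim * bytes_per_value)


# Hypothetical sizes, chosen only for illustration.
dense = dense_table_bytes(vocab_size=1_000_000, dim=16)
dynamic = dynamic_table_bytes(num_seen_keys=50_000, dim=16)

print(f"dense:   {dense / 1e6:.1f} MB")
print(f"dynamic: {dynamic / 1e6:.1f} MB")
```

Optimizer slot variables scale the same way, so an optimizer such as Adam would multiply both figures by a similar factor.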

The export code is as follows:

import os
import tensorflow as tf

def export_for_serving():
  # model_fn and _serving_features are defined elsewhere in the script.
  model_config = tf.estimator.RunConfig(
      save_summary_steps=200,
      save_checkpoints_steps=1000,
      keep_checkpoint_max=2,
      log_step_count_steps=1000,
      save_checkpoints_secs=None)
  model_dir = ""
  estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir=model_dir, config=model_config)

  _features = _serving_features()
  _receiver_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(_features)
  model_dir = estimator.export_saved_model(os.path.join(model_dir, 'export', 'best'),
                                           serving_input_receiver_fn=_receiver_fn)

Other info / logs

The log is as follows; the last lines keep repeating:

2021-08-19 11:17:46.933190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0) -> physical GPU (device: 0)
WARNING:tensorflow:From neww.py:292: auc (from tensorflow.python.ops.metrics_impl) is deprecated and will be removed in a future version.
Instructions for updating:
The value of AUC returned by this may race with the update so this is deprecated. Please use tf.keras.metrics.AUC instead.
2021-08-19 11:17:54.983175: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-08-19 11:17:54.984881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
name: computeCapability: 7.0
2021-08-19 11:17:54.984927: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-08-19 11:17:54.984966: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-08-19 11:17:54.984981: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-08-19 11:17:54.984991: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-08-19 11:17:54.985000: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-08-19 11:17:54.985009: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-08-19 11:17:54.985018: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-08-19 11:17:54.985027: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-08-19 11:17:54.987827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-19 11:17:54.987895: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-19 11:17:54.987915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-08-19 11:17:54.987921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-08-19 11:17:54.990781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0) -> physical GPU ()
2021-08-19 11:17:55.153587: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-08-19 11:17:55.394280: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: Hz
2021-08-19 11:18:03.587310: I tensorflow_recommenders_addons/dynamic_embedding/core/kernels/cuckoo_hashtable_op_gpu.cu.cc:81] HashTable on GPU is created successfully: K=x, V=f, max_size=8192, min_size=8192
2021-08-19 11:18:03.588022: I tensorflow_recommenders_addons/dynamic_embedding/core/kernels/cuckoo_hashtable_op_gpu.cu.cc:81] HashTable on GPU is created successfully: K=x, V=f, max_size=8192, min_size=8192
2021-08-19 11:18:03.588581: I tensorflow_recommenders_addons/dynamic_embedding/core/kernels/cuckoo_hashtable_op_gpu.cu.cc:81] HashTable on GPU is created successfully: K=x, V=f, max_size=8192, min_size=8192
2021-08-19 11:18:03.589058: I tensorflow_recommenders_addons/dynamic_embedding/core/kernels/cuckoo_hashtable_op_gpu.cu.cc:81] HashTable on GPU is created successfully: K=x, V=f, max_size=8192, min_size=8192
2021-08-19 11:18:03.589577: I tensorflow_recommenders_addons/dynamic_embedding/core/kernels/cuckoo_hashtable_op_gpu.cu.cc:81] HashTable on GPU is created successfully: K=x, V=f, max_size=8192, min_size=8192
2021-08-19 11:53:51.361084: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-08-19 11:53:54.617948: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-08-19 11:53:54.961718: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-08-19 11:54:16.238819: I tensorflow_recommenders_addons/dynamic_embedding/core/kernels/cuckoo_hashtable_op_gpu.cu.cc:187] HashTable on GPU changes to new status: [].
2021-08-19 11:54:16.240621: I tensorflow_recommenders_addons/dynamic_embedding/core/kernels/cuckoo_hashtable_op_gpu.cu.cc:187] HashTable on GPU changes to new status: [].
2021-08-19 11:54:17.617531: I tensorflow_recommenders_addons/dynamic_embedding/core/kernels/cuckoo_hashtable_op_gpu.cu.cc:187] HashTable on GPU changes to new status: [].
2021-08-19 11:54:17.619626: I tensorflow_recommenders_addons/dynamic_embedding/core/kernels/cuckoo_hashtable_op_gpu.cu.cc:187] HashTable on GPU changes to new status: [].
2021-08-19 11:54:17.621196: I tensorflow_recommenders_addons/dynamic_embedding/core/kernels/cuckoo_hashtable_op_gpu.cu.cc:187] HashTable on GPU changes to new status: [].
2021-08-19 11:54:19.485414: I tensorflow_recommenders_addons/dynamic_embedding/core/kernels/cuckoo_hashtable_op_gpu.cu.cc:187] HashTable on GPU changes to new status: [].
2021-08-19 11:54:19.487182: I tensorflow_recommenders_addons/dynamic_embedding/core/kernels/cuckoo_hashtable_op_gpu.cu.cc:187] HashTable on GPU changes to new status: [].
2021-08-19 11:54:19.488715: I tensorflow_recommenders_addons/dynamic_embedding/core/kernels/cuckoo_hashtable_op_gpu.cu.cc:187] HashTable on GPU changes to new status: [].

keras save model error

System information

  • OS Platform and Distribution
    Fedora 34
  • TensorFlow version and how it was installed (source or binary):
    2.5.1 CPU, build from source
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary):
    0.3.0
  • Python version:
    3.8.3
  • Is GPU used? (yes/no):
    no

Describe the bug

Saving the model with the Keras API raises an error.

Code to reproduce the issue

Embedding definition (in a custom layer):

          self.user_embeddings = tfra.dynamic_embedding.get_variable(
            name="user_dynamic_embeddings",
            dim=8,
            devices="CPU:0",
            initializer=tf.keras.initializers.Zeros())

embedding call:

        user_id_hashed = tf.strings.to_hash_bucket_fast(features["f003.uid"], 500000000)
        user_id_val, user_id_idx = tf.unique(tf.concat(user_id_hashed, axis=0))
        user_id_weights, user_id_trainable_wrapper = tfra.dynamic_embedding.embedding_lookup(
            params=self.user_embeddings,
            ids=user_id_val,
            name="user-id-weights",
            return_trainable=True)
        user_id_weights = tf.gather(user_id_weights, user_id_idx)
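
The lookup code above deduplicates the ids with tf.unique, looks up each distinct id once, and then restores the original order with tf.gather. A plain-Python sketch of the same round trip, with hypothetical helper names and made-up rows standing in for the embeddings:

```python
def unique_with_index(ids):
    """Return (distinct ids in first-seen order, position of each input id in that list)."""
    values, positions, index = [], {}, []
    for i in ids:
        if i not in positions:
            positions[i] = len(values)
            values.append(i)
        index.append(positions[i])
    return values, index


def gather(rows, index):
    """Pick rows by position, repeating a row wherever its id was repeated."""
    return [rows[j] for j in index]


ids = [7, 3, 7, 9, 3]
values, index = unique_with_index(ids)

# Stand-in for the embedding lookup: one row per distinct id.
rows = [[float(v)] * 2 for v in values]
expanded = gather(rows, index)

print(values, index)
print(expanded)
```

Looking up only the distinct ids keeps the table access proportional to the number of unique keys in the batch, not to the batch size.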

model save

model.save_weights(os.path.join(self.model_dir, 'weights')) #  ok
model.save(saving_dir, signatures=signatures, include_optimizer=False, options=tf.saved_model.SaveOptions(namespace_whitelist=['TFRA'])) # error

Error

AssertionError: Called a function referencing variables which have been deleted. This likely means that function-local variables were created and not referenced elsewhere in the program. This is generally a mistake; consider storing variables in an object attribute on first call.
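
The traceback in the logs ends in a deref over weak references to variables, which suggests the traced function only holds its variables weakly. The toy sketch below uses Python's weakref module to show the general effect the message describes: an object created in a local scope disappears once nothing else refers to it, while one stored on an attribute stays alive. This is an analogy with hypothetical class names, not TensorFlow code and not a confirmed diagnosis of this issue.

```python
import gc
import weakref


class ToyVariable:
    def __init__(self, name):
        self.name = name


class LocalOnlyLayer:
    def build(self):
        local_var = ToyVariable("w")      # referenced only by this local name
        return weakref.ref(local_var)     # the caller gets just a weak reference


class AttributeLayer:
    def build(self):
        self.w = ToyVariable("w")         # stored on the object, so it stays alive
        return weakref.ref(self.w)


lost_ref = LocalOnlyLayer().build()
layer = AttributeLayer()
kept_ref = layer.build()

gc.collect()  # CPython frees the local object immediately; this covers other runtimes

print(lost_ref() is None, kept_ref() is layer.w)
```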

Other info / logs

2021-10-17 12:35:26,129 : model weights saved !
WARNING:tensorflow:Skipping full serialization of Keras layer <tensorflow.python.keras.layers.embeddings.Embedding object at 0x7f5b00aae250>, because it is not built.
2021-10-17 12:35:32,714 : Skipping full serialization of Keras layer <tensorflow.python.keras.layers.embeddings.Embedding object at 0x7f5b00aae250>, because it is not built.
Traceback (most recent call last):
File "/home/chenliguo/tools/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/chenliguo/tools/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/carlos/Downloads/models82/wdl/train_base.py", line 151, in
fire.Fire(main)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/carlos/Downloads/models82/wdl/train_base.py", line 146, in main
base_trainer.work()
File "/home/carlos/Downloads/models82/wdl/train_base.py", line 71, in work
self.train_by_file_names(file_names1)
File "/home/carlos/Downloads/models82/wdl/train_base.py", line 97, in train_by_file_names
self.train(file_names[-1:])
File "/home/carlos/Downloads/models82/wdl/utils/rank_models.py", line 111, in train
model.save(saving_dir, signatures=signatures, include_optimizer=False)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 2106, in save
save.save_model(self, filepath, overwrite, include_optimizer, save_format,
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 150, in save_model
saved_model_save.save(model, filepath, overwrite, include_optimizer,
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/save.py", line 89, in save
saved_nodes, node_paths = save_lib.save_and_return_nodes(
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/saved_model/save.py", line 1103, in save_and_return_nodes
_build_meta_graph(obj, signatures, options, meta_graph_def,
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/saved_model/save.py", line 1290, in _build_meta_graph
return _build_meta_graph_impl(obj, signatures, options, meta_graph_def,
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/saved_model/save.py", line 1212, in _build_meta_graph_impl
signature_serialization.validate_saveable_view(checkpoint_graph_view)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/saved_model/signature_serialization.py", line 301, in validate_saveable_view
for name, dep in saveable_view.list_dependencies(
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/saved_model/save.py", line 120, in list_dependencies
extra_dependencies = self.list_extra_dependencies(obj)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/saved_model/save.py", line 148, in list_extra_dependencies
return obj._list_extra_dependencies_for_serialization( # pylint: disable=protected-access
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 3012, in _list_extra_dependencies_for_serialization
return (self._trackable_saved_model_saver
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/base_serialization.py", line 76, in list_extra_dependencies_for_serialization
return self.objects_to_serialize(serialization_cache)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/layer_serialization.py", line 69, in objects_to_serialize
return (self._get_serialized_attributes(
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/layer_serialization.py", line 89, in _get_serialized_attributes
object_dict, function_dict = self._get_serialized_attributes_internal(
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/model_serialization.py", line 53, in _get_serialized_attributes_internal
super(ModelSavedModelSaver, self)._get_serialized_attributes_internal(
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/layer_serialization.py", line 99, in _get_serialized_attributes_internal
functions = save_impl.wrap_layer_functions(self.obj, serialization_cache)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/save_impl.py", line 204, in wrap_layer_functions
fn.get_concrete_function()
File "/home/chenliguo/tools/anaconda3/lib/python3.8/contextlib.py", line 120, in exit
next(self.gen)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/save_impl.py", line 367, in tracing_scope
fn.get_concrete_function(*args, **kwargs)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 1367, in get_concrete_function
concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 1284, in _get_concrete_function_garbage_collected
concrete = self._stateful_fn._get_concrete_function_garbage_collected( # pylint: disable=protected-access
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3100, in _get_concrete_function_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3444, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3279, in _create_graph_function
func_graph_module.func_graph_from_py_func(
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 999, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 672, in wrapped_fn
out = weak_wrapped_fn().wrapped(*args, **kwds)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/save_impl.py", line 599, in wrapper
ret = method(*args, **kwargs)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/utils.py", line 165, in wrap_with_training_arg
return control_flow_util.smart_cond(
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/utils/control_flow_util.py", line 109, in smart_cond
return smart_module.smart_cond(
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/smart_cond.py", line 56, in smart_cond
return false_fn()
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/utils.py", line 167, in
lambda: replace_training_and_call(False))
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/utils.py", line 163, in replace_training_and_call
return wrapped_call(*args, **kwargs)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/save_impl.py", line 681, in call
return call_and_return_conditional_losses(inputs, *args, **kwargs)[0]
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/save_impl.py", line 639, in call
return self.wrapped_call(*args, **kwargs)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 889, in call
result = self._call(*args, **kwds)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 924, in _call
results = self._stateful_fn(*args, **kwds)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3023, in call
return graph_function._call_flat(
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1915, in _call_flat
for v in self._func_graph.variables:
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 495, in variables
return tuple(deref(v) for v in self._weak_variables)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 495, in <genexpr>
return tuple(deref(v) for v in self._weak_variables)
File "/home/chenliguo/tools/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 487, in deref
raise AssertionError(
AssertionError: Called a function referencing variables which have been deleted. This likely means that function-local variables were created and not referenced elsewhere in the program. This is generally a mistake; consider storing variables in an object attribute on first call.
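
The last frames of the traceback (`_weak_variables`, `deref`) suggest that the function graph holds only weak references to its variables. The following is a minimal sketch of that mechanism in plain Python, not TensorFlow code: `Variable`, `make_local`, and `Holder` are hypothetical stand-ins, and `deref` only mimics the check seen in the traceback. It shows why a variable created inside a function and referenced nowhere else disappears, while one stored in an object attribute on first call survives.

```python
import gc
import weakref


class Variable:
    """Hypothetical stand-in for a framework variable (not tf.Variable)."""

    def __init__(self, name):
        self.name = name


def deref(ref):
    """Resolve a weak reference; fail if the target has been deleted."""
    obj = ref()
    if obj is None:
        raise AssertionError("referenced variable has been deleted")
    return obj


def make_local():
    # The variable is bound only to a local name, so nothing keeps it
    # alive once the function returns.
    v = Variable("local")
    return weakref.ref(v)


class Holder:
    """Creates the variable on first call and keeps it in an attribute."""

    def __init__(self):
        self.v = None

    def __call__(self):
        if self.v is None:
            self.v = Variable("kept")
        return weakref.ref(self.v)


dangling = make_local()
holder = Holder()
kept = holder()
gc.collect()  # make collection explicit; CPython frees "local" on return
```

Calling `deref(kept)` returns the variable, whereas `deref(dangling)` raises an `AssertionError`, which is the situation the error message describes.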

Building a GPU operation fails with a bzl error

I'm trying to build a GPU operation for TFRA, using the following compilation procedure:

# set env options:
CUDNN_INSTALL_PATH="/data/dev/cuda"  TF_NEED_CUDA=1 python configure.py

I get the correct environment settings:

Configuring TensorFlow Recommenders-Addons to be built from source...
> Building GPU & CPU ops

Build configurations successfully written to .bazelrc :

build --action_env TF_HEADER_DIR="/data/env/py3/lib64/python3.6/site-packages/tensorflow/include"
build --action_env TF_SHARED_LIBRARY_DIR="/data/env/py3/lib64/python3.6/site-packages/tensorflow"
build --action_env TF_SHARED_LIBRARY_NAME="libtensorflow_framework.so.2"
build --action_env TF_CXX11_ABI_FLAG="0"
build --spawn_strategy=standalone
build --strategy=Genrule=standalone
build -c opt
build --action_env TF_NEED_CUDA="1"
build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda"
build --action_env CUDNN_INSTALL_PATH="/data/dev/cuda"
build --action_env TF_CUDA_VERSION="10.1"
build --action_env TF_CUDNN_VERSION="7"
test --config=cuda
build --config=cuda
build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true
build:cuda --crosstool_top=@local_config_cuda//crosstool:toolchain

But when I run the build:

# start building:
bazel build  //tensorflow_recommenders_addons/dynamic_embedding/core:_segment_reduction_ops.so

I get the following errors:

INFO: Repository local_config_cuda instantiated at:
  no stack (--record_rule_instantiation_callstack not enabled)
Repository rule cuda_configure defined at:
  /data/dev/recommenders-addons/build_deps/toolchains/gpu/cuda_configure.bzl:1083:18: in <toplevel>
ERROR: An error occurred during the fetch of repository 'local_config_cuda':
   Traceback (most recent call last):
        File "/data/dev/recommenders-addons/build_deps/toolchains/gpu/cuda_configure.bzl", line 1081
                _create_local_cuda_repository(<1 more arguments>)
        File "/data/dev/recommenders-addons/build_deps/toolchains/gpu/cuda_configure.bzl", line 1017, in _create_local_cuda_repository
                one_line.split(":")[1]
index out of range (index is 1, but sequence has 1 elements)
ERROR: Skipping '//tensorflow_recommenders_addons/dynamic_embedding/core:_segment_reduction_ops.so': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
        File "/data/dev/recommenders-addons/build_deps/toolchains/gpu/cuda_configure.bzl", line 1081
                _create_local_cuda_repository(<1 more arguments>)
        File "/data/dev/recommenders-addons/build_deps/toolchains/gpu/cuda_configure.bzl", line 1017, in _create_local_cuda_repository
                one_line.split(":")[1]
index out of range (index is 1, but sequence has 1 elements)
WARNING: Target pattern parsing failed.
ERROR: no such package '@local_config_cuda//cuda': Traceback (most recent call last):
        File "/data/dev/recommenders-addons/build_deps/toolchains/gpu/cuda_configure.bzl", line 1081
                _create_local_cuda_repository(<1 more arguments>)
        File "/data/dev/recommenders-addons/build_deps/toolchains/gpu/cuda_configure.bzl", line 1017, in _create_local_cuda_repository
                one_line.split(":")[1]
index out of range (index is 1, but sequence has 1 elements)
INFO: Elapsed time: 0.284s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
    currently loading: tensorflow_recommenders_addons/dynamic_embedding/core

There seems to be a bug in the GPU build process.

The Bazel toolchain is a large module that I'm not familiar with. Are there any suggestions for solving this problem?
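
The failing expression in the log is `one_line.split(":")[1]`, and the message "index is 1, but sequence has 1 elements" means the line being split contained no colon. What `cuda_configure.bzl` is parsing at line 1017 is not visible here, so the sketch below only reproduces that failure mode in Python (Starlark behaves the same way for `split`) and shows one defensive alternative. The function name `parse_key_value_lines` and the sample input are hypothetical.

```python
def parse_key_value_lines(text):
    """Parse 'key: value' lines, skipping any line that has no colon."""
    result = {}
    for one_line in text.splitlines():
        key, sep, value = one_line.partition(":")
        if not sep:
            # one_line.split(":") would have a single element here,
            # so indexing it with [1] would fail.
            continue
        result[key.strip()] = value.strip()
    return result


# Hypothetical tool output with one line that lacks a colon.
sample = "version: 10.1\nunexpected line without separator\npath: /usr/local/cuda"
parsed = parse_key_value_lines(sample)

# The unguarded form fails on the same kind of line.
parts = "unexpected line without separator".split(":")
```

Here `parts` has exactly one element, which matches the error in the log. Printing the line that fails to split is usually the quickest way to find out which command output the rule did not expect.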

pip install reports a version mismatch

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • TensorFlow version and how it was installed (source or binary): pip install tensorflow-recommenders-addons[tensorflow]
  • TensorFlow-Recommenders-Addons version and how it was installed (source or binary): binary
  • Python version: 3.7.9
  • Is GPU used? (yes/no): yes

Describe the bug

I installed recommenders-addons via "pip install tensorflow-recommenders-addons[tensorflow]". However, it tells me the versions don't match.
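
When reporting a mismatch like this, it helps to list the versions that are actually installed. The sketch below is one way to do that with the standard library. The helper name `installed_version` is made up for illustration, and the distribution names are assumptions (a GPU build of TensorFlow may be installed under a different name).

```python
def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Python 3.7 has no importlib.metadata; fall back to setuptools.
        import pkg_resources

        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them to your environment.
for name in ("tensorflow", "tensorflow-recommenders-addons"):
    print(name, installed_version(name))
```

The printed pair can then be compared against the compatibility information published by the project.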

Using TFRA in tf.keras.layers

System information

  • OS Ubuntu 20.04
  • TensorFlow version 2.4.1
  • TensorFlow-Recommenders-Addons version 0.2.0
  • Python version 3.8
  • Is GPU used no

I defined a Keras custom layer that wraps TFRA like this:

import tensorflow as tf
import tensorflow_recommenders_addons as tfra

class redis_embedding_layer(tf.keras.layers.Layer):
  def __init__(self, embedding_lenth, layer_name):
    super(redis_embedding_layer, self).__init__()
    self.embedding_lenth = embedding_lenth
    self.layer_name = layer_name

  def build(self, input_shape):
    # Create the dynamic embedding table (a key-value variable) on the local CPU.
    self.embedding_table = tfra.dynamic_embedding.get_variable(
        name=f"{self.layer_name}_embedding_table",
        devices=['/job:localhost/replica:0/task:0/device:CPU:0'],
        dim=self.embedding_lenth,
        initializer=tf.keras.initializers.RandomNormal(-1, 1))

  def call(self, input):
    # Look up embeddings for the ids; return_trainable=True also returns
    # the TrainableWrapper for the looked-up rows.
    fea_emb, trainable = tfra.dynamic_embedding.embedding_lookup(
        params=self.embedding_table,
        ids=input,
        name=f"{self.layer_name}_embedding_lookup",
        return_trainable=True)
    return fea_emb, trainable

This custom layer builds successfully in tf.keras, but 'trainable' cannot be added to the optimizer's var_list.

How can I add the TrainableWrapper to the optimizer when I use tf.keras.layers? I couldn't solve this myself, so I'm asking the community for help.
Thank you very much.

[bug] Unable to connect to more than one Redis; the inference service cannot host model instances that connect to more than one Redis.

System information
Any system

Describe the bug

Unable to connect to more than one Redis: the inference service cannot host model instances that connect to more than one Redis. The singleton pattern in the Redis backend causes this problem.

Code to reproduce the issue

static redis_client = ...

Other info / logs

Already fixed in PR #174.
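
To illustrate why a process-wide singleton client blocks a second Redis connection, here is a minimal Python sketch. It is not the backend's C++ code and not the change made in the PR; `FakeClient`, `SingletonBackend`, and `PerEndpointBackend` are hypothetical names, and no real connection is opened.

```python
class FakeClient:
    """Hypothetical stand-in for a Redis connection; opens no socket."""

    def __init__(self, host, port):
        self.host = host
        self.port = port


class SingletonBackend:
    """One process-wide client: the first endpoint requested wins."""

    _client = None

    @classmethod
    def get(cls, host, port):
        if cls._client is None:
            cls._client = FakeClient(host, port)
        return cls._client


class PerEndpointBackend:
    """One client per (host, port) pair."""

    def __init__(self):
        self._clients = {}

    def get(self, host, port):
        key = (host, port)
        if key not in self._clients:
            self._clients[key] = FakeClient(host, port)
        return self._clients[key]


a = SingletonBackend.get("redis-a", 6379)
b = SingletonBackend.get("redis-b", 6379)  # same object as a

backend = PerEndpointBackend()
c = backend.get("redis-a", 6379)
d = backend.get("redis-b", 6379)  # a separate client
```

With the singleton, the second request silently reuses the client for `redis-a`. Keying clients by endpoint lets two model instances in one serving process talk to different Redis servers.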
