
nvidia-merlin / merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

License: Apache License 2.0

HCL 5.53% Shell 7.51% Python 86.96%
deep-learning end-to-end gpu-acceleration machine-learning recommendation-system recommender-system

merlin's Introduction


NVIDIA Merlin is an open source library that accelerates recommender systems on NVIDIA GPUs. The library enables data scientists, machine learning engineers, and researchers to build high-performing recommenders at scale. Merlin includes tools to address common feature engineering, training, and inference challenges. Each stage of the Merlin pipeline is optimized to support hundreds of terabytes of data, which is all accessible through easy-to-use APIs. For more information, see NVIDIA Merlin on the NVIDIA developer web site.

Benefits

NVIDIA Merlin is a scalable and GPU-accelerated solution, making it easy to build recommender systems from end to end. With NVIDIA Merlin, you can:

  • Transform data (ETL) for preprocessing and engineering features.
  • Accelerate your existing training pipelines in TensorFlow, PyTorch, or FastAI by leveraging optimized, custom-built data loaders.
  • Scale large deep learning recommender models by distributing large embedding tables that exceed available GPU and CPU memory.
  • Deploy data transformations and trained models to production with only a few lines of code.

Components of NVIDIA Merlin

NVIDIA Merlin consists of the following open source libraries:

NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data. The library can quickly and easily manipulate terabyte-size datasets that are used to train deep learning based recommender systems. The library offers a high-level API that can define complex data transformation workflows. With NVTabular, you can:

  • Prepare datasets quickly and easily for experimentation so that you can train more models.
  • Process datasets that exceed GPU and CPU memory without having to worry about scale.
  • Focus on what to do with the data and not how to do it by using abstraction at the operation level.
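As a rough sketch of the API (the column names and file paths below are illustrative placeholders, not part of this repository), a workflow chains operators over groups of columns, is fit to compute statistics, and then transforms the data:

import nvtabular as nvt

# Chain operators over groups of columns to build the transformation graph.
cat_features = ["user_id", "item_id"] >> nvt.ops.Categorify()
cont_features = ["price", "age"] >> nvt.ops.FillMissing() >> nvt.ops.Normalize()
label = ["click"]

workflow = nvt.Workflow(cat_features + cont_features + label)

train = nvt.Dataset("train/*.parquet")      # lazily scans data that exceeds memory
workflow.fit(train)                         # computes statistics such as category mappings
workflow.transform(train).to_parquet("processed/train/")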

HugeCTR
HugeCTR is a GPU-accelerated training framework that can scale large deep learning recommendation models by distributing training across multiple GPUs and nodes. HugeCTR contains optimized, GPU-accelerated data loaders and provides strategies for scaling large embedding tables beyond available memory. With HugeCTR, you can:

  • Scale embedding tables over multiple GPUs or nodes.
  • Load a subset of an embedding table into a GPU in a coarse-grained, on-demand manner during the training stage.
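A minimal, hedged sketch of how a HugeCTR training session is configured through its Python API (the GPU list, batch size, and file paths are assumptions for illustration; see the HugeCTR documentation for complete model definitions):

import hugectr

solver = hugectr.CreateSolver(batchsize=1024, lr=0.001,
                              vvgpu=[[0, 1]],          # embedding tables are sharded across these GPUs
                              repeat_dataset=True)
reader = hugectr.DataReaderParams(data_reader_type=hugectr.DataReaderType_t.Parquet,
                                  source=["./train/_file_list.txt"],
                                  eval_source="./valid/_file_list.txt",
                                  check_type=hugectr.Check_t.Non)
optimizer = hugectr.CreateOptimizer(optimizer_type=hugectr.Optimizer_t.Adam)

model = hugectr.Model(solver, reader, optimizer)
# Input, sparse embedding, and dense layers are then added with model.add(...),
# followed by model.compile() and model.fit() to run multi-GPU training.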

Merlin Models
The Merlin Models library provides standard models for recommender systems, with high-quality implementations that range from classic machine learning models to highly advanced deep learning models. With Merlin Models, you can:

  • Accelerate your ranking model training by up to 10x by using performant data loaders for TensorFlow, PyTorch, and HugeCTR.
  • Iterate rapidly on feature engineering and model exploration by automatically mapping datasets created with NVTabular into a model input layer. The model input layer enables you to change either without impacting the other.
  • Assemble connectable building blocks for common RecSys architectures so that you can create new models quickly and easily.
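For illustration, a ranking model such as DLRM can be built directly from a dataset schema. This is a minimal sketch; the dataset paths and the "click" target column are placeholder assumptions:

import merlin.models.tf as mm
from merlin.io import Dataset

train = Dataset("processed/train/*.parquet")
valid = Dataset("processed/valid/*.parquet")

# The input layer is derived from the schema, so feature changes made in
# NVTabular do not require changes to the model definition.
model = mm.DLRMModel(
    train.schema,
    embedding_dim=64,
    bottom_block=mm.MLPBlock([128, 64]),
    top_block=mm.MLPBlock([128, 64, 32]),
    prediction_tasks=mm.BinaryClassificationTask("click"),
)

model.compile(optimizer="adam")
model.fit(train, validation_data=valid, batch_size=1024)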

Transformers4Rec
The Transformers4Rec library provides sequential and session-based recommendation. The library provides modular building blocks that are compatible with standard PyTorch modules. You can use the building blocks to design custom architectures such as multiple towers, multiple heads and tasks, and losses. With Transformers4Rec, you can:

  • Build sequential and session-based recommenders from any sequential tabular data.
  • Take advantage of the integration with NVTabular for seamless data preprocessing and feature engineering.
  • Perform next-item prediction as well as classic binary classification or regression tasks.
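A minimal sketch of a session-based, next-item prediction model with the PyTorch API (the schema path, sequence length, and transformer sizes below are placeholder assumptions):

from merlin_standard_lib import Schema
from transformers4rec import torch as tr

schema = Schema().from_proto_text("schema.pbtxt")   # schema exported by the NVTabular workflow

max_sequence_length, d_model = 20, 64
# Aggregate the sequential tabular features into per-step embeddings,
# with causal masking for next-item prediction.
input_module = tr.TabularSequenceFeatures.from_schema(
    schema,
    max_sequence_length=max_sequence_length,
    continuous_projection=d_model,
    aggregation="concat",
    masking="causal",
)
prediction_task = tr.NextItemPredictionTask(weight_tying=True)
transformer_config = tr.XLNetConfig.build(
    d_model=d_model, n_head=4, n_layer=2, total_seq_length=max_sequence_length
)
model = transformer_config.to_torch_model(input_module, prediction_task)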

Merlin Systems
Merlin Systems provides tools for combining recommendation models with other elements of production recommender systems like feature stores, nearest neighbor search, and exploration strategies into end-to-end recommendation pipelines that can be served with Triton Inference Server. With Merlin Systems, you can:

  • Start with an integrated platform for serving recommendations built on Triton Inference Server.
  • Create graphs that define the end-to-end process of generating recommendations.
  • Benefit from existing integrations with popular tools that are commonly found in recommender system pipelines.
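As a rough sketch, an ensemble that chains an NVTabular workflow and a trained TensorFlow model can be exported for Triton Inference Server like this (the paths are placeholder assumptions):

import tensorflow as tf
from nvtabular.workflow import Workflow
from merlin.systems.dag import Ensemble
from merlin.systems.dag.ops.tensorflow import PredictTensorflow
from merlin.systems.dag.ops.workflow import TransformWorkflow

workflow = Workflow.load("processed/workflow")
model = tf.keras.models.load_model("dlrm_model")

# Request columns flow through the preprocessing workflow and then the model.
serving_ops = (
    workflow.input_schema.column_names
    >> TransformWorkflow(workflow)
    >> PredictTensorflow(model)
)

ensemble = Ensemble(serving_ops, workflow.input_schema)
ensemble.export("/models")   # model repository directory that Triton loads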

Merlin Core
Merlin Core provides functionality that is used throughout the Merlin ecosystem. With Merlin Core, you can:

  • Use a standard dataset abstraction for processing large datasets across multiple GPUs and nodes.
  • Benefit from a common schema that identifies key dataset features and enables Merlin to automate routine modeling and serving tasks.
  • Simplify your code by using a shared API for constructing graphs of data transformation operators.
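A brief sketch of the shared dataset and schema abstractions (the parquet path and tag below are illustrative assumptions):

from merlin.io import Dataset
from merlin.schema import Tags

ds = Dataset("processed/train/*.parquet")   # partitioned, larger-than-memory data

# Column metadata (dtypes, tags) travels with the data and lets downstream
# libraries automate routine modeling and serving decisions.
print(ds.schema.column_names)
print(ds.schema.select_by_tag(Tags.CATEGORICAL).column_names)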

Installation

The simplest way to use Merlin is to run a Docker container. NVIDIA GPU Cloud (NGC) provides containers that include all the Merlin component libraries and their dependencies, and the containers receive unit and integration testing. For more information, see the Containers page.

To develop and contribute to Merlin, review the installation documentation for each component library. The development environment for each Merlin component is easily set up with conda or pip:

Component Installation Steps
HugeCTR https://nvidia-merlin.github.io/HugeCTR/master/hugectr_contributor_guide.html
Merlin Core https://github.com/NVIDIA-Merlin/core/blob/stable/README.md#installation
Merlin Models https://github.com/NVIDIA-Merlin/models/blob/stable/README.md#installation
Merlin Systems https://github.com/NVIDIA-Merlin/systems/blob/stable/README.md#installation
NVTabular https://github.com/NVIDIA-Merlin/NVTabular/blob/stable/README.md#installation
Transformers4Rec https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/stable/README.md#installation

Example Notebooks and Tutorials

A collection of end-to-end examples is available in the form of Jupyter notebooks. The example notebooks demonstrate how to:

  • Download and prepare a dataset.
  • Preprocess data and engineer features.
  • Train deep-learning recommendation models with TensorFlow, PyTorch, FastAI, HugeCTR, or Merlin Models.
  • Deploy the models to production with Triton Inference Server.

These examples are based on different datasets and provide a wide range of real-world use cases.

Merlin Is Built On

RAPIDS cuDF
Merlin relies on cuDF for GPU-accelerated DataFrame operations used in feature engineering.
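For example, a typical GPU DataFrame operation of the kind Merlin delegates to cuDF (the file and column names are illustrative):

import cudf

df = cudf.read_parquet("interactions.parquet")
clicks_per_user = df.groupby("user_id")["click"].sum()   # executes on the GPU
print(clicks_per_user.head())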

Dask
Merlin relies on Dask to distribute and scale feature engineering and preprocessing within NVTabular and to accelerate dataloading in Merlin Models and HugeCTR.

Triton Inference Server
Merlin leverages Triton Inference Server to provide GPU-accelerated serving for recommender system pipelines.

Feedback and Support

To report bugs or get help, please open an issue.

merlin's People

Contributors

albert17, andompesta, ashishsardana, ayodeawe, bashimao, benfred, bschifferer, dependabot[bot], edknv, emmaqiaoch, evenoldridge, fdecayed, gabrielspmoreira, jayrodge, jershi425, jlinford, jperez999, karlhigley, lgardenhire, mikemckiernan, nv-alaiacano, nvidia-merlin-bot, oliverholworthy, radekosmulski, rnyak, rvk007, ryanrussell, shijieliu, xiaoleishi-nv, zehuanw


merlin's Issues

[RMP] Container size reduction

  • #90

  • #100

  • Reporting size per component

  • Trying smaller Tritonserver base images: tf/pyt

  • (NO NEED) RAPIDS from internal image (Note: This is not preferred; it may hamper customers who build their own containers)

  • (NO NEED) HugeCTR dependencies as optional (Needs product approval)

[RMP] Deep Ranking Models (TF)

Models ( Data Scientist )

[RMP] Ranking - TF : Wide and Deep, DLRM, DeepFM

NVTabular (Data scientist / Product Engineer)

N/A?

  • Check if there are any gaps for NVTabular (end-to-end example validates no gaps; tracked below)

Systems (Data scientist / Product Engineer)

Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-823

[QST] Triton complains about extra columns in the workflow output

Hi,

I'm trying to reproduce some of the notebooks, but tritonserver is complaining about extra columns in the workflow's output. I'm confused by the error, as those are the columns output by the nvtabular workflow, which I'm then using to train the sequential recommendation engine.

+-------------+-------------------------------------------------------------------------+--------+                                                                              
| Backend     | Path                                                                    | Config |                                                                              
+-------------+-------------------------------------------------------------------------+--------+                                                                              
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so                 | {}     |                                                                              
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so         | {}     |                                                                              
| openvino    | /opt/tritonserver/backends/openvino_2021_2/libtriton_openvino_2021_2.so | {}     |                                                                              
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so         | {}     |                                                                              
| python      | /opt/tritonserver/backends/python/libtriton_python.so                   | {}     |                                                                              
+-------------+-------------------------------------------------------------------------+--------+                                                                              
                                                                                                                                                                                
I0309 02:34:41.725660 115 server.cc:589]                                                                                                                                        
+-----------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------+ 
| Model           | Version | Status                                                                                                                                          | 
+-----------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------+ 
| t4r_pytorch_nvt | 1       | UNAVAILABLE: Internal: ValueError: The following extra columns were found in the workflow's output: {'week_index', 'et_dayofweek_cos-list_seq', | 
|                 |         |  'item_id-count', 'session_id', 'product_recency_days_log_norm-list_seq', 'et_dayofweek_sin-list_seq', 'item_id-list_seq'}                      | 
|                 |         |                                                                                                                                                 | 
|                 |         | At:                                                                                                                                             | 
|                 |         |   /usr/local/lib/python3.8/dist-packages/nvtabular/inference/workflow/base.py(71): __init__                                                     | 
|                 |         |   /usr/local/lib/python3.8/dist-packages/nvtabular/inference/workflow/tensorflow.py(33): __init__                                               | 
|                 |         |   /models/t4r_pytorch_nvt/1/model.py(85): initialize                                                                                            | 
| t4r_pytorch_pt  | 1       | READY                                                                                                                                           | 
+-----------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------+ 
                                                                                                                                                                                
I0309 02:34:41.725789 115 tritonserver.cc:1865]  
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                    |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                   |
| server_version                   | 2.18.0                                                                                                                                   |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cu |
|                                  | da_shared_memory binary_tensor_data statistics                                                                                           |
| model_repository_path[0]         | /models                                                                                                                                  |
| model_control_mode               | MODE_NONE                                                                                                                                |
| strict_model_config              | 1                                                                                                                                        |
| rate_limit                       | OFF                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                 |
| response_cache_byte_size         | 0                                                                                                                                        |
| min_supported_compute_capability | 6.0                                                                                                                                      |
| strict_readiness                 | 1                                                                                                                                        |
| exit_timeout                     | 30                                                                                                                                       |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+

I0309 02:34:41.725820 115 server.cc:249] Waiting for in-flight requests to complete.
I0309 02:34:41.725828 115 model_repository_manager.cc:1026] unloading: t4r_pytorch_pt:1
I0309 02:34:41.725873 115 server.cc:264] Timeout 30: Found 1 live models and 0 in-flight non-inference requests                                                                 
I0309 02:34:42.726033 115 server.cc:264] Timeout 29: Found 1 live models and 0 in-flight non-inference requests                                                                 
I0309 02:34:43.318535 115 model_repository_manager.cc:1132] successfully unloaded 't4r_pytorch_pt' version 1                                                                    
I0309 02:34:43.726237 115 server.cc:264] Timeout 28: Found 0 live models and 0 in-flight non-inference requests 
error: creating server: Internal - failed to load all models

Do you have any idea what might be causing the issue?

I'm using triton from the Docker image nvcr.io/nvidia/merlin/merlin-inference:22.02.

Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-813

merlin-core not installed in latest merlin-inference images

Hi,

I'm a big fan of your work (papers/code/tutorials)! Congrats!

I'm going through the tutorials in the transformers4rec repo and encountered this issue when trying to load any pre-trained ensembled model (nvtabular + pytorch).

I0305 04:19:13.236369 182 grpc_server.cc:4190] Started GRPCInferenceService at 0.0.0.0:8001
I0305 04:19:13.236600 182 http_server.cc:2857] Started HTTPService at 0.0.0.0:8000
I0305 04:19:13.278305 182 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
I0305 04:19:22.913046 182 model_repository_manager.cc:994] loading: t4r_pytorch_nvt:1
I0305 04:19:23.020783 182 python.cc:1905] TRITONBACKEND_ModelInstanceInitialize: t4r_pytorch_nvt (GPU device 0)
/nvtabular/nvtabular/workflow/workflow.py:339: UserWarning: Loading workflow generated with nvtabular version 0.11.0 - but we are running nvtabular 0.9.0+1.g31f9350. This might cause issues
  warnings.warn(
/nvtabular/nvtabular/workflow/workflow.py:339: UserWarning: Loading workflow generated with cudf version 21.12.00a+293.g0930f712e6 - but we are running cudf 21.10.00a+345.ge05bd4bf3c.dirty. This might cause issues
  warnings.warn(
0305 04:19:25.945803 210 pb_stub.cc:369] Failed to initialize Python stub: ModuleNotFoundError: No module named 'merlin'

At:
  /nvtabular/nvtabular/workflow/workflow.py(358): load
  /root/models/t4r_pytorch_nvt/1/model.py(60): initialize

E0305 04:19:25.947090 182 model_repository_manager.cc:1152] failed to load 't4r_pytorch_nvt' version 1: Internal: ModuleNotFoundError: No module named 'merlin'

At:
  /nvtabular/nvtabular/workflow/workflow.py(358): load
  /root/models/t4r_pytorch_nvt/1/model.py(60): initialize

It seems that the merlin-core library is not installed. I've also tried to open a Python interpreter in the Docker container and import some of the libraries (nvtabular, transformers4rec). They all work except merlin.

For comparison, when using the merlin-pytorch-training:22.03 image, I can correctly import merlin from a python interpreter.

However, looking at the docker image I can't see anything wrong. I'm not very familiar with the merlin-core codebase though, so perhaps I'm missing something.

Both 22.02 and nightly images seem to be affected by this issue.

[RMP] Support Tree Ranking Models (like XGBoost) in Merlin Models and Systems

Problem

Gradient-boosted decision trees (GBDTs) are commonly used in the industry as part of the scoring phase of recommender systems. Supporting serving of these models and integrating with the Merlin ecosystem will help facilitate usage of these models in these systems.

The Triton Inference Server has a backend called FIL (Forest Inference Library) to facilitate GPU accelerated serving of these models.

Random forests (RF) and gradient-boosted decision trees (GBDTs) have become workhorse models of applied machine learning. XGBoost and LightGBM, popular packages implementing GBDT models, consistently rank among the most commonly used tools by data scientists on the Kaggle platform. We see similar interest in forest-based models in industry, where they are applied to problems ranging from inventory forecasting, to ad ranking, to medical diagnostics.

RAPIDS Forest Inference Library: Prediction at 100 million rows per second

Goals

  • Enable the use of Tree based models (e.g. GBDTs, Random Forests) in a Merlin Systems ensemble.
  • Support the training of XGBoost models from a Merlin Dataset.

Constraints

Starting Point

Merlin-models (Data Scientist)

NVTabular (Data Scientist)

  • [NA] Operators for batch prediction with these models
  • Note: Batch prediction is not in scope for this development

Merlin-systems (Product Engineer)

Examples and Docs (Everyone)

Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-828

InferenceServerException: explicit model load / unload is not allowed if polling is enabled

Hello everybody! When I run the following code from 04a-Triton-Inference-with-TF.ipynb

%%time

triton_client.load_model(model_name="criteo")

There is an error:

POST /v2/repository/models/criteo/load, headers None

<HTTPSocketPoolResponse status=400 headers={'content-type': 'application/json', 'content-length': '77'}>
---------------------------------------------------------------------------
InferenceServerException                  Traceback (most recent call last)
<timed eval> in <module>

/usr/local/lib/python3.8/dist-packages/tritonclient/http/__init__.py in load_model(self, model_name, headers, query_params)
    620                               headers=headers,
    621                               query_params=query_params)
--> 622         _raise_if_error(response)
    623         if self._verbose:
    624             print("Loaded model '{}'".format(model_name))

/usr/local/lib/python3.8/dist-packages/tritonclient/http/__init__.py in _raise_if_error(response)
     62     error = _get_error(response)
     63     if error is not None:
---> 64         raise error
     65 
     66 

InferenceServerException: explicit model load / unload is not allowed if polling is enabled

By the way, these are the available models in the repository:

triton_client.get_model_repository_index()
POST /v2/repository/index, headers None

<HTTPSocketPoolResponse status=200 headers={'content-type': 'application/json', 'content-length': '152'}>
bytearray(b'[{"name":"criteo","version":"1","state":"READY"},{"name":"criteo_nvt","version":"1","state":"READY"},{"name":"criteo_tf","version":"1","state":"READY"}]')
[{'name': 'criteo', 'version': '1', 'state': 'READY'},
 {'name': 'criteo_nvt', 'version': '1', 'state': 'READY'},
 {'name': 'criteo_tf', 'version': '1', 'state': 'READY'}]

What should I do to solve the problem?

[BUG] Cannot load an exported deepfm model with NGC 22.03 inference container

I run into the following errors:

I0318 00:00:18.082645 172 hugectr.cc:1926] TRITONBACKEND_ModelInstanceInitialize: deepfm_0 (device 0)
I0318 00:00:18.082694 172 hugectr.cc:1566] Triton Model Instance Initialization on device 0
I0318 00:00:18.082792 172 hugectr.cc:1576] Dense Feature buffer allocation:
I0318 00:00:18.083026 172 hugectr.cc:1583] Categorical Feature buffer allocation:
I0318 00:00:18.083095 172 hugectr.cc:1601] Categorical Row Index buffer allocation:
I0318 00:00:18.083143 172 hugectr.cc:1611] Predict result buffer allocation:
I0318 00:00:18.083203 172 hugectr.cc:1939] ******Loading HugeCTR Model******
I0318 00:00:18.083217 172 hugectr.cc:1631] The model origin json configuration file path is: /ensemble_models/deepfm/1/deepfm.json
[HCTR][00:00:18][INFO][RK0][main]: Global seed is 1305961709
[HCTR][00:00:19][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
[HCTR][00:00:19][INFO][RK0][main]: Start all2all warmup
[HCTR][00:00:19][INFO][RK0][main]: End all2all warmup
[HCTR][00:00:19][INFO][RK0][main]: Create inference session on device: 0
[HCTR][00:00:19][INFO][RK0][main]: Model name: deepfm
[HCTR][00:00:19][INFO][RK0][main]: Use mixed precision: False
[HCTR][00:00:19][INFO][RK0][main]: Use cuda graph: True
[HCTR][00:00:19][INFO][RK0][main]: Max batchsize: 64
[HCTR][00:00:19][INFO][RK0][main]: Use I64 input key: True
[HCTR][00:00:19][INFO][RK0][main]: start create embedding for inference
[HCTR][00:00:19][INFO][RK0][main]: sparse_input name data1
[HCTR][00:00:19][INFO][RK0][main]: create embedding for inference success
[HCTR][00:00:19][INFO][RK0][main]: Inference stage skip BinaryCrossEntropyLoss layer, replaced by Sigmoid layer
I0318 00:00:19.826815 172 hugectr.cc:1639] ******Loading HugeCTR model successfully
I0318 00:00:19.827763 172 model_repository_manager.cc:1149] successfully loaded 'deepfm' version 1
E0318 00:00:19.827767 172 model_repository_manager.cc:1152] failed to load 'deepfm_nvt' version 1: Internal: TypeError: 'NoneType' object is not subscriptable

At:
  /ensemble_models/deepfm_nvt/1/model.py(91): _set_output_dtype
  /ensemble_models/deepfm_nvt/1/model.py(76): initialize

E0318 00:00:19.827960 172 model_repository_manager.cc:1332] Invalid argument: ensemble 'deepfm_ens' depends on 'deepfm_nvt' which has no loaded version
I0318 00:00:19.828048 172 server.cc:522]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0318 00:00:19.828117 172 server.cc:549]
+---------+---------------------------------------------------------+-----------------------------------------------+
| Backend | Path                                                    | Config                                        |
+---------+---------------------------------------------------------+-----------------------------------------------+
| hugectr | /opt/tritonserver/backends/hugectr/libtriton_hugectr.so | {"cmdline":{"ps":"/ensemble_models/ps.json"}} |
+---------+---------------------------------------------------------+-----------------------------------------------+

I0318 00:00:19.828209 172 server.cc:592]
+------------+---------+--------------------------------------------------------------------------+
| Model      | Version | Status                                                                   |
+------------+---------+--------------------------------------------------------------------------+
| deepfm     | 1       | READY                                                                    |
| deepfm_nvt | 1       | UNAVAILABLE: Internal: TypeError: 'NoneType' object is not subscriptable |
|            |         |                                                                          |
|            |         | At:                                                                      |
|            |         |   /ensemble_models/deepfm_nvt/1/model.py(91): _set_output_dtype          |
|            |         |   /ensemble_models/deepfm_nvt/1/model.py(76): initialize                 |
+------------+---------+--------------------------------------------------------------------------+

I0318 00:00:19.845925 172 metrics.cc:623] Collecting metrics for GPU 0: Tesla T4
I0318 00:00:19.846404 172 tritonserver.cc:1932]
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                              |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                             |
| server_version                   | 2.19.0                                                                                                                             |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_mem |
|                                  | ory cuda_shared_memory binary_tensor_data statistics trace                                                                         |
| model_repository_path[0]         | /ensemble_models                                                                                                                   |
| model_control_mode               | MODE_NONE                                                                                                                          |
| strict_model_config              | 1                                                                                                                                  |
| rate_limit                       | OFF                                                                                                                                |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                          |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                           |
| response_cache_byte_size         | 0                                                                                                                                  |
| min_supported_compute_capability | 6.0                                                                                                                                |
| strict_readiness                 | 1                                                                                                                                  |
| exit_timeout                     | 30                                                                                                                                 |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------+


Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-818

Docker images don't work if $HOME gets set to a different path

Hi all,

We noticed a small issue: if you have a Docker image like this:

FROM nvcr.io/nvidia/merlin/merlin-pytorch-training:21.09

ENV HOME=/usr/local

WORKDIR ${HOME}

nvtabular and many other libraries can't be imported.

And the reason seems to be that nvtabular is installed in the user's $HOME here: https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/dockerfile.torch#L141-L144

Given that this image is meant to be based off of, is it really a good idea to do a user-level install? Is that --user even needed? Downstream images should be allowed to set the $HOME variable IMO :)

For anyone running into the issue, the fix is to simply not set $HOME in your downstream dockerfile :)

[RMP] Support for simple ML/CF Models (like Implicit) in Merlin Models and Systems

Problem:

Latent factor models enable discovery of the underlying structure between interactions and items. These approaches have been popular over the years to leverage implicit feedback data. Customers who are using simple models via Implicit and LightFM want to be able to deploy those models within the Merlin ecosystem.

Goal:

  • Provide training support for implicit and lightfm based models in Merlin Models
  • Provide inference support for implicit based models in Merlin Systems
  • NVIDIA-Merlin/systems#200

Constraints:

Systems

  • Serve Implicit/LightFM as a self-contained op with everything required to serve within the exported triton model directory
    Requires installing the python package in the tritonserver environment where it will run.
    Decomposing the serving of these into different operators (retrieval with nearest neighbour search through embedding space)

Blocking issues

  • Inference is blocked w/ issues on serialization of model w/ Implicit

Starting Point:

Merlin-models

Wrap Implicit and LightFM in the high-level model API:

NVTabular

N/A

Merlin-systems

Examples and Docs (To happen in 22.09)

Merlin 21.12 NGC release broke extended forward compatibility on older NVIDIA drivers

If the user invokes a command without calling bash in Docker, forward compatibility is broken. If the user invokes a command with bash, forward compatibility works as expected.

Observed with T4/A100 on driver 450.119.04.

docker run --gpus '"device=2,3,4,5"' -it --rm --network host \
>     --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
>     -v ~/workspace_dong:/scripts nvcr.io/nvidia/merlin/merlin-training:21.12 nvidia-smi
Tue Jan  4 17:55:26 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.04   Driver Version: 450.119.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:08:00.0 Off |                    0 |
| N/A   28C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:09:00.0 Off |                    0 |
| N/A   29C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            Off  | 00000000:84:00.0 Off |                    0 |
| N/A   27C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            Off  | 00000000:85:00.0 Off |                    0 |
| N/A   28C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

lab@login01:~/workspace_dong/merlin$ docker run --gpus '"device=2,3,4,5"' -it --rm --network host \
>     --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
>     -v ~/workspace_dong:/scripts nvcr.io/nvidia/merlin/merlin-training:21.12
root@login01:/workspace# nvidia-smi
Tue Jan  4 17:57:16 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.04   Driver Version: 450.119.04   CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:08:00.0 Off |                    0 |
| N/A   28C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:09:00.0 Off |                    0 |
| N/A   29C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            Off  | 00000000:84:00.0 Off |                    0 |
| N/A   27C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            Off  | 00000000:85:00.0 Off |                    0 |
| N/A   28C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

[RMP] Triton Serving Ensemble API

Models

NVTabular

Systems

Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-831

Cuda 11.0 support for Merlin Images as the current image doesn't work on AWS P3s

Hi, I am a huge fan of Merlin and a lot of what you all are doing here!

I tried to pull the newest Merlin image to use cudf on an AWS P3 instance, and it seems like the image requires cuda 11.2 and above, something that is not available on AWS P3 instances (at least out of the box).


It looks like the underlying Nvidia driver version is 450.51.* and cuda 11.2 expects >= 450.81.*
https://docs.nvidia.com/deploy/cuda-compatibility/index.html

I was wondering if there would be any support for cuda 11.0 anytime soon. :)

Merlin Inference Container Build

We are currently getting an error when building the inference container in the internal CI system because of NVML.

[ 99%] Linking CUDA device code CMakeFiles/inference_test.dir/cmake_device_link.o
[100%] Linking CXX executable ../../../bin/inference_test
[91m/usr/bin/ld: warning: libnvidia-ml.so.1, needed by ../../../lib/libhugectr_inference.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: ../../../lib/libhugectr_inference.so: [0m[91mundefined reference to `nvmlErrorString'
/usr/bin/ld: ../../../lib/libhugectr_inference.so: undefined reference to `nvmlInit_v2'
[0m[91mcollect2: error: ld returned 1 exit status
[0m[91mmake[2]: *** [test/utest/inference/CMakeFiles/inference_test.dir/build.make:219: bin/inference_test] Error 1
[0m[91mmake[1]: *** [CMakeFiles/Makefile2:1195: test/utest/inference/CMakeFiles/inference_test.dir/all] Error 2

HugeCTR put a fix in internal repo. Need to check that it works and modify CI process.

[RMP] Embedding-based Retrieval Models (TF)

We want to support classic 2-stage recommender systems. To enable that, we need retrieval models across the Merlin ecosystem. Add support for training and serving retrieval models, plus examples that demonstrate how to build them and connect them in a 2-stage system. This is a precursor to 4-stage recommenders, which are one of the goals of Merlin.

Depends on #102

Models ( Data Scientist )

Two-tower model

Youtube DNN

User/Item tower model export (ideally 22.03)

In order to build an ensemble that can serve a retrieval model, we'll need both:

  • A model that computes a user vector from user features (i.e. the user tower)
  • [x] A set of item vectors we can load into a nearest neighbor search index (i.e. batch-predictions from the item tower)

Note: User/Item tower model export is covered by item vector export from model

Item vector export from model

Negative sampling

Contrastive losses

Evaluation Metrics (Precision/Recall/?)

Systems (Data scientist / Product Engineer)

[RMP] Documentation for 1.0

Merlin:

Core:

  • Set up automated docs builds

Models:

NVTabular:

  • README revisions?
    Note: There are some unrelated items in the readme, such as the container build process for other repos. Should this be cleaned up? There is documentation regarding examples for other repos. Should this be moved?

Systems:

Transformers4Rec:

HugeCTR

  • Convert HCTR doc to sphinx keeping SOK as separate doc
  • Note: Michael was able to resolve the issue. Will have SOK and HCTR under one Sphinx doc. Will have it for 22.04
  • Make all our repos to have same headline (repo name) | (doc link)
  • Note: Will have it for 22.04
  • Centralized doc will follow next release
  • Will target 22.05
    Individual repo doc updates
    Expecting inputs from engineering team. Michael to share updates on how it affects his other works

Support matrix

  • Create One Support Matrix for Merlin

    • #174
    • #177
    • Change the title to ‘Merlin Support Matrix’
    • Update for all 6 containers
    • Place newer information first
    • Table per container
    • Page for each year is nice-to-have
  • [ ] Support matrix - Combining tables to Training and Inference instead of 6 (after 22.04)

    • Michael to draft POC and will review

Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-822

[RMP] Bug bashes for v1.0

gaierror: [Errno -2] Name or service not known

When I run 04a-Triton-Inference-with-TF.ipynb on my local computer (without Docker), I get an error in triton_client.is_server_live(). Here is the error:

"

Empty Traceback (most recent call last)
~/anaconda3/envs/merlin/lib/python3.7/site-packages/geventhttpclient/connectionpool.py in get_socket(self)
162 try:
--> 163 return self._socket_queue.get(block=False)
164 except gevent.queue.Empty:

~/anaconda3/envs/merlin/lib/python3.7/site-packages/gevent/_gevent_cqueue.cpython-37m-x86_64-linux-gnu.so in gevent._gevent_cqueue.Queue.get()

~/anaconda3/envs/merlin/lib/python3.7/site-packages/gevent/_gevent_cqueue.cpython-37m-x86_64-linux-gnu.so in gevent._gevent_cqueue.Queue.get()

~/anaconda3/envs/merlin/lib/python3.7/site-packages/gevent/_gevent_cqueue.cpython-37m-x86_64-linux-gnu.so in gevent._gevent_cqueue.Queue._Queue__get_or_peek()

Empty:

During handling of the above exception, another exception occurred:

gaierror Traceback (most recent call last)
/tmp/ipykernel_10740/1716762689.py in
----> 1 triton_client.is_server_live()

~/anaconda3/envs/merlin/lib/python3.7/site-packages/tritonclient/http/__init__.py in is_server_live(self, headers, query_params)
341 response = self._get(request_uri=request_uri,
342 headers=headers,
--> 343 query_params=query_params)
344
345 return response.status_code == 200

~/anaconda3/envs/merlin/lib/python3.7/site-packages/tritonclient/http/__init__.py in _get(self, request_uri, headers, query_params)
264 response = self._client_stub.get(request_uri, headers=headers)
265 else:
--> 266 response = self._client_stub.get(request_uri)
267
268 if self._verbose:

~/anaconda3/envs/merlin/lib/python3.7/site-packages/geventhttpclient/client.py in get(self, request_uri, headers)
264
265 def get(self, request_uri, headers={}):
--> 266 return self.request(METHOD_GET, request_uri, headers=headers)
267
268 def head(self, request_uri, headers=None):

~/anaconda3/envs/merlin/lib/python3.7/site-packages/geventhttpclient/client.py in request(self, method, request_uri, body, headers)
224
225 while 1:
--> 226 sock = self._connection_pool.get_socket()
227 try:
228 _request = request.encode()

~/anaconda3/envs/merlin/lib/python3.7/site-packages/geventhttpclient/connectionpool.py in get_socket(self)
164 except gevent.queue.Empty:
165 try:
--> 166 return self._create_socket()
167 except:
168 self._semaphore.release()

~/anaconda3/envs/merlin/lib/python3.7/site-packages/geventhttpclient/connectionpool.py in _create_socket(self)
100 or set tcp/socket options
101 """
--> 102 sock_infos = self._resolve()
103 first_error = None
104 for sock_info in sock_infos:

~/anaconda3/envs/merlin/lib/python3.7/site-packages/geventhttpclient/connectionpool.py in _resolve(self)
74 info = gevent.socket.getaddrinfo(self._connection_host,
75 self._connection_port,
---> 76 family, 0, gevent.socket.SOL_TCP)
77 # family, socktype, proto, canonname, sockaddr = info[0]
78 return info

~/anaconda3/envs/merlin/lib/python3.7/site-packages/gevent/_socketcommon.py in getaddrinfo(host, port, family, type, proto, flags)
245 # Our lower-level resolvers, including the thread and blocking, which use _socket,
246 # function simply with integers.
--> 247 addrlist = get_hub().resolver.getaddrinfo(host, port, family, type, proto, flags)
248 result = [
249 (_intenum_converter(af, AddressFamily),

~/anaconda3/envs/merlin/lib/python3.7/site-packages/gevent/resolver/thread.py in getaddrinfo(self, *args, **kwargs)
61
62 def getaddrinfo(self, *args, **kwargs):
---> 63 return self.pool.apply(_socket.getaddrinfo, args, kwargs)
64
65 def gethostbyaddr(self, *args, **kwargs):

~/anaconda3/envs/merlin/lib/python3.7/site-packages/gevent/pool.py in apply(self, func, args, kwds)
159 if self._apply_immediately():
160 return func(*args, **kwds)
--> 161 return self.spawn(func, *args, **kwds).get()
162
163 def __map(self, func, iterable):

~/anaconda3/envs/merlin/lib/python3.7/site-packages/gevent/_gevent_cevent.cpython-37m-x86_64-linux-gnu.so in gevent._gevent_cevent.AsyncResult.get()

~/anaconda3/envs/merlin/lib/python3.7/site-packages/gevent/_gevent_cevent.cpython-37m-x86_64-linux-gnu.so in gevent._gevent_cevent.AsyncResult.get()

~/anaconda3/envs/merlin/lib/python3.7/site-packages/gevent/_gevent_cevent.cpython-37m-x86_64-linux-gnu.so in gevent._gevent_cevent.AsyncResult.get()

~/anaconda3/envs/merlin/lib/python3.7/site-packages/gevent/_gevent_cevent.cpython-37m-x86_64-linux-gnu.so in gevent._gevent_cevent.AsyncResult._raise_exception()

~/anaconda3/envs/merlin/lib/python3.7/site-packages/gevent/_compat.py in reraise(t, value, tb)
63 def reraise(t, value, tb=None): # pylint:disable=unused-argument
64 if value.__traceback__ is not tb and tb is not None:
---> 65 raise value.with_traceback(tb)
66 raise value
67 def exc_clear():

~/anaconda3/envs/merlin/lib/python3.7/site-packages/gevent/threadpool.py in __run_task()
165 self._before_run_task(func, args, kwargs, thread_result)
166 try:
--> 167 thread_result.set(func(*args, **kwargs))
168 except: # pylint:disable=bare-except
169 thread_result.handle_error((self, func), self._exc_info())

gaierror: [Errno -2] Name or service not known

"

[RMP] Merlin-core: schemas, standardization, namespace

Merlin-core

Merlin-models

NVTabular

Examples and Docs

Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-830

timeout: timed out

Hi, when I run the following code from 04-Triton-Inference-with-TF.ipynb

%%time

triton_client.load_model(model_name="criteo")

There are some errors:


POST /v2/repository/models/criteo/load, headers None

---------------------------------------------------------------------------
timeout                                   Traceback (most recent call last)
<timed eval> in <module>

/usr/local/lib/python3.8/dist-packages/tritonclient/http/__init__.py in load_model(self, model_name, headers, query_params)
    616         """
    617         request_uri = "v2/repository/models/{}/load".format(quote(model_name))
--> 618         response = self._post(request_uri=request_uri,
    619                               request_body="",
    620                               headers=headers,

/usr/local/lib/python3.8/dist-packages/tritonclient/http/__init__.py in _post(self, request_uri, request_body, headers, query_params)
    306                                               headers=headers)
    307         else:
--> 308             response = self._client_stub.post(request_uri=request_uri,
    309                                               body=request_body)
    310 

/usr/local/lib/python3.8/dist-packages/geventhttpclient/client.py in post(self, request_uri, body, headers)
    270 
    271     def post(self, request_uri, body=u'', headers=None):
--> 272         return self.request(METHOD_POST, request_uri, body=body, headers=headers)
    273 
    274     def put(self, request_uri, body=u'', headers=None):

/usr/local/lib/python3.8/dist-packages/geventhttpclient/client.py in request(self, method, request_uri, body, headers)
    251 
    252             try:
--> 253                 response = HTTPSocketPoolResponse(sock, self._connection_pool,
    254                                                   block_size=self.block_size, method=method.upper(), headers_type=self.headers_type)
    255             except HTTPConnectionClosed as e:

/usr/local/lib/python3.8/dist-packages/geventhttpclient/response.py in __init__(self, sock, pool, **kw)
    296     def __init__(self, sock, pool, **kw):
    297         self._pool = pool
--> 298         super(HTTPSocketPoolResponse, self).__init__(sock, **kw)
    299 
    300     def release(self):

/usr/local/lib/python3.8/dist-packages/geventhttpclient/response.py in __init__(self, sock, block_size, method, **kw)
    168         self._sock = sock
    169         self.block_size = block_size
--> 170         self._read_headers()
    171 
    172     def release(self):

/usr/local/lib/python3.8/dist-packages/geventhttpclient/response.py in _read_headers(self)
    188             while not self.headers_complete:
    189                 try:
--> 190                     data = self._sock.recv(self.block_size)
    191                     self.feed(data)
    192                     # depending on gevent version we get a conn reset or no data

/usr/local/lib/python3.8/dist-packages/gevent/_socketcommon.py in recv(self, *args)
    661                 # QQQ without clearing exc_info test__refcount.test_clean_exit fails
    662                 exc_clear() # Python 2
--> 663             self._wait(self._read_event)
    664 
    665     def recvfrom(self, *args):

/usr/local/lib/python3.8/dist-packages/gevent/_gevent_c_hub_primitives.cpython-38-x86_64-linux-gnu.so in gevent._gevent_c_hub_primitives.wait_on_socket()

/usr/local/lib/python3.8/dist-packages/gevent/_gevent_c_hub_primitives.cpython-38-x86_64-linux-gnu.so in gevent._gevent_c_hub_primitives.wait_on_socket()

/usr/local/lib/python3.8/dist-packages/gevent/_gevent_c_hub_primitives.cpython-38-x86_64-linux-gnu.so in gevent._gevent_c_hub_primitives._primitive_wait()

/usr/local/lib/python3.8/dist-packages/gevent/_gevent_c_hub_primitives.cpython-38-x86_64-linux-gnu.so in gevent._gevent_c_hub_primitives._primitive_wait()

/usr/local/lib/python3.8/dist-packages/gevent/_gevent_c_hub_primitives.cpython-38-x86_64-linux-gnu.so in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait()

/usr/local/lib/python3.8/dist-packages/gevent/_gevent_c_hub_primitives.cpython-38-x86_64-linux-gnu.so in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait()

/usr/local/lib/python3.8/dist-packages/gevent/_gevent_c_hub_primitives.cpython-38-x86_64-linux-gnu.so in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait()

/usr/local/lib/python3.8/dist-packages/gevent/_gevent_c_waiter.cpython-38-x86_64-linux-gnu.so in gevent._gevent_c_waiter.Waiter.get()

/usr/local/lib/python3.8/dist-packages/gevent/_gevent_c_greenlet_primitives.cpython-38-x86_64-linux-gnu.so in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch()

/usr/local/lib/python3.8/dist-packages/gevent/_gevent_c_greenlet_primitives.cpython-38-x86_64-linux-gnu.so in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch()

/usr/local/lib/python3.8/dist-packages/gevent/_gevent_c_greenlet_primitives.cpython-38-x86_64-linux-gnu.so in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch()

src/gevent/_gevent_c_greenlet_primitives.pxd in gevent._gevent_c_greenlet_primitives._greenlet_switch()

timeout: timed out

And when I run this code:
triton_client.get_model_repository_index()
The output is:

POST /v2/repository/index, headers None

<HTTPSocketPoolResponse status=200 headers={'content-type': 'application/json', 'content-length': '152'}>
bytearray(b'[{"name":"criteo","version":"1","state":"READY"},{"name":"criteo_nvt","version":"1","state":"READY"},{"name":"criteo_tf","version":"1","state":"READY"}]')
[{'name': 'criteo', 'version': '1', 'state': 'READY'},
 {'name': 'criteo_nvt', 'version': '1', 'state': 'READY'},
 {'name': 'criteo_tf', 'version': '1', 'state': 'READY'}]

Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-836
