nvidia-merlin / hugectr
HugeCTR is a high efficiency GPU framework designed for Click-Through-Rate (CTR) estimating training
License: Apache License 2.0
Hi HugeCTR experts,
I have a question about the backward computation. Taking the localized slot as an example,
I notice that HugeCTR performs an all-to-all after the forward propagation, and in the backward pass it performs an all-to-all again before the backward propagation. Why are there two all-to-all operations between the forward and backward passes?
Description:
Adding some threshold for integration test like DLRM WDL DCN.
We will not be able to close this bug before the close of #165
Comments:
I followed the instructions given in /hugectr/tutorials/dump_to_tf/ReadMe.
But when running
python3 main.py ../../samples/dcn/dcn_bin.json ../../samples/dcn/train/0.data ../../samples/dcn/_dense_9999.model ../../samples/dcn/0_sparse_9999.model
I get a memory exception. Please refer to the attached screenshot for the actual error.
Note: I used NVTabular with the binary format to preprocess the data and train with HugeCTR, hence the config file used in the above command is dcn_bin.json.
When I read source code, I found data collector is supposed to
/**************************************
Currently, there is only one tutorial about converting a HugeCTR model to a TensorFlow model: https://github.com/NVIDIA/HugeCTR/tree/master/tutorial/dump_to_tf . The tutorial code is not well architected and seems to be a specific example rather than a reusable module.
My question is: what is the HugeCTR team's plan for developing a common Python module, which should have the following behaviors:
Here is the command
docker build --build-arg ENABLE_MULTINODES=ON -t hugectr:devel -f ./tools/dockerfiles/build.Dockerfile .
and got errors below:
In file included from /HugeCTR/HugeCTR/include/gpu_resource.hpp:19:0,
from /HugeCTR/HugeCTR/src/gpu_resource.cpp:17:
/HugeCTR/HugeCTR/include/common.hpp:29:10: fatal error: mpi.h: No such file or directory
#include <mpi.h>
^~~~~~~
compilation terminated.
I think the dockerfile does not meet the requirements of multi-nodes.
I want to run HugeCTR on a device with CUDA 10.1.
I changed the Docker config in tools/dockerfiles/build.Dockerfile or dev.a100.Dockerfile:
FROM nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04 --> FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
Everything built OK and I got the HugeCTR binary files.
But then the driver seems to have broken down and nothing could be run. Running nvidia-smi gives:
Failed to initialize NVML: Driver/library version mismatch
While debugging, I found that the driver broke after the libarrow-cuda-dev install.
This line: apt update && apt install -y libarrow-dev=0.17.1-1 libarrow-cuda-dev=0.17.1-1
It installs another package, libnvidia-compute-435. After libnvidia-compute-435 was installed, the driver no longer worked correctly.
Is there any way to solve it?
When saving a HugeCTR model with a batch normalization layer, we can get gamma and beta, but not offset and scale, which should be estimated from the training data.
When we convert the HugeCTR model to a TensorFlow model, we need to set offset and scale in tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon, name=None).
Can HugeCTR add the offset and scale parameters to the saved binary model?
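For reference when mapping the saved parameters: TensorFlow documents tf.nn.batch_normalization as computing scale * (x - mean) / sqrt(variance + variance_epsilon) + offset, so scale plays the role of gamma and offset that of beta. Under that reading, the values still needed from the dump would be the running mean and variance. Below is a minimal pure-Python sketch of the formula; the function name batch_norm_inference and the sample numbers are illustrative only and are not part of HugeCTR or TensorFlow.

```python
import math


def batch_norm_inference(x, mean, variance, offset, scale, variance_epsilon=1e-5):
    """Apply inference-time batch normalization element-wise.

    Follows the documented formula of tf.nn.batch_normalization:
        y = scale * (x - mean) / sqrt(variance + eps) + offset
    where scale corresponds to gamma and offset corresponds to beta.
    """
    return [
        s * (xi - m) / math.sqrt(v + variance_epsilon) + o
        for xi, m, v, o, s in zip(x, mean, variance, offset, scale)
    ]


# Illustrative values only: two features with assumed running statistics.
y = batch_norm_inference(
    x=[1.0, 4.0],
    mean=[1.0, 2.0],       # running mean estimated during training
    variance=[1.0, 4.0],   # running variance estimated during training
    offset=[0.5, -1.0],    # beta
    scale=[2.0, 3.0],      # gamma
    variance_epsilon=0.0,
)
print(y)
```

If the HugeCTR dump really contains only gamma and beta, this sketch suggests the conversion script would additionally need the two running statistics, not a second pair of learned parameters.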
Description:
The GFN dataset is pre-processed with NVTabular which resulted in 8 parquet files for training. I'm using just 1 parquet file for testing HugeCTR. I've modified _metadata.json to include only 1 filename (corresponding to the 1 parquet file).
While training DLRM with the least possible embedding_vec_size=1, I'm getting the following error:
terminate called after throwing an instance of 'rmm::bad_alloc'
what(): std::bad_alloc: CNMEM error at: /opt/conda/envs/rapids/include/rmm/mr/device/cnmem_memory_resource.hpp168: CNMEM_STATUS_OUT_OF_MEMORY
The error is discussed in detail here
The dataset is available on NGC Batch (dataset id: 68926) which contains:
The docker image was built using this script and is available on NGC Batch as nvidian/tme-gfnmerlin/hugectr_rel:1
Attached is the config used for training - dlrm_fp32_256_local.json
A NGC Batch job can be run using -
ngc batch run --name "gfn-hugectr" --preempt RUNONCE --ace nv-us-west-2 --instance dgx1v.32g.8.norm --commandline "bash -c 'source activate rapids && pip install gdown && jupyter notebook --allow-root --ip 0.0.0.0 --no-browser --NotebookApp.token='admin' --NotebookApp.allow_origin='*' --notebook-dir=/'" --result /results --image "nvidian/tme-gfnmerlin/hugectr_rel:1" --org nvidian --team sae --port 8786 --port 8787 --port 8888 --datasetid 68926:/gfn-merlin/data/preprocessed/preprocessed-53-jan-sept-parquet/
Please add your workspace and change your team.
Attached is the error trace after running huge_ctr --train dlrm_fp32_256_local.json
- error.log
Comments:
We find that when cache_size_ > 1 is set in DataCollector, the training loss is almost zero. In DataCollector.hpp:
template <typename TypeKey>
void DataCollector<TypeKey>::collect() {
  if (counter_ < cache_size_ || cache_size_ == 0) {
    collect_();
  } else {
    collect_blank_();
  }
}
counter_ only ever increments, so once it reaches cache_size_ it is never less than cache_size_ again. From then on collect_() never runs, so the training data is stale, the model overfits, and the loss is almost zero.
The correct code is supposed to be
template <typename TypeKey>
void DataCollector<TypeKey>::collect() {
  if (counter_ % internal_buffers_.size() < cache_size_ || cache_size_ == 0) {
    collect_();
  } else {
    collect_blank_();
  }
}
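The difference between the two conditions can be illustrated with a small simulation. This is only a sketch: it assumes counter_ increases by one on every call to collect(), and the helper names and the buffer count of 4 are hypothetical, chosen for illustration and not taken from HugeCTR.

```python
def collects_original(num_calls, cache_size):
    """True where the original condition would call collect_()."""
    return [c < cache_size or cache_size == 0 for c in range(num_calls)]


def collects_proposed(num_calls, cache_size, num_internal_buffers):
    """True where the proposed modulo condition would call collect_()."""
    return [
        c % num_internal_buffers < cache_size or cache_size == 0
        for c in range(num_calls)
    ]


# Eight consecutive calls with cache_size_ = 2 and an assumed 4 internal buffers.
original = collects_original(8, cache_size=2)
proposed = collects_proposed(8, cache_size=2, num_internal_buffers=4)
print("original:", original)
print("proposed:", proposed)
```

Under these assumptions the original condition collects fresh data only on the first two calls, while the modulo version resumes collecting at the start of every cycle of buffers.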
Hi HugeCTR experts:
in master/test/utest/layers/fully_connected_layer_test.cpp
107:for (size_t i = 0; i < k * n; ++i) h_weight[i] = (float)(rand() % 100);
108:for (size_t i = 0; i < m * k; ++i) h_in[i] = (float)(rand() % 100);
When I use decimal values:
107:for (size_t i = 0; i < k * n; ++i) h_weight[i] = (float)((rand() % 100) * 0.1);
108:for (size_t i = 0; i < m * k; ++i) h_in[i] = (float)((rand() % 100)* 0.1);
the test fails: the max_diff between the CPU and GPU results is > 0.1 (for example, 0.3125). Why?
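One possible contributor, offered as a sketch and not a diagnosis of this particular test: small integers and their sums are exactly representable in float32, so every accumulation order gives the same result, whereas multiples of 0.1 are not, so the CPU and GPU (which may sum in different orders) can legitimately disagree in the last bits. The pure-Python emulation below rounds every operation to float32 using the standard struct module; the vector length 1024 and the seed are arbitrary choices.

```python
import random
import struct


def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_f32(a, b, reverse=False):
    """Dot product accumulated in float32, optionally in reverse order."""
    indices = range(len(a) - 1, -1, -1) if reverse else range(len(a))
    acc = 0.0
    for i in indices:
        acc = f32(acc + f32(a[i] * b[i]))
    return acc


random.seed(0)
k = 1024
ints_a = [float(random.randrange(100)) for _ in range(k)]
ints_b = [float(random.randrange(100)) for _ in range(k)]
dec_a = [f32(v * 0.1) for v in ints_a]
dec_b = [f32(v * 0.1) for v in ints_b]

# Integer inputs: all partial sums stay below 2**24, so float32 is exact.
int_diff = abs(dot_f32(ints_a, ints_b) - dot_f32(ints_a, ints_b, reverse=True))

# Decimal inputs: each step rounds, so the result can depend on the order.
dec_forward = dot_f32(dec_a, dec_b)
dec_diff = abs(dec_forward - dot_f32(dec_a, dec_b, reverse=True))

print("integer inputs, order difference:", int_diff)
print("decimal inputs, order difference:", dec_diff)
print("decimal inputs, relative difference:", dec_diff / dec_forward)
```

If this is what is happening, comparing with a tolerance relative to the magnitude of the outputs may be more meaningful than a fixed absolute threshold; whether the GPU path also uses a reduced-precision math mode is not something the post states.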
Hi HugeCTR experts,
I want to implement a custom model on HugeCTR. So far, I could not find docs that show how to import layers/optimizers to build a custom model. Or is there anything I missed?
I wonder if you have released or will release documentation that shows how to build a custom model.
Thanks
Could you please let me know where I can find the grammar for the input JSON files?
Description:
add scikit-learn python module (Dmitry)
cudf 0.16 (Chirayu)
Plan B:
Four docker containers in total:
build.tfplugin.dockerfile + dev.tfplugin.dockerfile
build.dockerfile + dev.dockerfile
Hi!
I am not sure if starting the norm dataset file with the number of files in the list is the best option.
IMO, that value should not be needed, because it might be easily calculated by the parser. It might also be a source of future errors if the parser doesn't double-check that the number specified corresponds to the number of files detailed in the norm dataset file.
Therefore, I would suggest making that value optional.
https://github.com/NVIDIA/HugeCTR/blob/master/docs/hugectr_user_guide.md#file-list
Hope it helps!
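A minimal sketch of what such a tolerant parser could look like, assuming the file list is plain text with one path per line and an optional leading line holding the file count. The function name parse_file_list is hypothetical and this is not HugeCTR's actual parser.

```python
def parse_file_list(text):
    """Parse a file list whose first line may optionally be the file count.

    Returns the list of file paths. When a leading count is present, it is
    checked against the number of paths that follow.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if lines and lines[0].isdigit():
        declared = int(lines[0])
        paths = lines[1:]
        if declared != len(paths):
            raise ValueError(
                f"file list declares {declared} files but lists {len(paths)}"
            )
        return paths
    return lines


with_count = parse_file_list("2\n./train/0.data\n./train/1.data\n")
without_count = parse_file_list("./train/0.data\n./train/1.data\n")
print(with_count == without_count)
```

One caveat of this design: a data file whose name consists only of digits on the first line would be mistaken for the count, so a real implementation would need a stricter rule.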
Following dlrm_fp32_64k.json,
we tested DLRM on our data (label_dim=1, dense_dim=5, slot_num=75) and got an error in the first fully connected layer.
Is something wrong with my data or model config, or is there a bug in HugeCTR?
log
[10d12h27m36s][HUGECTR][INFO]: end_lr is not specified using default: 0.000000
[6421.34, init_end, ]
[6421.35, run_start, ]
HugeCTR training start:
[6421.36, train_epoch_start, 0, ]
[HCDEBUG][ERROR] Runtime error: cublas_status_not_supported /tmp/HugeCTR/HugeCTR/src/layers/fully_connected_layer.cu:143
[HCDEBUG][ERROR] Runtime error: operation not permitted when stream is capturing /tmp/HugeCTR/HugeCTR/src/session.cpp:451
[HCDEBUG][ERROR] Runtime error: cublas_status_not_supported /tmp/HugeCTR/HugeCTR/src/layers/fully_connected_layer.cu:143
Terminated with error
{
"solver": {
"lr_policy": "fixed",
"display": 1,
"max_iter": 2,
"gpu": [
0
],
"batchsize": 32,
"snapshot": 1,
"snapshot_prefix": "./tmp/daw",
"eval_interval": 1,
"batchsize_eval":32,
"eval_metrics": [
"AUC:0.9",
"AverageLoss"
],
"eval_batches": 1,
"input_key_type": "I64"
},
"optimizer": {
"type": "Adam",
"global_update": false,
"adam_hparam": {
"learning_rate": 0.0001,
"beta1": 0.9,
"beta2": 0.999,
"epsilon": 1e-08
}
},
"layers": [
{
"name": "data",
"type": "Data",
"source": "./tmp/file_list.txt",
"eval_source": "./tmp/file_list_test.txt",
"check": "Sum",
"label": {
"top": "label",
"label_dim": 1
},
"dense": {
"top": "dense",
"dense_dim": 5
},
"sparse": [
{
"top": "data1",
"type": "DistributedSlot",
"max_feature_num_per_sample": 180,
"slot_num": 75
}
]
},
{
"name": "sparse_embedding1",
"type": "DistributedSlotSparseEmbeddingHash",
"bottom": "data1",
"top": "sparse_embedding1",
"sparse_embedding_hparam": {
"max_vocabulary_size_per_gpu": 24000000,
"load_factor": 0.75,
"embedding_vec_size": 16,
"combiner": 1
}
},
{
"name": "fc1",
"type": "InnerProduct",
"bottom": "dense",
"top": "fc1",
"fc_param": {
"num_output": 512
}
},
{
"name": "relu1",
"type": "ReLU",
"bottom": "fc1",
"top": "relu1"
},
{
"name": "fc2",
"type": "InnerProduct",
"bottom": "relu1",
"top": "fc2",
"fc_param": {
"num_output": 256
}
},
{
"name": "relu2",
"type": "ReLU",
"bottom": "fc2",
"top": "relu2"
},
{
"name": "fc3",
"type": "InnerProduct",
"bottom": "relu2",
"top": "fc3",
"fc_param": {
"num_output": 16
}
},
{
"name": "relu3",
"type": "ReLU",
"bottom": "fc3",
"top": "relu3"
},
{
"name": "interaction1",
"type": "Interaction",
"bottom": ["relu3", "sparse_embedding1"],
"top": "interaction1"
},
{
"name": "fc4",
"type": "InnerProduct",
"bottom": "interaction1",
"top": "fc4",
"fc_param": {
"num_output": 1024
}
},
{
"name": "relu4",
"type": "ReLU",
"bottom": "fc4",
"top": "relu4"
},
{
"name": "fc5",
"type": "InnerProduct",
"bottom": "relu4",
"top": "fc5",
"fc_param": {
"num_output": 1024
}
},
{
"name": "relu5",
"type": "ReLU",
"bottom": "fc5",
"top": "relu5"
},
{
"name": "fc6",
"type": "InnerProduct",
"bottom": "relu5",
"top": "fc6",
"fc_param": {
"num_output": 512
}
},
{
"name": "relu6",
"type": "ReLU",
"bottom": "fc6",
"top": "relu6"
},
{
"name": "fc7",
"type": "InnerProduct",
"bottom": "relu6",
"top": "fc7",
"fc_param": {
"num_output": 256
}
},
{
"name": "relu7",
"type": "ReLU",
"bottom": "fc7",
"top": "relu7"
},
{
"name": "fc8",
"type": "InnerProduct",
"bottom": "relu7",
"top": "fc8",
"fc_param": {
"num_output": 1
}
},
{
"name": "loss",
"type": "BinaryCrossEntropyLoss",
"bottom": ["fc8","label"],
"top": "loss"
}
]
}
We ran v2.2 on a V100 with CUDA 10.1 and got the following error:
[HCDEBUG][ERROR] Runtime error: GeneralBuffer is empty /tmp/HugeCTR/HugeCTR/include/general_buffer.hpp:136
Our config is:
{
"solver": {
"lr_policy": "fixed",
"display": 100,
"max_iter": 1000,
"gpu": [0],
"input_key_type":"I64",
"batchsize": 4096,
"batchsize_eval":4096,
"snapshot": 10000000,
"snapshot_prefix": "./",
"eval_interval": 100,
"eval_metrics": ["AUC:0.9","AverageLoss"],
"eval_batches": 500
},
"optimizer": {
"type": "Adam",
"global_update": true,
"adam_hparam": {
"learning_rate": 0.001,
"alpha": 0.001,
"beta1": 0.9,
"beta2": 0.999,
"epsilon": 0.00000001
}
},
"layers": [
{
"name": "data",
"type": "Data",
"source": "./file_list.txt",
"eval_source": "./file_list_test.txt",
"check": "Sum",
"label": {
"top": "label",
"label_dim": 1
},
"dense": {
"top": "dense",
"dense_dim": 0
},
"sparse": [
{
"top": "data1",
"type": "DistributedSlot",
"max_feature_num_per_sample": 100,
"slot_num": 75
}
]
},
{
"name": "sparse_embedding1",
"type": "DistributedSlotSparseEmbeddingHash",
"bottom": "data1",
"top": "sparse_embedding1",
"sparse_embedding_hparam": {
"max_vocabulary_size_per_gpu": 20000000,
"load_factor": 0.75,
"embedding_vec_size": 16,
"combiner": 1
}
},
{
"name": "reshape1",
"type": "Reshape",
"bottom": "sparse_embedding1",
"top": "reshape1",
"leading_dim": 1200
},
{
"name": "concat1",
"type": "Concat",
"bottom": ["reshape1","dense"],
"top": "concat1"
},
{
"name": "slice1",
"type": "Slice",
"bottom": "concat1",
"ranges": [[0,1200], [0,1200]],
"top": ["slice11", "slice12"]
},
{
"name": "multicross1",
"type": "MultiCross",
"bottom": "slice11",
"top": "multicross1",
"mc_param": {
"num_layers": 3
}
},
{
"name": "fc1",
"type": "InnerProduct",
"bottom": "slice12",
"top": "fc1",
"fc_param": {
"num_output": 256
}
},
{
"name": "relu1",
"type": "ReLU",
"bottom": "fc1",
"top": "relu1"
},
{
"name": "dropout1",
"type": "Dropout",
"rate": 0.5,
"bottom": "relu1",
"top": "dropout1"
},
{
"name": "fc2",
"type": "InnerProduct",
"bottom": "dropout1",
"top": "fc2",
"fc_param": {
"num_output": 128
}
},
{
"name": "relu2",
"type": "ReLU",
"bottom": "fc2",
"top": "relu2"
},
{
"name": "dropout2",
"type": "Dropout",
"rate": 0.5,
"bottom": "relu2",
"top": "dropout2"
},
{
"name": "fc3",
"type": "InnerProduct",
"bottom": "dropout2",
"top": "fc3",
"fc_param": {
"num_output": 64
}
},
{
"name": "relu3",
"type": "ReLU",
"bottom": "fc3",
"top": "relu3"
},
{
"name": "dropout3",
"type": "Dropout",
"rate": 0.5,
"bottom": "relu3",
"top": "dropout3"
},
{
"name": "concat2",
"type": "Concat",
"bottom": ["dropout3","multicross1"],
"top": "concat2"
},
{
"name": "fc4",
"type": "InnerProduct",
"bottom": "concat2",
"top": "fc4",
"fc_param": {
"num_output": 1
}
},
{
"name": "loss",
"type": "BinaryCrossEntropyLoss",
"bottom": ["fc4","label"],
"top": "loss"
}
]
}
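The widths in this config can be cross-checked with simple arithmetic. The sketch below is hypothetical (check_dcn_dims is not a HugeCTR function) and assumes that Reshape's leading_dim should equal slot_num * embedding_vec_size and that Slice ranges are [start, end) column ranges over the Concat output. Whether the empty dense input is what triggers "GeneralBuffer is empty" is an unconfirmed guess.

```python
def check_dcn_dims(slot_num, embedding_vec_size, dense_dim,
                   reshape_leading_dim, slice_ranges):
    """Return a list of warnings about tensor widths; empty if none."""
    warnings = []
    embedding_width = slot_num * embedding_vec_size
    if reshape_leading_dim != embedding_width:
        warnings.append(
            f"leading_dim {reshape_leading_dim} != slot_num * "
            f"embedding_vec_size = {embedding_width}"
        )
    concat_width = reshape_leading_dim + dense_dim
    for start, end in slice_ranges:
        if not (0 <= start < end <= concat_width):
            warnings.append(
                f"slice range [{start},{end}] exceeds width {concat_width}"
            )
    if dense_dim == 0:
        warnings.append("dense_dim is 0, so the dense input to Concat is empty")
    return warnings


# Values copied from the config above.
warnings = check_dcn_dims(
    slot_num=75,
    embedding_vec_size=16,
    dense_dim=0,
    reshape_leading_dim=1200,
    slice_ranges=[(0, 1200), (0, 1200)],
)
for message in warnings:
    print(message)
```

With these numbers the reshape and slice widths are consistent (75 * 16 = 1200), which leaves dense_dim = 0 as the one unusual setting worth testing first.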
I am using HugeCTR docker image: https://ngc.nvidia.com/catalog/containers/nvidia:hugectr.
When training on datasets preprocessed using NVTabular in Parquet format, as mentioned in the Criteo example, huge_ctr fails with an illegal memory access.
These are the machine configurations:
HugeCTR-2.1_beta/cub/cub/device/dispatch/../../agent/../thread/../util_ptx.cuh(276): error: identifier "__syncwarp" is undefined
HugeCTR-2.1_beta/cub/cub/device/dispatch/../../agent/../thread/../util_ptx.cuh(287): error: identifier "__any_sync" is undefined
HugeCTR-2.1_beta/cub/cub/device/dispatch/../../agent/../thread/../util_ptx.cuh(300): error: identifier "__all_sync" is undefined
HugeCTR-2.1_beta/cub/cub/device/dispatch/../../agent/../thread/../util_ptx.cuh(313): error: identifier "__ballot_sync" is undefined
4 errors detected in the compilation of "/tmp/tmpxft_0000e907_00000000-6_embedding_creator.cpp1.ii".
HugeCTR/src/CMakeFiles/huge_ctr_static.dir/build.make:101: recipe for target 'HugeCTR/src/CMakeFiles/huge_ctr_static.dir/embedding_creator.cu.o' failed
make[2]: *** [HugeCTR/src/CMakeFiles/huge_ctr_static.dir/embedding_creator.cu.o] Error 1
CMakeFiles/Makefile2:156: recipe for target 'HugeCTR/src/CMakeFiles/huge_ctr_static.dir/all' failed
make[1]: *** [HugeCTR/src/CMakeFiles/huge_ctr_static.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
Is there any performance benchmark result for the latest HugeCTR release on A100 GPUs? https://github.com/NVIDIA/HugeCTR/releases/tag/v2.1_a100_update
Currently, the huge_ctr main process supports: [--train] [--help] [--version]. However, a common scenario is that once training is done, we run prediction on the test data and print the results to the screen, where they can be redirected to a file.
With the prediction results, we can:
Can HugeCTR add a predict command?
Hi, thanks for the nice work. I read the code and have the following questions.
822 = hash('10')%100
But all I can find is a hash table that stores the mapping <10, 822>. I would like to know which part of the code generates the 822.
This is a trivial bug report.
It looks like the last argument at https://github.com/NVIDIA/HugeCTR/blob/master/tools/dockerfiles/build.Dockerfile#L36, -DNCCL_A2A=NCCL_A2A, should be replaced with -DNCCL_A2A=$NCCL_A2A.
Description:
Finish a draft version by contributors by 9th Nov
Reorganize start from 9th Nov (PIC Lamont)
Comments:
Building v2.2 with the command: mkdir build && cd build && cmake -DCMAKE_BUILD_TYPE=Release -DNCCL_A2A=ON -DSM=70 .. && make -j
gave this error:
[ 3%] Building CUDA object HugeCTR/src/CMakeFiles/huge_ctr_static.dir/layers/batch_norm_layer.cu.o
nvcc fatal : Value 'all-warnings' is not defined for option 'Werror'
HugeCTR/src/CMakeFiles/huge_ctr_static.dir/build.make:134: recipe for target 'HugeCTR/src/CMakeFiles/huge_ctr_static.dir/layers/batch_norm_layer.cu.o' failed
make[2]: *** [HugeCTR/src/CMakeFiles/huge_ctr_static.dir/layers/batch_norm_layer.cu.o] Error 1
CMakeFiles/Makefile2:124: recipe for target 'HugeCTR/src/CMakeFiles/huge_ctr_static.dir/all' failed
make[1]: *** [HugeCTR/src/CMakeFiles/huge_ctr_static.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
and
data/HugeCTR/test/utest/layers/multi_cross_layer_test.cpp:276:25: note: suggested alternative: 'compare_array_approx'
/data/HugeCTR/test/utest/layers/multi_cross_layer_test.cpp:276:7: error: expected primary-expression before '(' token
ASSERT_TRUE(test::compare_array_approx_with_ratio(
Solution:
We worked around the errors by editing CMakeLists.txt:
we deleted the "--Werror all-warnings" option and the test module.
Does the v2.2 testing use the dockerfile?
After processing the Criteo dataset with NVTabular and generating the output parquet files, I get Runtime error: an illegal memory access when I try to train the DLRM model using HugeCTR.
[06d20h48m42s][HUGECTR][INFO]: Iter: 14000 Time(1000 iters): 51.684892s Loss: 0.131229 lr:24.000000
[HCDEBUG][ERROR] Runtime error: an illegal memory access was encountered /HugeCTR/HugeCTR/src/embeddings/update_params_functor.cu:571
[HCDEBUG][ERROR] Runtime error: an illegal memory access was encountered /HugeCTR/HugeCTR/src/embeddings/update_params_functor.cu:571
[HCDEBUG][ERROR] Runtime error: an illegal memory access was encountered /HugeCTR/HugeCTR/src/session.cpp:427
terminate called after throwing an instance of 'HugeCTR::internal_runtime_error'
what(): [HCDEBUG][ERROR] Runtime error: an illegal memory access was encountered /HugeCTR/HugeCTR/include/general_buffer2.hpp:37
Currently HugeCTR does not support cudf 0.16. It keeps throwing the following error.
/home/rapids/hugectr/HugeCTR/include/data_readers/parquet_data_reader_worker.hpp:29:10: fatal error: cudf/io/functions.hpp: No such file or directory
#include <cudf/io/functions.hpp>
cudf 0.16 has refactored some code, and the functions.hpp does not exist anymore. Includes in HugeCTR have to be updated.
Set the seed in the config file: "solver": {"seed": 100}
Set max_iter=2, eval_interval=1
Set file_list.txt to contain 1 file; set file_list_with.txt to contain 1 file
Set the train and eval reader chunk_size to 1: data_reader.reset(new DataReader(source_data, batch_size, label_dim, dense_dim,
check_type, data_reader_sparse_param_array,
gpu_resource_group, 1, use_mixed_precision));
Run the training process ./huge_ctr --train model.json twice
The AverageLoss values (1.200125 and 1.18949) are too far apart
The first training log:
[05d17h49m17s][HUGECTR][INFO]: Iter: 1 Time(1 iters): 0.101207s Loss: 1.211278 lr:0.000100
[05d17h49m18s][HUGECTR][INFO]: Evaluation, AUC: 0.501446
[05d17h49m18s][HUGECTR][INFO]: Evaluation, AverageLoss: 1.200125
The second training log:
[05d17h51m37s][HUGECTR][INFO]: Iter: 1 Time(1 iters): 0.093456s Loss: 1.200530 lr:0.000100
[05d17h51m37s][HUGECTR][INFO]: Evaluation, AUC: 0.397724
[05d17h51m37s][HUGECTR][INFO]: Evaluation, AverageLoss: 1.18949
Hi, when I was trying to run DLRM with terabyte dataset with one GPU, I got a runtime error message like this. My guess is I ran out of my GPU memory. I've also tried to decrease the mini batch size or batchsize_eval but still get this error. Does anyone know how to solve this issue?
I was running the following command:
./huge_ctr --train ./dlrm_fp16_64k.json
And the solver in my dlrm_fp16_64k.json looks like this:
"solver": {
"lr_policy": "fixed",
"display": 1000,
"max_iter":64013,
"gpu": [0],
"batchsize": 1024,
"batchsize_eval": 131072,
"snapshot": 10000000,
"snapshot_prefix": "./",
"eval_interval": 3200,
"eval_batches": 681,
"mixed_precision": 1024,
"eval_metrics": ["AUC:0.8025"]
}
Currently the embedding layer supports mean or sum pooling for variable-length inputs. In the deep learning world, using LSTM or attention is common. For example, the DIN model uses an attention layer to merge the user behavior sequence.
Can HugeCTR support sequence models, such as LSTM, GRU, attention, etc.?
Hi,
I read the v2.2 code. When the hash type is LocalizedSlotSparseEmbeddingOneHot, why is local_id = feature_ids[k] + slot_offset_[k]? What is the meaning of this?
if (params_.size() == 1 && params_[0].type == DataReaderSparse_t::Localized && !slot_offset_.empty()) {
  auto& param = params_[0];
  for (int k = 0; k < param.slot_num; k++) {
    int dev_id = k % csr_chunk->get_num_devices();
    T local_id = feature_ids[k] + slot_offset_[k];
    csr_chunk->get_csr_buffer(param_id, dev_id).push_back_new_row(local_id);
  }
}
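One plausible reading of the quoted loop, stated as an assumption because the issue does not confirm it: if each slot numbers its categories from 0, then slot_offset_[k] could hold the combined size of all preceding slots, so that adding it maps per-slot ids into disjoint ranges of one shared id space. The sketch below illustrates that idea only; the slot sizes are hypothetical and the helpers are not HugeCTR functions.

```python
def slot_offsets(slot_sizes):
    """Exclusive prefix sum: starting offset of each slot in a shared id space."""
    offsets = []
    running = 0
    for size in slot_sizes:
        offsets.append(running)
        running += size
    return offsets


def route_sample(feature_ids, offsets, num_devices):
    """Return (dev_id, local_id) per slot, mirroring the quoted loop."""
    return [
        (k % num_devices, feature_ids[k] + offsets[k])
        for k in range(len(feature_ids))
    ]


sizes = [10, 5, 20]              # hypothetical per-slot vocabulary sizes
offsets = slot_offsets(sizes)    # start of each slot's id range
routed = route_sample([3, 3, 3], offsets, num_devices=2)
print(offsets)
print(routed)
```

In this sketch the same raw id 3 in three different slots ends up as three distinct ids, and slot k is always sent to device k % num_devices, which matches the dev_id line in the quoted code.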
Description:
We may need a general command line options parser for ./huge_ctr, ./data_generator et cetera.
Comments:
run nvprof and all run in the same stream
Hi professionals,
We followed the steps in the HugeCTR tutorial, picked DeepFM for a trial, and successfully started the training, but nothing happened after the 'HugeCTR training start' text (we waited for several days).
We tried several network configs, changing only max_iter and leaving the network architecture unchanged, and hit the same problem.
System: Ubuntu 18.04.4 LTS
GPU: GeForce RTX 2080 Ti
Driver Version: 440.44
CUDA Version: 10.2
I watched Zehuan Wang's talk "HugeCTR - 端到端点击率预估训练解决方案介绍" ("HugeCTR: Introduction to an End-to-End CTR Estimation Training Solution").
On the "PERFORMANCE" slide of that presentation, the 8-GPU performance is only 17.8 ms per iteration.
17.8 ms per iteration seems too fast. Is this an error?
Is there a complete tutorial about running HugeCTR on multiple nodes?
I have tried this:
Following the example (https://github.com/NVIDIA/HugeCTR/tree/master/samples/dcn2nodes), what I have done is:
Build a multi-node-capable image:
Run HugeCTR on two NVLink-equipped 8*V100 (32G) physical machines.
start_dist.sh:
set -x
mpirun --bind-to none --allow-run-as-root -np
ssh_resolver.sh:
#!/bin/bash
HOSTNAME=$1
shift
ARGS=$*
ssh -p "$SSH_PORT" "$HOSTNAME" "$ARGS"
Is my mpirun command correct? Should I specify UCX in mpirun? How does HugeCTR use UCX and hwloc? And how can I use InfiniBand / RDMA to accelerate HugeCTR?
For example, the UCX command looks like:
mpirun -np 2 -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 ./app
I was trying to run HugeCTR on the Criteo Kaggle dataset. When I was converting the original Kaggle dataset to HugeCTR format using Criteo2hugeCTR_legacy tool, I was running the command line as following:
$ ./criteo2hugectr_legacy 1 ../../tools/criteo_script_legacy/train.out criteo/sparse_embedding file_list.txt
$ ./criteo2hugectr_legacy 1 ../../tools/criteo_script_legacy/test.out criteo_test/sparse_embedding file_list_test.txt
However, I'm not able to get file_list.txt and file_list_test.txt from these commands. I'm not sure what I did wrong here, since I pretty much followed the online readme from the beginning.
I also did some trials and realized that the problem might be in criteo2hugectr_legacy.cpp, since I wasn't able to reach the EOF of txt_file (line 95).
I'd really appreciate it if you guys could explain this a bit. Thank you very much!
Description:
DataReader::set_source(){
worker_group_.reset(new xxx_data_reader_worker_group);
}
# and no need to explicit call of start()
repeat
from DataReader when it is ready, e.g., enable set_source
for Raw
n_batches
. It may require the change to Metrics as well. (@minseokl will see how TF and Pytorch tackle this issue)
Hi there,
I tried running the HugeCTR Docker example of DeepFM with NVTabular preprocessing, but after running the command in the title, it shows errors and stops at the training start. Is there a bug? Thanks.
System: Ubuntu 18.04.4 LTS
GPU: GeForce RTX 2080 Ti
Driver Version: 440.44
CUDA Version: 10.2
[0.001, init_start, ]
HugeCTR Version: 2.2.1
Config file: ./deepfm_bin.json
[21d09h02m26s][HUGECTR][INFO]: batchsize_eval is not specified using default: 512
[21d09h02m26s][HUGECTR][INFO]: Default evaluation metric is AUC without threshold value
[21d09h02m26s][HUGECTR][INFO]: algorithm_search is not specified using default: 1
[21d09h02m26s][HUGECTR][INFO]: Algorithm search: ON
[21d09h02m26s][HUGECTR][INFO]: cuda_graph is not specified using default: 1
[21d09h02m26s][HUGECTR][INFO]: CUDA Graph: ON
[21d09h02m26s][HUGECTR][INFO]: Initial seed is 3545387129
[21d09h02m28s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
Device 0: GeForce RTX 2080 Ti
[21d09h02m30s][HUGECTR][INFO]: cache_eval_data is not specified using default: 0
[21d09h02m30s][HUGECTR][INFO]: max_nnz is not specified using default: 30
[21d09h02m30s][HUGECTR][INFO]: num_internal_buffers 1
[21d09h02m30s][HUGECTR][INFO]: num_internal_buffers 1
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[HCDEBUG][ERROR] DataHeaderError[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp[HCDEBUG][ERROR] DataHeaderError:58
[HCDEBUG][ERROR] DataHeaderError[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp[HCDEBUG][ERROR] :[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58[HCDEBUG][ERROR]
DataHeaderError [HCDEBUG][ERROR] /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[HCDEBUG][ERROR] [HCDEBUG][ERROR] [HCDEBUG][ERROR] 58
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp[HCDEBUG][ERROR] DataHeaderErrorDataHeaderErrorDataHeaderError [HCDEBUG][ERROR] /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp58DataHeaderError/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp::DataHeaderErrorDataHeaderError:58
58 [HCDEBUG][ERROR]
/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp
/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp::[HCDEBUG][ERROR] 58[HCDEBUG][ERROR] /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
DataHeaderError[HCDEBUG][ERROR] DataHeaderError58[HCDEBUG][ERROR]
/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp[HCDEBUG][ERROR] [HCDEBUG][ERROR] :DataHeaderError58 /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp
DataHeaderError58
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
DataHeaderError[HCDEBUG][ERROR] 58
DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp[HCDEBUG][ERROR] DataHeaderError :58
:58DataHeaderError[HCDEBUG][ERROR] DataHeaderError[HCDEBUG][ERROR] DataHeaderError DataHeaderError [HCDEBUG][ERROR] /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp [HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp
::58:5858[HCDEBUG][ERROR] DataHeaderErrorDataHeaderError58: 58
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp
/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp[HCDEBUG][ERROR] :58
58 [HCDEBUG][ERROR] /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp
:58
[HCDEBUG][ERROR] DataHeaderError[HCDEBUG][ERROR] DataHeaderError [HCDEBUG][ERROR] DataHeaderError: /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp58:
DataHeaderError[HCDEBUG][ERROR] /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp::DataHeaderError58
[HCDEBUG][ERROR] DataHeaderError:[HCDEBUG][ERROR] DataHeaderError/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp[HCDEBUG][ERROR] 58/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp 58
[HCDEBUG][ERROR] DataHeaderError58[HCDEBUG][ERROR] DataHeaderError:/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp::58
/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp58
/hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
DataHeaderError[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp::58
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[21d09h02m30s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=1737709
[HCDEBUG][ERROR] DataHeaderError /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:58
[HCDEBUG][ERROR] Runtime error: failed to read a file /hugectr/HugeCTR/include/data_readers/data_reader_worker.hpp:69
.... [the two error lines above repeat many more times]
[21d09h02m30s][HUGECTR][INFO]: gpu0 start to init embedding
[21d09h02m30s][HUGECTR][INFO]: gpu0 init embedding done
[21d09h02m30s][HUGECTR][INFO]: warmup_steps is not specified using default: 1
[21d09h02m30s][HUGECTR][INFO]: decay_start is not specified using default: 0
[21d09h02m30s][HUGECTR][INFO]: decay_steps is not specified using default: 1
[21d09h02m30s][HUGECTR][INFO]: decay_power is not specified using default: 2.000000
[21d09h02m30s][HUGECTR][INFO]: end_lr is not specified using default: 0.000000
[3538.92, init_end, ]
[3538.94, run_start, ]
HugeCTR training start:
[3538.95, train_epoch_start, 0, ]
Description:
Currently, our parser doesn't check whether a specified layer "name" is already used by a preceding layer.
As a result, erroneous layers such as the following can be silently inserted into the network.
Without any safety measure, this kind of config bug can result in a disconnected network, in which some parameters are not trained properly.
{
  "name": "fc6",
  "type": "InnerProduct",
  "bottom": "relu5",
  "top": "fc6",
  "fc_param": {
    "num_output": 512
  }
},
{
  "name": "fc6",
  "type": "InnerProduct",
  "bottom": "relu5",
  "top": "fc6",
  "fc_param": {
    "num_output": 512
  }
},
{
  "name": "relu6",
  "type": "ReLU",
  "bottom": "fc6",
  "top": "relu6"
},
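The missing check described above could look roughly like the following. This is only a minimal Python sketch, not HugeCTR's parser: the function name `find_duplicate_layer_names` is hypothetical, and the wrapping `"layers"` array around the three layers from the report is an assumption made so the fragment parses as JSON.

```python
import json
from collections import Counter


def find_duplicate_layer_names(config):
    """Return layer names that appear more than once, in first-seen order."""
    names = [layer["name"] for layer in config.get("layers", []) if "name" in layer]
    counts = Counter(names)
    # dict.fromkeys keeps the first occurrence of each name and preserves order.
    return [name for name in dict.fromkeys(names) if counts[name] > 1]


# Hypothetical config wrapping the layers from the report in a "layers" array.
config = json.loads("""
{
  "layers": [
    {"name": "fc6", "type": "InnerProduct", "bottom": "relu5", "top": "fc6",
     "fc_param": {"num_output": 512}},
    {"name": "fc6", "type": "InnerProduct", "bottom": "relu5", "top": "fc6",
     "fc_param": {"num_output": 512}},
    {"name": "relu6", "type": "ReLU", "bottom": "fc6", "top": "relu6"}
  ]
}
""")

duplicates = find_duplicate_layer_names(config)
print(duplicates)  # prints ['fc6']
```

A parser could run a check like this before building the network and reject the config when the returned list is not empty.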
Comments:
Description:
commit:
commit 53a2ff8 (HEAD -> v2.3-integration, origin/v2.3-integration, origin/data-power-law-kingsley)
Merge: e75c290 3c49da0
Author: Joey Wang [email protected]
Date: Sat Oct 31 01:24:10 2020 -0700
Merge branch ‘fea-multinode-auc-dmitry-2.3’ into ‘v2.3-integration’
Multinode AUC
See merge request zehuanw/hugectr!257
dataset: /mnt/dldata/criteo_1TB/albertoa/test_dask/output/ in dlcluster
config: 2xdgxa100.json
error log: hugectr-test-1604632076.log
reproduce steps:
Currently I am hitting this bug when using raplab. I will update the reproduction steps once I have access to selene.
Comments:
I just want to run the DCN sample training, using the following model JSON:
{
  "solver": {
    "lr_policy": "fixed",
    "display": 1000,
    "max_iter": 10000,
    "gpu": [0],
    "batchsize": 512,
    "snapshot": 10000000,
    "snapshot_prefix": "./",
    "eval_interval": 1000,
    "eval_batches": 60,
    "input_key_type": "I64"
  },
  "optimizer": {
    "type": "Adam",
    "global_update": true,
    "adam_hparam": {
      "learning_rate": 0.001,
      "beta1": 0.9,
      "beta2": 0.999,
      "epsilon": 0.0000001
    }
  },
  "layers": [
    {
      "name": "data",
      "type": "Data",
      "format": "Parquet",
      "slot_size_array": [1461, 558, 335378, 211710, 306, 20, 12136, 634, 4, 51298, 5302, 332600, 3179, 27, 12191, 301211, 11, 4841, 2086, 4, 324273, 17, 16, 79734, 96, 58622],
      "source": "./dcn_data/train/_file_list.txt",
      "eval_source": "./dcn_data/val/_file_list.txt",
      "check": "None",
      "label": {
        "top": "label",
        "label_dim": 1
      },
      "dense": {
        "top": "dense",
        "dense_dim": 13
      },
      "sparse": [
        {
          "top": "data1",
          "type": "DistributedSlot",
          "max_feature_num_per_sample": 30,
          "slot_num": 26
        }
      ]
    },
    {
      "name": "sparse_embedding1",
      "type": "DistributedSlotSparseEmbeddingHash",
      "bottom": "data1",
      "top": "sparse_embedding1",
      "sparse_embedding_hparam": {
        "max_vocabulary_size_per_gpu": 1737709,
        "embedding_vec_size": 16,
        "combiner": 0
      }
    },
    {
      "name": "reshape1",
      "type": "Reshape",
      "bottom": "sparse_embedding1",
      "top": "reshape1",
      "leading_dim": 416
    },
    {
      "name": "concat1",
      "type": "Concat",
      "bottom": ["reshape1","dense"],
      "top": "concat1"
    },
    {
      "name": "slice1",
      "type": "Slice",
      "bottom": "concat1",
      "ranges": [[0,429], [0,429]],
      "top": ["slice11", "slice12"]
    },
    {
      "name": "multicross1",
      "type": "MultiCross",
      "bottom": "slice11",
      "top": "multicross1",
      "mc_param": {
        "num_layers": 6
      }
    },
    {
      "name": "fc1",
      "type": "InnerProduct",
      "bottom": "slice12",
      "top": "fc1",
      "fc_param": {
        "num_output": 1024
      }
    },
    {
      "name": "relu1",
      "type": "ReLU",
      "bottom": "fc1",
      "top": "relu1"
    },
    {
      "name": "dropout1",
      "type": "Dropout",
      "rate": 0.5,
      "bottom": "relu1",
      "top": "dropout1"
    },
    {
      "name": "fc2",
      "type": "InnerProduct",
      "bottom": "dropout1",
      "top": "fc2",
      "fc_param": {
        "num_output": 1024
      }
    },
    {
      "name": "relu2",
      "type": "ReLU",
      "bottom": "fc2",
      "top": "relu2"
    },
    {
      "name": "dropout2",
      "type": "Dropout",
      "rate": 0.5,
      "bottom": "relu2",
      "top": "dropout2"
    },
    {
      "name": "concat2",
      "type": "Concat",
      "bottom": ["dropout2","multicross1"],
      "top": "concat2"
    },
    {
      "name": "fc4",
      "type": "InnerProduct",
      "bottom": "concat2",
      "top": "fc4",
      "fc_param": {
        "num_output": 1
      }
    },
    {
      "name": "loss",
      "type": "BinaryCrossEntropyLoss",
      "bottom": ["fc4","label"],
      "top": "loss"
    }
  ]
}
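As a side check, the layer widths in the config above appear to be consistent with each other. The short Python sketch below redoes the arithmetic with values copied from the config. It assumes the embedding layer emits one vector of `embedding_vec_size` per slot, so that the flattened width is `slot_num * embedding_vec_size`; the variable names are illustrative only.

```python
# Values copied from the config above.
slot_num = 26
embedding_vec_size = 16
dense_dim = 13
reshape_leading_dim = 416
slice_ranges = [[0, 429], [0, 429]]

# Assumption: one combined embedding vector per slot, flattened by Reshape.
embedding_width = slot_num * embedding_vec_size

# Concat joins the flattened embeddings with the dense features.
concat_width = embedding_width + dense_dim

# Both Slice ranges should cover the whole concatenated vector.
ranges_cover_concat = all(
    start == 0 and end == concat_width for start, end in slice_ranges
)

print(embedding_width, concat_width, ranges_cover_concat)  # prints "416 429 True"
```

If `slot_num`, `embedding_vec_size`, or `dense_dim` change, `leading_dim` and the `Slice` ranges would need to be updated to match.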
However, the metrics reported for the first 1000 iterations look very strange:
[04d15h08m10s][HUGECTR][INFO]: Iter: 1000 Time(1000 iters): 6.113479s Loss: 0.527308 lr:0.001000
[8665.98, eval_start, 0.1, ]
[04d15h08m10s][HUGECTR][INFO]: Evaluation, AUC: 0.692035
[8708.16, eval_accuracy, 0.692035, 0.1, 1000, ]
[04d15h08m10s][HUGECTR][INFO]: Eval Time for 60 iters: 0.042175s
[8708.18, eval_stop, 0.1, ]
[04d15h08m16s][HUGECTR][INFO]: Iter: 2000 Time(1000 iters): 6.171510s Loss: 0.426323 lr:0.001000
[14837.7, eval_start, 0.2, ]
....
Is it normal for the AUC to reach 0.692035 after only the first 1000 iterations? I also find that the AUC decreases over the following 1000-iteration intervals.
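To look at the AUC trend over a whole run, the log can be reduced to one record per reporting interval. The sketch below is a minimal Python example, not a HugeCTR tool: the function name `parse_training_log` is hypothetical, and the regular expressions assume the line formats shown in the excerpt above, with each "Evaluation, AUC" line following the "Iter" line it belongs to.

```python
import re

# Patterns assume the log line formats shown in the excerpt above.
ITER_RE = re.compile(r"Iter: (\d+) .*?Loss: ([0-9.]+)")
AUC_RE = re.compile(r"Evaluation, AUC: ([0-9.]+)")


def parse_training_log(lines):
    """Pair each 'Iter' line with the AUC of the evaluation that follows it."""
    records = []
    current = None
    for line in lines:
        match = ITER_RE.search(line)
        if match:
            current = {
                "iter": int(match.group(1)),
                "loss": float(match.group(2)),
                "auc": None,  # filled in if an evaluation line follows
            }
            records.append(current)
            continue
        match = AUC_RE.search(line)
        if match and current is not None:
            current["auc"] = float(match.group(1))
    return records


log_lines = [
    "[04d15h08m10s][HUGECTR][INFO]: Iter: 1000 Time(1000 iters): 6.113479s Loss: 0.527308 lr:0.001000",
    "[8665.98, eval_start, 0.1, ]",
    "[04d15h08m10s][HUGECTR][INFO]: Evaluation, AUC: 0.692035",
    "[04d15h08m16s][HUGECTR][INFO]: Iter: 2000 Time(1000 iters): 6.171510s Loss: 0.426323 lr:0.001000",
]

records = parse_training_log(log_lines)
for record in records:
    print(record["iter"], record["loss"], record["auc"])
```

Printing or plotting `auc` against `iter` for the full log makes it easier to tell a steady decline from noise between evaluations.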
Is there a way to run training without a validation set? Whenever I leave eval_source empty, I get a file_empty kind of error.
On top of that, HugeCTR really needs to improve its error messages. I had to trace the code to see what was happening.
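Until the error messages improve, a small pre-flight check on the config can surface this kind of problem before training starts. The following is a minimal Python sketch under stated assumptions: `check_data_sources` is a hypothetical helper, not part of HugeCTR, and the `source` and `eval_source` key names are taken from the DCN config earlier on this page. It only reports missing or empty file lists; it does not settle whether HugeCTR can train without an evaluation set.

```python
import os


def check_data_sources(config, base_dir="."):
    """Return readable problems with the Data layer's file-list entries."""
    data_layers = [
        layer for layer in config.get("layers", []) if layer.get("type") == "Data"
    ]
    if not data_layers:
        return ["no layer of type 'Data' found"]

    problems = []
    data = data_layers[0]
    for key in ("source", "eval_source"):
        path = data.get(key)
        if not path:
            problems.append(f"'{key}' is missing or empty")
            continue
        full_path = os.path.join(base_dir, path)
        if not os.path.isfile(full_path):
            problems.append(f"'{key}' points to a missing file: {full_path}")
        elif os.path.getsize(full_path) == 0:
            problems.append(f"'{key}' points to an empty file: {full_path}")
    return problems


# Hypothetical config with a training source but no evaluation source.
example = {"layers": [{"type": "Data", "source": "./does_not_exist.txt"}]}
for problem in check_data_sources(example):
    print(problem)
```

Running it against the JSON file before launching training gives a message that names the offending key instead of a generic file error.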