rapidsai / deeplearning

Python 11.02% Jupyter Notebook 88.94% Dockerfile 0.03%

deeplearning's Introduction

RAPIDS.AI Deep Learning Repo

This repository is the home of our efforts to integrate RAPIDS acceleration of dataframes on GPU into popular deep learning frameworks. The work can be broken down into three main sections:

Dataloaders and preprocessing functionality that connect RAPIDS cuDF dataframes to the available deep learning libraries.
Improvements to optimizers through the fusion of GPU operations.
Examples of the use of each of the above in competitions or on real-world datasets.

Each deep learning library is contained within its own subfolder, with the different dataloader options and examples contained within further subfolders. For now our focus is on PyTorch; however, we expect to add other libraries in the future.

deeplearning's Issues

RAdam + LAMB + LookAhead

RAdam + LAMB + LookAhead Numba Optimized Implementation

Question re: 05_1_TimeSeries_HistoricalEvents!

Good Afternoon! Thank you very much for all you have done with these repositories of knowledge! I had a question about the file: 05_1_TimeSeries_HistoricalEvents.ipynb

In the case of the solution code here:

############### Solution ###############
# Rolling window length as a time offset string (7 days).
offset = '7D'

# Count and sum the target per product and day, then index by date for time-based rolling.
data_window = df_train[['product_id', 'date', 'target']].groupby(['product_id', 'date']).agg(['count', 'sum']).reset_index()
data_window.columns = ['product_id', 'date', 'count', 'sum']
data_window.index = data_window['date']

# Rolling totals over the window, computed within each product.
data_window_roll = data_window[['product_id', 'count', 'sum']].groupby(['product_id']).rolling(offset).sum().drop('product_id', axis=1)
data_window_roll = data_window_roll.reset_index()
data_window_roll.columns = ['product_id', 'date', 'count_' + offset, 'sum_' + offset]
# Shift by one row so each row only sees earlier days, then zero out the first row of each product.
data_window_roll[['count_' + offset, 'sum_' + offset]] = data_window_roll[['count_' + offset, 'sum_' + offset]].shift(1)
data_window_roll.loc[data_window_roll['product_id']!=data_window_roll['product_id'].shift(1), ['count_' + offset, 'sum_' + offset]] = 0
# Historical average; the first row of each product is 0/0, which yields NaN.
data_window_roll['avg_' + offset] = data_window_roll['sum_' + offset]/data_window_roll['count_' + offset]
# Attach the historical features to the training rows.
data = df_train.merge(data_window_roll, how='left', on=['product_id', 'date'])
data

We are typically left with a np.nan value for the first row of each group's avg_7D. Would you all recode this to zero, or leave it as nan and drop the row? Additionally, would you typically include several of these in your model? Say, compute 3D and a 7D offset average?

Separately, I take it you apply identical functions to the valid and test sets, as well, right?

Lastly, where/when might I learn more about similar courses that you might offer in the future?

Thank you for your time and consideration!
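
For readers following this question, here is a minimal sketch in plain pandas of where the NaN comes from and two common ways of handling it. It is an illustration under assumptions, not the course's answer: the toy values, the add_history helper, and the no_history_7D and avg_7D_filled column names are hypothetical, and it shifts within each product instead of reproducing the snippet line for line.

```python
import pandas as pd

# Toy data with hypothetical values; the column names follow the snippet above.
df_train = pd.DataFrame({
    "product_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-05", "2020-01-01", "2020-01-03"]
    ),
    "target": [1, 0, 1, 0, 1],
})


def add_history(df, offsets=("3D", "7D")):
    """Append shifted rolling count/sum/avg of target for each window length."""
    daily = (
        df.groupby(["product_id", "date"])["target"]
        .agg(["count", "sum"])
        .reset_index()
        .sort_values(["product_id", "date"])
    )
    out = df
    for offset in offsets:
        rolled = (
            daily.set_index("date")
            .groupby("product_id")[["count", "sum"]]
            .rolling(offset)
            .sum()
            .reset_index()
        )
        # Shift within each product so a row never sees its own day,
        # then treat "no earlier rows" as zero counts, as the snippet does.
        shifted = rolled.groupby("product_id")[["count", "sum"]].shift(1).fillna(0)
        rolled[f"count_{offset}"] = shifted["count"]
        rolled[f"sum_{offset}"] = shifted["sum"]
        # 0 / 0 yields NaN for the first row of each product.
        rolled[f"avg_{offset}"] = rolled[f"sum_{offset}"] / rolled[f"count_{offset}"]
        rolled = rolled.drop(columns=["count", "sum"])
        out = out.merge(rolled, how="left", on=["product_id", "date"])
    return out


features = add_history(df_train)

# Option A: leave NaN in place (some tree-based libraries accept missing values).
# Option B: fill with 0 and keep a flag so "no history" stays distinguishable.
features["no_history_7D"] = features["count_7D"].eq(0).astype(int)
features["avg_7D_filled"] = features["avg_7D"].fillna(0.0)
```

Passing several offsets, as in the default ("3D", "7D"), produces one set of columns per window. Whether to fill or drop the NaN rows depends on the model being trained. If the features are used at all, the same function would need to be applied to the validation and test frames so that the columns match.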

XLNet code missing

Some code is missing from the winner’s submission:
https://github.com/rapidsai/deeplearning/tree/main/WSDM2021/02_Models/XLNet_SMF

Good stuff overall! Congrats!

Batch Dataset/DataLoader

Hey @EvenOldridge, I stumbled upon your post about cudf dataloaders speeding up training a long time ago... and recently got around to actually trying my hand at it, so thanks for introducing me to the idea!

I'm just curious, but is there a specific reason you implemented the batchdataloaders like you did in this repo instead of using a custom sampler + regular DataLoader with 0 workers?

I wrote out a quick sketch of the implementation I'm thinking of here: https://gist.github.com/NegatioN/1f63c3a79dfe13b183d413123d37d4fa

I understand that your implementation might already have changed significantly since you mentioned integrating it with fast.ai, but I was curious if you ruled this out for any specific reason that I can't clearly see atm. I would think it has the same performance capabilities?

Edit: The biggest difference might be we can grab each batch as a single read from contiguous memory? Did you test how large the impact of this was?

/Joakim
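
The point raised in the edit above, reading a batch as one contiguous slice versus gathering sampled rows, can be illustrated with a small NumPy sketch. This is only an analogy on CPU arrays under stated assumptions: the array, the batch size, and the two helper names (contiguous_batches, gathered_batches) are hypothetical, and neither function is the repository's dataloader nor the gist's implementation.

```python
import numpy as np

# Hypothetical in-memory table: 10 rows x 4 feature columns.
data = np.arange(40, dtype=np.float32).reshape(10, 4)
batch_size = 4


def contiguous_batches(array, batch_size):
    """Yield each batch as one slice; basic slicing returns a view, not a copy."""
    for start in range(0, len(array), batch_size):
        yield array[start:start + batch_size]


def gathered_batches(array, batch_size, rng):
    """Yield batches of randomly sampled rows; index-array lookup copies the rows."""
    order = rng.permutation(len(array))
    for start in range(0, len(order), batch_size):
        yield array[order[start:start + batch_size]]


slices = list(contiguous_batches(data, batch_size))
gathered = list(gathered_batches(data, batch_size, np.random.default_rng(0)))

# A slice shares memory with the source; a gathered batch does not.
print(np.shares_memory(slices[0], data), np.shares_memory(gathered[0], data))
```

The sketch only shows that the two access patterns differ in whether rows are copied. How much that matters for cuDF-backed training would have to be measured on the actual data and hardware.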

Broken container

The Dockerfile specifies an entrypoint script which is not present in the container.
This file, entrypoint.sh, is present in https://github.com/rapidsai/docker, but fixing the Dockerfile to COPY the file to the right location only leads to another issue (missing conda.sh).

What is the correct way to build the container then?
Thanks.
