tjyuyao / ice-learn

A high-level Deep Learning framework that extends PyTorch and PyCUDA.

Home Page: https://tjyuyao.github.io/ice-learn

License: MIT License

Languages: Python 99.60%, Cuda 0.40%

Topics: framework, modular, multi-task-learning, pycuda, pytorch

ice-learn's People

Contributors

daviswang0, tjyuyao

ice-learn's Issues

unable to open shared memory object in read-write mode

Pickling across multiple processes may trigger the following error:

INTERNAL ASSERT FAILED at "../aten/src/ATen/MapAllocator.cpp":263, please report a bug to PyTorch. 
unable to open shared memory object </torch_10722_1002> in read-write mode

Following a related issue, I raised the open file descriptor limit with ulimit -n 4096, which fixed the problem.
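
The same limit can also be raised from inside the Python process before any workers are started. The following is a minimal sketch using the standard-library resource module (Unix only); the helper name raise_nofile_limit is hypothetical and not part of ice-learn, and the target of 4096 simply mirrors the ulimit value above.

```python
import resource


def raise_nofile_limit(target: int = 4096) -> int:
    """Raise the soft open-file limit to `target` (hypothetical helper).

    Returns the soft limit in effect afterwards. The soft limit can never
    exceed the hard limit, so the target is capped accordingly.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Nothing to do if the current soft limit is already high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target
```

Calling raise_nofile_limit() at the top of the training script, before spawning worker processes, has roughly the same effect as running ulimit -n 4096 in the launching shell, but only for that process and its children.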

distributed reduce recursively

Commit 7f6741e added a dist_backend argument to ElasticLauncher that accepts "nccl", "gloo", "mpi", or "auto". With "auto", ice generally uses "nccl" for "cuda" devices and "gloo" for "cpu" devices. If the cuda device list contains duplicates, e.g. "cuda:0,0", then "gloo" is used instead, because "nccl" does not support duplicate devices and triggers an issue similar to Lightning-AI/pytorch-lightning#4420 (comment).
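
The "auto" rule described above can be sketched in a few lines. This is an illustration only, not the code from commit 7f6741e: the function name resolve_dist_backend is hypothetical, and the device string format ("cpu", or "cuda:" followed by comma-separated indices) is assumed from the "cuda:0,0" example.

```python
SUPPORTED_BACKENDS = ("nccl", "gloo", "mpi", "auto")


def resolve_dist_backend(dist_backend: str, devices: str) -> str:
    """Pick a concrete backend name (hypothetical helper, not ice's API)."""
    if dist_backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported dist_backend: {dist_backend!r}")
    # An explicit choice is passed through unchanged.
    if dist_backend != "auto":
        return dist_backend
    # Assumed format: "<type>" or "<type>:<idx>,<idx>,...".
    device_type, _, index_part = devices.partition(":")
    if device_type != "cuda":
        return "gloo"
    indices = [i.strip() for i in index_part.split(",") if i.strip()]
    # Duplicate cuda indices mean several processes share one GPU: use gloo.
    if len(set(indices)) < len(indices):
        return "gloo"
    return "nccl"
```

For example, resolve_dist_backend("auto", "cuda:0,1") gives "nccl", while resolve_dist_backend("auto", "cuda:0,0") and resolve_dist_backend("auto", "cpu") both give "gloo".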

ice will use only the broadcast and all_reduce functions, which are available for both the gloo and nccl backends on cuda devices, according to https://pytorch.org/docs/stable/distributed.html#backends .
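
As a rough illustration of what reducing "recursively" could look like, the sketch below walks a nested container and applies a reduction to every leaf. It is an assumption based on the issue title, not ice's implementation; the names reduce_recursively and leaf_fn are hypothetical, and the collective call is injected so the traversal itself has no PyTorch dependency.

```python
from typing import Any, Callable


def reduce_recursively(obj: Any, leaf_fn: Callable[[Any], Any]) -> Any:
    """Apply `leaf_fn` to every leaf of nested dicts, lists and tuples.

    Hypothetical helper. Every rank must hold the same structure with the
    same dict key order, because collective calls have to be issued in the
    same order on all ranks.
    """
    if isinstance(obj, dict):
        return {k: reduce_recursively(v, leaf_fn) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type (plain list or tuple only).
        return type(obj)(reduce_recursively(v, leaf_fn) for v in obj)
    return leaf_fn(obj)
```

In a distributed job, leaf_fn could be a small wrapper that calls torch.distributed.all_reduce on a tensor (which reduces in place, summing by default) and returns it; here any callable works, which keeps the traversal easy to test in isolation.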

cuda.h missing error during installing pycuda

This can happen on machines where the CUDA environment variables are not properly configured. Exporting the following before installing pycuda should help:

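# Point the build at the CUDA toolkit (adjust the path to your installation).
export CUDA_HOME=/usr/local/cuda
export PATH=${CUDA_HOME}/bin${PATH:+:${PATH}}
# ${VAR:+:${VAR}} appends the old value only when it is set, avoiding stray colons.
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export C_INCLUDE_PATH=${CUDA_HOME}/include${C_INCLUDE_PATH:+:${C_INCLUDE_PATH}}
export LIBRARY_PATH=${CUDA_HOME}/lib64${LIBRARY_PATH:+:${LIBRARY_PATH}}
pip install pycuda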
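
Before running pip, it can be worth confirming that cuda.h really is where CUDA_HOME points. The following is a minimal sketch; the helper name find_cuda_header is hypothetical, and it assumes the conventional toolkit layout with headers under an include directory inside CUDA_HOME.

```python
import os
from pathlib import Path
from typing import Optional


def find_cuda_header(cuda_home: Optional[str] = None) -> Optional[Path]:
    """Return the path to cuda.h under CUDA_HOME, or None if it is missing.

    Hypothetical helper: falls back to the CUDA_HOME environment variable
    when no directory is passed explicitly.
    """
    home = cuda_home or os.environ.get("CUDA_HOME")
    if not home:
        return None
    header = Path(home) / "include" / "cuda.h"
    return header if header.is_file() else None
```

If find_cuda_header() returns None, CUDA_HOME is either unset or points at a directory without the toolkit headers, and the pycuda build is likely to fail with the same missing cuda.h error.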
