coreylowman / dfdx

Deep learning in Rust, with shape checked tensors and neural networks

License: Other

Rust 92.15% Cuda 7.58% GLSL 0.26% WGSL 0.02%
rust autograd autodiff machine-learning neural-network autodifferentiation rust-lang backpropagation tensor deep-learning

dfdx's People

Contributors

cbournhonesque, ccaven, clstatham, coreylowman, daughterofmars, dimev, favilo, infalmo, inflectrix, jafioti, jcrist1, kstavro, leodog896, m1ngxu, narsil, nkoppel, opfromthestart, optman, quietlychris, rainiwu, swfsql, timerertim, timwedde, vasanthakumarv, vikigenius, viliamvadocz, xbagon, yannickfricke, yerke, zojeda


dfdx's Issues

Clone for UniqueId should produce a different id

For safety and clarity: if you clone a tensor for backprop, more often than not you want the clone to be a different tensor that is treated separately during backprop.

For cases where you do want to keep the id the same, .duplicate() should be used.

The only place this currently occurs is in kl_div_with_logits_loss, where target_probs is cloned since it's used twice.
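A minimal sketch of the proposed semantics, using a hypothetical `UniqueId` backed by a global counter (the names and details here are illustrative, not dfdx's actual implementation):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

#[derive(Debug, PartialEq)]
struct UniqueId(usize);

impl UniqueId {
    fn new() -> Self {
        UniqueId(NEXT_ID.fetch_add(1, Ordering::Relaxed))
    }

    // Explicit opt-in for keeping the same id.
    fn duplicate(&self) -> Self {
        UniqueId(self.0)
    }
}

impl Clone for UniqueId {
    // Cloning mints a *fresh* id, so backprop treats the copy separately.
    fn clone(&self) -> Self {
        UniqueId::new()
    }
}

fn main() {
    let a = UniqueId::new();
    assert_ne!(a.clone(), a);     // clone gets a different id
    assert_eq!(a.duplicate(), a); // duplicate keeps the id
}
```

Overriding `Clone` this way is deliberately surprising, which is why the issue frames it as a safety/clarity trade-off.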

Add `nn::DropoutOneIn<N>`

Ideally we'd have `p` be a const parameter. Unfortunately, `f32` cannot be used as a const generic parameter on stable Rust.

Many use cases set `p` to `1 / N`, where `N` is just an integer.

`DropoutOneIn<N>` would set `p` to `1.0 / N as f32` for now.
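A sketch of this workaround: encode the probability as the const integer `N` and compute the float at runtime (the type here is a simplified stand-in for the proposed layer):

```rust
// Zero-sized marker type; stable const generics allow usize but not f32,
// so the probability is stored as the integer denominator N.
struct DropoutOneIn<const N: usize>;

impl<const N: usize> DropoutOneIn<N> {
    fn p(&self) -> f32 {
        1.0 / N as f32
    }
}

fn main() {
    // DropoutOneIn::<5> behaves like dropout with p = 0.2.
    assert_eq!(DropoutOneIn::<5>.p(), 0.2);
    assert_eq!(DropoutOneIn::<2>.p(), 0.5);
}
```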

add hard_cross_entropy

Currently this only works for actual probability distributions. Hard cross entropy has only one non-zero entry in the inner dimension, so sum across that dimension before taking the mean.
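For a single row of logits, hard cross entropy reduces to `logsumexp(logits) - logits[target]`. A plain-Rust sketch of that computation (the function name is hypothetical):

```rust
// Hard cross entropy for one row of logits, where the target is a class
// index rather than a full probability distribution.
fn hard_cross_entropy(logits: &[f32], target: usize) -> f32 {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // Numerically stable logsumexp: shift by the max before exponentiating.
    let lse = max + logits.iter().map(|l| (l - max).exp()).sum::<f32>().ln();
    lse - logits[target]
}

fn main() {
    // Equivalent to -log(softmax(logits)[target]).
    let loss = hard_cross_entropy(&[1.0, 2.0, 3.0], 2);
    assert!((loss - 0.40760598).abs() < 1e-5);
}
```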

Randomize parameters based on parameter size

E.g. for xavier uniform initialization you need to know the in size & out size.

This will likely require a different trait than Randomize, and I'm still inclined to keep Randomize. The new trait will also be slightly easier to use, since the user won't have to pass in a distribution.

Options:

  • model.reset_params(&mut rng);
  • model.init_params(&mut rng);
  • model.randomize_params(&mut rng);

This should use Tensor::randomize() under the hood.
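To show why the parameter sizes matter: Xavier/Glorot uniform initialization samples from `U(-limit, limit)` with `limit = sqrt(6 / (fan_in + fan_out))`, so the layer's in/out dimensions must be known. A small sketch of the limit computation:

```rust
// Xavier/Glorot uniform bound: sqrt(6 / (fan_in + fan_out)).
// The layer would then draw each weight from U(-limit, limit).
fn xavier_uniform_limit(fan_in: usize, fan_out: usize) -> f32 {
    (6.0 / (fan_in + fan_out) as f32).sqrt()
}

fn main() {
    // e.g. for a Linear<512, 256> layer:
    let limit = xavier_uniform_limit(512, 256);
    assert!((limit - (6.0f32 / 768.0).sqrt()).abs() < 1e-7);
}
```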

Add `max_last_dim()`

This would reduce the last dimension to its maximum value. It can use T::Device::reduce_last_dim(..., &mut f32::max) (see logsumexp for an example using that).

Example:

let t: Tensor2D<2, 3> = Tensor2D::new([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]]);
let r: Tensor1D<2> = max_last_dim(t);
assert_eq!(r.data(), &[3.0, -1.0]);
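The reduction semantics can be sketched in plain Rust (dfdx would run this through the device's reduce machinery; this is just the row-wise logic):

```rust
// Reduce each row of a [R][C] array to its maximum element.
fn max_last_dim<const R: usize, const C: usize>(t: [[f32; C]; R]) -> [f32; R] {
    t.map(|row| row.iter().cloned().fold(f32::NEG_INFINITY, f32::max))
}

fn main() {
    let t = [[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]];
    assert_eq!(max_last_dim(t), [3.0, -1.0]);
}
```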

Add `concatenate` function

Needed for #34. In multi-head attention, you concatenate the outputs of all the single attention heads.

Note that this may require nightly access similar to #1 because we can't do expressions with const generics yet.
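The runtime semantics are simple; the const-generic difficulty is purely in the output type. A slice-based sketch of the operation itself:

```rust
// Concatenate two tensors along the first axis. With stable const generics
// the output length `A + B` cannot appear in the type, which is why an
// array-typed version needs nightly; a Vec sidesteps that here.
fn concatenate(a: &[f32], b: &[f32]) -> Vec<f32> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    out.extend_from_slice(a);
    out.extend_from_slice(b);
    out
}

fn main() {
    assert_eq!(concatenate(&[1.0, 2.0], &[3.0]), vec![1.0, 2.0, 3.0]);
}
```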

Add multiple dtype support

This will be another generic parameter of all tensors. Most existing operations will likely require float generic.

Related to #9 since it involves an additional generic parameter

Add Batch sampler utility class

Something that takes a usize length of the dataset, and you can:

  1. Sample batches of a const known size
  2. Iterate shuffled batches of const known size

Each of these would return a `[usize; M]`, where `M` is a `const M: usize` parameter.
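A sketch of the shuffled-batch half: shuffle the indices `0..len` and hand out const-sized batches. The tiny LCG keeps this example std-only; a real implementation would take an `rng` like the rest of the crate, and the function name is hypothetical:

```rust
// Shuffle 0..len with Fisher-Yates (LCG stand-in for a real RNG), then
// chunk into const-sized batches; a trailing partial batch is dropped.
fn shuffled_batches<const M: usize>(len: usize, seed: u64) -> Vec<[usize; M]> {
    let mut inds: Vec<usize> = (0..len).collect();
    let mut state = seed;
    for i in (1..inds.len()).rev() {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let j = (state >> 33) as usize % (i + 1);
        inds.swap(i, j);
    }
    inds.chunks_exact(M)
        .map(|c| <[usize; M]>::try_from(c).unwrap())
        .collect()
}

fn main() {
    let batches = shuffled_batches::<4>(10, 42);
    assert_eq!(batches.len(), 2); // 10 / 4 = 2 full batches
    assert!(batches.iter().flatten().all(|&i| i < 10));
}
```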

Save/load from numpy file

This will need:

  • Write single tensor to .npy file
  • Create a zip with multiple files from a struct
  • Ability to read a single np array from file into a tensor
  • Ability to read a collection of np arrays into an arbitrarily nested struct of tensors
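The first bullet is mostly header bookkeeping: a `.npy` (format version 1.0) file is the magic `\x93NUMPY`, a version, a little-endian u16 header length, a Python-dict header padded so the data starts at a 64-byte boundary, then raw little-endian values. A minimal sketch for one 1-D f32 tensor, written to an in-memory buffer (real code would write to a file and handle shapes generically):

```rust
// Serialize a 1-D f32 slice as an npy v1.0 byte buffer.
fn to_npy(data: &[f32]) -> Vec<u8> {
    let mut header = format!(
        "{{'descr': '<f4', 'fortran_order': False, 'shape': ({},), }}",
        data.len()
    );
    // Pad so magic (6) + version (2) + header-len (2) + header is a
    // multiple of 64, with the header terminated by a newline.
    while (10 + header.len() + 1) % 64 != 0 {
        header.push(' ');
    }
    header.push('\n');

    let mut out = Vec::new();
    out.extend_from_slice(b"\x93NUMPY\x01\x00");
    out.extend_from_slice(&(header.len() as u16).to_le_bytes());
    out.extend_from_slice(header.as_bytes());
    for x in data {
        out.extend_from_slice(&x.to_le_bytes());
    }
    out
}

fn main() {
    let buf = to_npy(&[1.0, 2.0, 3.0]);
    assert_eq!(&buf[..6], b"\x93NUMPY");
    assert_eq!(buf.len(), 128 + 12); // 128-byte preamble + 3 * 4 data bytes
}
```

The zip-of-npy-files layout in the second bullet is exactly numpy's `.npz` format, so `np.load` on the Python side would read the result directly.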

Select subset operation

something like

fn select<const S: usize>(self, inds: &[usize; S]) -> Self<S, ...>;

I imagine the gradients for this would just be 1 if i is in inds, otherwise 0
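The forward/backward semantics can be sketched on a 1-D tensor in plain Rust: forward picks rows, backward scatters the output gradient back to the selected positions and leaves everything else zero (function names hypothetical):

```rust
// Forward: pick S elements of a length-N tensor by index.
fn select<const S: usize, const N: usize>(t: [f32; N], inds: [usize; S]) -> [f32; S] {
    inds.map(|i| t[i])
}

// Backward: scatter output grads into an input-shaped gradient.
// Unselected positions get 0; repeated indices accumulate.
fn select_backward<const S: usize, const N: usize>(
    grad_out: [f32; S],
    inds: [usize; S],
) -> [f32; N] {
    let mut grad_in = [0.0; N];
    for (g, i) in grad_out.iter().zip(inds) {
        grad_in[i] += *g;
    }
    grad_in
}

fn main() {
    let t = [10.0, 20.0, 30.0, 40.0];
    assert_eq!(select(t, [3, 1]), [40.0, 20.0]);
    assert_eq!(select_backward::<2, 4>([1.0, 1.0], [3, 1]), [0.0, 1.0, 0.0, 1.0]);
}
```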

Add something nn layer for multi head

This would be a variable-sized head where the input to the module is duplicated, and the same input is passed to all sub-modules.

Unclear how this would work since we are already using tuples. Perhaps something like:

impl Module<I> for MultiHead<(A, B)> {}
impl Module<I> for MultiHead<(A, B, C)> {}
impl Module<I> for MultiHead<(A, B, C, D)> {}
...

?
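The tuple-impl idea can be sketched with a toy `Module` trait: the input is cloned into every head and the outputs come back as a tuple. All types here are simplified stand-ins for dfdx's traits, and a real implementation would generate the larger tuple impls with a macro:

```rust
trait Module<I> {
    type Output;
    fn forward(&self, input: I) -> Self::Output;
}

struct MultiHead<T>(T);

// Two-head case; the (A, B, C), (A, B, C, D), ... impls would be macro-generated.
impl<I: Clone, A: Module<I>, B: Module<I>> Module<I> for MultiHead<(A, B)> {
    type Output = (A::Output, B::Output);
    fn forward(&self, input: I) -> Self::Output {
        let (a, b) = &self.0;
        (a.forward(input.clone()), b.forward(input))
    }
}

// A trivial head for demonstration: scales its input.
struct Scale(f32);
impl Module<f32> for Scale {
    type Output = f32;
    fn forward(&self, input: f32) -> f32 {
        self.0 * input
    }
}

fn main() {
    let heads = MultiHead((Scale(2.0), Scale(3.0)));
    assert_eq!(heads.forward(1.5), (3.0, 4.5));
}
```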

Add `gather_last_dim()`

This would accept an array of T::Reduced::ArrayType whose Dtype is usize, and select the items from the last dimension that match up. It would return a Tensor::Reduced.

Example:

let t: Tensor2D<2, 3> = Tensor2D::new([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]]);
let r: Tensor1D<2> = gather_last_dim(t, [0, 1]);
assert_eq!(r.data(), &[1.0, -2.0]);
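The gather semantics in plain Rust: for each row, pick the element at that row's index, reducing `[R][C]` to `[R]`:

```rust
// For each of the R rows, select the element at inds[r].
fn gather_last_dim<const R: usize, const C: usize>(
    t: [[f32; C]; R],
    inds: [usize; R],
) -> [f32; R] {
    let mut out = [0.0; R];
    for r in 0..R {
        out[r] = t[r][inds[r]];
    }
    out
}

fn main() {
    let t = [[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]];
    assert_eq!(gather_last_dim(t, [0, 1]), [1.0, -2.0]);
}
```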

Use OpenBLAS/BLAS/IntelMKL for matrix multiplication

We currently use the matrixmultiply crate, but I think performance could be much improved by using an actual BLAS library. It's unclear how compiling/linking works, since BLAS has to be compiled per machine.

GPU Mega Issue

There's a lot of work to be done here. Very rough list of todos:

  • Preparation

    • Move map functions to devices #199
    • Move conv to devices #198
    • Add where clauses for map functions to make partial progress on kernels possible (so we can start using cuda without all ops implemented)
  • Devices

    • Add Cuda device that wraps cudarc::CudaDevice and an rng
    • Add StdRng to Cpu
    • Add rng seed to device construction
    • Add two GATs to device trait: DeviceArc and DeviceRng
      • Add CpuRc which contains Arc<T> and Arc<Cpu>
  • Tensors

    • Add Device to all tensor structs
    • TensorCreator should accept &Device as parameter, and remove Rng since that will be accessed through device
    • Move Device to generic argument of Tensors
    • Enable moving tensors between devices
  • nn

    • Add trait ModuleCreator
      • Add ModuleCreator::zeros(Device)
      • Add ModuleCreator::default(Device) which calls zeros & reset params
    • Remove implementations for Default
    • Remove rng parameter from ResetParams, should use tensor's devices
  • Kernels

    • Add trait LaunchKernel<K, Args>
    • Move all Cpu traits to a combo of impl LaunchKernel<...> for Cpu and trait <Kernel>CpuImpl/impl <Kernel>CpuImpl for <Kernel>. See cudarc/examples/kernels.rs
    • (In a separate crate) a proc macro that wraps kernels and maps them to something usable for PTX compilation (e.g. `kernel!(|a, b, c| { *a = b + c })`) (#185)
    • Look into when/how to build the kernels (compile time hopefully??) (#184)
  • Testing

    • Add feature-based device construction in all tests (something like `#[cfg(feature = "test-cuda")]`) that, when specified, uses CUDA instead of the CPU
    • Add a `build_test_device!()` macro that uses the testing features to create the device
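The `LaunchKernel<K, Args>` shape from the Kernels section can be sketched with a zero-sized tag type per kernel, where each device implements launching that tag. All names here are illustrative, not dfdx's final API:

```rust
// Kernels are identified by a tag type K; each device provides a launcher.
trait LaunchKernel<K, Args> {
    fn launch(&self, args: Args);
}

struct Cpu;
struct AxpyKernel; // tag type identifying the kernel

// On the CPU the "kernel" is just a loop: y += a * x.
// A Cuda device would implement the same trait by launching compiled PTX.
impl<'a, 'b> LaunchKernel<AxpyKernel, (f32, &'a mut [f32], &'b [f32])> for Cpu {
    fn launch(&self, (a, y, x): (f32, &'a mut [f32], &'b [f32])) {
        for (yi, xi) in y.iter_mut().zip(x) {
            *yi += a * xi;
        }
    }
}

fn main() {
    let mut y = [1.0f32, 1.0];
    let x = [3.0f32, 4.0];
    // The kernel tag in the path selects which kernel to launch.
    <Cpu as LaunchKernel<AxpyKernel, _>>::launch(&Cpu, (2.0, &mut y[..], &x[..]));
    assert_eq!(y, [7.0, 9.0]);
}
```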

Done:

  • Is it even possible to compile a rust closure to a cuda kernel? Assuming very small set of supported operations. Is this worth the maintainability?
    • If we go the fixed set of functions route, how many different generic closures does dfdx use currently?
    • ANSWER: Yes, it is possible (the rust-cuda project does it), but it will take some work. Automatic closure-to-kernel conversion is probably the direction I'll try, since hand-building all the CUDA kernels next to the CPU closures seems like too much work.
  • What functionality does nvidia provide for deep learning already? Assuming matmul & conv forward/backward. How to use these?
    • ANSWER: cuDNN; all tensors are 4d and it supports a base set of operations. Probably not what we want to depend on, since it doesn't support everything we would need on GPU (e.g. optimizer kernels)

Roadmap

  • 0.9.0 - nightly conv nets & transformers
  • Comparison against pytorch (patch version bump)
  • Misc other generic const exprs functions (patch version bump)
  • Released v0.5.1 - Mnist example with linear MLP
  • Released v0.5.2 - RL examples & save/load
  • Released v0.6.0 - transformers prep & other additions

Transformers mega issue

Would like to add a small example of using a transformer architecture. This will likely involve new features such as batched matrix multiplication and maybe some others.
