What's the best way to get involved? about leaf HOT 9 OPEN

autumnai avatar autumnai commented on August 27, 2024
What's the best way to get involved?

from leaf.

Comments (9)

hobofan avatar hobofan commented on August 27, 2024

Sorry for the long wait, I totally forgot to reply here 😓 .

Generally, for any project, I would say the best way to get involved depends heavily on your personal reasons for getting involved and on your strengths (or the weaknesses you want to work on).

A few things where we'd really appreciate new contributors getting involved:

  • Building some examples that are more sophisticated than MNIST. There is a branch of leaf-examples that works against the current master branch (https://github.com/autumnai/leaf-examples/tree/feat/leaf-0.2.0). Trying out a wider range of models should highlight what is missing and what would be interesting to have built into leaf (though we already have a pile of things to tackle).
  • Feature completeness in terms of available layers or similar. #18, #19, #20, #12. Implementing some of the layers might require adding functionality to collenchyma-nn or collenchyma-blas. We are happy to mentor any work going towards this.
  • Improving documentation of layers. Currently there are a lot of resources scattered throughout the internet and research papers on what computations a layer is doing and how to use it. What we would like to get to in the near term is documentation for each layer that explains: what is this layer computing (both mathematically and in a more accessible/applied way), and when would I use it (and in combination with which other layers)? I really like the explanations at https://cs231n.github.io/convolutional-networks/

johnnyman727 avatar johnnyman727 commented on August 27, 2024

Thanks @hobofan. I'm a novice at this point so I'll start with your first point. Once I have a solid understanding of how the existing examples work, I'll try to build some that are more interesting. If that goes well, I'll look into documentation.

As a meta-note, I think it would be really useful if you had a GitHub label like "beginner-friendly" for your issues so I could easily check which unassigned issues are ripe for the taking.

DavidYKay avatar DavidYKay commented on August 27, 2024

@johnnyman727 thanks for starting this discussion! I'm in a very similar situation to you and curious to start helping out and digging in.

My background: I'm primarily a mobile app developer, focused on the medical industry. Outside of native apps, I have experience in Clojure and Rust. I'm also a contributor to the React Native documentation. My interest in Leaf is that I feel like I've been spending my career on throwaway projects that aren't maximizing the use of Moore's Law. I think that convolutional neural nets / deep learning are a much better way of giving back. And why use C++ when you can use Rust? :)

Of the assignments, I'm excited to start implementing new layers, but I'm not sure if I'm ready for that. Thus, I think it'd be most useful for me to work on the documentation, at least initially, as this will help my understanding of what each layer is doing, and then progress to working on layers once my understanding is greater.

@hobofan, does this sound like a reasonable approach? Let me know if you have any time to chat about this. I'd love your input. Would love to see if you think I'm ready to take on a simple layer project or if I should cut my teeth on docs.

Thanks!

hobofan avatar hobofan commented on August 27, 2024

@DavidYKay Yes, that sounds great!

I think any of the Layer issues should be approachable by a newcomer to the project (most of them have a similar Layer to draw inspiration from). Feel free to hit me up on our Gitter anytime. :)

byronyi avatar byronyi commented on August 27, 2024

Hi @hobofan, I am interested in implementing a distributed runtime for Leaf. Does the development team currently have any plans for multi-node implementations?

hobofan avatar hobofan commented on August 27, 2024

@byronyi Nothing concrete yet, but we would like to handle it one abstraction layer higher than Leaf, with only minimal changes required in Leaf itself. What did you have in mind for multi-node? Parameter server + workers?

Since any kind of distribution will require sending serialized parts of the networks over the wire, it would probably depend on the serialization being implemented, which I am currently working on (#14/#15). See my next comment.

byronyi avatar byronyi commented on August 27, 2024

Yes, a parameter server should be a good candidate. I have just finished working on a project in which we implemented a parameter server on Hadoop using Java, so I know a little about the architecture.

I am not sure, though, because the problems we solved were regular machine learning algorithms (GLM, matrix factorization, or LDA) rather than deep learning models such as ConvNets, which should require more frequent global synchronization. A classic message-passing pattern like MPI with All Reduce might still be a reasonable choice.

What do you think?
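
As a rough illustration of the all-reduce pattern mentioned above, here is a minimal single-process sketch in Rust. It is not Leaf code and does not use MPI: the function name `all_reduce_mean` and the plain `Vec<f32>` gradients are assumptions made for the example, and a real implementation would exchange the data over the network instead of looping over in-memory vectors.

```rust
/// Simulated all-reduce (hypothetical helper, not part of Leaf or MPI).
/// Each entry of `workers` is one worker's local gradient vector.
/// After the call, every worker holds the element-wise mean of all gradients.
fn all_reduce_mean(workers: &mut [Vec<f32>]) {
    let n = workers.len();
    if n == 0 {
        return;
    }
    let len = workers[0].len();
    assert!(
        workers.iter().all(|w| w.len() == len),
        "all gradient vectors must have the same length"
    );

    // Reduce step: sum the gradients element-wise.
    let mut sum = vec![0.0f32; len];
    for w in workers.iter() {
        for (s, g) in sum.iter_mut().zip(w) {
            *s += *g;
        }
    }

    // Broadcast step: every worker receives the same averaged gradient.
    for w in workers.iter_mut() {
        for (g, s) in w.iter_mut().zip(&sum) {
            *g = *s / n as f32;
        }
    }
}

fn main() {
    let mut workers = vec![vec![1.0f32, 2.0], vec![3.0, 4.0]];
    all_reduce_mean(&mut workers);
    println!("{:?}", workers);
}
```

Unlike a parameter server, there is no central node here: every worker takes part in the reduction and ends up with the same result, which is why the pattern implies a global synchronization point per step.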

hobofan avatar hobofan commented on August 27, 2024

The serialization will probably end up being based on capnproto, so capnproto-rpc might be a natural choice. I personally have only a little (unpleasant) experience with MPI, so I might be biased against it, but from what I gather it is mainly used in scientific fields, and I am not sure it fits in that well with Leaf.

One of the main problems with DNN parameter servers is that the weight updates are usually quite large, so synchronization can become quite slow even with only a few nodes. There are a few ways to reduce the load, like introducing a threshold that a weight update must exceed before it is synchronized, and transferring weight updates as f16 instead of f32. With that in mind, I take back my previous statement regarding serialization, since the data you want to send for weight updates is likely very different from the data you want to serialize.
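
To make the thresholding idea concrete, here is a minimal sketch in Rust. It is an illustration only, not Leaf's implementation: the names `sparsify`, `apply` and `SparseUpdate` are hypothetical, and carrying the skipped deltas forward in a `residual` buffer is an assumption added for the example (the comment above only mentions the threshold itself). The f16 conversion is left out because stable Rust has no built-in `f16` type.

```rust
/// Hypothetical wire format: (index, delta) pairs for entries worth sending.
type SparseUpdate = Vec<(usize, f32)>;

/// Keep only deltas whose magnitude reaches `threshold`.
/// Assumption for this sketch: smaller deltas are accumulated in `residual`
/// so they are sent later once they add up, instead of being dropped.
fn sparsify(delta: &[f32], residual: &mut [f32], threshold: f32) -> SparseUpdate {
    assert_eq!(delta.len(), residual.len());
    let mut out = Vec::new();
    for (i, (d, r)) in delta.iter().zip(residual.iter_mut()).enumerate() {
        let total = *d + *r;
        if total.abs() >= threshold {
            out.push((i, total));
            *r = 0.0;
        } else {
            *r = total;
        }
    }
    out
}

/// Apply a sparse update to the receiver's copy of the weights.
fn apply(weights: &mut [f32], update: &SparseUpdate) {
    for &(i, d) in update {
        weights[i] += d;
    }
}

fn main() {
    let mut residual = vec![0.0f32; 4];
    let mut weights = vec![1.0f32; 4];
    let update = sparsify(&[0.5, 0.01, -0.3, 0.02], &mut residual, 0.1);
    apply(&mut weights, &update);
    println!("sent {} of 4 entries: {:?}", update.len(), update);
    println!("weights after update: {:?}", weights);
}
```

With a threshold of 0.1, only two of the four deltas are transmitted in this example; the other two stay in `residual` until they grow large enough.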

byronyi avatar byronyi commented on August 27, 2024

I don't really have a strong opinion on the serialization part; I have experience with Protocol Buffers (with home-brewed RPC, before gRPC was released), and I think cap'nproto(-rpc) would work just fine.

I do agree that special care is needed when sending weight updates, and it would be better if we could make it flexible enough so people can experiment with different compression/filtering techniques, since such techniques might slow down or even screw up model convergence.

Regarding your thoughts on MPI, I am a little curious about the goal of Leaf. Maybe I am wrong, but I think the other projects on your benchmark page (Caffe, Torch, TensorFlow) still have most of their users working in research areas related to deep learning. Yahoo announced a hybrid project where they initialize Caffe inside a Spark executor and synchronize the model using MPI-style communication with RDMA. What specific aspect of this communication style do you think might not fit well with Leaf?

It might make a big difference if Leaf is not designed to share some of the fundamental characteristics of MPI, e.g. its lack of fault tolerance.
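
One way such flexibility could look is a small trait that each compression or filtering strategy implements. The sketch below is purely hypothetical: `UpdateFilter`, `SendAll`, `TopK` and `entries_sent` are names invented for this example and do not exist in Leaf or collenchyma.

```rust
/// Hypothetical extension point: decides which entries of a dense
/// weight update are transmitted, as (index, delta) pairs.
trait UpdateFilter {
    fn filter(&self, delta: &[f32]) -> Vec<(usize, f32)>;
}

/// Baseline strategy: transmit every entry unchanged.
struct SendAll;

impl UpdateFilter for SendAll {
    fn filter(&self, delta: &[f32]) -> Vec<(usize, f32)> {
        delta.iter().copied().enumerate().collect()
    }
}

/// Transmit only the `k` entries with the largest magnitude.
struct TopK {
    k: usize,
}

impl UpdateFilter for TopK {
    fn filter(&self, delta: &[f32]) -> Vec<(usize, f32)> {
        let mut entries: Vec<(usize, f32)> = delta.iter().copied().enumerate().collect();
        // Sort by descending magnitude, keep the first k, then restore index order.
        entries.sort_by(|a, b| b.1.abs().total_cmp(&a.1.abs()));
        entries.truncate(self.k);
        entries.sort_by_key(|e| e.0);
        entries
    }
}

/// The sending side depends only on the trait, so strategies can be swapped.
fn entries_sent(filter: &dyn UpdateFilter, delta: &[f32]) -> usize {
    filter.filter(delta).len()
}

fn main() {
    let delta = [0.5f32, -0.01, -0.9, 0.02];
    println!("SendAll sends {} entries", entries_sent(&SendAll, &delta));
    println!("TopK(2) sends {:?}", TopK { k: 2 }.filter(&delta));
}
```

Because the sender only sees the trait, comparing the effect of different strategies on convergence would come down to swapping the filter passed in.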
