
Adding support for hvd LaidOutVariable (mesh, open issue)

EiffL commented on September 24, 2024
Adding support for hvd LaidOutVariable

from mesh.

Comments (7)

EiffL commented on September 24, 2024

ok found a couple of problems, the update seems to work now, will create a proper branch ^^


tobias-liaudat commented on September 24, 2024

Hi @EiffL, I'm leaving this comment as a checkpoint of where we were when the IDRIS hackathon ended.

@b-remy and I were trying to solve this issue but did not have time to finish. I uploaded my progress to the branch tob_vars, and Benjamin uploaded his to the branch ben-variable.

With our current implementation we were able to run a forward pass of a simple dense network. However, when we tried to optimise the network we got an error; we are not sure of its origin or how to solve it.

The error is the one below, and it is always related to the Assign op and the optimisation step.

Traceback (most recent call last):
  File "optim_demo.py", line 174, in <module>
    (type(obj).__name__, types_str))
TypeError: Can not convert a Assign into a Tensor or Operation.
TypeError: Fetch argument <mesh_tensorflow.ops.Assign object at 0x14d1fcb94c50> has invalid type <class 'mesh_tensorflow.ops.Assign'>, must be a string or Tensor. (Can not convert a Assign into a Tensor or Operation.)

To reproduce the error one can run this test script with this job script. The implementation of the LaidOutVariable is here.


EiffL commented on September 24, 2024

Thanks so much @tobias-liaudat, will take a look!


EiffL commented on September 24, 2024

oookkkkk so I tried something here:
https://github.com/DifferentiableUniverseInitiative/mesh/tree/u/EiffL/toy_model

with this script https://github.com/DifferentiableUniverseInitiative/mesh/blob/u/EiffL/toy_model/examples/toy_model_gpu.sh

It apparently runs, and can save and restore, but it's not clear if it's actually training ^^" the loss function doesn't go down much


EiffL commented on September 24, 2024

The goal here would be to be able to train the MNIST demo model with the new backend implementation: https://github.com/DifferentiableUniverseInitiative/mesh/blob/master/examples/mnist.py


EiffL commented on September 24, 2024

I've opened branch variables starting from tob_vars, and cleaned it up a bit.
Problems were:

  • lowering needs to happen after computing the mesh gradients and creating the update ops, otherwise the update ops are not registered in the lowering
  • a small problem in the assignment of slices
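
To illustrate the first point, here is a toy sketch in plain Python of why the ordering matters. It is not the Mesh TensorFlow API: the `Graph` and `Lowering` classes, `build`, and the op names are all hypothetical stand-ins. It only assumes that a lowering translates the ops that exist in the graph at the moment it is constructed. In the real code this presumably corresponds to constructing the lowering only after the gradients and the optimizer update ops have been created, and then fetching the lowered ops rather than the mesh-level Assign objects.

```python
# Toy stand-in for a graph-lowering workflow; this is NOT the Mesh TensorFlow API.
# All class, function, and op names here are hypothetical.

class Graph:
    """Collects symbolic operations in the order they are created."""

    def __init__(self):
        self.operations = []

    def add(self, name):
        self.operations.append(name)
        return name


class Lowering:
    """Translates every op present in the graph at construction time."""

    def __init__(self, graph):
        # Snapshot: ops added to the graph later are never lowered.
        self._lowered = {op: "lowered:" + op for op in graph.operations}

    def lowered_operation(self, op):
        if op not in self._lowered:
            raise KeyError(op + " was created after lowering")
        return self._lowered[op]


def build(lower_first):
    """Build the toy graph, lowering either before or after the update op."""
    graph = Graph()
    graph.add("loss")
    lowering = Lowering(graph) if lower_first else None
    graph.add("gradients")
    update_op = graph.add("assign_update")
    if lowering is None:
        lowering = Lowering(graph)
    return lowering, update_op


# Wrong order: the lowering is built before the update op exists.
lowering, update_op = build(lower_first=True)
try:
    lowering.lowered_operation(update_op)
    wrong_order_ok = True
except KeyError:
    wrong_order_ok = False

# Right order: gradients and update ops first, lowering last.
lowering, update_op = build(lower_first=False)
right_order_result = lowering.lowered_operation(update_op)
```

In this toy, the wrong order leaves the update op without a lowered counterpart, while the right order makes it available to be run.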


EiffL commented on September 24, 2024

probably, we can first try training the MNIST model with the restore and save parts commented out
