probml-notebooks's People

Contributors

always-newbie161, arpitvaghela, benlau6, codeboy5, gerdm, gileshd, karalleyna, karm-patel, mjsml, murphyk, nalzok, neoanarika, nsanghi, patel-zeel, petergchang, posgnu, susnato, tendulkarlabs, umarj, vz415, yuanx749

probml-notebooks's Issues

Dead Colab links

For example, the Colab link in the front matter of notebooks-d2l/attention_jax.ipynb points to the following URL, which returns a "Notebook not found" error:

https://colab.research.google.com/github/probml/pyprobml/blob/master/notebooks/attention_jax.ipynb

The correct URL is:

https://colab.research.google.com/github/probml/probml-notebooks/blob/main/notebooks-d2l/attention_jax.ipynb

Before I update the URLs, we need to make sure the notebooks are in the correct directory hierarchy, since the URL depends on the notebook's path relative to the repository root. Most notably, the notebooks and notebooks-text-format directories contain hundreds of notebooks/scripts, and it's preferable to have them in several subdirectories for better organization.
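To illustrate how the URL follows from the notebook's location, here is a minimal sketch that builds a Colab link from a path relative to the repository root. The helper name `colab_url` is hypothetical; the repository slug and branch are taken from the corrected URL above.

```python
from pathlib import PurePosixPath

COLAB_PREFIX = "https://colab.research.google.com/github"


def colab_url(notebook_path, repo="probml/probml-notebooks", branch="main"):
    """Build a Colab URL for a notebook path relative to the repository root.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    path = PurePosixPath(notebook_path)
    if path.is_absolute():
        raise ValueError("notebook_path must be relative to the repository root")
    return f"{COLAB_PREFIX}/{repo}/blob/{branch}/{path.as_posix()}"


print(colab_url("notebooks-d2l/attention_jax.ipynb"))
```

Because the path is embedded verbatim in the link, moving a notebook to a new subdirectory invalidates every link generated before the move, which is why the directory layout should be settled first.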

Replace `jax.tree_multimap` with `jax.tree_map`

According to google/jax#10126, the function jax.tree_util.tree_multimap has been deprecated in favour of jax.tree_util.tree_map. Users currently see a deprecation warning:

xxx/.venv/lib/python3.10/site-packages/jax/_src/tree_util.py:189: FutureWarning: jax.tree_util.tree_multimap() is deprecated. Please use jax.tree_util.tree_map() instead as a drop-in replacement.
  warnings.warn('jax.tree_util.tree_multimap() is deprecated. Please use jax.tree_util.tree_map() '
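As the warning says, the replacement is drop-in, because `jax.tree_util.tree_map` accepts several trees with the same structure. A minimal sketch of the change follows; the parameter and gradient values are made up for illustration and do not come from any notebook.

```python
import jax
import jax.numpy as jnp

# Illustrative pytrees with matching structure (values are assumptions).
params = {"w": jnp.ones((2, 2)), "b": jnp.zeros(2)}
grads = {"w": jnp.full((2, 2), 0.5), "b": jnp.ones(2)}
learning_rate = 0.1

# Before (deprecated):
#   jax.tree_util.tree_multimap(lambda p, g: p - learning_rate * g, params, grads)
# After: the same call with tree_map, which also takes multiple trees.
new_params = jax.tree_util.tree_map(
    lambda p, g: p - learning_rate * g, params, grads
)
```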

Scripts and notebooks to be affected are

[GSoC 2022 Proposal] Reworking the Rough Edges

Whereas applicants are typically excited about contributing more code through GSoC, my proposal is about addressing some of the existing technical debt. I am posting it here to get some feedback from the community.

Background

The probml/probml-notebooks repository has witnessed a lot of contributions, presumably because one of the prerequisites of applying to the JAX ML Textbook project is to commit at least one pull request. However, crowd-sourcing can result in notebooks of varying character and inconsistent style, and I argue that there is still a lot of work to be done to ensure the quality of these notebooks.

The Rough Edges

Unclear objective

I am not sure how the notebooks are supposed to be used. Typically, notebooks are self-contained pieces with sufficient narration to walk the reader through the code, but the D2L notebooks contain barely any commentary. If we don't want to follow the path of literate programming, then why not use plain .py files, which are much easier to import from? In addition, I suppose they are meant as companion notebooks for a textbook, but it is not clear to me which part of the narration should go into the book and which should go into the notebooks.

JAX with a PyTorch accent

I believe most recent contributions come from probml/pyprobml#686, which asks us to translate a lot of PyTorch notebooks to JAX. The problem is that JAX calls for a fundamentally different way of thinking (JAX: stateless, functional, JIT; PyTorch: stateful, object-oriented, backend written in C++), so our word-by-word translation is essentially "writing PyTorch in JAX". One can easily tell the notebooks are far from idiomatic by comparing them against the official examples. This is probably not a big issue for sophisticated programmers, but we should teach beginners how to write pure JAX, as opposed to JAX with a PyTorch accent.
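To make the contrast concrete, here is a minimal sketch of the stateless, functional style, assuming a toy linear-regression problem. The data, names, and hyperparameters are illustrative and not taken from any notebook. Parameters live in a plain pytree, and the update step is a pure, JIT-compiled function that returns new parameters instead of mutating a model object.

```python
import jax
import jax.numpy as jnp

LEARNING_RATE = 0.1  # illustrative value


def predict(params, x):
    # Pure function: the output depends only on the arguments.
    return x @ params["w"] + params["b"]


def loss_fn(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)


@jax.jit
def train_step(params, x, y):
    # Gradients are computed by transforming loss_fn, not by calling .backward().
    grads = jax.grad(loss_fn)(params, x, y)
    # Return a new pytree; nothing is updated in place.
    return jax.tree_util.tree_map(lambda p, g: p - LEARNING_RATE * g, params, grads)


# Synthetic data for the sketch.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 3))
y = x @ jnp.array([1.0, -2.0, 0.5]) + 0.3

params = {"w": jnp.zeros(3), "b": jnp.zeros(())}
initial_loss = loss_fn(params, x, y)
for _ in range(200):
    params = train_step(params, x, y)
final_loss = loss_fn(params, x, y)
```

A line-by-line translation from PyTorch would typically wrap the parameters in a class and update them inside the loop; here the state is threaded explicitly through `train_step`.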

Inconsistency

Inconsistency can manifest itself in many ways. First, the notebooks originally written for "Dive Into Deep Learning" may not suit the needs of our textbook well, and using them without change introduces consistency issues between the textbook and the notebooks. Second, contributors have their own preferred styles, which often differ from one another, and readers might be distracted by different approaches to the same problem. For instance, all notebooks should ideally use a similar training loop to lower the cognitive load.

Moreover, almost all notebooks contain identical data loading, preprocessing, and plotting code, and the JAX training loops in the notebooks also share more duplicate code than necessary. These components could divert readers from the key points unless we extract them into separate scripts and import from them. Inconsistency across notebooks may arise if we are not careful in the process.

Minor issues

A number of notebooks contain leftovers from their PyTorch counterparts. Most notably, almost all notebooks contain unnecessary import statements. As another example, the Reshape class in notebooks-d2l/lenet_jax.ipynb becomes redundant when the code is translated to JAX. Most cross-reference URLs in the notebooks are also broken and require attention. Finally, this repository is not particularly well organized, with all notebooks living in a few flat namespaces. Switching to a finer directory structure would make things tidier and easier to navigate.

Proposal

I believe the best way to solve these problems is to rewrite all D2L notebooks from scratch because, as I stated above, there are so many issues that patching the existing notebooks would be no easier. That being said, the existing notebooks can be kept and used as a reference. An example of the notebooks I intend to deliver is the multi_gpu_training_jax.ipynb notebook I contributed a few days ago. Note that it shares barely anything with the original D2L notebook.

Reworking the notebooks is a lot of work, and the contributors are best restricted to a small, highly coordinated group, or preferably an individual, to ensure consistency in style and programming paradigm. I am not sure if @murphyk has sufficient time and energy to embrace the grind, but I am willing to take it up. Along the way, I will consult the mentors for their preferences and create a style guide for future contributors (e.g. on the use of jax.xmap). Additionally, I will keep the needs of our textbook in mind and make adjustments accordingly during the renovation. For example, I could remove some nonessential code to make the notebooks more focused, add type annotations and assertions to make the code more concrete, or write more commentary to make them more self-contained.
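As a rough illustration of what type annotations and assertions could look like, here is a sketch of a loss function with its shape contract spelled out. The function and its signature are assumptions made for this example, not code from an existing notebook.

```python
import jax
import jax.numpy as jnp


def softmax_cross_entropy(logits: jnp.ndarray, labels: jnp.ndarray) -> jnp.ndarray:
    """Mean cross-entropy for integer class labels.

    logits: (batch, num_classes) unnormalized scores
    labels: (batch,) integer class indices
    """
    # Assertions document the expected shapes and fail early on misuse.
    assert logits.ndim == 2, "logits must have shape (batch, num_classes)"
    assert labels.shape == (logits.shape[0],), "labels must have shape (batch,)"

    log_probs = jax.nn.log_softmax(logits, axis=-1)
    # Pick the log-probability of the correct class for each example.
    picked = jnp.take_along_axis(log_probs, labels[:, None], axis=-1)
    return -jnp.mean(picked)
```

The shape checks run on static shape information, so they also work when the function is traced by `jax.jit`.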

Programs must be written for people to read, and only incidentally for machines to execute.
-- Harold Abelson

This is especially true for reference implementations and demos to appear in a textbook. I will do my best to produce correct, clear, and consistent notebooks for pedagogical use.

Questions

I have a few questions as I prepare my proposal:

  1. First of all, are you interested in this project?
  2. Apart from D2L, which part of the repository do you want me to rework first? (I'm a graduate student in Statistics from UChicago, which means I'm pretty comfortable with both coding and math!)
  3. How are you going to use these notebooks? I need to know your goal to define the milestones and deliverables.
