numpy / numpy-tutorials

NumPy tutorials & educational content in notebook format

Home Page: https://numpy.org/numpy-tutorials/

License: Other

Makefile 7.27% Python 84.56% Batchfile 7.06% HTML 1.11%
tutorials numpy

numpy-tutorials's Introduction

NumPy tutorials

For the rendered tutorials, see https://numpy.org/numpy-tutorials/.

The goal of this repository is to provide high-quality resources by the NumPy project, both for self-learning and for teaching classes with. If you're interested in adding your own content, check the Contributing section. This set of tutorials and educational materials is not a part of the NumPy source tree.

To download a local copy of the .ipynb files, you can either clone this repository or navigate to any of the documents listed below and download it individually.

Content

  1. Learn to write a NumPy tutorial: our style guide for writing tutorials.
  2. Tutorial: Linear algebra on n-dimensional arrays
  3. Tutorial: Determining Moore's Law with real data in NumPy
  4. Tutorial: Saving and sharing your NumPy arrays
  5. Tutorial: NumPy deep learning on MNIST from scratch
  6. Tutorial: X-ray image processing
  7. Tutorial: NumPy deep reinforcement learning with Pong from pixels
  8. Tutorial: Masked Arrays
  9. Tutorial: Static Equilibrium
  10. Tutorial: Plotting Fractals
  11. Tutorial: NumPy natural language processing from scratch with a focus on ethics
  12. Tutorial: Analysing the impact of the lockdown on air quality in Delhi, India

Contributing

We very much welcome contributions! If you have an idea or proposal for a new tutorial, please open an issue with an outline.

Don’t worry if English is not your first language, or if you can only come up with a rough draft. Open source is a community effort. Do your best – we’ll help fix issues.

Images and real-life data make text more engaging and powerful, but be sure what you use is appropriately licensed and available. Here again, even a rough idea for artwork can be polished by others.

The NumPy tutorials are a curated collection of MyST-NB notebooks. These notebooks are used to produce static websites and can be opened as notebooks in Jupyter using Jupytext.

Note: You should use CommonMark markdown cells. Jupyter only renders CommonMark.

Why Jupyter Notebooks?

The choice of Jupyter Notebook in this repo, instead of the usual format (reStructuredText, through Sphinx) used in the main NumPy documentation, has several reasons:

  • Jupyter notebooks are a common format for communicating scientific information.
  • Jupyter notebooks can be launched in Binder, so that users can interact with tutorials.
  • rST may present a barrier for some people who might otherwise be very interested in contributing tutorial material.

Note

You may notice our content is in markdown format (.md files). We review and host notebooks in the MyST-NB format. We accept both Jupyter notebooks (.ipynb) and MyST-NB notebooks (.md). If you want to sync your .ipynb to your .md file follow the pairing tutorial.

Adding your own tutorials

If you have your own tutorial in the form of a Jupyter notebook (a .ipynb file) and you'd like to add it to the repository, follow the steps below.

Create an issue

Go to https://github.com/numpy/numpy-tutorials/issues and create a new issue with your proposal. Give as much detail as you can about what kind of content you would like to write (tutorial, how-to) and what you plan to cover. We will try to respond as quickly as possible with comments, if applicable.

Check out our suggested template

You can use our Tutorial Style Guide to make your content consistent with our existing tutorials.

Upload your content

  1. Fork this repository (if you haven't before).
  2. In your own fork, create a new branch for your content.
  3. Add your notebook to the content/ directory.
  4. Update the environment.yml file with the dependencies for your tutorial (only if you add new dependencies).
  5. Update this README.md to include your new entry.
  6. Update the attribution section (below) to credit the original tutorial author, if applicable.
  7. Create a pull request. Make sure the "Allow edits and access to secrets by maintainers" option is selected so we can properly review your submission.
  8. 🎉 Wait for review!

For more information about GitHub and its workflow, you can see this document.

Building the Sphinx site locally

Building the tutorials website (published at https://numpy.org/numpy-tutorials/) locally isn't necessary before making a contribution, but may be helpful:

conda env create -f environment.yml
conda activate numpy-tutorials
cd site
make html

Translations

While we don't have the capacity to translate and maintain translated versions of these tutorials, you are free to use and translate them to other languages.

Useful links and resources

The following links may be useful:

Note that regular documentation issues for NumPy can be found in the main NumPy repository (see the Documentation labels there).

numpy-tutorials's People

Contributors

8bitmp3, bjnath, bsipocz, carreau, cooperrc, dbhasin1, dependabot[bot], dmcmurchy, h2o-ds, isabela-pf, jsdodge, mattip, maxrjones, melissawm, michaelripa, mukulikaa, octopusinvitro, oriolabril, partev, peytondmurray, pritesh-shrivastava, rgommers, rifatrakib, rossbar, rowanc1, saulshanabrook, stefanv, willhoh

numpy-tutorials's Issues

Make sure the `random.Generator` interface is used in all tutorials

The tutorials should use the recommended random.Generator interface for random number generation. A quick grep shows that there are some instances of the older usage in the following files:

  • tutorial-nlp-from-scratch.md
  • text_preprocessing.py

It would be a nice improvement to replace these instances with the recommended random interface.
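
As a rough illustration of the change this issue asks for, the sketch below contrasts the legacy global-state calls with the Generator interface. The seed value and array sizes are arbitrary choices for the example, not taken from the tutorials.

```python
import numpy as np

# Legacy style (global state), shown only for comparison:
#   np.random.seed(42)
#   np.random.rand(3)
#   np.random.randint(0, 10, size=5)

# Recommended style: create a Generator and call its methods.
rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily for this sketch

uniform = rng.random(3)                # floats in [0, 1), replaces np.random.rand
ints = rng.integers(0, 10, size=5)     # upper bound is exclusive, replaces randint
normal = rng.standard_normal((2, 2))   # replaces np.random.randn
shuffled = rng.permutation(5)          # shuffled copy of arange(5)
```

Passing the Generator explicitly to helper functions, rather than relying on module-level state, keeps each tutorial reproducible on its own.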

Set up gitpod

After the final format for the repo is decided and merged.

Consider adding generated content to .gitignore

So maintainers can add/update them with --force, but they won't show up as modified files for random contributors:

	modified:   content/mooreslaw_regression.csv
	modified:   content/mooreslaw_regression.npz
	modified:   content/tutorial-x-ray-image-processing/xray_image.gif

content/video/ also shows up as untracked files, probably should be added fully to .gitignore?

Fail CI for warnings

Ideally we should handle warnings raised during the tutorials, and fail CI with anything that is unexpected.
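
One possible way to sketch this idea in plain Python is with the standard warnings filters: promote every warning to an exception, then re-allow the categories that are expected. The helper name run_strict and the default allowed category are hypothetical choices for illustration; a real CI setup would more likely configure this through its test runner.

```python
import warnings


def run_strict(func, allowed=(DeprecationWarning,)):
    """Run func with unexpected warnings promoted to exceptions.

    `allowed` lists the warning categories that are expected and should
    not fail the run (a hypothetical default for this sketch).
    """
    with warnings.catch_warnings():
        # Every warning becomes an exception...
        warnings.simplefilter("error")
        # ...except the expected categories. Filters added later take
        # precedence, so these override the blanket "error" rule.
        for category in allowed:
            warnings.simplefilter("default", category)
        return func()
```

The filters are restored when the with block exits, so the strict mode does not leak into unrelated code.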

Storing data sets used in the tutorials

Document in README.md what options were considered for storing data sets used in the NumPy tutorials and why GitHub was selected as the optimal (interim) solution.

atari-py ROM configuration

The latest atari-py wheels break the pong tutorial with an exception about needing to install/configure appropriate ROMS. This is a known issue: open-ai/atari-py#79. Proposed workaround is to use an older version of atari-py==0.2.5 as suggested here, but this requires local building of the wheels, which is at the core of #69.

In the long run, it would be great if we could switch to using the newer atari-py wheels and figure out the minimal ROM configuration to get the example working sustainably again.

BUG: Use newer than 3.7 python for binder

When the tutorials are launched in Binder, the environment uses Python 3.7 and therefore NumPy 1.21.

We should make sure that we provide a more up-to-date version when demoing for tutorial users (and that CI actually tests the versions that are deployed with, e.g., Binder).

Repo name

The docs team had a discussion about this and figured that we might want to change the repo name so it can also contain how tos, instead of just tutorials. Suggestions included

  • numpy-cookbook: will this clash with SciPy Cookbook?
  • numpy-recipes: this might be an alternative that avoids confusion with the existing SciPy Cookbook project.

cc @rossbar @bjnath

Reconfigure binder

After the work of #36 is finished, we should re-add a binder button to the README so people can see live versions of the tutorials.

Proposal: Static Equilibrium with NumPy

I have made a NumPy tutorial with Ryan Cooper that teaches static equilibrium with NumPy matrix manipulation. This primarily involves beams and cables. The tutorial is ready to be submitted.

Issue on page /index.html

When I run my script, the following error comes up. Is NumPy 1.22.3 incompatible with the previous version? (The script works with NumPy 1.21.5.)

PS E:\Revo3.0\Py_Revo3.0\py_scripts> & E:/anaconda3/envs/myenv/python.exe e:/Revo3.0/Py_Revo3.0/py_scripts/Spotfire_IFR90_Parse_grpby.py
E:\anaconda3\envs\myenv\Lib\site-packages\numpy\__init__.py:142: UserWarning: mkl-service package failed to import, therefore Intel(R) MKL initialization ensuring its correct out-of-the box operation under condition when Gnu OpenMP had already been loaded by Python process is not assured. Please install mkl-service package, see http://github.com/IntelPython/mkl-service
  from . import distributor_init
Traceback (most recent call last):
  File "e:\Revo3.0\Py_Revo3.0\py_scripts\Spotfire_IFR90_Parse_grpby.py", line 8, in <module>
    import pandas as pd
  File "E:\anaconda3\envs\myenv\Lib\site-packages\pandas\__init__.py", line 16, in <module>
    raise ImportError(
ImportError: Unable to import required dependencies:
numpy:

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  • The Python version is: Python3.9 from "E:\anaconda3\envs\myenv\python.exe"
  • The NumPy version is: "1.22.3"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: DLL load failed while importing _multiarray_umath: The specified module could not be found.

Testing gitpod

This issue is being used as a test for gitpod integration.

Proposal - Plotting Fractals

Hi NumPy Devs,

I have mostly finished writing a tutorial called "Plotting Fractals", which teaches the basics of what fractals are and how to begin creating fractal images. I know that there are many fractal tutorials out there, so I tried making mine slightly different and appropriate for NumPy in the following ways:

  • I tried explaining what fractals are through demonstration, which I think flows nicely with the tutorial style of writing.
  • I used Universal Functions and Boolean Indexing in the tutorial and touched on how they helped speed up the computations.
  • I included a large number of different fractals, including generalizations of the Mandelbrot set, Newton fractals and some "new" ones I made up.
  • I provide guidance on "finding new fractals" through experimenting with different combinations of Universal Functions and by playing with the parameters.

If this is a topic of interest, I would be happy to submit a pull request of my work once I finish (which will likely be over the next couple days).

Thanks for your time

Michael

Use tags for content info?

At the very least, I would suggest using tags such as "Beginner", "Intermediate" or "Advanced". I'm not sure those are descriptive enough, or if we should group content in some other way. Thoughts?

Framing possible contributions

In PR #11, @bjnath, @melissawm, and @cooperrc discussed two topics

  1. Having a template for the tutorials in this repo
  2. Framing the possible contributions into each document style.

Topic 1 is covered by PR #11 so I wanted to bring up topic 2 in a separate issue.

My thoughts on framing the contributions would be to provide a checklist (maybe it could be included in the PR for a tutorial or how-to). I am using a scientific document as a template because I think many users are using NumPy in scientific applications. Scientific documentation also provides a nice structure to present a problem, propose a solution, and show evidence that it works while providing room for future work.

The checklist would also serve to help review and organize the content. Here is a proposed checklist for review (and general feedback):

Introduction

  • Who is the audience?
  • What will you learn?
  • What is the problem?
  • What will you do?

Methods

  • What do you need?
  • What will you use?
  • Are the steps of the solution detailed enough for the user?

Results

Discussion

Wrapping up

  • Are there any missing pieces or future work?
  • How did NumPy help solve the original problem posed?

Conclusion

Add testing configuration for running pytest locally

#132 added notebook smoketesting with pytest-nbval to CI, but didn't add any pytest configuration nor a way to automatically handle installing testing dependencies. Handling this configuration correctly will make it easier for users to smoketest notebooks locally with pytest. #139 is related as it should also be handled via pytest config. One idea would be to use tox.ini - see #132 (comment).

Joint cookbook with other scientific Python projects

In #6 @rossbar writes:

Re: the scipy cookbook - it would be nice if we could get some clarity on the status of that project and whether it's actively maintained and, if not, whether more attention would be welcome.

That was my understanding too, from later in the call when we talked about uniting doc efforts in the scientific Python community. @melissawm had the great idea of using SciPy Cookbook as a starting point.

The first step would indeed be to sound out SciPy. We'd propose updating the content, rewriting it as notebooks, and presenting the Cookbook as a joint NumPy/SciPy resource.

Cookbooks are a natural intersection of projects, so it's likely that as other scientific Python projects learn of it they'll want to opt in, making it a community resource of growing value.

Clearly there are questions of governance, of where on the Internet it should go, and so on.

For instance, do we continue in the original Cookbook model of user-contributed content or does somebody (who?) review PRs? A new project might want to use the Cookbook to show off a use case. This is both great and not so great, because the project might not be stable or long-lived.

I'd volunteer to get things moving. But we have at least one doc meeting's worth of discussion before we go public.

Add CI job that covers notebook content

https://github.com/reviewNB/treon could work as the tool to validate notebooks. It looks like it has extra features, such as running examples as doctests, that could be useful.

The basic "run a notebook" is jupyter nbconvert --to notebook --inplace --execute somenotebook.ipynb.
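
As a hedged sketch of how a CI job might wrap that command, the snippet below builds the nbconvert invocation quoted above for every notebook in a directory and collects the ones that fail. The function names and the injectable run parameter are hypothetical conveniences for this example; only the command-line flags come from the issue.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook):
    """Build the 'run a notebook' command quoted in the issue."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--inplace",
        "--execute",
        str(notebook),
    ]


def execute_notebooks(directory, run=subprocess.run):
    """Execute every .ipynb in `directory`; return names of failing notebooks.

    `run` is injectable so the loop can be exercised without Jupyter installed.
    """
    failures = []
    for notebook in sorted(Path(directory).glob("*.ipynb")):
        result = run(nbconvert_command(notebook))
        if result.returncode != 0:
            failures.append(notebook.name)
    return failures
```

A CI step could then exit with a non-zero status whenever the returned list is non-empty.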

I don't know what CI setups other projects use.

atari-py dependency doesn't seem maintained

The reinforcement learning tutorial depends on atari-py which doesn't seem to be actively maintained (last commit Aug 2019). The README indicates that it's in "maintenance mode" but there haven't been any active developments and the wheels are out of date. This latter necessitates that wheels be built locally, requiring cmake.

I'm not familiar with gym or atari-py so I don't know if there are any newer packages or viable alternatives. Does anyone know of any viable alternatives?

Move style guide into a separate contributors category

Identified by @rgommers in #81 (comment) - it would be an improvement to keep the pedagogical material (i.e. tutorials, how-tos, explanations, etc) separate from info about contributing, e.g. the tutorial style guide.

This would involve a minor reorganization to move the tutorial style guide out of the main toctree and reference it from a "contributing" section instead.

MNIST dataset can't be downloaded automatically

We have been getting an error in the CI for the MNIST tutorial and I just figured out the reason: we get a 403 Forbidden response when we try to download the datasets from the website listed in the tutorial. Checking that website, I got this message:

Please refrain from accessing these files from automated scripts with high frequency. Make copies!

I don't think we want to keep the dataset locally. Are there alternatives for getting this dataset online? @8bitmp3 do you have any thoughts here?

Tutorial Proposal: Implement KNN using numpy.

Proposal

This tutorial will cover a specific use case, i.e., implementing KNN (K Nearest Neighbours) using NumPy.

What will be covered in the Tutorial?

  • KNN Implementation
  • Euclidean distance implementation
  • Tackling a standard classification problem

Problem statement used for the Tutorial

Given four data points across the categories Fruit, Protein and Vegetable, find out which of these categories a fifth, test data point belongs to.

Sample Attached Jupyter Notebook - KNN Implementation
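
For readers unfamiliar with the technique, here is a minimal sketch of KNN with Euclidean distance in NumPy. The feature values (a made-up "sweetness" and "crunchiness" score per food), the query point, and the function name knn_predict are hypothetical and are not taken from the attached notebook.

```python
import numpy as np

# Hypothetical training data: [sweetness, crunchiness] for four foods.
train_X = np.array([
    [10.0, 9.0],   # apple
    [1.0, 4.0],    # bacon
    [10.0, 1.0],   # banana
    [7.0, 10.0],   # carrot
])
train_y = np.array(["Fruit", "Protein", "Fruit", "Vegetable"])


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k nearest training points."""
    # Euclidean distance from the query to every training point (broadcasting).
    dists = np.sqrt(((train_X - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Majority vote among the neighbours' labels.
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]


query = np.array([9.0, 3.0])  # the fifth, unlabeled data point
predicted = knn_predict(train_X, train_y, query, k=3)
```

With this toy data the three nearest neighbours are banana, apple, and carrot, so the vote resolves to "Fruit". Ties between labels are broken here by alphabetical order, which a full tutorial would probably want to handle more carefully.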

Alt-text sprint 2021-08-21

This task list was started on 2021-08-21 for an alt-text code sprint. We can check off tutorials as the alt-text is added to the images.

NumPy Features

  • Linear algebra on n-dimensional arrays
  • Saving and sharing your NumPy arrays
  • Masked Arrays

NumPy Applications

  • Determining Moore’s Law with real data in NumPy
  • Deep learning on MNIST
  • Deep reinforcement learning with Pong from pixels
  • X-ray image processing
  • Determining Static Equilibrium in NumPy

Contributing

  • Why Jupyter Notebooks?

Improve omission of notebooks from nbval test collection

There are some notebooks that we don't necessarily want to test when running nbval, e.g. notebooks from the contributor guide, non-executable articles, etc. Currently this is handled manually in the CI job by simply deleting the notebooks we want to exclude from testing:

# TODO: find better way to exclude notebooks from test
rm content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.ipynb
rm content/pairing.ipynb
rm content/tutorial-style-guide.ipynb
rm content/tutorial-nlp-from-scratch.ipynb

This is inelegant and doesn't translate well to local workflows (i.e. users who want to run pytest-nbval on their own machines). It'd be a nice improvement to do this more cleanly - perhaps via a pytest configuration option to ignore files during collection?
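
One way this could look, as a sketch rather than a tested configuration: pytest reads a collect_ignore list from conftest.py and skips those paths during collection. The file location is an assumption, and so is the expectation that nbval respects pytest's standard ignore mechanism; the notebook paths are the ones from the CI snippet above.

```python
# conftest.py (assumed to live in the repository root, next to content/)
#
# pytest skips every path listed in `collect_ignore` during collection.
# Paths are interpreted relative to the directory containing this file.
collect_ignore = [
    "content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.ipynb",
    "content/pairing.ipynb",
    "content/tutorial-style-guide.ipynb",
    "content/tutorial-nlp-from-scratch.ipynb",
]
```

With such a file in place, the same pytest invocation would apply the exclusions both in CI and on a contributor's machine, without deleting anything.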

The Pong tutorial: atari-py has been deprecated by Open AI

The Pong tutorial works with the currently pinned versions of gym and atari-py in the requirements.txt file.
However, atari-py (which was the Open AI fork of the ALE) was deprecated in September 2021:
https://github.com/openai/atari-py

The replacement is the original ALE, through the ale-py module:
https://github.com/mgbellemare/Arcade-Learning-Environment

In September 2021, atari-py was finally replaced with ale-py in the gym official repository:
openai/gym#2348

The gym release notes of October 2021 specify the correct way to install the ROMs from now on:

# requirements.txt
-gym==0.18.0
-atari-py==0.2.5
+gym[atari,accept-rom-license]

this will update these libraries while still making the tutorial work with no further changes.

Alternatively, this can be done instead:

# requirements.txt
-gym==0.18.0
-atari-py==0.2.5
+autorom~=0.4.2
+gym[atari]

this will install AutoROM independently and then in the docs the user has to be informed to run:

AutoROM --accept-license

This will also make the tutorial run with no further changes.

Regarding cmake, there is no need to install it anymore, as it was needed for atari-py.
However, ffmpeg is still needed by gym's Monitor wrapper, as it produces videos of the agent’s learning.

I am happy to raise a PR for whichever of the two options if you want to go ahead with the update.
I didn't want to do the PR directly because I don't know if it will be a problem for the numpy tutorials to accept this license.
The commit in the numpy tutorials repo where the versions of gym and atari-py were pinned references a PR in the atari-py repository, which, as mentioned before, has been deprecated.

Cheers!

P.S. Regarding this issue, it is now indeed trivial to update the library, as all that needs to be done is install gym with the accept-rom-license extra, and everything will work, provided there is no issue for these tutorials to accept the license.

Proposal: Analysis of air pollution levels before and after lockdowns

Hi all,
I am planning to write a tutorial on the analysis of air pollution levels before and after lockdowns.

Aim:

Users would hopefully learn about analyzing time-series data with NumPy. It will also increase awareness about pollutants in the air we breathe and show us whether the complete shutdown of human activities in a region has a large enough impact on its air quality.

I am interested to know if this is a suitable topic to showcase and teach NumPy's functionalities!

cc: @melissawm @rossbar

Make notebooks easily executable

We agreed at the 8 June 20 doc team meeting that interactivity is a vital feature of notebooks.

To ensure this, each tutorial will include:

  1. A button to run the notebook in Binder
  2. Also a button to run the notebook in Colab. Although we'd prefer that users choose Binder, Colab runs more quickly. We'd give users a choice.
  3. A button to easily download the notebook for local execution.

Per @rossbar all three of these can be included.

Noting this here as a requirement for new notebooks.

ENH: Add author tag to notebooks

As @8bitmp3 points out here, we don't currently have anything indicating the original authorship in the notebooks. We should add this and make sure it's retroactively applied to all the existing tutorials.

CI: sphinx-build doesn't fail on notebook execution error.

It turns out that execution errors from the myst-nb extension only raise sphinx warnings, not errors, so the sphinx-build process is treated as completed successfully even if a notebook was not executed entirely successfully!

There are several ways we could handle this:

  1. Have sphinx-build treat warnings as errors with SPHINXOPTS=-W
  2. Push for a config option in myst-nb to treat execution failures as sphinx errors.
  3. Re-activate nbval (or similar) in CI

The only problem with option 1 is that it may prove to be too stringent, as the build will fail on any warning. Option 2 has already been raised upstream: executablebooks/MyST-NB#248. Reading a bit more about nbval, it seems like a nice option with more testing features than just the lax mode (which is essentially what we're trying to achieve with the sphinx build, without needing to build twice).
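
To make option 1 concrete, here is a small sketch that assembles a sphinx-build command with warnings treated as errors. The source and output paths and the function name are hypothetical; -W and --keep-going are standard sphinx-build flags, the latter letting the build report all warnings before exiting with a failure.

```python
def sphinx_build_command(source="site", output="site/_build/html", strict=True):
    """Assemble a sphinx-build invocation (paths are placeholders).

    With strict=True, warnings are turned into errors (-W) and the build
    keeps going to report all of them before failing (--keep-going).
    """
    command = ["sphinx-build", "-b", "html"]
    if strict:
        command += ["-W", "--keep-going"]
    return command + [source, output]


strict_cmd = sphinx_build_command()
lenient_cmd = sphinx_build_command(strict=False)
```

Keeping the strict flag switchable is one way to address the concern that failing on every warning may be too stringent for local builds.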

CI: test binder startup

On another, much lower-traffic project I ran into the issue that the Binder deployment stopped working without CI noticing.
It would be nice to work out a CI job that checks that all the deployment options (Binder, Colab, or anything else we enable) work as expected (in that case the failure was due to version incompatibilities).

Add MyST integration

At the docs team meeting, we discussed merging the ideas from this repo by @rossbar. This is an improvement in that it has a simple workflow for building and hopefully makes contributions easier.

MAINT: Remove explicit sphinx pin from environment.yml

As of the time of posting, the conda environment resolution is not working properly. I suspect there is a packaging problem w/ dependency pinning on conda-forge (unproven).

This issue is a reminder to periodically remove the pin by reverting #136 in the future to test whether the problem has been fixed upstream. Once the conda CI job is green again without the pin, we can merge the reversion. See #136 for details.

Air pollution tutorial should show path to "vectorization"

The air pollution tutorial has a "vectorized" function to calculate the AQI (IIRC). This can be vectorized using searchsorted (which is a bit more work, but not too tricky).
I am also almost completely certain that it can be replaced with a single call to np.interp.

Having the "vectorize" version seems good, but it doesn't fully leverage the concepts that NumPy provides. I think it would be a great arc to keep it, but then also show the final interp and maybe even the searchsorted idea. (I honestly don't like stopping at vectorize because it makes it seem that vectorize is a common approach, when I consider it more of a fallback solution, whether used a lot in practice or not.)
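
The progression suggested here could look roughly like the sketch below, which computes a piecewise-linear sub-index three ways. The breakpoint tables are invented for illustration and are not the tutorial's actual AQI tables; the function and variable names are likewise hypothetical.

```python
import numpy as np

# Hypothetical breakpoints, for illustration only.
conc_breaks = np.array([0.0, 30.0, 60.0, 90.0, 120.0, 250.0])     # concentration
index_breaks = np.array([0.0, 50.0, 100.0, 200.0, 300.0, 400.0])  # sub-index


def sub_index_scalar(c):
    """Scalar piecewise-linear lookup, the kind of function np.vectorize wraps."""
    if c <= conc_breaks[0]:
        return index_breaks[0]
    for i in range(len(conc_breaks) - 1):
        lo, hi = conc_breaks[i], conc_breaks[i + 1]
        if c <= hi:
            frac = (c - lo) / (hi - lo)
            return index_breaks[i] + frac * (index_breaks[i + 1] - index_breaks[i])
    return index_breaks[-1]  # clamp values above the last breakpoint


conc = np.array([10.0, 45.0, 75.0, 100.0, 200.0])

# 1. Fallback: np.vectorize is a convenience wrapper around a Python loop.
via_vectorize = np.vectorize(sub_index_scalar)(conc)

# 2. searchsorted: find each value's segment, then interpolate with array math.
seg = np.searchsorted(conc_breaks, conc, side="right") - 1
seg = np.clip(seg, 0, len(conc_breaks) - 2)
lo, hi = conc_breaks[seg], conc_breaks[seg + 1]
via_searchsorted = index_breaks[seg] + (conc - lo) / (hi - lo) * (
    index_breaks[seg + 1] - index_breaks[seg]
)

# 3. np.interp: the same piecewise-linear mapping in a single call.
via_interp = np.interp(conc, conc_breaks, index_breaks)
```

All three give the same values for inputs within the breakpoint range; only the first evaluates the function one element at a time in Python.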
