stanfordjournalism / data-journalism-notebooks

Data journalism notebooks for Stanford Graduate Journalism Program

Home Page: https://stanfordjournalism.github.io/data-journalism-notebooks/

Jupyter Notebook 99.28% Python 0.21% HTML 0.51%
datajournalism jupyter python

data-journalism-notebooks's People

Contributors

zstumgoren

data-journalism-notebooks's Issues

Create screencasts for Codespaces, assignment submissions, workflow and getting help

Create screencasts about each step in the workflow and submission process:

  • Accepting an assignment, opening a Codespace
  • Finding your GitHub Classroom assignment repos
    • Repositories tab vs. Settings -> Repositories (where shared repos can be found)
  • Opening and shutting down your Codespaces
  • Submitting your work
    • Stage/Commit/Sync
    • Copy URL from repo to Canvas
  • Tinker, Copy, Run, Repeat for Python workflow
    • First, set up your environment
      • Open Codespaces
      • Make sure the terminal is open
      • Add a new file. Now we're ready to code
    • IMPORTANT: Work incrementally!
    • Start by tinkering in interactive shell or JupyterLite
    • Transfer working code to module.py and run it on the CLI
    • Rinse and repeat this process
    • HOWEVER, as you get deeper into the script or begin writing "blocks" of code that span multiple lines, the interactive shell (esp. on the CLI) might grow cumbersome. At that point, you might want to switch to working solely in the script (or JupyterLite).
    • Coding gotchas
      • Be careful not to copy over dots or chevrons from the Python interpreter into your script.
      • Code you run on bash is NOT magically available in the Python interpreter
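A minimal sketch of the first gotcha above, assuming a script named module.py as in the workflow notes. The sample names are hypothetical and exist only for illustration.

```python
# What the interactive interpreter shows. The ">>>" and "..." at the start of
# each line are prompts printed by Python, not code:
#
#   >>> names = ["ada lovelace", "grace hopper"]
#   >>> titled = [name.title() for name in names]
#   >>> for name in titled:
#   ...     print(name)
#   ...
#   Ada Lovelace
#   Grace Hopper
#
# What belongs in module.py: the same statements with the prompts (and the
# printed output) removed.
names = ["ada lovelace", "grace hopper"]  # hypothetical sample data
titled = [name.title() for name in names]
for name in titled:
    print(name)
```

Leaving a `>>>` or `...` prefix in the script would raise a SyntaxError or silently change the code, which is why only the statements themselves get copied over.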

Create data analysis vignettes

These should mirror and expand upon the skills covered in the first Jupyter notebook, and fill in gaps such as reshaping data, stacking data, etc.

  • Get/read data
    • Massaging data on import
  • Filter
  • Sort
  • Transform (data cleaning, adding new columns)
    • include use of apply with built-in, user-defined and lambda functions
    • Use "type" with apply to debug column value issues
    • Dealing with nulls
  • Compute/Calculate
  • Group
  • Join/Merge
  • Visualize
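As a rough starting point for such a vignette, the sketch below walks through several of the steps listed above using pandas. The data, column names and helper function are invented for illustration, and visualization is left out. It is one possible shape for a vignette, not a finished lesson.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
cities = pd.DataFrame({
    "city": ["Oakland", "Fresno", "Palo Alto", "San Jose"],
    "county": ["Alameda", "Fresno", "Santa Clara", "Santa Clara"],
    "population": ["440000", "545000", None, 1010000],  # mixed types on purpose
})
counties = pd.DataFrame({
    "county": ["Alameda", "Fresno", "Santa Clara"],
    "region": ["Bay Area", "Central Valley", "Bay Area"],
})

# Debug: use type() with apply to see which Python types hide in a column.
type_counts = cities["population"].apply(lambda v: type(v).__name__).value_counts()

# Deal with nulls: drop rows that have no population value.
clean = cities.dropna(subset=["population"]).copy()

# Transform: apply with a built-in, a lambda and a user-defined function.
clean["population"] = clean["population"].apply(int)


def size_label(population):
    """Return a rough size category (hypothetical threshold)."""
    return "large" if population >= 500_000 else "mid-size"


clean["city_upper"] = clean["city"].apply(lambda name: name.upper())
clean["size"] = clean["population"].apply(size_label)

# Filter and sort.
big = clean[clean["population"] > 500_000].sort_values("population", ascending=False)

# Join/merge, then group and compute.
merged = clean.merge(counties, on="county", how="left")
by_region = merged.groupby("region")["population"].sum()
```

Checking `type_counts` first shows that the column mixes strings, an integer and a missing value, which explains why numeric comparisons would fail before the cleaning step.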

Update Python Overview

Update python_overview.ipynb as detailed below:

Jupyter Notebooks

Flesh out in more detail to mention:

  • typically runs on your machine only
  • Or there are hosted Jupyter Notebooks such as Google Colab, Kaggle, etc where third parties run the Jupyter Notebook or Jupyter Lab software for you. These can be very convenient, providing a nice combination of zero overhead with the ability to do real work, in some cases including features such as real-time collaboration. However, these environments also have limitations (e.g. the amount of data you can process or analyze) as well as their own non-standard workflows.

Code editors in the Cloud

TK - GitHub CodeSpaces

Python in the Browser

Explain how WebAssembly is now making it possible to run Python itself directly in your browser. This means that you don't need to install the software, nor do you need to use a third party such as Google to host the notebook for you.

It's a super-convenient way to learn without having to slog through the process of setting up your own local installation of Python, Jupyter and related libraries.

However, there are drawbacks. These installations are not intended for handling large quantities of data, and saving your work can be awkward. Normal day-to-day uses of Python also hit friction points, such as idiosyncratic workflows for the very common case of obtaining files from other websites, e.g. when scraping a government agency for data or documents.

So what Python environment should I use?

In our opinion, there's a time and a place for each of these different coding contexts.

JupyterLite -- i.e. Python in your browser -- is a great way to start ramping up immediately. It's so handy that the First Python Notebook is actually a JupyterLite instance that requires no installation of Python or related libraries for you to get started.

But when you're working on projects, we prefer other options. A plain old code editor is handy for whipping up Python scripts or multi-step pipelines which need to run on a regular schedule on a virtual machine in the cloud. These types of machines typically have no graphical interface, and while you can run Jupyter Notebooks as scripts in a shell, it's far more common and convenient to use plain old Python scripts.

For data analysis, we of course recommend Jupyter Notebooks/Lab, either running in your browser or using a third party provider such as Google Colab.

When starting out, it can be tempting to choose convenience (e.g. Google Colab) over learning the slightly harder but more standard way of doing things. In this course, we'll take the latter route, primarily because we want you to learn the standard workflows that most news teams use and that many tutorials and blog posts on the wider Internet assume. That said, we're very excited about Codespaces, which combine standard workflows with a zero-setup environment based entirely in the cloud. While they too have limitations in terms of pricing and resources, they're a convenient way to get up and running on real work, using standard practices.

Last but not least, even the humble Python interpreter in your shell can be handy for quickly testing out code snippets and exploring a library, without the overhead of having to install and run a Jupyter Notebook.

Screencast: Finding and moving files from Bash in Codespaces

In Codespaces, folks are accidentally creating files in hidden config directories rather than in the project root.

Create a screencast that shows:

  • How to create a new file (ensuring you're in the proper directory)
  • Drag-and-drop method to move file from Codespace GUI (if possible), along with the Git workflow to deal with moving files without using git mv
  • Bare minimum Bash commands to locate and copy/move file (pwd, ls, and cp should be enough)

Rework Demystifying Dot Notation

  • Move the Number, FancyNumber and FanciestNumber examples to the end of the module, since these illustrate the more advanced use case of calling one or more methods on the same class in a chain.
  • For the section on method chaining on the same type of object or class, find or craft a pandas example that calls multiple methods on DataFrame itself without returning a different data type
  • As a first example on method chaining, use something basic such as: " one two three ".strip().upper().split()
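The basic string example above could be unpacked along these lines. The intermediate variable names are hypothetical and are only there to show what each link in the chain returns.

```python
raw = " one two three "

# Step by step: each method returns a new object that the next method acts on.
stripped = raw.strip()      # str: "one two three"
shouted = stripped.upper()  # str: "ONE TWO THREE"
words = shouted.split()     # list: ["ONE", "TWO", "THREE"]

# The same three calls chained together with dot notation.
chained = " one two three ".strip().upper().split()

print(chained)
```

Note that `strip()` and `upper()` both return strings, while the final `split()` returns a list, so this chain ends on a different type than it started with.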

Update README on dev workflow

  • Creating or converting Markdown to Myst
  • Using jupytext to generate notebooks from Myst Markdown files
  • Build and test locally with JupyterLite
  • Deploy by committing to main and pushing (which triggers GH deployment action)

Integrate GitHub CodeSpaces into classroom workflow

  • Refresh on CodeSpaces (again)
    • Cost for students?
    • Limitations
      • 60 hours per month are free on a 2-core machine (30 on a 4-core machine), then $0.18 or $0.36 per hour respectively. Additional cores are available for fewer free hours and higher hourly prices.
      • 15 GB storage, $0.07 / month for each additional GB
  • Figure out how to integrate with assignment submissions for course
  • Decide if we'll use CodeSpaces exclusively or additional option (especially for Windows folks)
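To get a feel for the student cost question, here is a back-of-the-envelope sketch based on the figures in the notes above. It assumes the 60 free hours and $0.18 rate apply to 2-core machines and the 30 free hours and $0.36 rate to 4-core machines. The function and constant names are hypothetical, and the numbers should be checked against GitHub's current pricing before relying on them.

```python
# Assumed figures, taken from the notes above (not verified against current pricing).
FREE_HOURS = {2: 60, 4: 30}        # free hours per month, keyed by core count
HOURLY_RATE = {2: 0.18, 4: 0.36}   # dollars per hour after the free hours
FREE_STORAGE_GB = 15               # free storage per month
STORAGE_RATE = 0.07                # dollars per additional GB per month


def monthly_cost(cores, hours, storage_gb):
    """Estimate one month's Codespaces bill in dollars (hypothetical helper)."""
    billable_hours = max(0, hours - FREE_HOURS[cores])
    billable_gb = max(0, storage_gb - FREE_STORAGE_GB)
    total = billable_hours * HOURLY_RATE[cores] + billable_gb * STORAGE_RATE
    return round(total, 2)


light_use = monthly_cost(cores=2, hours=40, storage_gb=10)
heavy_use = monthly_cost(cores=2, hours=80, storage_gb=20)
print(light_use, heavy_use)
```

Under these assumptions, a student who stays within the free hours and storage pays nothing, while 20 extra hours on a 2-core machine plus 5 extra GB comes to a few dollars for the month.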

Art of Functions

  • Convert to Myst Markdown
  • Add overview of functions
  • Functions can take positional args
  • Functions can take kwargs
  • Functions can return things. In fact they always return something, even if it's nothing.
  • Update or delete link to "chopping up large problems"
  • Add preliminary exercises for creating and getting comfortable with functions (before the final exercise)
  • Replace the last Exercise with something that does not require the use of external calls. requests simply doesn't work in JupyterLite, and Pyodide's pyfetch can't call sites other than the origin due to CORS restrictions. Basically, the only option is to use a file that emanates from the same domain or is packaged along with the JupyterLite build.
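The points above about positional args, kwargs and return values could be illustrated with something like the following sketch. The function names are hypothetical and are not taken from the existing notebook.

```python
def add(a, b):
    """Positional args: values are matched to parameters by order."""
    return a + b


def greet(name, greeting="Hello", punctuation="!"):
    """Keyword args: parameters with defaults can be overridden by name."""
    return f"{greeting}, {name}{punctuation}"


def log(message):
    """No return statement, so Python returns None automatically."""
    print(message)


total = add(2, 3)
default_greeting = greet("Ada")
custom_greeting = greet("Ada", greeting="Welcome", punctuation=".")
result = log("saved")  # prints the message; result is None

print(total, default_greeting, custom_greeting, result)
```

The last call is the "always return something, even if it's nothing" case: `log` never uses `return`, yet assigning its result still works and yields `None`.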
