The data-journalism-notebooks from stanfordjournalism

data-journalism-notebooks's Issues

Create screencasts for Codespaces, assignment submissions, workflow and getting help

Create screencasts about each step in the workflow and submission process:

Accepting an assignment, opening a Codespace
Finding your GitHub Classroom assignment repos
- Repositories tab vs. Settings -> Repositories (where shared repos can be found)
Opening and shutting down your Codespaces
Submitting your work
- Stage/Commit/Sync
- Copy URL from repo to Canvas
Tinker, Copy, Run, Repeat for Python workflow
- First, set up your environment
  - Open Codespaces
  - Make sure the terminal is open
  - Add a new file. Now we're ready to code
- IMPORTANT: Work incrementally!
- Start by tinkering in interactive shell or JupyterLite
- Transfer working code to module.py and run it on the CLI
- Rinse and repeat this process
- HOWEVER, as you get deeper into the script or begin writing "blocks" of code that span multiple lines, the interactive shell (especially on the CLI) may grow cumbersome. At that point, consider working solely in the script (or in JupyterLite).
- Coding gotchas
  - Be careful not to copy over dots or chevrons from the Python interpreter into your script.
  - Code you run in Bash is NOT magically available in the Python interpreter
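
To illustrate the first gotcha above, here is a minimal sketch of what goes wrong when interpreter prompts (the `>>>` chevrons and `...` dots) are pasted into a script. The helper name `strip_prompts` and the sample session are hypothetical, made up for this illustration; it only handles prompt lines, not interpreter output mixed into the paste.

```python
def strip_prompts(pasted: str) -> str:
    """Remove leading '>>> ' and '... ' prompts from pasted interpreter text.

    Hypothetical helper for illustration only.
    """
    cleaned = []
    for line in pasted.splitlines():
        if line.startswith(">>> ") or line.startswith("... "):
            # Drop the 4-character prompt, keep the code (and its indentation)
            cleaned.append(line[4:])
        else:
            cleaned.append(line)
    return "\n".join(cleaned)


# Text copied straight from an interactive Python session
pasted = ">>> for word in ['a', 'b']:\n...     print(word)\n"

# Pasted as-is, the prompts make the script invalid Python
try:
    compile(pasted, "<script>", "exec")
    pasted_is_valid = True
except SyntaxError:
    pasted_is_valid = False

# With the prompts removed, the same code compiles
script = strip_prompts(pasted)
compile(script, "<script>", "exec")
```

In practice it is usually simpler to retype or copy only the code portion of each line, but the sketch shows why the prompts cannot come along.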

Update broken links on padj23-code README/TOC

Links on Python TOC should point to JupyterLite.

Downloading files tutorial should Open in Codespaces

Create data analysis vignettes

These should mirror and expand upon the skills covered in the first Jupyter NB, and fill in gaps such as reshaping data, stacking data, etc.

Get/read data
- Massaging data on import
Filter
Sort
Transform (data cleaning, adding new columns)
- include use of apply with built-in, user-defined and lambda functions
- Use "type" with apply to debug column value issues
- Dealing with nulls
Compute/Calculate
Group
Join/Merge
Visualize
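
As a starting point for the Transform vignette, here is a minimal sketch of `apply` with a built-in, a user-defined function and a lambda, including the `type` debugging trick and basic null handling. The column names and values are made up for illustration, and filling a missing budget with 0 is just one possible choice, not a recommendation.

```python
import pandas as pd

# Made-up sample data with messy strings, mixed types and nulls
df = pd.DataFrame({
    "agency": [" police", "fire ", None],
    "budget": ["1200", 3400, None],
})

# Built-in function: use type to see why a column misbehaves
budget_types = df["budget"].apply(type)


# User-defined function: tidy up text, leaving nulls untouched
def clean_agency(value):
    if pd.isna(value):
        return value
    return value.strip().title()


df["agency_clean"] = df["agency"].apply(clean_agency)

# Lambda function: convert to int, treating nulls as 0 (an assumption)
df["budget_num"] = df["budget"].apply(lambda v: int(v) if pd.notna(v) else 0)

# Dealing with nulls: count them before deciding how to handle them
missing_agencies = int(df["agency"].isna().sum())
```

Here `budget_types` reveals that the first budget is a `str` and the second an `int`, which is the kind of column value issue the `type` trick is meant to surface.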

Tutorial/screencast - debugging tools and techniques

Understanding stack traces
print statements
Tools to view local vars (e.g. Jupyter and GH Codespaces)
debugger (CLI, Jupyter and VS Code scripts)
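
A minimal sketch that could seed this tutorial, covering a print statement, a captured stack trace and where a debugger call would go. The `average` function is a made-up example with a deliberate bug (it fails on an empty list).

```python
import traceback


def average(values):
    total = sum(values)
    # Print statement: show local values just before the risky line
    print(f"DEBUG total={total!r} count={len(values)!r}")
    # Uncomment the next line to pause here in the built-in debugger (pdb)
    # breakpoint()
    return total / len(values)


try:
    average([])
except ZeroDivisionError:
    # The stack trace is read bottom-up: the last line names the error,
    # and the lines above show the chain of calls that led to it
    trace_text = traceback.format_exc()
    print(trace_text)
```

The final line of the trace names the exception, and the frame above it points at the `return` line inside `average`, which is usually the first place to look.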

Add list indexing section to Python Syntax Crash Course

Update Syntax Crash Course to include a brief section on accessing items in lists using position/index.
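
A short sketch of what that section might show. The list contents are made up for illustration.

```python
beats = ["politics", "sports", "business", "weather"]

# Positions start at 0, not 1
first = beats[0]

# Negative positions count from the end
last = beats[-1]

# A slice returns a new list: start at index 1, stop before index 3
middle = beats[1:3]

# Asking for a position that doesn't exist raises an IndexError
try:
    beats[10]
    out_of_range = False
except IndexError:
    out_of_range = True
```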

Create custom Codespaces environment that supports Jupyter Notebooks

Update Python Overview

Update python_overview.ipynb as detailed below:

Jupyter Notebooks

Flesh out in more detail to mention:

typically runs on your machine only
Alternatively, there are hosted Jupyter Notebooks such as Google Colab and Kaggle, where a third party runs the Jupyter Notebook or JupyterLab software for you. These can be very convenient, combining zero setup overhead with the ability to do real work and, in some cases, features such as real-time collaboration. However, these environments also have limitations (e.g. on the amount of data you can process or analyze) as well as their own non-standard workflows.

Code editors in the Cloud

TK - GitHub CodeSpaces

Python in the Browser

Explain how WebAssembly is now making it possible to run Python itself directly in your browser. This means that you don't need to install the software, nor do you need to use a third party such as Google to host the notebook for you.

It's a super-convenient way to learn without having to slog through the process of setting up your own local installation of Python, Jupyter and related libraries.

However, there are drawbacks. These installations are not intended for handling large quantities of data, and there are limitations and friction points when it comes to saving work and to normal day-to-day uses of Python. One example is the idiosyncratic workflow required for the very common case of obtaining files from other websites, e.g. when scraping a government agency's site for data or documents.

So what Python environment should I use?

In our opinion, there's a time and a place for each of these different coding contexts.

JupyterLite -- i.e. Python in your browser -- is a great way to start ramping up immediately. It's so handy that the First Python Notebook is actually a JupyterLite instance that requires no installation of Python or related libraries for you to get started.

But when you're working on projects, we prefer other options. A plain old code editor is handy for whipping up Python scripts or multi-step pipelines which need to run on a regular schedule on a virtual machine in the cloud. These types of machines typically have no graphical interface, and while you can run Jupyter Notebooks as scripts in a shell, it's far more common and convenient to use plain old Python scripts.

For data analysis, we of course recommend Jupyter Notebooks/Lab, either running in your browser or using a third party provider such as Google Colab.

When starting out, it can be tempting to choose convenience (e.g. Google Colab) over learning the slightly harder but more standard way of doing things. In this course, we'll take the latter route, primarily because we want you to learn the standard workflows that most news teams use and that many of the tutorials and blog posts out on the wider Internet assume. That said, we're very excited about Codespaces, which combine standard workflows with a zero-setup environment based entirely in the cloud. While they too have limitations in terms of pricing and resources, they're a convenient way to get up and running on real work, using standard practices.

Last but not least, even the humble Python interpreter in your shell can be handy for quickly testing out code snippets and exploring a library, without the overhead of having to install and run a Jupyter Notebook.

Screencast: Finding and moving files from Bash in Codespaces

When using Codespaces, folks are accidentally creating files in hidden config directories rather than in the project root.

Create a screencast that shows:

How to create a new file (ensuring you're in the proper directory)
Drag-and-drop method to move file from Codespace GUI (if possible), along with the Git workflow to deal with moving files without using git mv
Bare minimum Bash commands to locate and copy/move file (pwd, ls, and cp should be enough)

Enable Real-Time notebook collaboration

https://jupyterlite.readthedocs.io/en/latest/howto/configure/rtc.html

Possible to blend Jupyter Book with JupyterLite Notebooks?

Can we update the build process so that GH Pages serves both JupyterLite notebooks and JupyterBook style Markdown files?

Rework Demystifying Dot Notation

Move the Number, FancyNumber and FanciestNumber examples to the end of the module, since these illustrate the more advanced use case of calling one or more methods on the same class in a chain.
For the section on method chaining on the same type of object or class, find or craft a pandas example that calls multiple methods on DataFrame itself without returning a different data type
As a first example on method chaining, use something basic such as: " one two three ".strip().upper().split()
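
One way to present that first example is to unpack the chain one step at a time, so readers can see that each method runs on the value returned by the previous one. The intermediate variable names are just for illustration.

```python
text = " one two three "

# Step by step: each call returns a new value
stripped = text.strip()      # 'one two three'
shouted = stripped.upper()   # 'ONE TWO THREE'
words = shouted.split()      # ['ONE', 'TWO', 'THREE']

# Chained: the same three calls, read left to right
chained = " one two three ".strip().upper().split()
```

Note that the last step changes the data type: `strip` and `upper` return strings, while `split` returns a list.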

Create JupyterLite ChatGPT Extension

First, do some googling to see if one exists.

If not, write an extension (it will likely need a server-side component, since it requires a secret API key for ChatGPT).

Update README on dev workflow

Creating or converting Markdown to Myst
Using jupytext to generate notebooks from Myst Markdown files
Build and test locally with JupyterLite
Deploy by committing to main and pushing (which triggers GH deployment action)

Integrate GitHub CodeSpaces into classroom workflow

Refresh on CodeSpaces (again)
- Cost for students?
- Limitations
  - 60 or 30 hours per month are free (for 2 or 4 cores, respectively), then $0.18 or $0.36 per hour after that. Additional cores are available for fewer free hours and higher hourly prices.
  - 15 GB of storage is included; $0.07 per month for each additional GB
Figure out how to integrate with assignment submissions for course
Decide if we'll use CodeSpaces exclusively or additional option (especially for Windows folks)
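
To help answer the student cost question, here is a rough sketch of the arithmetic using the figures noted in this issue. It assumes the prices are in US dollars, that the larger free allowance (60 hours) applies to the 2-core machine and the smaller (30 hours) to the 4-core machine, and that storage is billed per GB per month. The function name `monthly_cost` is made up, and current GitHub pricing should be checked before relying on any of this.

```python
# Figures taken from the notes in this issue (assumed to be USD)
FREE_HOURS = {2: 60, 4: 30}         # free hours per month, by core count
HOURLY_RATE = {2: 0.18, 4: 0.36}    # price per hour beyond the free hours
FREE_STORAGE_GB = 15
STORAGE_RATE = 0.07                 # price per extra GB per month


def monthly_cost(cores, hours, storage_gb):
    """Estimate one month's charge for a single student (hypothetical helper)."""
    billable_hours = max(0, hours - FREE_HOURS[cores])
    billable_gb = max(0, storage_gb - FREE_STORAGE_GB)
    cost = billable_hours * HOURLY_RATE[cores] + billable_gb * STORAGE_RATE
    return round(cost, 2)


# A student within the free tier pays nothing
light_use = monthly_cost(cores=2, hours=40, storage_gb=10)

# 20 extra hours on 2 cores plus 5 extra GB of storage
heavy_use = monthly_cost(cores=2, hours=80, storage_gb=20)
```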

Add First Jupyter NB link to bottom of README.md.
Rework to include section headers on Python basics and Data Analysis

Art of Functions

Convert to Myst Markdown
Add overview of functions
Functions can take positional args
Functions can take kwargs
Functions can return things. In fact they always return something, even if it's nothing.
Update or delete link to "chopping up large problems"
Add preliminary exercises for creating and getting comfortable with functions (before the final exercise)
Replace the last Exercise with something that does not require external calls. requests simply doesn't work in JupyterLite, and pyodide's pyfetch can't call sites other than the origin due to CORS restrictions (unless the remote site explicitly allows cross-origin requests). Basically, the only reliable option is to use a file that is served from the same domain or is packaged along with the JupyterLite build.
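
A minimal sketch of the three points about arguments and return values listed above, which could seed the overview and the preliminary exercises. The function names and values are made up for illustration, and nothing here makes an external call.

```python
def greet(name, greeting="Hello"):
    """Build a greeting; name is positional, greeting has a default."""
    return f"{greeting}, {name}!"


def log_visit(name):
    """Print a message; there is no return statement here."""
    print(f"{name} visited")


# Positional argument only
default_greeting = greet("Ada")

# Keyword argument (kwarg) overrides the default
custom_greeting = greet("Ada", greeting="Welcome")

# A function without a return statement still returns something: None
result = log_visit("Ada")
```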
