gdsbook / book Goto Github PK

This book serves as an introduction to a whole new way of thinking systematically about geographic data, using geographical analysis and computation to unlock new insights hidden within data.

Home Page: https://geographicdata.science

License: Other

Jupyter Notebook 99.96% Dockerfile 0.01% Makefile 0.01% TeX 0.03% CSS 0.01% Python 0.01%

data-science data-analysis-python geographical-information-system geographic-data spatial-analysis spatial-statistics statistics spatial-data-analysis

book's Introduction

Repository for the book "Geographic Data Science with Python"

Check the current table of contents.

Resources

Gitter room

book's People

Contributors


ljwolf darribas sjsrey ralvite mcnakhaee cuulee fototo allilou hossein-madadi tkayne23 rogercre sylvainyu gracecarrillo anhnguyendepocen verrah trutorj ibbudiarto aolifodaisy cule sindile amzedhossen jellisfw nkm-ml matthew-law kchastko dongyi1996 k20shores dunaymurgeo49 vojavocni ecv19 angwar26 joselinceron kianoco huangzq681 memo1986 mszell skatiyar97 allthingurban dynaryu cmg777 kaixuandai matttriano depocen urschrei eluisluzquadros wukkkinz-0725 marquisvictor salami-mary karthy257 alwynbrand josiahparry cyrus723 hayesjohnd geographiliac reveurmichael blackender gaybro8777 farahmehboob yemnaing muskuloes hyperchriskibasi shaanbarca marconasuto ahmedsalama96 daniel-rolen phuntshobhutan mihirt-10 stephen137 iblahlou jdavidortega edzer wellalb mtzun10 nwut fducau cimadure jstraughter zia-foisal ilimikato profsergiocosta ar-puuk dancejod tristannew geneidy roger120981 timbindingpcc

book's Issues

Decide license

This will need to be decided before we go public, just making a collective mental note.

Gitter room link seems to be down

Great to see this book announced, looking forward to a much needed resource on #geopy! FYI tried clicking on the Gitter room: https://gitter.im/sjsrey/gds_pysal - may just be me but got a '404' type message. Best of luck with it.

Explore adding CI for chapters

convention to refer to earlier chapters' methods/techniques

In local_autocorrelation.ipynb, there is an inline question:

Should we adopt some scheme so as to refer to earlier chapters, as in the case of maps and weights here to improve the flow of the book and reduce any repetition of basic tasks?

I say yes, we should always refer to a previous chapter if we talked about a topic before.

Ch. 1 - need to specify tag in docker `pull`

In the "Run the book's container" section, I would like to suggest that you change the following docker command:

docker pull gdsbook/stack

to:

docker pull gdsbook/stack:3.0

When I run the command that is currently in the book in Windows PowerShell, I receive the following error:

Error response from daemon: manifest for gdsbook/stack:latest not found: manifest unknown: manifest unknown

Per this thread on the Docker forums, I understand that running the command without the 3.0 tag defaults to pulling the image with the `latest` tag, but that tag doesn't exist, so I receive that error.

Thanks very much for all your work on this wonderful book!

CHAPTER: Global Spatial Autocorrelation

Thread to discuss chapter on global autocorrelation. Picks up from GDS#4

edits for clustering and regionalization chapters

Question 6, @darribas writes:

We could recast this question to make it more practical, along the lines @ljwolf does in other chapters. For example, on this one, it could be:

Re-run the analysis in the chapter w/ a different set of weights.

Compare the resulting clusters visually

What are the key differences between the two W's?

How do you think such differences affect the final result?
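As a minimal sketch of the comparison question 6 asks for, the snippet below builds rook and queen contiguity neighbour sets for a small regular grid in plain Python and counts how many neighbours each rule gives. `grid_neighbors` is a hypothetical helper standing in for the chapter's weights, not a libpysal function (libpysal's `lat2W` builds lattice weights too); on real polygons the two W's differ in the same way, by whether corner-only contact counts.

```python
def grid_neighbors(nrows, ncols, rule="rook"):
    """Return {cell_id: set(neighbor_ids)} for an nrows x ncols lattice.

    Cells are numbered row by row. "rook" links cells sharing an edge;
    "queen" also links cells that share only a corner.
    """
    if rule == "rook":
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            cell = r * ncols + c
            # Keep only offsets that stay inside the grid.
            neighbors[cell] = {
                (r + dr) * ncols + (c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < nrows and 0 <= c + dc < ncols
            }
    return neighbors


rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")
# Total number of (directed) neighbour links under each rule.
print(sum(map(len, rook.values())), sum(map(len, queen.values())))
```

Queen weights always contain the rook links plus the diagonal ones, so any cluster solution built on them draws on a wider, denser neighbourhood.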

Question 7, @darribas writes:

I think this one is pretty hard for an introductory text.

Question 8, @darribas writes:

I'm not sure I understand this one.

Set up Binder

Peek into this to use Dockerfiles.

Cache data used from `osmnx`

Currently we're reading data off OSM on-the-fly via osmnx. It'd be good to have a cache in the repo that allows us to read them locally if connectivity is not an option, and add a note with code in the chapter that it can be read locally.
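One way to structure such a cache is a small read-local-else-download helper. The sketch below is an assumption about how it could look, not existing book code: `cached_read` is a hypothetical name, and the fetch/save/load steps are passed in so the same helper works for any format.

```python
from pathlib import Path


def cached_read(path, fetch, save, load):
    """Return data from a local cache file, fetching and caching it if absent.

    path  -- location of the cache file (e.g. inside the repo's data folder)
    fetch -- zero-argument callable that downloads the data (needs internet)
    save  -- callable(data, path) that writes the data to disk
    load  -- callable(path) that reads the data back from disk
    """
    path = Path(path)
    if path.exists():
        # Offline-friendly branch: read the copy shipped with the repo.
        return load(path)
    # No cached copy yet: download once and store it for next time.
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    save(data, path)
    return data
```

With osmnx, `fetch` could wrap `ox.graph_from_place(...)`, `save` could be `lambda G, p: ox.save_graphml(G, filepath=p)`, and `load` could be `ox.load_graphml`; the chapter note would then only need to point at the cached GraphML file.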

unable to pull "the image"

I tried "docker pull gdsbook/stack", but the result was as follows: "Using default tag: latest
Error response from daemon: manifest for gdsbook/stack:latest not found: manifest unknown: manifest unknown"

Thank you for your assistance. db

CHAPTER: Spatial Inequality

Issue to track Chapter 10 on Spatial Inequality.

Update container with `gds_py:3.0` and consider pushing to Docker hub

Permissions error with Docker

This is a hack I use to remap user (and group) IDs on Docker:

docker run -ti --user root -e NB_UID=1001 -e NB_GID=100 -p 8888:8888 -v /home/dani:/home/jovyan/work darribas/gds start.sh

I "think" this remaps the host UID into 1001 and host group ID into 100.

Duplicate records in cleaned US county data

https://github.com/gdsbook/book/tree/master/data/us_county_income
Two records (per capita income) are found for Bedford County, VA (FIPS code 51019) in the cleaned US county data set, "uscountypcincome.gpkg".

The cleaned csv file "uscountyincome.csv" with county-level per capita income, personal income, and population also has duplicate records for this county.
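A quick pandas check can list every duplicated county before the cleaned files are regenerated. This sketch assumes the FIPS codes sit in a column named `fips`, which is a hypothetical name; the actual column in "uscountyincome.csv" may differ, so pass the right one via `key`.

```python
import pandas as pd


def find_duplicate_fips(df, key="fips"):
    """Return every row whose key value appears more than once."""
    # keep=False flags all copies, not just the second and later ones.
    return df[df.duplicated(subset=key, keep=False)]


def drop_duplicate_fips(df, key="fips"):
    """Keep only the first record for each key value."""
    return df.drop_duplicates(subset=key, keep="first")
```

For example, `find_duplicate_fips(pd.read_csv("uscountyincome.csv", dtype={"fips": str}))` would show both Bedford County rows side by side. Reading FIPS as strings keeps leading zeros. Inspecting the rows first matters, since blindly keeping the first record may keep the wrong one.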

CHAPTER: Computational Environment

Thread to discuss chapter on computational environment and infrastructure. Picks up from gds#8.

Add landing page title

Take from the book proposal

Fill in landing page for project

On gdsbook.github.io (which lives on geographicdata.science)

And consider turning binder logo to B&W

github page build error

The page build failed for the master branch with the following error:

A file was included in docs/assets/html/search_form.html that is a symlink or does not exist in your _includes directory. For more information, see https://help.github.com/en/articles/page-build-failed-file-is-a-symlink.

For information on troubleshooting Jekyll see:

https://help.github.com/articles/troubleshooting-jekyll-builds

If you have any questions you can contact us by replying to this email.

add data for inequality chapter

Add new dataset: GHSL

It'd be really cool to add an additional raster with non-traditional raster info. I'm thinking of the GHSL for population. This would fit well in:

Ch.3 (#23): manipulating objects as surfaces
Ch.4 (#26): weights from rasters
Ch.7 (#25): local statistics from a raster (extension since chapter is short)

I'm happy to add it myself. As for regions, I was thinking of using somewhere in Latin America, perhaps Sao Paulo (Brazil)?

CHAPTER: Points

Issue set up to track "Chapter 9: Point Pattern Analysis"

Make container throws error "No module named 'jupytext'"

Step 4/19 : RUN jupyter-book create testbook --demo

Throws:

ModuleNotFoundError: No module named 'jupytext'

CHAPTER: Geographic Thinking for Data Science

Issue to track Chapter 2 on Spatial Data.

Build support for Google Colab

It'd be good to have the book run on Google Colab.

Higher resolution Matplotlib inline images

By default, Matplotlib outputs `%matplotlib inline` images at a resolution of 100 dpi. This results in some blurry images, e.g. in the Spatial Weights chapter.

This could be adjusted globally in the matplotlibrc file. The relevant parameter, shown here with its default value, is:

figure.dpi       : 100

or in a script:

import matplotlib as mpl

# Raise the resolution of new figures above the 100 dpi default
mpl.rcParams['figure.dpi'] = 200

Alternatively, this magic could be used to output SVG (vector) images instead: %config InlineBackend.figure_format = 'svg'.
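Besides the global setting, Matplotlib accepts the resolution for a block of code or for a single figure, which may suit chapters where only some plots look blurry. A minimal sketch; the 200 dpi value is an arbitrary example, not a value agreed in this thread:

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# Option 1: raise the resolution only for figures created inside this block;
# the previous rcParams are restored when the block exits.
with mpl.rc_context({"figure.dpi": 200}):
    fig_scoped = plt.figure()

# Option 2: set the resolution on one figure via the figure keyword.
fig_single, _ = plt.subplots(dpi=200)
```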

CHAPTER: Spatial Data Processing

Issue to track Chapter 3 on Spatial Data Processing

Typos

Hey guys,

Thank you for the wonderful book! Great start. I went through Part 1 and found a few typos. Do you welcome pull requests, or would you like me to mention these in the Issues?

Thank you!

Simplify the brexit data using topojson

Currently, we're pointing to a broken data.gov.uk link. We need to add a cleaned dataset.

loss of bookdata.py

In the reorganization, we lost the bookdata.py file that the weights chapter used to read the datasets. We will need to restore it from a previous version.

Renaming to do

Opening a quick ticket to keep track of bits that need to be renamed across the book:

File name for "Ch.2 - Geographic Thinking" from 02_spatial_data to 02_geographic_thinking
File name for "Ch.3 - Spatial Data" from 03_spatial_data_processing to 03_spatial_data
Sub-sub-heading on Ch.10 from "Spatial Feature Engineering" to "Spatial Heterogeneity" (now we have a full chapter, I think this fits best as "heterogeneity")

If @ljwolf and @sjsrey agree on this, I'd suggest doing it when there are no other PRs waiting, and then merging right away to avoid conflicts.

CHAPTER: Spatial Regression

Thread on regression chapter, picking up discussion from gds#24.

Our relationship to the internet

We need to flesh out our relationship to the internet, specifically when running chapters.

Ideally, we'd like people to be able to execute the chapters without the internet. A few issues arise though:

We occasionally read remote data, such as in the Local Autocorrelation chapter (where we read in Brexit returns) and the income inequality chapter (where we read in the county rectified polygons). We need a solution to reduce the size of these datasets and ship them locally. Further, we also use osmnx occasionally, and need to save the network outputs and offer a local option when reading them in. For the large polygonal datasets, consider using the polygon simplification from topojson?
We use remote basemaps a ton with contextily. We need to either add an option to contextily that allows for a "failsafe" mode that returns a white basemap if the provider cannot be reached, or ship a cache of basemaps along with the book.
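Until contextily offers such a mode, a thin wrapper on the book's side could approximate it. This is a sketch under assumptions: `add_basemap_or_blank` is a hypothetical helper, and it catches any exception, which also hides non-network errors; a stricter version might catch only connection errors.

```python
def add_basemap_or_blank(ax, add_basemap=None, **kwargs):
    """Try to draw a web basemap on `ax`; fall back to a white background.

    add_basemap -- callable(ax, **kwargs); defaults to contextily.add_basemap.
    Returns True if the basemap was drawn, False if the fallback was used.
    """
    try:
        if add_basemap is None:
            # Imported lazily, so a missing contextily also triggers the fallback.
            import contextily
            add_basemap = contextily.add_basemap
        add_basemap(ax, **kwargs)
        return True
    except Exception:
        # Provider unreachable (or any other failure): leave a plain white map.
        ax.set_facecolor("white")
        return False
```

In a chapter, `add_basemap_or_blank(ax, zoom=10)` would then draw the tiles when online and silently fall back to a blank background offline.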

delabel, don't de-axis

The axis can be useful for multi-facet visualizations. For example, if you want to label every row or every column, removing the axis from each facet means you can't use set_xlabel or set_ylabel.

I use this function to delabel the axis, meaning that you remove ticklabels & ticks, but keep the actual bounding box.

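import numpy


def delabel(ax):
    """Remove ticks and tick labels but keep the axes box.

    Accepts a single Axes or a NumPy array of Axes (as returned by
    plt.subplots) and returns an object of the same shape.
    """
    if isinstance(ax, numpy.ndarray):
        # Recurse over every Axes in the grid, then restore the grid's shape.
        orig_shape = ax.shape
        result = numpy.asarray([delabel(ax_) for ax_ in ax.flatten()])
        return result.reshape(orig_shape)
    ax.set_xticks([])
    ax.set_xticklabels([])
    ax.set_yticks([])
    ax.set_yticklabels([])
    return ax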
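For comparison, Matplotlib's own `tick_params` can hide ticks and tick labels while leaving the axes switched on, so row and column labels still work. This is an alternative sketch, not the function above: unlike `delabel`, it keeps the tick locations (only hiding them), so gridlines tied to ticks would still be available.

```python
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2)
for ax in axs.flat:
    # Hide tick marks and tick labels but keep the axes frame (spines).
    ax.tick_params(left=False, bottom=False, labelleft=False, labelbottom=False)

# The axes are still "on", so facet labels remain visible.
for i, ax in enumerate(axs[:, 0]):
    ax.set_ylabel(f"row {i}")
for j, ax in enumerate(axs[-1, :]):
    ax.set_xlabel(f"column {j}")
```

By contrast, `ax.set_axis_off()` would also hide these labels, which is the problem described above.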

Port Chapter issues from previous private repo

We should create new issues for each chapter and create links to those in the previous repo. This includes:

Upgrade to jupyter-book 0.6

https://github.com/jupyter/jupyter-book/releases/tag/v0.6.0

CHAPTER: Spatial Feature Engineering

Thread to discuss development on chapter on Spatial Feature Engineering.

Fix belab on book site

The belab button now correctly builds the computational backend (through the Docker setup for Binder, I think), but it's not set up properly, so code doesn't run as expected. Mainly, the belab kernel drops you in the home directory of the repository, not in the content/notebook folder that each notebook (and now Binder) expects.

Docker container error

On a new Ubuntu machine, I'm trying to set things up and am hitting a Docker error:

Step 13/22 : RUN cd /home/$NB_USER/testbook  && gem install bundler -v 1.17.2  && bundle install
 ---> Running in 412f6a892798
Successfully installed bundler-1.17.2
Parsing documentation for bundler-1.17.2
Installing ri documentation for bundler-1.17.2
Done installing documentation for bundler after 4 seconds
1 gem installed
/usr/lib/ruby/2.5.0/rubygems.rb:289:in `find_spec_for_exe': can't find gem bundler (>= 0.a) with executable bundle (Gem::GemNotFoundException)
        from /usr/lib/ruby/2.5.0/rubygems.rb:308:in `activate_bin_path'
        from /home/jovyan/gems/bin/bundle:23:in `<main>'
The command '/bin/sh -c cd /home/$NB_USER/testbook  && gem install bundler -v 1.17.2  && bundle install' returned a non-zero code: 1
make: *** [Makefile:2: container] Error 1

Not sure why this is now happening - did the container change upstream maybe?

Typo in Ch. 6

Shared by email:

While going through your book, I noticed a small error on the following page:

https://geographicdata.science/book/notebooks/06_spatial_autocorrelation.html#id3

The text under the plot_moran(moran) command says 'The black rug signals the mean', while the rug is blue in the diagram.

https://econtent.hogrefe.com/doi/pdf/10.1027/2151-2604/a000387

Duplicate + case conflict in some readme folders in docs/data

See here. This needs to be addressed. On some platforms, this causes a case conflict issue.
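To find every such conflict, one option is to group tracked paths by their lowercased form. A minimal sketch, assuming the input is a list of POSIX-style paths such as the output of `git ls-files` (scanning a checkout on a case-insensitive file system would miss them, since only one copy survives there). `find_case_conflicts` is a hypothetical helper; it checks parent folders too, because folders that differ only in case also collide.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_case_conflicts(paths):
    """Return groups of paths (files or folders) that differ only by case."""
    seen = defaultdict(set)
    for p in paths:
        parts = PurePosixPath(p).parts
        # Check the file and every parent folder, since folders collide too.
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            seen[prefix.lower()].add(prefix)
    return sorted(sorted(group) for group in seen.values() if len(group) > 1)
```

For instance, `find_case_conflicts(subprocess.run(["git", "ls-files", "docs/data"], capture_output=True, text=True).stdout.splitlines())` would list the clashing readme entries.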

Reference management

jupyter-book supports jekyll-scholar:

https://jupyter.org/jupyter-book/features/citations.html

We should explore whether this works.
