Repository for the book "Geographic Data Science with Python"
Check the current table of contents.
This book serves as an introduction to a whole new way of thinking systematically about geographic data, using geographical analysis and computation to unlock new insights hidden within data.
Home Page: https://geographicdata.science
License: Other
This will need to be decided before we go public, just making a collective mental note.
This would make it easier to directly point to the part of the notebook where the link is provided
In local_autocorrelation.ipynb, there is an inline question:
Should we adopt some scheme so as to refer to earlier chapters, as in the case of maps and weights here to improve the flow of the book and reduce any repetition of basic tasks?
I say yes, we should always refer to a previous chapter if we talked about a topic before.
Thread to discuss chapter on computational environment and infrastructure. Picks up from gds#8.
Thread to discuss chapter on global autocorrelation. Picks up from GDS#4
Issue to track Chapter 10 on Spatial Inequality.
Take from the book proposal
Issue to track Chapter 3 on Spatial Data Processing
https://github.com/gdsbook/book/tree/master/data/us_county_income
Two per capita income records are found for Bedford County, VA (FIPS code 51019) in the cleaned US county data set "uscountypcincome.gpkg":
The cleaned csv file "uscountyincome.csv" with county-level per capita income, personal income, and population also has duplicate records for this county.
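A quick way to confirm this kind of duplication is to count how often each county identifier appears. The sketch below uses only the standard library and a tiny stand-in table; the column names (fips, name, pcincome) and the income values are placeholders, not taken from the real files.

```python
import csv
import io
from collections import Counter


def duplicated_keys(rows, key):
    """Return the values of `key` that appear in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Stand-in data: column names and income values are placeholders,
# only the FIPS code 51019 comes from the report above.
sample = io.StringIO(
    "fips,name,pcincome\n"
    "51019,Bedford,100\n"
    "51019,Bedford,200\n"
    "51001,Accomack,150\n"
)
rows = list(csv.DictReader(sample))
print(duplicated_keys(rows, "fips"))
```

Pointing csv.DictReader at the real "uscountyincome.csv" instead of the stand-in, with whatever the identifier column is actually called, should list every county that appears more than once.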
Opening a quick ticket to keep track of bits that need to be renamed across the book:
- 02_spatial_data to 02_geographic_thinking
- 03_spatial_data_processing to 03_spatial_data
If @ljwolf and @sjsrey agree on this, I'd suggest doing it when there are no other PRs waiting, and then merging right away to avoid conflicts.
Currently, we're pointing to a broken data.gov.uk link. We need to add a cleaned dataset.
Question 6, @darribas writes:
We could recast this question to make it more practical, along the lines @ljwolf does in other chapters. For example, on this one, it could be:
- Re-run the analysis in the chapter w/ a different set of weights.
- Compare the resulting clusters visually
- What are the key differences between the two W's?
- How do you think such differences affect the final result?
Question 7, @darribas writes:
I think this one is pretty hard for an introductory text.
Question 8, @darribas writes:
I'm not sure I understand this one.
See here. This needs to be addressed. On some platforms, this causes a case conflict issue.
Hey guys,
Thank you for the wonderful book! Great start. I went through Part 1 and found a few typos, do you welcome pull requests or would you like me to mention these in the Issues?
Thank you!
Ticket to track process of "Chapter 10 - Clustering and regionalization"
Currently we're reading data off OSM on-the-fly via osmnx. It'd be good to have a cache in the repo that allows us to read them locally if connectivity is not an option, and add a note with code in the chapter that it can be read locally.
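One possible shape for such a cache is a small helper that reads a local file when it exists and only goes to the network otherwise. This is a minimal sketch, not the book's implementation: the helper name load_or_fetch, the pickle format, and the cache path are all assumptions made for illustration.

```python
import pickle
from pathlib import Path


def load_or_fetch(cache_path, fetch):
    """Return the cached object if it exists, else call fetch() and cache it.

    `fetch` is any zero-argument callable that downloads the data; it is
    only invoked when no local copy is found.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Offline-friendly path: read the copy shipped with the repo
        with cache_path.open("rb") as f:
            return pickle.load(f)
    # Online path: download once, then store for later runs
    result = fetch()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

In a chapter, fetch could be a lambda wrapping the existing osmnx call, so the notebook code stays the same whether or not a connection is available. A geospatial file format may be a better fit than pickle for data committed to the repository.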
In the "Run the book's container" section, I would like to suggest that you change the following docker command:
docker pull gdsbook/stack
to:
docker pull gdsbook/stack:3.0
When I run the command that is currently in the book in Windows PowerShell, I receive the following error:
Error response from daemon: manifest for gdsbook/stack:latest not found: manifest unknown: manifest unknown
Per this thread on the Docker forums, I understand that running the command without the 3.0 tag defaults to pulling the image with the tag "latest", but the "latest" tag doesn't exist, so I receive that error.
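The defaulting rule described here can be illustrated with a few lines of Python. This is only a simplified sketch of how an untagged image reference is resolved, not Docker's actual parsing code, and the function name with_default_tag is made up for the example.

```python
def with_default_tag(image):
    """Append ':latest' when an image reference has no tag or digest."""
    if "@" in image:
        # Digest references (name@sha256:...) are already fully pinned
        return image
    # Only look at the last path component, so a registry port such as
    # localhost:5000/image is not mistaken for a tag
    last_component = image.rsplit("/", 1)[-1]
    if ":" in last_component:
        return image
    return image + ":latest"


print(with_default_tag("gdsbook/stack"))      # gdsbook/stack:latest
print(with_default_tag("gdsbook/stack:3.0"))  # gdsbook/stack:3.0
```

Since only the 3.0 tag is reported to exist for this image, the first form resolves to a reference that the registry cannot find, which matches the error above.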
Thanks very much for all your work on this wonderful book!
Here's a reference that might be worth mentioning in Chapter 1:
https://econtent.hogrefe.com/doi/pdf/10.1027/2151-2604/a000387
Discussion on choropleths chapter. Picks up from GDS#5.
Great to see this book announced, looking forward to a much needed resource on #geopy! FYI tried clicking on the Gitter room: https://gitter.im/sjsrey/gds_pysal - may just be me but got a '404' type message. Best of luck with it.
On a new Ubuntu machine I'm trying to set things up and hitting a docker error:
Step 13/22 : RUN cd /home/$NB_USER/testbook && gem install bundler -v 1.17.2 && bundle install
---> Running in 412f6a892798
Successfully installed bundler-1.17.2
Parsing documentation for bundler-1.17.2
Installing ri documentation for bundler-1.17.2
Done installing documentation for bundler after 4 seconds
1 gem installed
/usr/lib/ruby/2.5.0/rubygems.rb:289:in `find_spec_for_exe': can't find gem bundler (>= 0.a) with executable bundle (Gem::GemNotFoundException)
from /usr/lib/ruby/2.5.0/rubygems.rb:308:in `activate_bin_path'
from /home/jovyan/gems/bin/bundle:23:in `<main>'
The command '/bin/sh -c cd /home/$NB_USER/testbook && gem install bundler -v 1.17.2 && bundle install' returned a non-zero code: 1
make: *** [Makefile:2: container] Error 1
Not sure why this is now happening - did the container change upstream maybe?
Issue to track Chapter 2 on Spatial Data.
This would help keep the website up to date with development without additional burden on developers.
By default Matplotlib outputs %matplotlib inline images at a resolution of 100 dpi. This results in some blurry images, e.g. Spatial Weights.
This could be adjusted globally by adjusting the settings in the matplotlibrc file. The relevant parameters are:
figure.dpi : 100
or in a script:
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 100
Alternatively, this magic could be used to output an svg instead: %config InlineBackend.figure_format = 'svg'.
We need to flesh out our relationship to the internet, specifically when running chapters.
Ideally, we'd like people to be able to execute the chapters without the internet. A few issues arise though:
- osmnx: we use it occasionally, and need to save the network outputs and offer a local option when reading them in. For the large polygonal datasets, consider using the polygon simplification from topojson?
- contextily: we need to either add an option to contextily that allows for a "failsafe" mode that returns a white basemap if the provider fails to be reached, or ship a cache of basemaps along with the book.
Thread to discuss chapter on Spatial Weights. It picks discussion up from gds#40.
the axis can be useful for multi-facet visualizations... like, if you want to label every row or every column, removing the axis from each facet means you can't use set_xlabel or set_ylabel.
I use this function to delabel the axis, meaning that you remove ticklabels & ticks, but keep the actual bounding box.
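import numpy

def delabel(ax):
    # Arrays of axes (e.g. from plt.subplots) are handled element by element
    if isinstance(ax, numpy.ndarray):
        orig_shape = ax.shape
        result = numpy.asarray([delabel(ax_) for ax_ in ax.flatten()])
        return result.reshape(orig_shape)
    # Drop ticks and tick labels, but keep the bounding box and axis labels
    ax.set_xticks([])
    ax.set_xticklabels([])
    ax.set_yticks([])
    ax.set_yticklabels([])
    return ax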
In order to use the tools in pointpats for the Points chapter (#28), we need to add the pointpats package to the Docker image.
In the reorganization, we lost the bookdata.py file that was used in weights to read the datasets. We will need to restore this from a previous version.
The page build failed for the master branch with the following error:
A file was included in docs/assets/html/search_form.html that is a symlink or does not exist in your _includes directory. For more information, see https://help.github.com/en/articles/page-build-failed-file-is-a-symlink.
For information on troubleshooting Jekyll see:
https://help.github.com/articles/troubleshooting-jekyll-builds
If you have any questions you can contact us by replying to this email.
This is a hack I use to remap user (and group) IDs on Docker:
docker run -ti --user root -e NB_UID=1001 -e NB_GID=100 -p 8888:8888 -v /home/dani:/home/jovyan/work darribas/gds start.sh
I "think" this remaps the host UID into 1001 and host group ID into 100.
We should create new issues for each chapter and create links to those in the previous repo. This includes:
Shared by email:
While going through your book, I noticed a small error on the following page:
https://geographicdata.science/book/notebooks/06_spatial_autocorrelation.html#id3
The text under the plot_moran(moran) command says that - 'The black rug signals the mean', while the rug is blue in the diagram.
Issue set up to track "Chapter 9: Point Pattern Analysis"
Step 4/19 : RUN jupyter-book create testbook --demo
Throws:
ModuleNotFoundError: No module named 'jupytext'
Peek into this to use Dockerfiles
The Thebelab button now correctly builds the computational backend (through the Docker setup for Binder, I think) but it's not properly set up for code to run as expected. Mainly, the Thebelab kernel drops you in the home directory of the repository, not in content/notebook, as each notebook (and Binder now) expects.
Thread on regression chapter, picking up discussion from gds#24.
Thread to discuss development on chapter on Spatial Feature Engineering.
I tried "docker pull gdsbook/stack", but it resulted in the following: "Using default tag: latest
Error response from daemon: manifest for gdsbook/stack:latest not found: manifest unknown: manifest unknown"
Thank you for your assistance. db
It'd be good to have the book run on Google Colab.
We should come up with a way to refer to figures and equations on the notebook.
jupyter-book supports jekyll-scholar:
https://jupyter.org/jupyter-book/features/citations.html
We should explore whether this works.
It'd be really cool to add an additional raster with non-traditional raster info. I'm thinking the GHSL for population. This would fit well in:
I'm happy to add it myself. As for regions, I was thinking of using somewhere in Latin America, perhaps Sao Paulo (Brazil)?
Thread to coordinate the development of Chapter 7 on Local Autocorrelation. It picks discussion up from gds#62.
On gdsbook.github.io (which lives on geographicdata.science)
And consider turning the Binder logo to B&W