All the Places

A project to generate point of interest (POI) data sourced from websites with 'store location' pages. The project uses scrapy, a popular Python-based web scraping framework, to run individual site spiders that retrieve POI data and publish the results in a standard format. There are various scrapy tutorials on the Internet, and this series on YouTube is a reasonable one.
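
To make "publishing the results in a standard format" more concrete, here is a minimal sketch of mapping one record from a store-locator response to a GeoJSON-style feature. Everything specific in it is an assumption for illustration: the input field names (`storeName`, `lat`, `lng`, `street`, `city`), the `to_feature` helper, and the output property names are hypothetical and are not the project's actual item schema. It uses only the standard library, not scrapy.

```python
import json

# Hypothetical store-locator payload; field names are invented for illustration.
RAW = (
    '{"storeName": "Example Cafe", "lat": "51.5007", "lng": "-0.1246", '
    '"street": "1 Example Road", "city": "London"}'
)


def to_feature(record: dict) -> dict:
    """Map one raw store record to a GeoJSON-style Feature (illustrative schema)."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON orders coordinates as [longitude, latitude].
            "coordinates": [float(record["lng"]), float(record["lat"])],
        },
        "properties": {
            "name": record["storeName"],
            "street": record.get("street"),
            "city": record.get("city"),
        },
    }


if __name__ == "__main__":
    print(json.dumps(to_feature(json.loads(RAW)), indent=2))
```

A real spider would additionally fetch and paginate the source pages; this sketch covers only the normalisation step.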

Getting started

Development setup

Windows users may need to follow some extra steps; please follow the scrapy docs for up-to-date details.

Ubuntu

These instructions were tested with Ubuntu 22.04.1 LTS on 2024-02-21.

  1. Install Python 3 and pip:

    $ sudo apt-get update
    $ sudo apt-get install -y python3 python3-pip python-is-python3
    
  2. Install pyenv and ensure the correct version of Python is available. The following is a summary of the steps; please refer to the pyenv documentation for the most up-to-date instructions.

    $ sudo apt-get install -y build-essential libssl-dev zlib1g-dev \
          libbz2-dev libreadline-dev libsqlite3-dev curl git \
          libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev \
          libffi-dev liblzma-dev
    $ curl https://pyenv.run | bash
    $ echo 'export PATH="$HOME/.pyenv/bin:$PATH"' >> ~/.bashrc
    $ echo 'eval "$(pyenv init --path)"' >> ~/.bashrc
    $ echo 'eval "$(pyenv init -)"' >> ~/.bashrc
    $ exec "$SHELL"
    $ pyenv install 3.11
    
  3. Install pipenv and check that it runs:

    $ pip install --user pipenv
    $ pipenv --version
    pipenv, version 2023.12.1
    
  4. Clone a copy of the project from the All the Places repo (or your own fork if you are considering contributing to the project):

    $ git clone git@github.com:alltheplaces/alltheplaces.git
    
  5. Use pipenv to install the project dependencies:

    $ cd alltheplaces
    $ pipenv sync
    
  6. Test for successful project installation:

    $ pipenv run scrapy
    

    If the above runs without complaint, then you have a functional installation and are ready to run and write spiders.
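
If a step above fails, it can help to confirm which prerequisites are actually visible on your `PATH`. The following is a minimal sketch of such a check, not part of the project. The `check_environment` helper is hypothetical, and the minimum Python version of 3.11 is assumed only from the `pyenv install 3.11` step above.

```python
import shutil
import sys


def check_environment(tools=("git", "pipenv"), min_python=(3, 11)):
    """Return a dict mapping each check name to True (ok) or False (missing)."""
    label = f"python>={min_python[0]}.{min_python[1]}"
    # Compare the running interpreter's (major, minor) against the minimum.
    results = {label: sys.version_info[:2] >= tuple(min_python)}
    for tool in tools:
        # shutil.which returns the executable's path, or None if not on PATH.
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'ok' if ok else 'MISSING':8} {name}")
```

Note that this checks the interpreter running the script, which may differ from the one pipenv selects for the project.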

macOS

These instructions were tested with macOS 14.3.1 on 2024-02-21.

  1. Install Python 3 and pip:

    $ brew install python@3
    
  2. Install pyenv and ensure the correct version of Python is available. The following is a summary of the steps; please refer to the pyenv documentation for the most up-to-date instructions.

    $ brew install pyenv
    $ echo 'eval "$(pyenv init --path)"' >> ~/.zshrc
    $ echo 'eval "$(pyenv init -)"' >> ~/.zshrc
    $ exec "$SHELL"
    $ pyenv install 3.11
    
  3. Install pipenv and check that it runs:

    $ brew install pipenv
    $ pipenv --version
    pipenv, version 2023.12.1
    
  4. Clone a copy of the project from the All the Places repo (or your own fork if you are considering contributing to the project):

    $ git clone git@github.com:alltheplaces/alltheplaces.git
    
  5. Use pipenv to install the project dependencies:

    $ cd alltheplaces
    $ pipenv sync
    
  6. Test for successful project installation:

    $ pipenv run scrapy
    

    If the above runs without complaint, then you have a functional installation and are ready to run and write spiders.

Codespaces

You can use GitHub Codespaces to run the project. This is a cloud-based development environment that is created from the project's repository and includes a pre-configured environment with all the tools you need to develop the project. To use Codespaces, click the button below:

Open in GitHub Codespaces

Docker

You can use Docker to run the project. This is a container-based development environment that is created from the project's repository and includes a pre-configured environment with all the tools you need to develop the project.

  1. Clone a copy of the project from the All the Places repo (or your own fork if you are considering contributing to the project):

    $ git clone git@github.com:alltheplaces/alltheplaces.git
    
  2. Build the Docker image:

    $ cd alltheplaces
    $ docker build -t alltheplaces .
    
  3. Run the Docker container:

    $ docker run -it alltheplaces
    

Contributing code

Many of the sites provide their data in a standard format. Others export their data via simple APIs. We have a number of guides to help you develop spiders.

The weekly run

The output from running the project is published on a regular cadence to our website: alltheplaces.xyz. You should not run all the spiders yourself just to obtain the output: the less the project "bothers" a website, the more we will be tolerated.
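
If you only need the data, working from a downloaded copy of the published output avoids contacting the source websites at all. The following is a minimal sketch of summarising such a file. It assumes the output is a GeoJSON `FeatureCollection` and that features carry a `brand` property; both are assumptions made here for illustration, and the sample data is invented and inlined rather than read from a real download.

```python
import json
from collections import Counter


def count_by_property(feature_collection: dict, key: str) -> Counter:
    """Count features by the value of one property; missing values count as None."""
    return Counter(
        # `or {}` guards against features whose "properties" member is null.
        (feature.get("properties") or {}).get(key)
        for feature in feature_collection.get("features", [])
    )


# Tiny inline stand-in for a downloaded output file (invented data).
SAMPLE = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
   "properties": {"brand": "Example Cafe"}},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [1.0, 1.0]},
   "properties": {"brand": "Example Cafe"}},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [2.0, 2.0]},
   "properties": {"brand": "Sample Fuel"}},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [3.0, 3.0]},
   "properties": null}
]}
""")

if __name__ == "__main__":
    for brand, total in count_by_property(SAMPLE, "brand").most_common():
        print(f"{total:5d}  {brand}")
```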

Contact us

Communication is primarily through tickets on the project GitHub issue tracker. Many contributors are also present on OSM US Slack; in particular, we watch the #poi channel.

License

The data generated by our spiders is provided on our website and released under Creative Commons’ CC-0 waiver.

The spider software that produces this data (this repository) is licensed under the MIT license.

alltheplaces.xyz's People

Contributors

brawer, cj-malone, davidhicks, dependabot[bot], iandees, jleedev, matkoniecz, mjoe999, tif-calin

alltheplaces.xyz's Issues

CSS preventing access to link

On the main page of the website https://www.alltheplaces.xyz/ I am unable to click the "alltheplaces" link in the bottom left because the :hover CSS causes the link to move before it can be clicked:

alltheplaces-hover

Apparently there is this CSS rule:

a:hover, a :focus {
    color: #069;
    font-weight: bold;
}

and the change to bold forces the line to wrap. Replacing the font-weight: bold with something like text-decoration: underline should solve the problem. I couldn't find this CSS in the repo -- I guess it is part of jekyll-theme-minimal?

Consider building dataset compared with OpenStreetMap

Of potential interest:

  • listing objects without nearby OSM match, revealing locations potentially worth surveying
  • listing OSM objects tagged with a brand that is present in ATP but that have no nearby ATP match

and publishing both as a map.

Both can end up detecting bad data in ATP rather than missing or outdated shops in OSM. The result would definitely not be usable for direct import, but I think it would be of interest and useful for further reprocessing.

I may be able to implement something like this if there is interest in running and publishing it.
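
The two listings proposed above both reduce to "items in one dataset with no same-brand item nearby in the other". The following is a minimal sketch of that idea, not an existing implementation. The record layout (`brand`, `lat`, `lon`), the 100 m threshold, the `unmatched` helper, and the sample points are all assumptions for illustration, and the brute-force comparison would need a spatial index to scale to real datasets.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def unmatched(source, target, max_distance_m=100.0):
    """Return items in `source` with no same-brand item in `target` within range."""
    return [
        s
        for s in source
        if not any(
            s["brand"] == t["brand"]
            and haversine_m(s["lat"], s["lon"], t["lat"], t["lon"]) <= max_distance_m
            for t in target
        )
    ]


# Invented sample data: the first entries are roughly 34 m apart, the others far away.
ATP = [
    {"brand": "Example Cafe", "lat": 51.5000, "lon": -0.1200},
    {"brand": "Example Cafe", "lat": 52.0000, "lon": 0.0000},
]
OSM = [
    {"brand": "Example Cafe", "lat": 51.5003, "lon": -0.1201},
    {"brand": "Example Cafe", "lat": 53.0000, "lon": 1.0000},
]

if __name__ == "__main__":
    print("ATP without OSM match:", unmatched(ATP, OSM))
    print("OSM without ATP match:", unmatched(OSM, ATP))
```

Calling `unmatched(ATP, OSM)` corresponds to the first listing (locations potentially worth surveying) and `unmatched(OSM, ATP)` to the second.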

Add links to the logs

Currently, getting to the logs is a pain: as far as I know, the only way is to already know the URL. Adding a link to them would help (maybe in spiders.html, on the features count?).
