edwardbetts / osm-wikidata

Match OSM entities with Wikidata items

Home Page: http://osm.wikidata.link/

License: GNU General Public License v3.0

Python 82.07% HTML 14.29% CSS 0.56% JavaScript 3.08%
wikidata openstreetmap

Introduction

This tool is running live using the Wikidata SPARQL query service and the OSM Overpass and Nominatim APIs. It works best with a city, island or administrative area.

The index page displays a list of existing matches and a search box. If you pick an existing result you'll see the Wikidata items within the given area. For each item there is a list of candidate matching OSM items.

Matching process interface

The matching process takes a few minutes. It downloads potential OSM objects from Overpass and loads them into PostgreSQL with osm2pgsql. The interface provides status updates while the matching process is running.

A large area will trigger an Overpass timeout and the match will fail.

Once the matching process is complete a 'view match candidates' link will appear. The matching is based on names and English Wikipedia categories.

One-to-one between OSM and Wikidata

With this system the aim is for a single OSM entity to link to one Wikidata item. Many geographical entities are represented by multiple objects in OSM with the same name. For example, a dual carriageway bridge is mapped as two roadways, and a large site like a hospital contains several buildings.

For bridges with two roadways the system will look for the outline of the bridge tagged with man_made=bridge. For a hospital or other campus the aim is to tag the way or relation that represents the entire site.

English language Wikipedia categories and Wikidata are used for matching

The matching system makes use of categories on Wikipedia because the information on Wikidata is incomplete. Wikidata includes an import of all of Wikipedia, including the coordinates, but for a lot of items the 'instance of' property is not set.

There is a mapping from Wikipedia category to possible OSM tags. For example if a Wikipedia article has the word 'station' in any category then railway=station is added to the list of possible tags.

Name matching

Each Wikidata name is compared with each name in OSM. Wikidata names are pulled from the item name, aliases and sitelinks in every language. Most name fields on the OSM side are considered; old_name and a few others are ignored.

The name comparison includes some normalisation, such as removing punctuation.
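
A minimal sketch of such a comparison follows. Removing punctuation is the only normalisation step named above; lower-casing and collapsing whitespace are assumptions added for the example, and the function names are hypothetical. Note that string.punctuation covers ASCII punctuation only.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalise(name):
    """Remove punctuation, lower-case, and collapse runs of whitespace."""
    return " ".join(name.translate(_PUNCT_TABLE).lower().split())


def names_match(wikidata_names, osm_names):
    """Return True if any OSM name equals any Wikidata name after normalising."""
    wanted = {normalise(n) for n in wikidata_names}
    return any(normalise(n) in wanted for n in osm_names)
```

With this sketch, "St. Mary's Church" and "St Marys Church" normalise to the same string and are treated as a match.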

Existing OSM tags for Wikidata and Wikipedia

Existing Wikipedia tags are ignored. If an OSM entity already has a Wikidata tag it will be left alone.

Items that are unlikely to match

  • Radio stations: not mapped on OpenStreetMap
  • Streets: the matcher ignores streets to minimise the Overpass data download
  • Sites of Special Scientific Interest: not tagged as such on OSM
  • Rivers: river relations are not multipolygons so osm2pgsql ignores them
  • Things that no longer exist: Wikidata items that existed in the past
  • Organisations: small organisations within office buildings are not in OSM
  • Events and festivals: not part of OSM

Development

osm-wikidata's People

Contributors

antoine-g, camelcasenick, cclauss, edwardbetts, iknowjoseph, matkoniecz, tilmanb, todrobbins

osm-wikidata's Issues

Include maps

A long list of results is dull. It would be better to show them on a map, maybe with the list next to it.

Allow aborted match to be resumed

Add a boolean field called checked to item, defaulting to false and set to true once the item has been checked. Update the checked field in the database after each individual match.

On restart the matcher grabs the count of checked items and the list of unchecked items.
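
The idea could be sketched as below, using an in-memory list in place of the database. The Item class, run_matcher and the match_one callback are hypothetical names for illustration; a real implementation would commit the checked flag to the database after each item.

```python
from dataclasses import dataclass


@dataclass
class Item:
    qid: str
    checked: bool = False  # set to True once this item has been matched


def run_matcher(items, match_one):
    """Match every unchecked item, returning the number of checked items."""
    unchecked = [item for item in items if not item.checked]
    done = len(items) - len(unchecked)  # progress carried over from earlier runs
    for item in unchecked:
        match_one(item)
        item.checked = True  # with a database, commit here so an abort loses nothing
        done += 1
    return done
```

If match_one raises part way through, the items already processed keep checked set to true, so calling run_matcher again only visits the remaining items.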

Show login page when adding tags

Users need to be logged in via OAuth to add Wikidata tags to OSM.

We should show users the login page with a redirect back to the add tags page.

Drop back to name match for individual items

The matching for individual items doesn't need to be so precise because it is being scrutinised by the user.
If matching by tag fails we can search for OSM elements in the search radius with a name that exactly matches a name from Wikidata.

We need to decide whether this is one or two Overpass queries.

Add Wikipedia categories size to database

We can crawl Wikipedia and find the size of categories, it would be useful to show this information on the criteria page. To make these numbers accessible they need to be in the database.

Add a URL to lookup an individual Wikidata item

URLs like this: https://osm.wikidata.link/Q9427

First use Overpass to check if the item is already tagged on OSM. If so, show the matching item.
If the item isn't tagged then try to find a match. Grab relevant data from Wikidata, then use Overpass to look for a matching item on OSM. Show the possible match and offer to update the item with OAuth.

Would be nice to include a map.

Interface with taginfo API

Extract number of occurrences of each OSM tag/key. Display on matching criteria page.

Taginfo images are available via the taginfo page. They should be included in the matching criteria cards.

Match name to initials

"Cambridge Centre for Sixth-form Studies" should match "CCSS".

Start with a unit test.
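
One possible starting point is sketched below. The rule of taking the first letter of each capitalised word (so "for" is skipped and "Sixth-form" contributes a single S) is an assumption chosen to satisfy the example above, and the function names are hypothetical.

```python
def initials(name):
    """First letter of each whitespace-separated word that starts upper-case."""
    return "".join(word[0] for word in name.split() if word[0].isupper())


def matches_initials(full_name, short_name):
    """True if short_name is the initialism of full_name (dots are ignored)."""
    short = short_name.replace(".", "").upper()
    # Require at least two letters so a single initial never counts as a match.
    return len(short) > 1 and initials(full_name) == short


def test_name_matches_initials():
    assert matches_initials("Cambridge Centre for Sixth-form Studies", "CCSS")
    assert not matches_initials("Cambridge Centre for Sixth-form Studies", "CCS")
```

The test function follows the pytest naming convention, so it would be collected if this were saved in a test module.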

Two step overpass idea

First search for all items with Wikidata tags in the selected area. Check if they're a good match and exclude them from the second Overpass query.

This might allow us to attempt bigger Overpass areas if they already have good Wikidata tag coverage.
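
The bookkeeping between the two queries could look roughly like this. It assumes the first query has already returned OSM elements as dictionaries with a tags mapping, and that is_good_match is some caller-supplied check; both the function name and the data shapes are hypothetical, and no Overpass request is made here.

```python
def qids_for_second_query(area_qids, tagged_elements, is_good_match):
    """Return the Wikidata QIDs that still need candidates from Overpass."""
    already_matched = {
        element["tags"]["wikidata"]
        for element in tagged_elements
        if is_good_match(element)  # poorly matched elements stay in the search
    }
    return set(area_qids) - already_matched
```

The smaller the returned set, the less data the second query has to download, which is what would make larger areas feasible.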

TypeError when opening candidates result page

I wanted to look at the results for Berlin, Germany:

https://osm.wikidata.link/filtered/Berlin/candidates/62422

I get a traceback:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/lib/python3/dist-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/lib/python3/dist-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/lib/python3/dist-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python3/dist-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/lib/python3/dist-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/lib/python3/dist-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python3/dist-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/edward/src/2017/osm-wikidata/matcher/view.py", line 105, in candidates_with_filter
    return candidates(osm_id)
  File "/home/edward/src/2017/osm-wikidata/matcher/view.py", line 167, in candidates
    candidates=items)
  File "/usr/lib/python3/dist-packages/flask/templating.py", line 134, in render_template
    context, ctx.app)
  File "/usr/lib/python3/dist-packages/flask/templating.py", line 116, in _render
    rv = template.render(context)
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "/home/edward/src/2017/osm-wikidata/matcher/templates/candidates.html", line 1, in top-level template code
    {% extends "base.html" %}
  File "/home/edward/src/2017/osm-wikidata/matcher/templates/base.html", line 17, in top-level template code
    {% block content %}{% endblock %}
  File "/home/edward/src/2017/osm-wikidata/matcher/templates/candidates.html", line 64, in block "content"
    {% set m = c.get_match() %}
  File "/home/edward/src/2017/osm-wikidata/matcher/model.py", line 341, in get_match
    return match.check_for_match(self.tags, self.item, endings, wikidata_names)
TypeError: check_for_match() takes from 2 to 3 positional arguments but 4 were given
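
The message says the callee accepts two required positional parameters plus one optional one, while model.py passes four. A minimal reproduction of that class of error is below. The parameter names are assumptions, since the actual signature of match.check_for_match is not shown in the traceback.

```python
def check_for_match(osm_tags, wikidata_names, endings=None):
    """Stand-in with a hypothetical signature: 2 required + 1 optional argument."""
    return {}


def call_with_four_arguments():
    """Mimic the failing call and return the resulting error message."""
    try:
        check_for_match({}, object(), set(), [])  # four positional arguments
    except TypeError as exc:
        return str(exc)
    return None
```

Either the caller in model.py or the function definition in match.py would need updating so that both agree on the number of arguments.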

Individual match should trim endings

Matching railway stations with the item page or API fails because we're not trimming " railway station" from the name. We should use the trim data from the entity criteria.
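
A sketch of the trimming step is shown here. It assumes the entity criteria supply a list of endings such as "railway station"; the function name and the longest-first, case-insensitive behaviour are choices made for the example rather than the project's actual code.

```python
def trim_endings(name, endings):
    """Strip the longest matching ending (case-insensitive) from a name."""
    lowered = name.lower()
    # Try longer endings first so "railway station" wins over "station".
    for ending in sorted(endings, key=len, reverse=True):
        suffix = " " + ending.lower()
        if lowered.endswith(suffix):
            # Assumes lower-casing does not change the string length,
            # which holds for ASCII names.
            return name[: -len(suffix)].rstrip()
    return name
```

For example, trim_endings("Cambridge railway station", ["station", "railway station"]) gives "Cambridge", which can then be compared with the OSM name.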

osm2pgsql doesn't load type=waterway relations

Trying to match rivers by looking for relations with type=waterway doesn't work because osm2pgsql ignores them. We either need to switch away from osm2pgsql or find a way to make it load linear (non-polygon) relations.

Filter for public_transport=station

Yser metro station (Q2715354)
name:nl=IJzer (node, 348 m) railway=tram_stop, public_transport=platform
name:nl=IJzer (way) railway=station, public_transport=station
name:nl=IJzer (node, 324 m) railway=tram_stop, public_transport=platform

The filter should match only the element tagged public_transport=station and ignore the elements tagged public_transport=platform.
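
A sketch of such a filter follows, using the three candidates listed above. The dictionary layout and the function name are assumptions, as is the decision to fall back to the full candidate list when nothing is tagged public_transport=station.

```python
def filter_station(candidates):
    """Prefer elements tagged public_transport=station over platforms."""
    stations = [
        c for c in candidates
        if c["tags"].get("public_transport") == "station"
    ]
    # Assumption: if no element is tagged as a station, keep every candidate.
    return stations or candidates


yser_candidates = [
    {"type": "node", "tags": {"name:nl": "IJzer", "railway": "tram_stop",
                              "public_transport": "platform"}},
    {"type": "way", "tags": {"name:nl": "IJzer", "railway": "station",
                             "public_transport": "station"}},
    {"type": "node", "tags": {"name:nl": "IJzer", "railway": "tram_stop",
                              "public_transport": "platform"}},
]
```

Applied to yser_candidates, the filter leaves only the way, which is the element that should receive the Wikidata tag.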
