
pelias's Introduction

A modular, open-source search engine for our world.

Pelias is a geocoder powered completely by open data, available freely to everyone.

Local Installation · Cloud Webservice · Documentation · Community Chat

What is Pelias?
Pelias is a search engine for places worldwide, powered by open data. It turns addresses and place names into geographic coordinates, and turns geographic coordinates into places and addresses. With Pelias, you’re able to turn your users’ place searches into actionable geodata and transform your geodata into real places.

We think open data, open source, and open strategy win over proprietary solutions at any part of the stack and we want to ensure the services we offer are in line with that vision. We believe that an open geocoder improves over the long-term only if the community can incorporate truly representative local knowledge.

Pelias

A modular, open-source geocoder built on top of Elasticsearch for fast and accurate global search.

What's a geocoder do anyway?

Geocoding is the process of taking input text, such as an address or the name of a place, and returning a latitude/longitude location on the Earth's surface for that place.
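
For example, a running Pelias instance answers forward geocoding queries on its /v1/search endpoint. A minimal sketch in Node.js, assuming a local instance listening on port 4000 (the default in the Docker setup) and a fetch-capable runtime:

const params = new URLSearchParams({ text: '30 W 26th St, New York, NY', size: '1' });

fetch(`http://localhost:4000/v1/search?${params}`)
  .then((res) => res.json())
  .then((geojson) => {
    // Pelias responds with a GeoJSON FeatureCollection; each feature has a
    // point geometry plus properties such as a human-readable label.
    const [best] = geojson.features;
    console.log(best.properties.label, best.geometry.coordinates);
  });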


... and a reverse geocoder, what's that?

Reverse geocoding is the opposite: returning a list of places near a given latitude/longitude point.


What are the most interesting features of Pelias?

  • Completely open-source and MIT licensed
  • A powerful data import architecture: Pelias supports many open-data projects out of the box but also works great with private data
  • Support for searching and displaying results in many languages
  • Fast and accurate autocomplete for user-facing geocoding
  • Support for many result types: addresses, venues, cities, countries, and more
  • Modular design, so you don't need to be an expert in everything to make changes
  • Easy installation with minimal external dependencies

What are the main goals of the Pelias project?

  • Provide accurate search results
  • Work equally well for a small city and the entire planet
  • Be highly configurable, so different use cases can be handled easily and efficiently
  • Provide a friendly, welcoming, helpful community that takes input from people all over the world

Where did Pelias come from?

Pelias was created in 2014 as an early project at Mapzen. After Mapzen's shutdown in 2017, Pelias became a project of the Linux Foundation.

How does it work?

Magic! (Just kidding) Like any geocoder, Pelias combines full text search techniques with knowledge of geography to quickly search over many millions of records, each representing some sort of location on Earth.

The Pelias architecture has three main components and several smaller pieces.

A diagram of the Pelias architecture.

Data importers

The importers filter, normalize, and ingest geographic datasets into the Pelias database. Currently there are six officially supported importers: OpenStreetMap, OpenAddresses, Who's on First, Geonames, the polylines importer (for street geometries derived from OSM), and a CSV importer.

We are always discussing supporting additional datasets. Pelias users can also write their own importers, for example to import proprietary data into their own instance of Pelias.

Database

The underlying datastore that does most of the query heavy-lifting and powers our search results. We use Elasticsearch. Currently versions 7 and 8 are supported.

We've built a tool called pelias-schema that sets up Elasticsearch indices properly for Pelias.

Frontend services

This is where the actual geocoding process happens, and includes the components that users interact with when performing geocoding queries. The services are:

  • API: The API service defines the Pelias API, and talks to Elasticsearch or other services as needed to perform queries.
  • Placeholder: A service built specifically to capture the relationships between administrative areas (a catch-all term for anything like a city, state, or country). Elasticsearch does not handle relational data very well, so we built Placeholder specifically to manage this piece.
  • PIP: For reverse geocoding, it's important to be able to perform point-in-polygon (PIP) calculations quickly. The PIP service is very good at quickly determining which admin-area polygons a given point lies in.
  • Libpostal: Pelias uses the libpostal project for parsing addresses using the power of machine learning. We use a Go service built by the Who's on First team to make this happen quickly and efficiently.
  • Interpolation: This service knows all about addresses and streets. With that knowledge, it is able to supplement the known addresses that are stored directly in Elasticsearch and return fairly accurate estimated address results for many more queries than would otherwise be possible.
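
The API locates these services through pelias.json. As a rough illustration only (the key names and ports below mirror common example configurations and are not authoritative), the relevant fragment might look like:

{
  "services": {
    "libpostal": { "url": "http://localhost:4400" },
    "placeholder": { "url": "http://localhost:4100" },
    "pip": { "url": "http://localhost:4200" },
    "interpolation": { "url": "http://localhost:4300" }
  }
}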

Dependencies

These are software projects that are not used directly but are used by other components of Pelias.

There are lots of these, but here are some important ones:

  • model: provides a single library for creating documents that fit the Pelias Elasticsearch schema. This is a core component of our flexible importer architecture (a short usage sketch follows this list).
  • wof-admin-lookup: A library for performing administrative lookup using point-in-polygon math. Previously included in each of the importers, but now only used by the PIP service.
  • query: This is where most of our actual Elasticsearch query generation happens.
  • config: Pelias is very configurable, and all of it is driven from a single JSON file which we call pelias.json. This package provides a library for reading, validating, and working with this configuration. It is used by almost every other Pelias component.
  • dbclient: A Node.js stream library for quickly and efficiently importing records into Elasticsearch.
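
As a small illustration of how these libraries fit together, here is a sketch of an importer-style snippet (method names follow the pelias-config and pelias-model READMEs; the values are made up):

// read pelias.json, merged with the built-in defaults
const config = require('pelias-config').generate();
console.log(config.esclient); // Elasticsearch connection settings

// build a document that matches the Pelias Elasticsearch schema
const Document = require('pelias-model').Document;

const doc = new Document('example-source', 'venue', 'example-id-1')
  .setName('default', 'Example Cafe')
  .setCentroid({ lon: -73.99, lat: 40.743 });

// importers stream documents like this through pelias-dbclient into Elasticsearch
console.log(doc.toESDocument());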

Helpful tools

Finally, while not part of Pelias proper, we have built several useful tools for working with and testing Pelias.

Notable examples include:

  • acceptance-tests: A Node.js command line tool for testing a full planet build of Pelias and ensuring everything works. Familiarity with this tool is very important for ensuring Pelias is working. It supports all Pelias features and has special facilities for testing autocomplete queries.
  • compare: A web-based tool for comparing different instances of Pelias (for example a production and staging environment). We have a reference instance at pelias.github.io/compare/
  • dashboard: Another web-based tool for providing statistics about the contents of a Pelias Elasticsearch index such as import speed, number of total records, and a breakdown of records of various types.

Documentation

The main documentation lives in the pelias/documentation repository.

Additionally, the README file in each of the component repositories listed above provides more detail on that piece.

Here's an example API response for a reverse geocoding query:
$ curl -s "search.mapzen.com/v1/reverse?size=1&point.lat=40.74358294846026&point.lon=-73.99047374725342&api_key={YOUR_API_KEY}" | json
{
    "geocoding": {
        "attribution": "https://search.mapzen.com/v1/attribution",
        "engine": {
            "author": "Mapzen",
            "name": "Pelias",
            "version": "1.0"
        },
        "query": {
            "boundary.circle.lat": 40.74358294846026,
            "boundary.circle.lon": -73.99047374725342,
            "boundary.circle.radius": 500,
            "point.lat": 40.74358294846026,
            "point.lon": -73.99047374725342,
            "private": false,
            "querySize": 1,
            "size": 1
        },
        "timestamp": 1460736907438,
        "version": "0.1"
    },
    "type": "FeatureCollection",
    "features": [
        {
            "geometry": {
                "coordinates": [
                    -73.99051,
                    40.74361
                ],
                "type": "Point"
            },
            "properties": {
                "borough": "Manhattan",
                "borough_gid": "whosonfirst:borough:421205771",
                "confidence": 0.9,
                "country": "United States",
                "country_a": "USA",
                "country_gid": "whosonfirst:country:85633793",
                "county": "New York County",
                "county_gid": "whosonfirst:county:102081863",
                "distance": 0.004,
                "gid": "geonames:venue:9851011",
                "id": "9851011",
                "label": "Arlington, Manhattan, NY, USA",
                "layer": "venue",
                "locality": "New York",
                "locality_gid": "whosonfirst:locality:85977539",
                "name": "Arlington",
                "neighbourhood": "Flatiron District",
                "neighbourhood_gid": "whosonfirst:neighbourhood:85869245",
                "region": "New York",
                "region_a": "NY",
                "region_gid": "whosonfirst:region:85688543",
                "source": "geonames"
            },
            "type": "Feature"
        }
    ],
    "bbox": [
        -73.99051,
        40.74361,
        -73.99051,
        40.74361
    ]
}

How can I install my own instance of Pelias?

To try out Pelias quickly, use our Docker setup. It uses Docker and docker-compose to allow you to quickly set up a Pelias instance for a small area (by default Portland, Oregon) in under 30 minutes.

Do you offer a free geocoding API?

You can sign up for a trial API key at Geocode Earth. A commercial service has been operated by the core development team behind Pelias since 2014 (previously at search.mapzen.com). Discounts and free plans are available for free and open-source software projects.

What's it built with?

Pelias itself (the import pipelines and API) is written in Node.js, which makes it highly accessible for other developers and performant under heavy I/O. It aims to be modular and is distributed across a number of Node packages, each with its own repository under the Pelias GitHub organization.

For a select few components that have performance requirements that Node.js cannot meet, we prefer to write things in Go. A good example of this is the pbf2json tool that quickly converts OSM PBF files to JSON for our OSM importer.

Elasticsearch is our datastore of choice because of its unparalleled full text search functionality, scalability, and sufficiently robust geospatial support.

Contributing


We built Pelias as an open source project not just because we believe that users should be able to view and play with the source code of tools they use, but to get the community involved in the project itself.

Especially for a geocoder with global coverage, it's just not possible for a small team to do it alone. We need you.

Anything that we can do to make contributing easier, we want to know about. Feel free to reach out to us via Github, Gitter, email, or Twitter. We'd love to help people get started working on Pelias, especially if you're new to open source or programming in general.

We have a list of Good First Issues for new contributors.

Both this meta-repo and the API service repo are worth looking at, as they're where most issues live. We also welcome reporting issues or suggesting improvements to our documentation.

The current Pelias team can be found on Github as missinglink and orangejulius.

Members emeritus include:

pelias's People

Contributors

avulfson17, bradh, defozo, dianashk, easherma, echelon9, heffergm, hkrishna, jayaddison, kathleenld, matkoniecz, meetar, michaelkirk, migurski, missinglink, oliverbienert, orangejulius, riordan, rmglennon, sevko, stephenlacy, tigerlily-he, tobiasdierich, trescube


pelias's Issues

Agree on a consistent unit testing framework to use in all pelias repos

As discussed in a recent chat, it is important to agree on a single framework to be used for unit testing across the pelias organization. This will ensure consistency and cohesion.
We are currently predominantly using tape, but there is no strong preference for tape. I prefer mocha. I've compiled a short list of my reasons for liking it, and an even shorter list of my reasons for not liking it. Also, I like using mocha with should, which reads a lot like natural language and has extensive assertions built in.
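
For comparison, here is the same trivial test written both ways (illustrative snippets only, not taken from any pelias repo):

// tape
var test = require('tape');
test('reverse geocode', function (t) {
  t.equal(1 + 1, 2, 'adds up');
  t.end();
});

// mocha + should
require('should');
describe('reverse geocode', function () {
  it('adds up', function () {
    (1 + 1).should.equal(2);
  });
});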

Pros:

  1. skip/only are great tools during development

    • skip makes the tests show up as pending so you don't forget to come back to them. You can skip entire suites or single tests:

      describe.skip('some test', function () { ... });

      describe('another test', function () {
        it.skip('should work', function (done) { ... });
      });

    • you can write unimplemented tests, which are effective placeholders:

      describe('another test', function () {
        it('should work in the future');
      });

    • only lets you run a single test while debugging an issue, with no need to comment out all other test cases
  2. before/beforeEach/after/afterEach can be nested at various levels

  3. can run tests matching some regex pattern, or the inverse of the regex match results

  4. lots of built-in/plug-in reporters

  5. easy hookup to coverage tools (if we decide to go that route)

  6. very well supported and embraced by the community. A lot of open source projects use it, so contributors would feel comfortable adding tests. Searching GitHub:

    • dependencies mocha extension:.json: 136,726
    • dependencies tape extension:.json: 10,808

Cons:

  1. have to add globals to .jshintrc because there is some magic that happens with both mocha and should
  2. need to rewrite existing tests for consistency

@missinglink @hkrishna @sevko opinions please

Expose bounding boxes in results where appropriate

When using a geocoder to jump to a particular location, bounding boxes are useful to appropriately set the viewport's zoom level. Assuming that Pelias stores them in ES, would it be possible to expose them in the API results?

Nominatim exposes boundingbox, e.g. http://nominatim.openstreetmap.org/search?q=Brooklyn&format=json:

{
  "place_id": "5988439137",
  "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright",
  "osm_type": "node",
  "osm_id": "158857828",
  "boundingbox": [
    "40.6501007080078",
    "40.6501045227051",
    "-73.9495849609375",
    "-73.949577331543"
  ],
  "lat": "40.6501038",
  "lon": "-73.9495823",
  "display_name": "Brooklyn, Downtown Brooklyn, Kings County, New York City, New York, United States of America",
  "class": "place",
  "type": "suburb",
  "importance": 0.79454442710904,
  "icon": "http://nominatim.openstreetmap.org/images/mapicons/poi_place_village.p.20.png"
}

In Pelias' case, it would probably make sense to expose the bounding box in addition to the point geometry. GeoJSON's bbox looks appropriate for this.
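
For illustration, a Pelias feature for Brooklyn could then carry a GeoJSON bbox member ([west, south, east, north]) alongside its point geometry (the values below are made up):

{
  "type": "Feature",
  "bbox": [-74.042, 40.566, -73.833, 40.739],
  "geometry": { "type": "Point", "coordinates": [-73.9496, 40.6501] },
  "properties": { "name": "Brooklyn", "layer": "locality" }
}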

standardize logging

We should discuss a uniform logging solution:

  1. what logger to use: we've basically already settled on using winston, so we'll stick with that unless there's a good reason not to.

  2. default logger configuration: what default settings do we want to instantiate our logger with? I've been using the following in openaddresses and dbclient:

    winston.remove( winston.transports.Console );
    winston.add( winston.transports.Console, {
      timestamp: true,
      colorize: true,
      level: 'verbose'
    });
    

    Is it worth spinning up a Pelias package containing our logger preferences, so that we can simply require( 'pelias-logger' ) and not duplicate them everywhere? (A rough sketch of such a package follows this list.)

  3. environment-specific logger configuration: we'll presumably want to be able to tweak the defaults from pelias-config.
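
A rough sketch of what such a shared package could export (hypothetical module, reusing the winston defaults above):

var winston = require('winston');

// apply our preferred defaults once, in one place
winston.remove(winston.transports.Console);
winston.add(winston.transports.Console, {
  timestamp: true,
  colorize: true,
  level: process.env.PELIAS_LOG_LEVEL || 'verbose'
});

module.exports = winston;

Each repo could then do var logger = require('pelias-logger'); instead of repeating the configuration.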

cc @dianashk , @hkrishna , @missinglink

investigate postal_code polygons

We should explore whether zip codes are valuable to index. They currently aren't accounted for in our Elasticsearch pipeline, and are mostly discarded by import scripts.

openstreetmap pipeline improvements

  • implement admin-lookup
  • remove dependency on having quattroshapes indexed before running
  • implement pelias-model
  • remove osm_types.js
  • general clean up
  • extract address data and store in ES
  • inconsistent ways count from subsequent import runs
  • fix stats.js module via unit tests, maybe extract to separate module
  • upgrade to the latest suggester-pipeline module
  • upgrade all dependencies
  • fix nodejs version inconsistencies to work with both 0.10 and 0.12
  • improve global module.exports, add tests.
  • configure code linting and precommit hook, lint everything.
  • make osm data mappers clearer and easier to write/modify
  • simplify features.js
  • add a unit test for every module
  • add detailed comments at the top of complex streams to explain their purpose
  • add doc.setAddress() and doc.getAddress() functions to pelias/model
  • add end-to-end system test
  • better documentation of pipeline process
  • merge experimental branch, publish and bump major version

[?] add regression tests to cover mappings, including the new address mapper

Hierarchy lookup should also provide a score

When we import geonames, osm nodes, ways etc., we look up which admin boundaries each point (lat/lon) belongs to and populate a document object. I think this lookup should return all admin info (admin0, admin1, neighborhood, locality, alpha3 etc) and a score (based on population, popularity, and category scores of individual admin types).

This way, when we search for 123 main st, "123 main st, new york, ny" gets a higher score than "123 main st, lynxville, wi".
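
Roughly, the lookup could return something like this for a point in Manhattan (field names and the scoring formula are illustrative only):

{
  admin0: 'United States',
  admin1: 'New York',
  admin2: 'New York County',
  locality: 'New York',
  neighborhood: 'Flatiron District',
  alpha3: 'USA',
  score: 0.92 // e.g. a weighted blend of population, popularity and per-admin-type weights
}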

Add vagrant info to readme

Add information about vagrant installs and a link to that repo in the README file in this repository.

document code with JSDoc-style block comments

I'm a fan of JSDoc-style documentation comments, since it's usually helpful to comprehensively document your module/function APIs (while the implementation itself should be self-documented, unless you're doing something quite clever). We won't be using the jsdoc utility to actually generate documentation, but I think it's a decent format to adhere to. I only typically use @param and @return tags, like in this verbose example:

/**
 * Import all OpenAddresses CSV files in a directory into Pelias elasticsearch.
 *
 * @param {string} dirPath The path to a directory. All *.csv files inside of
 *    it will be read and imported (they're assumed to contain OpenAddresses
 *    data).
 * @param {object} opts Options to configure the import. Supports the following
 *    keys:
 *
 *      deduplicate: Pass address object through `address-deduplicator-stream`
 *        to perform deduplication. See the documentation:
 *        https://github.com/pelias/address-deduplicator-stream
 *
 *      admin-values: Add admin values to each address object (since
 *        OpenAddresses doesn't contain any) using `hierarchy-lookup`. See the
 *        documentation: https://github.com/pelias/hierarchy-lookup
 */

I'll eschew it for short/obvious functions.

Thoughts from @pelias/contributors ?

problem with import

I'm trying to import data for Europe but fail at the first step. Postgres is 9.3 with PostGIS 2.1.1 (shp2pgsql is the same version). The encoding is UTF-8 and osm2pgsql data is already imported in that database. Any ideas what might cause the issue?

bundle exec rake quattroshapes:prepare_all
....
invalid command \N
invalid command \N
invalid command \.
ERROR:  syntax error at or near "AUS"
LINE 1: AUS Australia adm2 AU Australia 
        ^
ROLLBACK
rm /tmp/mapzen/qs_adm2*
wget http://static.quattroshapes.com/qs_localadmin.zip -P /tmp/mapzen

character encoding problem - special letters ( admin0 , admin1, ... )

I see some character encoding problems
test case: http://mapzen.com/pelias ; search: kamut

result :

Kamut (locality)
Kamut, Békés
admin0: Magyarország
admin1: Békés
locality: Kamut


But the Correct:

Kamut (locality)
Kamut, Békés
admin0: Magyarország
admin1: Békés
locality: Kamut


Kamut: ( http://en.wikipedia.org/wiki/Kamut,_Hungary )
Kamut is a village in Békés County, in the Southern Great Plain region of south-east Hungary.

Failure to create ES index?

Hi,

Thx for building and releasing this!

I'm having a bit of trouble creating the indices using bundle exec rake index:create.

I've cloned the master branch into a 12.04 lxc container with PostGIS 2 up and running. Also have ES 1.1.0 running on the default port.
I installed Ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-linux] and changed the version of debugger from 1.6.5 to 1.6.6 (after which bundle succeeded).

Any ideas on where to look? I'm not very familiar with Ruby...

Thx!

Output from Ruby (and ES below):

vagrant@vagrant-base-precise-amd64:/vagrant/tmp/pelias$ bundle exec rake index:create --trace
** Invoke index:create (first_time)
** Execute index:create
rake aborted!
[500] {"error":"IndexCreationException[[pelias] failed to create index]; nested: FailedToResolveConfigException[Failed to resolve config path [/vagrant/tmp/pelias/config/synonyms.txt], tried file path [/vagrant/tmp/pelias/config/synonyms.txt], path file [/vagrant/tmp/elasticsearch-1.1.0/config/vagrant/tmp/pelias/config/synonyms.txt], and classpath]; ","status":500}
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/elasticsearch-transport-1.0.1/lib/elasticsearch/transport/transport/base.rb:132:in `__raise_transport_error'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/elasticsearch-transport-1.0.1/lib/elasticsearch/transport/transport/base.rb:227:in `perform_request'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/elasticsearch-transport-1.0.1/lib/elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/elasticsearch-transport-1.0.1/lib/elasticsearch/transport/client.rb:102:in `perform_request'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/elasticsearch-api-1.0.1/lib/elasticsearch/api/namespace/common.rb:21:in `perform_request'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/elasticsearch-api-1.0.1/lib/elasticsearch/api/actions/indices/create.rb:77:in `create'
/vagrant/tmp/pelias/lib/pelias/tasks/index.rake:9:in `block (2 levels) in <top (required)>'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/task.rb:236:in `call'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/task.rb:236:in `block in execute'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/task.rb:231:in `each'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/task.rb:231:in `execute'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/task.rb:175:in `block in invoke_with_call_chain'
/home/vagrant/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/monitor.rb:211:in `mon_synchronize'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/task.rb:168:in `invoke_with_call_chain'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/task.rb:161:in `invoke'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/application.rb:149:in `invoke_task'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/application.rb:106:in `block (2 levels) in top_level'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/application.rb:106:in `each'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/application.rb:106:in `block in top_level'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/application.rb:115:in `run_with_threads'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/application.rb:100:in `top_level'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/application.rb:78:in `block in run'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/application.rb:165:in `standard_exception_handling'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/lib/rake/application.rb:75:in `run'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/rake-10.1.1/bin/rake:33:in `<top (required)>'
/home/vagrant/.rvm/gems/ruby-2.1.1/bin/rake:23:in `load'
/home/vagrant/.rvm/gems/ruby-2.1.1/bin/rake:23:in `<main>'
/home/vagrant/.rvm/gems/ruby-2.1.1/bin/ruby_executable_hooks:15:in `eval'
/home/vagrant/.rvm/gems/ruby-2.1.1/bin/ruby_executable_hooks:15:in `<main>'
Tasks: TOP => index:create

vagrant@vagrant-base-precise-amd64:/vagrant/tmp/pelias$ bundle exec rake index:create
rake aborted!
[500] {"error":"IndexCreationException[[pelias] failed to create index]; nested: FailedToResolveConfigException[Failed to resolve config path [/vagrant/tmp/pelias/config/synonyms.txt], tried file path [/vagrant/tmp/pelias/config/synonyms.txt], path file [/vagrant/tmp/elasticsearch-1.1.0/config/vagrant/tmp/pelias/config/synonyms.txt], and classpath]; ","status":500}
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/elasticsearch-transport-1.0.1/lib/elasticsearch/transport/transport/base.rb:132:in `__raise_transport_error'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/elasticsearch-transport-1.0.1/lib/elasticsearch/transport/transport/base.rb:227:in `perform_request'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/elasticsearch-transport-1.0.1/lib/elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/elasticsearch-transport-1.0.1/lib/elasticsearch/transport/client.rb:102:in `perform_request'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/elasticsearch-api-1.0.1/lib/elasticsearch/api/namespace/common.rb:21:in `perform_request'
/home/vagrant/.rvm/gems/ruby-2.1.1/gems/elasticsearch-api-1.0.1/lib/elasticsearch/api/actions/indices/create.rb:77:in `create'
/vagrant/tmp/pelias/lib/pelias/tasks/index.rake:9:in `block (2 levels) in <top (required)>'
/home/vagrant/.rvm/gems/ruby-2.1.1/bin/ruby_executable_hooks:15:in `eval'
/home/vagrant/.rvm/gems/ruby-2.1.1/bin/ruby_executable_hooks:15:in `<main>'
Tasks: TOP => index:create (See full trace by running task with --trace)

Elasticsearch complains with the following traceback:

[2014-04-14 20:55:23,048][DEBUG][action.admin.indices.create] [Patsy Hellstrom] [pelias] failed to create
org.elasticsearch.indices.IndexCreationException: [pelias] failed to create index
    at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:300)
    at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:343)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:308)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:134)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)
Caused by: org.elasticsearch.env.FailedToResolveConfigException: Failed to resolve config path [/vagrant/tmp/pelias/config/synonyms.txt], tried file path [/vagrant/tmp/pelias/config/synonyms.txt], path file [/vagrant/tmp/elasticsearch-1.1.0/config/vagrant/tmp/pelias/config/synonyms.txt], and classpath
    at org.elasticsearch.env.Environment.resolveConfig(Environment.java:207)
    at org.elasticsearch.index.analysis.Analysis.getReaderFromFile(Analysis.java:270)
    at org.elasticsearch.index.analysis.SynonymTokenFilterFactory.<init>(SynonymTokenFilterFactory.java:66)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:534)
    at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
    at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
    at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
    at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:52)
    at org.elasticsearch.common.inject.InjectorImpl$5$1.call(InjectorImpl.java:781)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
    at org.elasticsearch.common.inject.InjectorImpl$5.get(InjectorImpl.java:777)
    at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:221)
    at com.sun.proxy.$Proxy18.create(Unknown Source)
    at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:151)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:534)
    at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
    at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
    at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
    at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
    at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
    at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
    at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
    at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
    at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
    at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:52)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
    at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
    at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
    at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
    at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
    at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
    at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
    at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:52)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
    at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
    at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
    at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:200)
    at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:830)
    at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
    at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
    at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
    at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
    at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
    at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:298)
    ... 6 more

Elasticsearch plugin [PL-GG11]

Clean up trello tickets for the work done by Francisco, consider rolling back master to previous stable version as HEAD is unstable.

related issues:

TokenStream expanded to 384 finite strings. Only <= 256 finite strings are supported

Hello! I'm trying to execute the following command:

sudo -u gis bundle exec rake quattroshapes:populate_locality ES_INLINE=1 --trace

But subsequently I have this error in the /var/log/elasticsearch/elasticsearch.log:

.....
java.lang.IllegalArgumentException: TokenStream expanded to 384 finite strings. Only <= 256 finite strings are supported
......

What is the reason for the error? =(

TokenStream error: "Only <= 256 finite strings are supported"

These non-fatal errors are recurring and may be fixed with a config/schema/plugin update. @hkrishna may be able to provide more info.

[2015-02-26 16:58:53,415][DEBUG][action.bulk              ] [Demolition Man] [pelias][0] failed to execute bulk item (index) index {[pelias][osmnode][1974379318], source[{"center_point":{"lat":51.7543469,"lon":-0.3363454},"name":{"default":"St Peter's St o/s St Albans Tandoori"},"type":"node","alpha3":"GBR","admin1":"Hertfordshire","locality":"St Albans","neighborhood":"Porters Wood","admin0":"United Kingdom","admin2":"Hertfordshire","suggest":{"input":["st peter's st o/s st albans tandoori"],"output":"osmnode:1974379318","weight":6}}]}
java.lang.IllegalArgumentException: TokenStream expanded to 336 finite strings. Only <= 256 finite strings are supported
    at org.elasticsearch.search.suggest.completion.CompletionTokenStream.incrementToken(CompletionTokenStream.java:66)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:618)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:239)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:457)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1511)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1246)
    at org.elasticsearch.index.engine.internal.InternalEngine.innerIndex(InternalEngine.java:594)
    at org.elasticsearch.index.engine.internal.InternalEngine.index(InternalEngine.java:522)
    at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:425)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:439)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:150)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:512)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:419)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Address de-duper should be part of the import pipeline

When we import any geospatial data into the Pelias index, it should undergo the following:

  • de-duper: to make sure we don't have two points with the same name from one or more sources.

Ideally, a de-duper instance should be kept alive throughout the import process across different sources (geonames, osm, quattroshapes etc) so that it catches duplicate names/points before adding them to the ES index.
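
A minimal sketch of what such a long-lived de-duper could look like as a Node.js transform stream (hypothetical; the key function below uses exact name plus rounded coordinates, while a real de-duper would need fuzzier matching):

var Transform = require('stream').Transform;

// keep one instance of this stream alive for the whole import run, across sources
function createDeduper() {
  var seen = new Set();
  return new Transform({
    objectMode: true,
    transform: function (record, enc, next) {
      // assumes records with name.default and center_point fields
      var key = [
        record.name.default.toLowerCase(),
        record.center_point.lat.toFixed(4),
        record.center_point.lon.toFixed(4)
      ].join('|');
      if (!seen.has(key)) {
        seen.add(key);
        this.push(record);
      }
      next();
    }
  });
}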

Why Ruby?

If you want this to be an easy install for people, then I don't believe Ruby is going to help.
Just look at the discourse.org project and all the troubles they have trying to roll out an easy-to-install server.

Is there any way this could use maybe Node.js or Mono?

Street fallback

In cases where no street numbers exist for a certain street, or there are few numbers on that street, it would be ideal to simply return the name of the street with a centroid of the polyline.

An example would be if for 'Main Street' we only had numbers 1, 2 and 42. A user should still be able to type 'Main Street' and get the central point for that street, while also being able to search for "1 Main Street" etc.

Street segments may need to be re-assembled to accurately compute the centroid, or alternatively we could try to import the roads as a GeoJSON polyline type.
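
As a rough illustration, the fallback centroid could be approximated by walking the merged polyline to its halfway point (simplistic sketch; real street segments would need proper merging and a projected distance measure):

// coords is an array of [lon, lat] pairs for the merged street polyline
function streetMidpoint(coords) {
  function dist(a, b) {
    return Math.hypot(b[0] - a[0], b[1] - a[1]); // naive planar distance
  }

  var total = 0;
  for (var i = 1; i < coords.length; i++) total += dist(coords[i - 1], coords[i]);

  var remaining = total / 2;
  for (var j = 1; j < coords.length; j++) {
    var seg = dist(coords[j - 1], coords[j]);
    if (remaining <= seg) {
      var t = seg === 0 ? 0 : remaining / seg;
      return [
        coords[j - 1][0] + t * (coords[j][0] - coords[j - 1][0]),
        coords[j - 1][1] + t * (coords[j][1] - coords[j - 1][1])
      ];
    }
    remaining -= seg;
  }
  return coords[coords.length - 1];
}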

reported by: @randymeech

Autocorrect query text [PL-PG03]

If a user mistypes a known word in the dictionary, it should be autocorrected.
Also, search with a Levenshtein distance threshold for names.

Needs to work across all endpoints.
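
One way to approximate this in Elasticsearch is a fuzzy match on the name field (a sketch only, not how Pelias currently builds its queries):

// tolerate small misspellings via a bounded Levenshtein edit distance
var userText = 'Brookyln';
var body = {
  query: {
    match: {
      'name.default': {
        query: userText,
        fuzziness: 1 // maximum edit distance allowed per term
      }
    }
  }
};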

Advanced Admin area scoring - taking population and popularity into account [PL-CG12]

Consider various possible scoring systems:

  • Score admin areas by population
  • Score admin areas by search frequency/popularity

There has been a lot of work done around scoring, and it's crucial that we use population and popularity information correctly; this helps with coarse geocoding without a geo bias.
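
One possible shape for this in Elasticsearch is a function_score query that boosts text matches by a population field (illustrative only; the field name is an assumption):

// multiply the text-match score by log1p(population) so big cities rank higher
var body = {
  query: {
    function_score: {
      query: { match: { 'name.default': 'san francisco' } },
      functions: [
        {
          field_value_factor: {
            field: 'population', // assumed field on admin-area documents
            modifier: 'log1p',
            missing: 1
          }
        }
      ],
      boost_mode: 'multiply'
    }
  }
};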

related issues:

Documented Email

The "get in contact" email in the README.md is bouncing emails sent to it.

Potential optimizations

This looks like a very cool project! I heard about it on Twitter and took a look...very impressive! I just wanted to drop off a few potential optimizations

Feel free to ignore any/all of these, especially if you've already tried them out. :)

Heap Usage

It was mentioned on Twitter (and the README) that Pelias uses a bunch of heap. There are a few things you can do to reduce heap usage:

  • If the data is static after import, you could disable bloom filters on each index. The bloom filters are used to speed up indexing, but if the data is static, it just represents wasted heap space. Details about unloading are here (see the "tip").
  • Similarly, if the data is fairly static, you can probably reduce your primary shard count. Extra shards lying around will eat up heap space through Lucene overhead (term dictionaries, etc) and reduced inverted index compression. If you reduce the shard count to 40 or even 20 primary shards, you will save some memory. This could potentially increase your query throughput, but may also increase the query latency (since queries will be mostly CPU bound by the geo_* filters, decreasing primary shards decreases how many machines participate in each query)

Ingestion speed

  • The README says it takes three days to ingest the data? Is that all Elasticsearch or also other components? I would bump your bulk size to around 1000 docs. The dataset is 66m docs and 300gb, so each doc is probably around 4kb. Bumping the bulk size to 1000 docs will put you around 5-6mb per bulk, which is more optimal (a rough sketch follows this list).
  • Do you spin up multiple import processes/threads? This can drastically reduce ingestion time
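
A rough sketch of a ~1000-document bulk with the Node.js elasticsearch client (illustrative only; the real importers batch through pelias-dbclient, and the document shape here is made up):

var elasticsearch = require('elasticsearch');
var client = new elasticsearch.Client({ host: 'localhost:9200' });

// flush in batches of ~1000 docs (~5-6mb) instead of many small bulk requests
function flush(docs, callback) {
  var body = [];
  docs.forEach(function (doc) {
    body.push({ index: { _index: 'pelias', _type: doc.type, _id: doc.id } });
    body.push(doc.data);
  });
  client.bulk({ body: body }, callback);
}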

Query optimizations

The search looks to mostly be a demonstration, so you may not care about these, but a few easy wins:

  • You may consider enabling lat_lon for the geo_points. This indexes the lat/lon as individual fields, which enables the geo_* filters to execute ranges on the inverted index instead of field data. This is sometimes faster depending on the data involved.
  • The "closest" query can be restructured into a filtered query, which will potentially be faster. Something like:
{
    "query": {
        "filtered": {
           "filter": {
               "and": {
                  "filters": [
                     {"term": {"location.type": "search.type"}},
                     {"geo_distance": {}}
                  ]
               }
           }
        }
    }
}

This does a few things. First, it executes the Term as a filter instead of a query, which allows caching and should be faster. Second, it executes the Term filter before the expensive geo because of the And compound filter.

Unclear if this would be faster, however, since your usage of the top-level filter will do a good job limiting the number of documents that the geo will see. You might also try a hybrid approach like this:

{
    "query": {
        "filtered": {
           "filter": {
              "term": {"location.type": "search.type"}
           }
        }
    },
    "post_filter" : {
        "geo_distance: {}
    }
}

Basically identical to the existing query, except it uses a filter instead of a term query.
