openstreetmap / chef
Chef configuration management repo for configuring & maintaining the OpenStreetMap servers.
License: Apache License 2.0
A new version of openstreetmap-carto has been released, with no change in dependencies.
A new version of openstreetmap-carto, v3.3.0, has been released.
The only deployment-related change is that the Hanazono font is now used for some characters outside the BMP. This can be obtained from the fonts-hanazono
package on Ubuntu and Debian.
There's a private OSMF Chef repository with various roles and/or cookbooks, for hysterical raisins (i.e. historical reasons). I don't have any access to it, so I'm not sure about the details.
I'm interested in knowing whether we can shut this down yet. I assume there are still cookbooks or roles in there with secrets hard-coded, or are there other reasons why some aspects can't be made public? Are there any other reasons for keeping the private repo?
We currently use reCAPTCHA on the wiki sites, but this leads to errors on sites other than wiki.openstreetmap.org. We should have per-site credential pairs via data bags.
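A minimal sketch of per-site credential selection. The data structure and names here are assumptions for illustration, not the repo's actual layout; in a recipe the hash would come from a data_bag_item lookup rather than being inlined:

```ruby
# Hypothetical per-site reCAPTCHA credential pairs, keyed by FQDN.
# In Chef this hash would come from data_bag_item("web", "recaptcha").
recaptcha_keys = {
  "wiki.openstreetmap.org" => { "site_key" => "WIKI_SITE", "secret_key" => "WIKI_SECRET" },
  "wiki.osmfoundation.org" => { "site_key" => "OSMF_SITE", "secret_key" => "OSMF_SECRET" }
}

def credentials_for(keys, fqdn)
  # Return nil for an unknown site rather than falling back to another
  # site's pair, so a missing entry fails loudly instead of silently
  # serving mismatched credentials (the current failure mode).
  keys.fetch(fqdn, nil)
end

pair = credentials_for(recaptcha_keys, "wiki.openstreetmap.org")
```

The point of the explicit nil fallback is that a site without its own pair should error out visibly, which is easier to diagnose than reCAPTCHA rejecting a key issued for a different domain.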
We have a PR for osm-carto (gravitystorm/openstreetmap-carto#2874) which might improve rendering performance, but it needs testing. My question is whether you'd like to evaluate it on production, and what prerequisites would be needed?
A new release has been tagged, v2.33.0, with no changes to dependencies.
A new version of openstreetmap-carto, v2.45.0, has been released.
A new release has been made, v2.39.0
A new version of openstreetmap-carto, v3.3.1, has been released.
This version fixes a regression in the rendering of intermittent streams.
I am quite sure that, after gravitystorm/openstreetmap-carto#1461, downloading ne_10m_populated_places.zip is no longer needed. But it is still downloaded:
Lines 65 to 70 in 650d244
A new release of openstreetmap-carto has been tagged, and is ready for deployment.
A new version of openstreetmap-carto, v2.38.0, has been released
The Table of Contents section on some wiki pages is too long. As per the discussion at https://wiki.openstreetmap.org/wiki/Talk:Wiki#Limiting_TOCs there needs to be a change to the stylesheet so that the Template:TOC_limit is effective.
A new version of openstreetmap-carto, 2.42.0, has been released.
https://github.com/openstreetmap/chef/blob/master/cookbooks/tilecache/templates/default/nginx_tile_ssl.conf.erb doesn't currently enable verification of the OCSP staple.
We should enable nginx parameter "ssl_stapling_verify on".
ssl_stapling_verify also requires we correctly set ssl_trusted_certificate.
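A minimal sketch of the relevant nginx directives for the template; the certificate path here is a placeholder, not the path actually used on the tile caches:

```nginx
# Verify the OCSP response that nginx staples to the TLS handshake.
ssl_stapling on;
ssl_stapling_verify on;
# Chain used to verify the OCSP response; must contain the issuing CA's
# intermediate and root certificates. Path is a placeholder.
ssl_trusted_certificate /etc/ssl/certs/tile-chain.pem;
```

Note that if ssl_trusted_certificate is missing or incomplete, nginx logs a stapling verification warning and omits the staple rather than failing the handshake.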
A new version of openstreetmap-carto, v2.43.0, has been released.
v2.31.0 has been tagged, without changes to dependencies.
A new version of openstreetmap-carto, v2.45.1, has been released.
On behalf of the openstreetmap-carto project, we'd like to see the OSMF tileservers upgraded to using mapnik3. We are confident that our stylesheet works on both mapnik2 and mapnik3 and are happy to make any tweaks that are uncovered.
Most importantly, there's a step-change improvement in text rendering quality for non-Latin scripts just by upgrading.
A new version of openstreetmap-carto, v3.2.0, has been released.
A new openstreetmap-carto release has been tagged, v2.30.0
No infrastructure changes are required.
A new version of openstreetmap-carto, v4.4.0, has been released.
For setting up the nominatim DB slave, I have to remove the data in postgres' data dir including server.crt/.key and recovery.conf. It would be nice to be able to recover these files with chef after the base backup is done. However, starting chef at this point goes horribly wrong because chef starts up the postgres server before these files are copied back, generally destroying the database replica in the process.
Anything we can do about this?
openstreetmap-carto has now released v2.32.0. Note that this requires a new shapefile, but it is already in the tileserver role as of cecd219.
From #78 (comment)
The rendering machines are, currently, completely independent. This is great for redundancy and fail-over, as they are effectively the same. However, it means duplication of tiles stored on disk and tiles rendered. Duplication of tiles on disk is somewhat desirable in the case of fail-over, but duplicating the renders is entirely pointless.
Adding a 3rd server, therefore, is unlikely to reduce the rendering load on the existing servers by a third. However, a lot of the load comes from serving still-fresh tiles off disk to "back-stop" the CDN, and that would be split amongst the servers (sort of evenly).
What would be great, as @pnorman and I were discussing the other day, is a way to "broadcast" rendered tiles in a PUB-SUB fashion amongst the rendering servers so that they can opportunistically fill their own caches with work from other machines. At the moment, it's no more than an idea, but it seems like a feasible change to renderd.
Currently the two servers are independent, and clients go to one based on geoip. This means that the rendering workload is not fully duplicated between the two servers, as users in the US tend to view tiles in the US and users in Germany tend to view tiles in Germany. This has been tested by swapping locations and seeing an increase in load.
Unfortunately, this doesn't scale well to higher numbers of servers.
chef/cookbooks/networking/templates/default/resolv.conf.erb
Lines 5 to 7 in 420bef3
<% node[:networking][:nameservers].each do |nameserver| -%>
nameserver <%= nameserver %>
<% end -%>
This means that, by default, any recipe that depends on networking gets no nameservers and generally fails on the first attempt to apt-get. This makes testing the cookbooks tedious, since you need to explicitly set nameserver attributes for any cookbook that has networking::default somewhere in its dependencies.
There are (at least) a couple of options:
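The failure mode above can be reproduced outside Chef: rendering the template body with an empty nameserver list yields an empty resolv.conf. This is a plain-Ruby sketch using the stdlib ERB class, with the node attributes faked as a hash:

```ruby
require "erb"

# The loop from resolv.conf.erb; trim_mode "-" matches the "-%>" tags
# used in the Chef template.
template = <<~ERB
  <% node[:networking][:nameservers].each do |nameserver| -%>
  nameserver <%= nameserver %>
  <% end -%>
ERB

# Fake node attributes; with no default set, the list is empty
# and the template renders an empty string.
node = { networking: { nameservers: [] } }
puts ERB.new(template, trim_mode: "-").result(binding)

# Setting the attribute explicitly, as test setups currently must do:
node = { networking: { nameservers: ["8.8.8.8"] } }
puts ERB.new(template, trim_mode: "-").result(binding)
```

One option would be a sensible default value for the attribute, so that the rendered file is never empty unless a wrapper explicitly clears it.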
A new version of openstreetmap-carto, v3.1.0, has been released.
The chef script which installs Nominatim currently has hard-coded URLs and a few other parameters:
https://github.com/openstreetmap/chef/blob/master/cookbooks/nominatim/recipes/base.rb
It would be excellent if this could be generalised so that someone wanting to install their own Nominatim could set the relevant values for their installation (e.g. domain name) and then run the recipe.
We currently have a bash script for installing our Nominatim:
https://github.com/cyclestreets/nominatim-install/blob/master/run.sh
but if a standard chef recipe were available we would probably be able to deprecate that.
We've noticed that the tileservers use the default .style file from osm2pgsql when processing updates (by not specifying one, so osm2pgsql falls back to its bundled default.style).
The openstreetmap-carto project includes a style file for use with osm2pgsql. This ensures that the database layout matches what the stylesheets expect, as well as allowing use of arbitrary osm2pgsql versions instead of forcing people to upgrade to the latest when columns change.
I believe there are no substantial differences between the master version in osm2pgsql and the one in the latest openstreetmap-carto release, so this is not currently a major issue. But it might be worth changing the update script (and the import script, if there is one).
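A sketch of the change: pass the stylesheet's .style file explicitly via osm2pgsql's --style option. The paths here are placeholders, not the servers' actual layout:

```shell
# In the update script, instead of letting osm2pgsql fall back to its
# bundled default.style, point it at the stylesheet's own copy.
# Paths are placeholders for illustration.
osm2pgsql --append --slim \
  --style /srv/openstreetmap-carto/openstreetmap-carto.style \
  /var/cache/replication/changes.osc.gz
```

The same --style argument would need to be added to the import invocation, if there is one, so that import and updates agree on the column set.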
A new version of openstreetmap-carto, v4.1.0, has been released.
Our squid cookbook depends on squid 2.7, but that's no longer available in Ubuntu so we package our own version in a PPA:
https://launchpad.net/~osmadmins/+archive/ubuntu/ppa/+packages
The package claims to replace "squid3", which I guess is how the package "squid" part of the recipe is supposed to work. However, I can't get it to install on a xenial image. I've got a WIP test-kitchen config at gravitystorm@40da20e. When it runs, it installs squid 3.5.12. Uninstalling, running apt-get update, and reinstalling leads to the same place. I can only get it working by uninstalling squid, then running:
sudo apt-get install squid=2.7.STABLE9-4ubuntu10 squid-common=2.7.STABLE9-4ubuntu10
Can anyone shed light on this problem? What steps need to be taken to get the correct version of squid installed?
A new version of openstreetmap-carto, v2.41.0, has been released
The list of font packages has changed.
On Ubuntu 16.04 the list is
fonts-dejavu-core fonts-droid-fallback ttf-unifont \
fonts-sipa-arundina fonts-sil-padauk fonts-khmeros \
fonts-beng-extra fonts-gargi fonts-taml-tscu fonts-tibetan-machine
On Ubuntu 14.04 the list is
fonts-dejavu-core fonts-droid ttf-unifont \
fonts-sipa-arundina fonts-sil-padauk fonts-khmeros \
fonts-beng-extra fonts-gargi fonts-taml-tscu fonts-tibetan-machine
A new version of openstreetmap-carto, v2.34.0, has been released with no dependency changes.
OpenStreetMap Carto's next release will be v3.0.0, which brings some changes to dependencies. The OSMF tile servers already meet the difficult ones (Mapnik 3), but some of the others might need minor changes.
project.mml is no longer a generated file, but is passed directly to CartoCSS. I believe the CartoCSS version change is the only one which the OSMF servers might not already meet.
The request logging we do on the tile caches has a number of problems and could do with some improvement.
Each request is actually logged twice, or sometimes three times, which is wasteful of I/O time and disk space on the caches. On top of that, the logs that we recover to our central store are missing some important details.
The logs we currently generate are:
squid/access.log - standard squid access log, with no UA or referer
squid/zere.log - added for @zerebubuth's analysis; has UA but no referer, and is recovered to ironbelly
nginx/access.log - for https requests only, and generally more detailed than the squid logs, with the UA and referer included
I would like to change the squid access log to include the UA, referer and whatever else @zerebubuth needs, drop the special zere log, and potentially drop the nginx logs as well, so long as nginx passes through the real IP and squid can be made to log it.
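A sketch of what the squid side could look like, using squid's documented logformat directive. The format name and exact field selection are assumptions, not an agreed format:

```
# Custom access-log format including Referer and User-Agent request
# headers alongside the usual fields.
logformat osmcombined %>a [%tl] "%rm %ru HTTP/%rv" %>Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
access_log /var/log/squid/access.log osmcombined
```

With a single format carrying all the fields, the separate zere.log can be dropped and the same file recovered to the central store.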
Many cookbooks use the munin_plugin provider, but this has a compile-time dependency on the service[munin-node] declaration. This makes it hard to test cookbooks (e.g. squid) independently.
The basic workaround is to add include_recipe "munin::default" to every cookbook that uses the munin_plugin resource. But that's not ideal, since it slows down all the tests (and squid doesn't actually depend on munin being installed) and feels a bit icky.
Instead it would be better to allow cookbooks to call munin_plugin without having the whole of munin::default pulled in too. This could be achieved by inverting the notification, i.e. making service[munin] subscribe to the munin_plugin, but that's not straightforward due to the restart_munin attribute, and it would also mean subscribing to each particular munin_plugin invocation.
I propose using a dummy resource to decouple the munin_plugin notifications from the service[munin] subscriptions, something like:
change the after_created in munin/resources/plugin.rb:
def after_created
  notifies :run, "execute[plugin-requires-munin-restart]" if restart_munin
end
somewhere:
# This is a dummy resource for other resources to subscribe to
execute "plugin-requires-munin-restart" do
  command "date"
  action :nothing
end
in munin/recipes/default.rb:
service "munin" do
  [...]
  subscribes :restart, "execute[plugin-requires-munin-restart]", :delayed
end
Thoughts? Is there an easier way to decouple the plugins from the service definition?
According to @cquest, the OSM FR servers gain 25-50% rendering throughput when they recluster their tables after about a year, by reducing table and index bloat. This can be done without a full outage, only stopping updates, but we probably want to wait until #79 is done to increase capacity. The reclustering time depends on database IO and CPU, which are not maxed out.
The overall plan is to create a new copy of each table, build new indexes, then replace the old table. Because update frequency is more important for osm.org than for other hosts, I'd recommend doing it slightly differently: instead of reclustering all the tables at once, do one table, resume updates and let them catch up, do another, and so on.
Starting with the points table and progressing by table size minimises the disk usage. I think there's enough free space that it doesn't matter, but this is a best practice.
My recommendation is that the following be done on both servers, starting with whichever has gone the longest since the initial import:
Record the results of \dt+ and \di+. It would also be useful to have the results of the following SQL for future planning purposes:
SELECT CORR(page,geohash)
FROM (
SELECT
(ctid::text::point)[0] AS page,
rank() OVER (ORDER BY St_GeoHash(st_transform(way,4326))) AS geohash
FROM planet_osm_point
) AS s; -- area server result .93, takes 461s
SELECT CORR(page,geohash)
FROM (
SELECT
(ctid::text::point)[0] AS page,
rank() OVER (ORDER BY St_GeoHash(st_transform(way,4326))) AS geohash
FROM planet_osm_roads
) AS s; -- area server result .58, takes 119s
SELECT CORR(page,geohash)
FROM (
SELECT
(ctid::text::point)[0] AS page,
rank() OVER (ORDER BY St_GeoHash(st_transform(way,4326))) AS geohash
FROM planet_osm_line
) AS s;
SELECT CORR(page,geohash)
FROM (
SELECT
(ctid::text::point)[0] AS page,
rank() OVER (ORDER BY St_GeoHash(st_transform(way,4326))) AS geohash
FROM planet_osm_polygon
) AS s;
Stop updates and make a backup of the state file.
Start by creating a schema to do work in
CREATE SCHEMA IF NOT EXISTS recluster;
Starting with the smallest table, recluster it into the new schema.
\timing
SET search_path TO recluster,"$user",public;
CREATE TABLE planet_osm_point AS
SELECT * FROM public.planet_osm_point
ORDER BY ST_GeoHash(ST_Transform(ST_Envelope(way),4326),10) COLLATE "C";
Create indexes. The indexes here are the recommended ones for OpenStreetMap Carto. If you want to use others you can.
\timing
SET search_path TO recluster,"$user",public;
CREATE INDEX planet_osm_point_place
ON planet_osm_point USING GIST (way)
WHERE place IS NOT NULL AND name IS NOT NULL;
CREATE INDEX planet_osm_point_index
ON planet_osm_point USING GIST (way);
CREATE INDEX planet_osm_point_pkey
ON planet_osm_point (osm_id);
Replace the table in the public schema in a transaction, keeping the old one
CREATE SCHEMA IF NOT EXISTS backup;
BEGIN;
ALTER TABLE public.planet_osm_point
SET SCHEMA backup;
ALTER TABLE recluster.planet_osm_point
SET SCHEMA public;
COMMIT;
Verify that tiles are still rendering
Drop the old table
DROP TABLE backup.planet_osm_point;
Resume updates. When updates are done, repeat for the other three rendering tables
For planet_osm_roads
\timing
SET search_path TO recluster,"$user",public;
CREATE TABLE planet_osm_roads AS
SELECT * FROM public.planet_osm_roads
ORDER BY ST_GeoHash(ST_Transform(ST_Envelope(way),4326),10) COLLATE "C";
CREATE INDEX planet_osm_roads_admin
ON planet_osm_roads USING GIST (way)
WHERE boundary = 'administrative';
CREATE INDEX planet_osm_roads_roads_ref
ON planet_osm_roads USING GIST (way)
WHERE highway IS NOT NULL AND ref IS NOT NULL;
CREATE INDEX planet_osm_roads_admin_low
ON planet_osm_roads USING GIST (way)
WHERE boundary = 'administrative' AND admin_level IN ('0', '1', '2', '3', '4');
CREATE INDEX planet_osm_roads_index
ON planet_osm_roads USING GIST (way);
CREATE INDEX planet_osm_roads_pkey
ON planet_osm_roads (osm_id);
BEGIN;
ALTER TABLE public.planet_osm_roads
SET SCHEMA backup;
ALTER TABLE recluster.planet_osm_roads
SET SCHEMA public;
COMMIT;
Test, then
\timing
DROP TABLE backup.planet_osm_roads;
For planet_osm_line, resume updates, wait for updates to catch up, then
\timing
SET search_path TO recluster,"$user",public;
CREATE TABLE planet_osm_line AS
SELECT * FROM public.planet_osm_line
ORDER BY ST_GeoHash(ST_Transform(ST_Envelope(way),4326),10) COLLATE "C";
CREATE INDEX planet_osm_line_ferry
ON planet_osm_line USING GIST (way)
WHERE route = 'ferry';
CREATE INDEX planet_osm_line_river
ON planet_osm_line USING GIST (way)
WHERE waterway = 'river';
CREATE INDEX planet_osm_line_name
ON planet_osm_line USING GIST (way)
WHERE name IS NOT NULL;
CREATE INDEX planet_osm_line_index
ON planet_osm_line USING GIST (way);
CREATE INDEX planet_osm_line_pkey
ON planet_osm_line (osm_id);
BEGIN;
ALTER TABLE public.planet_osm_line
SET SCHEMA backup;
ALTER TABLE recluster.planet_osm_line
SET SCHEMA public;
COMMIT;
Test then
DROP TABLE backup.planet_osm_line;
Polygons will take the longest. Resume updates and let them catch up, then stop them and
\timing
SET search_path TO recluster,"$user",public;
CREATE TABLE planet_osm_polygon AS
SELECT * FROM public.planet_osm_polygon
ORDER BY ST_GeoHash(ST_Transform(ST_Envelope(way),4326),10) COLLATE "C";
CREATE INDEX planet_osm_polygon_military
ON planet_osm_polygon USING GIST (way)
WHERE landuse = 'military';
CREATE INDEX planet_osm_polygon_nobuilding
ON planet_osm_polygon USING GIST (way)
WHERE building IS NULL;
CREATE INDEX planet_osm_polygon_name
ON planet_osm_polygon USING GIST (way)
WHERE name IS NOT NULL;
CREATE INDEX planet_osm_polygon_way_area_z6
ON planet_osm_polygon USING GIST (way)
WHERE way_area > 59750;
CREATE INDEX planet_osm_polygon_index
ON planet_osm_polygon USING GIST (way);
CREATE INDEX planet_osm_polygon_pkey
ON planet_osm_polygon (osm_id);
BEGIN;
ALTER TABLE public.planet_osm_polygon
SET SCHEMA backup;
ALTER TABLE recluster.planet_osm_polygon
SET SCHEMA public;
COMMIT;
Test then
\timing
DROP TABLE backup.planet_osm_polygon;
Resume rendering and clean up with
DROP SCHEMA recluster;
DROP SCHEMA backup;
Record \dt+ and \di+ again.
Ref: http://paulnorman.ca/blog/2016/06/improving-speed-with-reclustering/
Notes:
- The indexes could instead be rebuilt with a REINDEX statement, which is much simpler.
- maintenance_work_mem should probably be increased.
- In case of a problem, a rollback can be done by restoring the table from the backup schema.
- If diffs are mistakenly restarted early, the state file needs to be reset and the diffs re-run.
The OpenStreetMap Carto Lua branch, which will require a reimport with hstore, is not yet out of development. We have a few open issues before we can merge, and are lacking in developer time for these issues. Once the Lua branch is merged, we will still be releasing 2.x releases which will work with the old database, to allow time to change over.
Doing a reimport with the current settings is in some ways better, but requires either a full outage of the server, a fair amount of database disk space, or the possibility of updates being down for an extended time[1], and the certainty that updates will be stopped for about a day.
[1] If the old DB's slim tables are dropped, this saves room but stops any updates on the old DB.
I'm running a test on the server used for testing old-style multipolygons. It's got faster single-threaded performance and absurdly faster drives, but it should give an indication. I'll add times when it's done.
A new version of openstreetmap-carto, v3.0.0, has been released.
The deployment-related changes in this release are:
carto -a "3.0.0"
The apt::default recipe expects all nodes to have a country attribute set, and will fail without it.
It would be better if either:
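One possible fix, sketched here under assumptions (the attribute layout is simplified to a plain hash, and the "gb" fallback is illustrative only, not a decided default), is to fall back instead of failing when the country attribute is unset:

```ruby
# Sketch: resolve the apt mirror hostname with a fallback instead of
# raising when node[:country] is unset. The "gb" default and the
# mirror-name scheme are illustrative assumptions.
def apt_mirror_host(node, fallback = "gb")
  country = node[:country] || fallback
  "#{country}.archive.ubuntu.com"
end

apt_mirror_host({ country: "de" })  # => "de.archive.ubuntu.com"
apt_mirror_host({})                 # => "gb.archive.ubuntu.com"
```

The alternative, of course, is to keep failing but with a clear error message naming the missing attribute, which is friendlier than a nil-related crash mid-run.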
@Firefishy says we aren't backing up the munin data, and we should...
A new version of openstreetmap-carto, v2.44.1, has been released.
Deployment-related changes are a new recommendation for a minimum freetype version, and listing font packages separately rather than relying on a metapackage and its recommends. Both are already done on the OSMF servers.
A new version of openstreetmap-carto, v4.2.0, has been released.
It's easier to contribute when you're confident that your PR won't fail spectacularly.
From experience elsewhere I can recommend the following:
They all cover different aspects of the cookbooks so I'd suggest using all of them.
To start, I'd suggest making rubocop and foodcritic config files to turn off the checks that the cookbooks currently fail. The next step would be to get test-kitchen working for the various cookbooks.
The PostgreSQL GUC track_activity_query_size specifies the number of bytes reserved to track the currently executing command for each active session in pg_stat_activity, with a default of 1024.
Queries in most Mapnik stylesheets exceed this, with about a quarter of OpenStreetMap Carto's being over the limit. In gravitystorm/openstreetmap-carto#2316 I'm looking at ways to better identify the layer and zoom, but to get the full query this GUC needs to be increased.
I recommend increasing it to 16384, so that if we have a slow or stuck query we can see what it is, EXPLAIN it, or debug it locally. The longest current query is 9818 bytes before Mapnik inserts additional text.
This would cost 15 kB more memory per connection slot.
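The corresponding postgresql.conf change is a one-liner. Note that this setting can only be changed at server start, so it needs a restart rather than a reload:

```
# Reserve 16 kB per backend for pg_stat_activity's query text,
# up from the 1024-byte default. Requires a server restart.
track_activity_query_size = 16384
```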
A new version of openstreetmap-carto, v4.3.0, has been released.
A new version of openstreetmap-carto has been released, v2.36.0, and it contains significant changes to the road colours, among the 135 changes since v2.35.0.
A new version of openstreetmap-carto, v2.44.0, has been released.
This version changes the fonts required, which are documented in the readme.
A new version of openstreetmap-carto, v2.40.0, has been released
The repository needs a licence file. I'm not sure what licence actually applies, though; are there any constraints? If not, then Apache 2.0 is commonly used for Chef cookbooks.
$ curl -I http://planet.openstreetmap.org/tile_logs/tiles-2015-02-03.txt.xz
HTTP/1.1 200 OK
Date: Thu, 05 Feb 2015 19:44:58 GMT
Server: Apache/2.4.7 (Ubuntu)
Last-Modified: Thu, 05 Feb 2015 07:30:10 GMT
ETag: "4d74e0-50e52461df99e"
Accept-Ranges: bytes
Content-Length: 5076192
Vary: Accept-Encoding
Access-Control-Allow-Origin: *
Content-Type: text/plain; charset=utf-8
The Content-Type should be application/x-xz.