Coder Social home page Coder Social logo

igor-suhorukov / openstreetmap_h3 Goto Github PK

View Code? Open in Web Editor NEW
90.0 5.0 7.0 6.14 MB

OSM planet dump high performance data loader. Transform OpenStreetMap World/Region PBF dump into partitioned by H3 regions PostGIS pgsnapshot (lossless) OSM schema representation and/or into ArrowIPC/Parquet dumps

Home Page: https://habr.com/en/post/717408

License: Apache License 2.0

Dockerfile 1.98% Shell 0.40% Java 89.87% PLpgSQL 7.76%
apache-arrow arrow duckdb geometry-processing geospatial java openstreetmap parquet pbf postgis postgresql world citusdb column-store apache-spark converter parquet-files pbf-format apach-sedona

openstreetmap_h3's Introduction

openstreetmap_h3

OSM planet dump high performance data loader. Transform OpenStreetMap World/Region PBF dump into partitioned by H3 regions PostGIS pgsnapshot (lossless) OSM schema representation and/or into ArrowIPC/Parquet dumps with schema. "columnar_storage" option activate Citus columnar storage in PostgreSQL to speedup aggregation queries.

More details in blog posts...

image

image

Build and run in Docker

git clone https://github.com/igor-suhorukov/openstreetmap_h3.git
cd openstreetmap_h3 && docker build -t openstreetmap_h3 .
cd postgis_docker-master && docker build -t postgres15_postgis .

wget https://download.geofabrik.de/europe/cyprus-latest.osm.pbf
docker run -it --rm -w $(pwd) -v $(pwd):/$(pwd) -v /var/run/docker.sock:/var/run/docker.sock openstreetmap_h3:latest -source_pbf $(pwd)/cyprus-latest.osm.pbf -result_in_tsv true
docker run --name postgis15-cyprus --memory=12g --memory-swap=12g --memory-swappiness 0 --shm-size=1g -v $(pwd)/database:/var/lib/postgresql/data -v $(pwd)/cyprus-latest_loc_ways:/input -e POSTGRES_PASSWORD=osmworld -d -p 5432:5432 postgres15_postgis:latest -c checkpoint_timeout='15 min' -c checkpoint_completion_target=0.9 -c shared_buffers='4096 MB' -c wal_buffers=-1 -c bgwriter_delay=200ms -c bgwriter_lru_maxpages=100 -c bgwriter_lru_multiplier=2.0 -c bgwriter_flush_after=0 -c max_wal_size='32768 MB' -c min_wal_size='16384 MB'

Build and run in JVM:

~/dev/projects/oss_contrib/openstreetmap_h3$ mvn install
[INFO] Scanning for projects...
[INFO] 
[INFO] -------< com.github.igor-suhorukov:osm-to-pgsnapshot-schema-ng >--------
[INFO] Building osm-to-pgsnapshot-schema-ng 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ osm-to-pgsnapshot-schema-ng ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 12 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ osm-to-pgsnapshot-schema-ng ---
[INFO] Changes detected - recompiling the module!
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
[INFO] Compiling 24 source files to /home/acc/dev/projects/oss_contrib/openstreetmap_h3/target/classes
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ osm-to-pgsnapshot-schema-ng ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /home/acc/dev/projects/oss_contrib/openstreetmap_h3/src/test/resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ osm-to-pgsnapshot-schema-ng ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ osm-to-pgsnapshot-schema-ng ---
[INFO] No tests to run.
[INFO] 
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ osm-to-pgsnapshot-schema-ng ---
[INFO] Building jar: /home/acc/dev/projects/oss_contrib/openstreetmap_h3/target/osm-to-pgsnapshot-schema-ng-1.0-SNAPSHOT.jar
[INFO] 
[INFO] --- spring-boot-maven-plugin:2.7.3:repackage (default) @ osm-to-pgsnapshot-schema-ng ---
[INFO] Layout: ZIP
[INFO] Replacing main artifact with repackaged archive
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ osm-to-pgsnapshot-schema-ng ---
[INFO] Installing /home/acc/dev/projects/oss_contrib/openstreetmap_h3/target/osm-to-pgsnapshot-schema-ng-1.0-SNAPSHOT.jar to /home/acc/.m2/repository/com/github/igor-suhorukov/osm-to-pgsnapshot-schema-ng/1.0-SNAPSHOT/osm-to-pgsnapshot-schema-ng-1.0-SNAPSHOT.jar
[INFO] Installing /home/acc/dev/projects/oss_contrib/openstreetmap_h3/pom.xml to /home/acc/.m2/repository/com/github/igor-suhorukov/osm-to-pgsnapshot-schema-ng/1.0-SNAPSHOT/osm-to-pgsnapshot-schema-ng-1.0-SNAPSHOT.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  29.592 s
[INFO] Finished at: 2023-01-29T11:47:33+03:00
[INFO] ------------------------------------------------------------------------

Usage details:

~/dev/projects/oss_contrib/openstreetmap_h3$ java -jar target/osm-to-pgsnapshot-schema-ng-1.0-SNAPSHOT.jar
The following option is required: [-source_pbf]
Usage: <main class> [options]
  Options:
  * -source_pbf
      Source path for OpenStreetMap data in PBF format
    -collect_only_statistics
      Collect only statistics from data - partition distribution
      Default: false
    -columnar_storage
      Use columnar storage in PostgreSql tables for nodes/ways/multipolygon
      Default: false
    -data_partition_ratio
      Filling ratio from maximum size of partition. This parameter change 
      PostgreSQL partitions count
      Default: 0.48
    -help
      Information about command line parameters
    -pg_script_count
      Script count for PostgreSQL parallel COPY
      Default: 4
    -preserve_all_nodes
      Preserve all nodes information in case of 'true' or only nodes with tags 
      or referenced from relations in other case
      Default: false
    -arrow_format
      In case of not null parameter save Arrow data serialization: ARROW_IPC 
      or PARQUET
      Possible Values: [PARQUET, ARROW_IPC]
    -result_in_tsv
      Save result data in TabSeparatedValue format for PostgreSQL COPY
      Default: true
    -scale_approx_calc
      Approximate scale calculation. Value 'false' - distance in meter
      Default: false
    -skip_buildings
      Skip any ways with 'building' tag
      Default: false
    -skip_highway
      Skip any ways with 'highway' tag
      Default: false
    -worker_threads
      Worker threads count for data processing
      Default: 4

Run:


~/dev/projects/oss_contrib/openstreetmap_h3$ java -jar target/osm-to-pgsnapshot-schema-ng-1.0-SNAPSHOT.jar -source_pbf /home/acc/dev/map/thailand/thailand-latest.osm.pbf

[ 0:00] Started osmium add-locations-to-ways
[ 0:00]   osmium version 1.14.0
[ 0:00]   libosmium version 2.18.0
[ 0:00] Command line options and default settings:
[ 0:00]   input options:
[ 0:00]     file names: 
[ 0:00]       thailand-latest.osm.pbf
[ 0:00]     file format: 
[ 0:00]   output options:
[ 0:00]     file name: thailand-latest_loc_ways.pbf
[ 0:00]     file format: pbf,pbf_compression=none
[ 0:00]     generator: osmium/1.14.0
[ 0:00]     overwrite: no
[ 0:00]     fsync: no
[ 0:00]   other options:
[ 0:00]     index type (for positive ids): sparse_mem_array
[ 0:00]     index type (for negative ids): flex_mem
[ 0:00]     keep untagged nodes: no
[ 0:00]     keep nodes that are relation members: yes
[ 0:00] 
[ 0:00] Getting all nodes referenced from relations...
[ 0:01] Found 7508 nodes referenced from relations.
[ 0:01] Copying input file 'thailand-latest.osm.pbf'...
[ 0:13] About 540 MBytes used for node location index (in main memory or on disk).
[ 0:13] Peak memory used: 2152 MBytes
[ 0:13] Done.

0
566  time 89572
diff between total and processing 45908
total thread time 358078
total processing time 312170
total save time 9689
total waiting for save time 23255
thread max time 2482
processing max time 2065
nodes 382390
ways 4117237
relations 18317
relation members 191294
multipolygon count 6995
0	[-32768,25648)	67392403	58416
1	[25648,25728)	46338970	80
2	[25728,25744)	87393905	16
3	[25744,25760)	41088683	16
4	[25760,25768)	204997591	8
5	[25768,25880)	78535020	112
6	[25880,25992)	41090100	112
7	[25992,26000)	84484353	8
8	[26000,26008)	17026978	8
9	[26008,26016)	135197249	8
10	[26016,32695)	223367	6679
11 0 73.06987445454546

[ 0:00] Started osmium export
[ 0:00]   osmium version 1.14.0
[ 0:00]   libosmium version 2.18.0
[ 0:00] Command line options and default settings:
[ 0:00]   input options:
[ 0:00]     file name: thailand-latest.osm.pbf
[ 0:00]     file format: 
[ 0:00]   output options:
[ 0:00]     file name: thailand-latest_loc_ways/multipolygon/source.tsv
[ 0:00]     file format: pg
[ 0:00]     overwrite: no
[ 0:00]     fsync: yes
[ 0:00]   attributes:
[ 0:00]     type:      @type
[ 0:00]     id:        @id
[ 0:00]     version:   (omitted)
[ 0:00]     changeset: (omitted)
[ 0:00]     timestamp: (omitted)
[ 0:00]     uid:       (omitted)
[ 0:00]     user:      (omitted)
[ 0:00]     way_nodes: (omitted)
[ 0:00]   output format options:
[ 0:00]     tags_type = hstore
[ 0:00]   linear tags: none
[ 0:00]   area tags:   one of the following:
[ 0:00]     @type=relation
[ 0:00]   other options:
[ 0:00]     index type: sparse_mem_array
[ 0:00]     add unique IDs: no
[ 0:00]     keep untagged features: no
[ 0:00] 
[ 0:00] Create table with something like this:
[ 0:00] CREATE EXTENSION IF NOT EXISTS hstore;
[ 0:00] CREATE TABLE osmdata (
[ 0:00]     geom      GEOMETRY, -- or GEOGRAPHY
[ 0:00]     osm_type  TEXT,
[ 0:00]     osm_id    BIGINT,
[ 0:00]     tags      hstore
[ 0:00] );
[ 0:00] Then load data with something like this:
[ 0:00] \copy osmdata FROM 'thailand-latest_loc_ways/multipolygon/source.tsv'
[ 0:00] 
[ 0:00] First pass (of two) through input file (reading relations)...
[ 0:01] First pass done.
[ 0:01] Second pass (of two) through input file...
Geometry error: Could not build area geometry
Geometry error: Could not build area geometry
Geometry error: Could not build area geometry
Geometry error: Could not build area geometry
Geometry error: Could not build area geometry
Geometry error: Could not build area geometry
Geometry error: Could not build area geometry
Geometry error: Could not build area geometry
Geometry error: Could not build area geometry
[ 0:20] About 540 MBytes used for node location index (in main memory or on disk).
[ 0:20] Second pass done.
[ 0:20] Wrote 156178 features.
[ 0:20] Encountered 9 errors.
[ 0:20] Peak memory used: 2134 MBytes
[ 0:20] Done.

cat /home/acc/dev/map/thailand/thailand-latest_loc_ways/multipolygon/source.tsv | grep $'\trelation\t' | split -l 1749 - /home/acc/dev/map/thailand/thailand-latest_loc_ways/multipolygon/multipolygon_

image

~/dev/map/thailand/thailand-latest_loc_ways$ tree
.
├── import_related_metadata
│   ├── osm_file_block_content.tsv
│   ├── osm_file_block.tsv
│   └── osm_file_statistics.tsv
├── multipolygon
│   ├── multipolygon_aa
│   ├── multipolygon_ab
│   ├── multipolygon_ac
│   ├── multipolygon_ad
│   └── multipolygon_ae
├── nodes
│   ├── 16713.tsv
│   ├── 16717.tsv
│   ├── 16744.tsv
│   ├── 16745.tsv
│   ├── 16747.tsv
│   ├── 16748.tsv
│   ├── 16749.tsv
│   ├── 25600.tsv
│   ├── 25601.tsv
│   ├── 25606.tsv
│   ├── 25620.tsv
│   ├── 25622.tsv
│   ├── 25632.tsv
│   ├── 25634.tsv
│   ├── 25636.tsv
│   ├── 25638.tsv
│   ├── 25648.tsv
│   ├── 25649.tsv
│   ├── 25650.tsv
│   ├── 25651.tsv
│   ├── 25652.tsv
│   ├── 25653.tsv
│   ├── 25654.tsv
│   ├── 25728.tsv
│   ├── 25729.tsv
│   ├── 25730.tsv
│   ├── 25731.tsv
│   ├── 25732.tsv
│   ├── 25733.tsv
│   ├── 25734.tsv
│   ├── 25744.tsv
│   ├── 25745.tsv
│   ├── 25748.tsv
│   ├── 25749.tsv
│   ├── 25756.tsv
│   ├── 25760.tsv
│   ├── 25761.tsv
│   ├── 25762.tsv
│   ├── 25763.tsv
│   ├── 25764.tsv
│   ├── 25765.tsv
│   ├── 25766.tsv
│   ├── 25774.tsv
│   ├── 25776.tsv
│   ├── 25777.tsv
│   ├── 25778.tsv
│   ├── 25779.tsv
│   ├── 25780.tsv
│   ├── 25781.tsv
│   ├── 25782.tsv
│   ├── 25873.tsv
│   ├── 25880.tsv
│   ├── 25881.tsv
│   ├── 25882.tsv
│   ├── 25883.tsv
│   ├── 25884.tsv
│   ├── 25885.tsv
│   ├── 25886.tsv
│   ├── 25985.tsv
│   ├── 25989.tsv
│   ├── 25992.tsv
│   ├── 25993.tsv
│   ├── 25994.tsv
│   ├── 25995.tsv
│   ├── 25996.tsv
│   ├── 25998.tsv
│   ├── 26000.tsv
│   ├── 26001.tsv
│   ├── 26003.tsv
│   ├── 26008.tsv
│   ├── 26009.tsv
│   ├── 26010.tsv
│   ├── 26011.tsv
│   ├── 26012.tsv
│   ├── 26013.tsv
│   ├── 26014.tsv
│   ├── 26019.tsv
│   ├── 26026.tsv
│   ├── 26029.tsv
│   └── 26030.tsv
├── relations
│   ├── 00000.tsv
│   └── 00001.tsv
├── sql
│   ├── nodes_import_000.sql
│   ├── nodes_import_001.sql
│   ├── nodes_import_002.sql
│   ├── nodes_import_003.sql
│   ├── ways_import_000.sql
│   ├── ways_import_001.sql
│   ├── ways_import_002.sql
│   ├── ways_import_003.sql
│   ├── y_multipoly_aa.sql
│   ├── y_multipoly_ab.sql
│   ├── y_multipoly_ac.sql
│   ├── y_multipoly_ad.sql
│   └── y_multipoly_ae.sql
├── static
│   ├── database_after_init.sql
│   ├── database_init.sql
│   ├── h3_poly.tsv.gz
│   ├── multipolygon.sql
│   ├── multipolygon_tables.sql
│   └── osmium_export.json
└── ways
    ├── 16717.tsv
    ├── 16744.tsv
    ├── 16745.tsv
    ├── 16747.tsv
    ├── 16748.tsv
    ├── 16749.tsv
    ├── 25620.tsv
    ├── 25622.tsv
    ├── 25634.tsv
    ├── 25638.tsv
    ├── 25648.tsv
    ├── 25649.tsv
    ├── 25650.tsv
    ├── 25651.tsv
    ├── 25652.tsv
    ├── 25653.tsv
    ├── 25654.tsv
    ├── 25728.tsv
    ├── 25729.tsv
    ├── 25730.tsv
    ├── 25731.tsv
    ├── 25732.tsv
    ├── 25733.tsv
    ├── 25734.tsv
    ├── 25744.tsv
    ├── 25745.tsv
    ├── 25748.tsv
    ├── 25749.tsv
    ├── 25756.tsv
    ├── 25760.tsv
    ├── 25761.tsv
    ├── 25762.tsv
    ├── 25763.tsv
    ├── 25764.tsv
    ├── 25765.tsv
    ├── 25766.tsv
    ├── 25774.tsv
    ├── 25776.tsv
    ├── 25777.tsv
    ├── 25778.tsv
    ├── 25779.tsv
    ├── 25780.tsv
    ├── 25781.tsv
    ├── 25782.tsv
    ├── 25873.tsv
    ├── 25880.tsv
    ├── 25881.tsv
    ├── 25882.tsv
    ├── 25883.tsv
    ├── 25884.tsv
    ├── 25886.tsv
    ├── 25984.tsv
    ├── 25985.tsv
    ├── 25989.tsv
    ├── 25992.tsv
    ├── 25993.tsv
    ├── 25994.tsv
    ├── 25995.tsv
    ├── 25996.tsv
    ├── 25998.tsv
    ├── 26000.tsv
    ├── 26001.tsv
    ├── 26003.tsv
    ├── 26008.tsv
    ├── 26009.tsv
    ├── 26010.tsv
    ├── 26011.tsv
    ├── 26012.tsv
    ├── 26013.tsv
    ├── 26014.tsv
    ├── 26017.tsv
    ├── 26019.tsv
    ├── 26026.tsv
    ├── 26030.tsv
    └── 32767.tsv

7 directories, 184 files

Run PostGIS and import data:

docker run --name postgis14-thailand --memory=12g --memory-swap=12g --memory-swappiness 0 --shm-size=1g -v /home/acc/dev/map/database/thailand:/var/lib/postgresql/data -v /home/acc/dev/map/thailand/thailand-latest_loc_ways:/input -e POSTGRES_PASSWORD=osmworld -e LD_LIBRARY_PATH=/usr/lib/jvm/java-11-openjdk-amd64/lib/server/ -d -p 5432:5432 -p 5005:5005 5d411c3be57f -c checkpoint_timeout='15 min' -c checkpoint_completion_target=0.9 -c shared_buffers='4096 MB' -c wal_buffers=-1 -c bgwriter_delay=200ms -c bgwriter_lru_maxpages=100 -c bgwriter_lru_multiplier=2.0 -c bgwriter_flush_after=0 -c max_wal_size='32768 MB' -c min_wal_size='16384 MB'

docker logs postgis14-thailand | tail -n 40
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
2023-01-29 10:22:09.841 UTC [1] LOG:  starting PostgreSQL 14.1 (Debian 14.1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-01-29 10:22:09.841 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-01-29 10:22:09.841 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2023-01-29 10:22:09.847 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-01-29 10:22:09.855 UTC [163] LOG:  database system was shut down at 2023-01-29 10:22:08 UTC
2023-01-29 10:22:09.863 UTC [1] LOG:  database system is ready to accept connections
Time: 16.011 ms
ANALYZE
Time: 49.590 ms
CREATE TABLE
Time: 1.509 ms
COPY 1
Time: 1.110 ms
ANALYZE
Time: 1.167 ms
CREATE TABLE
Time: 1.284 ms
COPY 566
Time: 2.092 ms
ANALYZE
Time: 1.880 ms
CREATE TABLE
Time: 1.210 ms
COPY 17482
Time: 19.023 ms
ANALYZE
Time: 27.044 ms
SELECT 80
Time: 8.333 ms
ANALYZE
Time: 1.533 ms
SELECT 75
Time: 9.650 ms
ANALYZE
Time: 1.213 ms

2023-01-29 10:22:07.219 UTC [49] LOG:  received fast shutdown request
waiting for server to shut down....2023-01-29 10:22:07.222 UTC [49] LOG:  aborting any active transactions
2023-01-29 10:22:07.223 UTC [49] LOG:  background worker "logical replication launcher" (PID 56) exited with exit code 1
2023-01-29 10:22:07.392 UTC [51] LOG:  shutting down
..2023-01-29 10:22:09.631 UTC [49] LOG:  database system is shut down
 done
server stopped

PostgreSQL init process complete; ready for start up.

docker exec -it postgis14-thailand psql -U postgres -d osmworld
psql (14.1 (Debian 14.1-1.pgdg110+1))
Type "help" for help.

osmworld=# \d
                       List of relations
 Schema |          Name          |       Type        |  Owner   
--------+------------------------+-------------------+----------
 public | geography_columns      | view              | postgres
 public | geometry_columns       | view              | postgres
 public | h3_3_bounds_complex    | table             | postgres
 public | multipolygon           | partitioned table | postgres
 public | multipolygon_000       | table             | postgres
 public | multipolygon_001       | table             | postgres
 public | multipolygon_002       | table             | postgres
 public | multipolygon_003       | table             | postgres
 public | multipolygon_004       | table             | postgres
 public | multipolygon_005       | table             | postgres
 public | multipolygon_006       | table             | postgres
 public | multipolygon_007       | table             | postgres
 public | multipolygon_008       | table             | postgres
 public | multipolygon_009       | table             | postgres
 public | multipolygon_010       | table             | postgres
 public | multipolygon_32767     | table             | postgres
 public | nodes                  | partitioned table | postgres
 public | nodes_000              | table             | postgres
 public | nodes_001              | table             | postgres
 public | nodes_002              | table             | postgres
 public | nodes_003              | table             | postgres
 public | nodes_004              | table             | postgres
 public | nodes_005              | table             | postgres
 public | nodes_006              | table             | postgres
 public | nodes_007              | table             | postgres
 public | nodes_008              | table             | postgres
 public | nodes_009              | table             | postgres
 public | nodes_010              | table             | postgres
 public | osm_file_block         | table             | postgres
 public | osm_file_block_content | table             | postgres
 public | osm_file_statistics    | table             | postgres
 public | osm_stat_nodes_3_3     | table             | postgres
 public | osm_stat_ways_3_3      | table             | postgres
 public | relation_members       | table             | postgres
 public | relations              | table             | postgres
 public | spatial_ref_sys        | table             | postgres
 public | ways                   | partitioned table | postgres
 public | ways_000               | table             | postgres
 public | ways_001               | table             | postgres
 public | ways_002               | table             | postgres
 public | ways_003               | table             | postgres
 public | ways_004               | table             | postgres
 public | ways_005               | table             | postgres
 public | ways_006               | table             | postgres
 public | ways_007               | table             | postgres
 public | ways_008               | table             | postgres
 public | ways_009               | table             | postgres
 public | ways_010               | table             | postgres
 public | ways_32767             | table             | postgres
(49 rows)

osmworld=# explain select h3_3, count(*) from ways group by 1;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Gather  (cost=34456.99..97465.64 rows=76 width=10)
   Workers Planned: 4
   ->  Parallel Append  (cost=33456.99..96458.04 rows=19 width=10)
         ->  HashAggregate  (cost=96457.88..96457.95 rows=7 width=10)
               Group Key: ways_4.h3_3
               ->  Seq Scan on ways_004 ways_4  (cost=0.00..90269.92 rows=1237592 width=2)
         ->  HashAggregate  (cost=60359.46..60359.53 rows=7 width=10)
               Group Key: ways_9.h3_3
               ->  Seq Scan on ways_009 ways_9  (cost=0.00..56666.64 rows=738564 width=2)
         ->  HashAggregate  (cost=39018.48..39018.54 rows=6 width=10)
               Group Key: ways_7.h3_3
               ->  Seq Scan on ways_007 ways_7  (cost=0.00..36567.32 rows=490232 width=2)
         ->  HashAggregate  (cost=33456.99..33457.06 rows=7 width=10)
               Group Key: ways_2.h3_3
               ->  Seq Scan on ways_002 ways_2  (cost=0.00..31580.66 rows=375266 width=2)
         ->  HashAggregate  (cost=30029.44..30029.53 rows=9 width=10)
               Group Key: ways_5.h3_3
               ->  Seq Scan on ways_005 ways_5  (cost=0.00..28346.96 rows=336496 width=2)
         ->  HashAggregate  (cost=26775.62..26775.72 rows=10 width=10)
               Group Key: ways.h3_3
               ->  Seq Scan on ways_000 ways  (cost=0.00..25251.75 rows=304775 width=2)
         ->  HashAggregate  (cost=19023.96..19024.03 rows=7 width=10)
               Group Key: ways_1.h3_3
               ->  Seq Scan on ways_001 ways_1  (cost=0.00..17901.97 rows=224397 width=2)
         ->  HashAggregate  (cost=15682.35..15682.43 rows=8 width=10)
               Group Key: ways_6.h3_3
               ->  Seq Scan on ways_006 ways_6  (cost=0.00..14803.23 rows=175823 width=2)
         ->  HashAggregate  (cost=13881.85..13881.90 rows=5 width=10)
               Group Key: ways_3.h3_3
               ->  Seq Scan on ways_003 ways_3  (cost=0.00..13173.90 rows=141590 width=2)
         ->  HashAggregate  (cost=6970.89..6970.92 rows=3 width=10)
               Group Key: ways_8.h3_3
               ->  Seq Scan on ways_008 ways_8  (cost=0.00..6567.26 rows=80726 width=2)
         ->  HashAggregate  (cost=2385.09..2385.10 rows=1 width=10)
               Group Key: ways_11.h3_3
               ->  Seq Scan on ways_32767 ways_11  (cost=0.00..2327.06 rows=11606 width=2)
         ->  HashAggregate  (cost=37.55..37.59 rows=4 width=10)
               Group Key: ways_10.h3_3
               ->  Seq Scan on ways_010 ways_10  (cost=0.00..36.70 rows=170 width=2)
(39 rows)

osmworld=# select h3_3, count(*) from ways group by 1 order by 2 desc limit 20;
 h3_3  | count  
-------+--------
 25764 | 890643
 26010 | 329973
 25994 | 203645
 25730 | 188810
 25995 | 139373
 26011 | 136930
 25780 | 118165
 25765 | 105996
 26009 | 100196
 25762 |  91177
 26014 |  75555
 26003 |  71479
 26008 |  70353
 25883 |  60808
 25634 |  59705
 25777 |  58151
 16749 |  57887
 25782 |  57509
 25638 |  57296
 25880 |  56838
(20 rows)

Apache Parquet export

Parquet export activated with "-arrow_format PARQUET" command line parameter. OpenStreetMap H3 also allows you to convert data to Apache Parquet and process it in Hadoop and Apache Spark/Sedona. In this project, it allows you to do the same as osm-parquetizer but with additional data and collected geo coordinates for ways, as well as enriched h3 geo indexes. The files are successfully uploaded to the DuckDB Geo extension, I checked this and I will tell you the details in the publication.

./spark-shell --packages org.apache.sedona:sedona-python-adapter-3.0_2.12:1.3.1-incubating,org.apache.sedona:sedona-viz-3.0_2.12:1.3.1-incubating,org.datasyslab:geotools-wrapper:1.3.0-27.2 --conf spark.sql.extensions=org.apache.sedona.sql.SedonaSqlExtensions

23/03/26 21:36:19 WARN Utils: Your hostname, geohost resolves to a loopback address: 127.0.1.1; using 192.168.0.209 instead (on interface wlp0s20f3)
23/03/26 21:36:19 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/geo/dev/tools/spark-3.2.2-bin-hadoop3.2/jars/spark-unsafe_2.12-3.2.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
:: loading settings :: url = jar:file:/home/geo/dev/tools/spark-3.2.2-bin-hadoop3.2/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/geo/.ivy2/cache
The jars for the packages stored in: /home/geo/.ivy2/jars
org.apache.sedona#sedona-python-adapter-3.0_2.12 added as a dependency
org.apache.sedona#sedona-viz-3.0_2.12 added as a dependency
org.datasyslab#geotools-wrapper added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-47c49091-20ed-49ad-88f1-026d43baeed8;1.0
	confs: [default]
	found org.apache.sedona#sedona-python-adapter-3.0_2.12;1.3.1-incubating in central
	found org.locationtech.jts#jts-core;1.18.2 in local-m2-cache
	found org.wololo#jts2geojson;0.16.1 in central
	found org.apache.sedona#sedona-core-3.0_2.12;1.3.1-incubating in central
	found org.apache.sedona#sedona-common;1.3.1-incubating in central
	found org.apache.sedona#sedona-sql-3.0_2.12;1.3.1-incubating in central
	found org.scala-lang.modules#scala-collection-compat_2.12;2.5.0 in central
	found org.apache.sedona#sedona-viz-3.0_2.12;1.3.1-incubating in central
	found org.beryx#awt-color-factory;1.0.0 in central
	found org.datasyslab#geotools-wrapper;1.3.0-27.2 in central
:: resolution report :: resolve 275ms :: artifacts dl 11ms
	:: modules in use:
	org.apache.sedona#sedona-common;1.3.1-incubating from central in [default]
	org.apache.sedona#sedona-core-3.0_2.12;1.3.1-incubating from central in [default]
	org.apache.sedona#sedona-python-adapter-3.0_2.12;1.3.1-incubating from central in [default]
	org.apache.sedona#sedona-sql-3.0_2.12;1.3.1-incubating from central in [default]
	org.apache.sedona#sedona-viz-3.0_2.12;1.3.1-incubating from central in [default]
	org.beryx#awt-color-factory;1.0.0 from central in [default]
	org.datasyslab#geotools-wrapper;1.3.0-27.2 from central in [default]
	org.locationtech.jts#jts-core;1.18.2 from local-m2-cache in [default]
	org.scala-lang.modules#scala-collection-compat_2.12;2.5.0 from central in [default]
	org.wololo#jts2geojson;0.16.1 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   10  |   0   |   0   |   0   ||   10  |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-47c49091-20ed-49ad-88f1-026d43baeed8
	confs: [default]
	0 artifacts copied, 10 already retrieved (0kB/7ms)
23/03/26 21:36:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://192.168.0.209:4040
Spark context available as 'sc' (master = local[*], app id = local-1679855783524).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.2
      /_/
         
Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.18)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 

scala> val ways = spark.read.parquet("/home/geo/dev/map/phuket3/phuket_loc_ways/arrow/ways/*.parquet")
ways: org.apache.spark.sql.DataFrame = [id: bigint, h33: smallint ... 17 more fields]

scala> ways.createOrReplaceTempView("ways")

scala> spark.time(sql("select id,h33,tags,ST_SetSRID(ST_GeomFromWKB(lineStringWkb),4326) wkb_str from ways where closed and (tags['building']='school' or (tags['building'] is not null and tags['amenity']='school')) limit 50")).show()
Time taken: 126 ms
+---------+-----+--------------------+--------------------+                     
|       id|  h33|                tags|             wkb_str|
+---------+-----+--------------------+--------------------+
| 31985155|25764|{building -> scho...|LINESTRING (100.5...|
| 31985189|25764|{building -> scho...|LINESTRING (100.5...|
| 31985207|25764|{building -> scho...|LINESTRING (100.5...|
| 31988377|25764|{building -> scho...|LINESTRING (100.5...|
| 31988502|25764|{amenity -> libra...|LINESTRING (100.5...|
| 31988658|25764|{building -> scho...|LINESTRING (100.5...|
| 31988661|25764|{building -> scho...|LINESTRING (100.5...|
| 31988692|25764|{building -> scho...|LINESTRING (100.5...|
| 31988865|25764|{building -> scho...|LINESTRING (100.5...|
| 31989612|25764|{building -> scho...|LINESTRING (100.5...|
|101939892|25880|{building -> school}|LINESTRING (100.5...|
|492451660|25880|{building -> school}|LINESTRING (100.5...|
|492451661|25880|{building -> school}|LINESTRING (100.5...|
|492451663|25880|{building -> school}|LINESTRING (100.5...|
|492451668|25880|{building -> school}|LINESTRING (100.5...|
|492451672|25880|{building -> school}|LINESTRING (100.5...|
|492483595|25880|{building -> school}|LINESTRING (100.5...|
|272560749|25764|{building -> scho...|LINESTRING (100.5...|
|272560751|25764|{building -> scho...|LINESTRING (100.5...|
|272560753|25764|{building -> scho...|LINESTRING (100.5...|
+---------+-----+--------------------+--------------------+
only showing top 20 rows

Node:

scala> spark.read.parquet("/home/geo/arrow/nodes/*.parquet").printSchema
root                                                                            
|-- id: long (nullable = true)
|-- h33: short (nullable = true)
|-- h38: integer (nullable = true)
|-- latitude: double (nullable = true)
|-- longitude: double (nullable = true)
|-- tags: map (nullable = true)
|    |-- key: string
|    |-- value: string (valueContainsNull = true)

Way:

scala> spark.read.parquet("/home/geo/arrow/ways/*.parquet").printSchema
root                                                                            
|-- id: long (nullable = true)
|-- h33: short (nullable = true)
|-- h38: integer (nullable = true)
|-- latitude: double (nullable = true)
|-- longitude: double (nullable = true)
|-- tags: map (nullable = true)
|    |-- key: string
|    |-- value: string (valueContainsNull = true)
|-- pointIdxs: array (nullable = true)
|    |-- element: long (containsNull = true)
|-- h33Center: short (nullable = true)
|-- closed: boolean (nullable = true)
|-- building: boolean (nullable = true)
|-- highway: boolean (nullable = true)
|-- scale: float (nullable = true)
|-- lineStringWkb: binary (nullable = true)
|-- bboxWkb: binary (nullable = true)
|-- h38Indexes: array (nullable = true)
|    |-- element: integer (containsNull = true)
|-- bboxMinX: double (nullable = true)
|-- bboxMaxX: double (nullable = true)
|-- bboxMinY: double (nullable = true)
|-- bboxMaxY: double (nullable = true)

Relation:

scala> spark.read.parquet("/home/geo/arrow/relations/*.parquet").printSchema
root
|-- id: long (nullable = true)
|-- tags: map (nullable = true)
|    |-- key: string
|    |-- value: string (valueContainsNull = true)
|-- memberId: array (nullable = true)
|    |-- element: long (containsNull = true)
|-- memberType: array (nullable = true)
|    |-- element: byte (containsNull = true)
|-- memberRole: array (nullable = true)
|    |-- element: string (containsNull = true)

Multipolygon:

scala> spark.read.parquet("/home/geo/arrow/multipolygon.parquet").printSchema
root
|-- id: long (nullable = true)
|-- wkb_hex: string (nullable = true)
|-- tags_json: string (nullable = true)

openstreetmap_h3's People

Contributors

igor-suhorukov avatar jimdi avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

openstreetmap_h3's Issues

Cannot connect to the Docker daemon

Hi Igor,

I stumbled upon this project and I found it very exciting and wanted to give it a try. However I have a hard time getting to run the command. I understand the Java application starts some system processes to make use of existing tools, namely osmium.

This is when I get this error:

java -jar target/osm-to-pgsnapshot-schema-ng-1.0-SNAPSHOT.jar -source_pbf /Users/christoph/dev/osm/andorra-latest.osm.pbf
0
6  time 2392
diff between total and processing 3251
total thread time 8511
total processing time 5260
total save time 178
total waiting for save time 2382
thread max time 2122
processing max time 1734
nodes 11274
ways 18960
relations 495
relation members 33249
multipolygon count 269
0	[-32768,32695)	7674376	65463
1 7 7.674376

docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
See 'docker run --help'.

I am using MacOS 13.4 (22F66), Rancher Desktop 1.8.1, /etc/paths contains the directory with the docker command. Actually, when I run docker run -w /wkd -v /Users/christoph/dev… in bash myself the commands completes successfully.

I assume this is not specifically related to this project as it is about making system calls from Java. Maybe you still have an idea to help me and others that bump into the same problem.

Thanks

I can't build a docker container with postgresql-15-citus-11.2

=> ERROR [20/27] RUN apt-get -y install apt-utils postgresql-15-citus-11.2 1.1s ------

[20/27] RUN apt-get -y install apt-utils postgresql-15-citus-11.2:
0.252 Reading package lists...
0.766 Building dependency tree...
0.882 Reading state information...
1.069 E: Unable to locate package postgresql-15-citus-11.2
1.069 E: Couldn't find any package by glob 'postgresql-15-citus-11.2'
1.069 E: Couldn't find any package by regex 'postgresql-15-citus-11.2'


Dockerfile:28

26 | RUN curl https://install.citusdata.com/community/deb.sh | bash
27 | RUN apt-get purge -y --auto-remove curl
28 | >>> RUN apt-get -y install apt-utils postgresql-15-citus-11.2
29 | RUN echo "shared_preload_libraries='citus'" >> /usr/share/postgresql/postgresql.conf.sample
30 | RUN echo "shared_preload_libraries='citus'" >> /var/lib/postgresql/data/postgresql.conf

ERROR: failed to solve: process "/bin/sh -c apt-get -y install apt-utils postgresql-15-citus-11.2" did not complete successfully: exit code: 100

Add license information

Hello!

I found this project from https://habr.com/ru/post/714326/

Could you please add information about the license of this repository — is it allowed to copy, change and use the code in any purpose?
It would be great to see any popular open source license.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.