Coder Social home page Coder Social logo

geocrawler's Introduction

geocrawler

A crawler for geospatial files

###About: geocrawler is a command line tool to extract geospatial metadata in JSON format. geoparser is a command line tool to process the output of geocrawler and generate an indexable JSON representation of a file containing geospatial data. geoparser read from standard input and is intended to be used as a piped command after geocrawler.

A file or directory is passed to geocrawler as an argument returning a JSON document for the file or a list of JSON documents for the geospatial files under that directory (the crawler traverses the whole tree under the specified directory)

There is a flag -c to determine the level of concurrency of the crawler. This is intended to speed up traversing directories with a large number of files by using several cores in the machine. -c should be set to a number equal or lower to the number of cores in the machine doing the crawling.

This program is written in Go wrapping the C GDAL library. There is a linux precompiled version available at the bin directory.

###Examples:

Extracting metadata for one file:

geocrawler LC80640052015252LGN00_B1.TIF
{"file_type":"GTiff","datasets":[{"ds_name":"/landsat/L8/064/005/LC80640052015252LGN00/LC80640052015252LGN00_B1.TIF","raster_count":1,"array_type":"Uint16","x_size":9141,"y_size":9161,"proj4":"+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs ","geotransform":[344085,30,0,8.689215e+06,0,-30]}]}

Extracting metadata for all the files under a directory:

geocrawler LC80640052015252LGN00/
/landsat/L8/064/005/LC80640052015252LGN00/LC80640052015252LGN00_B1.TIF	{"file_type":"GTiff","datasets":[{"ds_name":"/landsat/L8/064/005/LC80640052015252LGN00/LC80640052015252LGN00_B1.TIF","raster_count":1,"array_type":"Uint16","x_size":9141,"y_size":9161,"proj4":"+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs ","geotransform":[344085,30,0,8.689215e+06,0,-30]}]}
/landsat/L8/064/005/LC80640052015252LGN00/LC80640052015252LGN00_B2.TIF	{"file_type":"GTiff","datasets":[{"ds_name":"/landsat/L8/064/005/LC80640052015252LGN00/LC80640052015252LGN00_B2.TIF","raster_count":1,"array_type":"Uint16","x_size":9141,"y_size":9161,"proj4":"+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs ","geotransform":[344085,30,0,8.689215e+06,0,-30]}]}
/landsat/L8/064/005/LC80640052015252LGN00/LC80640052015252LGN00_B3.TIF	{"file_type":"GTiff","datasets":[{"ds_name":"/landsat/L8/064/005/LC80640052015252LGN00/LC80640052015252LGN00_B3.TIF","raster_count":1,"array_type":"Uint16","x_size":9141,"y_size":9161,"proj4":"+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs ","geotransform":[344085,30,0,8.689215e+06,0,-30]}]}
...

Same example but using 4 processes extracting the metadata concurrently and writing the output to a file (to be later ingested in a database):

geocrawler -c 4 LC80640052015252LGN00/ > out.tsv

Extracting metadata for all the files under a directory and parsing into an indexable object:

geocrawler LC80640052015252LGN00/ | geoparser
/landsat/L8/102/078/LC81020782015199LGN00/LC81020782015199LGN00_B1.TIF	{"timestamp":"2015-06-17T00:00:00Z","filename_fields":{"band":"B1","julian_day":"199","mission":"8","path":"102","processing_level":"LGN00","row":"078","year":"2015"},"polygon":{"type":"Polygon","coordinates":[[[132.428274005358645,-24.930273120739798],[132.382199681744368,-27.027829648222031],[134.692598341469136,-27.051847101496204],[134.698015969944748,-24.952165411227803],[132.428274005358645,-24.930273120739798]]]},"array_type":"Uint16","x_size":7641,"y_size":7751,"proj_wkt":"PROJCS[\"WGS 84 / UTM zone 53N\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",135],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"32653\"]]","geotransform":[240285,30,0,-2.759685e+06,0,-30]}
/landsat/L8/102/078/LC81020782015199LGN00/LC81020782015199LGN00_B2.TIF	{"timestamp":"2015-06-17T00:00:00Z","filename_fields":{"band":"B2","julian_day":"199","mission":"8","path":"102","processing_level":"LGN00","row":"078","year":"2015"},"polygon":{"type":"Polygon","coordinates":[[[132.428274005358645,-24.930273120739798],[132.382199681744368,-27.027829648222031],[134.692598341469136,-27.051847101496204],[134.698015969944748,-24.952165411227803],[132.428274005358645,-24.930273120739798]]]},"array_type":"Uint16","x_size":7641,"y_size":7751,"proj_wkt":"PROJCS[\"WGS 84 / UTM zone 53N\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",135],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"32653\"]]","geotransform":[240285,30,0,-2.759685e+06,0,-30]}
/landsat/L8/102/078/LC81020782015199LGN00/LC81020782015199LGN00_B3.TIF	{"timestamp":"2015-06-17T00:00:00Z","filename_fields":{"band":"B3","julian_day":"199","mission":"8","path":"102","processing_level":"LGN00","row":"078","year":"2015"},"polygon":{"type":"Polygon","coordinates":[[[132.428274005358645,-24.930273120739798],[132.382199681744368,-27.027829648222031],[134.692598341469136,-27.051847101496204],[134.698015969944748,-24.952165411227803],[132.428274005358645,-24.930273120739798]]]},"array_type":"Uint16","x_size":7641,"y_size":7751,"proj_wkt":"PROJCS[\"WGS 84 / UTM zone 53N\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",135],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],AUTHORITY[\"EPSG\",\"32653\"]]","geotransform":[240285,30,0,-2.759685e+06,0,-30]}

This program has been developed at the NCI

geocrawler's People

Contributors

monkeybutter avatar

Stargazers

 avatar Peter Scarth avatar

Watchers

 avatar

geocrawler's Issues

todo

  • include lat/long polygon
    • check himawari
  • check himawari nulls
  • simplify/remove compact
  • review proj4 text to try to avoid error on ST_Transform fail

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.