
guilhem-dvr / sedona


This project forked from apache/sedona


A cluster computing framework for processing large-scale geospatial data

Home Page: https://sedona.apache.org/

License: Apache License 2.0

Shell 0.02% JavaScript 0.02% Python 2.26% C 0.24% Java 7.42% Scala 5.15% R 0.47% Makefile 0.01% HTML 0.02% Jupyter Notebook 84.39% Dockerfile 0.01%

sedona's Introduction

Apache Sedona


Download statistics are tracked on Maven, PyPI, Conda-forge, CRAN, and DockerHub. Maven downloads:
  • Apache Sedona: 225k/month
  • Archived GeoSpark releases: 10k/month

What is Apache Sedona?

Apache Sedona™ is a spatial computing engine that enables developers to easily process spatial data at any scale within modern cluster computing systems such as Apache Spark and Apache Flink. Sedona developers can express their spatial data processing tasks in Spatial SQL, Spatial Python, or Spatial R. Internally, Sedona provides spatial data loading, indexing, partitioning, and query processing and optimization functionality that enables users to analyze spatial data efficiently at any scale.
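Spatial partitioning is what lets this kind of processing scale out: nearby geometries are grouped into the same partition, so a query only needs to touch the partitions its search area overlaps. The toy illustration below uses a uniform grid in plain Python. The function `grid_partition` and the 0.1-degree cell size are hypothetical, and this is not Sedona's implementation; Sedona ships its own, more adaptive partitioners such as KDB-tree and quadtree partitioning.

```python
from collections import defaultdict
from math import floor


def grid_partition(points, cell_size):
    """Group (x, y) points by the uniform grid cell that contains them.

    Hypothetical helper for illustration; not part of Sedona's API.
    """
    partitions = defaultdict(list)
    for x, y in points:
        # Integer cell coordinates; floor() keeps negative longitudes in the correct cell
        key = (floor(x / cell_size), floor(y / cell_size))
        partitions[key].append((x, y))
    return dict(partitions)


# Hypothetical pickup locations as (longitude, latitude)
pickups = [(-73.99, 40.75), (-73.98, 40.76), (-73.85, 40.65)]
cells = grid_partition(pickups, cell_size=0.1)
```

The first two pickups share a cell and therefore a partition. A uniform grid is the simplest scheme but copes poorly with skewed data, since a dense area overloads a single cell; that is the motivation for data-driven partitioners.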

Sedona Ecosystem

Features

Some of the key features of Apache Sedona include:

  • Support for a wide range of geospatial data formats, including GeoJSON, WKT, and ESRI Shapefile.
  • Scalable distributed processing of large vector and raster datasets.
  • Tools for spatial indexing, spatial querying, and spatial join operations.
  • Integration with popular geospatial Python tools such as GeoPandas.
  • Integration with popular big data tools, such as Spark, Hadoop, Hive, and Flink, for data storage and querying.
  • A user-friendly API for working with geospatial data in SQL, Python, Scala, and Java.
  • Flexible deployment options, including standalone, local, and cluster modes.

Depending on the version and configuration, Sedona may offer capabilities beyond those listed here.

Launch the interactive Sedona Python Jupyter notebook on Binder to try Sedona immediately.

When to use Sedona?

Use Cases:

Apache Sedona is widely used for working with spatial data and has many applications. Its main use cases include:

  • Automotive data analytics: Sedona is widely used to perform spatial analysis and data mining on the large, complex datasets collected from vehicle fleets.
  • Urban planning and development: Sedona is commonly used to analyze and visualize spatial datasets describing urban environments, such as land use, transportation networks, and population density.
  • Location-based services: mapping and navigation applications use Sedona to process and analyze spatial data and deliver location-based information and services to users.
  • Environmental modeling and analysis: Sedona processes and analyzes spatial data on environmental factors such as air quality, water quality, and weather patterns.
  • Disaster response and management: Sedona processes and analyzes spatial data on floods, earthquakes, and other natural disasters to support emergency response and recovery efforts.

Code Example:

This example loads NYC taxi trip records and taxi zone information, stored as CSV files on AWS S3, into Sedona spatial DataFrames. It then runs a spatial SQL query on the taxi trip dataset that keeps only the records whose pickup point lies within a rectangular envelope around Manhattan. The example also shows a spatial join that matches each taxi trip to the zone whose geographical extent contains its pickup point. Finally, the last snippet converts Sedona's output to GeoPandas and plots the spatial distribution of both datasets.

Load NYC taxi trip and taxi zone data from CSV files stored on AWS S3

from sedona.spark import SedonaContext
sedona = SedonaContext.create(SedonaContext.builder().getOrCreate())
# Build a pickup point geometry from each trip's longitude/latitude columns
taxidf = sedona.read.format('csv').option("header","true").option("delimiter", ",").load("s3a://your-directory/data/nyc-taxi-data.csv")
taxidf = taxidf.selectExpr('ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup', 'Trip_Pickup_DateTime', 'Payment_Type', 'Fare_Amt')
# The zone file has no header: _c0 holds the zone geometry as WKT and _c1 the ZIP code
zoneDf = sedona.read.format('csv').option("delimiter", ",").load("s3a://your-directory/data/TIGER2018_ZCTA5.csv")
zoneDf = zoneDf.selectExpr('ST_GeomFromWKT(_c0) as zone', '_c1 as zipcode')
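The `_c0` column is parsed with ST_GeomFromWKT, so it must hold Well-Known Text such as `POLYGON ((x1 y1, x2 y2, ...))`. To show what that text encodes, here is a deliberately minimal parser. `parse_wkt_polygon` is a hypothetical helper, not a Sedona function, and it only handles a single-ring POLYGON; real zone boundaries may also be MULTIPOLYGONs or polygons with holes, which Sedona's own parser supports and this sketch rejects with a `ValueError`.

```python
import re

# Matches "POLYGON ((...))" with exactly one ring; inner parentheses (holes) are rejected
_POLYGON_RE = re.compile(r"\s*POLYGON\s*\(\(\s*([^()]*?)\s*\)\)\s*", re.IGNORECASE)


def parse_wkt_polygon(wkt):
    """Parse a single-ring WKT POLYGON into a list of (x, y) float tuples."""
    match = _POLYGON_RE.fullmatch(wkt)
    if match is None:
        raise ValueError("only single-ring POLYGON WKT is handled by this sketch")
    ring = []
    for pair in match.group(1).split(","):
        # Each vertex is "x y"; 3D coordinates would fail to unpack (also a ValueError)
        x, y = pair.split()
        ring.append((float(x), float(y)))
    return ring


# A closed ring: WKT repeats the first vertex at the end
ring = parse_wkt_polygon(
    "POLYGON ((-74.01 40.73, -73.93 40.73, -73.93 40.79, -74.01 40.79, -74.01 40.73))"
)
```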

Spatial SQL query to return only the taxi trips in Manhattan

# ST_PolygonFromEnvelope(min_x, min_y, max_x, max_y) builds a rectangle around Manhattan
taxidf_mhtn = taxidf.where('ST_Contains(ST_PolygonFromEnvelope(-74.01,40.73,-73.93,40.79), pickup)')
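Because the envelope is an axis-aligned rectangle, this filter reduces to comparing coordinates. The plain-Python sketch below mirrors that predicate for illustration only; `envelope_contains` and the sample pickups are hypothetical. The comparisons are strict because, under the OGC definition that ST_Contains follows, a point lying exactly on the polygon's boundary is not contained.

```python
def envelope_contains(min_x, min_y, max_x, max_y, x, y):
    """Return True if point (x, y) lies strictly inside the envelope.

    Hypothetical helper; boundary points return False, matching 'contains' semantics.
    """
    return min_x < x < max_x and min_y < y < max_y


# Same envelope as the Manhattan query: (min_x, min_y, max_x, max_y)
MANHATTAN = (-74.01, 40.73, -73.93, 40.79)

# Hypothetical pickups: one inside, one outside, one exactly on the western edge
pickups = [(-73.99, 40.75), (-73.80, 40.65), (-74.01, 40.75)]
in_manhattan = [p for p in pickups if envelope_contains(*MANHATTAN, *p)]
```

Only the first pickup survives; the third is excluded because it sits on the envelope's boundary.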

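Spatial join between the taxi and zone DataFrames to find the taxi trips in each zone

# Register both DataFrames as temporary views so that the SQL query can reference them by name
zoneDf.createOrReplaceTempView("zoneDf")
taxidf.createOrReplaceTempView("taxiDf")
taxiVsZone = sedona.sql('SELECT zone, zipcode, pickup, Fare_Amt FROM zoneDf, taxiDf WHERE ST_Contains(zone, pickup)')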
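This join pairs every zone with each taxi trip whose pickup point falls inside it. The semantics of that predicate can be sketched in plain Python as a nested-loop join with a two-step check: a cheap bounding-box filter, then an exact ray-casting point-in-polygon test. This is an illustration under stated assumptions (each zone is a single closed ring of (x, y) vertices; the zone labels and helper names are hypothetical), not how Sedona executes the query: Sedona's optimizer recognizes ST_Contains join predicates and plans a partitioned, index-assisted spatial join instead of a naive nested loop.

```python
def point_in_ring(x, y, ring):
    """Ray casting: toggle on each ring edge crossed by a ray from (x, y) toward +x."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed (this also rules out y1 == y2)
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bbox(ring):
    """Bounding box (min_x, min_y, max_x, max_y) of a ring."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def spatial_join(zones, pickups):
    """Return (zone_label, pickup) pairs where the zone's ring contains the pickup."""
    result = []
    for label, ring in zones:
        min_x, min_y, max_x, max_y = bbox(ring)
        for x, y in pickups:
            # Filter with the bounding box first, then refine with the exact test
            if min_x <= x <= max_x and min_y <= y <= max_y and point_in_ring(x, y, ring):
                result.append((label, (x, y)))
    return result


# Two hypothetical adjacent square zones and three pickups
zones = [
    ("zone-a", [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]),
    ("zone-b", [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)]),
]
matches = spatial_join(zones, [(1, 1), (3, 1), (5, 5)])
```

Ray casting gives no guaranteed answer for points exactly on an edge, whereas ST_Contains always excludes them; a production implementation must handle that case explicitly.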

Show a map of the loaded spatial DataFrames using GeoPandas

import geopandas as gpd
# Collect both DataFrames to the driver and wrap them as GeoDataFrames
zoneGpd = gpd.GeoDataFrame(zoneDf.toPandas(), geometry="zone")
taxiGpd = gpd.GeoDataFrame(taxidf.toPandas(), geometry="pickup")

zone = zoneGpd.plot(color='yellow', edgecolor='black', zorder=1)
zone.set_xlabel('Longitude (degrees)')
zone.set_ylabel('Latitude (degrees)')

zone.set_xlim(-74.1, -73.8)
zone.set_ylim(40.65, 40.9)

taxi = taxiGpd.plot(ax=zone, alpha=0.01, color='red', zorder=3)

Docker image

We provide a Docker image for Apache Sedona with Python JupyterLab and a single-node cluster. The images are available on DockerHub.

Building Sedona

  • To install the Python package:

    pip install apache-sedona
    
  • To compile the source code, please refer to the Sedona website.

  • Modules in the source code:

    Name | API | Introduction
    common | Java | Core geometric operation logic, serialization, and indexing
    spark | Spark RDD/DataFrame in Scala/Java/SQL | Distributed geospatial data processing on Apache Spark
    flink | Flink DataStream/Table in Scala/Java/SQL | Distributed geospatial data processing on Apache Flink
    snowflake | Snowflake SQL | Distributed geospatial data processing on Snowflake
    spark-shaded | No source code | Shaded jar for Sedona Spark
    flink-shaded | No source code | Shaded jar for Sedona Flink
    snowflake-tester | Java | Tester program for Sedona Snowflake
    python | Spark RDD/DataFrame in Python | Distributed geospatial data processing on Apache Spark
    R | Spark RDD/DataFrame in R | R wrapper for Sedona
    Zeppelin | Apache Zeppelin | Plugin for Apache Zeppelin 0.8.1+

Documentation

Please visit the Apache Sedona website for detailed information.

Join the community

Follow Sedona on Twitter for the latest news: Sedona@Twitter

Join the Sedona Discord community.

Sedona JIRA: bugs, pull requests, and other similar issues

Sedona mailing list [email protected]: project development, general questions, and tutorials.

  • Please subscribe before posting. To subscribe, send an email with a blank subject and body to [email protected]

Powered by

The Apache Software Foundation

sedona's People

Contributors

jiayuasu, kontinuation, zongsizhang, jinxuan, ign5117, furqaankhan, jbampton, sekikn, mbasmanova, umartin, yyy1000, imbruced, netanel246, kanchanchy, dependabot[bot], gregleleu, kimahriman, douglasdennis, sw186000, prantogg, yitao-li, carolgit, madzik555, edurdevic, omkarkaptan, michaelmerg, wrussia, tanelk, mosarwat, avshrepo
