bigsheeper / arctern

This project forked from arctern-io/arctern

License: Apache License 2.0

Dockerfile 0.93% Groovy 0.71% Shell 6.37% CMake 2.72% Python 31.92% C 0.22% C++ 45.03% Cuda 2.03% TSQL 10.06%

arctern's Introduction

Chinese README

Notice: Arctern is still in development and the 0.1.0 version is expected to be released in April 2020.

Overview

Arctern is a geospatial analytics engine for massive-scale data. Compared with other geospatial analytics tools, Arctern aims to provide the following advantages:

  1. Provides domain-specific APIs to improve the development efficiency of upper-level applications.
  2. Provides extensible, low-cost distributed solutions.
  3. Provides GPU acceleration for geospatial analytics algorithms.
  4. Supports hybrid analysis with GIS, SQL, and ML functionalities.

Architecture

The following figure shows the architecture of Arctern 0.1.0.

Arctern includes two components: GIS and Visualization. Arctern 0.1.0 supports the 35 most frequently used GIS APIs in the OGC standard, covering construction, access, correlation analysis, and measurement of geometric objects. The visualization component is responsible for rendering geometric objects and provides standard Vega rendering APIs. Unlike traditional web rendering, Arctern uses server-side rendering and can render choropleths, heatmaps, and scatter plots for massive-scale data. Behind a set of unified APIs, Arctern provides both CPU-based and GPU-based implementations for geospatial data analytics and visualization.
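To illustrate why server-side rendering suits massive-scale data, the following minimal sketch bins weighted points into a fixed-size pixel grid, so the size of the result depends on the canvas and not on the number of input points. This is not Arctern's renderer: the function name `bin_points` and the linear longitude/latitude-to-pixel mapping are assumptions made purely for illustration.

```python
def bin_points(points, bbox, width, height):
    """Accumulate point weights into a height x width grid.

    points: iterable of (lon, lat, weight) tuples.
    bbox:   (min_lon, min_lat, max_lon, max_lat).
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    grid = [[0.0] * width for _ in range(height)]
    for lon, lat, weight in points:
        # Skip points outside the bounding box.
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue
        # Linear mapping to a column index; clamp the right edge into the last cell.
        col = min(int((lon - min_lon) / (max_lon - min_lon) * width), width - 1)
        # Row 0 is the top of the image, so latitude is flipped.
        row = min(int((max_lat - lat) / (max_lat - min_lat) * height), height - 1)
        grid[row][col] += weight
    return grid


bbox = (-73.998427, 40.730309, -73.954348, 40.780816)
points = [
    (-73.998427, 40.780816, 1.0),  # top-left corner
    (-73.954348, 40.730309, 2.0),  # bottom-right corner
    (-73.998427, 40.780816, 3.0),  # same pixel as the first point
    (-74.5, 40.75, 5.0),           # outside the box, ignored
]
grid = bin_points(points, bbox, width=4, height=3)
```

However many points are fed in, only the `width * height` grid has to be colored and sent to the client.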

For data formats, Arctern supports standard numeric types, WKB, and files in JSON, CSV, and Parquet format. Arctern organizes data in memory in a column-based layout according to the Arrow standard, which allows zero-copy data exchange with external systems.
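For readers unfamiliar with WKB (well-known binary, the OGC binary encoding for geometries), the sketch below encodes and decodes a 2D point using only the Python standard library. A WKB point consists of a one-byte byte-order flag, a four-byte geometry type, and two eight-byte doubles. The helper names `point_to_wkb` and `wkb_to_point` are hypothetical and are not part of Arctern's API.

```python
import struct


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB."""
    # '<' little-endian, 'B' byte-order flag (1 = little-endian),
    # 'I' geometry type (1 = Point), 'dd' the x and y coordinates.
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(buf):
    """Decode a little-endian WKB point back into (x, y)."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", buf)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("expected a little-endian WKB Point")
    return x, y


wkb = point_to_wkb(-73.998427, 40.730309)
decoded = wkb_to_point(wkb)
```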

Arctern includes three types of column-based interfaces: C++ API, Python API, and Spark API. The C++ APIs pass arguments in Arrow format, while the Python and Spark APIs pass arguments in dataframe format. Because Spark supports GPU resource management only from version 3.0 onward, the Spark interface of Arctern only supports Spark 3.0.

Code example

# Invoke Arctern API in PySpark

from pyspark.sql import SparkSession
from arctern_pyspark import register_funcs, heatmap
from arctern.util import save_png
from arctern.util.vega import vega_heatmap

if __name__ == "__main__":
    spark = SparkSession \
            .builder \
            .appName("Arctern-PySpark example") \
            .getOrCreate()

    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
    register_funcs(spark)

    df = spark.read.format("csv") \
         .option("header", True) \
         .option("delimiter", ",") \
         .schema("passenger_count long,  pickup_longitude double, pickup_latitude double") \
         .load("file:///tmp/0_5M_nyc_taxi_and_building.csv") \
         .cache()
    df.createOrReplaceTempView("nyc_taxi")
        
    res = spark.sql(
        "select ST_Point(pickup_longitude, pickup_latitude) as point, passenger_count as w \
        from nyc_taxi \
        where ST_Within(ST_Point(pickup_longitude, pickup_latitude), 'POLYGON ((-73.998427 40.730309, \
                                                                                -73.954348 40.730309, \
                                                                                -73.954348 40.780816, \
                                                                                -73.998427 40.780816, \
                                                                                -73.998427 40.730309))')")

    vega = vega_heatmap(1024, 896, 10.0, [-73.998427, 40.730309, -73.954348, 40.780816], 'EPSG:4326')
    res = heatmap(res, vega)
    save_png(res, '/tmp/heatmap.png')

    spark.catalog.dropTempView("nyc_taxi")

    spark.stop()
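
The `WHERE` clause in the query above keeps only the pickup points that fall inside the given polygon. As a rough illustration of that predicate, and not of how Arctern implements `ST_Within`, the sketch below applies the classic ray-casting test in pure Python. The function name `point_in_polygon` is hypothetical, and points lying exactly on the polygon boundary are not treated specially here.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: return True if (x, y) is inside the closed ring.

    ring: list of (x, y) vertices whose first and last entries are equal,
    as in a WKT POLYGON.
    """
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point.
            if x < x_cross:
                inside = not inside
    return inside


# Same polygon as in the SQL query above.
ring = [
    (-73.998427, 40.730309),
    (-73.954348, 40.730309),
    (-73.954348, 40.780816),
    (-73.998427, 40.780816),
    (-73.998427, 40.730309),
]
inside_result = point_in_polygon(-73.98, 40.75, ring)
outside_result = point_in_polygon(-74.10, 40.75, ring)
```

A point is inside when a horizontal ray cast from it crosses the polygon's edges an odd number of times.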

Visualization

Arctern will be open sourced along with Sulidae, a front-end visualization system developed by ZILLIZ that provides hybrid visualization with both web front-end and server-side rendering. Sulidae combines the speed and flexibility of web front-end rendering with the back end's ability to render massive-scale data.

Arctern 0.1.0 is compatible with Sulidae. The following figures show the visualization effects of a heatmap and a choropleth with 10 million data points.

Arctern roadmap

v0.1.0

  1. Support the 35 most frequently used GIS APIs in the OGC standard.
  2. Support rendering choropleths, heatmaps, and scatter plots for massive-scale datasets.
  3. Provide C++, Python, and Spark APIs that comply with the Arrow standard.
  4. Arctern engine with CPU-based implementation.
  5. Arctern engine with GPU-based implementation.
  6. Compatibility with Sulidae.
  7. Documentation for installation, deployment, and API reference.

v0.2.0

  1. Domain-specific API for trace analysis and geospatial data analysis.
  2. Geospatial indexes for domain-specific API.
  3. Performance optimization in Spark 3.0.
  4. Support more GIS APIs.
  5. Continuously improve system stability.

In progress:

Completed by 2020.03.10

  1. Support the 35 most frequently used GIS APIs in the OGC standard.
  2. Support rendering choropleths, heatmaps, and scatter plots for massive-scale datasets.
  3. Support C++, Python, and Spark APIs that comply with the Arrow standard.
  4. Arctern engine with CPU support.
  5. Arctern engine with GPU support.

Contact us

Email

[email protected]

ZILLIZ Wechat

arctern's People

Contributors

become-nice, bigsheeper, czpmango, czs007, fluorinedog, guorentong, guoxiangzhou, jeffoverflow, liangliu, loguo, longjiquan, neza2017, shengjh, superbigdove, talentan, xiaocai2333, xige-16, yxm1536
