arctern-io / arctern

License: Apache License 2.0

Languages: CMake 2.62%, Shell 6.14%, Cuda 1.96%, C++ 43.68%, Python 34.50%, C 0.21%, Dockerfile 0.90%, Groovy 0.30%, TSQL 9.69%
Topics: gis, gis-platform, giscience, geospatial, geolocation, gpu-acceleration, gpu-programming

arctern's Introduction

Arctern Docs

Arctern Documentation (Chinese)

Overview

Arctern is a fast, scalable spatial-temporal analytics framework.

Scalability is key to building productive data science pipelines. To address this challenge, we launched Arctern, an open-source spatial-temporal analytics framework for boosting end-to-end data science performance. Arctern aims to improve scalability in two respects:

  • A unified data analytics and processing interface across platforms, from laptops to clusters and the cloud.
  • A rich and consistent set of algorithms and models, including trajectory processing, spatial clustering, and regression, across the stages of the data science pipeline.

Arctern's approach and current progress

We adopt GeoPandas's interface and plan to build GeoDataFrame/GeoSeries implementations that scale both up and out. On top of GeoDataFrame/GeoSeries, we will develop a consistent spatial-temporal algorithm set across execution environments.

We have developed an efficient multi-threaded GeoSeries implementation, and a distributed version is in progress. In the latest version, 0.2.0, Arctern achieves a 24x speedup over GeoPandas. Even under single-threaded execution, Arctern outperforms GeoPandas by 7x on average. The detailed evaluation results are illustrated in the figure below.

We are also conducting experimental GPU acceleration for spatial-temporal data analysis and rendering. So far, Arctern provides six GPU-accelerated rendering methods and eight spatial-relation operations, which outperform their CPU-based counterparts by up to 36x.

In the next few releases, our team will focus on:

  • Developing a distributed version of GeoSeries. Our first distributed implementation of GeoDataFrame/GeoSeries will be based on Spark and has been developed in sync with Spark 3.0 since its preview release. Spark's support for GPU scheduling and column-based processing is highly in line with our idea of high-performance spatial-temporal data processing. Besides, the newly introduced Koalas interface offers a promising option for implementing consistent GeoDataFrame/GeoSeries interfaces on Spark.
  • Enriching our spatial-temporal algorithm set. We will concentrate on KNN search and trajectory analysis in the project's early stages.

arctern's People

Contributors

become-nice, bigsheeper, czpmango, czs007, emma-song, fluorinedog, guorentong, guoxiangzhou, jeffoverflow, liangliu, loguo, longjiquan, neza2017, shengjh, superbigdove, talentan, xiaocai2333, xige-16, yxm1536


arctern's Issues

WKT ambiguity

I found that the WKT format does not specify a coordinate system, which means that a WKT string can be converted to a spatial object in any coordinate system. This may be an issue to consider, since arctern's current interfaces are defined in terms of WKT.

I did the following tests to verify the above view:

select st_distance('LINESTRING (11 2,3 4)'::geometry,'POLYGON ((0 0,0 1,3 3,1 0,0 0))'::geometry) ; -- sql1
select st_distance('LINESTRING (11 2,3 4)'::geography,'POLYGON ((0 0,0 1,3 3,1 0,0 0))'::geography) ; -- sql2
select st_distance('LINESTRING (11 2,3 4)'::geography,'POLYGON ((0 0,0 1,3 3,1 0,0 0))'::geometry) ; -- sql3

The results are :

sql1 : 0.970142500145332
sql2 : 107417.14877794
sql3 : 107417.14877794   (just same as sql2)

You can see that the sql1 and sql2 results are different.

Therefore, I tried adding extra information to the WKT string to avoid the ambiguity caused by the above phenomenon.

Here is my test SQL statement (I chose POINT and LINESTRING to avoid possible errors):

SELECT st_distance(
ST_Transform(ST_GeomFromText('POINT (1 1)',4326),3857),
ST_Transform(ST_GeomFromText('LINESTRING (0 0,0 1)',4326),3857)
); -- sql4

select st_distance('POINT(1 1)'::geography,'LINESTRING(0 0,0 1)'::geography); -- sql5

The results are :

sql4 : 111319.490793272
sql5 : 111302.64933943

The results of sql4 and sql5 are close to each other. I am not sure whether the difference is error introduced by the coordinate system transformation, but this also shows that adding the extra information avoids the ambiguity described above.

Note: all tests were run in postgis.
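The ambiguity can also be seen without a database: the same coordinates give a unitless planar distance when treated as plain geometry, and a distance in meters when treated as lon/lat on a sphere. Below is a stdlib-only sketch; the spherical radius and the choice of measuring to the nearest vertex are assumptions of this illustration, not arctern or postgis internals.

```python
import math

def planar_point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to segment AB, in coordinate units."""
    abx, aby = bx - ax, by - ay
    apx, apy = px - ax, py - ay
    denom = abx * abx + aby * aby
    t = 0.0 if denom == 0 else max(0.0, min(1.0, (apx * abx + apy * aby) / denom))
    cx, cy = ax + t * abx, ay + t * aby
    return math.hypot(px - cx, py - cy)

def haversine_m(lon1, lat1, lon2, lat2, radius=6_371_000.0):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

# 'POINT (1 1)' vs 'LINESTRING (0 0, 0 1)':
planar = planar_point_segment_distance(1, 1, 0, 0, 0, 1)  # 1.0, in "degrees"
# Treating the coordinates as lon/lat, measure to the nearest vertex (0, 1):
geodesic = haversine_m(1, 1, 0, 1)                        # roughly 111 km
```

The two numbers mirror the sql4/sql5 gap above: identical WKT coordinates, wildly different distances depending on the interpretation.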

ST_IsValid bug and other function implementation related to IsValid

I found some differences in arctern's parsing rules for WKT strings: some inputs that produce an error in postgis do not in arctern.

I tested the ST_IsValid function in arctern :

def run_st_tmp(spark):
    register_funcs(spark)
    input = []

    input.extend([('POINT (1 8 2 4 )kdjff',)])
    input.extend([('POLYGON ((1 1,1 2,2 2,2 1,1 1)),((dkjfkjd0 0,1 -1,3 4,-2 3,0 0))',)])

    df = spark.createDataFrame(data=input, schema=['geos']).cache()
    df.createOrReplaceTempView("t1")
    spark.sql("select ST_IsValid_UDF(geos) from t1").show(100,0)

I got the following results :

+--------------------+
|ST_IsValid_UDF(geos)|
+--------------------+
|    true            |
|    true            |
+--------------------+

Our ST_IsValid implementation first calls OGRGeometryFactory::createFromWkt, but createFromWkt's input checking is weak, so it does not produce correct results for malformed input.

I also looked at the implementation of other functions. There is no IsValid check before calling the gdal API. The gdal API documentation states:

"Geometry validity is not checked. In case you are unsure of the validity of the input geometries, call IsValid() before, otherwise the result might be wrong."

So here are two suggestions:

  • The ordering of OGRGeometryFactory::createFromWkt's legitimacy check and the OGR_G_IsValid check needs to be examined.
  • The gdal functions are not responsible for validity checking, so our other functions should do an IsValid check before calling the gdal C API.

Got the following error while running spark tests: run_st_transform

The error occurs when running run_st_transform(spark_session).

file path: GIS/spark/pyspark/example/gis/spark_udf_ex.py

ERROR 1: PROJ: proj_create_from_database: Open of /home/liangliu/anaconda3/envs/zgis_dev/share/proj failed
terminate called after throwing an instance of 'std::runtime_error*'

ST_Overlaps bug

I got different output when I used a specific wkt as input to the ST_Overlaps function (compared to geospark).


  • geospark test :
    spark.sql("SELECT ST_Overlaps ( ST_GeomFromWKT('POLYGON ((0 0,0 1,1 1,1 0,0 0))') , ST_GeomFromWKT('MULTIPOLYGON ( ((0 0, 0 2, 2 3,2 0,0 0)) )') )").show(false)
    output : false

  • GIS test :
    wkt_arrow_array1 = { POLYGON ((0 0,0 1,1 1,1 0,0 0))}
    wkt_arrow_array2 = { MULTIPOLYGON ( ((0 0, 0 2, 2 3,2 0,0 0)) )}
    zilliz::gis::ST_Overlaps(wkt_arrow_array1,wkt_arrow_array2)
    output : true

  • postgis test :
    select st_overlaps('POLYGON ((0 0,0 1,1 1,1 0,0 0))'::geometry,'MULTIPOLYGON ( ((0 0, 0 2, 2 3,2 0,0 0)) )'::geometry);
    output : false

st_distance difference

in postgis, the distance to an empty geometry is empty (NULL):

postgres=# SELECT ST_distance('POINT EMPTY'::geometry,'POINT(1 2)'::geometry);
 st_distance
-------------

(1 row)

in arctern, the result is 0

different st_equals results between arctern and postgis

in arctern, the results for the following data are all false:
select st_equals_udf(left, right) as geos from test_equals

in postgis, these queries all return true:
select st_equals('LINESTRING (0 0, 10 10)'::geometry, 'LINESTRING (0 0, 5 5, 10 10)'::geometry);
select st_equals('LINESTRING (10 10, 0 0)'::geometry, 'LINESTRING (0 0, 5 5, 10 10)'::geometry);
select st_equals('LINESTRING(0 0, 1 1)'::geometry, 'LINESTRING(1 1, 0 0)'::geometry);
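A likely explanation: st_equals in postgis is spatial rather than textual, and 'LINESTRING (0 0, 5 5, 10 10)' traces exactly the same point set as 'LINESTRING (0 0, 10 10)' because the extra vertex (5 5) is collinear with the endpoints. The on-segment test that such a spatial-equality check reduces to can be sketched as follows (the tolerance eps is an assumption of this sketch):

```python
def on_segment(px, py, ax, ay, bx, by, eps=1e-9):
    """True if point P lies on segment AB (collinear and within its bounds)."""
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if abs(cross) > eps:
        return False  # not collinear with AB
    dot = (px - ax) * (bx - ax) + (py - ay) * (by - ay)
    return -eps <= dot <= (bx - ax) ** 2 + (by - ay) ** 2 + eps

# The extra vertex of the longer linestring lies on the shorter one:
on_segment(5, 5, 0, 0, 10, 10)    # True
# Vertex direction does not matter for the traced point set:
on_segment(5, 5, 10, 10, 0, 0)    # True
```

A byte-for-byte comparison of the WKT strings (or of their vertex lists) would return false for all three queries, which matches the behavior reported for arctern.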

ST_Union_Aggr exception

I got an exception when running my test code below:

from pyspark.sql import SparkSession
from zilliz_pyspark import register_funcs

def run_st_union(spark):
    test_df = spark.read.json("/xxx/st_union.json").cache()
    test_df.createOrReplaceTempView("st_union")
    register_funcs(spark)
    spark.sql("select ST_Union_Aggr_UDF(geos) from (select ST_PolygonFromEnvelope_UDF(a,c,b,d) as geos from st_union) as foo").show(100,0)

#main here.

st_union.json looks like:

{"a": 13.9, "c": 82.2, "b": 19.1, "d": 83.4}
{"a": 10.1, "c": 91.9, "b": 19.7, "d": 98.3}
{"a": 16.1, "c": 93.3, "b": 16.6, "d": 94.0}
{"a": 11.0, "c": 88.3, "b": 18.7, "d": 98.2}
{"a": 13.9, "c": 82.2, "b": 19.1, "d": 83.4}
{"a": 12.0, "c": 81.5, "b": 16.2, "d": 90.6}
{"a": 10.4, "c": 87.5, "b": 11.7, "d": 92.2}
{"a": 15.5, "c": 88.7, "b": 18.6, "d": 98.4}
{"a": 14.8, "c": 83.0, "b": 16.9, "d": 85.6}
{"a": 10.8, "c": 83.9, "b": 16.5, "d": 84.4}
{"a": 12.5, "c": 80.8, "b": 14.8, "d": 97.1}

The error message is:

ERROR 1: TopologyException: Input geom 0 is invalid: Self-intersection at or near point 14.899999999999999 95.099999999999994 at 14.899999999999999 95.099999999999994
20/02/29 15:43:16 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)

postgis test :

sql :
drop table t1;
create table t1 (a real,c real,b real,d real);
insert into t1 values 
(10.1,91.9,19.7,98.3),
(16.1,93.3,16.6,94.0),
(11.0,88.3,18.7,98.2),
(13.9,82.2,19.1,83.4),
(12.0,81.5,16.2,90.6),
(10.4,87.5,11.7,92.2),
(15.5,88.7,18.6,98.4),
(14.8,83.0,16.9,85.6),
(10.8,83.9,16.5,84.4),
(12.5,80.8,14.8,97.1)
;
select st_astext(st_union(geo)) from (select st_makeEnvelope(a,c,b,d) as geo from t1) as foo;

result :
 POLYGON((16.8999996185303 83.4000015258789,19.1000003814697 83.4000015258789,19.1000003814697 82.1999969482422,16.2000007629395 82.1999969482422,16.2000007629395 81.5,14.8000001907349 81.5,14.8000001907349 80.8000030517578,12.5 80.8000030517578,12.5 81.5,12 81.5,12 83.9000015258789,10.8000001907349 83.9000015258789,10.8000001907349 84.4000015258789,12 84.4000015258789,12 88.3000030517578,11.6999998092651 88.3000030517578,11.6999998092651 87.5,10.3999996185303 87.5,10.3999996185303 91.9000015258789,10.1000003814697 91.9000015258789,10.1000003814697 98.3000030517578,15.5 98.3000030517578,15.5 98.4000015258789,18.6000003814697 98.4000015258789,18.6000003814697 98.3000030517578,19.7000007629395 98.3000030517578,19.7000007629395 91.9000015258789,18.7000007629395 91.9000015258789,18.7000007629395 88.3000030517578,16.2000007629395 88.3000030517578,16.2000007629395 85.5999984741211,16.8999996185303 85.5999984741211,16.8999996185303 83.4000015258789))
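For reference when checking the inputs to the union, ST_PolygonFromEnvelope(min_x, min_y, max_x, max_y) builds an axis-aligned rectangle. A minimal pure-Python sketch of that construction (the counter-clockwise vertex order is an assumption of this sketch):

```python
def polygon_from_envelope(min_x, min_y, max_x, max_y):
    """WKT rectangle for the envelope: a closed ring, counter-clockwise."""
    return ("POLYGON (({0} {1}, {2} {1}, {2} {3}, {0} {3}, {0} {1}))"
            .format(min_x, min_y, max_x, max_y))

# First row of st_union.json: a=13.9, c=82.2, b=19.1, d=83.4
polygon_from_envelope(13.9, 82.2, 19.1, 83.4)
# 'POLYGON ((13.9 82.2, 19.1 82.2, 19.1 83.4, 13.9 83.4, 13.9 82.2))'
```

Each such rectangle is individually valid, so the TopologyException above points at the intermediate results of the aggregation rather than at the envelope construction.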

ST_Buffer bug

I got different output when I used a specific wkt as input to the ST_Buffer function (compared to geospark).


  • geospark test :
    spark.sql("SELECT ST_Buffer( ST_GeomFromWKT('MULTIPOLYGON ( ((0 0, 1 4, 1 0,0 0)), ((0 0,1 0,0 1,0 0)) )') , 0)").show(1,0)
    output : POLYGON ((0.2 0.8, 1 4, 1 0, 0.2 0.8))

  • GIS test :
    wkt_arrow_array = {MULTIPOLYGON ( ((0 0, 1 4, 1 0,0 0)), ((0 0,1 0,0 1,0 0)) ) }
    zilliz::gis::ST_Buffer(wkt_arrow_array,0)
    output : POLYGON ((0 0,0 1,0.2 0.8,1 4,1 0,0 0))

  • postgis test :
    select st_astext(st_buffer('MULTIPOLYGON ( ((0 0, 1 4, 1 0,0 0)), ((0 0,1 0,0 1,0 0)) )'::geometry,0))
    output : POLYGON((0 0,0 1,0.2 0.8,1 4,1 0,0 0))

st_isvalid difference

in postgis:

select st_isvalid('POINT (30)');
select st_isvalid('POINT (,)');
select st_isvalid('POINT (a b)');
select st_isvalid('MULTIPOINT ()');
select st_isvalid('MULTIPOINT (,)');
select st_isvalid('POINT(1 2 3 4 5 6 7)');
select st_isvalid('LINESTRING(1 1)');
select st_isvalid('MULTIPOINT(1 1, 2 2');

all of these return an ERROR when executed in psql

in arctern, all of them return false
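One way to reconcile the two behaviors is to distinguish "the WKT failed to parse" from "the geometry parsed but is invalid", instead of folding both into false. A library-agnostic sketch of that distinction, using a deliberately naive grammar for 2D POINT only (the regex and the return labels are illustrative assumptions, not arctern's implementation):

```python
import re

# Naive grammar for a 2D POINT; a real parser covers the full WKT spec.
_POINT_RE = re.compile(r"^POINT \(-?\d+(\.\d+)? -?\d+(\.\d+)?\)$")

def classify_point_wkt(wkt):
    """'parse-error' for malformed text, 'ok' for a well-formed 2D POINT."""
    return "ok" if _POINT_RE.match(wkt) else "parse-error"

classify_point_wkt("POINT (1 2)")   # 'ok'
classify_point_wkt("POINT (30)")    # 'parse-error' -- postgis raises here
classify_point_wkt("POINT (a b)")   # 'parse-error'
```

With this split, malformed input could raise (as postgis does) while genuinely invalid geometries such as self-intersecting polygons return false.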

st_envelope_udf different result with postgis

in our st_envelope_udf function, the result for 'POLYGON EMPTY' is 'POINT (0 0)'

actually, the results differ from postgis for the envelopes of all empty geometry types

postgis:
select st_astext(st_envelope('POLYGON EMPTY'::geometry));
result:
st_astext

POLYGON EMPTY

Some geometry cases that are not valid

My test code :

from osgeo import ogr

p0 =ogr.CreateGeometryFromWkt('POINT (1 8)')
p1 =ogr.CreateGeometryFromWkt('MULTIPOINT (1 1,3 4)')
p2 =ogr.CreateGeometryFromWkt('LINESTRING (1 1,1 2,2 3)')
p3 =ogr.CreateGeometryFromWkt('MULTILINESTRING ((1 1,1 2),(2 4,1 9,1 8))' )
p4 =ogr.CreateGeometryFromWkt('MULTILINESTRING ((1 1,3 4))')
p5 =ogr.CreateGeometryFromWkt('POLYGON ((1 1,1 2,2 2,2 1,1 1))')
p6 =ogr.CreateGeometryFromWkt('POLYGON ((1 1,1 2,2 2,2 1,1 1)),((0 0,1 -1,3 4,-2 3,0 0))') 
p7 =ogr.CreateGeometryFromWkt('POLYGON ((1 1,1 2,2 2,2 1,1 1),(0 0,1 -1,3 4,-2 3,0 0))')
p8 =ogr.CreateGeometryFromWkt('MULTIPOLYGON (((1 1,1 2,2 2,2 1,1 1)),((0 0,1 -1,3 4,-2 3,0 0)) )')
p9 =ogr.CreateGeometryFromWkt('POINT EMPTY')
p10=ogr.CreateGeometryFromWkt('LINESTRING EMPTY')
p11=ogr.CreateGeometryFromWkt('POLYGON EMPTY')
p12=ogr.CreateGeometryFromWkt('MULTIPOINT EMPTY')
p13=ogr.CreateGeometryFromWkt('MULTILINESTRING EMPTY')
p14=ogr.CreateGeometryFromWkt('MULTIPOLYGON EMPTY')
p15=ogr.CreateGeometryFromWkt('GEOMETRYCOLLECTION EMPTY')
p16=ogr.CreateGeometryFromWkt('CIRCULARSTRING (0 2, -1 1,0 0, 0.5 0, 1 0, 2 1, 1 2, 0.5 2, 0 2)')
p17=ogr.CreateGeometryFromWkt('COMPOUNDCURVE(CIRCULARSTRING(0 2, -1 1,0 0),(0 0, 0.5 0, 1 0),CIRCULARSTRING( 1 0, 2 1, 1 2),(1 2, 0.5 2, 0 2))')
p18=ogr.CreateGeometryFromWkt('GEOMETRYCOLLECTION ( LINESTRING ( 90 190, 120 190, 50 60, 130 10, 190 50, 160 90, 10 150, 90 190 ), POINT(90 190) ) ')
p19=ogr.CreateGeometryFromWkt('MULTICURVE ((5 5, 3 5, 3 3, 0 3), CIRCULARSTRING (0 0, 0.2 1, 0.5 1.4), COMPOUNDCURVE (CIRCULARSTRING (0 0,1 1,1 0),(1 0,0 1)))')
p20=ogr.CreateGeometryFromWkt('CURVEPOLYGON(CIRCULARSTRING(0 0, 4 0, 4 4, 0 4, 0 0),(1 1, 3 3, 3 1, 1 1))')
p21=ogr.CreateGeometryFromWkt('CURVEPOLYGON(COMPOUNDCURVE(CIRCULARSTRING(0 0,2 0, 2 1, 2 3, 4 3),(4 3, 4 5, 1 4, 0 0)), CIRCULARSTRING(1.7 1, 1.4 0.4, 1.6 0.4, 1.6 0.5, 1.7 1) )')
p22=ogr.CreateGeometryFromWkt('MULTISURFACE(CURVEPOLYGON(CIRCULARSTRING(0 0, 4 0, 4 4, 0 4, 0 0),(1 1, 3 3, 3 1, 1 1)),((10 10, 14 12, 11 10, 10 10),(11 11, 11.5 11, 11 11.5, 11 11)))')
p23=ogr.CreateGeometryFromWkt('MULTISURFACE Z (CURVEPOLYGON Z (CIRCULARSTRING Z (-2 0 0, -1 -1 1, 0 0 2, 1 -1 3, 2 0 4, 0 2 2, -2 0 0), (-1 0 1, 0 0.5 2, 1 0 3, 0 1 3, -1 0 1)), ((7 8 7, 10 10 5, 6 14 3, 4 11 4, 7 8 7)))')
p24=ogr.CreateGeometryFromWkt('MULTISURFACE (CURVEPOLYGON (CIRCULARSTRING (-2 0, -1 -1, 0 0, 1 -1, 2 0, 0 2, -2 0), (-1 0, 0 0.5, 1 0, 0 1, -1 0)), ((7 8, 10 10, 6 14, 4 11, 7 8)))')
p25=ogr.CreateGeometryFromWkt('POLYHEDRALSURFACE (((0 0,0 0,0 1,0 0)),((0 0,0 1,1 0,0 0)),((0 0,1 0,0 0,0 0)),((1 0,0 1,0 0,1 0)))')
p26=ogr.CreateGeometryFromWkt('TRIANGLE ((1 2,4 5,7 8,1 2))')
p27=ogr.CreateGeometryFromWkt('TIN ( ((0 0, 0 0, 0 1, 0 0)), ((0 0, 0 1, 1 1, 0 0)) )')

isValid = [p.IsValid() for p in
           (p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14,
            p15, p16, p17, p18, p19, p20, p21, p22, p23, p24, p25, p26, p27)]

test result:

All geometries return True except p7, p8, p22, p25, p26, and p27, which return False.

ST_Union_Aggr_UDF error

ST_Union_Aggr_UDF throws an exception when a multipolygon is combined with other geometries.

arctern test code :

def run_st_union(spark):
    register_funcs(spark)
    test_data1 = []
    test_data1.extend([('MULTIPOINT (1 1,3 4)',)])
    test_data1.extend([('LINESTRING (1 1,1 2,2 3)',)]) 
    test_data1.extend([('MULTILINESTRING ((1 1,1 2),(2 4,1 9,1 8))',)])
    test_data1.extend([('MULTILINESTRING ((1 1,3 4))',)])
    test_data1.extend([('POLYGON ((1 1,1 2,2 2,2 1,1 1))',)])
    test_data1.extend([('MULTIPOLYGON ( ((1 1,1 2,2 2,2 1,1 1)),((0 0,1 -1,3 4,-2 3,0 0)) )',)]) # topologyEX
    union_aggr_df1 = spark.createDataFrame(data=test_data1, schema=['geos']).cache()
    union_aggr_df1.createOrReplaceTempView("union_aggr1")
    rs = spark.sql("select ST_Union_Aggr_UDF(geos) from union_aggr1").show(100,0) 

postgis sql :

drop table if exists test_union;
create table test_union (geos geometry);
insert into test_union values 
('MULTIPOINT (1 1,3 4)'),
('LINESTRING (1 1,1 2,2 3)'),
('MULTILINESTRING ((1 1,1 2),(2 4,1 9,1 8))'), 
('MULTILINESTRING ((1 1,3 4))'),
('POLYGON ((1 1,1 2,2 2,2 1,1 1))'),
('MULTIPOLYGON (((1 1,1 2,2 2,2 1,1 1)),((0 0,1 -1,3 4,-2 3,0 0)) )')
;
select st_astext(st_union(geos)) from test_union;

arctern result :

ERROR 1: TopologyException: Input geom 1 is invalid: Self-intersection at or near point 1.8 1 at 1.8 1
ERROR 10: Pointer 'hGeom' is NULL in 'OGR_G_ExportToWkt'.

terminate called after throwing an instance of 'std::runtime_error'
  what():  gdal error code = 6

postgis result :

GEOMETRYCOLLECTION(LINESTRING(2 4,1 9,1 8),POLYGON((2 1.5,2 1,1.8 1,1 -1,0 0,-2 3,3 4,2 1.5)))

ST_Contains bug

I got different output when I used a specific wkt as input to the ST_Contains function (compared to geospark).


  • geospark test :
    spark.sql("SELECT ST_Contains( ST_GeomFromWKT('POLYGON ((0 0,4 0,4 4,0 4,0 0))') , ST_GeomFromWKT('POINT (4 0)') )").show(false)
    output : true

  • GIS test :
    wkt_arrow_array1 = { POLYGON ((0 0,4 0,4 4,0 4,0 0))}
    wkt_arrow_array2 = { POINT (4 0)}
    zilliz::gis::ST_Contains(wkt_arrow_array1,wkt_arrow_array2)
    output : false

  • postgis test :
    select st_contains('POLYGON ((0 0,4 0,4 4,0 4,0 0))'::geometry,'POINT (4 0)'::geometry);
    output : false
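arctern and postgis agree here because, under OGC semantics, ST_Contains requires the interiors of the two geometries to intersect, so a point lying exactly on the polygon's boundary is not contained; geospark's true is the deviation. A sketch of a point-in-polygon test with that boundary exclusion (ray casting; the function name and tolerance are assumptions of this sketch):

```python
def point_in_polygon_interior(px, py, ring):
    """True only if (px, py) is strictly inside the closed ring.
    Boundary points return False, matching ST_Contains semantics."""
    n = len(ring)
    inside = False
    for i in range(n):
        ax, ay = ring[i]
        bx, by = ring[(i + 1) % n]
        # On-edge check: collinear and within the segment's bounding box.
        cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
        if (abs(cross) < 1e-12
                and min(ax, bx) <= px <= max(ax, bx)
                and min(ay, by) <= py <= max(ay, by)):
            return False
        # Ray casting: toggle for each edge crossed by a ray going right.
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
point_in_polygon_interior(4, 0, square)   # False: (4 0) is a vertex
point_in_polygon_interior(2, 2, square)   # True: strictly inside
```

ST_Covers is the predicate that would return true for the boundary point.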

Conda environment conflicts with the system environment

Describe the bug
The version of libprotobuf is 2.6.1 in the system environment, but it is 3.11.0 in the conda environment. When I execute the unittest, the program reports an error.

Steps/Code to reproduce behavior

[libprotobuf FATAL google/protobuf/stubs/common.cc:87] This program was compiled against version 2.6.1 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.11.0).  Contact the program author for an update.  If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library.  (Version verification failed in "/build/mir-O8_xaj/mir-0.26.3+16.04.20170605/obj-x86_64-linux-gnu/src/protobuf/mir_protobuf.pb.cc".)

[2020-02-17T13:26:45.455Z] terminate called after throwing an instance of 'google::protobuf::FatalException'

[2020-02-17T13:26:45.455Z]   what():  This program was compiled against version 2.6.1 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.11.0).  Contact the program author for an update.  If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library.  (Version verification failed in "/build/mir-O8_xaj/mir-0.26.3+16.04.20170605/obj-x86_64-linux-gnu/src/protobuf/mir_protobuf.pb.cc".)

Expected behavior

The unittest executes and returns correct results in docker.

Environment details

  • Ubuntu 18.04 x86_64
  • Docker version 19.03.1
  • GIS v0.1.0 GPU build environment Docker image
  • conda branch

ST_IsValid crashes if the input is not a valid geometry

If the input is not a valid geometry, like 'Im not polygon', ST_IsValid will crash and throw an exception with the error message:

unknown file: Failure
C++ exception with description "gdal error code = 3" thrown in the test body.

This is the test code, and it throws the exception:

arrow::StringBuilder string_builder;
std::shared_ptr<arrow::Array> polygons;
string_builder.Append("my is not polygon");
string_builder.Finish(&polygons);
auto valid_mark = ST_IsValid(polygons);

What would happen if C++ throws an exception?

In the CHECK_GDAL macro, we throw an exception of std::runtime_error if gdal returns an error. Would python catch this exception? And what happens in pyspark when the C++ code throws?

Add python wrapper for render engine

The following design issues need to be discussed:

  1. Should we use pyarrow as the interface?
  2. How do we organize vega as part of the interface?
  3. Do we need a map for passing metadata?

st_npoints difference

in postgis:
select st_npoints(st_geomfromtext('POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))'));
select st_npoints(st_geomfromtext('POLYGON ((1 2, 3 4, 5 6, 1 2))'));
select st_npoints(st_geomfromtext('POLYGON ((1 1, 3 1, 3 3, 1 3, 1 1))'));
select st_npoints(st_geomfromtext('MULTIPOINT(0 0, 7 7)'));
select st_npoints(st_geomfromtext('GEOMETRYCOLLECTION(POINT(1 1), LINESTRING( 1 1 , 2 2, 3 3))'));
select st_npoints(st_geomfromtext('POINT EMPTY'));

results
5
4
5
2
4
0

in arctern:
results
0
0
0
0
0
1
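Since st_npoints just counts coordinate tuples, the expected values above can be cross-checked by counting coordinate pairs in the WKT text. A stdlib sketch that handles only the 2D cases shown and treats EMPTY geometries as zero (both assumptions of this illustration):

```python
import re

# One 2D coordinate pair: two numbers separated by whitespace.
_COORD_RE = re.compile(r"-?\d+(?:\.\d+)?\s+-?\d+(?:\.\d+)?")

def npoints(wkt):
    """Count 2D coordinate pairs in a WKT string; EMPTY geometries give 0."""
    return len(_COORD_RE.findall(wkt))

npoints("POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))")  # 5
npoints("MULTIPOINT(0 0, 7 7)")                           # 2
npoints("POINT EMPTY")                                    # 0
```

These counts match the postgis column above, including the closing vertex of each polygon ring being counted.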

st_intersection issue

for the data below:
{"left": "POLYGON ((40 21, 40 22, 40 23, 40 21))", "right": "POLYGON ((2 2, 9 2, 9 9, 2 9, 2 2))"}
{"left": "POINT(1 3)", "right": "LINESTRING (0 0, 10 10)"}
{"left": "POINT(-1 4)", "right": "LINESTRING (0 0, 10 10)"}
{"left": "POINT(10 1)", "right": "LINESTRING (0 0, 10 10)"}
{"left": "POINT(7 9)", "right": "LINESTRING (0 0, 10 10)"}

in arctern:
{"ST_Intersection_UDF(left, right)":"POLYGON EMPTY"}
{"ST_Intersection_UDF(left, right)":"POINT EMPTY"}
{"ST_Intersection_UDF(left, right)":"POINT EMPTY"}
{"ST_Intersection_UDF(left, right)":"POINT EMPTY"}
{"ST_Intersection_UDF(left, right)":"POINT EMPTY"}

in postgis:
GEOMETRYCOLLECTION EMPTY
GEOMETRYCOLLECTION EMPTY
GEOMETRYCOLLECTION EMPTY
GEOMETRYCOLLECTION EMPTY
GEOMETRYCOLLECTION EMPTY

ST_Length bug

I got different output when I used a polygon's wkt as input to the ST_Length function (compared to geospark).


  • geospark test :
    spark.sql("SELECT ST_Length(ST_GeomFromWKT('POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))'))").show(1,0)
    output : 4.0

    spark.sql("SELECT ST_Length(ST_GeomFromWKT('MULTIPOLYGON ( ((0 0, 1 4, 1 0,0 0)))'))").show(1,0)
    output : 9.123105625617661

    spark.sql("SELECT ST_Length(ST_GeomFromWKT('MULTIPOLYGON ( ((0 0, 0 4, 4 4, 4 0, 0 0)), ((0 0, 0 1, 4 1, 4 0, 0 0)) )'))").show(1,0)
    output : 26.0


  • GIS test :
    wkt_arrow_array = {POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0)) ,MULTIPOLYGON ( ((0 0, 1 4, 1 0,0 0)) ) , MULTIPOLYGON ( ((0 0, 0 4, 4 4, 4 0, 0 0)), ((0 0, 0 1, 4 1, 4 0, 0 0)) )}
    zilliz::gis::ST_Length(wkt_arrow_array)
    output : 0 , 0 , 0

  • postgis test
    output : 0 , 0 , 0

st_envelope_udf results for empty geometries differ from postgis

our sql:
select st_envelope_udf(geos) as geos from test_envelope
input:
{"geos": "POLYGON EMPTY"}
{"geos": "LINESTRING EMPTY"}
{"geos": "POINT EMPTY"}
{"geos": "MULTIPOLYGON EMPTY"}
{"geos": "MULTILINESTRING EMPTY"}
{"geos": "MULTIPOINT EMPTY"}
{"geos": "GEOMETRYCOLLECTION EMPTY"}

result:
{"geos":"POINT (0 0)"}
{"geos":"POINT (0 0)"}
{"geos":"POINT (0 0)"}
{"geos":"POINT (0 0)"}
{"geos":"POINT (0 0)"}
{"geos":"POINT (0 0)"}
{"geos":"POINT (0 0)"}

in POSTGIS
sqls:
select st_astext(st_envelope('POLYGON EMPTY'::geometry));
select st_astext(st_envelope('LINESTRING EMPTY'::geometry));
select st_astext(st_envelope('POINT EMPTY'::geometry));
select st_astext(st_envelope('MULTIPOLYGON EMPTY'::geometry));
select st_astext(st_envelope('MULTILINESTRING EMPTY'::geometry));
select st_astext(st_envelope('MULTIPOINT EMPTY'::geometry));
select st_astext(st_envelope('GEOMETRYCOLLECTION EMPTY'::geometry));

result:
POLYGON EMPTY
LINESTRING EMPTY
POINT EMPTY
MULTIPOLYGON EMPTY
MULTILINESTRING EMPTY
MULTIPOINT EMPTY
GEOMETRYCOLLECTION EMPTY

conda branch cannot be compiled with multiple threads

I encountered the following problem when compiling with "make -j10":
/GIS/cpp/src/render/utils/my_zlib_compress.h:1:33: fatal error: stb/stb_image_write.h: No such file or directory
However, compiling passes when using plain "make".

check geometry type before call ST_Area and ST_Length

The following warnings are printed when I run the unittest:

[ RUN      ] geometry_test.test_ST_Area
Warning 1: OGR_G_Area() called against non-surface geometry type.
Warning 1: OGR_G_Area() called against non-surface geometry type.
Warning 1: OGR_G_Area() called against non-surface geometry type.
[       OK ] geometry_test.test_ST_Area (1 ms)
[ RUN      ] geometry_test.test_ST_Centroid
[       OK ] geometry_test.test_ST_Centroid (0 ms)
[ RUN      ] geometry_test.test_ST_Length
Warning 1: OGR_G_Length() called against a non-curve geometry type.
Warning 1: OGR_G_Length() called against a non-curve geometry type.
Warning 1: OGR_G_Length() called against a non-curve geometry type.
Warning 1: OGR_G_Length() called against a non-curve geometry type.
Warning 1: OGR_G_Length() called against a non-curve geometry type.
Warning 1: OGR_G_Length() called against a non-curve geometry type.
[       OK ] geometry_test.test_ST_Length (0 ms)

So, I suggest checking the geometry type before calling ST_Area and ST_Length.
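The suggested dispatch can be sketched in pure Python: compute area only for surface types (shoelace formula), length only for curve types, and return 0 otherwise, which is also what postgis does. The type tags and the single-ring simplification are assumptions of this sketch, not arctern's API:

```python
import math

def ring_area(ring):
    """Shoelace area of a 2D ring given as [(x, y), ...] without the
    repeated closing vertex (the zip below closes it)."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def line_length(coords):
    """Sum of segment lengths of a 2D linestring."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def st_area(geom_type, coords):
    # Only surface types have area; curves and points give 0.
    return ring_area(coords) if geom_type == "POLYGON" else 0.0

def st_length(geom_type, coords):
    # Only curve types have length; surfaces and points give 0.
    return line_length(coords) if geom_type == "LINESTRING" else 0.0

square = [(1, 1), (1, 2), (2, 2), (2, 1)]
st_area("POLYGON", square)                          # 1.0
st_length("POLYGON", square)                        # 0.0
st_length("LINESTRING", [(1, 1), (1, 2), (2, 3)])   # 1 + sqrt(2)
```

With the type gate in front, the gdal calls never see a geometry of the wrong dimension, and the warnings above disappear.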

st_area_udf of a linestring should be 0

sql: select st_area_udf(geos) as my_area from test_area

data: {"geos": "LINESTRING (77.29 29.07,77.42 29.26,77.27 29.31,77.29 29.07)"}

result: {"my_area":0.01750000000000007}

expected: 0.0

I guess that in this case the linestring was treated as a polygon.

Add pod tolerations to Jenkins slave pods

Describe the solution you'd like
Add pod tolerations to Jenkins slave pods

st_issimple difference

postgis
SELECT ST_isSimple('POLYGON ((1 2, 3 4, 5 6, 1 2))'::geometry);
result:
t (means true)

in arctern, the result is false

select st_isvalid_udf(null) raise exception

sql:
select st_isvalid_udf(null)

This raises an exception; however, GeoSpark does not raise an exception for the same input.

log:

org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/local/bin/spark/python/lib/pyspark.zip/pyspark/worker.py", line 577, in main
    eval_type = read_int(infile)
  File "/usr/local/bin/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 837, in read_int
    raise EOFError
EOFError

        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:484)
        at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:99)
        at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:49)
        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:437)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:726)
        at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.hasNext(InMemoryRelation.scala:132)
        at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
        at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
        at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1370)
        at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1297)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1361)
        at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1185)
        at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:127)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:441)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:444)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: Unsupported data type: null

ST_Centroid bug

I got different output when I used a specific wkt as input to the ST_Centroid function (compared to geospark).


  • geospark test :
    spark.sql("SELECT ST_Centroid(ST_GeomFromWKT('MULTIPOLYGON ( ((0 0, 1 4, 1 0,0 0)), ((0 0,1 0,0 1,0 0)) )'))").show(1,0)
    output : POINT (0.7777777777777778 1.6666666666666667)

  • GIS test :
    wkt_arrow_array = {MULTIPOLYGON ( ((0 0, 1 4, 1 0,0 0)), ((0 0,1 0,0 1,0 0)) )}
    zilliz::gis::ST_Centroid(wkt_arrow_array)
    output : POINT (0.6 1.13333333333333)

  • postgis test :
    select st_astext(st_centroid('MULTIPOLYGON ( ((0 0, 1 4, 1 0,0 0)), ((0 0,1 0,0 1,0 0)) )'::geometry));
    output : POINT(0.6 1.13333333333333)

ST_Area bug

I got different output when I used a specific wkt as input to the ST_Area function (compared to geospark).


  • geospark test :
    spark.sql("SELECT ST_Area(ST_GeomFromWKT('LINESTRING (0 0, 1 0, 1 1, 0 0)'))").show(1,0)
    output : 0

    spark.sql("SELECT ST_Area(ST_GeomFromWKT('MULTIPOLYGON ( ((0 0, 1 4, 1 0,0 0)), ((0 0,1 0,0 1,0 0)) ) '))").show(1,0)
    output : 1.5


  • GIS test :
    wkt_arrow_array = {LINESTRING (0 0, 1 0, 1 1, 0 0) , MULTIPOLYGON ( ((0 0, 1 4, 1 0,0 0)), ((0 0,1 0,0 1,0 0)) ) }
    zilliz::gis::ST_Area(wkt_arrow_array)
    output : 0.5 ,2.5
