dbis-ilm / stark


50.0 7.0 13.0 9.52 MB

A framework for Spatio-Temporal Data Analytics on Spark

Scala 92.71% Python 0.81% Java 6.47%

spatial rdd spatial-data-analysis spatio-temporal-data apache-spark scala data-analysis

stark's People


exlimit darcular giserh geoheil skmbw gaimjkp shuhaozhangtony hogwartsrico ql1028 alphairys suhashimmareddy agent001 aocalderon

stark's Issues

Question about Cluster operation

Hi, thank you for this great application. I am using it to cluster geospatial data given as lon/lat. I have a question about the unit of the epsilon parameter in your DBSCAN implementation. If I want eps to be 0.5 km, should I compute epsilon as 0.5 / kms_per_radian? I tried that and got either an out-of-memory error or an error about exceeding the Kryo serializer buffer size. I think the cause is that the cell size of the BSPartitioner becomes too small because of the small epsilon value.
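The conversion in question can be checked with a small Scala sketch. It assumes the mean Earth radius, 6371.0088 km, as kms_per_radian and prints eps both in radians and in degrees, because whether STARK's DBSCAN expects degrees or radians is not stated here. If the coordinates are stored as plain degrees, an eps given in radians is about 57 times too small. The sketch also estimates how many cells a uniform grid needs when the cell side equals eps. Tying the BSPartitioner cell size to eps is an assumption made only to illustrate the reporter's hypothesis, and the names EpsConversion, kmToRadians, kmToDegrees and cellCount are hypothetical.

```scala
// Hypothetical helpers (not part of STARK): convert a kilometre radius into an
// angular eps and estimate the size of a uniform grid whose cell side is eps.
object EpsConversion {
  // Mean Earth radius in km, the usual value behind "kms_per_radian".
  val KmsPerRadian: Double = 6371.0088

  // eps in radians, i.e. what 0.5 / kms_per_radian yields.
  def kmToRadians(km: Double): Double = km / KmsPerRadian

  // eps in degrees, for data stored as plain lon/lat degrees.
  def kmToDegrees(km: Double): Double = math.toDegrees(kmToRadians(km))

  // Number of square cells of side `cellSide` needed to cover a lon/lat extent.
  // ASSUMPTION: the partitioner's cell side is tied to eps (not confirmed).
  def cellCount(lonExtent: Double, latExtent: Double, cellSide: Double): Long =
    math.ceil(lonExtent / cellSide).toLong * math.ceil(latExtent / cellSide).toLong

  def main(args: Array[String]): Unit = {
    val epsRad = kmToRadians(0.5)
    val epsDeg = kmToDegrees(0.5)
    println(f"eps = $epsRad%.6e rad = $epsDeg%.6f deg")
    // Whole-globe extent, cell side = eps in degrees.
    println(s"cells needed: ${cellCount(360.0, 180.0, epsDeg)}")
  }
}
```

For 0.5 km this gives roughly 7.85e-5 rad or 0.0045 degrees; a whole-globe grid at that cell side would need on the order of billions of cells, which would be consistent with the memory and buffer errors described.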

Question about data size limits

Hi, first of all thanks for the application! I've been trying it out with different datasets, and it works great with the smaller ones. However, the application stalls with bigger datasets. My particular case is a 120 GB dataset with 2,000 million records, and I want to run DBSCAN with an eps of 0.0001. I don't know whether I'm configuring the ppd parameter badly (with a value of 100 it stalls indefinitely; with smaller values there seems to be some progress, even though it still hangs), or whether it simply won't work with such a large dataset and such a small eps.
Is there any chance that I'm configuring it wrong? Thanks in advance!
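A back-of-the-envelope estimate helps reason about the ppd setting. The sketch below assumes that ppd means "partitions per dimension" of a two-dimensional grid (so ppd × ppd partitions), that records are spread uniformly, and that 120 GB means 120 × 1024³ bytes; none of these is confirmed by the question, and real spatial data are usually skewed, so the busiest partition can be far larger than the average. The object and method names are hypothetical.

```scala
// Hypothetical sizing helper (not part of STARK). ASSUMPTIONS: ppd = partitions
// per dimension of a 2-D grid, and records are uniformly distributed.
object GridSizing {
  final case class Load(partitions: Long, recordsPerPartition: Double, bytesPerPartition: Double)

  def uniformLoad(records: Long, totalBytes: Long, ppd: Int): Load = {
    val partitions = ppd.toLong * ppd // ppd cells along each of the two axes
    Load(partitions, records.toDouble / partitions, totalBytes.toDouble / partitions)
  }

  def main(args: Array[String]): Unit = {
    val records = 2000000000L             // 2,000 million records, from the question
    val bytes = 120L * 1024 * 1024 * 1024 // 120 GB, taken as GiB here
    for (ppd <- Seq(10, 50, 100)) {
      val l = uniformLoad(records, bytes, ppd)
      val mb = l.bytesPerPartition / (1024 * 1024)
      println(f"ppd=$ppd%3d partitions=${l.partitions}%6d records/part=${l.recordsPerPartition}%.0f MB/part=$mb%.1f")
    }
  }
}
```

Under these assumptions, ppd = 100 yields 10,000 partitions averaging 200,000 records each, so a stall is more likely caused by skew or by the clustering cost inside dense partitions than by the average partition size.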

Unresolved dependencies when building Stark

Hi, thanks for this tool!
I downloaded the zip file from GitHub and ran sbt assembly in the Stark directory, but I got the attached Unresolved Dependencies error:

StartBuildingError.txt

It seems that sbt is looking for sbt-assembly and sbt-scoverage built for the wrong Scala version (2.12), but I don't know how to solve this issue.
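One plausible reading of this error, not confirmed by the report, is a mismatch between the sbt version used to run the build and the plugin versions the build declares. sbt plugins are published per pair of Scala binary version and sbt binary version: sbt 1.x compiles build definitions with Scala 2.12, while sbt 0.13 uses Scala 2.10. So running the build with sbt 1.x makes sbt look for 2.12 artifacts of every plugin. The sketch below only illustrates that naming scheme; SbtPluginSuffix and its methods are hypothetical. Pinning sbt.version in project/build.properties to the version the project was written for is the usual way to make the two agree.

```scala
// Hypothetical illustration (not an sbt API): which artifact suffix a plugin
// needs for a given sbt version, e.g. sbt-assembly_2.12_1.0 for sbt 1.x.
object SbtPluginSuffix {
  // (Scala binary version, sbt binary version) used for build definitions.
  def binaryVersions(sbtVersion: String): Option[(String, String)] =
    sbtVersion.split('.').toList match {
      case "0" :: "13" :: _ => Some(("2.10", "0.13"))
      case "1" :: _         => Some(("2.12", "1.0"))
      case _                => None // other sbt lines are not covered by this sketch
    }

  def artifactName(plugin: String, sbtVersion: String): Option[String] =
    binaryVersions(sbtVersion).map { case (scalaBin, sbtBin) => s"${plugin}_${scalaBin}_$sbtBin" }

  def main(args: Array[String]): Unit = {
    println(artifactName("sbt-assembly", "0.13.18"))
    println(artifactName("sbt-scoverage", "1.3.13"))
  }
}
```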

Outputting incorrect data

I’ve successfully gotten the library to output data after passing in an RDD of points and an RDD of polygons, but after manually checking the results of the join (contains and intersect) operations, the results don’t seem to be accurate when plotted on a map.

The RDDs are of the format (STObject(WKTstring), (arr(id).toLong, WKTstring)). The Point RDD has 10,000 items, while the Polygon RDD has 500,000+. My join command is polygonsRDDA.join(pointsRDDA, JoinPredicate.CONTAINS). I'm fairly certain the format is correct, as are the WKT strings, since I'm getting a valid [(polygon_id, WKT)(point_id, WKT)] output RDD with substantial data.

Here is one row of the output:

[7968,POINT (77.2221885273425 28.5089347347766)]
[929587445047033467,POLYGON ((77.24398775026202 28.61936221830547, 77.24380536004901 28.61944234929979, 77.24360687658191 28.61956941895187, 77.24423987790942 28.620445327833295, 77.24442763254046 28.62033703364432, 77.24459392949939 28.620238127186894, 77.2441808693111 28.61965893767774, 77.24398775026202 28.61936221830547))]

Plugging both into a WKT visualizer shows that the polygon and the point are in fact far away from each other.
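The reported row can also be checked without STARK or a visualizer. The sketch below applies the even-odd (ray casting) point-in-polygon rule to the coordinates copied from the output row, treating lon/lat as planar x/y, which is adequate for a polygon this small. It is an independent check, not STARK's own predicate implementation, and PointInPolygonCheck is a hypothetical name.

```scala
// Hypothetical, library-independent check of one output row: is the point
// inside the polygon? Even-odd (ray casting) rule with lon/lat as planar x/y.
object PointInPolygonCheck {
  type Pt = (Double, Double) // (lon, lat)

  def contains(ring: Seq[Pt], p: Pt): Boolean = {
    val (px, py) = p
    var inside = false
    var j = ring.length - 1
    for (i <- ring.indices) {
      val (xi, yi) = ring(i)
      val (xj, yj) = ring(j)
      // Toggle when edge (j -> i) straddles y = py and crosses it to the right of px.
      if ((yi > py) != (yj > py) && px < (xj - xi) * (py - yi) / (yj - yi) + xi)
        inside = !inside
      j = i
    }
    inside
  }

  // Ring copied from the reported row (WKT repeats the first vertex at the end).
  val polygon: Seq[Pt] = Seq(
    (77.24398775026202, 28.61936221830547),
    (77.24380536004901, 28.61944234929979),
    (77.24360687658191, 28.61956941895187),
    (77.24423987790942, 28.620445327833295),
    (77.24442763254046, 28.62033703364432),
    (77.24459392949939, 28.620238127186894),
    (77.2441808693111, 28.61965893767774),
    (77.24398775026202, 28.61936221830547)
  )

  val point: Pt = (77.2221885273425, 28.5089347347766)

  def main(args: Array[String]): Unit =
    println(s"polygon contains point: ${contains(polygon, point)}")
}
```

For the reported row this prints false: every vertex of the polygon lies north of the point, so no edge can straddle the point's latitude. That agrees with the visualizer and suggests the row should not satisfy CONTAINS.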

Any help would be appreciated!
