dbis-ilm / stark
A framework for Spatio-Temporal Data Analytics on Spark
Hi, thank you for this great application. I am using it to cluster geospatial data given as lon/lat coordinates. I have a question about the unit of the epsilon parameter in your DBSCAN implementation. If I want eps to be 0.5 km, should I compute epsilon as 0.5 / kms_per_radian? I tried this and either ran out of memory or exceeded the KryoSerializer buffer size. I think the reason is that the CellSize for BSPartitioner becomes too small for the given epsilon value.
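For reference, the conversions in question can be sketched as below. This is a minimal sketch under stated assumptions, not a statement about what unit STARK's DBSCAN expects: it assumes a spherical Earth with a mean radius of 6371.0088 km, and the object and function names (`EpsUnits`, `kmToRadians`, and so on) are hypothetical helpers, not part of the library. If the implementation measures plain Euclidean distance on raw lon/lat values, eps would be in degrees rather than radians, and the two differ by a factor of about 57.

```scala
object EpsUnits {
  // Mean Earth radius in kilometres (spherical approximation, an assumption).
  val EarthRadiusKm: Double = 6371.0088

  /** A distance in km expressed as a central angle in radians. */
  def kmToRadians(km: Double): Double = km / EarthRadiusKm

  /** The same distance expressed in degrees of latitude. */
  def kmToDegreesLat(km: Double): Double = math.toDegrees(kmToRadians(km))

  /** Degrees of longitude covering the same distance at a given latitude.
    * Longitude degrees shrink with cos(latitude), so more of them are needed. */
  def kmToDegreesLon(km: Double, latDeg: Double): Double =
    kmToDegreesLat(km) / math.cos(math.toRadians(latDeg))

  def main(args: Array[String]): Unit = {
    val epsKm = 0.5
    println(f"eps in radians:                 ${kmToRadians(epsKm)}%.8f")
    println(f"eps in degrees of latitude:     ${kmToDegreesLat(epsKm)}%.8f")
    println(f"eps in degrees of lon at 28.6N: ${kmToDegreesLon(epsKm, 28.6)}%.8f")
  }
}
```

Under these assumptions 0.5 km is roughly 0.0000785 radians but roughly 0.0045 degrees of latitude, so picking the wrong unit makes eps far smaller than intended.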
Hi, first of all thanks for the application! I've been trying it out with different datasets and it works great with the smaller ones, but it stalls with bigger datasets. My particular case is a 120 GB dataset with 2,000 million records, on which I want to run DBSCAN with an eps of 0.0001. I don't know whether I'm setting the ppd parameter badly (with a value of 100 it stalls indefinitely, while with smaller values there seems to be some progress, although it still hangs), or whether it simply won't work with a dataset this large and an eps this small.
Is there any chance that I'm configuring it wrong? Thanks in advance!
Hi, thanks for this tool!
I downloaded the zip file from GitHub and executed sbt assembly within the STARK directory, but I got the attached Unresolved Dependencies error:
It seems that sbt is searching for sbt-assembly and sbt-scoverage with the wrong Scala version (2.12), but I don't know how to solve this issue.
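One possible cause, offered as an assumption rather than a confirmed diagnosis: sbt 1.x resolves plugins built for Scala 2.12, whereas sbt 0.13 resolves plugins built for Scala 2.10. Plugin versions that were published only for sbt 0.13 therefore cannot be found when the build is run with a newer sbt. If that is what happens here, a project/plugins.sbt along the following lines might resolve. The version numbers are illustrative choices that are published for sbt 1.x; they are not taken from the STARK repository.

```scala
// project/plugins.sbt
// Illustrative versions only (an assumption): both are published for sbt 1.x,
// which resolves plugins built against Scala 2.12.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
addSbtPlugin("org.scoverage" % "sbt-scoverage" % "1.6.1")
```

The alternative under the same assumption is to leave the plugin versions alone and pin the sbt launcher to a 0.13.x release through the sbt.version entry in project/build.properties.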
I’ve successfully gotten the library to output data after taking in an RDD of points and an RDD of polygons, but after manually testing the results of the join operations (contains and intersects), the results don’t seem to be accurate when plotted on a map.
The RDDs have the format (STObject(WKTstring), (arr(id).toLong, WKTstring)). The point RDD has 10,000 items, while the polygon RDD has 500,000+. My join command is
polygonsRDDA.join(pointsRDDA, JoinPredicate.CONTAINS)
I'm fairly certain the format is correct, as are the WKT strings, since I'm getting a valid [(polygon_id, WKT)(point_id, WKT)] output RDD with substantial data.
Here is one row of the output:
[7968,POINT (77.2221885273425 28.5089347347766)]|
|[929587445047033467,POLYGON ((77.24398775026202 28.61936221830547, 77.24380536004901 28.61944234929979, 77.24360687658191 28.61956941895187, 77.24423987790942 28.620445327833295, 77.24442763254046 28.62033703364432, 77.24459392949939 28.620238127186894, 77.2441808693111 28.61965893767774, 77.24398775026202 28.61936221830547))]
Plugging this row into a WKT visualizer shows that the polygon and the point are in fact far away from each other.
Any help would be appreciated!
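The row above can also be checked without a visualizer and independently of the library. The following is a minimal sketch using a plain ray-casting (even-odd) point-in-polygon test on the coordinates from that row. `JoinSanityCheck` and `contains` are hypothetical names for this illustration, not STARK or JTS functions, and the test ignores the special case of points lying exactly on the boundary.

```scala
object JoinSanityCheck {
  type Coord = (Double, Double) // (x = longitude, y = latitude)

  /** Even-odd ray casting: count the ring edges crossed by a horizontal ray
    * going right from p; an odd count means p is inside the ring. */
  def contains(ring: Seq[Coord], p: Coord): Boolean = {
    val (px, py) = p
    // Pair every vertex with its successor; the last pair closes the ring.
    val edges = ring.zip(ring.tail :+ ring.head)
    val crossings = edges.count { case ((x1, y1), (x2, y2)) =>
      // The edge must straddle the ray's latitude, and the crossing point
      // must lie to the right of p. Short-circuiting avoids division by zero.
      (y1 > py) != (y2 > py) &&
        px < (x2 - x1) * (py - y1) / (y2 - y1) + x1
    }
    crossings % 2 == 1
  }

  // Coordinates copied from the output row quoted above.
  val polygon: Seq[Coord] = Seq(
    (77.24398775026202, 28.61936221830547),
    (77.24380536004901, 28.61944234929979),
    (77.24360687658191, 28.61956941895187),
    (77.24423987790942, 28.620445327833295),
    (77.24442763254046, 28.62033703364432),
    (77.24459392949939, 28.620238127186894),
    (77.2441808693111, 28.61965893767774),
    (77.24398775026202, 28.61936221830547)
  )
  val point: Coord = (77.2221885273425, 28.5089347347766)

  def main(args: Array[String]): Unit =
    println(s"polygon contains point: ${contains(polygon, point)}")
}
```

The point's latitude (about 28.509) lies below every vertex of the polygon (all about 28.619 to 28.620), so no edge straddles the ray and the check reports that the polygon does not contain the point. That agrees with what the visualizer shows.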