stark's People

Contributors

hag0r

stark's Issues

Question about Cluster operation

Hi, thank you for this great application. I am using it to cluster geospatial data given as lon/lat. I have a question about the unit of the epsilon parameter in your DBSCAN implementation: if I want eps to be 0.5 km, should I calculate it as 0.5 / kms_per_radian? I have tried that and ran into either a memory overflow or an error saying the Kryo serializer buffer size was exceeded. I think the reason is that the CellSize for the BSPartitioner becomes too small because of the input epsilon value.
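Whether 0.5 / kms_per_radian is right depends on the unit the distance function works in, which the question does not settle. If distances are great-circle distances expressed in radians (as with a haversine metric), that formula is correct. If instead plain Euclidean distance is computed on raw lon/lat values (an assumption, not confirmed here), eps has to be given in degrees, and a radians value is about 57 times smaller than intended, which would also shrink any cell size derived from eps. The sketch below converts a distance in km to both units; the object and method names are hypothetical, and the 6371.0088 km mean Earth radius assumes a spherical model.

```scala
object EpsConversion {
  // Mean Earth radius in km (IUGG mean radius); assumes a spherical Earth.
  val KmsPerRadian: Double = 6371.0088

  // Central angle in radians corresponding to a great-circle distance in km.
  def kmToRadians(km: Double): Double = km / KmsPerRadian

  // The same angle in degrees; roughly the north-south extent of `km` in latitude degrees.
  def kmToDegrees(km: Double): Double = math.toDegrees(kmToRadians(km))

  // East-west extent of `km` in longitude degrees at latitude `latDeg`;
  // a degree of longitude shrinks by cos(latitude) away from the equator.
  def kmToLonDegrees(km: Double, latDeg: Double): Double =
    kmToDegrees(km) / math.cos(math.toRadians(latDeg))

  def main(args: Array[String]): Unit = {
    val epsKm = 0.5
    println(f"radians:              ${kmToRadians(epsKm)}%.8f")
    println(f"latitude degrees:     ${kmToDegrees(epsKm)}%.8f")
    println(f"longitude deg at 45N: ${kmToLonDegrees(epsKm, 45.0)}%.8f")
  }
}
```

Because a degree of longitude shrinks with latitude, a single eps in degrees only approximates a fixed ground distance; for data spread over a narrow band of latitudes the error stays small.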

Question about data size limits

Hi, first of all, thanks for the application! I've been trying it out with different datasets, and it works great with the smaller ones, but it stalls with bigger datasets. My particular case is a 120 GB dataset with 2,000 million (2 billion) records, and I want to run DBSCAN with an eps of 0.0001. I don't know whether I'm configuring the ppd parameter badly (with a value of 100 it stalls indefinitely; with smaller values there seems to be some progress, although it still hangs) or whether it simply won't work with such a large dataset and such a small eps.
Is there any chance that I'm configuring it wrong? Thanks in advance!
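The scale involved can be estimated with some back-of-the-envelope arithmetic. The sketch below assumes that ppd stands for partitions per dimension of a 2-D grid (so ppd = 100 would give 10,000 partitions) and that records are spread uniformly; both are assumptions for illustration, and the helper names are hypothetical. It also shows how quickly the number of cells grows when the cell side is tied to a small eps: tiling even a 1 x 1 degree extent with cells of side 0.0001 needs 10,000 x 10,000 = 100 million cells.

```scala
object PartitionEstimate {
  // Assumption: ppd means "partitions per dimension" of a 2-D grid,
  // so the total number of partitions is ppd * ppd.
  def totalPartitions(ppd: Int): Long = ppd.toLong * ppd

  // Average records per partition under an (optimistic) uniform spread.
  // Skewed real-world data concentrates far more records in some partitions.
  def avgRecordsPerPartition(records: Long, ppd: Int): Long =
    records / totalPartitions(ppd)

  // Number of square cells of side `cell` needed to tile a width x height extent.
  def cellCount(width: Double, height: Double, cell: Double): Long =
    math.ceil(width / cell).toLong * math.ceil(height / cell).toLong

  def main(args: Array[String]): Unit = {
    val records = 2000000000L // 2,000 million records, as in the question
    for (ppd <- Seq(10, 50, 100))
      println(s"ppd=$ppd -> ${totalPartitions(ppd)} partitions, " +
        s"~${avgRecordsPerPartition(records, ppd)} records each if uniform")

    // Hypothetical: a 1 x 1 degree extent tiled with cells of side eps = 0.0001
    println(s"cells for a 1x1 extent at side 0.0001: ${cellCount(1.0, 1.0, 0.0001)}")
  }
}
```

Even at ppd = 100 each partition would hold around 200,000 records on average, and real geodata is rarely uniform, so a few dense partitions can dominate the run time regardless of ppd.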

Unresolved dependencies when building Stark

Hi, thanks for this tool!
I downloaded the zip file from GitHub and ran sbt assembly within the Stark directory, but I got the Unresolved Dependencies error shown in the attached file:

StartBuildingError.txt

It seems that sbt is searching for sbt-assembly and sbt-scoverage with the wrong Scala version (2.12), but I don't know how to solve this issue.

Outputting incorrect data

I’ve successfully gotten the library to output data from an RDD of points and an RDD of polygons, but when I manually checked the results of the join operations (contains and intersect) by plotting them on a map, they don’t seem to be accurate.

The RDDs are of the format

(STObject(WKTstring), (arr(id).toLong, WKTstring))

The Point RDD has 10,000 items, while the Polygon RDD has 500,000+. My join command is

polygonsRDDA.join(pointsRDDA, JoinPredicate.CONTAINS)

I'm fairly certain the format is correct, as are the WKT strings, since I'm getting a valid [(polygon_id, WKT)(point_id, WKT)] output RDD with substantial data.

Here is one row of the output:

[7968,POINT (77.2221885273425 28.5089347347766)]|
|[929587445047033467,POLYGON ((77.24398775026202 28.61936221830547, 77.24380536004901 28.61944234929979, 77.24360687658191 28.61956941895187, 77.24423987790942 28.620445327833295, 77.24442763254046 28.62033703364432, 77.24459392949939 28.620238127186894, 77.2441808693111 28.61965893767774, 77.24398775026202 28.61936221830547))]

Plugging these into a WKT visualizer shows that the polygon and the point are in fact far away from each other.

Any help would be appreciated!
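To confirm that a reported pair is wrong independently of the library, the row above can be checked with a few lines of plain Scala. The sketch below uses even-odd ray casting (a standard point-in-polygon test, not necessarily what the library uses internally) and a deliberately minimal WKT parser that only handles a 2-D POINT and the outer ring of a POLYGON; the object and method names are hypothetical.

```scala
object PointInPolygonCheck {
  // WKT strings copied from the output row in the question.
  val PointWkt = "POINT (77.2221885273425 28.5089347347766)"
  val PolygonWkt = "POLYGON ((77.24398775026202 28.61936221830547, 77.24380536004901 28.61944234929979, 77.24360687658191 28.61956941895187, 77.24423987790942 28.620445327833295, 77.24442763254046 28.62033703364432, 77.24459392949939 28.620238127186894, 77.2441808693111 28.61965893767774, 77.24398775026202 28.61936221830547))"

  // Parse "x y, x y, ..." into (x, y) pairs; Z/M ordinates are not supported.
  private def parseCoords(s: String): Vector[(Double, Double)] =
    s.split(",").toVector.map { pair =>
      val parts = pair.trim.split("\\s+")
      (parts(0).toDouble, parts(1).toDouble)
    }

  // Minimal parser for "POINT (x y)".
  def parsePoint(wkt: String): (Double, Double) =
    parseCoords(wkt.substring(wkt.indexOf('(') + 1, wkt.lastIndexOf(')'))).head

  // Minimal parser that reads only the outer ring of "POLYGON ((x y, ...), ...)".
  def parsePolygonShell(wkt: String): Vector[(Double, Double)] =
    parseCoords(wkt.substring(wkt.indexOf("((") + 2, wkt.indexOf(')')))

  // Even-odd ray casting: toggle `inside` each time a ray from p towards +x crosses an edge.
  def contains(ring: Vector[(Double, Double)], p: (Double, Double)): Boolean = {
    val (px, py) = p
    var inside = false
    var j = ring.length - 1
    for (i <- ring.indices) {
      val (xi, yi) = ring(i)
      val (xj, yj) = ring(j)
      // Edge (j -> i) straddles the line y = py, and the crossing lies right of px.
      if ((yi > py) != (yj > py) && px < (xj - xi) * (py - yi) / (yj - yi) + xi)
        inside = !inside
      j = i
    }
    inside
  }

  def main(args: Array[String]): Unit = {
    val point = parsePoint(PointWkt)
    val shell = parsePolygonShell(PolygonWkt)
    println(s"polygon contains point: ${contains(shell, point)}")
  }
}
```

For this row the point's latitude (28.5089) lies below every vertex latitude of the polygon (about 28.619 to 28.620), so the test reports false, which agrees with the visualizer. Points lying exactly on the boundary are not handled consistently by even-odd ray casting, so such cases need a more careful check.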
