Coder Social home page Coder Social logo

spatial-group-detection's Introduction

Installation

  • Setup environment
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
  • Download the data
mkdir data
mkdir images

Download the file into the data folder

Then run the program :

python run.py

The output will be the number of cluster found at each second, and the event (json format) at every change.

Example

{
    "type": "decrease",
    "date": 2,
    "centroid": (1.0, 1.0),
    "people_out": set([17]),
    "previous_cluster_id": 15,
    "cluster_id": 16,
    "population": set([16, 15]),
    "density": 0.0
}

If you change the value of the variable PLOT in run.py, you will have the plot of each step (with colored clusters) in the folder images.

To run the tests :

nosetests --rednose --force-color tests/
  • IPython Notebook

You can browse the file exploration.ipynb which explain the choice of some variables. (PS: The first graph doesn't seem to work)

Main goal

The main goal of this project is to detect groups creation/modification/deletion on temporal and geo-spatial data.

Problematics

The Accuracy of the data

Each position comes with an accuracy which can be very low (high value in meters of imprecision). We then face multiple problems :

  • Distance metric
  • Shape computation
  • Centroid localization

The sampling of the data

The series of points for a user is not regularly sampled and like the accuracy there can be long moment (many hours) between two points. We have to :

  • Handle time series with big gaps (cut in multiple parts)
  • Predict the short term position when the gap is reasonable (we can use the information of the past trajectory)

The clustering

The main problem is to group people without knowing the number of group. We can only use the repartition of the points (and its evolutions) to extract this information.

Approach

Sampling and accuracy

  • Discard the outliers

    • Using the exploration.ipynb, we chose 60 meters as a minimum viable accuracy
    • We also removed series with elapsed time > 1 minute
  • Stationary interpolation of the position if the last position was less than 1 minute before

With this interpolation, we can re-use the series we removed earlier : if no point was detected for 1 minute, the algorithm will not consider it.

Clustering

Unsupervised algorithm to cluster people based on their closeness :

  • We do not want to put everyone in a cluster
  • The cluster must be very dense
  • The number k of clusters can change every second

We used the DBSCAN algorithm for multiple reasons and mainly for its weaknesses :

  • Handle only regularly dense clusters
  • Comes with the parameter epsilon which defines the maximum distance between two samples
  • Is less effective with high dimensional problems

See A Review: Comparative Study of Various Clustering Techniques in Data Mining (short and precise)

Distance metric :

We chose a distance metric that could be easily understandable, to be able to choose the best epsilon for DBSCAN.

  • First we used the Haversine formula between two points
  • Then if the accuracy was very low, we weighted it by using the max and min distances possible between points

Results

Parameters

The only parameters would be the accuracy filter and the epsilon of DBSCAN. The latter is easy to choose when there are no accuracy problems : because we used the Haversine distance, we can reasonably select a minimum distance between two people in the same groups (for instance 4 meters). With the accuracy, we chose 50 meters empirically.

Moreover the sparse positions in the dataset allow us to be less strict on the minimum distance between dots.

Screenshot

Here is an example group evolutions :

Group Evolution

There is also a video of the first evolutions :

Video of the 15241 first seconds

To be done (short term)

  • Validate group modification only if a certain amount of time has passed (sample not just passing through)
  • Enhance accuracy handling in distance metric
  • Better graphical output
  • Bug fixing / Refacto

Future work

Trajectories

  • The density of the accuracy circle could be higher in the point 'global' direction. It would change the distance computation. We could use the Ramer-Douglas-Peucker algorithm to extract the general path of the point.
  • Could be use have a better interpolation.

Contextual informations

Graphical

  • Add circle of the size of the accuracy with a corresponding alpha
  • Add the map of Paris

Handle mobility

Bibliography / Sources / To read

spatial-group-detection's People

Contributors

tillwf avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.