Running DBScan with MinPts=1 and eps = 50 on the attached data returns 11 clusters, i


Counting one too many clusters? (dbscan issue, closed)

mhahsler commented on May 23, 2024

Counting one too many clusters?

from dbscan.

Comments (4)

peekxc commented on May 23, 2024

Perhaps there is something wrong with the linked file, but could you clarify the question a bit?

Running dbscan with the first two columns of your data results in 32 clusters with 0 noise points for me:

library(dbscan)
data <- read.csv("~/Downloads/42041320940000.csv")
# drop incomplete rows, keep the first two columns, coerce to a numeric matrix
data <- apply(as.matrix(na.omit(data))[, 1:2], 2, as.numeric)
res <- dbscan(data, eps = 50, minPts = 1)

That is after converting the spreadsheet to CSV. Looking at the attached spreadsheet, there are a number of things I notice immediately:

dimensions 3 and 4 are blank for record 1
dimension 4 is blank for record 2
There's a random 'x' character for dimension 5 at record 19, I assume this was unintentional?

The second statement (apply) removes these entries and cleans the data set, since dbscan expects a numeric matrix as input.
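As a minimal sketch of that kind of cleaning step (the data frame below is invented for illustration and is not the attached file), each column can be coerced to numeric so that blanks and stray characters become NA, and incomplete rows can then be dropped before building the matrix:

```r
# Hypothetical toy data mimicking the problems noted above:
# blank cells and a stray "x" that were read in as text.
raw <- data.frame(
  A = c("10", "20", "400", "410"),
  B = c("1", "2", "x", "4"),
  C = c("", "5", "6", "7"),
  stringsAsFactors = FALSE
)

# Coerce every column to numeric; non-numeric entries become NA.
# suppressWarnings() hides the expected coercion warning.
num <- suppressWarnings(sapply(raw, as.numeric))

# Keep only rows that are complete in the columns of interest (A and B here).
keep <- complete.cases(num[, c("A", "B")])
clean <- num[keep, c("A", "B"), drop = FALSE]
clean
```

The result is a plain numeric matrix, which is the input shape dbscan expects; which columns to keep is an assumption that depends on what the data actually represent.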

In your comments, you only mentioned dimensions A and B as the data. What about dimensions C and D? Also, may I ask why you are expecting 11 clusters?


jdd112 commented on May 23, 2024

I apologize for the confusing Excel document; I have clarified it in the most recent version. I think the cluster count should be 29. I am mirroring the logic in columns C and D: column C calculates the distance to the next point, and column D starts a new cluster whenever that distance is more than 50. Could you please look this over and see if I am missing something?
Copy of 42041320940000.xlsx
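The column C and D logic described above can be sketched in base R for one-dimensional data. The values below are invented for illustration; only the threshold of 50 comes from this thread. With minPts = 1 every point is a core point, so for sorted 1-D data a new cluster should start exactly where the gap to the previous point exceeds eps, which is why this count is expected to agree with DBSCAN's.

```r
# Hypothetical 1-D positions, already sorted (not the attached data).
x <- c(0, 10, 45, 200, 230, 500)
eps <- 50

# "Column C": distance from each point to the next one.
gap <- diff(x)

# "Column D": start a new cluster whenever the gap exceeds eps.
cluster <- c(1, 1 + cumsum(gap > eps))
cluster

# Labels start at 1, so max() is already the number of clusters.
n_clusters <- max(cluster)
n_clusters
```

This only holds for sorted one-dimensional input; with two or more columns the neighbourhood is no longer determined by consecutive gaps.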


peekxc commented on May 23, 2024

DBSCAN finds 29 clusters with 0 noise points.

library(dbscan)
data <- read.csv("~/Documents/Copy.of.42041320940000.csv")
res <- dbscan(as.matrix(data[, 1:2]), eps = 50, minPts = 1) # also works with just column 1
res

DBSCAN clustering for 191 objects.
Parameters: eps = 50, minPts = 1
The clustering contains 29 cluster(s) and 0 noise points.
...

It matches your fourth column as well:

all(res$cluster == data[, 4]) # TRUE


jdd112 commented on May 23, 2024

Yup, you are right ... I had max(res$cluster)+1 in my output ... so sorry to waste your time. Great work on this project.
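For anyone hitting the same off-by-one, here is a minimal sketch of counting clusters from a label vector. It relies on the package's convention that cluster ids start at 1 and noise points are labelled 0; the label vector itself is hypothetical, not output from the attached data.

```r
# Hypothetical labels in the style of res$cluster: 0 = noise, 1..k = clusters.
labels <- c(1, 1, 2, 0, 3, 3, 2)

# Count distinct non-noise labels; no "+ 1" is needed because ids start at 1.
n_clusters <- length(unique(labels[labels != 0]))
n_noise <- sum(labels == 0)

n_clusters
n_noise
```

Adding 1 to the maximum label is only appropriate when labels are zero-based, which is not the case here.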
