
bigdist's Introduction

Welcome! I am Srikanth Komala Sheshachala

  • Interests:

    • Causal inference
    • Interpretable machine learning
    • Setting up, running and inferring from A/B tests and multi-armed bandits
    • Recommender Systems and Personalization
    • Optimization (Operations Research)
    • Geospatial analysis / services
    • Graph theoretic approaches to problems in data science
  • Job: Senior manager (data scientist) at Walmart Global Tech, India.
    Previously, Olacabs, DISH corporation, Diet Code, Cognizant, Infosys, ...

  • Languages: Python, R, sprinkling of C++

  • Contributions: Author and maintainer of these open source projects:

Drop me a message: gmail sri.teach, linkedin srikanthks01, stackoverflow

bigdist's People

Contributors

privefl, talegari

Forkers

privefl ermueller

bigdist's Issues

Is it possible to use a bigdist FBM as an input for fastcluster::hclust?

Dear Prof Shrikanth,

I am faced with a problem that requires the production of a very large distance matrix (6.9 GB), and I wish to create a hierarchical clustering using the Ward method.

So far I have been able to use your R library (bigdist) to create an FBM of the distance matrix and store it on a local drive. However, I have yet to find a hierarchical clustering solution for such a large distance matrix. I started with the obvious choice of fastcluster::hclust() as follows:

d3bigdist <- bigdist(mat = d3fordist, file = file.path("Output/distYFTbig")) ## note d3fordist is a large matrix with 257370 elements, size = 2Mb

## can't get bigdist FBM to work with hclust

## Connect to that bigdist object on file
temp2 <- bigdist(file = file.path("Output/distYFTbig_42895_float"))

hcYFT <- fastcluster::hclust(temp2$fbm, method = "ward.D2")
print(Sys.time())

Error in fastcluster::hclust(temp2$fbm, method = "ward.D2") :
'N' must be a single integer.

Do you know of an approach that allows a function like hclust to access the data within the FBM and build a hierarchical tree piece by piece? Or must I appropriately sample the FBM, build a tree, and then append the remaining data from the FBM?

I have been working on a Windows machine with 8 GB of RAM. Would working on a Linux platform make any difference?

Post note: Why am I trying to do this? I have been handed some legacy code, written in SAS, that has successfully used the Ward method to hierarchically cluster 42895 observations of 6 variables.
I have not been able to find or construct a solution that mirrors this process in R. I have had success using partition-based clustering (from the Kmeans, CLARA packages); however, it would be great if I could also compare these approaches with the hierarchical approach used in the original SAS code.

Warm regards

Jim Dell
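
A rough sketch that may help frame the question above; it is not a tested answer from the package author. The first part estimates how much memory a conventional in-memory dist object would need for 42895 observations, assuming 8-byte doubles for the lower triangle and 4-byte floats for the file-backed square matrix (the "_float" suffix in the file name suggests this, but that is an assumption). The second part tries fastcluster::hclust.vector(), which works on the raw observations rather than on a distance matrix; the 200 x 6 matrix x is simulated stand-in data, and whether the resulting Ward tree matches ward.D2 or the SAS output would need to be checked against the fastcluster documentation.

```r
# Part 1: memory arithmetic for n = 42895 observations (n is taken from the post).
n <- 42895
pairs <- n * (n - 1) / 2          # entries in the lower triangle of a dist object
dist_gib <- pairs * 8 / 1024^3    # dist objects hold 8-byte doubles
fbm_gib <- n^2 * 4 / 1024^3       # full square matrix, assumed stored as 4-byte floats
print(round(c(dist = dist_gib, fbm_float = fbm_gib), 2))

# Part 2: Ward clustering directly on the observations (no distance matrix).
# x is simulated stand-in data; the real 42895 x 6 matrix is not shown in the post.
set.seed(1)
x <- matrix(rnorm(200 * 6), ncol = 6)

if (requireNamespace("fastcluster", quietly = TRUE)) {
  # hclust.vector takes the data matrix itself, so memory grows with n, not n^2.
  hc <- fastcluster::hclust.vector(x, method = "ward")
  groups <- cutree(hc, k = 4)     # k = 4 is an arbitrary illustrative choice
  print(table(groups))
}
```

Both storage formats come out just under 7 GiB, so a dist object of this size is unlikely to fit in 8 GB of RAM on any operating system. With only 6 variables per observation, clustering from the raw data avoids that cost.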

bigdist_subset fails for last index

If you have a bigdist-object created as follows:

D = as_bigdist(dist(c(1:12)), file='somefile')

and you try to create a subset including the last row...

D_sub = bigdist_subset(D, 10:12, file='subset_file')

you will get an error:

max(index) not less than sz

I assume that one would just have to replace a '<' with a '<=' somewhere in the code in order to fix this.
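
The suspected off-by-one can be illustrated in base R. Both helper functions below are hypothetical and written only for this illustration; they are not bigdist's actual code, and the real check may live elsewhere or be phrased differently.

```r
# Hypothetical bounds checks, not taken from the bigdist source.
check_index_strict <- function(index, sz) max(index) < sz       # rejects index == sz
check_index_inclusive <- function(index, sz) max(index) <= sz   # accepts the last row

sz <- 12L        # number of observations in dist(c(1:12))
index <- 10:12   # the subset requested in the report

strict_ok <- check_index_strict(index, sz)        # FALSE: 12 < 12 fails
inclusive_ok <- check_index_inclusive(index, sz)  # TRUE: 12 <= 12 passes
print(c(strict = strict_ok, inclusive = inclusive_ok))
```

With 1-based indexing the largest valid row index equals the size, so an inclusive comparison is the one that admits the last row.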
