ssq / coursera-uw-machine-learning-clustering-retrieval

License: MIT License

Language: Python 100.00%

Topics: k-nearest-neighbors, word-count, tf-idf, approximate-nearest-neighbor-search, locality-sensitive-hashing, kd-tree, mapreduce, k-means, k-means-plus-plus, mixture-model

Coursera UW Machine Learning Clustering & Retrieval

The course can be found on Coursera.

Notebooks for quick search can be found on my blog SSQ.

Videos are available on Bilibili (posted by me).

  • Week 1 Intro

  • Week 2 Nearest Neighbor Search: Retrieving Documents

    • Implement nearest neighbor search for retrieval tasks
    • Contrast document representations (e.g., raw word counts, tf-idf,…)
      • Emphasize important words using tf-idf
    • Contrast methods for measuring similarity between two documents
      • Euclidean vs. weighted Euclidean
      • Cosine similarity vs. similarity via unnormalized inner product
    • Describe complexity of brute force search
    • Implement KD-trees for nearest neighbor search
    • Implement LSH for approximate nearest neighbor search
    • Compare pros and cons of KD-trees and LSH, and decide which is more appropriate for a given dataset
    • Choosing features and metrics for nearest neighbor search (see the tf-idf retrieval sketch after this outline)
    • Implementing Locality Sensitive Hashing from scratch (see the LSH sketch after this outline)
  • Week 3 Clustering with k-means

    • Describe potential applications of clustering
    • Describe the input (unlabeled observations) and output (cluster labels) of a clustering algorithm
    • Determine whether a task is supervised or unsupervised
    • Cluster documents using k-means
    • Interpret k-means as a coordinate descent algorithm
    • Define data parallel problems
    • Explain Map and Reduce steps of MapReduce framework
    • Use existing MapReduce implementations to parallelize k-means, understanding what’s being done under the hood
    • Clustering text data with k-means (see the k-means sketch after this outline)
  • Week 4 Mixture Models: Model-Based Clustering

    • Interpret a probabilistic model-based approach to clustering using mixture models
    • Describe model parameters
    • Motivate the utility of soft assignments and describe what they represent
    • Discuss issues related to how the number of parameters grows with the number of dimensions
      • Interpret diagonal covariance versions of mixtures of Gaussians
    • Compare and contrast mixtures of Gaussians and k-means
    • Implement an EM algorithm for inferring soft assignments and cluster parameters
      • Determine an initialization strategy
      • Implement a variant that helps avoid overfitting issues
    • Implementing EM for Gaussian mixtures
    • Clustering text data with Gaussian mixtures (see the Gaussian-mixture sketch after this outline)
  • Week 5 Latent Dirichlet Allocation: Mixed Membership Modeling

    • Compare and contrast clustering and mixed membership models
    • Describe a document clustering model for the bag-of-words document representation
    • Interpret the components of the LDA mixed membership model
    • Analyze a learned LDA model
      • Topics in the corpus
      • Topics per document
    • Describe Gibbs sampling steps at a high level
    • Utilize Gibbs sampling output to form predictions or estimate model parameters
    • Implement collapsed Gibbs sampling for LDA
    • Modeling text topics with Latent Dirichlet Allocation (see the LDA sketch after this outline)
  • Week 6 Hierarchical Clustering & Closing Remarks

    • Bonus content: Hierarchical clustering
      • Divisive clustering
      • Agglomerative clustering
        • The dendrogram for agglomerative clustering
        • Agglomerative clustering details
    • Hidden Markov models (HMMs): Another notion of “clustering”
    • Modeling text data with a hierarchy of clusters (see the hierarchical-clustering sketch after this outline)
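
A minimal sketch of the Week 2 retrieval pipeline: represent documents with tf-idf and find nearest neighbors under cosine distance. It assumes scikit-learn and toy documents; the course assignments build and analyze these pieces themselves.

```python
# Tf-idf document retrieval with brute-force nearest neighbor search.
# Assumes scikit-learn; documents and variable names are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

docs = [
    "the quick brown fox jumps over the lazy dog",
    "machine learning methods for document retrieval",
    "nearest neighbor search with tf-idf features",
]

# Emphasize rare-but-informative words with tf-idf weighting.
X = TfidfVectorizer().fit_transform(docs)

# Brute-force search under cosine distance (1 - cosine similarity).
nn = NearestNeighbors(n_neighbors=2, metric="cosine", algorithm="brute").fit(X)
distances, indices = nn.kneighbors(X[2])
print(indices, distances)
```

For dense, low-dimensional features, a KD-tree (e.g. sklearn.neighbors.KDTree) can replace the brute-force scan, though it does not support cosine distance, which is part of the KD-tree vs. LSH comparison above.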
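
For the "LSH from scratch" item, here is a from-scratch sketch of random-hyperplane hashing, a standard LSH family for cosine similarity. Function and variable names are illustrative, not the assignment's API.

```python
# Random-hyperplane LSH: points with similar directions tend to share bucket keys.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def train_lsh(data, num_bits=16):
    """Hash every row of `data` into a bucket keyed by its bit signature."""
    planes = rng.normal(size=(data.shape[1], num_bits))   # random hyperplanes
    bits = (data @ planes >= 0).astype(int)               # one sign bit per plane
    powers = 1 << np.arange(num_bits)                     # pack bits into an integer key
    table = defaultdict(list)
    for i, key in enumerate(bits @ powers):
        table[key].append(i)
    return planes, powers, table

def query(vec, planes, powers, table):
    """Return candidate neighbors: indices hashed to the same bucket as `vec`."""
    key = int((vec @ planes >= 0).astype(int) @ powers)
    return table.get(key, [])

data = rng.normal(size=(1000, 50))
planes, powers, table = train_lsh(data)
print(query(data[0], planes, powers, table))              # data[0] itself is always a candidate
```

A fuller implementation would also probe nearby buckets (flipping a few bits of the key) to trade accuracy against search time.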
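
A from-scratch sketch of Lloyd's k-means for Week 3, written to make the coordinate-descent view explicit: each iteration alternates between optimizing assignments with centroids fixed and optimizing centroids with assignments fixed. The random initialization and names are illustrative (the course also covers smarter k-means++ seeding).

```python
# Plain k-means, highlighting its two coordinate-descent steps.
import numpy as np

def kmeans(data, k, num_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)]  # random initialization
    for _ in range(num_iters):
        # Step 1: with centroids fixed, assign each point to its nearest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        assignments = dists.argmin(axis=1)
        # Step 2: with assignments fixed, move each centroid to its cluster mean.
        for j in range(k):
            members = data[assignments == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, assignments

data = np.random.default_rng(1).normal(size=(300, 2))
centroids, assignments = kmeans(data, k=3)
print(np.bincount(assignments))
```

Step 1 is embarrassingly data-parallel (each point's assignment is independent of the others), which is what makes the MapReduce version of k-means mentioned above straightforward: map computes per-point assignments and partial sums, reduce averages them into new centroids.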
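
A minimal sketch of Week 4's model-based clustering using scikit-learn's GaussianMixture, whose EM loop stands in for the from-scratch EM built in the assignment. The diagonal covariance and small variance floor echo the points above about parameter growth and overfitting; the synthetic data is just for illustration.

```python
# Gaussian mixture clustering with soft assignments via EM (scikit-learn).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=-2.0, scale=0.5, size=(200, 2)),
    rng.normal(loc=2.0, scale=1.0, size=(200, 2)),
])

gmm = GaussianMixture(
    n_components=2,
    covariance_type="diag",  # diagonal covariances: parameters grow linearly with dimension
    reg_covar=1e-6,          # variance floor, a simple guard against degenerate clusters
    n_init=5,                # several random initializations, keep the best
    random_state=0,
).fit(data)

responsibilities = gmm.predict_proba(data)  # soft assignments: one row of cluster probabilities per point
print(responsibilities[:5].round(3))
```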
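
A minimal sketch of Week 5's topic modeling. It uses scikit-learn's LatentDirichletAllocation, which fits LDA with variational inference rather than the collapsed Gibbs sampler implemented in the assignment, but its outputs are read the same way: topics over words for the corpus, and mixed-membership topic proportions per document. The toy corpus is illustrative.

```python
# LDA topic modeling on a bag-of-words representation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat with another cat",
    "dogs and cats are popular pets",
    "stock markets rose on strong earnings",
    "investors traded shares as markets rallied",
]

# LDA works on raw word counts (bag-of-words), not tf-idf.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)
vocab = vectorizer.get_feature_names_out()

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Topics in the corpus: top words per topic.
for k, topic in enumerate(lda.components_):
    top_words = [vocab[i] for i in topic.argsort()[::-1][:3]]
    print(f"topic {k}:", top_words)

# Topics per document: mixed-membership proportions.
print(lda.transform(counts).round(2))
```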
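
A minimal sketch of Week 6's bonus material using SciPy's agglomerative (bottom-up) clustering and dendrogram utilities; the divisive, top-down hierarchy discussed above would instead be built by recursively splitting clusters (e.g., with k-means). Data and parameters are illustrative.

```python
# Agglomerative clustering and a dendrogram cut with SciPy.
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=0.0, size=(20, 2)),
    rng.normal(loc=5.0, size=(20, 2)),
])

# Bottom-up merging with Ward linkage produces the full merge tree.
Z = linkage(data, method="ward")

# Cutting the dendrogram yields flat cluster labels (here: 2 clusters).
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

# Compute the dendrogram layout; drop no_plot=True to draw it with matplotlib.
tree = dendrogram(Z, no_plot=True)
print(tree["leaves"][:5])  # leaf ordering of the first few points
```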

