KNN

For more details about this project, see https://dl.dropboxusercontent.com/u/37572555/Github/KNN/KNN.pdf

This is a course assignment in which we implement the basic K-Nearest Neighbor (KNN) algorithm. Below are the basic classes designed in the initial phase:

  • FileManager
    in charge of file operations
    • ReadFile: read training files and test files
    • ProcessData: normalize and standardize the data
    • WriteFile: output the predicted labels
  • Metric
    interface for defining different measurement methods
    • CosineSimilarity
    • L1Distance
    • EuclideanDistance
  • Record
    abstract class that holds the attributes and the class label
    • TrainRecord: adds distance
    • TestRecord: adds predictedLabel
  • knn
    the main class implementing the KNN algorithm
    • For each TestRecord, go through all TrainRecords and use a container of size k (such as a heap) to keep the k TrainRecords nearest to that TestRecord. (There is no need to store all the distances, which would waste memory.)
    • During the classification phase, weight each vote according to distance and assign the class label with the largest total weight.
    • Open question: do outliers need special handling?
    • If several labels end up with similar weights, enlarge the container's size dynamically
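
The design above could be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: the function names (`k_nearest`, `predict`, and the three metric functions) are hypothetical, records are simplified to `(attributes, label)` pairs, and the inverse-distance weighting is only one assumed way to "weight each vote according to distance". Cosine similarity is turned into a distance (1 minus similarity) so that smaller always means closer.

```python
import heapq
import math
from collections import defaultdict


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    # 1 - cosine similarity, so that a smaller value still means "closer".
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def k_nearest(train, query, k, metric=euclidean_distance):
    """Return the k (distance, label) pairs closest to query, nearest first.

    train is an iterable of (attributes, label) pairs (a simplified TrainRecord).
    """
    heap = []  # max-heap on distance via negated keys; never holds more than k entries
    for index, (attributes, label) in enumerate(train):
        d = metric(attributes, query)
        entry = (-d, index, label)  # index breaks ties, so labels are never compared
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the current farthest neighbour
    pairs = [(-neg_d, label) for neg_d, _, label in heap]
    return sorted(pairs, key=lambda pair: pair[0])


def predict(train, query, k, metric=euclidean_distance, eps=1e-9):
    """Distance-weighted vote among the k nearest neighbours."""
    weights = defaultdict(float)
    for d, label in k_nearest(train, query, k, metric):
        # Assumed scheme: weight = 1 / distance; eps avoids division by zero.
        weights[label] += 1.0 / (d + eps)
    return max(weights, key=weights.get)
```

A possible usage, with made-up data:

```python
train = [([0.0, 0.0], "a"), ([0.1, 0.0], "a"),
         ([1.0, 1.0], "b"), ([0.9, 1.0], "b"), ([1.0, 0.9], "b")]
print(predict(train, [0.05, 0.0], k=3))                      # a
print(predict(train, [0.95, 0.95], k=3, metric=l1_distance))  # b
```

Only k entries are kept per query, so memory stays at O(k) regardless of how many TrainRecords are scanned. Dynamic enlargement of the container on near-ties is not covered by this sketch.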

Questions:

  1. When normalizing data, do we need to consider TrainingData and TestData together or separately?
    The max and min values may differ between the two sets, so normalizing them separately would put TrainData and TestData on inconsistent scales.
    The professor said that the test set is normalized using the scheme derived from the training set.

  2. During KNN, is there a need to set a threshold to detect outliers?
    Probably not, since there are so many different datasets. We can assume that all test cases have valid class labels.
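
For question 1, the professor's answer could look something like the sketch below, assuming min-max normalization (the source only mentions max and min values, so the exact scheme is an assumption). The helper names `fit_min_max` and `apply_min_max` are hypothetical. The per-attribute minimum and maximum are computed from the training rows only and then reused for the test rows.

```python
def fit_min_max(train_rows):
    """Derive per-attribute (mins, maxs) from the training rows only."""
    columns = list(zip(*train_rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Scale rows with previously fitted bounds; constant attributes map to 0.0."""
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, mins, maxs)
        ])
    return scaled
```

A possible usage, with made-up numbers:

```python
train_rows = [[1.0, 10.0], [3.0, 30.0]]
test_rows = [[2.0, 40.0]]
mins, maxs = fit_min_max(train_rows)
print(apply_min_max(train_rows, mins, maxs))  # [[0.0, 0.0], [1.0, 1.0]]
print(apply_min_max(test_rows, mins, maxs))   # [[0.5, 1.5]]
```

Note that test values can fall outside [0, 1] when they lie beyond the range seen in training, as the second attribute does here.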

Contributors: wihoho, nawrasg
