
daffidwilde / kmodes-init-paper


A repository to accompany the paper entitled "A novel initialisation based on hospital-resident assignment for the k-modes algorithm"

Languages: TeX 79.45%, Jupyter Notebook 15.82%, Python 4.73%

kmodes-init-paper's People

Contributors

daffidwilde

kmodes-init-paper's Issues

Rewrite abstract and opening

The abstract is not appropriate for the results of the paper anymore and needs updating.

In addition, the opening paragraphs of the text should be revised to be punchier and focus more on the success of the new method.

New title needed

The current title is long and reads awkwardly.

"A novel game-theoretic initialisation process for the k-modes algorithm using the hospital-resident assignment problem"

To do (week beginning 19/02/18)

  • Start writing section on preference lists
    • Best, worst, random
    • Is there scope to incorporate prior knowledge this way?
    • Is there a matching that always reproduces the initialisations given by Huang's and Cao's methods?
  • Get some numerical results for all of these implementations
    • Separate scripts for matching initialisations?
  • Write definitions
  • Light editing of algorithms and examples
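One way to make the best/worst/random preference lists concrete is a minimal sketch, assuming each data point ranks the candidate modes by simple matching dissimilarity (the number of attributes on which two categorical records differ). The function names `matching_dissimilarity` and `preference_list` are hypothetical and not taken from the repository; whether some choice of lists recovers Huang's or Cao's initialisation is still the open question above.

```python
import random


def matching_dissimilarity(a, b):
    """Simple matching dissimilarity: count of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def preference_list(point, candidates, order="best", seed=None):
    """Rank candidate indices for one data point.

    order="best":   most similar candidates first (ascending dissimilarity)
    order="worst":  least similar candidates first (descending dissimilarity)
    order="random": a uniformly shuffled ranking (reproducible via seed)
    """
    if order not in ("best", "worst", "random"):
        raise ValueError(f"unknown order: {order!r}")
    indices = list(range(len(candidates)))
    if order == "random":
        random.Random(seed).shuffle(indices)
        return indices
    # sorted() is stable, so ties keep their original candidate order either way.
    return sorted(
        indices,
        key=lambda j: matching_dissimilarity(point, candidates[j]),
        reverse=(order == "worst"),
    )


candidates = [("a", "x", "q"), ("b", "y", "q"), ("a", "x", "p")]
print(preference_list(("a", "x", "p"), candidates, "best"))   # [2, 0, 1]
print(preference_list(("a", "x", "p"), candidates, "worst"))  # [1, 0, 2]
```

"Worst" is simply the reverse ranking, so comparing it against "best" gives a cheap sanity check that the matching-based initialisation is sensitive to the preference lists at all.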

Figure out the fitted distributions

If the fitted distributions are included for their parameters, bimodal models need to be fitted in Python. Otherwise, if they are only there to accentuate the shape of the histogram, would a KDE suffice?

[Screenshot, 2020-02-03 at 17:57:31]
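If only the shape matters, a Gaussian kernel density estimate is a parameter-free option. The following is a minimal pure-Python sketch; the default bandwidth uses Silverman's normal-reference rule, which is our assumption here (`scipy.stats.gaussian_kde` offers the same idea in SciPy). A KDE yields no interpretable parameters, so if those are the point, a two-component mixture would still be needed.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a callable estimating the density of `samples` with a Gaussian kernel."""
    n = len(samples)
    if bandwidth is None:
        # Silverman's normal-reference rule of thumb: h = 1.06 * sigma * n^(-1/5).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


samples = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]
f = gaussian_kde(samples, bandwidth=0.5)
print(f(2.0) > f(0.0))  # True: both peaks of the bimodal shape survive
```

A normal-reference bandwidth tends to oversmooth multimodal data, so for a clearly bimodal histogram it may be worth setting `bandwidth` explicitly.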

Archive the data and source code

  • DOI for top percentile data, source code and notebooks
  • Write README for source code/data archive (#44)
    • Details of directory tree
    • Instructions for installing conda environment

To do

  • Genetic algorithm for generating suitable datasets, i.e. identifying datasets that favour one algorithm over its rival
  • Is there a computational gain from using this process on large (complex) datasets? Check not only real data but also large artificial datasets
  • Objective graphs
    • Plot of the function itself
    • Plots over time
  • Start exploring preference lists; this needs to be done in tandem with everything else
  • Continue with the data analysis
    • A presentation would be good (for Kendal)
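For the objective graphs, the quantity being plotted is the k-modes cost: the total matching dissimilarity between each point and its nearest mode. A minimal sketch that also records the cost after every update (the raw material for a plot against iteration) might look like this. It assumes a batch, Lloyd-style update rather than Huang's original per-point mode updates, and the helper names are hypothetical, not the `kmodes` package's API.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def kmodes_cost(data, modes):
    """k-modes objective: total dissimilarity from each point to its nearest mode."""
    return sum(min(matching_dissimilarity(x, q) for q in modes) for x in data)


def kmodes_cost_history(data, initial_modes, max_iter=100):
    """Batch k-modes from the given initial modes, recording the cost at each step."""
    modes = [tuple(q) for q in initial_modes]
    history = [kmodes_cost(data, modes)]
    for _ in range(max_iter):
        # Assign each point to its nearest mode (ties go to the lowest index).
        clusters = [[] for _ in modes]
        for x in data:
            nearest = min(range(len(modes)), key=lambda j: matching_dissimilarity(x, modes[j]))
            clusters[nearest].append(x)
        # Move each mode to the attribute-wise most frequent category of its cluster.
        new_modes = []
        for mode, members in zip(modes, clusters):
            if not members:
                new_modes.append(mode)  # leave the mode of an empty cluster unchanged
                continue
            new_modes.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
        cost = kmodes_cost(data, new_modes)
        if cost >= history[-1]:
            break  # no strict improvement: treat as converged
        modes = new_modes
        history.append(cost)
    return modes, history


data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z"), ("c", "z")]
modes, history = kmodes_cost_history(data, [("a", "y"), ("c", "z")])
print(history)  # [4, 2]
```

Because both the assignment step and the mode update can only lower the cost, `history` is strictly decreasing, which makes it easy to compare how quickly different initialisations converge.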

Mention the software used

  • Mention the matching library when the HR algorithm is introduced (citation)
  • Create a release for my fork of kmodes and update the environment.yml
  • Mention the release in the results section (DOI)

To do (week beginning: 12/02/18)

  • Fine-tune the preference lists to (hopefully) get better results
  • Is there scope to introduce 'expert' knowledge? Or is this just an analogue of weighting the data?
  • What were the characteristics of the data on good runs?
  • Change the way results are gathered. We don't want to classify or predict anything; we want to find intrinsic properties
  • Does starting at a smarter point make the final cost 'better'? If so, we can then discuss its quality as a predictive model
  • The bug in the capacitated algorithm needs to be investigated as it has performed better for predicting
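A small, independent reference implementation can help pin down the capacitated bug by cross-checking outputs on toy instances. Below is a minimal sketch of the textbook resident-proposing algorithm for the hospital-resident problem. Its result is the resident-optimal stable matching regardless of the order in which free residents propose. It is not the repository's capacitated algorithm or the `matching` library's code, and the function name and dict-based inputs are assumptions.

```python
def hospital_resident(resident_prefs, hospital_prefs, capacities):
    """Resident-optimal stable matching for the hospital-resident problem.

    resident_prefs: dict resident -> list of hospitals, most preferred first
    hospital_prefs: dict hospital -> list of residents, most preferred first
    capacities:     dict hospital -> maximum number of residents it can take
    Returns a dict hospital -> set of matched residents.
    """
    # rank[h][r] is r's position on h's list (lower is better).
    rank = {h: {r: i for i, r in enumerate(prefs)} for h, prefs in hospital_prefs.items()}
    matching = {h: set() for h in hospital_prefs}
    next_choice = {r: 0 for r in resident_prefs}
    free = list(resident_prefs)
    while free:
        r = free.pop()
        if next_choice[r] >= len(resident_prefs[r]):
            continue  # r has exhausted their list and stays unmatched
        h = resident_prefs[r][next_choice[r]]
        next_choice[r] += 1
        if r not in rank[h]:
            free.append(r)  # h finds r unacceptable, so r tries the next hospital
            continue
        matching[h].add(r)
        if len(matching[h]) > capacities[h]:
            # Over capacity: h rejects its least preferred tentative resident.
            worst = max(matching[h], key=lambda s: rank[h][s])
            matching[h].remove(worst)
            free.append(worst)
    return matching


residents = {"A": ["X", "Y"], "B": ["X", "Y"], "C": ["X"]}
hospitals = {"X": ["C", "A", "B"], "Y": ["A", "B"]}
print(hospital_resident(residents, hospitals, {"X": 1, "Y": 1}))  # {'X': {'C'}, 'Y': {'A'}}
```

Raising `X`'s capacity to 2 should give `{'X': {'A', 'C'}, 'Y': {'B'}}`; any disagreement with the capacitated implementation on such small cases would localise the bug.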

Diagrammatic example

It would be great to have a diagrammatic example to go alongside this paper (or the respective chapters in the thesis), so post it, and any discussion about it, here.

Comments 05/02/18

  • Table (chart) showing notation
  • Add m, N to example 1
  • Block up the algorithms? This would make them easier to read and to compare. Alternatively, give a higher-level version in the main text and the full version in an appendix.
  • Side-by-side steps for Huang/Cao examples
  • So what? Is there something this can do that the other initialisation processes can't?
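For the side-by-side Huang/Cao examples, the intermediate numbers (densities and scores) could be generated programmatically. Below is a minimal sketch of Cao's density-based initialisation as we understand it. Each point's density is the mean relative frequency of its categories. The densest point becomes the first mode, and each further mode maximises density multiplied by the dissimilarity to the nearest mode already chosen. Lowest-index tie-breaking is our assumption, and the code is not taken from the `kmodes` package.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def cao_initialisation(data, k):
    """Pick k initial modes from `data` using a density-times-distance rule."""
    n, m = len(data), len(data[0])
    # Frequency of every category, attribute by attribute.
    counts = [Counter(row[a] for row in data) for a in range(m)]
    # Density of a point: mean relative frequency of its categories.
    density = [sum(counts[a][row[a]] for a in range(m)) / (n * m) for row in data]

    # First mode: the densest point (max() keeps the lowest index on ties).
    chosen = [max(range(n), key=lambda i: density[i])]

    def score(i):
        # Density weighted by dissimilarity to the nearest mode chosen so far.
        return density[i] * min(matching_dissimilarity(data[i], data[j]) for j in chosen)

    while len(chosen) < k:
        chosen.append(max(range(n), key=score))
    return [data[i] for i in chosen]


data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z"), ("c", "z")]
print(cao_initialisation(data, 2))  # [('a', 'x'), ('b', 'z')]
```

Points already chosen score zero, so they are not picked again unless the data has fewer than k distinct records.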
