daffidwilde / kmodes-init-paper

A repository to accompany the paper entitled "A novel initialisation based on hospital-resident assignment for the k-modes algorithm"

TeX 79.45% Jupyter Notebook 15.82% Python 4.73%

kmodes-init-paper's Issues

Brush up the notebooks

If this repo is made public (#44), we need to make sure the notebooks are presentable.

Rewrite abstract and opening

The abstract no longer reflects the results of the paper and needs updating.

~~In addition, the opening paragraphs of the text should be revised to be punchier and focus more on the success of the new method.~~

Refer to Figure 9 (Huang's success)

New title needed

The current title is long and reads awkwardly.

"A novel game-theoretic initialisation process for the k-modes algorithm using the hospital-resident assignment problem"

To do (week beginning 19/02/18)

Start writing section on preference lists
- Best, worst, random
- Is there scope to incorporate prior knowledge this way?
- Could there be a matching where we always have the initialisation from Huang's and Cao's methods?
Get some numerics down for all of these implementations
- Separate scripts for matching initialisations?
Write definitions
Light editing of algorithms and examples
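
The "best, worst, random" preference lists mentioned above could be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: it assumes one row (the ranker) orders candidate rows by simple matching dissimilarity, and the names `dissimilarity` and `preference_list` are hypothetical.

```python
import random


def dissimilarity(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def preference_list(ranker, candidates, strategy="best", seed=None):
    """Return candidate indices in order of preference for `ranker`.

    strategy="best"   : most similar candidates first
    strategy="worst"  : least similar candidates first
    strategy="random" : a random ordering (reproducible when `seed` is given)
    """
    indices = list(range(len(candidates)))
    if strategy == "random":
        random.Random(seed).shuffle(indices)
        return indices
    if strategy not in ("best", "worst"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    return sorted(
        indices,
        key=lambda i: dissimilarity(ranker, candidates[i]),
        reverse=(strategy == "worst"),
    )


# Made-up categorical rows, purely for illustration.
mode = ("a", "x", "p")
points = [("a", "x", "p"), ("b", "y", "q"), ("a", "y", "p")]

print(preference_list(mode, points, "best"))   # [0, 2, 1]
print(preference_list(mode, points, "worst"))  # [1, 2, 0]
```

Keeping the strategy as a parameter makes it straightforward to run the same experiment under each ordering and compare the numerics.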

Build on justification for HR

This change should be concise as this entire subsection is getting gutted re: #48.

Figure out the fitted distributions

If the fitted distributions are there for their parameters, bimodal models need to be figured out in Python. Otherwise, if they are there to accentuate the shape of the histogram, is a KDE sufficient?
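
If the curves are only there to accentuate the shape of the histogram, a Gaussian KDE makes no assumption about the number of modes. The sketch below is a pure-Python illustration with made-up data; `kde_density` is a hypothetical helper, and the default bandwidth uses the normal reference rule of thumb, which is an assumption rather than anything decided in this issue. In practice `scipy.stats.gaussian_kde` does the same job.

```python
import math
import statistics


def kde_density(sample, bandwidth=None):
    """Return a function estimating the density of `sample` with Gaussian kernels."""
    n = len(sample)
    if bandwidth is None:
        # Normal reference rule of thumb: 1.06 * sigma * n^(-1/5).
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        ) / norm

    return density


# Made-up bimodal sample, purely for illustration.
sample = [1.0, 1.2, 0.9, 1.1, 0.8, 5.0, 5.2, 4.9, 5.1, 4.8]
density = kde_density(sample)

# The estimate is higher near each cluster than in the gap between them.
print(density(1.0) > density(3.0), density(5.0) > density(3.0))
```

A KDE only describes shape. If the fitted parameters themselves are of interest, a parametric bimodal model is still needed.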

Archive the data and source code

DOI for top percentile data, source code and notebooks
Write README for source code/data archive (#44)
- Details of directory tree
- Instructions for installing conda environment

To do

Genetic algorithm for suitable dataset generation; identifying datasets primed for a particular algorithm against its rival
Is there a computational gain from using this process on large (complex) datasets? Not only on real data but large artificial datasets
Objective graphs
- Plot of the function itself
- Plots over time
Start exploring preference lists; this needs to be done in tandem with everything else
Continue with the data analysis
- A presentation would be good (for Kendal)
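
For the objective graphs listed above, the value to plot over time is the k-modes cost: the total simple matching dissimilarity between each point and the mode of its cluster. The sketch below records that cost at every iteration. It is a minimal batch-update variant written for illustration, not the implementation in the kmodes fork, and all function names and data are hypothetical.

```python
from collections import Counter


def dissimilarity(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(data, modes):
    """Label each row with the index of its closest mode (ties go to the lowest index)."""
    return [
        min(range(len(modes)), key=lambda j: dissimilarity(row, modes[j]))
        for row in data
    ]


def cost(data, modes, labels):
    """k-modes objective: summed dissimilarity between rows and their assigned modes."""
    return sum(dissimilarity(row, modes[label]) for row, label in zip(data, labels))


def update_modes(data, labels, modes):
    """Recompute each mode as the most frequent category per attribute in its cluster."""
    new_modes = []
    for j, old in enumerate(modes):
        members = [row for row, label in zip(data, labels) if label == j]
        if not members:
            new_modes.append(old)  # an empty cluster keeps its previous mode
            continue
        new_modes.append(
            tuple(Counter(column).most_common(1)[0][0] for column in zip(*members))
        )
    return new_modes


def cost_history(data, initial_modes, max_iter=20):
    """Run batch k-modes from `initial_modes` and return the cost at each iteration."""
    modes = [tuple(mode) for mode in initial_modes]
    history = []
    for _ in range(max_iter):
        labels = assign(data, modes)
        history.append(cost(data, modes, labels))
        new_modes = update_modes(data, labels, modes)
        if new_modes == modes:
            break  # converged: the modes no longer change
        modes = new_modes
    return history


# Made-up data and starting modes, purely for illustration.
data = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z"), ("a", "x")]
print(cost_history(data, [("a", "y"), ("b", "x")]))  # [4, 1]
```

Running `cost_history` once per initialisation method gives one curve per method, which is the comparison the "plots over time" item asks for.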

Mention the software

Mention the matching library when the HR algorithm is introduced (citation)
Create a release for my fork of kmodes and update the environment.yml
Mention the release in the results section (DOI)
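
For readers unfamiliar with the HR algorithm itself, here is a minimal sketch of resident-proposing deferred acceptance with hospital capacities. It is a pure-Python stand-in written for illustration; it does not use or reproduce the API of the matching library, and the function name and example data are hypothetical.

```python
def hospital_resident(resident_prefs, hospital_prefs, capacities):
    """Resident-proposing deferred acceptance.

    resident_prefs : {resident: [hospitals, most preferred first]}
    hospital_prefs : {hospital: [residents, most preferred first]}
    capacities     : {hospital: number of places}
    Returns {hospital: [matched residents]}.
    """
    # Position of each resident in each hospital's list (lower is better).
    rank = {
        h: {r: i for i, r in enumerate(prefs)} for h, prefs in hospital_prefs.items()
    }
    matching = {h: [] for h in hospital_prefs}
    next_choice = {r: 0 for r in resident_prefs}
    free = list(resident_prefs)

    while free:
        r = free.pop()
        prefs = resident_prefs[r]
        if next_choice[r] >= len(prefs):
            continue  # r has exhausted their list and stays unmatched
        h = prefs[next_choice[r]]
        next_choice[r] += 1
        if r not in rank[h]:
            free.append(r)  # h does not rank r, so r moves on to the next hospital
            continue
        matching[h].append(r)
        if len(matching[h]) > capacities[h]:
            # Over capacity: h rejects its least preferred current resident.
            worst = max(matching[h], key=rank[h].get)
            matching[h].remove(worst)
            free.append(worst)
    return matching


# Made-up instance, purely for illustration.
resident_prefs = {"A": ["X", "Y"], "B": ["X", "Y"], "C": ["Y", "X"]}
hospital_prefs = {"X": ["B", "A", "C"], "Y": ["A", "B", "C"]}
capacities = {"X": 1, "Y": 2}

result = hospital_resident(resident_prefs, hospital_prefs, capacities)
print({h: sorted(rs) for h, rs in result.items()})  # {'X': ['B'], 'Y': ['A', 'C']}
```

In this instance A and B both propose to X, which has one place and prefers B, so A falls back to Y.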

To do (week beginning: 12/02/18)

Fine tuning preference lists to hopefully get better results
Is there scope to introduce 'expert' knowledge? Or is this just an analogue of weighting the data?
What were the characteristics of the data on good runs?
Change the way results are gathered. We don't want to classify/predict anything - we want to find intrinsic properties
Does starting at a smarter point make the final cost 'better'? Then we can talk about its quality as a predictive model
The bug in the capacitated algorithm needs to be investigated as it has performed better for predicting

Diagrammatic example

It would be great to have a diagrammatic example to go alongside this paper (or the respective chapters in the thesis), so put it, and any discussion about it, here.

Comments 05/02/18

Table (chart) showing notation
Add m, N to example 1
Blocking up algorithms? Makes them easier to compare and read themselves. An alternative could be to give a higher level version in the main text and the full version as an appendix.
Side-by-side steps for Huang/Cao examples
So what? Is there something this can do that the other initialisation processes can't?
