
neurodata / checklists


This repo contains a number of checklists that we find very useful.

Home Page: http://docs.neurodata.io/checklists

License: Apache License 2.0

HTML 86.16% Jupyter Notebook 13.80% MATLAB 0.04%

checklists's Introduction

checklists

Surgeons have a simple checklist that they fill out for every surgery. Evidence suggests that these checklists, while tedious, are extremely effective.

We thought data scientists should have checklists too, so we started making some. Please add more checklists, or improve the existing ones!

checklists's People

Contributors

gkiar, jovo, mrae

checklists's Issues

algorithms.md feedback

  • pull the pseudocode from MGC, you made your own format, but better to use standard format if possible?
  • p-value: you don't specify what your test statistic is.
  • "Summary plot" a histogram of what??
  • "5. Quantitative evaluation code per trial" doesn't implement the function you said, which permutes the labels based on their means?
  • i don't understand your quantitative analysis over all trials. are you computing a p-value per trial? how are you getting a single number out of that? or are you not?
  • you skipped a bunch of sections: real data, synthetic data, etc....??

more later.
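To make the p-value questions above concrete, here is a minimal sketch of a two-sample permutation test. It is not the function from algorithms.md: the test statistic (absolute difference of sample means), the names `mean_difference`, `permutation_p_value`, and `rejection_rate`, and the choice to summarise per-trial p-values as a rejection rate at level `alpha` are all assumptions made for illustration.

```python
import random
from statistics import mean


def mean_difference(x, y):
    """Test statistic (an assumption): absolute difference of sample means."""
    return abs(mean(x) - mean(y))


def permutation_p_value(x, y, statistic=mean_difference,
                        n_permutations=1000, seed=0):
    """Estimate a p-value by randomly relabelling the pooled samples."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # assign group labels at random
        if statistic(pooled[:n_x], pooled[n_x:]) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_permutations + 1)


def rejection_rate(trials, alpha=0.05):
    """One possible single-number summary: fraction of trials with p <= alpha."""
    p_values = [permutation_p_value(x, y, seed=i)
                for i, (x, y) in enumerate(trials)]
    rate = sum(p <= alpha for p in p_values) / len(p_values)
    return rate, p_values


if __name__ == "__main__":
    a = [0.1 * i for i in range(8)]
    b = [10 + 0.1 * i for i in range(8)]
    rate, p_values = rejection_rate([(a, b), (a, a)])
    print("p-values per trial:", p_values)
    print("rejection rate:", rate)
```

Stating the statistic explicitly as a function argument answers "what is the test statistic" in the code itself, and returning the per-trial p-values alongside the summary makes it clear how the single number was obtained.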


update notebook

  • in input 1, why is it "A \sim \element \sim \Real^{n \times d}"? i would think it would be "X \in \Real^{n \times d}"
  • does k-means not output the means?
  • is "<" equivalent to "comment" in some programming language? which? i expected a "%" or "#". oh, because latex uses a triangle? maybe use that same symbol then?
  • step 3 missing paren
  • step 5 missing curly
  • step 5Bc: "endfor" is on wrong line, and there is a 3 in front of it.
  • endfunction belongs on its own line
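As a reference point for the notation comments above, here is a minimal sketch of k-means (Lloyd's algorithm) that takes an input X with n rows and d columns, i.e. X \in \Real^{n \times d}, and returns the means as well as the assignments. It is a plain-Python illustration, not the notebook's pseudocode; the function name `kmeans` and its parameters are assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm on X, a list of n points with d coordinates each.

    Returns (labels, means): one cluster index per point, and k mean vectors.
    """
    rng = random.Random(seed)
    means = [list(p) for p in rng.sample(X, k)]  # initialise from the data
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, means[j]))
            for p in X
        ]
        if new_labels == labels:
            break  # assignments are stable, so the means are too
        labels = new_labels
        # Update step: each mean becomes the centroid of its members.
        for j in range(k):
            members = [p for p, lab in zip(X, labels) if lab == j]
            if members:  # keep the old mean if a cluster is empty
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    labels, means = kmeans(X, k=2)
    print("labels:", labels)
    print("means:", means)
```

Returning both outputs avoids the ambiguity raised above about whether k-means outputs the means.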

pipelines for each exploitation task

description

  • MEDA
  • GEDA
  • TEDA

representation

  • prototyping (points or dimensions)
  • embedding (points or dimensions)
  • clustering (points or dimensions) = assigning points/dimensions to nearest representative

prediction

  • various hypothesis testing scenarios
  • classification
  • regression

manipulation

  • optimal decision
  • control

new algorithms.md feedback

The only part I disliked was repeating the simulation data generation and plotting 10 times in steps 10 and 11 of the Simulation Analysis. The simulation data is often high dimensional, so multiple plots are needed to show correctness. In my case that meant 40+ high-dimensional plots, which take up a lot of space on GitHub (30+ megabytes), eat up a lot of RAM (making the notebooks difficult to work in), and force people to scroll a lot in the notebook. I think that plotting a histogram of the 'metric' defined in Simulation Analysis step 7, and verbally reporting whether the qualitative plots on repeat trials appeared similar to the original, could achieve the same goal.
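A rough sketch of the histogram suggestion, assuming a stand-in simulation and metric: `simulate` (draws from a standard normal) and `metric` (the sample mean) are hypothetical placeholders for whatever Simulation Analysis step 7 defines, and the text histogram stands in for a real plot.

```python
import random


def simulate(n, rng):
    """Hypothetical simulation: n draws from a standard normal."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def metric(sample):
    """Hypothetical stand-in for the step 7 metric: the sample mean."""
    return sum(sample) / len(sample)


def metric_over_trials(n_trials=10, n=200, seed=0):
    """Re-run the simulation and keep only one number per trial."""
    rng = random.Random(seed)
    return [metric(simulate(n, rng)) for _ in range(n_trials)]


def histogram_counts(values, n_bins=5):
    """Count values in n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against identical values
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # max lands in last bin
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    values = metric_over_trials()
    for i, count in enumerate(histogram_counts(values)):
        print(f"bin {i}: " + "#" * count)
```

Only one number per trial is stored, so the notebook carries a single small summary instead of a full set of high-dimensional plots for every repeat.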

Overall I think the blocked structure is better than the linear structure in the previous algorithms.md. The only deviation I found myself making was creating a new preceding section where I described at a high level the assumptions, guarantees, and basic theoretical properties of the algorithm which helped collect thoughts about what distributions should work well and which would fail. I don't know how others would feel about adding work, but I found that this step saved me time because I was more confident in my choices in distributions and didn't have to backtrack and change them.

make 1 list for matrix.md

  1. What fraction of features is of each kind (binary, integer, non-negative, character, string, etc.)?
  2. What is the distribution of NaNs per row? Per column? Infs per row? Per column?
  3. If d<100, heat map of raw data; if d>>100, heat map of 100 randomly selected dimensions?
     if n<1000, d<100
         then heat map of raw data
     if n>>1000, d<100
         then .....
  4. "Violin" plot: jittered scatter plot with opacity overlaid on violin plots of each dimension
  5. Outlier plot
  6. Correlation matrix of features
  7. Cumulative variance (with elbows)
  8. Pairs plots for top ~8 dimensions
  9. mclust++ for k=1,...,10 for all 10 models, plot BIC curves
  10. Color points in pairs plot by best cluster estimates
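The first two items of this list can be sketched in a few lines. This is an illustration under assumptions, not part of matrix.md: the data is taken to be a list of rows, the kind labels and their precedence (binary before integer before non-negative before real) are choices made here, and all function names are hypothetical.

```python
import math
from collections import Counter


def is_nan(v):
    return isinstance(v, float) and math.isnan(v)


def is_inf(v):
    return isinstance(v, float) and math.isinf(v)


def feature_kind(column):
    """Classify one column; NaNs are ignored, Infs do not affect integer-ness."""
    values = [v for v in column if not is_nan(v)]
    if not values:
        return "all-missing"
    if all(isinstance(v, str) for v in values):
        return "string"
    if any(isinstance(v, str) for v in values):
        return "mixed"
    finite = [v for v in values if math.isfinite(v)]
    if finite and all(v in (0, 1) for v in finite):
        return "binary"
    if finite and all(float(v).is_integer() for v in finite):
        return "integer"
    if all(v >= 0 for v in values):
        return "non-negative"
    return "real"


def kind_fractions(rows):
    """Fraction of columns (features) of each kind."""
    columns = list(zip(*rows))
    counts = Counter(feature_kind(col) for col in columns)
    return {kind: c / len(columns) for kind, c in counts.items()}


def count_per_row(rows, predicate):
    return [sum(1 for v in row if predicate(v)) for row in rows]


def count_per_column(rows, predicate):
    return [sum(1 for v in col if predicate(v)) for col in zip(*rows)]


if __name__ == "__main__":
    rows = [
        [1, 2, 0.5, "a"],
        [0, float("nan"), -1.5, "b"],
        [1, 7, float("inf"), "c"],
    ]
    print("kind fractions:", kind_fractions(rows))
    print("NaNs per row:", count_per_row(rows, is_nan))
    print("NaNs per column:", count_per_column(rows, is_nan))
    print("Infs per row:", count_per_row(rows, is_inf))
    print("Infs per column:", count_per_column(rows, is_inf))
```

The per-row and per-column counts are the raw inputs for the distributions asked about in item 2; a histogram of each list would show them.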


simple graph EDA

what do you think about making a checklist that says all the stuff?
