neurodata / checklists

This repo contains a number of checklists that we find very useful.

Home Page: http://docs.neurodata.io/checklists

License: Apache License 2.0

HTML 86.16% Jupyter Notebook 13.80% MATLAB 0.04%

checklists's Introduction

checklists

Surgeons have a simple checklist that they fill out for every surgery. Evidence suggests that these checklists, while tedious, are extremely effective.

We thought data scientists should have checklists too, so we started making some. Please add more checklists, or improve the existing ones!

checklists's People

Contributors

Stargazers

Watchers

Forkers

mrae jbrowne6 tsunwong123 zheng-da 02agarwalt mlee156 bradparks shravantata

checklists's Issues

algorithms.md feedback

pull the pseudocode from MGC, you made your own format, but better to use standard format if possible?
p-value: you don't specify what your test statistic is.
"Summary plot" a histogram of what??
"5. Quantitative evaluation code per trial" doesn't implement the function you said, which permutes the labels based on their means?
i don't understand your quantitative analysis over all trials. are you computing a p-value per trial? how are you getting a single number out of that? or are you not?
you skipped a bunch of sections: real data, synthetic data, etc....??
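
For reference on the p-value and label-permutation points above, here is a minimal sketch of a two-sample permutation test. It assumes the test statistic is the absolute difference in group means; the issue does not say which statistic algorithms.md intends, and this is not the MGC statistic. The function names and the toy data are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(x, y, n_permutations=1000, seed=0):
    """Two-sample permutation test; statistic = |mean(x) - mean(y)| (an assumption)."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # permute the group labels
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if stat >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# One p-value per trial; kept as a list here rather than collapsed to one number.
trials = [
    ([1.0, 3.0, 2.0, 4.0], [21.0, 23.0, 19.0, 22.0]),  # clearly separated groups
    ([1.0, 3.0, 2.0, 4.0], [2.0, 1.0, 4.0, 3.0]),  # same values in both groups
]
p_values = [permutation_p_value(x, y) for x, y in trials]
print(p_values)
```

This returns one p-value per trial, so any single summary number over all trials (for example, the fraction of trials below a threshold) would be a separate, explicit step.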

more later.

┆Issue is synchronized with this Asana task

generate classify.md

think about replacing randomly selected points/dimensions

with dimensions selected using CUR or http://epubs.siam.org/doi/pdf/10.1137/12086755X

and point selected using leverage scores or something like that
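
A minimal sketch of the leverage-score idea for selecting points, in plain Python. It assumes row leverage scores are taken as the squared row norms of an orthonormal basis for the column space (computed here with modified Gram-Schmidt) and that the data matrix has full column rank. The function names are hypothetical, and neither CUR nor the method in the linked paper is implemented here.

```python
import math


def orthonormal_columns(X):
    """Modified Gram-Schmidt on the columns of X (a list of rows); assumes full column rank."""
    n, d = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(d)]
    basis = []
    for v in cols:
        w = list(v)
        for q in basis:
            dot = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - dot * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis  # d orthonormal vectors, each of length n


def row_leverage_scores(X):
    """Leverage of row i = squared norm of row i of Q, where Q spans the columns of X."""
    basis = orthonormal_columns(X)
    return [sum(q[i] ** 2 for q in basis) for i in range(len(X))]


def top_k_rows(X, k):
    """Indices of the k rows with the largest leverage scores."""
    scores = row_leverage_scores(X)
    order = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
    return order[:k]


X = [
    [1.0, 0.0],
    [1.0, 0.1],
    [1.0, -0.1],
    [1.0, 5.0],  # far from the other points in the second dimension
]
scores = row_leverage_scores(X)
print([round(s, 3) for s in scores])
print(top_k_rows(X, 1))
```

The scores sum to the number of columns, so they can also be normalized into sampling probabilities instead of being used for a deterministic top-k pick.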

update notebook

in input 1, why is it "A \sim \element \sim \Real^{n \times d}"? i would think it would be "X \in \Real^{n \times d}"
does k-means not output the means?
is "<" equivalent to "comment" in some programming language? which? i expected a "%" or "#". oh, because latex uses a triangle? maybe use that same symbol then?
step 3 missing paren
step 5 missing curly
step 5Bc: "endfor" is on wrong line, and there is a 3 in front of it.
endfunction belongs on its own line
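
On the question of whether k-means should output the means: a minimal sketch of Lloyd's algorithm that returns both the assignments and the means. This is an illustration, not the notebook's pseudocode; the initialization from the first k points and the function names are assumptions.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(X, k, n_iterations=100):
    """Lloyd's algorithm; returns (assignments, means)."""
    means = [list(x) for x in X[:k]]  # assumption: initialize from the first k points
    assignments = [None] * len(X)
    for _ in range(n_iterations):
        # Assignment step: each point goes to its nearest mean.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(x, means[j])) for x in X
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: each mean becomes the average of its assigned points.
        for j in range(k):
            members = [x for x, a in zip(X, assignments) if a == j]
            if members:  # keep the old mean if a cluster is empty
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, means


X = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
assignments, means = k_means(X, k=2)
print(assignments)
print(means)
```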

good example following code.md

pipelines for each exploitation task

description

MEDA
GEDA
TEDA

representation

prototyping (points or dimensions)
embedding (points or dimensions)
clustering (points or dimensions) = assigning points/dimensions to nearest representative

prediction

various hypothesis testing scenarios
classification
regression

manipulation

optimal decision
control

generate poster.md

generate grant.md

update python notebook following code.md

link is broken

http://www.who.int/patientsafety/safesurgery/checklist_saves_lives/en/ in the readme

Was gonna just fix it myself, but a brief search through the site didn't make it clear which article this was intended to point to

generate checklists for each modality

EEG
M3RI
CLARITY
AT

time-series EDA

Good and bad examples

Every single issue should link to a good and a bad example.

generate slides.md

add background into draft.md

provide feedback on algorithms.md

if you think it is better, make an improved notebook?

make short algorithms.md?

The only part I disliked was repeating the simulated-data generation and plotting 10 times in steps 10 and 11 of the Simulation Analysis. The simulated data is often high dimensional, so multiple plots are needed to show correctness. In my case that meant 40+ high-dimensional plots, which take up a lot of space on GitHub (30+ megabytes), eat up a lot of RAM (making the notebooks difficult to work in), and force people to scroll a lot in the notebook. I think that plotting a histogram of the 'metric' defined in Simulation Analysis step 7, and verbally reporting whether the qualitative plots on repeat trials appeared similar to the original, could achieve the same goal.
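
The histogram suggestion can be sketched as follows. This assumes a hypothetical `run_trial` that simulates data and returns one scalar metric per trial; it is a stand-in for the metric defined in Simulation Analysis step 7, which is not reproduced here. A text histogram is used so the sketch has no plotting dependency.

```python
import random


def run_trial(rng, n=50):
    """Hypothetical stand-in: simulate data and return one scalar metric (here, the sample mean)."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(sample) / n


def histogram_counts(values, n_bins=5):
    """Equal-width bin counts between min(values) and max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - lo) / width), n_bins - 1)  # the maximum falls in the last bin
        counts[index] += 1
    return lo, width, counts


rng = random.Random(0)
metrics = [run_trial(rng) for _ in range(10)]  # 10 repeated trials, one number each
lo, width, counts = histogram_counts(metrics)
for i, count in enumerate(counts):
    print(f"[{lo + i * width:+.3f}, {lo + (i + 1) * width:+.3f}) {'#' * count}")
```

Ten trials then cost ten numbers and one small plot, rather than ten sets of high-dimensional figures.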

Overall, I think the blocked structure is better than the linear structure in the previous algorithms.md. The only deviation I made was adding a preceding section that describes, at a high level, the assumptions, guarantees, and basic theoretical properties of the algorithm. This helped me collect my thoughts about which distributions should work well and which should fail. I don't know how others would feel about the extra work, but I found that this step saved me time: I was more confident in my choice of distributions and didn't have to backtrack and change them.

make 1 list for matrix.md

What fraction of features is of each kind (binary, integer, non-negative, character, string, etc.)?
What is the distribution of NaNs per row? Per column? Infs per row? Per column?
If d < 100, heat map of raw data; if d >> 100, heat map of 100 randomly selected dimensions?

if n < 1000, d < 100
    then heat map of raw data
if n >> 1000, d < 100
    then .....

"Violin" plot

Jittered scatter plot with opacity overlaid on Violin plots of each dimension

Outlier plot
Correlation matrix of features
Cumulative variance (with elbows)
Pairs plots for top ~8 dimensions
mclust++ for k = 1, ..., 10 for all 10 models; plot BIC curves
color points in pairs plot by best cluster estimates
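
The first two items in the list above (feature kinds, and NaN/Inf counts per row and per column) can be sketched in plain Python, assuming the matrix is a list of numeric rows. The helper names and the kind labels are hypothetical, and only numeric kinds are handled (no character or string features).

```python
import math


def count_matching(X, predicate):
    """Return (per-row counts, per-column counts) of entries satisfying predicate."""
    per_row = [sum(1 for v in row if predicate(v)) for row in X]
    per_col = [sum(1 for v in col if predicate(v)) for col in zip(*X)]
    return per_row, per_col


def feature_kind(column):
    """Classify one column using its finite values; labels are illustrative."""
    finite = [v for v in column if math.isfinite(v)]
    if finite and all(v in (0, 1) for v in finite):
        return "binary"
    if finite and all(float(v).is_integer() for v in finite):
        return "integer"
    if finite and all(v >= 0 for v in finite):
        return "non-negative"
    return "real"


def kind_fractions(X):
    """Fraction of columns falling into each kind."""
    kinds = [feature_kind(col) for col in zip(*X)]
    return {kind: kinds.count(kind) / len(kinds) for kind in sorted(set(kinds))}


nan, inf = float("nan"), float("inf")
X = [
    [0.0, 3.0, 0.5, -1.2],
    [1.0, nan, 2.5, inf],
    [1.0, 7.0, nan, 0.3],
]
nan_rows, nan_cols = count_matching(X, math.isnan)
inf_rows, inf_cols = count_matching(X, math.isinf)
fractions = kind_fractions(X)
print(nan_rows, nan_cols)
print(inf_rows, inf_cols)
print(fractions)
```

The per-row and per-column counts are the raw material for the requested distributions; histogramming them is left out here.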
