dat_homework's People

Contributors

ashishterp

dat_homework's Issues

HW1 Feedback

Status: Pass (Homework is graded on "Pass" or "Needs Improvement")
Comments:
Great work Ashish! Your methods are efficient and easy to follow, with great supporting visuals which are clearly labeled and effective. I would encourage you to add some written analysis to each question to communicate your conclusions to your reader. Even when the numbers are "obvious", it can go a long way. Well done!

HW2 Feedback

Status: Pass (Homework is graded on "Pass" or "Needs Improvement")
Comments:
Great work Ashish, comments on each section below!

Describe the content of the dataset and its goals
Good! The data dictionary is clear with effective visuals and observations. Would be good to also state a brief "goal" such as "using the data to predict diabetes", so it will frame the objective of the rest of your analysis.
Describe the features and formulate a hypothesis on which might be relevant in predicting diabetes
Nice, good idea to group by class and visualize the change in distributions.
Describe the missing/NULL values. Decide if you should impute or drop them and justify your choice.
Great, your process is well organized and easy to understand. Some people also tried replacing 0 values for Insulin. It's a bit unclear from the data whether 0 insulin is impossible. One idea would be to try with and without adjusted insulin.
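As a rough sketch of the "with and without adjusted insulin" idea, the snippet below keeps the raw column and builds an adjusted copy in which zeros are replaced. The feedback does not say how the zeros were replaced; filling with the median of the non-zero values is an assumption made here, and the sample values and the helper name `impute_zeros_with_median` are hypothetical.

```python
from statistics import median

def impute_zeros_with_median(values):
    """Return a copy of values where each 0 is replaced by the median of the non-zero entries."""
    nonzero = [v for v in values if v != 0]
    if not nonzero:
        # Nothing to estimate a fill value from, so return the data unchanged.
        return list(values)
    fill = median(nonzero)
    return [fill if v == 0 else v for v in values]

# Hypothetical Insulin readings, not taken from the actual dataset.
insulin = [0, 94, 168, 0, 88, 543, 0, 846]
adjusted = impute_zeros_with_median(insulin)

print(insulin)   # raw column, zeros kept
print(adjusted)  # adjusted column, zeros replaced
```

Training the same model once on the raw column and once on the adjusted column, then comparing cross-validated scores, would show whether the adjustment matters.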
Come up with a benchmark for the minimum performance that an algorithm should have on this dataset
35% would be a good benchmark for making sure you detect all cases of diabetes (which IS important here, so I don't disagree). However, if you make a dummy classifier that predicts NOT DIABETIC for everyone, you'd score 65% accuracy! As a rule of thumb, the "dumb" benchmark for classification is the percentage of your largest group ("not diabetic" in this case).
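The rule of thumb above can be checked with a few lines of plain Python. This is a minimal sketch: the label list is made up to mirror the 65/35 split mentioned in the feedback, and the function name `majority_class_baseline` is hypothetical.

```python
from collections import Counter

def majority_class_baseline(labels):
    """Return (label, accuracy) for a classifier that always predicts the most common label."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)

# Hypothetical labels with a 65/35 split: 0 = not diabetic, 1 = diabetic.
y = [0] * 65 + [1] * 35
label, acc = majority_class_baseline(y)
print(label, acc)
```

In scikit-learn, `DummyClassifier(strategy="most_frequent")` gives the same kind of baseline and can be scored like any other model.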
What's the best performance you can get with kNN? Is kNN a good choice for this dataset?
Great use of classification_report! Depending on the dataset, some scores may be more relevant than others. Many people found better performance with n higher than 3. In general, it's good to try a grid search for each model on a new dataset.
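One way to search over the number of neighbors is scikit-learn's `GridSearchCV`. The sketch below runs on synthetic data from `make_classification` rather than the homework dataset, and the candidate values for `n_neighbors`, the 5-fold setting, and the scaling step are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with a roughly 65/35 class split (not the real dataset).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    weights=[0.65, 0.35], random_state=0,
)

# kNN is distance based, so features are standardized inside the pipeline.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Candidate neighbor counts; the list itself is an arbitrary choice.
param_grid = {"knn__n_neighbors": [3, 5, 7, 11, 15, 21]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_k = search.best_params_["knn__n_neighbors"]
print(best_k, round(search.best_score_, 3))
```

Swapping `scoring="accuracy"` for `"recall"` or `"f1"` would tune toward detecting diabetic cases instead of overall accuracy.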
What's the best performance you can get with Naive Bayes? Is NB a good choice for this dataset?
Good use of grid search here! However, you should try Gaussian NB instead of Multinomial NB, since you are dealing with continuous numerical inputs rather than counts or categorical inputs. You might see some improvement!
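A quick way to compare the two Naive Bayes variants is to cross-validate both on the same features. The sketch below uses synthetic continuous data, not the homework dataset, so the scores it prints say nothing about the real results. Because `MultinomialNB` rejects negative inputs, the features are shifted to be non-negative for that model only; the shift is an assumption made to keep the comparison runnable.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Synthetic continuous features standing in for the real measurements.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    weights=[0.65, 0.35], random_state=0,
)

# MultinomialNB needs non-negative inputs, so shift each column to start at 0.
# This is done on the full matrix purely for illustration.
X_nonneg = X - X.min(axis=0)

gnb_scores = cross_val_score(GaussianNB(), X, y, cv=5)
mnb_scores = cross_val_score(MultinomialNB(), X_nonneg, y, cv=5)

print("GaussianNB   :", round(gnb_scores.mean(), 3))
print("MultinomialNB:", round(mnb_scores.mean(), 3))
```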
What's the best performance you can get with Logistic Regression? Is LR a good choice for this dataset?
You can try a grid search here with both L1 and L2 penalties, as well as a range of C values; some people saw better outcomes with L1. Another good thing to try with LR is to check the coefficients, as they can indicate feature importance and can help you down the line!
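The suggestion above could look roughly like the following. It is a sketch on synthetic data: the C values, the `liblinear` solver (chosen because it supports both L1 and L2), and the standardization step are assumptions, not details from the original homework.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the real dataset).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    weights=[0.65, 0.35], random_state=0,
)

# Standardizing first makes coefficient magnitudes comparable across features.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("lr", LogisticRegression(solver="liblinear")),
])

param_grid = {
    "lr__penalty": ["l1", "l2"],
    "lr__C": [0.01, 0.1, 1, 10, 100],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

# Coefficients of the best model; larger absolute values suggest more influential features.
coefs = search.best_estimator_.named_steps["lr"].coef_[0]
print(search.best_params_)
for i, c in enumerate(coefs):
    print(f"feature_{i}: {c:+.3f}")
```

With an L1 penalty, some coefficients may shrink to exactly zero, which can serve as a rough form of feature selection.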
What's the best performance you can get with Random Forest? Is RF a good choice for this dataset?
Excellent, great to loop through the param values, as well as good use of RF to check feature importance.
If you could only choose one, which classifier from the above that you already ran is best? How do you define best? (hint: could be prediction accuracy, running time, interpretability, etc)
Clear reasoning for your selection, weighing performance and time. Nice work!
