
advanced-comp-2017's People

Contributors

betatim, maximeschubiger


advanced-comp-2017's Issues

Definition of correlation of trees in a random forest

Hello,

I am going down the rabbit hole of the definition of correlation of decision trees in a random forest.

For those who don't have time to read this wall of text, here's a quick summary.

tl;dr: what is the exact definition of correlation between trees in a random forest? And how do we interpret this value?

long explanation

At first, I naively thought one could define it as

definition 1: correlation = correlation in the predictions of all the trees in a forest

However, I was having some doubts about my intuition, and Shaina Race's comment on this Quora question confirmed my doubts.
Essentially, the way I understand her comment is: this definition is not in line with intuition, because why would I want correlation in the predictions to be low? If most of the trees get the right answer most of the time, the correlation will be high but the model itself will be pretty good!
Moreover, this definition gives no indication of the robustness or the generalisation power of the ensemble. She seems to suggest another definition:

definition 2: correlation = correlation in the errors

This definition seems nicer, because it is intuitively closer to a notion of robustness of the ensemble as a whole.
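To make the two definitions concrete, here is a small sketch (my own, not from the course) that computes both on a toy dataset with a scikit-learn RandomForestClassifier, taking the mean pairwise correlation across the per-tree outputs:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_tr, y_tr)

# One row per tree: that tree's prediction for every test sample.
per_tree = np.array([tree.predict(X_te) for tree in forest.estimators_])

def mean_pairwise_corr(rows):
    # Average of the off-diagonal entries of the correlation matrix.
    # (A tree with constant output would make this NaN.)
    c = np.corrcoef(rows)
    n = len(rows)
    return (c.sum() - n) / (n * (n - 1))

# Definition 1: correlation of the raw predictions.
corr_predictions = mean_pairwise_corr(per_tree)
# Definition 2: correlation of the errors (True where a tree is wrong).
corr_errors = mean_pairwise_corr(per_tree != y_te)

print(corr_predictions, corr_errors)
```

On a dataset like this the two numbers can differ quite a bit, which is exactly why the choice of definition matters for interpretation.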

Unfortunately, I was down the rabbit hole and could not stop sliding. I started wondering whether the exercise asked for correlation but actually meant variance. What confused me is that in many sources (e.g. the sklearn user guide) random forests are cited as a method to reduce variance, not correlation. Variance in this case has a pretty precise meaning:

variance = variance in the predictions

However, I was a bit lost because I wasn't sure how to extend this notion from a regression problem to a classification problem (especially a multi-class one). I found this paper by P. Domingos about the bias-variance-noise decomposition in a general setting; it seemed quite math-heavy but ultimately proved decently readable. However, I still have questions about it, in particular how the constants in front of the bias and variance terms ($c_1$ and $c_2$ in the paper) affect our interpretation of their values.
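For what it's worth, the way I read Domingos' definitions for zero-one loss: the "main" prediction at a point is the majority vote over models, and the variance is the probability that a single model's prediction differs from it. A toy sketch (the labels and helper names are mine):

```python
import numpy as np

# Predictions of four models for four test points, shape (n_models, n_points);
# the entries are arbitrary class ids from a multi-class problem.
preds = np.array([
    [0, 1, 2, 1],
    [0, 1, 1, 1],
    [0, 2, 2, 1],
    [0, 1, 2, 0],
])

def majority(col):
    """Majority vote over models at one point."""
    vals, counts = np.unique(col, return_counts=True)
    return vals[np.argmax(counts)]

# The "main" prediction at each point.
main = np.array([majority(preds[:, j]) for j in range(preds.shape[1])])

# Variance under zero-one loss: how often a single model disagrees
# with the main prediction, averaged over models and points.
variance = np.mean(preds != main)
print(main, variance)  # → [0 1 2 1] 0.1875
```

Because it counts disagreements with a vote rather than deviations from a numeric mean, this notion carries over to multi-class problems where the regression definition does not.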

Exercise from 24 April

The issue for last week's exercise. Sorry for forgetting to create it; feel free to remind me, or just create one yourself if it doesn't exist.

Final projects

Deadline: 29 May 2017 at 11:00am

If you are working on a final project and need credit for the course please post here with your name and what topic you are working on.

To hand in the project, also post in this thread with a link to your work on GitHub. It should contain the code to run the analysis as well as a short written report. The report should be in the style of a journal article that reports on your research.

Looking forward to seeing the results.

If you are doing a project but don't need credit, you can also post here but please make a little note saying "I don't need credit".

Lecture notes

It would be nice to have a scanned copy of the notes used during the lectures of May 1st and 8th (even a simple photo would work). Is that possible? Thank you.

Question on course: "Useless" variables, decision tree vs neural networks

Hello,

I had a follow-up question to today's discussion, although it may be covered in the next lecture.

Today we saw that for a decision tree / random forest, it is best not to have "useless" variables, i.e. variables that offer little or no discriminating power. Therefore, if we want to implement such an algorithm, we have to study the input variables beforehand and remove the useless ones. Right?

What about deep neural networks? At the end of the course, you mentioned that deep learning works best with raw data instead of high-level features. Can we conclude that it "ignores" useless variables? Or would a large number of useless variables skew the training and result in e.g. overtraining?
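As a quick empirical check of the first part (a toy sketch of my own, not from the lecture): fit a random forest on data with known noise columns and look at `feature_importances_` to see how much attention the trees pay to the useless variables.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 5 informative features plus 15 pure-noise ones; with shuffle=False
# the informative columns come first.
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, n_redundant=0,
                           shuffle=False, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# The importances sum to 1; the informative block should dominate.
imp = forest.feature_importances_
print(imp[:5].sum(), imp[5:].sum())
```

The noise columns still get a nonzero share, which is one way to see how useless variables dilute the splits.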

No module named 'keras'

Hi,
Does keras come together with the installation of Anaconda, or are you supposed to install it separately?

Exercises from 10 April

Post questions in this thread and, if you need credit, a link to your notebook in your repository on GitHub.

Exercise from 3 April

Post questions in this thread and, if you need credit, a link to your notebook in your repository on GitHub.
