wildtreetech / advanced-comp-2017
Material for a course on applied machine learning for scientists. Taught at EPFL in spring 2017.
https://github.com/ageron/handson-ml is the GitHub repository for Hands-on Machine Learning with Scikit-Learn and TensorFlow
https://github.com/amueller/introduction_to_ml_with_python is the GitHub repository for Introduction to Machine Learning with Python
The Elements of Statistical Learning
An Introduction to Statistical Learning
If you find a book you like, lecture notes or other reading material feel free to post it here.
Has anyone come across a good reference on time-series analysis using machine-learning techniques?
Hello,
I am going down the rabbit hole of the definition of correlation of decision trees in a random forest.
For those who don't have time to read this wall of text, here's a quick summary.
And how do we interpret this value?
At first, I naively thought one could define it as
definition 1: correlation = correlation in the predictions of all the trees in a forest
However, I was having some doubts about my intuition, and Shaina Race's comment on this Quora question confirmed my doubts.
Essentially, the way I understand her comment is: this definition is not in line with intuition, because why would I want the correlation in the predictions to be low? If most of the trees get the right answer most of the time, the correlation would be high and yet the model itself would be pretty good!
Moreover, this would not give any indication about the robustness or the generalisation power of the ensemble. She seems to suggest another definition:
definition 2: correlation = correlation in the errors
This definition seems nicer, because it seems to be intuitively closer to a notion of robustness of the total ensemble.
Unfortunately, I was down the rabbit hole and could not stop sliding. I started wondering whether the exercise asked for correlation but actually meant variance. What confused me is that in many sources (e.g. the sklearn user guide) random forests are cited as a method to reduce variance, not correlation. Now, variance in this case has a pretty precise meaning:
variance = variance in the predictions
However, I was a bit lost because I wasn't sure how to extend this notion from a regression problem to a classification problem (especially a multi-class one). I found this paper by P. Domingos on the bias-variance-noise decomposition in a general setting; it seemed quite math-heavy but ultimately proved to be decently readable. However, I still have questions about it, in particular about how the constants in front of the bias and variance terms (…)
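To make the two candidate definitions concrete, here is a small sketch of my own (the dataset, forest settings, and helper function are arbitrary choices, not anything from the exercise) that computes the average pairwise correlation of the per-tree predictions (definition 1) and of the per-tree errors (definition 2) on a held-out set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# toy binary classification problem, held-out set for evaluating the trees
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

# one row per tree: its predictions on the held-out set
preds = np.array([tree.predict(X_test) for tree in forest.estimators_])
# 1 where a tree is wrong, 0 where it is right
errors = (preds != y_test).astype(float)

def mean_pairwise_corr(M):
    # average off-diagonal entry of the correlation matrix between rows
    C = np.corrcoef(M)
    n = C.shape[0]
    return (C.sum() - np.trace(C)) / (n * (n - 1))

print("definition 1 (predictions):", mean_pairwise_corr(preds))
print("definition 2 (errors):", mean_pairwise_corr(errors))
```

On a problem like this the prediction correlation comes out high (the trees mostly agree on the right answer) while the error correlation is noticeably lower, which matches the intuition above that definition 1 conflates "the ensemble is good" with "the trees are redundant".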
This is the issue for last week's exercise. Sorry for forgetting to create it. Feel free to remind me, or just create one yourself if it doesn't exist.
Deadline: 29 May 2017 at 11:00am
If you are working on a final project and need credit for the course please post here with your name and what topic you are working on.
To hand in the project, post in this thread with a link to your work on GitHub. It should contain the code to run the analysis as well as a short written report. The report should be in the style of a journal article that reports on your research.
Looking forward to seeing the results.
If you are doing a project but don't need credit, you can also post here but please make a little note saying "I don't need credit".
It would be nice to have a scanned copy of the notes used during the lectures of May 1st and 8th (even a simple photo would work). Is that possible? Thank you.
Hello,
I had a follow-up question to today's discussion, although it may be covered in the next lecture
Today we saw that for a decision tree / random forest it is best not to have "useless" variables, i.e. variables that offer little or no discriminating power. Therefore, if we want to implement such an algorithm, we have to study the input variables beforehand and remove the useless ones. Right?
What about deep neural networks? At the end of the course you mentioned that deep learning works best with raw data instead of high-level features. Can we conclude that it "ignores" useless variables? Or would a large number of useless variables skew the training and result in, for example, overfitting?
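On the first point, one pragmatic way to spot the useless variables before removing them is to look at the forest's own feature importances. A minimal sketch (my own toy setup, not from the lecture): generate data where only the first 5 of 20 features are informative and check that the forest assigns them most of the importance mass.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 5 informative features plus 15 pure-noise ones;
# shuffle=False keeps the informative features in the first 5 columns
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# importances sum to 1; compare the mass on informative vs noise columns
informative = forest.feature_importances_[:5].sum()
noise = forest.feature_importances_[5:].sum()
print("informative:", informative, "noise:", noise)
```

The noise features still receive a small, non-zero share (a tree will occasionally split on them by chance), which is one reason dropping them beforehand tends to help tree ensembles.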
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004967 via @sharkovsky
This thread is to collect links and discussions related to the ideas in the paper.
First thoughts: contextual bandits, reinforcement learning (in particular A3C), and the various "learning to learn" approaches.
Hi,
Does Keras come with the Anaconda installation, or are you supposed to install it separately?
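As far as I know, Keras is not bundled with the standard Anaconda installer, so it has to be installed separately into your environment (the exact channel and backend choice below are assumptions; adjust to your setup):

```shell
# install Keras itself into the active environment
pip install keras

# or, via the conda-forge channel
conda install -c conda-forge keras

# Keras also needs a backend (TensorFlow or Theano at the time of the course)
pip install tensorflow
```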
Post questions and if you need credit a link to your notebook in your repository on GitHub in this thread.
http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
Neural networks transform the input variables through the non-linearities in each layer until, at the last layer, the problem becomes linearly separable.
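A toy way to see this claim in code (my own sketch, not from the post): XOR-style data is not linearly separable in input space, but after passing through a trained hidden layer a plain linear classifier separates it almost perfectly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.RandomState(0)

# XOR-quadrant data with a small margin around the axes
X = rng.uniform(-1, 1, size=(600, 2))
X = X[np.abs(X[:, 0] * X[:, 1]) > 0.05]
y = (X[:, 0] * X[:, 1] > 0).astype(int)

mlp = MLPClassifier(hidden_layer_sizes=(16,), activation="tanh",
                    max_iter=5000, random_state=0)
mlp.fit(X, y)

# manual forward pass up to the hidden layer (tanh matches the activation above)
H = np.tanh(X @ mlp.coefs_[0] + mlp.intercepts_[0])

# linear classifier on the raw inputs vs on the hidden representation
lin_input = LogisticRegression().fit(X, y).score(X, y)
lin_hidden = LogisticRegression().fit(H, y).score(H, y)
print("linear on inputs:", lin_input, "linear on hidden layer:", lin_hidden)
```

The logistic regression on the raw inputs is stuck near chance level, while on the hidden-layer representation it recovers essentially the network's accuracy, which is the "transform until linearly separable" picture from the post.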
After doing a bad job of explaining the motivation for the splitting criteria in the lecture, I prepared a short notebook that illustrates what happens in the case of accuracy and why criteria like entropy or Gini are preferable.
https://github.com/wildtreetech/advanced-comp-2017/blob/master/02-trees/splitting-criteria.ipynb
There is also a graphical explanation here: https://sebastianraschka.com/faq/docs/decisiontree-error-vs-entropy.html
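The core point can also be shown in a few lines. The numbers below are a standard textbook-style example of my choosing (not taken from the notebook): a parent node with class counts (40, 40) and two candidate splits, one of which produces a pure child. Misclassification error rates both splits equally, while Gini and entropy prefer the split with the pure child.

```python
import numpy as np

def misclassification(p):
    return 1.0 - max(p, 1.0 - p)

def gini(p):
    return 2.0 * p * (1.0 - p)

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def impurity_gain(impurity, parent, left, right):
    """Weighted impurity decrease of a binary split; counts are (n_pos, n_neg)."""
    def imp(counts):
        n = counts[0] + counts[1]
        return impurity(counts[0] / n), n
    i_p, n_p = imp(parent)
    i_l, n_l = imp(left)
    i_r, n_r = imp(right)
    return i_p - (n_l / n_p) * i_l - (n_r / n_p) * i_r

parent = (40, 40)
split_A = ((30, 10), (10, 30))  # two impure children
split_B = ((20, 40), (20, 0))   # one pure child

for name, f in [("misclassification", misclassification),
                ("gini", gini), ("entropy", entropy)]:
    gA = impurity_gain(f, parent, *split_A)
    gB = impurity_gain(f, parent, *split_B)
    print(f"{name}: gain A = {gA:.4f}, gain B = {gB:.4f}")
```

Here misclassification error gives a gain of 0.25 for both splits (it cannot tell them apart), whereas Gini gives 0.125 for A versus about 0.167 for B, and entropy similarly rewards B for creating a pure node; that is why strictly concave criteria are preferred for growing trees.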
Post questions and if you need credit a link to your notebook in your repository on GitHub in this thread.
Francesco Cremonesi
https://github.com/sharkovsky/advanced-comp-2017/tree/exercise-four