sklearn_tutorial's Introduction

Machine Learning for Astronomical Data Analysis

Note: this content is extremely out of date, and I would not recommend using it.

If you would like a more up-to-date machine learning tutorial that grew from this content, I'd recommend the [Python Data Science Handbook](http://github.com/jakevdp/PythonDataScienceHandbook).

sklearn_tutorial's People

Contributors

gaelvaroquaux, jakevdp

sklearn_tutorial's Issues

numpy.random.shuffle does not work in Exercise 1

I followed the guide here. After running

import os
DATA_HOME = os.path.abspath('../../../github/sklearn_tutorial/doc/data/sdss_colors')
import numpy as np
train_data = np.load(os.path.join(DATA_HOME,
                                  'sdssdr6_colors_class_train.npy'))
test_data = np.load(os.path.join(DATA_HOME,
                                 'sdssdr6_colors_class.200000.npy'))
print sum(train_data['redshift'] < 0.01)

I got 430827 rows with redshift = 0. The problem is that when I shuffle train_data, it fails!

nseed = 0
np.random.seed(nseed)
np.random.shuffle(train_data)
print sum(train_data['redshift'] < 0.01)

Now the result is just 288442! I changed nseed to other values, e.g. nseed = 1000000, and it still doesn't work. The data changes after the shuffle. Indeed, after repeating the shuffle a few times, the data starts to repeat itself, i.e. train_data[0] == train_data[1] == .... I believe the shuffle function is not working correctly. Am I missing something? I noticed this after following the guide and making a scatter plot of train_data, in which everything was a quasar (no stars?).
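One way to check whether the shuffle itself is altering rows is to reorder the array through a permutation index, which only rearranges existing rows and so cannot change how many of them satisfy a condition. The following is a minimal sketch under stated assumptions: it uses a small synthetic structured array in place of the SDSS file, the `label` field and the helper name `shuffled_copy` are hypothetical, and only the `redshift` field name comes from the report above.

```python
import numpy as np


def shuffled_copy(records, seed=0):
    """Return a row-shuffled copy of a (structured) array.

    Hypothetical helper: instead of shuffling in place, draw a random
    permutation of the row indices and use it to index the array.
    """
    rng = np.random.RandomState(seed)
    order = rng.permutation(len(records))
    return records[order]


# Synthetic stand-in for train_data (assumption: the real file is a
# structured array with a 'redshift' field, as in the report above).
data = np.zeros(1000, dtype=[("redshift", "f8"), ("label", "i4")])
data["redshift"] = np.linspace(0.0, 1.0, 1000)
data["label"] = np.arange(1000)

shuffled = shuffled_copy(data, seed=0)

# The count of low-redshift rows should be identical before and after.
count_before = int(np.sum(data["redshift"] < 0.01))
count_after = int(np.sum(shuffled["redshift"] < 0.01))
print(count_before, count_after)
```

If the two counts agree with this approach but differ after `np.random.shuffle(train_data)`, that would suggest the in-place shuffle is the step corrupting the rows.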

Small bug in notebook 06_learning_curves.ipynb

In the file doc/notebooks/06_learning_curves.ipynb, I believe there is a small mistake in the code box directly under the section "Cross-validation and Testing". The data that is supposed to be used for testing is defined by:

xtest = x[Ntrain:-Ntest]
ytest = y[Ntrain:-Ntest]

This makes it the same data that is used for the cross-validation set. I think the above code should be replaced with:

xtest = x[-Ntest:]
ytest = y[-Ntest:]

I hope this helps, and thanks for putting up this great tutorial. Your book has also been really helpful.
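The split described in this issue can be sketched as follows. This is an illustration, not the notebook's code: the array sizes are made up, and the training slice `x[:Ntrain]` is an assumption. Only the cross-validation slice `x[Ntrain:-Ntest]` and the proposed test slice `x[-Ntest:]` come from the report.

```python
import numpy as np

# Made-up sizes for illustration only.
N, Ntrain, Ntest = 100, 60, 20

x = np.arange(N)
y = 2 * x

# Assumed training slice: the first Ntrain points.
xtrain, ytrain = x[:Ntrain], y[:Ntrain]

# Cross-validation slice: everything between training and test.
xcv, ycv = x[Ntrain:-Ntest], y[Ntrain:-Ntest]

# Proposed fix: the test set is the last Ntest points.
xtest, ytest = x[-Ntest:], y[-Ntest:]

# The three sets should not overlap and should cover all the data.
overlap_cv_test = np.intersect1d(xcv, xtest)
total = len(xtrain) + len(xcv) + len(xtest)
print(len(overlap_cv_test), total)
```

With the original slice `x[Ntrain:-Ntest]` used for both sets, the overlap would be the entire cross-validation set instead of being empty.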

Not able to download data sets

Hello,
I am trying to follow the scikit-learn tutorial on your website. I've set up the software as you described, but I am unable to download the data sets using this command:

python fetch_data.py

It prints the following and stays like that until I cancel it:

downloading data from http://www.astro.washington.edu/users/vanderplas/pydata/sdssdr6_colors_class.200000.dat

What should I do to download the data sets?

Thank you

chaithuzz2
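For anyone debugging a download that appears to hang like this, one option is to fetch the file directly with an explicit timeout, so a stalled connection raises an error instead of blocking indefinitely. This is a minimal sketch using only the standard library. It is not what fetch_data.py does, the helper name `fetch_with_timeout` is hypothetical, and whether the server still hosts the file is not something this code can fix.

```python
import shutil
import urllib.request


def fetch_with_timeout(url, dest, timeout=30):
    """Download `url` to the local path `dest`.

    Hypothetical helper: `timeout` (seconds) is passed to urlopen so
    that blocking network operations raise an error rather than
    waiting forever.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(dest, "wb") as out:
            # Stream the response to disk in chunks.
            shutil.copyfileobj(response, out)
    return dest


# Example call, using the URL printed in the report above:
# fetch_with_timeout(
#     "http://www.astro.washington.edu/users/vanderplas/pydata/"
#     "sdssdr6_colors_class.200000.dat",
#     "sdssdr6_colors_class.200000.dat",
# )
```

If this call times out or returns an HTTP error, the problem is most likely on the server side rather than in the local setup.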

Broken Link on Tutorial Home Page

There are two links to scikit-learn's home page in the opening paragraphs:

These learning tasks are enabled by the tools available in the open-source package scikit-learn.

scikit-learn is a Python module integrating classic machine learning algorithms in the tightly-knit world of scientific Python packages (numpy, scipy, matplotlib).

These two links are broken: they point to http://www.scikit-learn.org/ instead of http://scikit-learn.org/stable/

It's a small thing, but I thought I'd mention it.
