ml_pocket_reference's Introduction

Welcome

Here you will find the source code for the book Machine Learning Pocket Reference

Code Examples

Every chapter has a notebook with the code from that chapter.

Thanks!

Thanks to readers for their support. If you enjoyed the book, please consider leaving a review on Amazon, or sharing it on social media.

Comments?

If you have comments or issues with the book, please consider filing an issue. The digital version may receive updates. Larger updates could be addressed in future editions of the book.

Thanks again! Matt Harrison

ml_pocket_reference's Issues

ch10.ipynb Cell #6

----> 1 from yellowbrick.features.importances import (
2 FeatureImportances,
3 )
4 fig, ax = plt.subplots(figsize=(6, 4))
5 fi_viz = FeatureImportances(lr)

ModuleNotFoundError: No module named 'yellowbrick.features.importances'
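
A possible workaround, offered as a sketch rather than a confirmed fix: the traceback in the ch14.ipynb issue on this page shows FeatureImportances living in yellowbrick\model_selection\importances.py, which suggests the class moved in newer yellowbrick releases. The helper names import_first and load_feature_importances are hypothetical, and the assumption that yellowbrick.model_selection exposes the class should be checked against the installed version.

```python
import importlib


def import_first(*module_names):
    """Return the first module in module_names that imports cleanly."""
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            last_error = exc
    raise ImportError(
        f"none of these modules could be imported: {module_names}"
    ) from last_error


def load_feature_importances():
    """Look up FeatureImportances, trying the newer module path first."""
    module = import_first(
        "yellowbrick.model_selection",       # assumed location in newer releases
        "yellowbrick.features.importances",  # location used in the notebook
    )
    return module.FeatureImportances
```

In the notebook cell, FeatureImportances = load_feature_importances() would then replace the failing import line.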

About Cell 28 of ch07.ipynb

Hi Harrison,
On page 85 of your great book you write: "For example, to convert the Titanic survival column to a blend of posterior probability of the target and the prior probability given the title (categorical) information, use the following code:"
However, Cell 28 converts the Title column, in the line te = ce.TargetEncoder(cols="Title"). 1) Did you mean to convert the Title column? 2) In this sentence, does "the target" mean survival? 3) Does "prior probability" mean the probability of survival for each title in the training data? Thanks.
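
One common reading of target encoding may help with these questions: each Title value is replaced by a blend of the survival rate among passengers with that title (the posterior) and the overall survival rate in the training data (the prior), with rarer titles pulled more strongly toward the prior. The pure-Python sketch below illustrates that idea only. It is not the category_encoders implementation; the function name, the sigmoid weighting, the parameter defaults, and the toy data are all assumptions for illustration.

```python
import math
from collections import defaultdict


def target_encode(categories, targets, min_samples_leaf=1, smoothing=1.0):
    """Return (prior, {category: blended target mean}) for a binary target."""
    prior = sum(targets) / len(targets)  # overall target rate

    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1

    encoding = {}
    for category, n in counts.items():
        posterior = sums[category] / n  # target rate within this category
        # Weight approaches 1 as the category gets more rows (illustrative choice).
        weight = 1.0 / (1.0 + math.exp(-(n - min_samples_leaf) / smoothing))
        encoding[category] = prior * (1.0 - weight) + posterior * weight
    return prior, encoding


# Toy data, not taken from the Titanic dataset.
titles = ["Mr", "Mr", "Mr", "Mrs", "Mrs", "Miss"]
survived = [0, 0, 1, 1, 1, 1]
prior, encoding = target_encode(titles, survived)
```

Under this reading, the column being converted is Title, "the target" is the survival column, and the prior is the overall survival rate rather than a per-title rate.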

chap3 broken link for titanic3.xls

http://biostat.mc.vanderbilt.edu seems to be down, thus the link http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.xls does not work.

It might be good to use a copy of the dataset hosted on a public repository such as Zenodo (or similar).

I am wondering whether Kaggle holds a license on the dataset that prevents it from being made publicly available in this way.

I used: https://github.com/joanby/python-ml-course/raw/master/datasets/titanic/titanic3.xls instead. It seems that it is the same dataset.
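
Until the notebook points at a stable host, one option is to try several sources in order. This is a minimal sketch, assuming the mirror above really serves the same file; load_first_available is a hypothetical helper, and the loader is passed in so that it can be pandas.read_excel or any other callable that accepts a URL.

```python
TITANIC_URLS = [
    "http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.xls",
    "https://github.com/joanby/python-ml-course/raw/master/datasets/titanic/titanic3.xls",
]


def load_first_available(urls, loader):
    """Return (url, loaded_object) for the first URL the loader can read."""
    failures = []
    for url in urls:
        try:
            return url, loader(url)
        except Exception as exc:
            # Broad on purpose: network, HTTP, and parse errors all mean "try the next source".
            failures.append(f"{url}: {exc!r}")
    raise RuntimeError("no source could be loaded:\n" + "\n".join(failures))
```

In chapter 3 this could be called as url, df = load_first_available(TITANIC_URLS, pd.read_excel).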

ch12.ipynb Cell #16

----> 1 import scikitplot
2 fig, ax = plt.subplots(figsize=(6, 6))
3 y_probas = dt.predict_proba(X_test)
4 scikitplot.metrics.plot_cumulative_gain(
5 y_test, y_probas, ax=ax

ModuleNotFoundError: No module named 'scikitplot'

ch13.ipynb Cell #10

3 rf5, X, X.columns, features
4 )
----> 5 fig, _ = pdp.pdp_interact_plot(p, features)

TypeError: clabel() got an unexpected keyword argument 'contour_label_fontsize'

C:\ProgramData\Anaconda3\envs\tf_SSJ_gpu\lib\site-packages\pdpbox\pdp.py in pdp_interact_plot(pdp_interact_out, feature_names, plot_type, x_quantile, plot_pdp, which_classes, figsize, ncols, plot_params)
773 fig.add_subplot(inter_ax)
774 _pdp_inter_one(pdp_interact_out=pdp_interact_plot_data[0], inter_ax=inter_ax, norm=None,
--> 775 feature_names=feature_names_adj, **inter_params)
776 else:
777 wspace = 0.3

C:\ProgramData\Anaconda3\envs\tf_SSJ_gpu\lib\site-packages\pdpbox\pdp_plot_utils.py in _pdp_inter_one(pdp_interact_out, feature_names, plot_type, inter_ax, x_quantile, plot_params, norm, ticks)
330 # for numeric not quantile
331 X, Y = np.meshgrid(pdp_interact_out.feature_grids[0], pdp_interact_out.feature_grids[1])
--> 332 im = _pdp_contour_plot(X=X, Y=Y, **inter_params)
333 elif plot_type == 'grid':
334 im = _pdp_inter_grid(**inter_params)

C:\ProgramData\Anaconda3\envs\tf_SSJ_gpu\lib\site-packages\pdpbox\pdp_plot_utils.py in _pdp_contour_plot(X, Y, pdp_mx, inter_ax, cmap, norm, inter_fill_alpha, fontsize, plot_params)
249 c1 = inter_ax.contourf(X, Y, pdp_mx, N=level, origin='lower', cmap=cmap, norm=norm, alpha=inter_fill_alpha)
250 c2 = inter_ax.contour(c1, levels=c1.levels, colors=contour_color, origin='lower')
--> 251 inter_ax.clabel(c2, contour_label_fontsize=fontsize, inline=1)
252 inter_ax.set_aspect('auto')
253

C:\ProgramData\Anaconda3\envs\tf_SSJ_gpu\lib\site-packages\matplotlib\axes\_axes.py in clabel(self, CS, *args, **kwargs)
6338
6339 def clabel(self, CS, *args, **kwargs):
-> 6340 return CS.clabel(*args, **kwargs)
6341 clabel.__doc__ = mcontour.ContourSet.clabel.__doc__
6342

ch08.ipynb Cell #4

TypeError: _generate_unsampled_indices() missing 1 required positional argument: 'n_samples_bootstrap'

  1 import rfpimp
  2 rfpimp.plot_dependence_heatmap(

----> 3 rfpimp.feature_dependence_matrix(X_train),
4 value_fontsize=12,
5 label_fontsize=14,

C:\ProgramData\Anaconda3\envs\tf_SSJ_gpu\lib\site-packages\rfpimp.py in feature_dependence_matrix(X_train, rfmodel, zero, sort_by_dependence, n_samples)
712 rf = clone(rfmodel)
713 rf.fit(X,y)
--> 714 imp = permutation_importances_raw(rf, X, y, oob_regression_r2_score, n_samples)
715 """
716 Some importances could come back > 1.0 because removing that feature sends R^2

C:\ProgramData\Anaconda3\envs\tf_SSJ_gpu\lib\site-packages\rfpimp.py in permutation_importances_raw(rf, X_train, y_train, metric, n_samples)
398 rf.fit(X_sample, y_sample)
399
--> 400 baseline = metric(rf, X_sample, y_sample)
401 X_train = X_sample.copy(deep=False) # shallow copy
402 y_train = y_sample

C:\ProgramData\Anaconda3\envs\tf_SSJ_gpu\lib\site-packages\rfpimp.py in oob_regression_r2_score(rf, X_train, y_train)
453 n_predictions = np.zeros(n_samples)
454 for tree in rf.estimators_:
--> 455 unsampled_indices = _generate_unsampled_indices(tree.random_state, n_samples)
456 tree_preds = tree.predict(X[unsampled_indices, :])
457 predictions[unsampled_indices] += tree_preds

ch14.ipynb Cell #19

KeyError: 'weight'
2 fi_viz = FeatureImportances(xgr)
3 fi_viz.fit(bos_X_train, bos_y_train)
----> 4 fi_viz.poof()
5 #fig.savefig("images/mlpr_1406.png", dpi=300)

C:\ProgramData\Anaconda3\envs\tf_SSJ_gpu\lib\site-packages\yellowbrick\base.py in poof(self, *args, **kwargs)
259 "this method is deprecated, please use show() instead", DeprecationWarning
260 )
--> 261 return self.show(*args, **kwargs)
262
263 ## ////////////////////////////////////////////////////////////////////

C:\ProgramData\Anaconda3\envs\tf_SSJ_gpu\lib\site-packages\yellowbrick\base.py in show(self, outpath, clear_figure, **kwargs)
239
240 # Finalize the figure
--> 241 self.finalize()
242
243 if outpath is not None:

C:\ProgramData\Anaconda3\envs\tf_SSJ_gpu\lib\site-packages\yellowbrick\model_selection\importances.py in finalize(self, **kwargs)
283
284 # Set the xlabel
--> 285 self.ax.set_xlabel(self._get_xlabel())
286
287 # Remove the ygrid

C:\ProgramData\Anaconda3\envs\tf_SSJ_gpu\lib\site-packages\yellowbrick\model_selection\importances.py in _get_xlabel(self)
332
333 # Label for coefficients
--> 334 if hasattr(self.estimator, "coef_"):
335 if self.relative:
336 return "relative coefficient magnitude"

C:\ProgramData\Anaconda3\envs\tf_SSJ_gpu\lib\site-packages\xgboost\sklearn.py in coef_(self)
714 .format(self.booster))
715 b = self.get_booster()
--> 716 coef = np.array(json.loads(b.get_dump(dump_format='json')[0])['weight'])
717 # Logic for multiclass classification
718 n_classes = getattr(self, 'n_classes_', None)

troubles in install packages

Can anyone provide a requirements.txt or a Pipfile?

It is very difficult to run some of the notebooks without hitting library compatibility problems.
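
In the absence of a pinned requirements file, checking which third-party modules are importable can narrow things down before opening a notebook. The sketch below uses only the standard library. The module list is taken from the import errors and tracebacks reported on this page, so it is not a complete list of the book's dependencies, and missing_modules is a hypothetical helper.

```python
import importlib.util

# Top-level import names that appear in the issues on this page (not exhaustive).
NOTEBOOK_MODULES = [
    "yellowbrick",
    "scikitplot",
    "pdpbox",
    "rfpimp",
    "xgboost",
    "fastai",
    "pandas_profiling",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(NOTEBOOK_MODULES):
        print(f"missing: {name}")
```

Note that the name used with pip can differ from the import name (for example, scikitplot is installed as scikit-plot), and that a module being present does not guarantee its version is compatible with the notebooks.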

ch07.ipynb Cell #10

ModuleNotFoundError: No module named 'fastai.structured'
1 X3 = X2.copy()
----> 2 from fastai.structured import scale_vars
3 scale_vars(X3, mapper=None)
4 X3.std()
5 X3.mean()

ch03 error with pandas_profiling

Run on Google Colab:

import pandas_profiling
pandas_profiling.ProfileReport(df)


TypeError Traceback (most recent call last)

in ()
1 import pandas_profiling
----> 2 pandas_profiling.ProfileReport(df)

1 frames

/usr/local/lib/python3.7/dist-packages/pandas_profiling/describe.py in describe(df, bins, check_correlation, correlation_threshold, correlation_overrides, check_recoded, pool_size, **kwargs)
390 if name not in names:
391 names.append(name)
--> 392 variable_stats = pd.concat(ldesc, join_axes=pd.Index([names]), axis=1)
393 variable_stats.columns.names = df.columns.names

data leakage issue

In the notebook for chapter 14, cell 12 scales the variables with the standard scaler. Because the scaler is fit on the whole feature set, there is a possibility of data leakage once the data is split.
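
To make the concern concrete: the scaler's mean and standard deviation would be estimated from the training rows only and then reused on the test rows. Below is a dependency-free sketch with made-up data; the helper names are hypothetical and this is not the notebook's code. With scikit-learn, the equivalent is to call fit on the training split and transform on both splits.

```python
import random
import statistics


def fit_scaler(rows):
    """Estimate per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Standardize rows using previously fitted statistics."""
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


# Made-up two-feature data set.
rng = random.Random(42)
X = [[rng.gauss(10, 3), rng.gauss(-5, 0.5)] for _ in range(200)]

# Split first, then fit the scaler on the training rows only.
split = int(0.7 * len(X))
X_train, X_test = X[:split], X[split:]
means, stds = fit_scaler(X_train)

X_train_scaled = transform(X_train, means, stds)
X_test_scaled = transform(X_test, means, stds)  # reuses the training statistics
```

Fitting on the training rows alone means no information about the test rows reaches the model through the scaling step.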
