lesteve / scikit-learn-tutorial
This repo has moved to https://github.com/INRIA/scikit-learn-mooc/
License: Creative Commons Zero v1.0 Universal
See https://github.com/lesteve/scikit-learn-tutorial/runs/315274429. This has been failing for a few days, see https://github.com/lesteve/scikit-learn-tutorial/actions. There seems to be a TimeoutError: the PDF export takes 50s and does not complete. Maybe there is a timeout that can be increased with GitHub Actions? Ping @brospars.
Full error:
Run for f in *.html ; do remarkjs-pdf "$f"; done
Convert file:///home/runner/work/scikit-learn-tutorial/scikit-learn-tutorial/dist/index.html to index.pdf ...
Finished.
Convert file:///home/runner/work/scikit-learn-tutorial/scikit-learn-tutorial/dist/ml_concepts.html to ml_concepts.pdf ...
Finished.
Convert file:///home/runner/work/scikit-learn-tutorial/scikit-learn-tutorial/dist/overfit.html to overfit.pdf ...
TimeoutError: Navigation timeout of 30000 ms exceeded
at /opt/hostedtoolcache/node/12.13.1/x64/lib/node_modules/remarkjs-pdf/node_modules/puppeteer/lib/LifecycleWatcher.js:142:21
-- ASYNC --
at Frame.<anonymous> (/opt/hostedtoolcache/node/12.13.1/x64/lib/node_modules/remarkjs-pdf/node_modules/puppeteer/lib/helper.js:111:15)
at Page.goto (/opt/hostedtoolcache/node/12.13.1/x64/lib/node_modules/remarkjs-pdf/node_modules/puppeteer/lib/Page.js:675:49)
at Page.<anonymous> (/opt/hostedtoolcache/node/12.13.1/x64/lib/node_modules/remarkjs-pdf/node_modules/puppeteer/lib/helper.js:112:23)
at convertPdf (/opt/hostedtoolcache/node/12.13.1/x64/lib/node_modules/remarkjs-pdf/remarkjs-pdf.js:80:14)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
at async main (/opt/hostedtoolcache/node/12.13.1/x64/lib/node_modules/remarkjs-pdf/remarkjs-pdf.js:69:5) {
name: 'TimeoutError'
}
##[error]Process completed with exit code 10.
This is not very structured, so feel free to edit, comment, open other issues for bigger chunks of work:
print(
f"The accuracy using a {model.__class__.__name__} is "
f"{model.score(data_test, target_test):.3f} with a fitting time of "
f"{elapsed_time:.3f} seconds in {model[-1].n_iter_} iterations"
)
The accuracy using a Pipeline is 0.818 with a fitting time of 0.809 seconds in [13] iterations
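The bracketed `[13]` in the output above looks like the default formatting of `n_iter_`, which for a binary problem is a one-element NumPy array. Here is a minimal self-contained sketch that prints a plain integer instead. It uses synthetic data rather than the tutorial's dataset; the `make_classification` data, the pipeline steps and the `n_iter` and `message` names are assumptions for illustration only.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the tutorial data (an assumption, not the real dataset).
data, target = make_classification(n_samples=500, n_features=10, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

start = time.perf_counter()
model.fit(data, target)
elapsed_time = time.perf_counter() - start

# n_iter_ is an array (a single entry here, since the problem is binary);
# take its max and convert it so the message shows a bare number.
n_iter = int(model[-1].n_iter_.max())

# Scored on the training data only to keep the sketch short.
message = (
    f"The accuracy using a {model.__class__.__name__} is "
    f"{model.score(data, target):.3f} with a fitting time of "
    f"{elapsed_time:.3f} seconds in {n_iter} iterations"
)
print(message)
```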
I have some minor suggestions:
01_tabular_data_exploration:
adult_census.profile_report() tells us that there are a few duplicate rows. It may be worthwhile explaining how these duplicate entries may or may not affect prediction.
02_basic_preprocessing:
StandardScaler does? Maybe not everyone knows the equation?
04_basic_parameters_tuning:
C. Maybe even just give them a useful link to read on regularisation and overfitting?
model = make_pipeline(
preprocessor, LogisticRegressionCV(max_iter=1000, solver='lbfgs', cv=5)
)
score = cross_val_score(model, data, target, n_jobs=4, cv=5)
Here you don't provide a Cs argument like you do above, and it might be worth mentioning that by default it tests a grid of 10 C values.
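Two of the suggestions above (the StandardScaler equation and the default grid of 10 C values) can be illustrated without fitting anything. This is a rough sketch in plain Python, not the notebooks' code: the `ages` values are made up, and the grid reproduces the documented behaviour for an integer `Cs` (values log-spaced between 1e-4 and 1e4) rather than reading it from a fitted `LogisticRegressionCV`.

```python
import statistics

# StandardScaler: z = (x - mean) / std, computed independently for each column.
ages = [25.0, 38.0, 28.0, 44.0, 18.0]  # made-up values for one numeric column

mean = statistics.fmean(ages)
std = statistics.pstdev(ages)  # population std (ddof=0), as StandardScaler uses
scaled = [(x - mean) / std for x in ages]

# After scaling, the column has mean 0 and standard deviation 1.
print([round(z, 3) for z in scaled])

# LogisticRegressionCV: the grid tried when Cs is left at its default of 10.
# An integer Cs means that many values, log-spaced between 1e-4 and 1e4
# (the same values numpy.logspace(-4, 4, 10) would give).
n_values = 10
default_cs = [10 ** (-4 + 8 * i / (n_values - 1)) for i in range(n_values)]
print([f"{c:.2e}" for c in default_cs])
```

In a notebook, the second half could be replaced by looking at the `Cs_` attribute of the fitted `LogisticRegressionCV` step.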
You would expect high age and high hours-per-week to be "high-earning". This is what the 2D plot shows (in red). The plot_tree (tree diagram) says value = [637, 572], so in the same order as all the other leaves.
Not clear where the problem is: plot_tree (scikit-learn) or plot_tree_decision_function (function in the notebook).
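One way to narrow this down might be to check the class order on the scikit-learn side first. Here is a minimal sketch on synthetic data (the dataset and the `consistent` dictionary are assumptions, not the notebook's code). The columns of `tree_.value` follow `classes_`, so in every leaf the largest entry should correspond to the class that `predict` returns. As far as I know, depending on the scikit-learn version `tree_.value` holds either sample counts or class fractions; the argmax comparison works in both cases.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature, two-class problem (assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

leaf_ids = clf.apply(X)  # index of the leaf each sample ends up in
predictions = clf.predict(X)

consistent = {}
for leaf in np.unique(leaf_ids):
    value = clf.tree_.value[leaf, 0]  # one entry per class, ordered like classes_
    majority = clf.classes_[np.argmax(value)]
    in_leaf = leaf_ids == leaf
    consistent[int(leaf)] = bool((predictions[in_leaf] == majority).all())
    print(f"leaf {leaf}: value={value}, classes_={clf.classes_}, majority={majority}")

print("value order matches predictions in every leaf:", all(consistent.values()))
```

If this holds on the notebook's tree as well, the order shown by plot_tree is consistent and the colouring in plot_tree_decision_function would be the next thing to look at.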
Maybe instead we could have one CI job that checks that the notebooks render without errors, and another CI job that runs the rendering of the notebooks only on merge commits in master, then commits and pushes the results into a dedicated "rendered" branch that we then use for nbviewer links. We could also render the HTML to make the rendered notebooks directly browsable as a website on https://lesteve.github.io/scikit-learn-tutorial
This way we would never have diffs with spurious output changes in pull requests.
WDYT?
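For the "check that the notebooks render without errors" job, here is a rough sketch of what the check itself could look like. It assumes the notebooks have already been executed in an earlier CI step with errors allowed (for instance `jupyter nbconvert --to notebook --execute --allow-errors`), and it only inspects the stored outputs. The function names `errors_in_notebook` and `check_files` are hypothetical.

```python
import json
from pathlib import Path


def errors_in_notebook(notebook):
    """Return (cell_index, error_name) pairs for code cells with an error output.

    `notebook` is the parsed JSON content of an .ipynb file (a dict).
    """
    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            # In the notebook format, a failed cell stores an output whose
            # output_type is "error", with ename/evalue/traceback fields.
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename", "unknown")))
    return errors


def check_files(paths):
    """Print every error found and return 1 if any notebook failed, else 0."""
    failed = False
    for path in paths:
        notebook = json.loads(Path(path).read_text(encoding="utf-8"))
        for index, ename in errors_in_notebook(notebook):
            print(f"{path}: cell {index} raised {ename}")
            failed = True
    return 1 if failed else 0
```

A CI step could call `check_files` on the executed notebooks and exit with its return value, so that the job fails as soon as one cell has raised.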