Comments (6)
This is not inherently a mistake: a single 10-fold CV may not be enough, and with 10 repetitions we get a much more exact result. The general issue is rather that for algorithms with many parameters (like xgboost) we should do many more runs than for kknn, where our surrogate models are already quite exact.
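To illustrate why repetitions make the estimate more exact, here is a minimal sketch, not the bot's actual setup. It assumes a toy 1-nearest-neighbour learner on synthetic data and fresh random splits in every repetition; all function names are made up for the example. It compares how much a single 10-fold CV estimate fluctuates across splits with how much the mean of 10 repetitions fluctuates.

```python
import random
import statistics

def make_data(n, rng):
    """Two overlapping 1-D Gaussian classes (synthetic, for illustration only)."""
    return [(rng.gauss(float(i % 2), 1.0), i % 2) for i in range(n)]

def one_nn_predict(train, x):
    """Return the label of the training point closest to x."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def cv_accuracy(data, k, rng):
    """Accuracy of 1-NN under one k-fold CV with a fresh random split."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    correct = 0
    for f in range(k):
        test_idx = set(idx[f::k])  # every k-th shuffled index forms one fold
        train = [data[i] for i in idx if i not in test_idx]
        correct += sum(one_nn_predict(train, data[i][0]) == data[i][1]
                       for i in test_idx)
    return correct / len(data)

def repeated_cv_accuracy(data, k, reps, rng):
    """Mean accuracy over `reps` independent k-fold CVs."""
    return statistics.mean(cv_accuracy(data, k, rng) for _ in range(reps))

rng = random.Random(0)
data = make_data(60, rng)

# Spread of the estimate over 30 independent trials of each scheme.
single = [cv_accuracy(data, 10, rng) for _ in range(30)]
repeated = [repeated_cv_accuracy(data, 10, 10, rng) for _ in range(30)]
spread_single = statistics.stdev(single)
spread_repeated = statistics.stdev(repeated)

print(f"stdev of 1 x 10-fold CV:  {spread_single:.4f}")
print(f"stdev of 10 x 10-fold CV: {spread_repeated:.4f}")
```

The reduction only appears when each repetition uses a different split. If every repetition reuses the same fixed split, the repeated runs return the same number and nothing is gained.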
from omlbots.
But isn't the train/test split always the same for a given OpenML task?
That seems to be true, I did not know that. 👎
Well, it's actually 👍 for reproducibility but 👎 for your purposes 😉
We are currently not using knn anymore because of this.
I feel like this should be done by the OpenML Website by rejecting identical uploads.
If we wanted to control it with the bot, we would need to download the rather large database on every machine we run the bot on.
"The general issue is rather that for algorithms with many parameters (like xgboost) we should do many more runs than for kknn, where our surrogate models are already quite exact."
My 2 cents:
- Why don't you REALLY upweight models with many parameters in sampling? I mean, the bot was specifically constructed so that this can be done.
- For stuff like knn, @jakob-r is kinda right. It is just very annoying to check whether the experiment already exists.
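As a rough sketch of what upweighting could look like (not the bot's actual sampling code): draw the learner for each run with a probability that grows with its number of tuned hyperparameters. The parameter counts, the `power` knob, and the function names below are assumptions made up for the example.

```python
import random
from collections import Counter

# Hypothetical numbers of tuned hyperparameters per learner (illustrative only).
N_PARAMS = {"kknn": 1, "rpart": 4, "ranger": 6, "xgboost": 10}

def sampling_weights(n_params, power=1.0):
    """Normalised weights proportional to (number of parameters) ** power."""
    raw = {name: count ** power for name, count in n_params.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}

def sample_learners(n_params, n_draws, rng, power=1.0):
    """Draw learner names for the next n_draws runs, favouring large search spaces."""
    weights = sampling_weights(n_params, power)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=n_draws)

rng = random.Random(1)
draws = Counter(sample_learners(N_PARAMS, 1000, rng))
print(draws.most_common())
```

Setting `power` above 1 shifts even more runs towards learners like xgboost; `power=0` falls back to uniform sampling.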
Problems:
a) OML will not do that for you, I am pretty sure. If you want that, I think you have to do it yourself, for now.
b) Second problem: that makes the random sampling less nice. But can't you easily check from the DB whether the exact experiment already exists? Or even from the OML server?
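One way such a check could work, sketched under assumptions: derive a canonical key from task, learner, and hyperparameters, and keep the keys of known experiments in a set. The class and function names are hypothetical, and in practice the initial keys would have to come from the results database or the OML server rather than from memory.

```python
import hashlib
import json

def experiment_key(task_id, learner, params):
    """Canonical hash of an experiment, independent of parameter order."""
    payload = json.dumps(
        {"task": task_id, "learner": learner, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class SeenExperiments:
    """In-memory record of experiment keys that already exist."""

    def __init__(self, keys=()):
        self._keys = set(keys)

    def add_if_new(self, task_id, learner, params):
        """Record the experiment and return True, or return False if it is a duplicate."""
        key = experiment_key(task_id, learner, params)
        if key in self._keys:
            return False
        self._keys.add(key)
        return True

seen = SeenExperiments()
print(seen.add_if_new(3, "kknn", {"k": 5}))  # first time: new
print(seen.add_if_new(3, "kknn", {"k": 5}))  # same configuration again: duplicate
```

This mainly matters for small discrete search spaces such as knn; for continuous parameters, exact duplicates are rare, and values would need rounding before hashing to be matched at all.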