eharmony / spotz
Spark Parameter Optimization and Tuning
Hi,
We're interested in Spotz, but we need a param to be able to depend on previously-sampled ones (e.g. if mode is ON, vary from 0 to 5, otherwise from 5 to 10).
The change would probably need to happen where, instead of mapping over the params list, we'd fold over it, passing the previously-sampled params along with the rng (the current argument) to each sampler. As it stands this would be a breaking change (every sampler's apply function would need to receive an extra parameter), but maybe there's a way to make it smoother?
We'd provide a PR if that's acceptable.
I'm interested in your suggestions, if there are other, simpler ways. Thanks!
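A minimal sketch of the fold-based sampling described above, assuming a hypothetical sampler shape (this is illustrative and not Spotz's actual API): each sampler receives the rng and the params sampled so far, so later params can condition on earlier ones.

```scala
import scala.util.Random

// Hypothetical sampler signature: rng plus the previously-sampled params.
type Sampler = (Random, Map[String, Double]) => Double

// Fold the params list, threading the accumulated samples through.
def sample(space: Seq[(String, Sampler)], rng: Random): Map[String, Double] =
  space.foldLeft(Map.empty[String, Double]) { case (acc, (name, sampler)) =>
    acc + (name -> sampler(rng, acc))
  }

// "mode" is sampled first; "x" conditions its range on it.
val space: Seq[(String, Sampler)] = Seq(
  "mode" -> ((rng, _) => if (rng.nextBoolean()) 1.0 else 0.0),
  "x" -> ((rng, prev) =>
    if (prev("mode") == 1.0) rng.nextDouble() * 5.0   // mode ON: [0, 5)
    else 5.0 + rng.nextDouble() * 5.0)                // mode OFF: [5, 10)
)

val params = sample(space, new Random(42))
```

Because the fold threads the accumulator in order, samplers can only depend on params listed before them, which is the natural constraint for this kind of conditional space.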
There is overlap in common code that can be reused
Implement a VW objective that accepts a training set and a test set on which to train and evaluate a VW model.
Write up details about the various supported use cases for VW
If a spotz optimization run is killed intentionally, crashes, or otherwise stops before normal completion, allow the optimizer to resume where it left off.
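One way to support resumption is to periodically checkpoint the optimizer's best-so-far state to disk. The sketch below is a hypothetical approach using plain Java serialization; the state fields, paths, and format are illustrative, not Spotz's actual design.

```scala
import java.io.{File, FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical optimizer state: enough to resume a run where it stopped.
case class OptimizerState(
  trialsCompleted: Int,
  bestLoss: Double,
  bestParams: Map[String, Double]
) extends Serializable

def saveCheckpoint(state: OptimizerState, path: String): Unit = {
  val out = new ObjectOutputStream(new FileOutputStream(path))
  try out.writeObject(state) finally out.close()
}

def loadCheckpoint(path: String): Option[OptimizerState] = {
  val f = new File(path)
  if (!f.exists) None
  else {
    val in = new ObjectInputStream(new FileInputStream(f))
    try Some(in.readObject().asInstanceOf[OptimizerState]) finally in.close()
  }
}
```

On startup the optimizer would call `loadCheckpoint` and, if a state exists, skip the already-completed trials and seed the search with the saved best point.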
There's a slowdown with VW cache distribution at the beginning of the Spark job. Refactor this logic to zip and distribute the VW dataset to the executors before VW cache generation begins.
This is primarily for VW feature interactions
Partition a dataset into k folds and create VW train and test cache files for every fold. Distribute these cache files to the executors so that they can be used by the objective function.
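The fold assignment itself can be sketched in plain Scala (the real task would additionally build VW cache files per fold and ship them to the executors; `kFolds` is an illustrative helper, not Spotz code):

```scala
// Split data into k (train, test) pairs; example i goes to test fold i % k.
def kFolds[T](data: Seq[T], k: Int): Seq[(Seq[T], Seq[T])] =
  (0 until k).map { fold =>
    val (test, train) = data.zipWithIndex.partition { case (_, i) => i % k == fold }
    (train.map(_._1), test.map(_._1))
  }

val folds = kFolds(1 to 10, 3)
```

Each example lands in exactly one test fold and in the train split of every other fold, so every fold's train/test pair covers the whole dataset.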
Refactor functionality to allow mixing in the backend compute framework, so that users can choose Spark or something else, e.g. plain JVM executors.
Currently, hyperparameter values are materialized through the sample method of the Space trait. Look into possibly implementing RDDs that materialize the values instead of invoking sc.parallelize() in conjunction with the sample method.
Tune the batch size adaptively such that the user does not need to specify it. The batch size becomes important when the caller desires the optimizer to finish within some maximum duration. Too large a batch size will delay duration checks while processing occurs on the cluster. Too small a batch size will cause frequent return trips back to the driver which incur some constant time overhead.
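One simple heuristic for the trade-off above is to rescale the batch size after each batch so the next batch's wall-clock time approaches a target duration. This is a hypothetical sketch, not Spotz's actual logic; the function name and bounds are illustrative.

```scala
// Scale batch size proportionally: if the last batch took twice the target
// time, halve the batch; if it finished in half the time, double it.
def nextBatchSize(current: Int, elapsedMs: Long, targetMs: Long,
                  minSize: Int = 1, maxSize: Int = 100000): Int = {
  val scaled = (current.toLong * targetMs / math.max(elapsedMs, 1L)).toInt
  math.min(math.max(scaled, minSize), maxSize)
}
```

Clamping to `[minSize, maxSize]` keeps one noisy timing measurement from collapsing the batch to nothing or blowing it up past what the cluster can absorb.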
Particle Swarm
Tree of Parzen Estimators
Nelder-Mead Simplex
CMA-ES
Sobol Sequences