Comments (4)
Hey @tomviering,
When `n_jobs` is given in the experiment configuration file, that many parallel jobs are created and executed individually. Each job is given the experiment function and pulls a separate experiment from the database. It is therefore not necessary for the jobs to have equal runtimes; you just have to make sure not to exceed the hardware resources of your machine ;)
The writing process to the database is handled within the experiment function, which you can define however you need. It is usually beneficial to have multiple writing points, so that locally collected results are not all lost in case an error occurs.
After a job has finished an experiment, that job terminates. A new job is then started, pulling a new experiment from the database, in case max_experiments > n_jobs.
If you want to check the code, have a look at the `execute` method: https://github.com/tornede/py_experimenter/blob/develop/py_experimenter/experimenter.py#L313
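The pull-and-execute loop described above can be sketched in a few lines. This is not PyExperimenter's actual implementation, just a minimal stand-in: an in-memory list plays the role of the experiment table, and each of `n_jobs` workers atomically claims the next `created` row, runs it, and writes the result back. Because each worker pulls its own experiment, unequal runtimes are unproblematic.

```python
import threading

# Hypothetical in-memory stand-in for the experiment table.
experiments = [{"id": i, "status": "created", "result": None} for i in range(4)]
lock = threading.Lock()

def pull_next():
    # Atomic claim: mimics the database update 'created' -> 'running',
    # so two workers can never grab the same experiment.
    with lock:
        for exp in experiments:
            if exp["status"] == "created":
                exp["status"] = "running"
                return exp
    return None

def worker(run_fn):
    # Each job keeps pulling until no 'created' experiments remain.
    while (exp := pull_next()) is not None:
        exp["result"] = run_fn(exp["id"])
        exp["status"] = "done"

n_jobs = 2
threads = [threading.Thread(target=worker, args=(lambda i: i * i,)) for _ in range(n_jobs)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(e["result"] for e in experiments))  # → [0, 1, 4, 9]
```

The bug reported below corresponds to the claim step not being atomic on the database side, so two workers can pull the same row.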
from py_experimenter.
Hi @tornede,
I'm currently using `n_jobs = 2`; however, I find that PyExperimenter is running the same job twice. When I look at the database, I indeed see only 1 row with "running", while the others are all "created". On the other hand, I did verify that 2 CPUs are working (they are both hitting >98% utilization)... Any clue what could be wrong? I am on PyExperimenter version 1.2.
Many thanks,
Tom
Hey @tomviering,
I just tried to reproduce this problem and noticed two things:
- On sqlite, I do not have this issue.
- For mysql, I was able to reproduce it. However, branch #151 fixes the issue for me. Can you please verify this? We currently aim to merge this branch soon.
```python
import os
import random
import time

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

from py_experimenter.experimenter import PyExperimenter
from py_experimenter.result_processor import ResultProcessor

content = """
[PY_EXPERIMENTER]
provider = mysql
database = py_experimenter
table = example_general_usage
n_jobs = 2

keyfields = dataset, cross_validation_splits:int, seed:int, kernel
dataset = iris
cross_validation_splits = 5
seed = 2:6:2
kernel = linear, poly, rbf, sigmoid

resultfields = pipeline:LONGTEXT, train_f1:DECIMAL, train_accuracy:DECIMAL, test_f1:DECIMAL, test_accuracy:DECIMAL
resultfields.timestamps = false

[CUSTOM]
path = sample_data

[codecarbon]
offline_mode = False
measure_power_secs = 25
tracking_mode = process
log_level = error
save_to_file = True
output_dir = output/CodeCarbon
"""

# Create config directory if it does not exist
if not os.path.exists('config'):
    os.mkdir('config')

# Create config file
experiment_configuration_file_path = os.path.join('config', 'example_general_usage.cfg')
with open(experiment_configuration_file_path, "w") as f:
    f.write(content)


def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    seed = parameters['seed']
    random.seed(seed)
    np.random.seed(seed)

    data = load_iris()
    # In case you want to load a file from a path
    # path = os.path.join(custom_config['path'], parameters['dataset'])
    # data = pd.read_csv(path)

    X = data.data
    y = data.target

    model = make_pipeline(StandardScaler(), SVC(kernel=parameters['kernel'], gamma='auto'))

    result_processor.process_results({
        'pipeline': str(model)
    })

    if parameters['dataset'] != 'iris':
        raise ValueError("Example error")

    scores = cross_validate(model, X, y,
                            cv=parameters['cross_validation_splits'],
                            scoring=('accuracy', 'f1_micro'),
                            return_train_score=True)

    result_processor.process_results({
        'train_f1': np.mean(scores['train_f1_micro']),
        'train_accuracy': np.mean(scores['train_accuracy'])
    })

    result_processor.process_results({
        'test_f1': np.mean(scores['test_f1_micro']),
        'test_accuracy': np.mean(scores['test_accuracy'])
    })

    time.sleep(15)


experimenter = PyExperimenter(experiment_configuration_file_path=experiment_configuration_file_path, name='example_notebook')
experimenter.fill_table_from_config()
experimenter.execute(run_ml, max_experiments=2)
```
This is closed with the merge of #164.