Comments (4)

tornede commented on August 16, 2024

Hey @tomviering,

When n_jobs is given in the experiment configuration file, that many parallel jobs are created and executed individually. Each job is given the experiment function and pulls a separate experiment from the database. Therefore, the jobs do not need to have equal runtimes; you just have to make sure not to exceed the hardware resources of your machine ;)
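
As a minimal sketch of the relevant part of such a configuration file (it mirrors the full configuration further down in this thread; the database and table names are placeholders):

[PY_EXPERIMENTER]
provider = mysql
database = py_experimenter
table = example_table
n_jobs = 2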

Writing to the database is handled within the experiment function, which you can define however you need. Usually it is beneficial to have multiple writing points, so that results are not collected locally and all lost in case an error occurs.
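
A sketch of such intermediate writing points (train_model and evaluate_model are hypothetical helpers; the pattern mirrors the process_results calls in the full example below):

def run_experiment(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    # First writing point: persist the pipeline description right away,
    # so it is already in the database even if a later step fails.
    model = train_model(parameters)  # hypothetical helper
    result_processor.process_results({'pipeline': str(model)})

    # Second writing point: write the scores as soon as they are computed.
    scores = evaluate_model(model, parameters)  # hypothetical helper
    result_processor.process_results({'test_accuracy': scores['test_accuracy']})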

After a job has finished an experiment, that job terminates. Afterwards, a new job is started, pulling a new experiment from the database, provided max_experiments > n_jobs.
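
In code, this corresponds to the execute call (the same call appears in the full example below); a sketch using the run_experiment function from above and a placeholder config path:

experimenter = PyExperimenter(experiment_configuration_file_path='config/example.cfg', name='example')
experimenter.fill_table_from_config()  # one row per keyfield combination
# With n_jobs = 2 in the config, two jobs run in parallel; whenever one
# finishes its experiment it terminates and a new job is started, until
# max_experiments experiments have been processed.
experimenter.execute(run_experiment, max_experiments=10)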

If you want to check the code, have a look at the execute method: https://github.com/tornede/py_experimenter/blob/develop/py_experimenter/experimenter.py#L313

tomviering commented on August 16, 2024

Hi @tornede,

I'm currently using n_jobs = 2; however, I find that PyExperimenter is running the same experiment twice. When I look at the database, I indeed see only 1 row with status "running"; the others are all "created". On the other hand, I did verify that 2 CPUs are working (both hitting >98% utilization)... Any clue what could be wrong? I am on PyExperimenter version 1.2.

Many thanks,
Tom

LukasFehring commented on August 16, 2024

Hey @tomviering,
I just tried to reproduce this problem and noticed two things:

  1. On SQLite, I do not have this issue.
  2. For MySQL, I was able to reproduce it. However, branch #151 fixes the issue for me. Can you please verify this? We currently aim to merge this branch soon. The script I used to reproduce the problem:

import os
import random
import time

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

from py_experimenter.experimenter import PyExperimenter
from py_experimenter.result_processor import ResultProcessor

content = """
[PY_EXPERIMENTER]
provider = mysql 
database = py_experimenter
table = example_general_usage 
n_jobs = 2

keyfields = dataset, cross_validation_splits:int, seed:int, kernel
dataset = iris
cross_validation_splits = 5
seed = 2:6:2 
kernel = linear, poly, rbf, sigmoid

resultfields = pipeline:LONGTEXT, train_f1:DECIMAL, train_accuracy:DECIMAL, test_f1:DECIMAL, test_accuracy:DECIMAL
resultfields.timestamps = false

[CUSTOM] 
path = sample_data

[codecarbon]
offline_mode = False
measure_power_secs = 25
tracking_mode = process
log_level = error
save_to_file = True
output_dir = output/CodeCarbon
"""
# Create config directory if it does not exist
if not os.path.exists('config'):
    os.mkdir('config')

# Create config file
experiment_configuration_file_path = os.path.join('config', 'example_general_usage.cfg')
with open(experiment_configuration_file_path, "w") as f:
    f.write(content)


def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    seed = parameters['seed']
    random.seed(seed)
    np.random.seed(seed)

    data = load_iris()
    # In case you want to load a file from a path:
    # import pandas as pd
    # path = os.path.join(custom_config['path'], parameters['dataset'])
    # data = pd.read_csv(path)

    X = data.data
    y = data.target

    model = make_pipeline(StandardScaler(), SVC(kernel=parameters['kernel'], gamma='auto'))
    result_processor.process_results({
        'pipeline': str(model)
    })

    if parameters['dataset'] != 'iris':
        raise ValueError("Example error")

    scores = cross_validate(model, X, y,
                            cv=parameters['cross_validation_splits'],
                            scoring=('accuracy', 'f1_micro'),
                            return_train_score=True
                            )

    result_processor.process_results({
        'train_f1': np.mean(scores['train_f1_micro']),
        'train_accuracy': np.mean(scores['train_accuracy'])
    })

    result_processor.process_results({
        'test_f1': np.mean(scores['test_f1_micro']),
        'test_accuracy': np.mean(scores['test_accuracy'])
    })

    time.sleep(15)  # keep the experiment running for a while, so both jobs can be observed in parallel


experimenter = PyExperimenter(experiment_configuration_file_path=experiment_configuration_file_path, name='example_notebook')

experimenter.fill_table_from_config()

experimenter.execute(run_ml, max_experiments=2)
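
To check whether both jobs really pull distinct experiments, you can inspect the status column of the experiment table while the script runs; a sketch, assuming the get_table() helper, which returns the experiment table as a pandas DataFrame:

# In the healthy case, two rows are in status 'running' while the two jobs
# execute; in the buggy case described above, only one row is.
table = experimenter.get_table()
print(table['status'].value_counts())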

LukasFehring commented on August 16, 2024

This is closed with the merge of #164.
