Comments (4)

tornede commented on August 16, 2024

Hey @tomviering,

When n_jobs is given in the experiment configuration file, that many parallel jobs are created and executed individually. Each job is given the experiment function and pulls a separate experiment from the database. Therefore, the jobs do not need to have equal runtimes; you just have to make sure not to exceed the hardware resources of your machine ;)
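
As a minimal sketch of the relevant part of such a configuration file (it mirrors the full configuration further down in this thread; the database and table names are placeholders):

[PY_EXPERIMENTER]
provider = mysql
database = py_experimenter
table = example_table
n_jobs = 2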

Writing to the database is handled within the experiment function, which you can define however you need. Usually it is beneficial to have multiple writing points, so that results are not collected locally and all lost in case an error occurs.
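
A sketch of such intermediate writing points (train_model and evaluate_model are hypothetical helpers; the pattern mirrors the process_results calls in the full example below):

def run_experiment(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    # First writing point: persist the pipeline description right away,
    # so it is already in the database even if a later step fails.
    model = train_model(parameters)  # hypothetical helper
    result_processor.process_results({'pipeline': str(model)})

    # Second writing point: write the scores as soon as they are computed.
    scores = evaluate_model(model, parameters)  # hypothetical helper
    result_processor.process_results({'test_accuracy': scores['test_accuracy']})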

After a job has finished an experiment, that job terminates. Afterwards, a new job is started, pulling a new experiment from the database, provided max_experiments > n_jobs.
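
In code, this corresponds to the execute call (the same call appears in the full example below); a sketch using the run_experiment function from above and a placeholder config path:

experimenter = PyExperimenter(experiment_configuration_file_path='config/example.cfg', name='example')
experimenter.fill_table_from_config()  # one row per keyfield combination
# With n_jobs = 2 in the config, two jobs run in parallel; whenever one
# finishes its experiment it terminates and a new job is started, until
# max_experiments experiments have been processed.
experimenter.execute(run_experiment, max_experiments=10)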

If you want to check the code, have a look at the execute method: https://github.com/tornede/py_experimenter/blob/develop/py_experimenter/experimenter.py#L313

tomviering commented on August 16, 2024

Hi @tornede,

I'm currently using n_jobs = 2; however, I find that PyExperimenter is running the same experiment twice. When I look at the database, I indeed see only 1 row with status "running"; the others are all "created". On the other hand, I did verify that 2 CPUs are working (both hitting >98% utilization)... Any clue what could be wrong? I am on PyExperimenter version 1.2.

Many thanks,
Tom

LukasFehring commented on August 16, 2024

Hey @tomviering,
I just tried to reproduce this problem and noticed two things:

  1. On SQLite, I do not have this issue.
  2. For MySQL, I was able to reproduce it. However, branch #151 fixes the issue for me. Can you please verify this? We currently aim to merge this branch soon. The script I used to reproduce the problem:

import os
import random
import time

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

from py_experimenter.experimenter import PyExperimenter
from py_experimenter.result_processor import ResultProcessor

content = """
[PY_EXPERIMENTER]
provider = mysql 
database = py_experimenter
table = example_general_usage 
n_jobs = 2

keyfields = dataset, cross_validation_splits:int, seed:int, kernel
dataset = iris
cross_validation_splits = 5
seed = 2:6:2 
kernel = linear, poly, rbf, sigmoid

resultfields = pipeline:LONGTEXT, train_f1:DECIMAL, train_accuracy:DECIMAL, test_f1:DECIMAL, test_accuracy:DECIMAL
resultfields.timestamps = false

[CUSTOM] 
path = sample_data

[codecarbon]
offline_mode = False
measure_power_secs = 25
tracking_mode = process
log_level = error
save_to_file = True
output_dir = output/CodeCarbon
"""
# Create config directory if it does not exist
if not os.path.exists('config'):
    os.mkdir('config')

# Create config file
experiment_configuration_file_path = os.path.join('config', 'example_general_usage.cfg')
with open(experiment_configuration_file_path, "w") as f:
    f.write(content)


def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    seed = parameters['seed']
    random.seed(seed)
    np.random.seed(seed)

    data = load_iris()
    # In case you want to load a file from a path:
    # import pandas as pd
    # path = os.path.join(custom_config['path'], parameters['dataset'])
    # data = pd.read_csv(path)

    X = data.data
    y = data.target

    model = make_pipeline(StandardScaler(), SVC(kernel=parameters['kernel'], gamma='auto'))
    result_processor.process_results({
        'pipeline': str(model)
    })

    if parameters['dataset'] != 'iris':
        raise ValueError("Example error")

    scores = cross_validate(model, X, y,
                            cv=parameters['cross_validation_splits'],
                            scoring=('accuracy', 'f1_micro'),
                            return_train_score=True
                            )

    result_processor.process_results({
        'train_f1': np.mean(scores['train_f1_micro']),
        'train_accuracy': np.mean(scores['train_accuracy'])
    })

    result_processor.process_results({
        'test_f1': np.mean(scores['test_f1_micro']),
        'test_accuracy': np.mean(scores['test_accuracy'])
    })

    time.sleep(15)  # keep the experiment running for a while, so both jobs can be observed in parallel


experimenter = PyExperimenter(experiment_configuration_file_path=experiment_configuration_file_path, name='example_notebook')

experimenter.fill_table_from_config()

experimenter.execute(run_ml, max_experiments=2)
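
To check whether both jobs really pull distinct experiments, you can inspect the status column of the experiment table while the script runs; a sketch, assuming the get_table() helper, which returns the experiment table as a pandas DataFrame:

# In the healthy case, two rows are in status 'running' while the two jobs
# execute; in the buggy case described above, only one row is.
table = experimenter.get_table()
print(table['status'].value_counts())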

LukasFehring commented on August 16, 2024

This is closed with the merge of #164.
