- Install Hatch if it is not already installed on your system:

  ```bash
  pip install hatch
  ```

- Clone the repository:

  ```bash
  git clone [email protected]:rlebret/sdsc_flour_quality.git
  cd sdsc_flour_quality
  ```
Start by splitting the dataset into train/test sets:

```bash
hatch run python scripts/create_train_test_file.py \
    data/flour_dataset.csv \
    --test-size 0.2 \
    --random-state 42 \
    --output-path data
```
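The split script itself is not reproduced here, but the operation it performs can be sketched with scikit-learn. The toy DataFrame and column names below are illustrative only; the real script reads `data/flour_dataset.csv`:

```python
# Sketch of an 80/20 train/test split with a fixed seed, mirroring
# --test-size 0.2 and --random-state 42 in the command above.
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the flour dataset.
df = pd.DataFrame({"protein": range(10), "quality": range(10)})

train, test = train_test_split(df, test_size=0.2, random_state=42)
print(len(train), len(test))
```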
Remove rows with missing or negative values:

```bash
mkdir data/preprocessed
z_threshold=2
hatch run preprocess \
    "data/train.csv" \
    --z-threshold $z_threshold \
    --remove-empty-rows \
    --remove-negative-rows \
    --output-path "data/preprocessed/flour_z_${z_threshold}.csv"
```
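A plausible reading of these flags, sketched in pandas: drop rows with missing values, drop rows with negative values, then filter rows whose per-column z-score exceeds the threshold. This is an assumption about the repo's `preprocess` command, not its actual implementation, and the column is hypothetical:

```python
import pandas as pd

z_threshold = 2
df = pd.DataFrame({"moisture": [10.0, 12.0, None, -1.0, 100.0]})

df = df.dropna()                            # --remove-empty-rows
df = df[(df >= 0).all(axis=1)]              # --remove-negative-rows
z = (df - df.mean()) / df.std()             # per-column z-scores
df = df[(z.abs() <= z_threshold).all(axis=1)]  # --z-threshold
```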
Impute missing and negative values:

```bash
z_threshold=3
hatch run preprocess \
    "data/train.csv" \
    --z-threshold $z_threshold \
    --impute-missing-values \
    --impute-negative-values \
    --output-path "data/preprocessed/flour_z_${z_threshold}_impute.csv"
```
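One common imputation strategy for these flags, sketched in pandas: treat negative readings as missing, then fill with the column median. The actual strategy used by `preprocess` may differ, and the column values are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"ash": [0.5, 0.7, None, -0.2]})

df = df.mask(df < 0)         # negative readings become NaN
df = df.fillna(df.median())  # column-median imputation
```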
Run a randomized hyper-parameter search:

```bash
mkdir data/hyperparameters
scaling="standard"
filename="flour_z_2_impute"
model_name="SVRFlour"
input_filename="data/preprocessed/${filename}.csv"
output_filename="data/hyperparameters/${filename}_${model_name}_${scaling}.json"
hatch run hyperparameters \
    "$input_filename" \
    --scaling_method $scaling \
    --model_name $model_name \
    --output_path "$output_filename" \
    --regression
```
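The search above can be sketched with scikit-learn's `RandomizedSearchCV` over an SVR wrapped in standard scaling. The parameter grid and synthetic data are illustrative assumptions, not the repo's actual search space:

```python
# Randomized search over SVR hyper-parameters with standard scaling,
# mirroring scaling="standard" and the SVR model in the command above.
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))                                # toy features
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)

pipe = make_pipeline(StandardScaler(), SVR())
search = RandomizedSearchCV(
    pipe,
    param_distributions={"svr__C": [0.1, 1.0, 10.0], "svr__epsilon": [0.01, 0.1]},
    n_iter=4,
    cv=3,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)
```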
Train the classification model:

```bash
mkdir checkpoints
scaling="none"
filename="flour_z_2_impute"
model_name="rf"
hatch run train "configs/${model_name}_${filename}_${scaling}.yaml"
```
Evaluate the trained model on the test set:

```bash
scaling="none"
filename="flour_z_2_impute"
model_name="rf"
hatch run evaluate data/test.csv "configs/${model_name}_${filename}_${scaling}.yaml"
```
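The train/evaluate cycle driven by the configs above can be sketched as fitting a random-forest classifier on the training split and scoring it on the held-out test split. The synthetic features and label are illustrative stand-ins for the flour data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))      # toy feature matrix
y = (X[:, 0] > 0).astype(int)      # toy binary quality label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # fraction of correct test predictions
```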
Run the demo with a chosen model:

```bash
scaling="standard"
filename="flour_z_3_impute"
model_name="lr"
hatch run demo:run -- --config-file=configs/${model_name}_${filename}_${scaling}_cv.yaml
```
`flour` is distributed under the terms of the MIT license.