- Set up GitHub
- New environment
- setup.py
- requirements.txt
- src folder and build the package
Create an empty repo on GitHub.
Create a folder and open it in VS Code. In that folder, create a virtual environment:
conda create -p venv python==3.8 -y
conda activate ./venv
From the local terminal:
Initialize the git repo:
git init
Create and add a README file:
git add README.md
git commit -m "First commit"
Connect the local repo to GitHub. When you create an empty repo on GitHub, it shows you these commands to connect. Before pushing, make sure your git config has your email set.
git branch -M main
git remote add origin https://github.com/josrodand/mlproject.git
git push -u origin main
You can create the .gitignore from GitHub: select "Create new file" and name it .gitignore. Select Python as the template and the file will be filled in automatically.
Commit the new .gitignore from GitHub.
After that, pull the changes into the local repo:
git pull
This process can be automated later on.
setup.py lets us build our machine learning model as a package. You can keep updating the package and install it in other projects. After that you can upload the package to PyPI.
Create the src folder with an __init__.py file.
List all the needed modules in requirements.txt. You can put -e . at the end so that installing the requirements also runs setup.py.
After that, install the requirements:
pip install -r requirements.txt
Having -e . in the requirements file makes the installation go through setup.py and creates a package metadata folder: mlproject.egg-info
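A minimal sketch of how setup.py could read requirements.txt while skipping the -e . line, which is an instruction for pip and not a package name. The helper name get_requirements, the function run_setup and the version number are assumptions for illustration; only the package name mlproject comes from these notes.

```python
from typing import List

# The editable-install line that may appear at the end of requirements.txt.
EDITABLE_FLAG = "-e ."


def get_requirements(file_path: str) -> List[str]:
    """Return the package names listed in a requirements file.

    Blank lines, comment lines and the "-e ." line are skipped.
    """
    with open(file_path) as f:
        lines = [line.strip() for line in f]
    return [
        line
        for line in lines
        if line and not line.startswith("#") and line != EDITABLE_FLAG
    ]


def run_setup() -> None:
    """Hypothetical body of setup.py; call it at module level in that file."""
    from setuptools import find_packages, setup

    setup(
        name="mlproject",
        version="0.0.1",  # placeholder version, not from the notes
        packages=find_packages(),
        install_requires=get_requirements("requirements.txt"),
    )
```

find_packages() only picks up folders that contain an __init__.py, which is why the src folder needs one.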
We create a components folder with these files:
- __init__.py
- data_ingestion.py
- data_transformation.py
- model_trainer.py
We create a pipeline folder with these files:
- __init__.py
- train_pipeline.py
- predict_pipeline.py
- Create logger.py, utils.py and exception.py in src folder
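The folder layout above can also be created with a small script instead of by hand. This is only a sketch: it assumes components and pipeline live inside src (the notes do not say where they go), and the names LAYOUT and scaffold are made up for the example.

```python
from pathlib import Path
from typing import List

# File layout taken from the notes; nesting under src/ is an assumption.
LAYOUT = [
    "src/__init__.py",
    "src/logger.py",
    "src/utils.py",
    "src/exception.py",
    "src/components/__init__.py",
    "src/components/data_ingestion.py",
    "src/components/data_transformation.py",
    "src/components/model_trainer.py",
    "src/pipeline/__init__.py",
    "src/pipeline/train_pipeline.py",
    "src/pipeline/predict_pipeline.py",
]


def scaffold(root: str = ".") -> List[Path]:
    """Create any missing folders and empty files; return the files created."""
    created = []
    for rel in LAYOUT:
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()  # existing files are left untouched
            created.append(path)
    return created
```

Running scaffold() a second time creates nothing, so it is safe to rerun after adding new entries to LAYOUT.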
We have created a custom exception handler that takes errors and shows file, line and type of error
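A rough sketch of what such an exception handler can look like. The class name CustomException and the message format are assumptions, not the project's actual code; the idea is just to read the file name and line number from the traceback of the original error.

```python
class CustomException(Exception):
    """Wrap an error and report where it was raised and what type it was."""

    def __init__(self, error: Exception):
        tb = error.__traceback__
        # Walk to the innermost frame, where the error was actually raised.
        while tb is not None and tb.tb_next is not None:
            tb = tb.tb_next
        if tb is None:
            # The error was never raised, so there is no location to report.
            file_name, line_number = "<unknown>", 0
        else:
            file_name = tb.tb_frame.f_code.co_filename
            line_number = tb.tb_lineno
        self.message = (
            f"{type(error).__name__} in {file_name}, "
            f"line {line_number}: {error}"
        )
        super().__init__(self.message)
```

Usage would be something like catching any Exception as e inside a component and re-raising with raise CustomException(e) from e.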
We create logging code that lets the project write log files to a directory.
We can test the logging code with python logger.py. It will create a logs directory containing a folder named with the date, with the log file inside it.
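A minimal sketch of logging code along these lines, using only the standard logging module. The function name setup_logging, the date and time formats and the log line format are assumptions; the notes only say that a logs directory with a dated folder and a file is created.

```python
import logging
import os
from datetime import datetime


def setup_logging(base_dir: str = "logs") -> str:
    """Send log records to <base_dir>/<date>/<time>.log and return that path."""
    now = datetime.now()
    log_dir = os.path.join(base_dir, now.strftime("%Y-%m-%d"))
    os.makedirs(log_dir, exist_ok=True)  # creates logs/ and the dated folder
    log_path = os.path.join(log_dir, now.strftime("%H_%M_%S") + ".log")
    logging.basicConfig(
        filename=log_path,
        format="[%(asctime)s] %(lineno)d %(name)s - %(levelname)s - %(message)s",
        level=logging.INFO,
        force=True,  # replace handlers from earlier calls (Python 3.8+)
    )
    return log_path
```

To try it out, call setup_logging() and then logging.info("Logging has started"); the message should show up in the new file.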
First, download the data from GitHub and put it in the notebook/data folder.
Create notebooks: EDA and model training.
- Metric code, model selection, etc in utils
- Training code in model_trainer
We will see
We create a DataIngestionConfig class with the paths. We use dataclass to create classes that only hold attributes. For classes that need methods, it is better to define them the normal way.
What we have done here is create a class that covers the whole ingestion step. For now it basically reads from a CSV and splits it into train and test.
This generates an artifacts directory with the raw, train and test CSV files. The class creates the directories automatically from the path.
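A sketch of the ingestion step described above. To stay self-contained it uses only the standard library (csv and random) in place of whatever data libraries the project really uses. The file names raw.csv, train.csv and test.csv, the method name run and the 80/20 split are assumptions.

```python
import csv
import os
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DataIngestionConfig:
    # Attribute-only class: just the output paths (file names are assumed).
    raw_data_path: str = os.path.join("artifacts", "raw.csv")
    train_data_path: str = os.path.join("artifacts", "train.csv")
    test_data_path: str = os.path.join("artifacts", "test.csv")


class DataIngestion:
    def __init__(self, config: Optional[DataIngestionConfig] = None):
        self.config = config or DataIngestionConfig()

    def run(self, source_csv: str, test_size: float = 0.2,
            seed: int = 42) -> Tuple[str, str]:
        """Read a CSV, save a raw copy, and write shuffled train/test splits."""
        with open(source_csv, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            rows = list(reader)

        # Shuffle a copy so the raw file keeps the original row order.
        shuffled = rows[:]
        random.Random(seed).shuffle(shuffled)
        n_test = int(len(shuffled) * test_size)

        outputs = [
            (self.config.raw_data_path, rows),
            (self.config.train_data_path, shuffled[n_test:]),
            (self.config.test_data_path, shuffled[:n_test]),
        ]
        for path, subset in outputs:
            directory = os.path.dirname(path)
            if directory:
                # Create the target folder (e.g. artifacts/) from the path.
                os.makedirs(directory, exist_ok=True)
            with open(path, "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(header)
                writer.writerows(subset)
        return self.config.train_data_path, self.config.test_data_path
```

Keeping the paths in a separate dataclass means they can be overridden in one place, for example to write the artifacts somewhere else during testing.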
It is a good idea to add the artifacts folder to .gitignore so that it is not pushed to GitHub. The instructor added it, but the folder got pushed anyway; this needs to be looked into.