
gokumohandas / mlops-course


Learn how to design, develop, deploy and iterate on production-grade ML applications.

Home Page: https://madewithml.com

License: MIT License

Jupyter Notebook 97.89% Makefile 0.01% Python 2.02% Shell 0.08%
machine-learning deep-learning pytorch mlops data-engineering data-quality data-science distributed-ml llms natural-language-processing

mlops-course's Introduction

Hi, I'm Goku Mohandas

I create platforms that enable people to solve problems.

🔥 We're among the top MLOps repositories on GitHub

Connect with me via Twitter or LinkedIn.
Subscribe to Made With ML for monthly updates on new content!


mlops-course's People

Contributors

gokumohandas



mlops-course's Issues

pip install error from guide

Hi Goku, when I use your method of creating a new virtual environment and run:

python -m pip install -e ".[dev]" --no-cache-dir

I get an error of "ModuleNotFoundError: No module named 'Cython'".

  • Tried again with pipenv and it still fails.
  • The only way I can install the requirements is with poetry, after deleting all dependency-specific versions (as followed here). But even using poetry, the numpy and snorkel packages cannot be installed.
  • After using poetry and regenerating requirements.txt, pip install -r requirements.txt now works (without snorkel).

More info: Poetry dependency resolution in pyproject.toml:

[tool.poetry.dependencies]
python = "^3.8"
apache-airflow = "^2.1.0"
airflow-provider-great-expectations = "^0.0.6"
dvc = "^2.3.0"
fastapi = "^0.65.2"
feast = "^0.10.7"
matplotlib = "^3.4.2"
mlflow = "^1.17.0"
nltk = "^3.6.2"
numpyencoder = "^0.3.0"
optuna = "^2.8.0"
pandas = "^1.2.4"
pretty-errors = "^1.2.21"
rich = "^10.3.0"
scikit-multilearn = "^0.2.0"
seaborn = "^0.11.1"
sklearn = "^0.0"
streamlit = "^0.82.0"
torch = "^1.9.0"
typer = "^0.3.2"
uvicorn = {extras = ["standard"], version = "^0.14.0"}
watchdog = "^2.1.2"
wordcloud = "^1.8.1"
great-expectations = "^0.13.19"
pytest = "^6.2.4"
pytest-cov = "^2.12.1"
black = "^21.6b0"
flake8 = "^3.9.2"
isort = "^5.8.0"
jupyterlab = "^3.0.16"
pre-commit = "^2.13.0"
mkdocs-macros-plugin = "^0.5.5"
mkdocs-material = "^7.1.8"
mkdocstrings = "^0.15.2"

[tool.poetry.dev-dependencies]

got an unexpected keyword argument 'handler' from mkdocs serve

This is the error I get when running python3 -m mkdocs serve.
Please help me.

INFO     -  Building documentation...
INFO     -  Cleaning site directory
ERROR    -  Error reading page 'tagifai/data.md': __init__() got an unexpected keyword argument
            'handler'
.
.
.
TypeError: __init__() got an unexpected keyword argument 'handler'

Chapter annotation - the URL of projects.json needs to be changed

makefile possible error

make typically runs recipes with /bin/sh, which does not support source, in contrast to /bin/bash.
You could set SHELL := /bin/bash in the Makefile to make sure that /bin/bash is always used.
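As a sketch of that suggestion (the venv recipe below is illustrative, not the course's actual Makefile):

```makefile
# Force bash for all recipes so `source` works inside them
SHELL := /bin/bash

venv:
	python3 -m venv venv
	source venv/bin/activate && python -m pip install --upgrade pip
```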

Getting error while loading projects.json file

# Load projects
url = "https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/projects.json"
projects = json.loads(urlopen(url).read())
print(f"{len(projects)} projects")
print(json.dumps(projects[0], indent=2))


HTTPError Traceback (most recent call last)
Input In [15], in <cell line: 3>()
1 # Load projects
2 url = "https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/projects.json"
----> 3 projects = json.loads(urlopen(url).read())
4 print (f"{len(projects)} projects")
5 print (json.dumps(projects[0], indent=2))

File ~/.conda/envs/Ml_Flow/lib/python3.8/urllib/request.py:222, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
220 else:
221 opener = _opener
--> 222 return opener.open(url, data, timeout)

File ~/.conda/envs/Ml_Flow/lib/python3.8/urllib/request.py:531, in OpenerDirector.open(self, fullurl, data, timeout)
529 for processor in self.process_response.get(protocol, []):
530 meth = getattr(processor, meth_name)
--> 531 response = meth(req, response)
533 return response

File ~/.conda/envs/Ml_Flow/lib/python3.8/urllib/request.py:640, in HTTPErrorProcessor.http_response(self, request, response)
637 # According to RFC 2616, "2xx" code indicates that the client's
638 # request was successfully received, understood, and accepted.
639 if not (200 <= code < 300):
--> 640 response = self.parent.error(
641 'http', request, response, code, msg, hdrs)
643 return response

File ~/.conda/envs/Ml_Flow/lib/python3.8/urllib/request.py:569, in OpenerDirector.error(self, proto, *args)
567 if http_err:
568 args = (dict, 'default', 'http_error_default') + orig_args
--> 569 return self._call_chain(*args)

File ~/.conda/envs/Ml_Flow/lib/python3.8/urllib/request.py:502, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
500 for handler in handlers:
501 func = getattr(handler, meth_name)
--> 502 result = func(*args)
503 if result is not None:
504 return result

File ~/.conda/envs/Ml_Flow/lib/python3.8/urllib/request.py:649, in HTTPDefaultErrorHandler.http_error_default(self, req, fp, code, msg, hdrs)
648 def http_error_default(self, req, fp, code, msg, hdrs):
--> 649 raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: HTTP Error 404: Not Found
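Since the dataset path in the repository can change, a defensive loader makes the 404 easier to diagnose. A minimal sketch (load_projects is a hypothetical helper, not part of the course code):

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen

def load_projects(url):
    """Fetch and parse a projects JSON file, failing with a clear message on a bad URL."""
    try:
        return json.loads(urlopen(url).read())
    except HTTPError as e:
        # A 404 here usually means the dataset file was moved or renamed;
        # check the repository for the current path to projects.json.
        raise RuntimeError(f"Could not fetch {url}: {e.code} {e.reason}") from e
```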

mlflow error when deploying the app

"Run '%s' not found" % run_uuid, databricks_pb2.RESOURCE_DOES_NOT_EXIST
mlflow.exceptions.MlflowException: Run '0f2eebfa38674a7b9e38ad443ea8cd64' not found


venv using python3 -m venv venv issue

Creating a venv with "python3 -m venv venv" inherits the Python version of the host interpreter. For example, if I am in a conda env py36 that uses Python 3.6, running make venv installs a Python 3.6 venv; if I am in a conda env py37 that uses Python 3.7, it installs a Python 3.7 venv. The python_requires setting in setup.py has no impact on the venv's Python version.
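Since python_requires only gates pip installation, a runtime guard can catch a mismatched venv early. A minimal sketch (the 3.7 floor is a hypothetical value; match it to setup.py's python_requires):

```python
import sys

MIN_VERSION = (3, 7)  # hypothetical floor; align with python_requires in setup.py

def check_python(version_info=None, minimum=MIN_VERSION):
    """Raise if the running interpreter is older than the supported floor."""
    if version_info is None:
        version_info = sys.version_info
    if tuple(version_info[:2]) < minimum:
        raise RuntimeError(
            f"Python {minimum[0]}.{minimum[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    return True
```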

test_data.py fixtures are executed multiple times

In MLOps/tests/tagifai/test_data.py:

The two fixtures:

@pytest.fixture
def tags():
...
@pytest.fixture
def df():
...
are executed once for every test that uses df or tags.
In your context this is not a big deal since there is no real resource consumption; however, you can set the scope to "module" to make sure each fixture is called only once.

@pytest.fixture(scope="module")
def tags():
    ...

@pytest.fixture(scope="module")
def df():
    ...
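The effect of the scope change can be demonstrated with a self-contained sketch (a hypothetical test module, not the course's test_data.py): with scope="module", the fixture body runs once even though two tests request it.

```python
import pathlib
import subprocess
import sys
import tempfile
import textwrap

# Demo test module: a counter records how many times the fixture body runs.
TEST_SRC = textwrap.dedent(
    """
    import pytest

    CALLS = {"n": 0}

    @pytest.fixture(scope="module")
    def df():
        CALLS["n"] += 1
        return [1, 2, 3]

    def test_first_use(df):
        assert df == [1, 2, 3]

    def test_second_use(df):
        assert len(df) == 3

    def test_fixture_ran_once():
        # with scope="module", the fixture body executed exactly once
        assert CALLS["n"] == 1
    """
)

def run_demo():
    """Run the demo module with pytest; returns the exit code (0 = all passed)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "test_scope_demo.py"
        path.write_text(TEST_SRC)
        return subprocess.run(
            [sys.executable, "-m", "pytest", "-q", "-p", "no:cacheprovider", str(path)]
        ).returncode
```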

Package version conflicts

Hi Goku,

The project dependencies cannot be installed as-is due to the all-too-common Python package dependency conflicts.

Edit: using Python 3.10.0

I had the exact same problem with the previous version of your course (great work btw!)

Installing:

The conflict is caused by:
    tagifai 0.1 depends on numpy==1.19.5
    imbalanced-learn 0.8.1 depends on numpy>=1.13.3
    mlflow 1.13.1 depends on numpy
    numpyencoder 0.3.0 depends on numpy>=1.14.3
    optuna 2.10.0 depends on numpy
    pandas 1.3.5 depends on numpy>=1.21.0; python_version >= "3.10"

I tried just removing the version requirement on numpy, and now I can see that it's snorkel that is blocking:

ERROR: Cannot install tagifai and tagifai==0.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    tagifai 0.1 depends on numpy
    imbalanced-learn 0.8.1 depends on numpy>=1.13.3
    mlflow 1.13.1 depends on numpy
    numpyencoder 0.3.0 depends on numpy>=1.14.3
    optuna 2.10.0 depends on numpy
    pandas 1.3.5 depends on numpy>=1.21.0; python_version >= "3.10"
    scikit-learn 0.24.2 depends on numpy>=1.13.3
    snorkel 0.9.8 depends on numpy<1.20.0 and >=1.16.5

Commenting out snorkel from requirements works, but of course we need that.

After a lot of version juggling I've found that this combo works:

diff --git a/requirements.txt b/requirements.txt
index 29c7af6..b96617f 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -3,10 +3,10 @@ fastapi==0.78.0
 imbalanced-learn==0.8.1
 mlflow==1.13.1
 nltk==3.7
-numpy==1.19.5
+numpy==1.19.3
 numpyencoder==0.3.0
 optuna==2.10.0
-pandas==1.3.5
+pandas==1.3.3
 rich==12.4.4
 scikit-learn==0.24.2
 snorkel==0.9.8

Or at least it would, if my Mac M1 were able to build the numpy wheel :(

error: Command "clang ....  lots of stuff with the wrong architecture 
ERROR: Could not build wheels for numpy

numpy/numpy#17807

Installing numpy on its own is no problem (I use it all the time), but installing this version of pandas results in trying and failing to build the numpy wheel, and I haven't solved that yet.

MLFlow Dashboard from Colab not showing

By running the MLflow cell from Colab to see the dashboard...

get_ipython().system_raw("mlflow server -h 0.0.0.0 -p 8000 --backend-store-uri $PWD/experiments/ &")
!npx localtunnel --port 8000

...I get a URL like this:

https://sixty-kiwis-chew.loca.lt/

When I click, I'm redirected to a page that asks me for an Endpoint IP.


I got the IP by running the following code:

!wget -q -O - ipv4.icanhazip.com

or this

!curl ipv4.icanhazip.com

However, it does not work. How can I run the code in Colab to see the Dashboard of MLFlow?

Error while working in mkdocs

Hi, I would like to ask for some help in the document creation code

python -m mkdocs serve

While running this in the specified folder, I get the following error.

INFO - Building documentation...
WARNING - Config value: 'selection'. Warning: Unrecognised configuration name: selection
INFO - Cleaning site directory
D:\personal\Study\MadeWithML\mlops\mwml\lib\site-packages\mkdocstrings\handlers\python\__init__.py:13: UserWarning: The 'python-legacy' extra of mkdocstrings will become mandatory in the next release. We have no way to detect if you already specify it, so if you do, please ignore this warning. You can globally disable it with the PYTHONWARNINGS environment variable: PYTHONWARNINGS=ignore::UserWarning:mkdocstrings.handlers.python
warnings.warn(
ERROR - mkdocstrings: module 'tagifai' has no attribute 'data'
Traceback (most recent call last):
File "D:\personal\Study\MadeWithML\mlops\mwml\lib\site-packages\pytkdocs\cli.py", line 205, in main
output = json.dumps(process_json(line))
File "D:\personal\Study\MadeWithML\mlops\mwml\lib\site-packages\pytkdocs\cli.py", line 114, in process_json
return process_config(json.loads(json_input))
File "D:\personal\Study\MadeWithML\mlops\mwml\lib\site-packages\pytkdocs\cli.py", line 91, in process_config
obj = loader.get_object_documentation(path, members)
File "D:\personal\Study\MadeWithML\mlops\mwml\lib\site-packages\pytkdocs\loader.py", line 355, in get_object_documentation
leaf = get_object_tree(dotted_path, self.new_path_syntax)
File "D:\personal\Study\MadeWithML\mlops\mwml\lib\site-packages\pytkdocs\loader.py", line 281, in get_object_tree
obj = getattr(current_node.obj, obj_name)
AttributeError: module 'tagifai' has no attribute 'data'
ERROR - Error reading page 'tagifai\data.md':
ERROR - Could not collect 'tagifai.data'

Aborted with a BuildError!

Please provide your assistance.

Thanks

Conflicting directory for config.py

The file config.py is in the app directory, which is consistent with the "Directory structure" described in the README.
At the same time, the first line of config.py (# tagifai/config.py) indicates that the file should be in the tagifai directory. Also, the first code snippet in the Testing->Data section has from tagifai import config, while the snippet in Scripting->Organization->Organizing->Operations has from config import config.

Unable to install dependencies on Mac M1

System Details

python-version: 3.9.16
pip: 23.0
MAC OS Monterey 12.2.1

After running the command python3 -m pip install -e ".[dev]", I get the following error (Some of the error trace has been trimmed since it is too long)

            error: Command "clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX12.sdk -DNPY_INTERNAL_BUILD=1 -DHAVE_NPY_CONFIG_H=1 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -DNO_ATLAS_INFO=3 -DHAVE_CBLAS -Ibuild/src.macosx-12-arm64-3.9/numpy/core/src/umath -Ibuild/src.macosx-12-arm64-3.9/numpy/core/src/npymath -Ibuild/src.macosx-12-arm64-3.9/numpy/core/src/common -Inumpy/core/include -Ibuild/src.macosx-12-arm64-3.9/numpy/core/include/numpy -Inumpy/core/src/common -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -I/Users/ganesh/projects/MLops/mlops-course/venv/include -I/opt/homebrew/Cellar/[email protected]/3.9.16/Frameworks/Python.framework/Versions/3.9/include/python3.9 -Ibuild/src.macosx-12-arm64-3.9/numpy/core/src/common -Ibuild/src.macosx-12-arm64-3.9/numpy/core/src/npymath -c numpy/core/src/multiarray/buffer.c -o build/temp.macosx-12-arm64-3.9/numpy/core/src/multiarray/buffer.o -MMD -MF build/temp.macosx-12-arm64-3.9/numpy/core/src/multiarray/buffer.o.d -faltivec -I/System/Library/Frameworks/vecLib.framework/Headers" failed with exit status 1
            [end of output]

        note: This error originates from a subprocess, and is likely not a problem with pip.
        ERROR: Failed building wheel for numpy
      Failed to build numpy
      ERROR: Could not build wheels for numpy, which is required to install pyproject.toml-based projects
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Python 3.8.16 also produces the same error.

With python 3.10.9, I get the following error

ERROR: Cannot install tagifai and tagifai[dev]==0.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    tagifai[dev] 0.1 depends on numpy==1.19.5
    great-expectations 0.15.15 depends on numpy>=1.14.1
    imbalanced-learn 0.8.1 depends on numpy>=1.13.3
    mlflow 1.23.1 depends on numpy
    numpyencoder 0.3.0 depends on numpy>=1.14.3
    optuna 2.10.0 depends on numpy
    pandas 1.3.5 depends on numpy>=1.21.0; python_version >= "3.10"

I saw that this issue refers to a similar problem; however, Python 3.7 seems to be deprecated for my system.

Any help would be appreciated, thank you!
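When a numpy wheel build fails on Apple Silicon, it is worth confirming which architecture the interpreter itself targets (an x86_64 Python running under Rosetta will try to build x86_64 wheels). A small diagnostic sketch:

```python
import platform
import sys

def interpreter_info():
    """Report the running interpreter's version and target architecture."""
    return {
        "python": platform.python_version(),
        # "arm64" for a native Apple Silicon build, "x86_64" under Rosetta
        "machine": platform.machine(),
        "platform": sys.platform,
    }

if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```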

error with arguments in sns.barplot()

Error with the arguments to sns.barplot() in the EDA section.

https://github.com/GokuMohandas/mlops-course/blob/main/notebooks/tagifai.ipynb


TypeError Traceback (most recent call last)
in <cell line: 4>()
2 tags, tag_counts = zip(*Counter(df.tag.values).most_common())
3 plt.figure(figsize=(10, 3))
----> 4 ax = sns.barplot(list(tags), list(tag_counts))
5 plt.title("Tag distribution", fontsize=20)
6 plt.xlabel("Tag", fontsize=16)

TypeError: barplot() takes from 0 to 1 positional arguments but 2 were given

The seaborn version in the Colab notebook is 0.12.2.
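In seaborn 0.12+, barplot no longer accepts the data vectors positionally, so passing x and y by keyword fixes the call. A minimal sketch (the tag list below is illustrative, not the course dataset):

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

raw_tags = ["cv", "nlp", "cv", "mlops", "cv", "nlp"]  # illustrative data
tags, tag_counts = zip(*Counter(raw_tags).most_common())

plt.figure(figsize=(10, 3))
# seaborn >= 0.12: pass x and y by keyword instead of positionally
ax = sns.barplot(x=list(tags), y=list(tag_counts))
ax.set_title("Tag distribution")
```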
