
Comments (7)

denimalpaca commented on July 24, 2024

I was able to reproduce this, but I'm still not sure how it relates to this operator specifically rather than to the GX project as a whole. As far as I can tell, nothing in the operator itself uses protobuf.

Tagging @kaxil because he may know more about what is the underlying issue here.

from airflow-provider-great-expectations.

denimalpaca commented on July 24, 2024

Hey @diman82, I don't think this is an issue with the GX Operator. In Airflow, the current main constraints pin protobuf to 3.20.0, and the Google provider also requires this version. So what you are seeing is likely a symptom of an underlying conflict between your protobuf version and Airflow.

You can always try installing protobuf 4.x.x in a virtual environment together with these other Google packages and running them inside the venv.
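One way the venv suggestion above could be scripted is sketched below with Python's standard `venv` module. The helper names, the directory name `protobuf4-venv`, and the version range in the usage comment are assumptions for illustration only; nothing here is specific to Airflow or the GX provider.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(venv_dir: Path) -> Path:
    """Return the path of the interpreter inside a virtual environment."""
    if sys.platform == "win32":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def pip_install_command(venv_dir: Path, packages: list[str]) -> list[str]:
    """Build the argv that installs `packages` with the venv's own pip."""
    return [str(venv_python(venv_dir)), "-m", "pip", "install", *packages]


def build_isolated_env(venv_dir: Path, packages: list[str]) -> None:
    """Create the venv (with pip) and install the packages into it."""
    venv.create(venv_dir, with_pip=True)
    subprocess.run(pip_install_command(venv_dir, packages), check=True)


# Hypothetical usage (needs network access, so it is left commented out):
# build_isolated_env(Path("protobuf4-venv"), ["protobuf>=4.21.5,<5", "google-ads==19.0.0"])
```

Code that needs protobuf 4.x would then be run with the interpreter returned by `venv_python`, which keeps it apart from the protobuf version installed alongside Airflow.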


diman82 commented on July 24, 2024

@denimalpaca I only hit this problem when I install the airflow-provider-great-expectations package together with the google-ads package.
When I install only google-ads (without airflow-provider-great-expectations), there are no errors: protobuf is upgraded to >=4.21.5 and nothing breaks. But when I add airflow-provider-great-expectations and pin protobuf==3.20.1, I get the error below:

#9 47.08 ERROR: Cannot install -r requirements.txt (line 25) and protobuf==3.20.1 because these package versions have conflicting dependencies.
#9 47.08
#9 47.08 The conflict is caused by:
#9 47.08     The user requested protobuf==3.20.1
#9 47.08     google-ads 19.0.0 depends on protobuf>=4.21.5
#9 47.08
#9 47.08 To fix this you could try to:
#9 47.08 1. loosen the range of package versions you've specified
#9 47.08 2. remove package versions to allow pip attempt to solve the dependency conflict
#9 47.08
#9 47.08 ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

To sum up: this isn't a problem with the Airflow/Google package combination, because my setup worked fine before. The errors described above appear only once I add the airflow-provider-great-expectations package.
BTW, I've verified that installing the latest google-ads package on the latest Airflow bumps protobuf up to 4.21.12.
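The conflict in the pip output comes down to a pin that falls below a required minimum. Below is a minimal sketch of that comparison. It is a simplified stand-in that handles only plain dotted numeric versions; it is not pip's resolver and not full PEP 440 parsing, and the function names are made up for illustration.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '4.21.5' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def satisfies_minimum(pinned: str, minimum: str) -> bool:
    """Check whether a pinned version meets a '>=' lower bound."""
    return parse_version(pinned) >= parse_version(minimum)


# google-ads 19.0.0 requires protobuf>=4.21.5 (taken from the pip error above).
print(satisfies_minimum("3.20.1", "4.21.5"))   # False: the pin that pip rejects
print(satisfies_minimum("4.21.12", "4.21.5"))  # True: the version pip picks unpinned
```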


denimalpaca commented on July 24, 2024

I'm trying to reproduce this with a requirements.txt file that looks like:

google-ads==19.0.0
protobuf==4.21.12
airflow-provider-great-expectations==0.2.4

and using the astro runtime image 7.2.0, running Airflow 2.5.1. I did not experience any errors when I started Airflow, nor when I manually installed these packages into a virtual environment.

Could you please share your Airflow version and the complete requirements.txt file you're using?


diman82 commented on July 24, 2024

Sure.
requirements.txt:
apache-airflow-providers-amazon==7.1.0
apache-airflow-providers-github==2.2.0
airflow-provider-great-expectations==0.2.4
apache-airflow-providers-http==4.1.1
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-mongo==3.1.1
apache-airflow-providers-mysql==4.0.0
apache-airflow-providers-pagerduty==3.1.0
apache-airflow-providers-papermill==3.1.1
apache-airflow-providers-postgres==5.4.0
apache-airflow-providers-redis==3.1.0
apache-airflow-providers-sendgrid==3.1.0
apache-airflow-providers-sftp==4.2.1
apache-airflow-providers-slack==7.2.0
apache-airflow-providers-sqlite==3.3.1
apache-airflow-providers-ssh==3.4.0
apache-airflow-providers-vertica==3.3.1
authlib==1.2.0
datetime==5.0
Fraction==2.2.0
great-expectations==0.15.47
ipython==8.9.0
ipykernel==6.21.1
google-ads==19.0.0
google-auth==2.16.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.8.0
google-cloud-storage==2.7.0
matplotlib==3.6.3
openpyxl==3.1.0
pandas==1.5.3
pyarrow==11.0.0
s3fs==2023.1.0
scipy==1.10.0
smart-open==6.3.0
statistics==1.0.3.5
statsd==4.0.1
toolz==0.12.0
icecream==2.1.3

Please try adding the following sample DAG; you should see a DAG import/load error (as described in the initial message of the thread):

import os
from datetime import datetime
from pathlib import Path

from airflow import DAG
from airflow.models.baseoperator import chain
from great_expectations_provider.operators.great_expectations import \
    GreatExpectationsOperator

from utils.helpers import sql_templates_dir

table = "country_codes"
base_path = Path(__file__).parents[2]
ge_root_dir = os.path.join(base_path, "include", "great_expectations")

with DAG(
    "great_expectations.athena",
    start_date=datetime(2023, 2, 2),
    description="Example DAG showcasing loading and data quality checking with Athena and Great Expectations.",
    doc_md=__doc__,
    schedule_interval=None,
    template_searchpath=os.path.join(sql_templates_dir, 'great_expectations'),
    catchup=False,
) as dag:
    """
    #### Great Expectations suite
    Run the Great Expectations suite on the table.
    """
    ge_athena_validation = GreatExpectationsOperator(
        task_id="ge_athena_validation",
        data_context_root_dir=ge_root_dir,
        conn_id="aws_default",
        expectation_suite_name="dbr.country_codes",
        data_asset_name=table,
        fail_task_on_validation_failure=False,
    )

    chain(
        ge_athena_validation,
    )


denimalpaca commented on July 24, 2024

I ran pip show protobuf to see what depends on it. This is the output (it does not list Great Expectations):

astro@9f92f792dc78:/usr/local/airflow$ pip show protobuf
Name: protobuf
Version: 4.21.12
Summary: 
Home-page: https://developers.google.com/protocol-buffers/
Author: [email protected]
Author-email: [email protected]
License: 3-Clause BSD License
Location: /usr/local/lib/python3.9/site-packages
Requires: 
Required-by: apache-airflow-providers-google, google-ads, google-api-core, google-cloud-aiplatform, google-cloud-appengine-logging, google-cloud-audit-log, google-cloud-automl, google-cloud-bigquery, google-cloud-bigquery-datatransfer, google-cloud-bigquery-storage, google-cloud-bigtable, google-cloud-build, google-cloud-container, google-cloud-datacatalog, google-cloud-dataform, google-cloud-dataplex, google-cloud-dataproc, google-cloud-dataproc-metastore, google-cloud-dlp, google-cloud-kms, google-cloud-language, google-cloud-logging, google-cloud-memcache, google-cloud-monitoring, google-cloud-orchestration-airflow, google-cloud-os-login, google-cloud-pubsub, google-cloud-redis, google-cloud-resource-manager, google-cloud-secret-manager, google-cloud-spanner, google-cloud-speech, google-cloud-tasks, google-cloud-texttospeech, google-cloud-translate, google-cloud-videointelligence, google-cloud-vision, google-cloud-workflows, googleapis-common-protos, grpcio-status, proto-plus

I'm not sure why this would affect GX or the provider. My best guess is that there is an incompatibility with this Airflow version and it surfaces as a cryptic error here. I definitely think this should be raised as an Airflow issue.
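The "Required-by" check can also be approximated from Python with the standard `importlib.metadata` module. This is a rough sketch and not how pip computes the field: the helper names are hypothetical, and skipping requirements that carry an `extra ==` marker is only a heuristic for ignoring optional dependencies.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a distribution name so that 'Google_Ads' matches 'google-ads'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the normalized project name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else ""


def required_by(package: str) -> list[str]:
    """List installed distributions that declare `package` as a requirement."""
    target = normalize(package)
    dependents = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if not name:
            continue  # skip distributions with broken metadata
        for requirement in dist.requires or []:
            if "extra ==" in requirement:
                continue  # heuristic: ignore dependencies behind optional extras
            if requirement_name(requirement) == target:
                dependents.add(name)
    return sorted(dependents)


print(required_by("protobuf"))
```

If great-expectations or the provider were missing from that list in the affected environment, it would be consistent with the output above, where neither appears under Required-by.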


diman82 commented on July 24, 2024

OK, found the cause of this issue: I'd been using standalone packages (e.g. 'google-ads') instead of the apache-airflow-providers-google package.
After I switched to the apache-airflow-providers-google package, the problem was gone.

