Comments (7)
I was able to reproduce this, but I'm still not sure how it is directly related to this operator rather than to the GX project as a whole. I can't see anything in the operator itself that uses protobuf.
Tagging @kaxil because he may know more about what the underlying issue is here.
from airflow-provider-great-expectations.
Hey @diman82 , I don't think this is an issue with the GX Operator. In Airflow, the current main constraints pin protobuf to 3.20.0, and the Google provider also requires this version. So you are likely seeing a symptom of an underlying incompatibility between your protobuf version and Airflow.
You can always try installing protobuf 4.x in a virtual environment together with these other Google packages and running them within the venv.
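A minimal sketch of that venv-isolation test might look like the following (the venv path and version pins are illustrative, not a verified working set):

```shell
# Create a throwaway venv and try protobuf 4.x alongside google-ads there,
# without touching the Airflow environment's pinned protobuf.
python3 -m venv /tmp/proto4-venv
/tmp/proto4-venv/bin/pip install --quiet "protobuf>=4.21.5,<5" google-ads
/tmp/proto4-venv/bin/python -c "import google.protobuf as p; print(p.__version__)"
```

If the install resolves and the import succeeds, the conflict is specific to the constraint set in the main environment rather than to the packages themselves.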
@denimalpaca I'm facing this problem only when installing the airflow-provider-great-expectations package in combination with the google-ads package.
When installing the google-ads package alone (without airflow-provider-great-expectations), there are no errors (protobuf is upgraded to >=4.21.5 without any problems), but when I add the airflow-provider-great-expectations package and pin protobuf==3.20.1, I get the error below:
ERROR: Cannot install -r requirements.txt (line 25) and protobuf==3.20.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested protobuf==3.20.1
    google-ads 19.0.0 depends on protobuf>=4.21.5

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
To sum up: it isn't a problem with the airflow/google package combination, since I hadn't experienced any problems in my setup before; the errors above appear only when I add the airflow-provider-great-expectations package.
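In other words, the conflict reduces to two incompatible pins (the comments are illustrative, summarizing what the resolver reported above):

```
google-ads==19.0.0   # requires protobuf>=4.21.5
protobuf==3.20.1     # ceiling coming from the Airflow constraints / GX provider setup
```

No resolver can satisfy both lines at once, so pip aborts with ResolutionImpossible.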
BTW, I've verified that installing the latest google-ads package on the latest Airflow bumps the protobuf version up to 4.21.12.
I'm trying to reproduce this with a requirements.txt file that looks like:
google-ads==19.0.0
protobuf==4.21.12
airflow-provider-great-expectations==0.2.4
and using the astro runtime image 7.2.0, running Airflow 2.5.1. I did not experience any errors when I started Airflow, nor when I manually installed these packages into a virtual environment.
Could you please share your Airflow version and the complete requirements.txt file you're using?
Sure.
requirements.txt:
apache-airflow-providers-amazon==7.1.0
apache-airflow-providers-github==2.2.0
airflow-provider-great-expectations==0.2.4
apache-airflow-providers-http==4.1.1
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-mongo==3.1.1
apache-airflow-providers-mysql==4.0.0
apache-airflow-providers-pagerduty==3.1.0
apache-airflow-providers-papermill==3.1.1
apache-airflow-providers-postgres==5.4.0
apache-airflow-providers-redis==3.1.0
apache-airflow-providers-sendgrid==3.1.0
apache-airflow-providers-sftp==4.2.1
apache-airflow-providers-slack==7.2.0
apache-airflow-providers-sqlite==3.3.1
apache-airflow-providers-ssh==3.4.0
apache-airflow-providers-vertica==3.3.1
authlib==1.2.0
datetime==5.0
Fraction==2.2.0
great-expectations==0.15.47
ipython==8.9.0
ipykernel==6.21.1
google-ads==19.0.0
google-auth==2.16.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.8.0
google-cloud-storage==2.7.0
matplotlib==3.6.3
openpyxl==3.1.0
pandas==1.5.3
pyarrow==11.0.0
s3fs==2023.1.0
scipy==1.10.0
smart-open==6.3.0
statistics==1.0.3.5
statsd==4.0.1
toolz==0.12.0
icecream==2.1.3
Please try adding the following sample DAG; you should see a DAG import/load error (as described in the initial message of the thread):
import os
from datetime import datetime
from pathlib import Path

from airflow import DAG
from airflow.models.baseoperator import chain
from great_expectations_provider.operators.great_expectations import (
    GreatExpectationsOperator,
)

from utils.helpers import sql_templates_dir

table = "country_codes"
base_path = Path(__file__).parents[2]
ge_root_dir = os.path.join(base_path, "include", "great_expectations")

with DAG(
    "great_expectations.athena",
    start_date=datetime(2023, 2, 2),
    description="Example DAG showcasing loading and data quality checking with Athena and Great Expectations.",
    doc_md=__doc__,
    schedule_interval=None,
    template_searchpath=os.path.join(sql_templates_dir, "great_expectations"),
    catchup=False,
) as dag:
    """
    #### Great Expectations suite
    Run the Great Expectations suite on the table.
    """
    ge_athena_validation = GreatExpectationsOperator(
        task_id="ge_athena_validation",
        data_context_root_dir=ge_root_dir,
        conn_id="aws_default",
        expectation_suite_name="dbr.country_codes",
        data_asset_name=table,
        fail_task_on_validation_failure=False,
    )

    chain(ge_athena_validation)
I ran pip show protobuf to see which packages depend on it; this is the output (it does not include Great Expectations):
astro@9f92f792dc78:/usr/local/airflow$ pip show protobuf
Name: protobuf
Version: 4.21.12
Summary:
Home-page: https://developers.google.com/protocol-buffers/
Author: [email protected]
Author-email: [email protected]
License: 3-Clause BSD License
Location: /usr/local/lib/python3.9/site-packages
Requires:
Required-by: apache-airflow-providers-google, google-ads, google-api-core, google-cloud-aiplatform, google-cloud-appengine-logging, google-cloud-audit-log, google-cloud-automl, google-cloud-bigquery, google-cloud-bigquery-datatransfer, google-cloud-bigquery-storage, google-cloud-bigtable, google-cloud-build, google-cloud-container, google-cloud-datacatalog, google-cloud-dataform, google-cloud-dataplex, google-cloud-dataproc, google-cloud-dataproc-metastore, google-cloud-dlp, google-cloud-kms, google-cloud-language, google-cloud-logging, google-cloud-memcache, google-cloud-monitoring, google-cloud-orchestration-airflow, google-cloud-os-login, google-cloud-pubsub, google-cloud-redis, google-cloud-resource-manager, google-cloud-secret-manager, google-cloud-spanner, google-cloud-speech, google-cloud-tasks, google-cloud-texttospeech, google-cloud-translate, google-cloud-videointelligence, google-cloud-vision, google-cloud-workflows, googleapis-common-protos, grpcio-status, proto-plus
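The "Required-by" list above can also be recovered programmatically; here is a rough standard-library sketch (the requirement-string parsing is simplified and may miss unusual specifier syntaxes):

```python
from importlib import metadata


def reverse_deps(target: str) -> list[str]:
    """Return the names of installed distributions that require `target`."""
    users = set()
    for dist in metadata.distributions():
        for req in dist.requires or []:
            # Requirement strings look like "protobuf (>=3.12.0)" or
            # "protobuf>=4.21.5; extra == 'grpc'" — strip markers and
            # specifiers to recover the bare distribution name.
            name = req.split(";")[0].split(" ")[0]
            for sep in (">", "<", "=", "!", "~", "(", "["):
                name = name.split(sep)[0]
            if name.strip().lower() == target.lower():
                users.add(dist.metadata["Name"])
    return sorted(users)


print(reverse_deps("protobuf"))
```

This mirrors what pip show reports in Required-by, which is handy when debugging a resolver conflict inside a container where pipdeptree isn't installed.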
I'm not sure why this would affect GX or the provider, except that there is an incompatibility with this Airflow version and the cryptic error you're seeing is a symptom of it. I definitely think this should be raised as an Airflow issue.
OK, I found the cause of this issue: I had been installing standalone Google packages (e.g. google-ads) instead of the apache-airflow-providers-google package.
After I switched to apache-airflow-providers-google, the problem was gone.
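For reference, the fix amounts to replacing the standalone Google SDK pins in requirements.txt with the Airflow provider package (entries below are illustrative; the provider brings in a mutually compatible Google stack, including google-ads, at versions matching the Airflow constraints):

```
# before
# google-ads==19.0.0
# google-cloud-storage==2.7.0

# after
apache-airflow-providers-google
```

Letting the provider manage the Google dependencies avoids pinning protobuf into a range the rest of the environment cannot satisfy.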