Comments (38)
@eladkal please exclude openlineage provider from this wave, and if possible, let's go for rc2.
#40353 is causing the scheduler crash. I prepared a revert commit in #40402 that should be included in rc2. I also described the bug and provided some logs in #40403, but I guess for now just reverting this will allow us to move forward with the release.
from airflow.
Thank you everyone. Providers are released.
openlineage provider is excluded from this wave
I invite everyone to help improve providers for the next release, a list of open issues can be found here.
Tested #40287, working fine. Thanks
tested #40290, working fine 👍🏻
tested #39955, working fine, thanks!
Confirmed #40206 works as expected
I have tested #39991 and it works as expected.
[2024-06-23, 04:47:48 UTC] {beam.py:119} WARNING - {"message":"org.apache.beam.runners.dataflow.DataflowRunner - Dataflow SDK version: 2.56.0","severity":"INFO"}
[2024-06-23, 04:47:49 UTC] {beam.py:119} WARNING - {"message":"org.apache.beam.runners.dataflow.DataflowRunner - To access the Dataflow monitoring console, please navigate to https://console.cloud.google.com/dataflow/jobs/<redacted>","severity":"INFO"}
[2024-06-23, 04:47:49 UTC] {beam.py:119} WARNING - {"message":"org.apache.beam.runners.dataflow.DataflowRunner - Submitted job: <redacted>","severity":"INFO"}
[2024-06-23, 04:47:49 UTC] {beam.py:119} WARNING - {"message":"org.apache.beam.runners.dataflow.DataflowRunner - To cancel the job using the \u0027gcloud\u0027 tool, run:\n\u003e gcloud dataflow jobs --project\u003d<redacted> cancel --region\u003<redacted> <redacted>","severity":"INFO"}
[2024-06-23, 04:47:49 UTC] {beam.py:172} INFO - Process exited with return code: 0
[2024-06-23, 04:47:49 UTC] {dataflow.py:461} INFO - Start waiting for done.
[2024-06-23, 04:47:49 UTC] {dataflow.py:403} INFO - Google Cloud DataFlow job <redacted> is state: JOB_STATE_PENDING
[2024-06-23, 04:47:49 UTC] {dataflow.py:464} INFO - Waiting for done. Sleep 10 s
[2024-06-23, 04:47:59 UTC] {dataflow.py:403} INFO - Google Cloud DataFlow job <redacted> is state: JOB_STATE_PENDING
[2024-06-23, 04:47:59 UTC] {dataflow.py:464} INFO - Waiting for done. Sleep 10 s
[2024-06-23, 04:48:09 UTC] {dataflow.py:403} INFO - Google Cloud DataFlow job <redacted> is state: JOB_STATE_RUNNING
[2024-06-23, 04:48:09 UTC] {taskinstance.py:1401} INFO - Marking task as SUCCESS. dag_id=<redacted>, task_id=start_streaming, map_index=0, execution_date=20240623T044637, start_date=20240623T044717, end_date=20240623T044809
[2024-06-23, 04:48:09 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code 0
[2024-06-23, 04:48:09 UTC] {taskinstance.py:2781} INFO - 0 downstream tasks scheduled from follow-on schedule check
Changes in #40253 work as expected.
Tested #40041 and it works as expected.
Tested #38497, works as expected.
Can you please remove the Teradata PR #40378 from this 2.3.0 release? We wanted to release the compute cluster functionality along with it. We are in the process of creating a new PR for the compute cluster, and we would like to release 2.3.0 with these two features as per our roadmap.
#39889 is working as expected
Verified #40080
#40023 works as expected
#39348 looks good. thank you!
Tested #40013, #39771 and #39941. All work fine. Thank you for the release efforts!
#40062 working as expected
#40278 works as expected
Tested all of my work (#38868, #39154, #40048, #40086, #40162) and they all work as expected!
Thanks! 🥂
#40301, #40300 and #40297 are all working as expected
Hello. Could you please exclude the ydb provider from this release? It's too young for that ;) It has a known bug related to auth, and I'd like to add a new feature there later. Thank you in advance.
Hello. Could you please exclude the ydb provider from this release? It's too young for that ;) It has a known bug related to auth, and I'd like to add a new feature there later. Thank you in advance.
In your PR you marked it as ready to be released
Missing features are not a blocker for release. We can always add new features later on.
Hello. Could you please exclude the ydb provider from this release? It's too young for that ;) It has a known bug related to auth, and I'd like to add a new feature there later. Thank you in advance.
In your PR you marked it as ready to be released
Missing features are not a blocker for release. We can always add new features later on.
To be honest, the YAML was just copied, and the bug was found two days ago ;) Auth info from a file does not work, but there is a workaround: provide the auth info directly in place.
Checked that all changes are there. The common.compat provider does not have any code yet, so we could skip releasing it, but there is no harm in doing so - having a 1.0.0 version released on PyPI is generally a good idea.
#39520 stopped working correctly (something changed between now and when I wrote it) and I'm trying to debug what exactly is causing it.
Can you please remove the Teradata PR #40378 from this 2.3.0 release? We wanted to release the compute cluster functionality along with it. We are in the process of creating a new PR for the compute cluster, and we would like to release 2.3.0 with these two features as per our roadmap.
It's of course @eladkal's (the Release Manager's) decision - but no, @satish-chinthanippu, this is not how our providers' release process works. We release every changed provider from main, and we only exclude a provider (not individual PRs) from a release if a serious bug has been found. Manipulating and manually modifying stuff during the release is not easy, takes time and effort, might break various parts of the process (like documentation generation, or package preparation, publishing, signing and verification), and introduces serious overhead for the release process that we don't want.
We have more than 90 providers and we cannot afford individual treatment and such a "custom" approach.
If there is a bug / regression in existing functionality that is a blocker - we might remove the whole provider from the set of providers being voted on - but that's about all the flexibility there is, and unless there is a bug in the Teradata provider, it will be released as-is.
We do not look at others' roadmaps - this is a bit of the price to pay for making the provider a "community managed" one, and we've been very clear about the process when we accepted Teradata - one of the reasons why Teradata could have chosen to release their own provider was that this would have helped them manage their own release roadmap and schedule. Once this is a community provider, we do expect Teradata to keep it updated, maintain the dashboard and fix bugs (in their own interest), but this also means that anyone can contribute changes (and Airflow committers make decisions on what goes in and out), and also that we release it together with the other providers with the same cadence.
I hope that explains how it works :). This is not a complaint or being nasty (we appreciate all the work done, the system test dashboards, and all the new features you add). It's just that the way we manage 90+ providers in a release has to have some limitations and structure, and the governance requirements of the ASF are very clear that once code is submitted to our repo it has to follow the ASF rules, where only Airflow committers can decide on code modifications. So I wanted to make that clear.
Hello, @potiuk. Thank you for the clarification. I'd like to ask how we can improve the quality of this release - or is that not a goal? Usually a release means something good and stable to use, but you say that improving it is a big overhead. That means that in some situations a release is something dirty - it is easy to release. I suggest making this clear to users who install such a "release". It could be just a note that particular providers have some known problems, or giving authors a release branch to merge fixes into. Or even exclude the provider from the release.
Or even exclude the provider from the release.
Yes. This is how it works. Unit tests are the first line of defense, and we assume that when unit tests pass in main, the provider is ready to have a release candidate cut. This conversation here is then meant to surface any bugs that should block certain providers and remove them from the release. But this is NOT for blocking certain features from being released; it is only to check that there are no blocker bugs. Whatever gets merged into main is assumed to be "ready for the next release candidate". If you do not want a PR to be merged yet, you should keep rebasing it and mark it as Draft until you feel it is ready to be released in the next release candidate (whenever it happens).
If there is a blocker bug / regression, we exclude the provider from the release. But it's a 0/1 decision, based on the release manager's assessment of whether it's ok to release a particular provider or not, informed by the descriptions of those who test the RC here. If a bug is found during the RC, those who find it should describe the scope and impact of the bug, and the release manager assesses it and decides what to do. This is at the sole discretion of the release manager (see https://www.apache.org/legal/release-policy.html#approving-a-release and related documentation on the release process requirements of the Apache Software Foundation).
In our case we are ok to release new features even with minor, non-blocking bugs; the "strong" reason for excluding a provider is a major regression in already-released features. Sometimes we even decide to release providers when new features are not complete, if the partial implementation "works" and further work is planned (it will then be released in the next wave).
We do not "hold" releases, we release everything that has been merged to main. Full stop. This has been working like that for ~ 4 years for 90+ providers of ours.
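The 0/1 exclusion rule described above can be sketched as a trivial filter. This is a hypothetical illustration, not the actual Airflow release tooling; the provider names and the blocker set are invented for the example:

```python
# Hypothetical sketch of the 0/1 rule: a changed provider either ships whole
# or is dropped from the wave; individual PRs are never cherry-picked out.
def providers_to_release(changed, blocker_bugs):
    """Keep every changed provider that has no blocker bug reported against it."""
    return [p for p in changed if p not in blocker_bugs]

changed = ["openlineage", "teradata", "ydb", "google"]   # illustrative only
blockers = {"openlineage"}   # e.g. a scheduler-crash regression found in the RC
print(providers_to_release(changed, blockers))   # ['teradata', 'ydb', 'google']
```

The point of the sketch is that the only knob is per-provider inclusion; there is no finer-grained control over which merged PRs go out.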
BTW. And just to clarify - per the ASF definition, the release manager's job is purely mechanical (+ single-handedly making the decision to exclude certain providers based on the assessment of bug descriptions provided by those who test them).
See also here: https://infra.apache.org/release-publishing.html#releasemanager
The release manager releases whatever the community decided to merge as "ready to be released". In the case of providers, being in "main" is the "ready for release" sign, so when you mark your PR as "ready to be merged" and it passes all tests, you as an author are saying "it's ready to be released".
That's why in those "Status of the providers" issues we also mention the authors, so that they can verify just before the release that there are no blocking bugs. See #40382 (comment), where @kacpermuda is still evaluating the impact for the openlineage provider. But again - this is for bugs only. What goes into the next release is decided at merge time. See also https://github.com/apache/airflow/blob/main/PROVIDERS.rst#community-providers-release-process- where the release process and various aspects of it are explained.
So just to summarize in short - the release manager is NOT responsible for the quality of the merged changes, nor for the set of changes that are being prepared as release candidates. In both cases the authors are responsible - both for what goes in and for the quality of what goes in. Release manager is a purely mechanical role to make the release happen, but the authors (with the approval of the committers who merge the changes) drive both the scope and the quality of the release. No one else. And the authors have a chance to verify their changes once the RC is out, and a chance to say "hey, there is a blocker bug, I will fix it for a future release but for now let's remove the provider from the release".
I think this is a very, very clear split of responsibilities, and I am explaining it here so that it's crystal clear, as different people might have different assumptions about what the release manager's and authors' roles in the process are, and when tests are done.
Hello. Could you please exclude the ydb provider from this release? It's too young for that ;) It has a known bug related to auth, and I'd like to add a new feature there later. Thank you in advance.
Let's release the ydb provider. The known issues are minor.
Thank you @potiuk for the detailed information. Understood the process and considerations regarding the release. Actually, @eladkal suggested raising individual PRs for each related piece of functionality implemented in the Airflow Teradata Provider to make the review process simpler. So, in line with this suggestion, we thought of raising individual PRs for the two features we have planned for this release.
We're committed to aligning with the community standards and appreciate the governance framework outlined by ASF.
Given this understanding, we'll proceed with the new PR for the compute cluster functionality alongside Teradata's provider updates as per the standard release cadence. We'll ensure that our contributions meet the necessary criteria and are compatible with the overall release process.
Please let us know if there are any specific guidelines or additional steps we should follow as we prepare these updates with multiple PRs for a single release.
#40378 working as expected.
Actually, @eladkal suggested raising individual PRs for each related piece of functionality implemented in the Airflow Teradata Provider to make the review process simpler.
Which we stand for.
I do not understand the concern you raised. Merged PR = ready to release.
What is the problem with releasing it as is?
Yep. This is all good and I stand for it too. Raising a PR <> merging a PR. If you wish a PR to wait because it needs to be released together with another, related PR, the PR can be kept in draft, or with an unresolved conversation explaining that it should not yet be merged, and the other PR can be added on top (based on the first PR); both can be rebased until both are ready to be released. I believe there was a lack of understanding that "merged" = "ready to release" for providers, so I hope it's now clear.
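The stacking workflow described above can be sketched locally with plain git. This is a minimal hypothetical demo (branch names, paths and commit messages are invented, and this is not part of any Airflow tooling): keep the two related changes on stacked branches, and when the default branch moves on, rebase the stack in order.

```shell
# Hypothetical sketch: two related changes stacked so neither merges alone.
set -e
rm -rf /tmp/stacked-pr-demo && mkdir /tmp/stacked-pr-demo && cd /tmp/stacked-pr-demo
git init -q
git config user.email demo@example.com
git config user.name demo
base=$(git symbolic-ref --short HEAD)        # default branch name (main or master)
echo base > base.txt && git add . && git commit -qm "base"

git checkout -q -b part-1                    # first (draft) PR branch
echo one > part1.txt && git add . && git commit -qm "feature part 1"

git checkout -q -b part-2                    # second PR, built on top of the first
echo two > part2.txt && git add . && git commit -qm "feature part 2"

# Meanwhile the default branch moves on:
git checkout -q "$base"
echo more >> base.txt && git commit -qam "unrelated change"

# Rebase the stack in order until both parts are ready to merge together:
git checkout -q part-1 && git rebase -q "$base"
git checkout -q part-2 && git rebase -q part-1
git log --oneline                            # part-2 now carries the updated stack
```

On GitHub, part-1 would stay a Draft PR until part-2 is ready, which is exactly the "keep rebasing it and mark it as Draft" advice above.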
@potiuk and @eladkal, understood. These steps clarify the process to follow to release related features with multiple PRs in a single release. Thank you for providing the detailed information.
Please release it. #40378 Tested and working as expected.