
feature: Capture run ID (meltano issue, closed)

menzenski commented on May 27, 2024
feature: Capture run ID

from meltano.

Comments (7)

edgarrmondragon commented on May 27, 2024

>   • Invoke meltano with meltano run tap-my-source target-my-destination --run-id=abc123
>   • In the runs table, the record for this run has that persisted on the payload as a new "metadata": {"run-id":"abc123"} field.

@menzenski Would this have a different value to the run_id column in the runs table? If so, I can imagine it could lead to some confusion.

FWIW if you wanna check out the approach, I was able to experiment with a --run-id=... option in #8459 and I'm able to see the value correctly set in the runs table:

[Screenshot, 2024-03-22 11:12:49 a.m.]


menzenski commented on May 27, 2024

@edgarrmondragon sorry for my delayed response here, I was out of office and missed your update - the draft PR https://github.com/meltano/meltano/pull/8459/files looks awesome, that'd totally work for our use case. (I confirmed that Argo Workflows is using v4 UUID strings).


menzenski commented on May 27, 2024

@edgarrmondragon I put Meltano 3.4.0 into production today - we're using this new --run-id flag to set the Meltano run ID to the workflow ID of the Argo Workflows workflow that runs Meltano.

It works great! Huge quality-of-life improvement for us. Thanks so much for implementing this!
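For readers wiring this up themselves, here is a minimal sketch of how an orchestrator-supplied ID might be passed through to the --run-id flag described above. The environment variable name ORCHESTRATOR_WORKFLOW_ID and the fallback UUID are placeholders invented for illustration, not something Meltano or Argo Workflows defines; the plugin names are the ones used earlier in this thread.

```python
import os


def build_meltano_command(run_id: str) -> list[str]:
    """Assemble the argv for a `meltano run` call with an explicit run ID."""
    return [
        "meltano",
        "run",
        "tap-my-source",
        "target-my-destination",
        f"--run-id={run_id}",
    ]


# ORCHESTRATOR_WORKFLOW_ID is a hypothetical variable name; the fallback value
# is only a placeholder UUID so the sketch runs without an orchestrator.
workflow_id = os.environ.get(
    "ORCHESTRATOR_WORKFLOW_ID", "123e4567-e89b-42d3-a456-426614174000"
)
command = build_meltano_command(workflow_id)
print(" ".join(command))
```

The resulting list could be handed to subprocess.run; it is only printed here so the sketch does not require Meltano to be installed.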


edgarrmondragon commented on May 27, 2024

Thanks for filing @menzenski!

> If meltano run could accept a --run-id=abc123 CLI argument or similar, that could be persisted as part of the runs table record for that run.

I can imagine this, though we'd prefer to keep the run ID as a UUID to avoid having to create an Alembic migration script, since in Postgres it uses the builtin UUID type.

Uniqueness of run_id is not enforced, but I wonder what problems could come from running two pipelines with the same run ID. Maybe they'd just use the same log file?

Let me know if those restrictions work for you and your workflow, or if you'd need support for arbitrary strings.
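To make the UUID restriction concrete, here is a small sketch of how a caller might check a candidate run ID before passing it along. It uses only Python's standard uuid module; the helper name is_uuid is made up for this example, and this is not a claim about how Meltano itself validates the value.

```python
import uuid


def is_uuid(value: str) -> bool:
    """Return True if the string parses as a UUID, False otherwise."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


# An arbitrary string such as the "abc123" example would be rejected,
# while a freshly generated v4 UUID string is accepted.
arbitrary_ok = is_uuid("abc123")
generated = str(uuid.uuid4())
generated_ok = is_uuid(generated)
generated_version = uuid.UUID(generated).version
```

Note that uuid.UUID is fairly lenient about formatting (it also accepts strings without hyphens, for example), so a stricter check may be wanted if the exact textual form matters.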

> If meltano run would expose the run ID of the current job as an environment variable (MELTANO_RUN_ID or similar), we could capture that upon completion of the job and persist it in the argo workflows archive.

I'm certain we could pass down a MELTANO_RUN_ID env var to the plugin's subprocess, but I don't think that would be exposed outside of it, so I'm not sure it could be retrieved.
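The limitation described here is general process behavior rather than anything specific to Meltano: a variable placed in a child process's environment does not appear in the parent's. A minimal sketch, using a throwaway Python child process as a stand-in for the plugin subprocess (the MELTANO_RUN_ID name is the one floated above, not a confirmed Meltano variable):

```python
import os
import subprocess
import sys
import uuid

run_id = str(uuid.uuid4())

# Copy the parent's environment and add the variable for the child only.
child_env = {**os.environ, "MELTANO_RUN_ID": run_id}

# The child process stands in for a plugin; it just echoes the variable.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['MELTANO_RUN_ID'])"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)

seen_by_child = child.stdout.strip()
# The parent's own environment was never modified.
seen_by_parent = os.environ.get("MELTANO_RUN_ID")
```

The child sees the generated value, while the parent's os.environ is unchanged, which is why a value set this way could not be read back by whatever launched the parent.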


menzenski commented on May 27, 2024

> > If meltano run could accept a --run-id=abc123 CLI argument or similar, that could be persisted as part of the runs table record for that run.
>
> I can imagine this, though we'd prefer to keep the run ID as a UUID to avoid having to create an Alembic migration script, since in Postgres it uses the builtin UUID type.
>
> Uniqueness of run_id is not enforced, but I wonder what problems could come from running two pipelines with the same run ID. Maybe they'd just use the same log file?
>
> Let me know if those restrictions work for you and your workflow, or if you'd need support for arbitrary strings.

Sorry - I wasn't clear in my original message. I wasn't trying to propose that an orchestrator external to meltano should be able to set the meltano run ID. Rather, I was thinking about something like this:

  • Invoke meltano with meltano run tap-my-source target-my-destination --run-id=abc123
  • In the runs table, the record for this run has that persisted on the payload as a new "metadata": {"run-id":"abc123"} field.

Or similar - it seems that the payload column is "just a JSON-encoded dict", per

    payload: Mapped[dict] = mapped_column(MutableDict.as_mutable(JSONEncodedDict))

so in theory it could support an additional field (alongside the existing singer_state property).
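As a rough illustration of the shape being proposed, the sketch below adds a "metadata" entry next to an existing singer_state key in a plain dict and round-trips it through JSON. The contents of singer_state and the helper name with_run_metadata are invented for the example; this does not touch Meltano's actual model or database.

```python
import json

# Hypothetical existing payload; the real singer_state contents will differ.
payload = {"singer_state": {"bookmarks": {}}}


def with_run_metadata(payload: dict, run_id: str) -> dict:
    """Return a copy of the payload with the run ID stored under "metadata"."""
    return {**payload, "metadata": {"run-id": run_id}}


# Round-trip through JSON, as a JSON-encoded column would.
encoded = json.dumps(with_run_metadata(payload, "abc123"))
decoded = json.loads(encoded)
```

Because the helper returns a new dict, the original payload is left as it was, and the existing singer_state key survives alongside the new field.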

> > If meltano run would expose the run ID of the current job as an environment variable (MELTANO_RUN_ID or similar), we could capture that upon completion of the job and persist it in the argo workflows archive.
>
> I'm certain we could pass down a MELTANO_RUN_ID env var to the plugin's subprocess, but I don't think that would be exposed outside of it, so I'm not sure it could be retrieved.

For our use case, as long as it was available as an environment variable at the point where the "Block run completed." message is logged (on success or error), I think we'd be able to pull it from the environment in our workflow exit handler. That message is logged here:

    async def _run_blocks(
        tracker: Tracker,
        parsed_blocks: list[BlockSet | PluginCommandBlock],
        dry_run: bool,
    ) -> None:
        for idx, blk in enumerate(parsed_blocks):
            blk_name = blk.__class__.__name__
            tracking_ctx = PluginsTrackingContext.from_block(blk)
            with tracker.with_contexts(tracking_ctx):
                tracker.track_block_event(blk_name, BlockEvents.initialized)
            if dry_run:
                msg = f"Dry run, but would have run block {idx + 1}/{len(parsed_blocks)}."
                if isinstance(blk, BlockSet):
                    logger.info(
                        msg,
                        block_type=blk_name,
                        comprised_of=[plugin.string_id for plugin in blk.blocks],
                    )
                elif isinstance(blk, PluginCommandBlock):
                    logger.info(
                        msg,
                        block_type=blk_name,
                        comprised_of=f"{blk.string_id}:{blk.command}",
                    )
                continue
            try:
                await blk.run()
            except RunnerError as err:
                logger.error(
                    "Block run completed.",
                    set_number=idx,
                    block_type=blk_name,
                    success=False,
                    err=err,
                    exit_codes=err.exitcodes,
                )
                with tracker.with_contexts(tracking_ctx):
                    tracker.track_block_event(blk_name, BlockEvents.failed)
                raise CliError(
                    f"Run invocation could not be completed as block failed: {err}",  # noqa: EM102
                ) from err
            except Exception as bare_err:
                # make sure we also fire block failed events for all other exceptions
                with tracker.with_contexts(tracking_ctx):
                    tracker.track_block_event(blk_name, BlockEvents.failed)
                raise bare_err
            logger.info(
                "Block run completed.",
                set_number=idx,
                block_type=blk.__class__.__name__,
                success=True,
                err=None,
            )
            with tracker.with_contexts(tracking_ctx):
                tracker.track_block_event(blk_name, BlockEvents.completed)


edgarrmondragon commented on May 27, 2024

> @edgarrmondragon sorry for my delayed response here, I was out of office and missed your update - the draft PR https://github.com/meltano/meltano/pull/8459/files looks awesome, that'd totally work for our use case. (I confirmed that Argo Workflows is using v4 UUID strings).

Thanks for confirming @menzenski. I'm already in the process of beta testing Meltano 3.4.0 but I could probably slip #8459 in if the team accepts it.


edgarrmondragon commented on May 27, 2024

> @edgarrmondragon I put Meltano 3.4.0 into production today - we're using this new --run-id flag to set the Meltano run ID to the workflow ID of the Argo Workflows workflow that runs Meltano.
>
> It works great! Huge quality-of-life improvement for us. Thanks so much for implementing this!

I'm glad that it's helpful!

