Is your feature request related to a problem? Please describe. <de

I have opened <a class="issue-link js-issue-link" data-error-text="Failed to load titl

Deserialize traceback from stack trace string in Temporal failures if able,about temporalio/sdk-python

Comments (27)

cretz commented on July 17, 2024 1

Java's stack trace element string format is a bit more stable from version to version.

Regardless, yeah we can do this. I think we'd just take tblib.Traceback.from_string which seems to have an expectation on Python traceback string format. So maybe it is more stable than I realize. And of course we'd have some tests to ensure it continues to work. I will change the title of this issue to add this functionality. This shouldn't be hard to implement, I'll probably get to it before next release.

from sdk-python.

nathanielobrown commented on July 17, 2024 1

After internal discussions, I can change those default worker-side log levels from debug to warning (but doesn't change the fact that client-side is missing stack traces on their logs by default).
I like it!

I guess I personally don't care so much that the stack traces I see are client side vs. just seeing stack traces by default when testing, but I can see how it's important to have both. Well actually, it might get a little weird if we have both (duplicate output for testing case) but it's important to at least have both options!

You and the rest of the team are doing such a great job of nailing all the details, I really enjoy following along. Thank you for engaging with my questions/issues.

from sdk-python.

cretz commented on July 17, 2024 1

I have opened #75. In addition to other things we have suggested, it contains a test that appends the stack to all Temporal failure errors. Basically you can use this helper:

def append_temporal_stack(exc: Optional[BaseException]) -> None:
    while exc:
        # Only append if it doesn't appear already there
        if (
            isinstance(exc, FailureError)
            and exc.failure
            and exc.failure.stack_trace
            and len(exc.args) == 1
            and "\nStack:\n" not in exc.args[0]
        ):
            exc.args = (f"{exc}\nStack:\n{exc.failure.stack_trace.rstrip()}",)
        exc = exc.__cause__

I am hesitant to add it as a supported public utility at the moment (it's very simple, only a couple of lines if the formatter didn't break it out).

Since we cannot put the traceback back on the error and we don't want stack to be on every string representation of the error, opt-in on the user part is the only way.

I researched custom logging formatters, adapters, etc and it boiled down to just being easier to alter the exceptions in the chain than altering the logging calls. The traceback.format_exception and similar calls by logging and others handle the chain for you, so you can't add the stack after the fact there. And I don't want to recreate my own chaining string formatter because I want to reuse Python's. I would have preferred shallow copying all the exceptions (e.g. copy.copy()) and only adding stack to the shallow copies, but that had problems maintaining the chain too. Same with customizing log record and other approaches.

So basically, altering the exception is easiest. I chose to put the stack after the message instead of before because the internal Python exception formatter always put's the exception class name first which means I can't inject anything before that.

Using that helper above, given the following:

    try:
        raise ValueError("error1")
    except Exception as err:
        raise RuntimeError("error2") from err

Once serialized and sent back from Temporal, after running through append_temporal_stack, your output might look like:

temporalio.exceptions.ApplicationError: ValueError: error1
Stack:
  File "/path/to/file.py", line 100, in my_function
    raise ValueError("error1")

The above exception was the direct cause of the following exception:

temporalio.exceptions.ApplicationError: RuntimeError: error2
Stack:
  File "/path/to/file.py", line 201, in my_function
    raise RuntimeError("error2") from err

Which I think is about the best we can do.

from sdk-python.

nathanielobrown commented on July 17, 2024

Ok, I looked into this bit further and I realized that the chaining is working, but you don't get any stack traces. To be honest I just didn't notice the exception chain without associated stack traces, particularly because the top level WorkflowError had a big stack trace.

I know that the stack trace for ApplicationErrors is available (e.cause.cause.failure.stack_trace) I wonder if there's a way to "rehydrate" the stack trace so it displays as if the exception was raised locally. It looks like https://github.com/ionelmc/python-tblib does this sort of thing.

From what I see there is no stack trace attached for ActivityErrors. It would be nice to have stack traces for these as well, but I guess it's less important and I imagine if it's not in the error prototbuf you're not about to add it.

from sdk-python.

cretz commented on July 17, 2024

I wonder if there's a way to "rehydrate" the stack trace so it displays as if the exception was raised locally

My fear of "rehydrating" the stack trace is if it's not from Python. These stack traces may come from any language. It is normal to have a Python workflow execute a Go activity and vice-versa. Let me check what other languages do here.

From what I see there is no stack trace attached for ActivityErrors.

This is because ActivityError is never "raised", it is only used as a wrapper by the server. It'd be the actual raised error (i.e. ApplicationError) that would have a stack trace.

from sdk-python.

nathanielobrown commented on July 17, 2024

My fear of "rehydrating" the stack trace is if it's not from Python. These stack traces may come from any language. It is normal to have a Python workflow execute a Go activity and vice-versa. Let me check what other languages do here.

I suppose you could just try "rehydration" and then fallback to no traceback or a string representation.

from sdk-python.

cretz commented on July 17, 2024

I suppose you could just try "rehydration" and then fallback to no traceback or a string representation.

I have confirmed this is exactly what Java does. My only concern now is how to parse the traceback. I would rather not have a third party library it at all possible just for this use case. And I also would rather not come up with my own string format for tracebacks that will work on all versions of Python (because I doubt I can trust default traceback string formats from version to version).

The question now becomes how important is it to rehydrate Python stack traces compared to the effort and brittleness of the solution?

from sdk-python.

nathanielobrown commented on July 17, 2024

The question now becomes how important is it to rehydrate Python stack traces compared to the effort and brittleness of the solution?

I say, if Java did it why not Python? I think it would be a nice thing. You could just pull the minimal code needed from tblib into sdk-python. I'd be up for working on something like this, but I haven't been able to get the development setup working on my M1 Mac 😔

Alternatively we could just rely on logging. This is what I'm used to with Dramatiq which is what I currently use for task processing and it's not soo bad.

from sdk-python.

cretz commented on July 17, 2024

I am bit concerned about doing this now as I see the string format is changing in 3.11 to include more detail. It seems relying on traceback string format is rough. It breaks on 3.11 with the library you linked at https://github.com/ionelmc/python-tblib and other traceback parsers I've seen (e.g. bolton). I wonder if we can just ask that users:

err = my_exception
while isinstance(err, FailureError):
    if err.failure.stack_trace:
        try:
            err.__traceback__ = tblib.Traceback.from_string(err.failure.stack_trace)
        except TracebackParseError:
            pass
    err = err.__cause__

from sdk-python.

nathanielobrown commented on July 17, 2024

You are so right, I can see that being a big problem. Maybe the thing is just to think of the developer story ("as a developer I want to see where in my code an error is occurring when I am testing locally") and make sure it's covered in some way?

For example, I'm pretty happy with my custom activity decorator that logs exceptions in activities. So maybe improving the default logging would work too, but I feel like having clear stack traces be displayed in stdout/stderr by default is important.

Under the same developer story, it's still slightly bugs me that you don't get any stack trace for the workflow portion of errors. I feel like the activities are more important as that's were the errors should mostly be, but it would be very nice to have stack traces everywhere.

from sdk-python.

cretz commented on July 17, 2024

Any suggestions here? Options as I see them:

Deserialize into traceback - brittle because traceback strings are non-standard across versions
Ask the user to extract the traces themselves - may be asking too much of the user
Have custom logger - we need to respect their choice on logging
Put the stack trace inside the exception args - then normal str() calls return way more than the message
Our own equivalent of https://docs.python.org/3/library/traceback.html#traceback.print_exception - still requires users actually call it

Are there any options I missed? Any preferred option there?

from sdk-python.

nathanielobrown commented on July 17, 2024

I think I'm partial to logging. Maybe a way to think of it is "default logging" and then user's can still setup their own system if they like. I personally use loguru but it's fairly interoperable/based on the core Python logging, and I feel like that's how things generally are in Python logging. I also use the rich logging handler as my handler for nicely formatted and colored output. So I think it's OK to depend on Python logging. I feel like if you have nice default logging out of the box it also gives a great feel to first time users while power users still know how to reconfigure things how they like it.

I'm not sure where you would log exceptions from (worker process? workflow process?) but I'm looking forward to seeing what you decide.

from sdk-python.

cretz commented on July 17, 2024

The concern is not depending on Python logging, the concern is that the caller calls their own logger at their own time to log the exception. We can't inject the stack trace only at logging time without forcing the user to use some kind of helper explicitly.

How do I make logger.exception("OH NO") print these stack traces without including them in the exception args (i.e. what str()) would show)? We can make helpers they can call, but I am not aware of a way to do this implicitly. Do you have a more concrete suggestion? Can you show code of how a user might use their logger to log this exception with string stack traces?

from sdk-python.

nathanielobrown commented on July 17, 2024

The concern is not depending on Python logging, the concern is that the caller calls their own logger at their own time to log the exception. We can't inject the stack trace only at logging time without forcing the user to use some kind of helper explicitly.

If they want to do that, they can just disable the temporal section of the logging tree. I feel it's not so widely known that you can do this, but you can always add a note to the SDK docs the help point people in the right direction.

How do I make logger.exception("OH NO") print these stack traces without including them in the exception args (i.e. what str()) would show)? We can make helpers they can call, but I am not aware of a way to do this implicitly. Do you have a more concrete suggestion? Can you show code of how a user might use their logger to log this exception with string stack traces?

Possibly you could log exceptions for actions directly from the action worker, and then this issue goes away. Then you lose the exception chain, but I personally don't find the current exception chain very useful because it basically says "A workflow failed and the source was an application error with this message" with no other information about how those things are linked. So, as a developer, if I see "my workflow failed due to an ApplicationError" along with a separate, logged, stack trace for the ApplicationError I likely understand what happened.

Logging application errors local to the activity feels less pure, but might be more useful in practice for local development and then in production you would probably rely more on the temporal UI anyway.

As to logging as string stack trace, I think you would just have to use logger.error rather than logger.exception. As I understand it, logger.exception calls logger.error but passes exception information to be formatted. We have already formatted stack trace, so logger.error might be fine.

from sdk-python.

cretz commented on July 17, 2024

Possibly you could log exceptions for actions directly from the action worker, and then this issue goes away.

I do log as debug with full stack traces at

sdk-python/temporalio/worker/workflow_instance.py

Lines 1120 to 1123 in 0392468

    
           logger.debug( 
        
               f"Workflow raised failure with run ID {self._info.run_id}", 
        
               exc_info=True, 
        
           )

and

sdk-python/temporalio/worker/activity.py

Lines 442 to 444 in 0392468

    
           temporalio.activity.logger.debug( 
        
               "Completing as failed", exc_info=True 
        
           )

, but errors are a normal thing and I don't think we should log each one at a higher log level by default. But of course any user can catch/log their own exceptions or use an interceptor to log.

The issue of client not seeing stack traces does not go away though, because the client does not get the stack trace for a failed activity or workflow in the client-side log by default.

Then you lose the exception chain

Exception chain will always remain. We serialize hierarchical exception causes in all SDKs.

Logging application errors local [...]

Sure, nobody is stopping users from doing that. Granted it shouldn't be the default I don't think.

As to logging as string stack trace [...]

I am not following the concrete suggestion here. Regardless of what exc_info is set as (True on logger.exception by default, False otherwise), the point is the traceback is not present.

Here is a simple problem request:

"Can you make Temporal stack traces appear on the client when they use their logger?"

Everything else about how a user may want to log on the worker side or manually extract the stack trace string out is fine and doable, but it does not solve the problem. I think we either need to solve this problem or tell users their alternatives (e.g. "use X helper when logging" and/or "log on the worker side and just accept client side logging doesn't have stack traces", etc).

from sdk-python.

cretz commented on July 17, 2024

After internal discussions, I can change those default worker-side log levels from debug to warning (but doesn't change the fact that client-side is missing stack traces on their logs by default).

Also, after consulting with others, I think we'll just populate the traceback knowing the limitations and brittleness.

from sdk-python.

cretz commented on July 17, 2024

Well, unfortunately trying to create a types.TracebackType with my own frame type gives:

TypeError: TracebackType() argument 'tb_frame' must be frame, not _TracebackFrame

So I am not sure I can do this even if I wanted to (I am using types.TracebackType so that methods like inspect.istraceback would work). Still researching...

(attached exceptions.py with the parsing code at the bottom in case we want to revisit this in the future)

from sdk-python.

cretz commented on July 17, 2024

After research, I can't find anyone that re-populates __traceback__ from a remote exception. https://github.com/ionelmc/python-tblib, https://github.com/mahmoud/boltons/blob/master/boltons/tbutils.py, https://github.com/celery/billiard/blob/master/billiard/einfo.py, etc all have structure that mimics traceback and can serialize/deserialize traceback and such, but nothing I have found puts the traceback back on the exception for use in logging.

If there is any example in the Python world of anyone doing this, please link. Otherwise, I am afraid it's not really doable.

We could put the traceback string in the message, but that's unreasonable for str() users.

from sdk-python.

nathanielobrown commented on July 17, 2024

Gotcha. Well maybe small logging improvements are the way to a better experience then.

Here's another small improvement I've been thinking of. It would be nice for ApplicationError's to include the root exception name. Right now the message/value whatever of the application error is just that of the underlying message, but then the exception name is not visible at all. So if root error is MyException("something went wrong"), currently ApplicationError: something went wrong is raised but I feel it would be better to raise ApplicationError: MyException: something went wrong. Often the name of the exception is an important piece, and right now it is not displayed but can be accessed via .type I think. I feel like prepending it to the exception message might be a pragmatic solution. What do you think?

from sdk-python.

cretz commented on July 17, 2024

Well maybe small logging improvements are the way to a better experience then.

Any concrete suggestions here? I am struggling to think of the best way.

I feel like prepending it to the exception message might be a pragmatic solution. What do you think?

I am hesitant to fiddle with anyone's exception message string. People should be able to throw a string message and assert equality against that string message I think. This is the case with all of our SDKs I believe.

from sdk-python.

nathanielobrown commented on July 17, 2024

Any concrete suggestions here? I am struggling to think of the best way.

I think switching the logging level for action exceptions to warning (as you all decided) is a good one. If I think of more than the below, I'll let you know!

That's a good point, but can't they access the underlying message via .message? I feel like it's reasonably common to raise a specific exception when the name of the exception describes the issue and the message is empty. For this case, you would currently get an error like ApplicationError: '' which is not so helpful and then you have to drop into a debugger to check that the .type is. Also, from a consistency standpoint, I feel like ApplicationError is different then the root error, so it's OK for the message to be different.

I feel like the the default output should do its best to help the developer understand the error, and then they have these other attributes to do programmatic checks and dig deeper.

from sdk-python.

cretz commented on July 17, 2024

That's a good point, but can't they access the underlying message via .message?

While true, often people expect str(my_error) to have the message, not the type and then the message. But we might be able to compromise here. Will discuss with team.

Note I think that this concern exists with all of our SDKs today, so it may be a bit difficult to convince the team that Python should be special.

EDIT: Also, are you suggesting prepending the string type name on the client side or the worker side? If on the client side, how does this output look when I manually raise ApplicationError("some error") which we often ask people to do (e.g. it's the only way to raise a specific non-retryable exception for instance).

I feel like the the default output should do its best to help the developer understand the error

I agree when it comes to logger output, but not every string representation of the error. If we can do this just for loggers that'd be ideal.

from sdk-python.

nathanielobrown commented on July 17, 2024

Note I think that this concern exists with all of our SDKs today, so it may be a bit difficult to convince the team that Python should be special.

I can understand that. It would be annoying if you wrote in a couple different SDK languages to have these types of differences. As far as matching the pattern of checking the str(my_error) case, I don't see this as a big problem because:

The exception model already takes some understanding, so I feel like it's unreasonable/unlikely for people to ship code without some testing and exploration where they would be confronted with this behaviour
They can always use str(my_error).endswith(...) or just check for the string value with the exception name in it. It's almost a feature, check type and message in one line!
And they have that .message as well

EDIT: Also, are you suggesting prepending the string type name on the client side or the worker side? If on the client side, how does this output look when I manually raise ApplicationError("some error") which we often ask people to do (e.g. it's the only way to raise a specific non-retryable exception for instance).

In this case there would be no underlying exception name to prepend to the message, so maybe just don't prepend anything and this is a non-issue? I guess I was thinking of prepending on the client side. So it's just a presentation level thing, and what's sent over the wire does not change.

I agree when it comes to logger output, but not every string representation of the error. If we can do this just for loggers that'd be ideal.

Again, I don't really care how it gets on the screen, I just want to have clear exception output when I write my tests! I know you will find the best way to accomplish that.

from sdk-python.

cretz commented on July 17, 2024

I think I am starting to come around here...I don't personally like that str(error) is not a simple message like it is with all the standard Python errors, but whatever.

They can always use str(my_error).endswith(...) ...

Do you think we should also append the stack trace to the message? I'm thinking so.

I'm thinking I'll open a PR that sets the base exception args as a string format of optional `f"Stack:\n{stack}\n", optional f"{type}: ", then the message. Sound about right? I'll open a PR and see how it looks.

from sdk-python.

nathanielobrown commented on July 17, 2024

Do you think we should also append the stack trace to the message? I'm thinking so.

Then people really have to go to .message to have logic based on the exception message, but maybe that's a good thing.

For me, I guess this depends on how it displays, but I guess this could tick all the boxes if it looks OK and it's intuitive what's going on. I can compile temporal locally now, so if you come up with something I'd be happy to try it out and share any thoughts.

from sdk-python.

cretz commented on July 17, 2024

After internal discussion, I think we will just prefix with type and document a way to get stack traces in the log formatter. Will update with PR.

from sdk-python.

cretz commented on July 17, 2024

I am closing this for now, but will refer back to it anytime anyone is curious why we can't deserialize tracebacks.

from sdk-python.

Deserialize traceback from stack trace string in Temporal failures if able about sdk-python HOT 27 CLOSED

Comments (27)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	logger.debug(
	f"Workflow raised failure with run ID {self._info.run_id}",
	exc_info=True,
	)

	temporalio.activity.logger.debug(
	"Completing as failed", exc_info=True
	)