Coder Social home page Coder Social logo

Comments (26)

sumerc avatar sumerc commented on June 17, 2024 1

I could only find a bit time working on this.

@sm-Fifteen Implementation is finished and there are few updates on the implementation. Please feel free to comment on anything:

  • I removed fn_descriptor from YStat. This is because YStats cannot be pickled because CodeObjects are unpickable and we have save() methods to save profiles with pickle. So, fn_descriptors go to an external helper dict and I use the help of that dict in func_matches(), module_matches() and those functions will be module functions as they have a dependency to a module global and having a method depending on a global is messy.

So, no more playing with this field. We would explicitly handle filtering on these by module functions.

An example usage can be seen in test_functionality.BasicUsage.test_filter_callback.

  • One other note is: I will make this backward compatible without removing filter from get_func_stats API, but I will not document it anymore. In the docs, only tag and ctx_id filtering will be seen.

EDIT: Some type hints would come in handy, the types builtin module has type hints for some of the internal python objects.

It could be done and admit it is a nice idea. But current Yappi code is Python agnostic as you know. We support all Python versions so it might be a bit tricky but possible for sure.

from yappi.

sumerc avatar sumerc commented on June 17, 2024 1

Finally, this is merged to master. It seemed small change at the start, but turns out not so easy. Docs/Examples are updated.

Thanks for all your help on this!

from yappi.

sm-Fifteen avatar sm-Fifteen commented on June 17, 2024 1

Fair enough. Once I release two #38 and #41 (which is currently on master) I think we are good to go.

That's the plan ;)

Incremental stats clearing will come but I will focus my efforts on releasing these first.

Yeah, the solution I have right now is by no means ideal, but it does the job reasonably enough that it's not a priority for me at the moment.

https://github.com/sm-Fifteen/asgi-server-timing-middleware/blob/eaa91f18cb537b0ad1a7986f2e9e11ae626f82bc/asgi_server_timing/middleware.py#L79-L80

The reason I want this is your middleware implementation will indeed might be a reference implementation for all other ASGI frameworks as well :)

With the current structure, I think it should work for any application using the ASGI protocol (though I've only specifically tested FastAPI). I'll be changing the interface a bit before release (I definitely need to replace those __qualname__ strings with arrays of callables/coroutines), but there shouldn't be any need to adapt it to work with, say, Sanic or Quart since all the FastAPI/Starlette-specific things are actually in the configuration.

app.add_middleware(ServerTimingMiddleware, calls_to_track={
    "db_exec": [sqlalchemy.engine.base.Engine.execute],
    "db_fetch": [
        sqlalchemy.engine.ResultProxy.fetchall,
        sqlalchemy.engine.ResultProxy.fetchone
    ],

    # All steps of the generated fastapi.routing.get_request_handler app
    "1deps": [fastapi.routing.solve_dependencies], # `Depends` execution
    "2endpoint": [fastapi.routing.run_endpoint_function], # Since FastAPI 0.49.2
    "3valid": [fastapi.routing.ModelField.validate], # Pydantic model validation
    "4encode": [fastapi.encoders.jsonable_encoder], # JSON encoding
    "5render": [starlette.responses.JSONResponse.render], # Respnse (JSON) rendering
})

from yappi.

sumerc avatar sumerc commented on June 17, 2024 1

Hey @sm-Fifteen ,
I have released a new version for Yappi: 1.2.4 :) . Keep an eye on any hotfixes(maybe?) following days. Other than that, thanks again! This versions fixes most of the issues we have discussed.

from yappi.

sm-Fifteen avatar sm-Fifteen commented on June 17, 2024

Related to #22.

from yappi.

sumerc avatar sumerc commented on June 17, 2024

Good point!

Please correct me if I am wrong:

An example: you have a nested function like this:

class A:
    class B:
        class C:
            def f(self):
                pass

And you cannot filter on function f because the name generated in Yappi is not deterministic enough. In other words, name might change depending on where it is defined. Right?

I think there are two issues here:

  • We should use __qualname__ instead of __name__ if Python version is 3.3 and up. (Note: I am surprised cProfile has not implemented this, too.)
  • We should be able to filter on function itself somehow in a deterministic way regardless of its string representation. One idea might be to use PyCodeObject? e.g: .get_func_stats(filter={'code': func.__code__}).

from yappi.

sm-Fifteen avatar sm-Fifteen commented on June 17, 2024

Actually, my initial concern was a lot simpler than that:

>>> from time import sleep as sleep1
>>> from asyncio import sleep as sleep2
>>> sleep1.__qualname__
'sleep'
>>> sleep2.__qualname__
'sleep'

If we blindly filter by {'name':sleep1.__qualname__} and don't remember to filter by module as well, the result object will contain both


But since we're here, I just did a quick and dirty check and the following code:

@router.get('/testfoo')
async def testfoo():
    class A:
        class B:
            class C:
                async def f(self):
                    await asyncio.sleep(5)

    await A.B.C().f()
    return {'name': A.B.C().f.__name__, "qualname": A.B.C().f.__qualname__}

Gets profiled as follows:

index: 3265
full_name: .\app\testfoo.py:140 testfoo
ncall: 1/1
ttot: 5.008828
tsub: 0.000012
children:

     index: 3270
     child_full_name: .\app\testfoo.py:145 C.f
     ncall: 1/1
     ttot: 5.008768
     tsub: 0.000008
`{"name":"f","qualname":"testfoo.<locals>.A.B.C.f"}`

It looks like neither of them matches against a name filter, only "C.f" works.

That's admittedly a fairly convoluted case, but it's just another case where building the filter by hand can get obscure, error-prone and lead to an incorrect reading of the result data.

This example (wrappers, a much more common python pattern):

def bar():
    import asyncio
    async def baz():
        await asyncio.sleep(5)

    return baz

@router.get('/testfoo')
async def testfoo():
    runme = bar()
    await runme()
    
    return {'name': runme.__name__, "qualname": runme.__qualname__}
index: 2840
full_name: .\app\testfoo.py:147 testfoo
ncall: 1/1
ttot: 5.000080
tsub: 0.000007
children:

     index: 2842
     child_full_name: .\app\testfoo.py:142 baz
     ncall: 1/1
     ttot: 5.000069
     tsub: 0.000010

     index: 2841
     child_full_name: .\app\testfoo.py:140 bar
     ncall: 1/1
     ttot: 0.000004
     tsub: 0.000004
{"name":"baz","qualname":"bar.<locals>.baz"}

...can also easily make for some ambiguous and tricky to manipulate results.

We should use __qualname__ instead of __name__ if Python version is 3.3 and up. (Note: I am surprised cProfile has not implemented this, too.)

That would definitely be a good idea, qualname takes care of removing a lot of the ambiguity for you. It's not very effective on lambda functions and list/set/dict comprehension and generator expressions (mygen = (num * num for num in [1,2,3])), but then so is yappi and profiling these is probably not as much of an immediate concern.

We should be able to filter on function itself somehow in a deterministic way regardless of its string representation. One idea might be to use PyCodeObject? e.g: .get_func_stats(filter={'code': func.__code__}).

Surely there's an actual function handle we can use instead of the bytecode reference in most cases?

from yappi.

sumerc avatar sumerc commented on June 17, 2024

Surely there's an actual function handle we can use instead of the bytecode reference in most cases?

My concern is that Python runtime pass frame objects as arguments to profiler hooks. Function objects are not naturally passed around in the C side of Yappi(except PyCFunctionObject) and it is not very natural to get function objects from frame objects.

Let's change proposition(we do not need to get __code__ explicitly, we can do it in API)

.get_func_stats(filter={'function': func})

User does not need to know we use PyCodeObject for the underlying comparison? WDYT?

from yappi.

sm-Fifteen avatar sm-Fifteen commented on June 17, 2024

Yeah, I'd agree with passing the function itself, the user doesn't need to know. Plus that would leave yappi with the liberty of changing the exact details of function selection is done without breaking the API. It's probably the wisest way to do it.

The only thing I'm not 100% sure about is that passing the actual callable means it would get in the way of (maybe eventually) adding more advanced filters using callbacks, at least not for all filter parameters, since there wouldn't be a way for yappi to make the distinction between an actual function to filter on and a function meant to test if the function matches:

yappi.get_func_stats(name=lambda name: not name.startswith('_'), module='foo')

yappi.get_func_stats(function=lambda func: inspect.ismethod(func) and isinstance(func.__self__, MySuperClass))

While it's not really much of a concern right now (I'm not even sure the latter would be possible, for the reasons you just brought up), I'm a bit worried about the possibility of blocking that avenue. There's also how the function filter as you're proposing it would be exclusive to the current name-module pair. I'd personally be tempted to just add a new function/functions parameter to get_func_stats that specifically only takes function objects to avoid this.

...but I'm probably just overthinking this, .get_func_stats(filter={'function': func}) is perfect for Yappi as it currently is.

from yappi.

sumerc avatar sumerc commented on June 17, 2024

adding more advanced filters using callbacks, at least not for all filter parameters, since there wouldn't be a way for yappi to make the distinction between an actual function to filter on and a function meant to test if the function matches:

Nice observation!

I generally prefer Python side for these kind of advanced filters. Having name and module in filter param was a bad idea and I regret it. People occasionally and naturally think this is the only way to filter. See this for example.

In fact, filter param was just there optionally for convenience for avoiding looping over the results. But with latest changes now we use it to filter on tag in a fast manner(by implementing filtering on C side) as you know, and it becomes important suddenly :)

I generally prefer this kind of filtering on Yappi for other cases:

db_stats = []
stats = yappi.get_func_stats(filter={tag: 2}) # fast
for stat in stats:
    if stat.module.endswith('sqlalchemy.cursor'):
        db_stats.append(stat)

from yappi.

sm-Fifteen avatar sm-Fifteen commented on June 17, 2024

In that case, shouldn't you deprecate the use of filter, use tag=2 instead (maybe ctx_id, too) for fast first-pass filtering and recommand that people use iteration to sort through the rest (and documenting it aas such)? If you do that, though, adding a function key to the filter dict wouldn't do, and it's clear that the exact method used by yappi to filter on function would expose internal details if the user were to filter it manually, so I presume another keyword parameter for function handles would be in order? Or maybe just add some sort of function_matches() method to the stat object to let users do iterative filtering without having to know how yappi does that comparison internally?

from yappi.

sumerc avatar sumerc commented on June 17, 2024

In that case, shouldn't you deprecate the use of filter, use tag=2 instead (maybe ctx_id, too) for fast first-pass filtering and recommand that people use iteration to sort through the rest (and documenting it aas such)?

Unfortunately, I think we have passed that point, it might not worth the effort(I mean for users). I see some code in repos. that uses filter dict. Let's keep the filter dict, at least for now...

so I presume another keyword parameter for function handles would be in order? Or maybe just add some sort of function_matches() method to the stat object to let users do iterative filtering without having to know how yappi does that comparison internally?

Good point.

I would prefer a custom YFuncStat.function class which has __eq__ against function objects? That way users might be able to write such code:

for stat in stats:
    if stat.function == myfunc:
        pass

Note: I will be starting this in few days, hopefully as it is mostly clarified.

from yappi.

sm-Fifteen avatar sm-Fifteen commented on June 17, 2024

I would prefer a custom YFuncStat.function class which has __eq__ against function objects? That way users might be able to write such code:

for stat in stats:
    if stat.function == myfunc:
        pass

But stat.function wouldn't actually be a function descrptor, would it? If the only purpose of that YFuncStat.function class is to allow for an overloaded equal comparison, wouldn't a simple function_matches method be less surprising/confusing?

EDIT: I mean you could be tempted to do fancier stuff like I proposed earlier, only for that to then not work:

for stat in stats:
    if inspect.ismethod(stat.function) and isinstance(stat.function.__self__, MySuperClass):
        # stat.function is not a real function, so neither of these checks would work
        pass

Note: I will be starting this in few days, hopefully as it is mostly clarified.

👍

from yappi.

sumerc avatar sumerc commented on June 17, 2024

But stat.function wouldn't actually be a function descrptor, would it?

That would definitely be better!

This boils down to the issue of getting function object from a frame or code object and there are ways to do that but not sure if:

  • they will work on all Python versions,
  • they have some edge cases,

If you can find a Python version agnostic way to do this efficiently, I am open to it.

If the only purpose of that YFuncStat.function class is to allow for an overloaded equal comparison, wouldn't a simple function_matches method be less surprising/confusing?

Hmm. Explicit better than implicit. You might be right on that. Ok. Let's do that way.

from yappi.

sm-Fifteen avatar sm-Fifteen commented on June 17, 2024

That would definitely be better!

This boils down to the issue of getting function object from a frame or code object and there are ways to do that but not sure if:

  • they will work on all Python versions,

  • they have some edge cases,

If you can find a Python version agnostic way to do this efficiently, I am open to it.

I'd love this if it were possible, but those constraints make it a pretty tall order. I'm hardly familiar at all with the CPython C API, and then you'd have to consider PyPy support...

Let's not wait until we find a way of doing that before implementing this. function_matches is a fine enough solution for now and can be implemented with what we currently have. If we do figure out a way to make stats.function possible in the future, we can always just change the implementation of function_matches to rely on that without breaking the public API.

Also, I understand we won't be adding a 'function' key to the yappi.get_func_stats filter object? Because making it so you can only filter by function object by manually iterating could be an interesting way to get people to progressively move away from the filter object.

from yappi.

sumerc avatar sumerc commented on June 17, 2024

Hi again,

Here is a sample initial implementation. I would like to hear your feedback.
#44

Note: I get rid of filter dict(regardless of backward compatibility. I make ctx_id and tag arguments to get_func_stats API and defined a filter_callback argument instead of looping all the items. The last one is because we might want to do post-processing on the data after filtering like .sort() or simple slicing.)

Note: There are still some errors on some unit tests, do not bother with those.

Here is a sample application:

import time


def foo():
    time.sleep(0.1)


def bar():
    pass


def _filter(stat):
    from time import sleep
    return stat.func_matches([sleep, foo, bar])
    


import yappi
yappi.start(builtins=True)
foo()
bar()
yappi.stop()
stats = yappi.get_func_stats(filter_callback=_filter)
stats.print_all()

'''
Clock type: CPU
Ordered by: totaltime, desc

name                                  ncall  tsub      ttot      tavg      
t3.py:4 foo                           1      0.000031  0.000070  0.000070
time.sleep                            1      0.000039  0.000039  0.000039
t3.py:8 bar                           1      0.000006  0.000006  0.000006
'''

from yappi.

sumerc avatar sumerc commented on June 17, 2024

While playing with it, module filtering became a possibility without too much effort. Maybe with another API like: module_matches([modules]).

Here is an example:

import time
import packagea

def _filter(stat):
    import inspect
    module = inspect.getmodule(stat.fn_descriptor)
    return module == packagea.packageb

import yappi
yappi.start(builtins=True)
packagea.packageb.bf()
yappi.stop()
stats = yappi.get_func_stats(filter_callback=_filter)
stats = stats.sort("name")
stats.print_all()
'''

Clock type: CPU
Ordered by: name, desc

name                                  ncall  tsub      ttot      tavg      
..packagea/packageb/__init__.py:1 bf  1      0.000006  0.000006  0.000006
'''

UPDATE: I have implemented module_matches(), it is in the branch.

from yappi.

sm-Fifteen avatar sm-Fifteen commented on June 17, 2024

Code-wise, this if far above the expectations I had. I definitely did not expect fn_descriptor to be available, that's great! That could open up a whole bunch of advanced filtering techniques, like checking for method overrides and such.

Getting rid of filter on get_func_stats is ultimately a good call, I'd say, though removing it like that is an API breakage, so as per semantic versionning, that would warrant skipping to yappi 2.0. You haven't removed it from get (yet), so that seems like a bit of an arbitrary removal right now.

I'll see if I can try that branch for myself in the coming days. Judging by your code examples, though, it looks perfect!

from yappi.

sumerc avatar sumerc commented on June 17, 2024

I definitely did not expect fn_descriptor to be available, that's great!

Thanks! But let me underline again that fn_descriptor is still not the function object, it is a bit more complicated than that. It is PyCFunctionObject for the builtin functions/methods and PyCodeObject for regular Python functions.

A side note: there are ways to get function object from code object, but I think I will not bother implementing that at least for now.

from yappi.

sm-Fifteen avatar sm-Fifteen commented on June 17, 2024

Thanks! But let me underline again that fn_descriptor is still not the function object, it is a bit more complicated than that. It is PyCFunctionObject for the builtin functions/methods and PyCodeObject for regular Python functions.

Ahh, I see, that's not immediately obvious by just looking at the code. It might be preferable to keep this field private instead of exposing CPython internals, and using methods like module_matches and func_matches to operate on it in that case. The name fn_descriptor would otherwise suggest something that this property is not, and the variations between user-defined and built-in functions is bound to cause obscure errors for users who do attempt to use that field directly.

EDIT: Some type hints would come in handy, the types builtin module has type hints for some of the internal python objects.

EDIT EDIT: Having something like code_matches() instead of exposing fn_descriptor in case you really need to match on code could also come in handy, like with this complete abomination:

import time.sleep
import inspect 

def bar():
    def baz():
        time.sleep(5)

    return baz

import yappi
yappi.start()
bar()()
yappi.stop()

inner_func = None

for const in bar.__code__.co_consts:
    if inspect.iscode(const) and const.co_name == 'baz':
        inner_func = const

stats = yappi.get_func_stats(filter_callback=lambda stat: stat.code_matches(inner_func))
stats.print_all()

from yappi.

sm-Fifteen avatar sm-Fifteen commented on June 17, 2024
  • This is because YStats cannot be pickled because CodeObjects are unpickable and we have save() methods to save profiles with pickle.

So function and module matching will simply not work on loaded profiling data? You might want to throw an error if the user attempts that, otherwise I can imagine people being confused when they find out none of the func_matches code actually works when operating on pickled data.

  • So, fn_descriptors go to an external helper dict and I use the help of that dict in func_matches(), module_matches() and those functions will be module functions as they have a dependency to a module global and having a method depending on a global is messy.

Makes sense.

  • I will make this backward compatible without removing filter from get_func_stats API, but I will not document it anymore. In the docs, only tag and ctx_id filtering will be seen.

You might want to keep them somewhere and really highlight that they're deprecated or something, but yeah, good idea. That will save you from having to break the API.

It could be done and admit it is a nice idea. But current Yappi code is Python agnostic as you know. We support all Python versions so it might be a bit tricky but possible for sure.

Ah, right, I forgot about Python2 support. I know mypy supports a #type: List[Callable] comment syntax for those who need to stay compatible with Python 2, but I've never tried it myself, I don't know how cumbersome it is to use compared to Python 3000 function annotations.

from yappi.

sm-Fifteen avatar sm-Fifteen commented on June 17, 2024

Thank you for seeing that feature request through, this was the one big headache-inducing issue that was preventing me from confidently using yappi for targeted function profiling. Any plans to include this in a release in the near future?

from yappi.

sumerc avatar sumerc commented on June 17, 2024

Yeah sure. I am planning to release a new version in 2-3 weeks hopefully. BUt sometimes, I really struggle finding time. Hopefully, this will not be the case. Have you integrated a middleware for FastAPI, already?

from yappi.

sm-Fifteen avatar sm-Fifteen commented on June 17, 2024

Have you integrated a middleware for FastAPI, already?

I have a draft implementation ready on the backburner, but I was waiting on this to make sure the API I'm exposing actually works the way I think it does (since my current approach with __qualname__ tends to be fairly failure-prone). I also consider bug #38 as blocking in that regard, since even though I have a workaround, I would much prefer to remove it.

With that said, don't feel like you need to hurry on this. I'm already using this prototype on a few projects of mine and it works rather well, it's just that if I'm going to make this middleware public and have to maintain it, I would prefer to wait until I can make sure that this mechanism is in place so that stats filtering works the way most people will assume it should.

from yappi.

sumerc avatar sumerc commented on June 17, 2024

I agree. I looked at the draft and see https://github.com/sm-Fifteen/asgi-server-timing-middleware/blob/eaa91f18cb537b0ad1a7986f2e9e11ae626f82bc/asgi_server_timing/middleware.py#L56. #38 should fix the issue you have here, right?

I'm already using this prototype on a few projects of mine and it works rather well, it's just that if I'm going to make this middleware public and have to maintain it, I would prefer to wait until I can make sure that this mechanism is in place so that stats filtering works the way most people will assume it should.

Fair enough. Once I release two #38 and #41 (which is currently on master) I think we are good to go. Incremental stats clearing will come but I will focus my efforts on releasing these first. The reason I want this is your middleware implementation will indeed might be a reference implementation for all other ASGI frameworks as well :)

from yappi.

sumerc avatar sumerc commented on June 17, 2024

With the current structure, I think it should work for any application using the ASGI protocol (though I've only specifically tested FastAPI)

This is very nice to hear, indeed! :)

from yappi.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.