Comments (26)
I could only find a bit time working on this.
@sm-Fifteen Implementation is finished and there are few updates on the implementation. Please feel free to comment on anything:
- I removed
fn_descriptor
fromYStat
. This is because YStats cannot be pickled because CodeObjects are unpickable and we havesave()
methods to save profiles withpickle
. So, fn_descriptors go to an external helper dict and I use the help of that dict in func_matches(), module_matches() and those functions will be module functions as they have a dependency to a module global and having a method depending on a global is messy.
So, no more playing with this field. We would explicitly handle filtering on these by module functions.
An example usage can be seen in test_functionality.BasicUsage.test_filter_callback
.
- One other note is: I will make this backward compatible without removing
filter
fromget_func_stats
API, but I will not document it anymore. In the docs, onlytag
andctx_id
filtering will be seen.
EDIT: Some type hints would come in handy, the types builtin module has type hints for some of the internal python objects.
It could be done and admit it is a nice idea. But current Yappi code is Python agnostic as you know. We support all Python versions so it might be a bit tricky but possible for sure.
from yappi.
Finally, this is merged to master. It seemed small change at the start, but turns out not so easy. Docs/Examples are updated.
Thanks for all your help on this!
from yappi.
Fair enough. Once I release two #38 and #41 (which is currently on master) I think we are good to go.
That's the plan ;)
Incremental stats clearing will come but I will focus my efforts on releasing these first.
Yeah, the solution I have right now is by no means ideal, but it does the job reasonably enough that it's not a priority for me at the moment.
The reason I want this is your middleware implementation will indeed might be a reference implementation for all other ASGI frameworks as well :)
With the current structure, I think it should work for any application using the ASGI protocol (though I've only specifically tested FastAPI). I'll be changing the interface a bit before release (I definitely need to replace those __qualname__
strings with arrays of callables/coroutines), but there shouldn't be any need to adapt it to work with, say, Sanic or Quart since all the FastAPI/Starlette-specific things are actually in the configuration.
app.add_middleware(ServerTimingMiddleware, calls_to_track={
"db_exec": [sqlalchemy.engine.base.Engine.execute],
"db_fetch": [
sqlalchemy.engine.ResultProxy.fetchall,
sqlalchemy.engine.ResultProxy.fetchone
],
# All steps of the generated fastapi.routing.get_request_handler app
"1deps": [fastapi.routing.solve_dependencies], # `Depends` execution
"2endpoint": [fastapi.routing.run_endpoint_function], # Since FastAPI 0.49.2
"3valid": [fastapi.routing.ModelField.validate], # Pydantic model validation
"4encode": [fastapi.encoders.jsonable_encoder], # JSON encoding
"5render": [starlette.responses.JSONResponse.render], # Respnse (JSON) rendering
})
from yappi.
Hey @sm-Fifteen ,
I have released a new version for Yappi: 1.2.4
:) . Keep an eye on any hotfixes(maybe?) following days. Other than that, thanks again! This versions fixes most of the issues we have discussed.
from yappi.
Related to #22.
from yappi.
Good point!
Please correct me if I am wrong:
An example: you have a nested function like this:
class A:
class B:
class C:
def f(self):
pass
And you cannot filter on function f
because the name
generated in Yappi is not deterministic enough. In other words, name
might change depending on where it is defined. Right?
I think there are two issues here:
- We should use
__qualname__
instead of__name__
if Python version is 3.3 and up. (Note: I am surprisedcProfile
has not implemented this, too.) - We should be able to filter on function itself somehow in a deterministic way regardless of its string representation. One idea might be to use
PyCodeObject
? e.g:.get_func_stats(filter={'code': func.__code__})
.
from yappi.
Actually, my initial concern was a lot simpler than that:
>>> from time import sleep as sleep1
>>> from asyncio import sleep as sleep2
>>> sleep1.__qualname__
'sleep'
>>> sleep2.__qualname__
'sleep'
If we blindly filter by {'name':sleep1.__qualname__}
and don't remember to filter by module as well, the result object will contain both
But since we're here, I just did a quick and dirty check and the following code:
@router.get('/testfoo')
async def testfoo():
class A:
class B:
class C:
async def f(self):
await asyncio.sleep(5)
await A.B.C().f()
return {'name': A.B.C().f.__name__, "qualname": A.B.C().f.__qualname__}
Gets profiled as follows:
index: 3265
full_name: .\app\testfoo.py:140 testfoo
ncall: 1/1
ttot: 5.008828
tsub: 0.000012
children:
index: 3270
child_full_name: .\app\testfoo.py:145 C.f
ncall: 1/1
ttot: 5.008768
tsub: 0.000008
`{"name":"f","qualname":"testfoo.<locals>.A.B.C.f"}`
It looks like neither of them matches against a name filter, only "C.f" works.
That's admittedly a fairly convoluted case, but it's just another case where building the filter by hand can get obscure, error-prone and lead to an incorrect reading of the result data.
This example (wrappers, a much more common python pattern):
def bar():
import asyncio
async def baz():
await asyncio.sleep(5)
return baz
@router.get('/testfoo')
async def testfoo():
runme = bar()
await runme()
return {'name': runme.__name__, "qualname": runme.__qualname__}
index: 2840
full_name: .\app\testfoo.py:147 testfoo
ncall: 1/1
ttot: 5.000080
tsub: 0.000007
children:
index: 2842
child_full_name: .\app\testfoo.py:142 baz
ncall: 1/1
ttot: 5.000069
tsub: 0.000010
index: 2841
child_full_name: .\app\testfoo.py:140 bar
ncall: 1/1
ttot: 0.000004
tsub: 0.000004
{"name":"baz","qualname":"bar.<locals>.baz"}
...can also easily make for some ambiguous and tricky to manipulate results.
We should use
__qualname__
instead of__name__
if Python version is 3.3 and up. (Note: I am surprised cProfile has not implemented this, too.)
That would definitely be a good idea, qualname takes care of removing a lot of the ambiguity for you. It's not very effective on lambda functions and list/set/dict comprehension and generator expressions (mygen = (num * num for num in [1,2,3])
), but then so is yappi and profiling these is probably not as much of an immediate concern.
We should be able to filter on function itself somehow in a deterministic way regardless of its string representation. One idea might be to use
PyCodeObject
? e.g:.get_func_stats(filter={'code': func.__code__})
.
Surely there's an actual function handle we can use instead of the bytecode reference in most cases?
from yappi.
Surely there's an actual function handle we can use instead of the bytecode reference in most cases?
My concern is that Python runtime pass frame
objects as arguments to profiler hooks. Function objects are not naturally passed around in the C side of Yappi(except PyCFunctionObject
) and it is not very natural to get function objects from frame objects.
Let's change proposition(we do not need to get __code__
explicitly, we can do it in API)
.get_func_stats(filter={'function': func})
User does not need to know we use PyCodeObject
for the underlying comparison? WDYT?
from yappi.
Yeah, I'd agree with passing the function itself, the user doesn't need to know. Plus that would leave yappi with the liberty of changing the exact details of function selection is done without breaking the API. It's probably the wisest way to do it.
The only thing I'm not 100% sure about is that passing the actual callable means it would get in the way of (maybe eventually) adding more advanced filters using callbacks, at least not for all filter parameters, since there wouldn't be a way for yappi to make the distinction between an actual function to filter on and a function meant to test if the function matches:
yappi.get_func_stats(name=lambda name: not name.startswith('_'), module='foo')
yappi.get_func_stats(function=lambda func: inspect.ismethod(func) and isinstance(func.__self__, MySuperClass))
While it's not really much of a concern right now (I'm not even sure the latter would be possible, for the reasons you just brought up), I'm a bit worried about the possibility of blocking that avenue. There's also how the function
filter as you're proposing it would be exclusive to the current name
-module
pair. I'd personally be tempted to just add a new function
/functions
parameter to get_func_stats
that specifically only takes function objects to avoid this.
...but I'm probably just overthinking this, .get_func_stats(filter={'function': func})
is perfect for Yappi as it currently is.
from yappi.
adding more advanced filters using callbacks, at least not for all filter parameters, since there wouldn't be a way for yappi to make the distinction between an actual function to filter on and a function meant to test if the function matches:
Nice observation!
I generally prefer Python side for these kind of advanced filters. Having name
and module
in filter
param was a bad idea and I regret it. People occasionally and naturally think this is the only way to filter. See this for example.
In fact, filter
param was just there optionally for convenience for avoiding looping over the results. But with latest changes now we use it to filter on tag
in a fast manner(by implementing filtering on C side) as you know, and it becomes important suddenly :)
I generally prefer this kind of filtering on Yappi for other cases:
db_stats = []
stats = yappi.get_func_stats(filter={tag: 2}) # fast
for stat in stats:
if stat.module.endswith('sqlalchemy.cursor'):
db_stats.append(stat)
from yappi.
In that case, shouldn't you deprecate the use of filter
, use tag=2
instead (maybe ctx_id, too) for fast first-pass filtering and recommand that people use iteration to sort through the rest (and documenting it aas such)? If you do that, though, adding a function
key to the filter dict wouldn't do, and it's clear that the exact method used by yappi to filter on function would expose internal details if the user were to filter it manually, so I presume another keyword parameter for function handles would be in order? Or maybe just add some sort of function_matches()
method to the stat
object to let users do iterative filtering without having to know how yappi does that comparison internally?
from yappi.
In that case, shouldn't you deprecate the use of filter, use tag=2 instead (maybe ctx_id, too) for fast first-pass filtering and recommand that people use iteration to sort through the rest (and documenting it aas such)?
Unfortunately, I think we have passed that point, it might not worth the effort(I mean for users). I see some code in repos. that uses filter
dict. Let's keep the filter dict, at least for now...
so I presume another keyword parameter for function handles would be in order? Or maybe just add some sort of function_matches() method to the stat object to let users do iterative filtering without having to know how yappi does that comparison internally?
Good point.
I would prefer a custom YFuncStat.function
class which has __eq__
against function objects? That way users might be able to write such code:
for stat in stats:
if stat.function == myfunc:
pass
Note: I will be starting this in few days, hopefully as it is mostly clarified.
from yappi.
I would prefer a custom
YFuncStat.function
class which has__eq__
against function objects? That way users might be able to write such code:for stat in stats: if stat.function == myfunc: pass
But stat.function
wouldn't actually be a function descrptor, would it? If the only purpose of that YFuncStat.function
class is to allow for an overloaded equal comparison, wouldn't a simple function_matches
method be less surprising/confusing?
EDIT: I mean you could be tempted to do fancier stuff like I proposed earlier, only for that to then not work:
for stat in stats:
if inspect.ismethod(stat.function) and isinstance(stat.function.__self__, MySuperClass):
# stat.function is not a real function, so neither of these checks would work
pass
Note: I will be starting this in few days, hopefully as it is mostly clarified.
👍
from yappi.
But stat.function wouldn't actually be a function descrptor, would it?
That would definitely be better!
This boils down to the issue of getting function object from a frame or code object and there are ways to do that but not sure if:
- they will work on all Python versions,
- they have some edge cases,
If you can find a Python version agnostic way to do this efficiently, I am open to it.
If the only purpose of that YFuncStat.function class is to allow for an overloaded equal comparison, wouldn't a simple function_matches method be less surprising/confusing?
Hmm. Explicit better than implicit. You might be right on that. Ok. Let's do that way.
from yappi.
That would definitely be better!
This boils down to the issue of getting function object from a frame or code object and there are ways to do that but not sure if:
they will work on all Python versions,
they have some edge cases,
If you can find a Python version agnostic way to do this efficiently, I am open to it.
I'd love this if it were possible, but those constraints make it a pretty tall order. I'm hardly familiar at all with the CPython C API, and then you'd have to consider PyPy support...
Let's not wait until we find a way of doing that before implementing this. function_matches
is a fine enough solution for now and can be implemented with what we currently have. If we do figure out a way to make stats.function
possible in the future, we can always just change the implementation of function_matches
to rely on that without breaking the public API.
Also, I understand we won't be adding a 'function' key to the yappi.get_func_stats
filter object? Because making it so you can only filter by function object by manually iterating could be an interesting way to get people to progressively move away from the filter object.
from yappi.
Hi again,
Here is a sample initial implementation. I would like to hear your feedback.
#44
Note: I get rid of filter
dict(regardless of backward compatibility. I make ctx_id
and tag
arguments to get_func_stats
API and defined a filter_callback
argument instead of looping all the items. The last one is because we might want to do post-processing on the data after filtering like .sort()
or simple slicing.)
Note: There are still some errors on some unit tests, do not bother with those.
Here is a sample application:
import time
def foo():
time.sleep(0.1)
def bar():
pass
def _filter(stat):
from time import sleep
return stat.func_matches([sleep, foo, bar])
import yappi
yappi.start(builtins=True)
foo()
bar()
yappi.stop()
stats = yappi.get_func_stats(filter_callback=_filter)
stats.print_all()
'''
Clock type: CPU
Ordered by: totaltime, desc
name ncall tsub ttot tavg
t3.py:4 foo 1 0.000031 0.000070 0.000070
time.sleep 1 0.000039 0.000039 0.000039
t3.py:8 bar 1 0.000006 0.000006 0.000006
'''
from yappi.
While playing with it, module filtering became a possibility without too much effort. Maybe with another API like: module_matches([modules])
.
Here is an example:
import time
import packagea
def _filter(stat):
import inspect
module = inspect.getmodule(stat.fn_descriptor)
return module == packagea.packageb
import yappi
yappi.start(builtins=True)
packagea.packageb.bf()
yappi.stop()
stats = yappi.get_func_stats(filter_callback=_filter)
stats = stats.sort("name")
stats.print_all()
'''
Clock type: CPU
Ordered by: name, desc
name ncall tsub ttot tavg
..packagea/packageb/__init__.py:1 bf 1 0.000006 0.000006 0.000006
'''
UPDATE: I have implemented module_matches()
, it is in the branch.
from yappi.
Code-wise, this if far above the expectations I had. I definitely did not expect fn_descriptor
to be available, that's great! That could open up a whole bunch of advanced filtering techniques, like checking for method overrides and such.
Getting rid of filter
on get_func_stats
is ultimately a good call, I'd say, though removing it like that is an API breakage, so as per semantic versionning, that would warrant skipping to yappi 2.0
. You haven't removed it from get
(yet), so that seems like a bit of an arbitrary removal right now.
I'll see if I can try that branch for myself in the coming days. Judging by your code examples, though, it looks perfect!
from yappi.
I definitely did not expect fn_descriptor to be available, that's great!
Thanks! But let me underline again that fn_descriptor
is still not the function object, it is a bit more complicated than that. It is PyCFunctionObject
for the builtin functions/methods and PyCodeObject
for regular Python functions.
A side note: there are ways to get function object from code object, but I think I will not bother implementing that at least for now.
from yappi.
Thanks! But let me underline again that
fn_descriptor
is still not the function object, it is a bit more complicated than that. It isPyCFunctionObject
for the builtin functions/methods andPyCodeObject
for regular Python functions.
Ahh, I see, that's not immediately obvious by just looking at the code. It might be preferable to keep this field private instead of exposing CPython internals, and using methods like module_matches
and func_matches
to operate on it in that case. The name fn_descriptor
would otherwise suggest something that this property is not, and the variations between user-defined and built-in functions is bound to cause obscure errors for users who do attempt to use that field directly.
EDIT: Some type hints would come in handy, the types
builtin module has type hints for some of the internal python objects.
EDIT EDIT: Having something like code_matches()
instead of exposing fn_descriptor
in case you really need to match on code could also come in handy, like with this complete abomination:
import time.sleep
import inspect
def bar():
def baz():
time.sleep(5)
return baz
import yappi
yappi.start()
bar()()
yappi.stop()
inner_func = None
for const in bar.__code__.co_consts:
if inspect.iscode(const) and const.co_name == 'baz':
inner_func = const
stats = yappi.get_func_stats(filter_callback=lambda stat: stat.code_matches(inner_func))
stats.print_all()
from yappi.
- This is because YStats cannot be pickled because CodeObjects are unpickable and we have
save()
methods to save profiles withpickle
.
So function and module matching will simply not work on loaded profiling data? You might want to throw an error if the user attempts that, otherwise I can imagine people being confused when they find out none of the func_matches code actually works when operating on pickled data.
- So, fn_descriptors go to an external helper dict and I use the help of that dict in func_matches(), module_matches() and those functions will be module functions as they have a dependency to a module global and having a method depending on a global is messy.
Makes sense.
- I will make this backward compatible without removing
filter
fromget_func_stats
API, but I will not document it anymore. In the docs, onlytag
andctx_id
filtering will be seen.
You might want to keep them somewhere and really highlight that they're deprecated or something, but yeah, good idea. That will save you from having to break the API.
It could be done and admit it is a nice idea. But current Yappi code is Python agnostic as you know. We support all Python versions so it might be a bit tricky but possible for sure.
Ah, right, I forgot about Python2 support. I know mypy supports a #type: List[Callable]
comment syntax for those who need to stay compatible with Python 2, but I've never tried it myself, I don't know how cumbersome it is to use compared to Python 3000 function annotations.
from yappi.
Thank you for seeing that feature request through, this was the one big headache-inducing issue that was preventing me from confidently using yappi for targeted function profiling. Any plans to include this in a release in the near future?
from yappi.
Yeah sure. I am planning to release a new version in 2-3 weeks hopefully. BUt sometimes, I really struggle finding time. Hopefully, this will not be the case. Have you integrated a middleware for FastAPI, already?
from yappi.
Have you integrated a middleware for FastAPI, already?
I have a draft implementation ready on the backburner, but I was waiting on this to make sure the API I'm exposing actually works the way I think it does (since my current approach with __qualname__
tends to be fairly failure-prone). I also consider bug #38 as blocking in that regard, since even though I have a workaround, I would much prefer to remove it.
With that said, don't feel like you need to hurry on this. I'm already using this prototype on a few projects of mine and it works rather well, it's just that if I'm going to make this middleware public and have to maintain it, I would prefer to wait until I can make sure that this mechanism is in place so that stats filtering works the way most people will assume it should.
from yappi.
I agree. I looked at the draft and see https://github.com/sm-Fifteen/asgi-server-timing-middleware/blob/eaa91f18cb537b0ad1a7986f2e9e11ae626f82bc/asgi_server_timing/middleware.py#L56. #38 should fix the issue you have here, right?
I'm already using this prototype on a few projects of mine and it works rather well, it's just that if I'm going to make this middleware public and have to maintain it, I would prefer to wait until I can make sure that this mechanism is in place so that stats filtering works the way most people will assume it should.
Fair enough. Once I release two #38 and #41 (which is currently on master) I think we are good to go. Incremental stats clearing will come but I will focus my efforts on releasing these first. The reason I want this is your middleware implementation will indeed might be a reference implementation for all other ASGI frameworks as well :)
from yappi.
With the current structure, I think it should work for any application using the ASGI protocol (though I've only specifically tested FastAPI)
This is very nice to hear, indeed! :)
from yappi.
Related Issues (20)
- Missing tag for 1.3.5 release HOT 3
- Tests are broken on Python 3.11 due to @asyncio.coroutine use HOT 1
- yappi fails to build with python 3.11
- suggestion: add a column showing max execution time for a single call HOT 6
- `test_gevent` seems to depend on `3.10`
- Support more architectures in Github Actions
- Openssl related Error install Yappi with pip on pyton 3.11 OS:Windows HOT 1
- Python run `gevent` tests on CI for `3.11`
- New 3.11 compatible release? HOT 3
- base class names incorrect HOT 2
- Question: does yappi add "considerable runtime overhead"? Can it be used in production? HOT 2
- failing to capture FastAPI uvicorn threads, only seeing the _MainThread HOT 4
- sum(<genexpr>) not represented correctly in call graph
- Can't install on macOS HOT 2
- Uniqueness of the YFuncStat object
- Full path not printed HOT 2
- could you make the git tag name pattern consistent? HOT 1
- Irrelevant Thread Name for Gevent/Greenlet Profiling
- Print only `N` lines of result
- Is python 3.12 supported? HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from yappi.