Comments (8)

iritkatriel commented on August 25, 2024

The inspect implementation is very old. I think now we can just do max(p[1] for p in code_obj.co_positions()), no?

from cpython.

gaogaotiantian commented on August 25, 2024

We can, but should we have to? co_positions() can be large for a relatively large function, and I think iterating over it is pretty inefficient. If there were some fundamental reason why we can't store the last line, we could of course fall back to that, but it is not obvious to users. If we have a simple, straightforward O(1) solution that costs only one integer per code object, why not use it? One reason could be that co_lastlineno will not be used as frequently as co_firstlineno, but I think wanting the line range of a code object is not an uncommon request.

What might be the concerns about co_lastlineno? The unnecessary new interface? Or the extra complication in the compiler? I agree the need might not be pressing, otherwise there would be a lot of complaints, but the cost is pretty small too. If we really don't want to add an extra field to the code object, I can understand. If the concern is the implementation, we have alternatives, for example making it a property that caches the result of a calculation from co_positions().

I just think co_lastlineno is very intuitive to users who need to know it.

iritkatriel commented on August 25, 2024

Adding a field to each code object is a cost too. The tradeoff needs to be justified.

The implementation in inspect can change anyway.

gaogaotiantian commented on August 25, 2024

Unlike objects such as the interpreter frame, a code object is already huge; an extra integer field is basically nothing space-wise. Of course, adding an extra field would mean more maintenance effort.

co_positions() is very inefficient. For example, it takes about 2ms to get the last line number of a function with 3k lines. For pdb, that means we either need to bear this cost at run time or cache the result somewhere. Caching may work, but it will use more memory.

The introduction of co_lastlineno solves more than "getting the source code of a function", it makes a useful scenario very easy - is this line in this function. In pdb, the most useful case is the "call" event of a frame: we can immediately tell whether any breakpoint could fall inside it and, if not, disable events for that frame.

It's not easy to cache this, because it would mean that we probably need to cache all the code objects and their last line numbers. Bearing the overhead is not ideal either: that cost is paid at run time, not in the debugging session, and it could make debugging extremely slow.

We could possibly pinpoint the corresponding code object when the breakpoint is set, by searching all the code objects created in that file, and so avoid the run-time line check. But co_lastlineno also helps there! How else could you determine whether a breakpoint belongs to a certain code object?

This value is very helpful for pdb and other dev tools. More importantly, I think the cost is minimal (except for introducing a new field).

iritkatriel commented on August 25, 2024

@markshannon

brandtbucher commented on August 25, 2024

> The introduction of co_lastlineno solves more than "getting the source code of a function", it makes a useful scenario very easy - is this line in this function.

Is this true? I'm thinking about the case where you have a nested function definition:

def foo():      # 1
    ...         # 2
    def bar():  # 3
        ...     # 4
    ...         # 5

Is line 4 "in" foo? Does this match what pdb expects? It seems to me that scanning co_positions would be a more reliable way of telling whether a line event will fire for a given code object (and I'd argue that 3k-line functions are not the common case here).

brandtbucher commented on August 25, 2024

> It's not easy to cache this, because it would mean that we probably need to cache all the code objects and their last line numbers.

Code objects are weak-referenceable, so pdb could in theory maintain a weakref.WeakKeyDictionary[types.CodeType, frozenset[int]] mapping code objects to line events they may create. Doesn't seem too nasty.

(Just to be clear, I'm not super opposed to adding this member if we decide that there's a real need for it. But I'm not yet convinced that it's a game-changer for any of the described use-cases.)

gaogaotiantian commented on August 25, 2024

It's okay for pdb to have a false positive. It will still be much better. Also, co_positions() won't help in that case: the inner function will still be listed as part of the outer function. If you want the precise code object, you need to collect all the code objects inside the function and build a tree-like data structure to find it. co_lines() might be helpful?

There have been reports about long programs being "undebuggable" (that involves another issue, though), so we should care about that. People sometimes put data in their programs, and that data can be huge.

I can do this in pdb if an extra field is considered unnecessary. However, from my perspective, the major issue with co_positions is that it's not obvious. You need to be very familiar with code objects to know what co_positions is and how to use it to get the last line number of a function. It's also not listed in the inspect docs, which people often refer to. co_lastlineno is much more obvious.

Now that I have found co_lines(), which is more efficient than co_positions() and actually helps with nested functions, I might try that in pdb.
