
Comments (6)

barneygale commented on August 25, 2024

CC @pitrou

from cpython.

pitrou commented on August 25, 2024

> I've never been able to establish that this is a worthwhile thing to do.

That's a good question. The contention is that, if you keep a lot of related paths in memory, interning the path components would yield significant memory savings. But how useful it is would depend on the use case; and it's probably possible to construct use cases where it would be detrimental.
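A minimal sketch of the memory argument, assuming CPython and two made-up example paths. The helper `components` is hypothetical; it only mirrors the split-and-filter step, not pathlib's actual parsing.

```python
import sys

def components(path, sep="/"):
    # Hypothetical helper: split at runtime so each part is a freshly
    # created string object rather than a shared compile-time constant.
    return [x for x in path.split(sep) if x and x != "."]

a = components("usr/lib/python3/site-packages")
b = components("usr/lib/python3/dist-packages")

# Without interning, equal components from separate splits are typically
# separate objects, so each path pays for its own copy.
plain_shared = a[0] is b[0]

# After interning, equal components resolve to one shared object.
ia = [sys.intern(x) for x in a]
ib = [sys.intern(x) for x in b]
interned_shared = ia[0] is ib[0]

print("shared without interning:", plain_shared)
print("shared with interning:   ", interned_shared)
```

The saving only materializes when many paths with common components are alive at the same time; for short-lived paths the interning call is pure overhead.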


barneygale commented on August 25, 2024

Thank you :)

> it's probably possible to construct use cases where it would be detrimental.

Surprisingly, sys.intern(str(x)) accounts for ~10% of the cost of str(Path('foo', 'bar')) on my machine. In other words, this patch makes it ~10% faster:

diff --git a/Lib/pathlib/_local.py b/Lib/pathlib/_local.py
index 49d9f813c5..d20512bd9b 100644
--- a/Lib/pathlib/_local.py
+++ b/Lib/pathlib/_local.py
@@ -270,7 +270,7 @@ def _parse_path(cls, path):
             elif len(drv_parts) == 6:
                 # e.g. //?/unc/server/share
                 root = sep
-        parsed = [sys.intern(str(x)) for x in rel.split(sep) if x and x != '.']
+        parsed = [x for x in rel.split(sep) if x and x != '.']
         return drv, root, parsed
 
     @property


pitrou commented on August 25, 2024

It can probably be micro-optimized if you really care. For example, I'm not sure the str() call is still useful.

>>> x = 'foo'
>>> %timeit str(x)
40.3 ns ± 0.169 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> %timeit sys.intern(x)
46.9 ns ± 0.402 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> %timeit sys.intern(str(x))
79.7 ns ± 0.635 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> _intern = sys.intern
>>> %timeit _intern(x)
28.3 ns ± 0.11 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


barneygale commented on August 25, 2024

Hum! That change used to trip up the test_str_subclass* tests, but it doesn't any more. I wonder if there's been a change in how string subclasses work elsewhere in the language.
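Two CPython behaviours that bear on this can be checked directly: sys.intern() rejects str subclasses, and str.split() returns exact str objects even when called on a subclass instance. Whether that is the reason the tests now pass is an assumption, not something established in this thread. `StrSubclass` is a hypothetical stand-in for whatever the tests use.

```python
import sys

class StrSubclass(str):
    """Hypothetical stand-in for the subclass used in the tests."""

s = StrSubclass("foo/bar")

# In CPython, sys.intern() only accepts exact str instances.
try:
    sys.intern(s)
    subclass_rejected = False
except TypeError:
    subclass_rejected = True

# str methods such as split() return plain str objects, so the parts
# can be interned without an explicit str() conversion.
parts = s.split("/")
part_types = {type(p) for p in parts}
interned = [sys.intern(p) for p in parts]

print("subclass rejected:", subclass_rejected)
print("types after split:", part_types)
```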


vstinner commented on August 25, 2024

> The contention is that, if you keep a lot of related paths in memory, interning the path components would yield significant memory savings.

Can pathlib have its own "interned strings" cache with a limit on the cache size? Well, I don't know if it's worth it :-)

Pseudo-code with a limit of 3 entries:

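cache = {}
def get(key):
    # Return the cached object if an equal string was stored earlier.
    try:
        return cache[key]
    except KeyError:
        pass

    # Cache is full: hand the string back without storing it.
    if len(cache) >= 3:
        return key

    # First time this value is seen: it becomes the canonical object.
    cache[key] = key
    return key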

abc = get(b'abc'.decode())
abc2 = get(b'abc'.decode())
assert abc2 is abc

d = get(b'd'.decode())
e = get(b'e'.decode())
e2 = get(b'e'.decode())
assert e2 is e

# cache is full (3 entries), so "f" is returned without being stored
f = get(b'f'.decode())
print("cache size", len(cache))

Can pathlib remove entries from such cache?
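One way such a cache could drop entries is least-recently-used eviction. The sketch below is only an illustration of that idea under stated assumptions: `BoundedInternCache` and its `maxsize` parameter are hypothetical names, nothing like it exists in pathlib, and the thread does not settle on any eviction policy.

```python
from collections import OrderedDict

class BoundedInternCache:
    """Hypothetical size-limited intern table with LRU eviction."""

    def __init__(self, maxsize=3):
        self.maxsize = maxsize
        self._cache = OrderedDict()

    def get(self, key):
        try:
            value = self._cache[key]
        except KeyError:
            pass
        else:
            # Hit: mark the entry as most recently used.
            self._cache.move_to_end(key)
            return value
        # Miss on a full cache: evict the least recently used entry.
        if len(self._cache) >= self.maxsize:
            self._cache.popitem(last=False)
        self._cache[key] = key
        return key

    def __len__(self):
        return len(self._cache)

cache = BoundedInternCache(maxsize=3)
abc = cache.get(b"abc".decode())
de = cache.get(b"de".decode())
fg = cache.get(b"fg".decode())
hi = cache.get(b"hi".decode())      # cache full: evicts "abc"
abc2 = cache.get(b"abc".decode())   # miss again: stored anew, evicts "de"
print("cache size", len(cache))
```

Evicting an entry does not free a string that a live path still references; it only means later equal strings are no longer deduplicated against it. A weak-reference table is not an option here, because str objects cannot be weakly referenced.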
