Comments (6)

dontgoto commented on June 14, 2024

> There are probably some tricks we can try like using a static lookup table instead of branches:

I tried that, but it did not improve the situation on my machine.

I did some other refactoring that brought down the benchmark's runtime by 20% on my machine for s/us/ns.

There are also code paths that the benchmark does not really exercise (I saw a similar improvement from my changes on those): s/ms/us/ns values large enough to trigger the "day" parsing part of the code, and negative offsets. Neither gets checked, because the benchmark only runs with random numbers between 0 and 10.
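
To illustrate why inputs between 0 and 10 skip most of the logic, here is a minimal sketch of splitting a value into whole days and a non-negative remainder. It is not the pandas implementation; `day_split` and `split_days` are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Hypothetical helper for illustration; not taken from pandas. */
typedef struct {
    int64_t days;      /* whole days since the epoch, floored */
    int64_t remainder; /* always in [0, units_per_day) */
} day_split;

static day_split split_days(int64_t value, int64_t units_per_day) {
    day_split out;
    /* C division truncates toward zero, so negative values need a fix-up. */
    out.days = value / units_per_day;
    out.remainder = value % units_per_day;
    if (out.remainder < 0) {
        out.days -= 1;
        out.remainder += units_per_day;
    }
    return out;
}
```

With values in the range 0 to 10, `days` is always 0 and the fix-up branch is never taken, so neither path contributes to the measured runtime.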

from pandas.

dontgoto commented on June 14, 2024

I now have a version of the datetime conversion that is correct, tested, and runs up to 2.5x faster than both main and the version before the regression (at least on my machine, M2 ARM). Interestingly, I do not see large differences in runtime between the pre-regression commit and main.

I would suggest changing the benchmark inputs to negative values that are at least on the order of one day to make most of the logic in the function actually show up in the runtime of the benchmark. What do you think? Is there a procedure to follow to update the history of benchmark runtimes?

To back that up, here are some of my benchmark results for all three versions. I will update my PR soon.

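| input range | main | main pre-regression | fix |
|-------------|------|---------------------|-----|
| +30±10ns    | 4.9  | 4.8                 | 3.9 |
| -30±10ns    | 7.9  | 7.2                 | 5.2 |
| ±30ns       | 8.4  | 8.5                 | 7.1 |
| +4d±100s    | 11.6 | 11.4                | 5.2 |
| -4d±100s    | 12.7 | 12.7                | 5.3 |
| ±4d         | 9.7  | 9.4                 | 7.5 |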

(Runtimes are in ms, runs are for 10^6 random values)

The current versions of the function have branch prediction and data dependency issues that only show up with larger input ranges and negative values. The previously mentioned checks against the base unit size cost very little in practice: the unit size does not change (in the benchmark, and also in real-world use), so those branches are predicted correctly every time.
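
One general way to take pressure off an unpredictable sign branch is to fold the correction into arithmetic. The sketch below shows that technique in isolation and is not the code from the PR; `floor_div_branchy` and `floor_div_branchless` are hypothetical names, and a compiler may already lower the first form to a conditional move.

```c
#include <stdint.h>

/* Floor division with an explicit branch on the sign of the remainder.
   Assumes d > 0. */
static int64_t floor_div_branchy(int64_t x, int64_t d) {
    int64_t q = x / d;
    if (x % d < 0) {
        q -= 1;
    }
    return q;
}

/* Same result with the correction folded into arithmetic. Assumes d > 0. */
static int64_t floor_div_branchless(int64_t x, int64_t d) {
    int64_t q = x / d;
    int64_t r = x % d;
    return q - (r < 0); /* (r < 0) evaluates to 0 or 1 */
}
```

Whether the second form is actually faster depends on the compiler and CPU, so it would need to be measured with mixed-sign inputs rather than assumed.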

Inputs that randomly mix negative and positive values still do not do too well in my version, but realistic timestamps would not have this property.

dontgoto commented on June 14, 2024

> Possibly related?: #57035

Good find, but the regression here should be unrelated to that plotting issue. Converting 1M timestamps takes some milliseconds, but the referenced regression is in the seconds range.

WillAyd commented on June 14, 2024

I think the issue here is that we combined a few case statements into one:

  case NPY_FR_s:
  case NPY_FR_ms:
  case NPY_FR_us:
  case NPY_FR_ns: {
    npy_int64 per_sec;
    if (base == NPY_FR_s) {
      per_sec = 1;
    } else if (base == NPY_FR_ms) {
      per_sec = 1000;
    } else if (base == NPY_FR_us) {
      per_sec = 1000000;
    } else {
      per_sec = 1000000000;
    }

    ...

Whereas previously the code was copy/pasted and slightly tweaked for every case statement.

There are probably some tricks we can try like using a static lookup table instead of branches:

  case NPY_FR_s:
  case NPY_FR_ms:
  case NPY_FR_us:
  case NPY_FR_ns: {
    const npy_int64 sec_per_day = 86400;
    static npy_int64 per_secs[] = {
      1, // NPY_FR_s
      1000, // NPY_FR_ms
      1000000, // NPY_FR_us
      1000000000 // NPY_FR_ns
    };
    const npy_int64 per_sec = per_secs[base - NPY_FR_s];

but I'm not really sure the tricks are worth it. And I think the code de-duplication is probably the cause of the increased timing here.
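
For readers who want to try the lookup-table idea outside the switch statement, here is a self-contained sketch. The `time_unit` enum is a stand-in for the NumPy unit enum, with names and values assumed for illustration; the lookup only relies on the four units being contiguous and ordered s, ms, us, ns. The helper names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the NumPy datetime unit enum; names and values here are
   illustrative assumptions, not the real NumPy definitions. */
typedef enum { UNIT_S = 0, UNIT_MS, UNIT_US, UNIT_NS } time_unit;

/* Units per second, indexed by (unit - UNIT_S). */
static const int64_t PER_SEC[] = {
    1,          /* UNIT_S  */
    1000,       /* UNIT_MS */
    1000000,    /* UNIT_US */
    1000000000, /* UNIT_NS */
};

static int64_t units_per_second(time_unit unit) {
    return PER_SEC[unit - UNIT_S];
}

/* Units per day, derived from the table instead of a branch chain. */
static int64_t units_per_day(time_unit unit) {
    const int64_t sec_per_day = 86400;
    return sec_per_day * units_per_second(unit);
}
```

The table replaces the if/else chain with a single indexed load. As reported elsewhere in this thread, that did not measurably help on at least one machine, which is consistent with the unit branch being easy to predict.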

WillAyd commented on June 14, 2024

@dontgoto thanks for the great research. If you have improvements to the benchmark I would just go ahead and submit them. We don't manage the history of changes to a benchmark itself that strictly.

ba05 commented on June 14, 2024

Possibly related?: #57035
