Comments (7)

schochastics commented on June 25, 2024

Upfront, I should clarify that my benchmark is not entirely fair yet, because so far I just return a character vector, not a POSIXct object. calcUnique doesn't make a difference in my current benchmark since I used a vector of unique dates.

I'll try to remember to report back here once I've done some more rigorous testing with chronos and a better interface.

from anytime.

eddelbuettel commented on June 25, 2024

No need to report back here then if you also use unique values.

eddelbuettel commented on June 25, 2024

(Aside: That's godawfully formatted code. But that's just me and 25+ years of ESS use.)

I have a feeling this has come up before. Did you check old issues?

Could you also please measure the overhead of computing unique values at those sizes for vectors that are in fact unique, without replicates?

eddelbuettel commented on June 25, 2024

Maybe add a third column using this argument:

calcUnique: A logical value with a default value of ‘FALSE’ that tells
          the function to perform the ‘anytime()’ or ‘anydate()’
          calculation only once for each unique value in the ‘x’
          vector. It results in no difference in inputs or outputs, but
          can result in a significant speed increases for long vectors
          where each timestamp appears more than once. However, it will
          result in a slight slow down for input vectors where each
          timestamp appears only once.
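The option described above amounts to memoising the parser: parse each distinct input once, then map the results back onto the full vector. Below is a minimal sketch in Python of that idea. It is not anytime's actual R/C++ implementation, and the names `parse_all`, `calc_unique` and `parse_day` are hypothetical. The call counter shows the saving on repeated values. It also shows where the slight slowdown on all-unique input would come from: the extra hash lookups and the cache buy nothing there.

```python
from datetime import datetime
from typing import Callable, Dict, List


def parse_all(values: List[str],
              parse: Callable[[str], datetime],
              calc_unique: bool = False) -> List[datetime]:
    """Hypothetical analogue of a calcUnique-style switch (not anytime's code)."""
    if not calc_unique:
        # Default path: call the parser once per element, duplicates included.
        return [parse(v) for v in values]
    # calc_unique path: parse each distinct input string exactly once ...
    cache: Dict[str, datetime] = {}
    for v in values:
        if v not in cache:
            cache[v] = parse(v)
    # ... then expand back to the original length and order.
    return [cache[v] for v in values]


if __name__ == "__main__":
    calls: List[str] = []

    def parse_day(s: str) -> datetime:
        calls.append(s)  # record every call so the saving is visible
        return datetime.strptime(s, "%Y-%m-%d")

    days = ["2024-06-25"] * 3 + ["2024-06-26"]
    plain = parse_all(days, parse_day)
    n_plain = len(calls)
    calls.clear()
    deduped = parse_all(days, parse_day, calc_unique=True)
    print(n_plain, len(calls), plain == deduped)  # 4 2 True
```

As the help text says, the output is identical either way; only the number of parser calls changes.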

etiennebacher commented on June 25, 2024

I only now saw #109 and calcUnique... Sorry for the duplicated (and already fixed) issue.

Results with calcUnique = TRUE for future visitors:

eddelbuettel commented on June 25, 2024

@schochastics Please see above: @etiennebacher did some digging and touched upon an issue that may matter for your benchmarks too. I have the default for unique set to 'off' because in my former field (high-ish frequency finance) our timestamps do tend to be unique. (By now that field is of course more concerned with nanosecond resolution, so POSIXct is of limited usefulness there; that was different when I wrote anytime.) For dates, however, it is definitely an issue, as values clash much more easily.

@etiennebacher We could think about some data.table-like heuristics here. Maybe if N > someValue, say 10k, we sample 100 values and see if there is any replication. Or maybe sample blockwise, ten blocks of ten? This would require some thinking, but you do document that the gain could be substantial. Worth doing as a heuristic?
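The blockwise variant floated above could look roughly like the following Python sketch. The 10k threshold and the ten blocks of ten come from the comment. Everything else is an assumption, not anything anytime implements: the name `looks_replicated`, the choice to place one block in each of ten equal strata so that blocks never overlap, and the rule that any repeat in the sample switches deduplication on.

```python
import random
from typing import List, Optional, Sequence


def looks_replicated(x: Sequence[str], threshold: int = 10_000,
                     n_blocks: int = 10, block_size: int = 10,
                     seed: Optional[int] = None) -> bool:
    """Hypothetical heuristic: guess whether calcUnique-style dedup is worth it."""
    # Small inputs skip the heuristic entirely (the threshold is arbitrary).
    if len(x) <= threshold or len(x) < n_blocks * block_size:
        return False
    rng = random.Random(seed)
    # One stratum per block, so sampled blocks can never overlap each other
    # (overlap would create spurious "repeats" on all-unique input).
    stride = len(x) // n_blocks
    sample: List[str] = []
    for b in range(n_blocks):
        lo = b * stride
        start = rng.randrange(lo, lo + stride - block_size + 1)
        sample.extend(x[start:start + block_size])
    # Any repeated value inside the sampled blocks signals replication.
    return len(set(sample)) < len(sample)
```

Contiguous blocks suit sorted data such as daily dates, where repeats sit next to each other. Repeats scattered far apart can be missed, but since calcUnique does not change the output, a wrong guess only costs speed, never correctness.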

etiennebacher commented on June 25, 2024

> Worth doing as a heuristic?

Could be, but I'm not an active user of anytime as I rarely have a use case for it, so I don't think my opinion matters much here. I was simply intrigued by @schochastics' benchmarks and explored a bit to see if there was some low-hanging fruit.