Comments (5)

mh- commented on August 24, 2024

For one specific case, there was an explanation here: corona-warn-app/cwa-server#693
A user submitted twice, the original keys were accepted only once (to avoid duplicate keys), but 2x4 random padding keys were added.
If someone uses my diagnosis-keys tools, I'd suggest not always using the auto-detect feature, but fixing the factor at 5 for the moment and changing it when required.
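
To make the arithmetic above concrete, here is a minimal sketch. It assumes, based on the explanation above, that each real key is normally published together with 4 random padding keys (factor 5), and that a repeated submission adds another 4 padding keys per real key while the real key itself is stored only once. The function name and parameters are made up for illustration and are not part of cwa-server or the diagnosis-keys tools.

```python
def published_key_count(real_keys: int, submissions: int = 1, multiplier: int = 5) -> int:
    """Illustrative model of how many keys one user's upload contributes.

    Assumption: each submission adds (multiplier - 1) padding keys per real
    key, but the real keys are stored only once (duplicates are rejected).
    """
    padding_per_submission = real_keys * (multiplier - 1)
    return real_keys + submissions * padding_per_submission


# A normal upload of one key yields 1 real + 4 padding keys.
normal = published_key_count(real_keys=1)

# The same key submitted twice yields 1 real + 2 x 4 padding keys.
doubled = published_key_count(real_keys=1, submissions=2)

print(normal, doubled, doubled % 5)
```

Under these assumptions the doubled case gives 9 keys for one real key, so the package total stops being a clean multiple of 5 and auto-detection can pick a wrong factor.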

from ctt.

PalminX commented on August 24, 2024

Hm, there are 3749 keys in the 2020-08-18 file, which obviously is not a multiple of 5. So it is maybe more a question to @micb25 how he handles these discrepancies


PalminX commented on August 24, 2024

OK, I saw that @micb25 sometimes manually corrected the multiplier in the past.
So I think here you should also have some way of handling or flagging these inconsistent values, because currently the number of users from 2020-08-18 is probably too high
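
One possible way to flag such packages, as a rough sketch: check whether the key count divides evenly by the expected multiplier and mark the package for manual review if it does not. The function name and return shape are hypothetical and not taken from any of the tools discussed here; the multiplier of 5 is the value suggested earlier in this thread.

```python
def check_package(key_count: int, multiplier: int = 5) -> tuple[int, bool]:
    """Return (estimated real keys, needs_review) for one package.

    A package is flagged when its key count is not a whole multiple of the
    expected padding multiplier.
    """
    estimated_real_keys, remainder = divmod(key_count, multiplier)
    return estimated_real_keys, remainder != 0


# The 2020-08-18 daily package mentioned above contains 3749 keys.
estimate, needs_review = check_package(3749)
print(estimate, needs_review)
```

For 3749 keys the remainder is 4, so this package would be flagged rather than silently analysed with an auto-detected factor.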


micb25 commented on August 24, 2024

OK, I saw that @micb25 sometimes manually corrected the multiplier in the past.

Yes, I had to correct this manually for one of yesterday's hourly packages as well as for one package in the past (2020-08-04). I wonder what situation causes these issues. Fortunately, it seems to happen very rarely. However, the impact on the statistics can be quite significant, as you pointed out.

Edit: As a consequence, I do manually check the statistics every day before uploading the new data. And I think this is still necessary for the future, at least as long as fake diagnosis keys are being generated.


janpf commented on August 24, 2024
  • https://ctt.pfstr.de/users/2020-08-18.txt shows a detected padding number of 1, resulting in 379 users
  • The graphs for number of users and number of keys show 98 users and 952 keys, which doesn't match the numbers from approx. users file

The https://ctt.pfstr.de/X/Y.txt files are generated from the published daily package, while the graphs are based on the hourly packages, so there will be a discrepancy.
I did it this way because there is no point in making the end user click through 24 hourly files per day, but the analysis of the hourly files is of course more precise.

I've now changed it so that the https://ctt.pfstr.de/X/Y.txt files are always analysed with a fixed multiplier of 5. If the multiplier is wrongly detected, or actually changes, this will now be visible by comparing those files to the graphs (a difference of 1 or 2 users will nearly always be present).

Is there a problem in the downloaded source data? If so, why does https://micb25.github.io/dka/ show more reasonable values?

Kind of, yes. If the padding is detected strictly automatically the value is jumping all over the place for the hourly packages, as you correctly noted:

There are 3749 keys in the 2020-08-18 file, which obviously is not a multiple of 5.

I wanted to keep the process of updating the page and analyzing new data as automated and "hands-off" as possible, so these cases were handled incorrectly on my end.
I did this to generate the data as transparently as possible, without any manual interventions.
Everybody can replicate my numbers by running the commands defined in the workflow file in that order.

So it is maybe more a question to @micb25 how he handles these discrepancies.

It seems the only way to handle this is to set some reasonable hard-coded values like @micb25 did.

So I think here you should also have some way of handling or flagging these inconsistent values, because currently the number of users from 2020-08-18 is probably too high

If someone uses my diagnosis-keys tools, I'd suggest not always using the auto-detect feature, but fixing the factor at 5 for the moment and changing it when required.

I've placed some safeguards, which should fix it for the moment.
I will use -n -a -m 5 (so with the automatic detection activated, but capped at 5) on new packages every day, and when an issue appears I will manually flag the file to be reanalyzed with a fixed multiplier of 5 by adding it to this list.
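
Roughly, the selection logic described above could be sketched like this. This is only an illustration of the idea (auto-detection capped at 5, plus a manual override list), not the actual workflow code; the set `REANALYZE_FIXED`, its example entry and the function name are assumptions made up for this sketch.

```python
FIXED_MULTIPLIER = 5

# Hypothetical override list: packages flagged by hand for reanalysis
# with the fixed multiplier. The entry below is only an example.
REANALYZE_FIXED = {"2020-08-18"}


def choose_multiplier(package_date: str, detected: int) -> int:
    """Pick the padding multiplier to use for one package."""
    if package_date in REANALYZE_FIXED:
        # Manually flagged: ignore auto-detection entirely.
        return FIXED_MULTIPLIER
    # Otherwise trust auto-detection, but never go above the cap.
    return min(detected, FIXED_MULTIPLIER)


print(choose_multiplier("2020-08-18", detected=1))
print(choose_multiplier("2020-08-19", detected=7))
```

Note that a too-low detected value on an unflagged package still passes through unchanged, which is why the manual flagging step is needed.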

Thank you for notifying me about the issue!
