
Decide on handling country `EUR` (ctt, 9 comments, closed)

felixlen commented on August 24, 2024
Decide on handling country `EUR`

from ctt.

Comments (9)

mh- commented on August 24, 2024

@felixlen I would also like to thank you for your investigations. Honestly, I'm quite unhappy with the way the CWA project communicates things like this, and that we have to find out everything ourselves, after the fact.

Here is another interesting observation from @kbobrowski:
> [The CWA] app fetches only from endpoints which are in the intersection of two lists:


felixlen commented on August 24, 2024

Totally understandable. I tried to investigate a bit and provide answers and/or speculations:

> The reasons I decided to ignore the EUR endpoint up until now and just do nothing:
>
>   • I'm unsure if the EUR endpoint provides a superset of the DE endpoint, or in fact how the published packages of the two endpoints are related at all, since the DE endpoint does publish "foreign" keys. I'm honestly a bit confused and did not have the time to dive into this.

I checked both files for each of the past 14 days with parse_keys.py from diagnosis-keys, and compared the parsed output:

2020-10-18 DE in EUR: 4814, DE not in EUR: 0; EUR in DE: 4814, EUR not in DE: 0; order of teks identical
2020-10-19 DE in EUR: 6386, DE not in EUR: 0; EUR in DE: 6386, EUR not in DE: 0; order of teks identical
2020-10-20 DE in EUR: 10876, DE not in EUR: 0; EUR in DE: 10876, EUR not in DE: 0; order of teks identical
2020-10-21 DE in EUR: 14309, DE not in EUR: 0; EUR in DE: 14309, EUR not in DE: 0; order of teks identical
2020-10-22 DE in EUR: 16122, DE not in EUR: 0; EUR in DE: 16122, EUR not in DE: 0; order of teks identical
2020-10-23 DE in EUR: 18645, DE not in EUR: 0; EUR in DE: 18645, EUR not in DE: 0; order of teks identical
2020-10-24 DE in EUR: 14422, DE not in EUR: 0; EUR in DE: 14422, EUR not in DE: 0; order of teks identical
2020-10-25 DE in EUR: 9554, DE not in EUR: 0; EUR in DE: 9554, EUR not in DE: 0; order of teks identical
2020-10-26 DE in EUR: 15491, DE not in EUR: 0; EUR in DE: 15491, EUR not in DE: 0; order of teks identical
2020-10-27 DE in EUR: 20231, DE not in EUR: 0; EUR in DE: 20231, EUR not in DE: 1203
2020-10-28 DE in EUR: 23570, DE not in EUR: 0; EUR in DE: 23570, EUR not in DE: 1172
2020-10-29 DE in EUR: 26139, DE not in EUR: 0; EUR in DE: 26139, EUR not in DE: 886
2020-10-30 DE in EUR: 25083, DE not in EUR: 0; EUR in DE: 25083, EUR not in DE: 1083
2020-10-31 DE in EUR: 18668, DE not in EUR: 0; EUR in DE: 18668, EUR not in DE: 1832
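A comparison like the one above can be reproduced in a few lines once both files are parsed. This is a minimal sketch, not parse_keys.py itself: it assumes each package has already been reduced to a list of TEK `key_data` values as hex strings (a hypothetical intermediate format), and it counts membership in both directions the same way the log lines do.

```python
def compare_packages(de_keys: list[str], eur_keys: list[str]) -> dict:
    """Compare the TEKs of a DE and an EUR package published for the same day.

    Both arguments are lists of key_data values as hex strings (an assumed
    intermediate format, e.g. extracted from parsed package output).
    """
    de_set, eur_set = set(de_keys), set(eur_keys)
    return {
        "de_in_eur": sum(k in eur_set for k in de_keys),
        "de_not_in_eur": sum(k not in eur_set for k in de_keys),
        "eur_in_de": sum(k in de_set for k in eur_keys),
        "eur_not_in_de": sum(k not in de_set for k in eur_keys),
        # Only meaningful when both packages hold exactly the same keys.
        "same_order": de_keys == eur_keys,
    }


def format_line(day: str, stats: dict) -> str:
    """Render the statistics in the same shape as the log lines above."""
    line = (
        f"{day} DE in EUR: {stats['de_in_eur']}, DE not in EUR: {stats['de_not_in_eur']}; "
        f"EUR in DE: {stats['eur_in_de']}, EUR not in DE: {stats['eur_not_in_de']}"
    )
    if stats["same_order"]:
        line += "; order of teks identical"
    return line


# Toy data, not real keys: the EUR package carries one extra foreign key.
de = ["a1", "b2", "c3"]
eur = ["a1", "b2", "c3", "d4"]
print(format_line("2020-10-27", compare_packages(de, eur)))
# 2020-10-27 DE in EUR: 3, DE not in EUR: 0; EUR in DE: 3, EUR not in DE: 1
```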
  • It seems that the content of the files is identical up to Oct 26, and from Oct 27 on the EUR files are a superset of the DE files. For the files up to Oct 26 I also diffed the output and found a single-line difference: `- Region: EUR` instead of `- Region: DE`. This difference also explains why the file hashes differ even for otherwise identical content.
  • I was not aware that foreign keys have been included in the DE files, so I suspect the following as speculation: the CWA team wanted to have warnings for foreign keys before they rolled out the new endpoint to a significant percentage of devices. So in a first step they packaged the content of what should be in the EUR files into the DE files, to ensure it was distributed to all users, and at the same time published the identical content in the EUR files for users who were already querying the new endpoint. Once enough users had transitioned, they switched to a behavior where the files contain keys according to their regional designation (i.e. DE files only contain keys from Germany, and EUR files keys from all of Europe where possible).

> I'm unsure when that switch happened, so I would have to pick a date for switching my dashboard as well, in order to stay somewhat transparent.

It is even more complicated if my speculation above is true: the switch to include European TEKs happened earlier, but is masked by the fact that they were included in the DE files. The switch to distributing them in separate regional files would most likely have happened on Oct 27.

> This brings up the problem that I did not crawl the EUR endpoint in the past, meaning I do not have a full record of all packages. Do you know of an archive where I could download daily and hourly packages? Otherwise my switch would be incomplete at best.

Unfortunately I am not aware of such a resource. I just downloaded all available files from the past 14 days.

> I'm quite swamped in real life right now.

I can totally relate to that: this is a spare-time project, just that there is no spare time...

> Also, the hashes computed by iOS match those of the EUR files, which explains the issue we had last week.
>
> Referencing #3 so that this issue is displayed over there.
>
> > For the purposes of ena_log, I would be keen on having the counts in the filehashes.json file updated to those of the files from the EUR endpoint.
>
> Understandable. It should be possible, but it would be great if we had an archive of previously published packages in order to provide a complete set of all packages published by the EUR endpoint.

I agree, that would be most desirable. I am wondering who could help here. I have the CWA companion app running on one device; maybe it does not delete files after downloading them (unlikely).

For reference: The same issue is also discussed here


kbobrowski commented on August 24, 2024

The observation quoted by @mh- is for an Android device; the intersection of these two lists is done here, but I did not verify it with MITM.

Interesting that iOS is fetching EUR and Android does not seem to; not sure what is going on here.
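A minimal sketch of the intersection described above, assuming the two lists are (a) the country packages the server offers and (b) the countries the app is configured to use. The actual lists in the Android code are not spelled out here, so both names below are hypothetical. Under that assumption, an app configured only for `DE` never fetches `EUR` even when the server offers it, which would fit Android not fetching EUR at that point.

```python
def countries_to_fetch(server_countries: list[str], app_countries: list[str]) -> list[str]:
    """Return the countries present in both lists, keeping the server's order.

    server_countries: hypothetical list of country packages the server offers.
    app_countries: hypothetical list of countries the app is configured to use.
    """
    wanted = set(app_countries)
    return [country for country in server_countries if country in wanted]


# Server offers both; what gets fetched depends only on the app's list.
print(countries_to_fetch(["DE", "EUR"], ["DE"]))   # ['DE']
print(countries_to_fetch(["DE", "EUR"], ["EUR"]))  # ['EUR']
```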


felixlen commented on August 24, 2024

Thanks for the updated filehashes including the EUR files.

>   • For the purposes of ena_log, I would be keen on having the counts in the filehashes.json file updated to those of the files from the EUR endpoint.

This point is closed :)


janpf commented on August 24, 2024

The reasons I decided to ignore the EUR endpoint up until now and just do nothing:

  • I'm unsure if the EUR endpoint provides a superset of the DE endpoint, or in fact how the published packages of the two endpoints are related at all, since the DE endpoint does publish "foreign" keys. I'm honestly a bit confused and did not have the time to dive into this.

  • I'm unsure when that switch happened, so I would have to pick a date for switching my dashboard as well, in order to stay somewhat transparent.

  • This brings up the problem that I did not crawl the EUR endpoint in the past, meaning I do not have a full record of all packages. Do you know of an archive where I could download daily and hourly packages? Otherwise my switch would be incomplete at best.

  • I'm quite swamped in real life right now.

Also, the hashes computed by iOS match those of the EUR files, which explains the issue we had last week.

Referencing #3 so that this issue is displayed over there.

> For the purposes of ena_log, I would be keen on having the counts in the filehashes.json file updated to those of the files from the EUR endpoint.

Understandable. It should be possible, but it would be great if we had an archive of previously published packages in order to provide a complete set of all packages published by the EUR endpoint.


janpf commented on August 24, 2024

> I checked both files for each of the past 14 days with parse_keys.py from diagnosis-keys, and compared the parsed output

Thanks for your work!

> It seems that the content of the files is identical up to Oct 26

That's great, so we would only need the files from Oct. 27 onwards, as all others would be identical.
So we would only need earlier files to calculate file hashes, which we could circumvent by just matching key counts, am I right?

> I was not aware that foreign keys have been included in the DE files

I'm absolutely no expert here, that's only what I've been told:
https://github.com/mh-/diagnosis-keys/issues/12#issuecomment-714218260

> For your dashboard: do you want to keep displaying the keys uploaded by German users, or do you want to switch to European users?

I've thought more about adding a second dashboard for European keys. I would be up for implementing that, but I think it would be a bit harder, since the parsing tool is currently focused on German risk vectors, and honestly it gets a bit complicated keeping track of

  • which version is currently deployed server side
  • which country submits which keys with which vectors
  • which app version is currently used in each country, and whether it can submit vectors with skipped dates or not
  • which keys were padded in which way
  • which keys from countries are "processed" by different server versions before getting published
  • what's the "pipeline" a key goes through from getting submitted in an app in a country to actually getting added to a "EUR" package
  • some issues I don't even know about as I don't follow the deployment cycle of servers and apps too closely

I think the only meaningful statistic I would feel comfortable publishing for a EUR dashboard right now would be "number of keys published/valid per day", since everything else would be guesswork on my end. And even this statistic would be a guess, as I don't know whether the padding of all keys is consistent for the EUR endpoint or differs per country.
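For the "number of keys published/valid per day" statistic, the day a key is valid can be derived from its `rolling_start_interval_number`, which the Exposure Notification key format counts in 10-minute intervals since the Unix epoch. The sketch below assumes the keys of a package have already been reduced to those integers (a hypothetical input format) and counts them per UTC day. It counts every entry as published and makes no attempt to detect or remove padding, so the caveat above applies unchanged.

```python
from collections import Counter
from datetime import date, datetime, timezone

INTERVAL_SECONDS = 600  # one rolling interval = 10 minutes


def keys_per_day(rolling_start_intervals: list[int]) -> "Counter[date]":
    """Count keys per UTC day on which they became valid.

    Input: rolling_start_interval_number values (assumed already extracted
    from the parsed package). Padding keys, if any, are counted like any other.
    """
    counts: Counter = Counter()
    for interval in rolling_start_intervals:
        start = datetime.fromtimestamp(interval * INTERVAL_SECONDS, tz=timezone.utc)
        counts[start.date()] += 1
    return counts


# 2672928 * 600 s is 2020-10-27 00:00 UTC; +144 intervals is one day later.
print(keys_per_day([2672928, 2672928, 2672928 + 144]))
```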

If somebody has more insights and ideas on how to overcome most uncertainties I think I could be convinced to try to add such a dashboard in the future ;)

> I suspect the following as speculation

Reasonable assumptions, I agree.


felixlen commented on August 24, 2024

Thanks a lot for your consideration!

> That's great, so we would only need the files from Oct. 27 onwards, as all others would be identical.
> So we would only need earlier files to calculate file hashes, which we could circumvent by just matching key counts, am I right?

Correct. Matching based on key counts works fine, so not having information on files prior to Oct 27 would not be an issue for me.
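Matching by key count amounts to a lookup from count to candidate packages. This is a sketch under stated assumptions: the package records below use hypothetical field names (`name`, `key_count`), not the actual filehashes.json layout, and all candidates are returned per logged count so that ambiguous counts show up as more than one match instead of being silently resolved. The example counts are derived from the Oct 27 comparison above (DE: 20231 keys; EUR: 20231 + 1203 = 21434).

```python
from collections import defaultdict


def match_by_key_count(packages: list[dict], logged_counts: list[int]) -> dict[int, list[str]]:
    """Map each key count seen in an exposure log to the packages with that count.

    packages: hypothetical records like {"name": ..., "key_count": ...};
    the field names are assumptions, not the real filehashes.json schema.
    """
    by_count = defaultdict(list)
    for package in packages:
        by_count[package["key_count"]].append(package["name"])
    # An empty list means no known package has that count; more than one
    # entry means the count alone is ambiguous.
    return {count: by_count.get(count, []) for count in logged_counts}


packages = [
    {"name": "DE 2020-10-27", "key_count": 20231},
    {"name": "EUR 2020-10-27", "key_count": 21434},
]
print(match_by_key_count(packages, [21434, 12345]))
# {21434: ['EUR 2020-10-27'], 12345: []}
```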

> I've thought more about adding a second dashboard for European keys. I would be up for implementing that, but I think it would be a bit harder, since the parsing tool is currently focused on German risk vectors, and honestly it gets a bit complicated keeping track of
>
>   • which version is currently deployed server side
>   • which country submits which keys with which vectors
>   • which app version is currently used in each country, and whether it can submit vectors with skipped dates or not
>   • which keys were padded in which way
>   • which keys from countries are "processed" by different server versions before getting published
>   • what's the "pipeline" a key goes through from getting submitted in an app in a country to actually getting added to a "EUR" package
>   • some issues I don't even know about as I don't follow the deployment cycle of servers and apps too closely
>
> I think the only meaningful statistic I would feel comfortable publishing for a EUR dashboard right now would be "number of keys published/valid per day", since everything else would be guesswork on my end. And even this statistic would be a guess, as I don't know whether the padding of all keys is consistent for the EUR endpoint or differs per country.

Do you mean a dashboard on European keys including or excluding German keys? I agree that it is sensible to have two dashboards, with the option including German keys being the easiest to implement, and the option excluding German keys probably the more informative one, but also the harder one to get proper statistics on because of the uncertainties mentioned.


janpf commented on August 24, 2024

> Do you mean a dashboard on European keys including or excluding German keys?

Up until now I only thought about an overall dashboard for the EUR packages "as published".
But removing the German keys from the European packages would be an interesting approach to add an additional statistic.
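Removing the German keys from a EUR package could look like the sketch below. It assumes both packages of the same day have been reduced to `key_data` values as hex strings (a hypothetical format), and that a key present in both packages is a German one. That assumption rests on the observation earlier in the thread that from Oct 27 on the DE files appear to be a subset of the EUR files; before that date the two were identical, so the subtraction would leave nothing.

```python
def eur_without_de(eur_keys: list[str], de_keys: list[str]) -> list[str]:
    """Return the EUR keys that do not also appear in the DE package.

    Keys are key_data values as hex strings (assumed format). The order of
    the EUR package is preserved; duplicates within it are kept as-is.
    """
    german = set(de_keys)
    return [key for key in eur_keys if key not in german]


eur = ["a1", "b2", "c3", "d4"]  # toy data, not real keys
de = ["a1", "b2", "c3"]
# Counts including vs. excluding German keys:
print(len(eur), len(eur_without_de(eur, de)))  # 4 1
```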

> But also the harder to get proper statistics on because of the uncertainties mentioned.

The uncertainties exist for both approaches, as in both cases I would analyze a mixture of all European CWAs.


kbobrowski commented on August 24, 2024

OK, using MITM I verified that it's fetching from EUR, so maybe there was an update to the config/app, or I got something wrong.

They most likely turned on the USE_EUR_KEY_PKGS env variable in a recent release, which indeed triggers this part of the code, enabling downloads from EUR and disabling DE.
