mozilla / data-review
Templates for Firefox data collection review process (https://wiki.mozilla.org/Firefox/Data_Collection)
License: Mozilla Public License 2.0
It would be good to have as many links as previous reviewers have collected. I know that a "canonical list" is unrealistic, but anything helps.
Users have come to me for clarification about how to file for Data Collection Review. The confusion usually shows up as questions like:
I think this stems from a couple of wording choices and some perceived rigidity. For instance, "clone" has a specific meaning in git, so using it for the form suggests that users should fork the repo or create a gist, which conflicts with the substeps that follow.
I propose a rewording:
To request a review, Data Review requesters require the following:
Bikeshed: begin
For Firefox measurement only (for now), consider amending the existing data collection request forms with a question asking for a description of how the measurement should behave in private browsing mode.
As described in issue #2, the current request form is designed for too general a use case. We would like to revise the current process to reflect the need for variants of the data collection request.
If a probe is being renewed and the expiration is being changed to forever, then the requester needs to specify a responsible individual.
The request.md form has this (lines 44 to 50 in fe0b96a):
7) How long will this data be collected? Choose one of the following:
* This is scoped to a time-limited experiment/project until date MM-DD-YYYY.
* I want this data to be collected for 6 months initially (potentially renewable).
* I want to permanently monitor this data. (put someone’s name here)
The renewal_request.md form has this (line 11 in fe0b96a):
2) When will this collection now expire?
I think the two forms should be kept in sync and use the same language.
FYI: The following changes were made to this repository's wiki:
These were made as the result of a recent automated defacement of publicly writable wikis.
Copying and pasting the form into a Bugzilla comment results in unformatted text (no numbers, no bullet points, etc.). Can we get a version that's easy to use in Bugzilla?
tl;dr This form imposes on everybody the burden of the process necessary for the most dramatic cases, which adds more work to an already complicated procedure.
First of all, I'm proud that we at Mozilla have managed to create an internal culture of seeing user privacy protection as one of our central principles (and in fact my job mostly revolves around protecting user privacy in Firefox). Thank you for your work on data stewardship.
Adding a telemetry probe has never been quick and easy. The simple technical limitation that it can't be done using artifact builds, the (rightfully) short expiration time and the data steward review process have added necessary overhead that, anecdotally, has already discouraged the addition of some "trivial" telemetry probes.
This form complicates things even further.
I know that the intention behind this is good (if I understand it correctly it's intended to be like the uplift request comment template on Bugzilla). In fact I was always a bit uncertain how to properly request data-review, so I can get fully behind a more formalized process.
But the questionnaire in its size and tone (and the fact that it's not a Bugzilla comment template) makes me urgently want to do something other than add a new telemetry probe to Firefox.
Excerpts I'm skeptical about (note that these complaints are specifically about adding telemetry to Firefox, this looks like it could be used for other things as well):
All questions are mandatory.
Doesn't set a great mood. How about "Please fill out all questions"?
- What alternative methods did you consider to answer these questions? Why were they not sufficient?
For harmless data, this question feels inappropriate. This question should only be asked when the data we are collecting is in fact category 3 or 4 data. For other categories the honest and correct answer to this is "we didn't consider alternative methods, why should we?". This kind of question should be left to the data reviewer, IMO.
- Can current instrumentation answer these questions?
I probably just misunderstand, but what's the difference from number 3?
- List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.
Measurement Description | Data Collection Category | Tracking Bug #
The table there shows a "Tracking Bug #", what is that supposed to mean?
- How long will this data be collected? Choose one of the following:
Firefox telemetry has an expiration version on every probe.
- What populations will you measure?
Have data stewards historically had trouble finding out about this? (Honest question: this might be a good thing to ask, I would just like to find out while I'm here.)
- Please provide a general description of how you will analyze this data.
Why bother? How would my answer influence the decision? Almost all the data is public, anyone can do any kind of analysis with it after it gets recorded, right?
- Where do you intend to share the results of your analysis?
See 8: is this question necessary for public data?
A reader writes in noting some confusion on the following paragraph:
Release: Default off. May be eligible for opt-out on a case-by-case basis if mitigations are identified. Mitigations may include UX changes that make users aware of additional risk, technical mechanisms that remove the risk, or a risk assessment done of a case-by-case basis that determines the risk is limited.
The first and easiest to remedy is "risk assessment done of a case-by-case basis". This should probably be "on a case-by-case basis".
The second is that the reader wasn't clear why, if it's default off, we say it is eligible for opt-out. To use their words directly:
If the default is off, shouldn't the option be to opt-in instead of opt-out, or do I misunderstand something?
I think we can restructure this slightly to make it clear that "Default off" is not only the default setting but also the default posture for collecting Category 3 data on release. We could use formatting or word choice to highlight that mitigations allowing opt-out are an exception, not a clarifying statement.
Question 8 reads "If this data collection is default on, what is the opt-out mechanism for users?"
The most common answer I want to see from people is "The Firefox Data Collection Preference" (because I get a lot of bog-standard Telemetry requests). Sometimes requesters are confused and ask whether they need to implement something separate from the global telemetry opt-out when adding opt-out telemetry (they don't; please don't do this).
Shall we add this as a preferred/recommended option?
As of January 1 2019, Mozilla requires that all GitHub projects include this CODE_OF_CONDUCT.md file in the project root. The file has two parts:
If you have any questions about this file, or Code of Conduct policies and procedures, please see Mozilla-GitHub-Standards or email [email protected].
(Message COC001)
https://wiki.mozilla.org/Firefox/Data_Collection#Step_1:_Submit_Request doesn't say which product + component I should use to file the request.