
Comments (21)

jonnycrunch avatar jonnycrunch commented on May 23, 2024 2

Regarding the original issue of Correlatability: I see the issue as maintaining control over the selective disclosure of verifiable claims that may lead to correlatability, and understanding and accepting that risk. The Rx use case is interesting but much too complex, because, face it, when we are sick and/or dying we will give up our souls for the chance to live longer. Perhaps this thread should continue with the more general discussion of correlatability, more specifically the use of correlation or k-means clustering of certain attributes for potential re-identification. The classic example was the re-identification of Gov. Weld from 'de-identified' patient discharge information in Massachusetts.

In medicine, with regard to the potential re-identification of "de-identified" Personally Identifiable Information, we often use k-anonymity methods to preserve privacy, whereby certain attributes are translated to more general ranges. A female from zip code 37203, age 29, is translated to a female from area 372**, age 20-30, depending on the data set and the calculated k.

I didn't find NIST IR 8062 very useful, as it only described the typical governmental mantra of "Monitor -- Assess -- Respond" and policy responses. The worksheet would be helpful to create the strawman discussion. Although the verbiage and concepts were helpful, the privacy risk equation wasn't terribly helpful. Rather, I think we should model the risk of re-identification given the attributes and have a cohesive methodology that we use for each scenario/use case. (For instance, the 29-year-old female in 37203 who is buying alcohol or filling an Rx.) For context, the number of people living in zip code 37203 and in that age range is about 30k. So rather than focus on the characteristics of a secure system as suggested in the PRAM, I think we should focus on methods to calculate the risk of re-identification given the context of each selective disclosure of attributes.
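The generalization step described above can be sketched as a small function. The bucket sizes and zip-prefix length here are illustrative assumptions, not values from any standard; in practice they are tuned until every group of generalized records contains at least k individuals.

```python
def generalize(record, zip_prefix_len=3, age_bucket=10):
    """Generalize quasi-identifiers so a record blends into a larger group.

    zip_prefix_len and age_bucket are illustrative parameters; a real
    k-anonymity pipeline would widen them until each group has >= k records.
    """
    lo = (record["age"] // age_bucket) * age_bucket
    masked_zip = record["zip"][:zip_prefix_len] + "*" * (len(record["zip"]) - zip_prefix_len)
    return {
        "sex": record["sex"],
        "zip": masked_zip,
        "age_range": f"{lo}-{lo + age_bucket}",
    }

# The 29-year-old female from zip 37203 becomes one of ~30k similar records.
print(generalize({"sex": "F", "zip": "37203", "age": 29}))
# → {'sex': 'F', 'zip': '372**', 'age_range': '20-30'}
```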

from vc-data-model.

jandrieu avatar jandrieu commented on May 23, 2024 1

I'd like to step out a few layers to relate this to "Identity" because, in effect, we are talking about how verifiable claims will be used, intentionally or otherwise, to correlate individuals' identities from one context to another.

It turns out that "identity" as used in technical discussions means two things, which are sometimes conflated in a subset known as PII, or "Personally Identifiable Information".

Verifiable claims will be used for both of these and there are different nuances for each in terms of correlation.

First, there are identifiers and attributes used explicitly for correlating individuals across contexts. Sometimes these identifiers are opaque, e.g., "anonymous" GUIDs used for tracking in third party cookies. Sometimes they are grounded in real-world identity systems such as legal name, driver's license #, etc.

Second, attributes are used for customizing services, such as a zip code for movie listings or using a name in a greeting on a page, separate from whether or not that name or zip code is used to correlate the individual with anything beyond the presentation of features.

Privacy issues occur when

  1. people are correlated across contexts in ways they did not expect or desire, either by the original steward of such information or by third parties
  2. attributes are shared with parties in unexpected or undesired ways

The two biggest gotchas, in my experience, are

  1. Thinking "anonymous" identifiers resolve privacy issues. In fact, they are the root of third party cookie privacy concerns. They do help with minimizing undesired attribute sharing, but since correlation can be used to fuse identities from different data sets, there remains the likelihood of even anonymous identifiers leaking attributes through affiliation.

  2. Imagining that there is a subset of attributes that, properly managed, addresses privacy issues. So-called "Personally Identifiable Information" has been used as a framework for privacy, but it leaves open several core questions (https://en.wikipedia.org/wiki/Personally_identifiable_information). There is no definitive agreement as to what is or isn't PII; in some cases this question has been addressed by the courts, but as discussed in the Wikipedia article, there are also statutory and standards-based definitions which demonstrate a variety of possible interpretations of the term. More critically, it has been repeatedly shown that even when using "anonymized" data sets with no innate PII, personal information and even real-world identities can be discovered. The AOL search data leak is perhaps the most famous of these examples: https://en.wikipedia.org/wiki/AOL_search_data_leak .

Sometimes, the notion of "identity" and PII is taken to include any and all attributes related to an individual, regardless of how those attributes are used. This helps deal with the reality that even seemingly innocuous data can be used for deanonymizing and correlating individuals, but it turns your "identity" into everything which is almost useless as input to engineering a good system.

Rather than discussing this issue in terms of the title language "verifiable claims are used are important for correlation purposes", I would suggest that verifiable claims will be used to both

  • correlate individuals across contexts (in the case of common identifiers and shared attributes) and
  • minimize or prevent correlation (in the case of anonymous or tokenized identifiers).

Implementers should understand and communicate to their users how their particular identity systems correlate individuals, e.g., from session to session and with third party services, and how they actively prevent or minimize undesired and unexpected correlation.

The shift I'm going for here is that, contrary to a general push in the crypto world, correlation isn't inherently bad. In fact, correlation is the direct result of identification and "identity" is useful. While anti-correlation features of various technologies are great features, that focus has obscured the fact that there are times when we want and need to be correctly correlated with our rights and privileges. In a privacy respecting system, individuals would have maximum control over correlation, enabling intentional correlation where desired while preventing undesired correlations. The appropriate limits of that control are still up for debate, but every system using verifiable claims will necessitate design choices that impact how individuals are correlated across contexts.
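One way to give individuals the control over correlation described above is to derive a distinct identifier per relying party from a single secret, so two services cannot link their views of the same person unless that person reveals the link. This is a minimal sketch, assuming an HMAC-based derivation; the thread does not specify any particular scheme.

```python
import hashlib
import hmac

def pairwise_id(master_secret: bytes, relying_party: str) -> str:
    """Derive a stable, per-relying-party pseudonymous identifier.

    The same (secret, party) pair always yields the same identifier,
    enabling intentional correlation within one context, while identifiers
    for different parties are unlinkable without the master secret.
    """
    return hmac.new(master_secret, relying_party.encode(), hashlib.sha256).hexdigest()

secret = b"holder-master-secret"  # held only by the individual (assumption)
id_a = pairwise_id(secret, "pharmacy.example")
id_b = pairwise_id(secret, "registry.example")
assert id_a != id_b                                      # no cross-context linkage
assert id_a == pairwise_id(secret, "pharmacy.example")   # stable within one context
```

The design choice here mirrors the point above: correlation is not inherently bad; the holder decides where it happens by choosing which derived identifier to present.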


agropper avatar agropper commented on May 23, 2024 1

Privacy engineering can help sort through the different aspects of privacy in a use case. I’d like to relate this issue to the specific prescription use case that I maintain.

  1. The highest privacy goal of this use-case is to maintain the self-sovereignty of the physician-patient relationship in the face of regulatory requirements. Specifically, we seek to enable a transaction system for a prescription that does not depend on institutional trust as an identity provider for the physician and the patient together. The privacy benefit is the ability for the physician and the patient to interact without having that interaction monitored by an intermediary institution such as a hospital. This opportunity for an un-mediated patient-physician relationship was common with paper prescriptions and has been lost in the transition to electronic health records.

  2. In order to achieve this privacy goal, the role of the hospital intermediary as a combined root of trust for patient ID, physician ID, physician attribute, and transaction auditability needs to be distributed among various substitutable actors with a minimum of correlation risk across the actors.

  3. Working backward, the physician claim must be verified against a directory operated by an issuer that has no role relative to the issuance of the physician ID or any role whatsoever relative to patient ID. The reason for this is that the directory operator does not want any responsibility for security breaches of any patient information and has no interest in sharing an identity provider used by the patient. For the physician ID, the directory provider does not want to bear responsibility for identity proofing the physician. Cost-effective operation of directories requires they trust identity providers. The directory operator is merely a relying party, using whatever identity the physician chooses. The federation implied by the physician’s IDP is responsible for identity proofing to a level adequate for prescribing controlled substances per DEA and access to patient records under state and federal privacy mandates.

  4. The physician ID used to maintain the physician claim must be non-repudiable and able to:

  • Sign updates to the physician directory operated by the issuer in an auditable way (e.g., associated with a blockchain timestamp).

  • Sign a prescription for a specific patient in an auditable way.

  5. The patient identity (but not necessarily the patient ID) captured in the prescription and presented to the pharmacy must be correlated in a non-repudiable way. The pharmacy transaction must be auditable.

  6. The pharmacy must be able to verify the physician claim in a way that does not allow the pharmacy to correlate other transactions by the same physician. (The sale of physician prescribing info has been challenged in high courts and is allowed as free speech by the pharmacy. This causes a lot of privacy problems for the physician and the patient. It is the primary source of the huge data broker market in healthcare.)

  7. The physician must be able to report the transaction to a (law-enforcement) registry that can track patient identity across different physician-patient relationships, and physicians must be able to query this registry prior to issuing a prescription. The registry itself, as a law enforcement function, can have access to the identity of the physician and the patient. (These state-operated registries are called prescription drug monitoring programs.) The pharmacy must be able to verify that the prescription was reported to the registry (to keep the doctors honest). The pharmacy may have its own law enforcement registry reporting requirements, but these are outside the physician-patient relationship privacy engineering issue.

  8. Note that in most, maybe all, states, the pharmacy can deliver a prescription to the physician for the physician to distribute to the patient. In this somewhat inconvenient way, the privacy of the patient relative to the pharmacy can be absolute.
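The "auditable, e.g., associated with a blockchain timestamp" requirement in point 4 can be sketched as a hash-linked log: each directory update or prescription commits to its predecessor's hash, so later tampering is detectable. The record fields below are my assumptions, and the physician's actual digital signature is omitted; a real system would additionally sign each entry with the physician's non-repudiable key.

```python
import hashlib
import json
import time

def append_entry(log, payload, timestamp=None):
    """Append a hash-linked entry; editing any earlier entry breaks the chain."""
    prev = log[-1]["hash"] if log else "0" * 64
    entry = {
        "payload": payload,  # e.g. a directory update or a prescription (illustrative)
        "timestamp": timestamp if timestamp is not None else time.time(),
        "prev": prev,
    }
    body = json.dumps(entry, sort_keys=True)
    entry["hash"] = hashlib.sha256(body.encode()).hexdigest()
    log.append(entry)
    return entry

def verify_chain(log):
    """Recompute every hash and link; False on any tampering."""
    prev = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != prev or digest != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = []
append_entry(log, {"update": "physician added to directory"}, timestamp=1)
append_entry(log, {"rx": "prescription for patient"}, timestamp=2)
assert verify_chain(log)
log[0]["payload"]["update"] = "tampered"
assert not verify_chain(log)
```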


jandrieu avatar jandrieu commented on May 23, 2024

Thanks for the example, @agropper. It's a solid example of where Verifiable Claims can help with privacy.

To help us move towards a more rigorous lexicon, I'd like to call this a "use domain" instead of a "use case." I'm hoping to establish a specific semantics for use cases:

A use case defines a specific value-creating transaction between an individual and the system.

A use domain defines a related set of use cases.

I'm still working through the best alternative language, but "use domain" or "domain of use" seems like a good way to describe this example, which includes several transactions as well as domain-specific non-functional requirements, such as both the correlatability and non-correlatability you outline.

From what I read, I tease out a few different transactions:

  1. Issue prescription
  2. Verify prescription
  3. Present prescription
  4. Audit pharmacy
  5. Register prescription

There may also be transactions related to the credential that enables a doctor to prescribe, as well as recording pharmacy interactions: requesting fulfillment of a prescription, fulfilling a prescription, etc., so we can understand the needs of the audit. As with many of these kinds of uses, the trick is defining a coherent boundary so we can focus on the new and interesting bits. For example, one could discuss how all of the entities in the domain provision their credentials: the monitoring agencies, the pharmacies, the pharmacists, the insurance companies (surprisingly missing from your example). Clearly, taking some of these entities (and their credentialing) as a given greatly simplifies the documentation.

To try to tease out the correlatability:

Intended correlations:

  1. The live person redeeming a prescription needs to be correlatable to the patient for whom the prescription was given, by the pharmacist, prior to distribution so that the medicine is given to the actual patient.
  2. The patient needs to be correlatable to a singular legal person by the prescription drug monitoring program for the purposes of assuring that individuals are not getting multiple prescriptions by visiting multiple doctors. The physician needs to be able to query the program prior to issuing a prescription.
  3. The physician and patients need to be correlatable across multiple prescriptions for physician audits.
  4. A given prescription must be resolvable to a delivery address while preventing the pharmacy from correlating the doctor to the prescription. This resolution must be non-repudiable.
  5. Upon delivery, a prescription must be correlatable by the issuing doctor to the patient.

Blocked correlations:

  1. The pharmacy must not be able to correlate the physician's prescriptions across different patients.
  2. Someone who is not the intended patient must not be able to redeem a prescription (must not be falsely correlated as the patient).
  3. The prescription may be redeemed at any pharmacy. There is no innate correlation between a given prescription and the pharmacy that fulfills it.

Do these transactions and correlations seem correct?
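As a strawman for blocked correlation 1, the pharmacy could be handed a per-prescription token that commits to the prescribing physician without exposing a stable physician identifier. The hash-and-nonce blinding below is purely illustrative; a real design would more likely use blind or group signatures. An auditor who later obtains the nonce and the physician ID can recompute the token and confirm it.

```python
import hashlib
import secrets

def blinded_physician_token(physician_id):
    """Return (token, nonce). The token commits to the physician, but a fresh
    nonce per prescription means two prescriptions by the same physician carry
    unlinkable tokens. (Illustrative scheme, not from the thread.)"""
    nonce = secrets.token_hex(16)
    token = hashlib.sha256((nonce + physician_id).encode()).hexdigest()
    return token, nonce

t1, n1 = blinded_physician_token("dr-alice")
t2, n2 = blinded_physician_token("dr-alice")
assert t1 != t2  # pharmacy cannot link the two prescriptions to one physician

# An auditor holding the nonce and the physician ID can verify the commitment:
assert t1 == hashlib.sha256((n1 + "dr-alice").encode()).hexdigest()
```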

Questions:

  1. If I assume for the sake of discussion that all of this information is stored in an effectively public repository--this assumption addresses both public ledgers and compromised data stores--then can we assume that certain information is encrypted for the intended recipient? For example, is the prescription delivery address encrypted for a specific delivery service?

  2. For correlation 3, who is doing the audit? How do we allow an audit without allowing the pharmacy to perform the same correlation? Are there baked-in assumptions about where "auditable" data is stored that can be trusted to be secure from the pharmacy? I don't think we care about pharmacies that are bad actors willing to hack a physician's database. For this use domain, it might be valuable to identify a strawman architecture that distinguishes who holds what data. I'm assuming that the monitoring program and the physician both have private data stores for audit purposes, while the rest of the data could be stored in a self-sovereign public ledger. (If insurance is involved, the pharmacy will probably need its own data store as well.) Or... is there a way that all of this information could be in a public data store?

  3. For correlation 2, are we trusting the monitoring program to operate a secure, live system? Or is the goal to have that monitoring (and the doctor's query) based in a public ledger? In other words, along with question 2, can we clarify where, for this use domain, we need to trust a system (and its operator) with certain information and which systems we choose not to trust with certain information?

  4. What about insurance companies? Are they an important part of the privacy engineering?

  5. Does the pharmacy need to verify that the prescription has been registered with the monitoring program?


msporny avatar msporny commented on May 23, 2024

@agropper @jandrieu Should we move @agropper's use case description to the use cases repository? This repository is about data model and specifically the privacy/correlatability section. We may want to split this discussion into two aspects: 1) The use case itself (put it in the use cases repo issue tracker), and 2) How this use case impacts the correlatability subsection in the privacy considerations section.


jandrieu avatar jandrieu commented on May 23, 2024

Sounds good. I'll move the use domain over there for its own refinement.

One thing that became clear to me in working through Adrian's use domain is the need for a strawman architecture for these kinds of use domains, so that we can evaluate the privacy impact. For example, the main page for Tahoe-LAFS has a simple diagram distinguishing which parts of the architecture must be trusted implicitly and which rely instead on cryptographic trust.

I'm reminded of Eben Moglen's testimony to Congress in 2010:

These [Facebook] "privacy settings" merely determine what one user can see of another user's private data. The grave, indeed fatal, design error in social networking services like Facebook isn't that Johnny can see Billy's data. It's that the service operator has uncontrolled access to everybody's data, regardless of the so-called "privacy settings."

So, in order to understand how Verifiable Claims addresses privacy issues, I believe we will need to consider how they would operate within the context of various systems, each of which will have distinct trust boundaries and differing needs for information access.

Once we understand that, we can evaluate what the data model needs to support those use cases, and in particular, how verifiable claims improve privacy when used correctly.


burnburn avatar burnburn commented on May 23, 2024

So, in order to understand how Verifiable Claims addresses privacy issues, I believe we will need to consider how they would operate within the context of various systems, each of which will have distinct trust boundaries and differing needs for information access.

@jandrieu So, how do you suggest we proceed? In particular, is there anything you think might be productive to discuss in next week's call?


agropper avatar agropper commented on May 23, 2024

Thanks @jandrieu. You're structuring this in a useful way. The correlations and transactions seem correct.

Questions:
1 - Good question. I'm not sure what to say about encryption but I suspect security design will be evident as we go forward.

2 - I agree with your framing. I don't know the legal answer to who does routine audits. I would allow for a separate registry no matter what. The pharmacy is handling controlled substances and subject to audit by the DEA. I'm skeptical of storing anything other than timestamps in a public data store.

3 - Good point. I think we need to do both. Keep in mind that some states will require the querying physician to have a relationship with the patient and others will simply require they be a licensed practitioner.

4 - The insurance may need to be consulted for decision support and/or costs before the prescription is finalized by the physician. The pharmacy also needs insurance access, unless the patient pays cash - which is allowed by law. Once we create the "use domain" representation, we would do well to add insurance.

5 - Maybe. The monitoring programs are run at the state level and can include the pharmacy. Some states also mandate that physicians check the registry before prescribing controlled substances, and we could imagine transactions that warn the physician or regulators if this is not done.


jandrieu avatar jandrieu commented on May 23, 2024

Created new issue in use case repo: w3c/vc-use-cases#38


msporny avatar msporny commented on May 23, 2024

The next step is to do a privacy analysis on Adrian Gropper's use case listed in this issue. The people who volunteered are: @jandrieu @agropper @jonnycrunch @msporny @amigus


burnburn avatar burnburn commented on May 23, 2024

Discussed in 24 Jan 2017 telecon (link to minutes when available)


stonematt avatar stonematt commented on May 23, 2024

@jandrieu @agropper @jonnycrunch @msporny @amigus -- looking for an update on this issue.


jandrieu avatar jandrieu commented on May 23, 2024

Note this has moved to the use cases repository: w3c/vc-use-cases#38


msporny avatar msporny commented on May 23, 2024

This issue was originally about writing 2-3 paragraphs for the Data Model specification under the Privacy section related to Correlatability. I think we're going a bit overboard here - the analysis that @jandrieu and @agropper are doing is useful, but we probably don't need to wait on that to write a section for the specification. We just need a general idea of what sort of things you could put in a Verifiable Claim that would correlate you, and to what degree.

What we need to resolve this issue is 2-3 paragraphs that describe why/how Verifiable Claims can lead to correlation based on usage patterns.


amigus avatar amigus commented on May 23, 2024

@stonematt I'm going to lead the effort to put the use-case @agropper defined, through the NIST Privacy Risk Assessment Methodology based on NIST.IR.8062. I expect I'll need help from the other volunteers in the coming weeks.


agropper avatar agropper commented on May 23, 2024


jandrieu avatar jandrieu commented on May 23, 2024

I just made a pull request with specific language.

One challenge I had was distinguishing the privacy of the holder separate from that of the subject. It's easier to discuss and reason about when the holder is presumed to be the subject, but that presumption is knowably false when dealing with delegated or guardian holders, e.g., claims about a child. Unfortunately, it also should probably never be presumed that the holder of a claim is guaranteed to be the subject of the claim. Maybe I'm missing something here, but I don't think we've teased out the issues of identity assurance among:

  1. the holder of a claim (the digital holder, who has the JSON-LD or other serialization),
  2. the presenter of a claim (the party actually asserting the claim to an inspector), and
  3. the subject of the claim.

My understanding is that, for example, a parent could present a claim about a DID and assert that the DID applies to an individual who they claim is their child. In this case, all three of the parties listed above are different. I haven't seen any language addressing how we deal with the presenter's assertion that the subject of the claim is any particular individual, including themselves.

I realize some of this is protocol related and potentially out of scope, but I found my own language challenging to reconcile with the ambiguous relationship between the holder, the presenter, and the subject.
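To make the three-party split concrete, here is an illustrative claim and presentation in which the holder/presenter differs from the subject. The field names loosely echo the VC data model draft ("credentialSubject", "holder") but this is a sketch, not normative structure, and the DIDs are invented.

```python
# Illustrative only: field names loosely follow the VC data model draft;
# all identifiers are hypothetical.
claim = {
    "issuer": "did:example:school",
    "credentialSubject": {               # the subject: the child
        "id": "did:example:child",
        "enrolled": True,
    },
}

presentation = {
    "holder": "did:example:parent",      # the digital holder of the claim
    "presenter": "did:example:parent",   # here holder == presenter, but the
                                         # data model gives no guarantee that
                                         # either equals the subject
    "verifiableCredential": [claim],
}

# Nothing in the data structure lets an inspector conclude that the presenter's
# assertion "this DID is my child" is true; that is an identity-assurance
# (protocol) question, as noted above.
assert presentation["holder"] != claim["credentialSubject"]["id"]
```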


agropper avatar agropper commented on May 23, 2024


prototypo avatar prototypo commented on May 23, 2024

@agropper It would be really helpful to align terms, such as "patient" with equivalent (or new, if necessary) Verifiable Claims terms (e.g. "entity").

It would also be helpful to surface all the implicit relationships in the prescription use case (e.g. the verifying employee at the pharmacy is authorised by the pharmacy to certify a prescription as being valid).


agropper avatar agropper commented on May 23, 2024


msporny avatar msporny commented on May 23, 2024

@jandrieu did a PR for this issue and it was accepted into the spec. Closing the issue.

