Measuring Privacy
Tracy Ann Kosa, University of Ontario Institute of Technology
Stephen Marsh, Communications Research Centre, Canada
Khalil El-Khatib, University of Ontario Institute of Technology
[email protected]
(Official) Definition
pri·va·cy /ˈprīvəsē/ Noun
1. The state or condition of being free from being observed or disturbed by other people.
2. The state of being free from public attention.
Privacy as Identifiability
i·den·ti·fy v. i·den·ti·fied, i·den·ti·fy·ing, i·den·ti·fies
v.tr. 1. To establish the identity of. 2. To ascertain the origin, nature, or definitive characteristics of. 3. Biology To determine the taxonomic classification of (an organism). 4. To consider as identical or united; equate. 5. To associate or affiliate (oneself) closely with a person or group.
v.intr. To establish an identification with another or others.
What’s it about really?
Privacy differs depending on context: physical, territorial and informational.
Privacy is a mechanism for protecting information about identifiable people.
Privacy is about informational self-determination.
. . . there are also a number of cultural and regional definitions.
Problem
• Research across disciplines suffers because there is no unified mechanism for measuring privacy.
• Computer science has focused on policy enforcement, ontologies and taxonomies.
• This approach does not focus on the individual's privacy preferences in a given environment, which is how legislative requirements are derived.
Hypothesis
Disregarding the value-based approach to privacy, it is possible to derive a finite representation based on a discrete list of three factors. We can use this representation to understand privacy better and to determine principles. Principles about privacy are not rules; they are phenomena observed in the psychological and sociological literature on privacy. Most are sensible enough to be included as rules for computing states of privacy.
The States
1. Private: total privacy; existence is unknown. It is unknown whether a person is present or not.
2. Unidentified: only existence is known; for example, a shadowy figure can be seen but no identifying features can be determined.
3. Anonymous: limited objects and information may be known, but there is no link to a specific identity. Consider an anonymous donor to a named organization: a number of data elements are known (donation amount, time, organization name), but a specific person cannot be derived from them.
4. Masked: the object and information are known but linkages to an identity are concealed. The concealment may be deliberate or accidental.
5. De-identified: the object or information does not directly identify a person, but when linked with other objects or information the person may become known.
6. Pseudonymous: defined objects and information are revealed but identified or associated with an assumed (incorrect) name. The level of identity assurance is the key distinction between this state and others.
7. Confidential: defined objects and information are revealed to a defined person or organization acting in a certain role in a defined setting.
8. Identified: the defined objects and information are capable of being distinguished and named. Characteristics (objects and information) are known to almost everybody, with few or no controls.
9. Public: no private objects or information; complete openness. This is the least amount of privacy possible for an individual, where everything about them is available and assigned.
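As a minimal sketch, the nine states can be represented as an ordered enumeration; Python is used here for illustration, the names are shortened from the list above, and a higher value corresponds to less privacy:

```python
from enum import IntEnum

class PrivacyState(IntEnum):
    """The nine states of privacy; higher values mean less privacy."""
    PRIVATE = 1       # existence unknown
    UNIDENTIFIED = 2  # only existence known
    ANONYMOUS = 3     # data known, no link to an identity
    MASKED = 4        # identity linkages concealed
    DEIDENTIFIED = 5  # identifiable only when linked with other data
    PSEUDONYMOUS = 6  # identified under an assumed name
    CONFIDENTIAL = 7  # revealed to a defined party in a defined role
    IDENTIFIED = 8    # distinguishable and named
    PUBLIC = 9        # complete openness
```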
Factors
1. Human: considerations identified by social psychologists that individuals use in making decisions about privacy.
2. Technology: services that computers can perform related to the management of identifiable information.
3. Data Types: generally agreed-upon types of personal information.
Human
Human privacy rules are specific to the establishment; they are reflected in the physical structure and properties of society. Each individual has a social contact threshold that determines how they exercise their privacy rights. There are 16 human factors:
H Factors
Object: subject matter?
Appearance: of self, others?
Choice: is choice possible?
Control (Info): what information is disclosed?
Control (Audience): who is present?
Control (Access): who may have access?
Discretion: is discretion possible?
Roles Established: does each party have a role?
Status: social status of the invader?
Common Bonds: existing relationship?
Social Structure: existing social relationship?
Social Condition: what kind of situation?
Ritual Type: what are the social rules?
Authority: is there an authority figure?
Visibility: defined physical space?
Public Expectation: absence of expectations?
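For illustration, the 16 factors and their guiding questions can be captured directly as data; this is a sketch, and the key names are ours, not the model's:

```python
# Sketch: the 16 human (H) factors and their guiding questions,
# taken from the table above. Key names are illustrative.
H_FACTORS = {
    "object": "Subject matter?",
    "appearance": "Of self, others?",
    "choice": "Is choice possible?",
    "control_info": "What information is disclosed?",
    "control_audience": "Who is present?",
    "control_access": "Who may have access?",
    "discretion": "Is discretion possible?",
    "roles_established": "Does each party have a role?",
    "status": "Social status of the invader?",
    "common_bonds": "Existing relationship?",
    "social_structure": "Existing social relationship?",
    "social_condition": "What kind of situation?",
    "ritual_type": "What are the social rules?",
    "authority": "Is there an authority figure?",
    "visibility": "Defined physical space?",
    "public_expectation": "Absence of expectations?",
}
```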
Data Types
The notion of privacy as information protection is well represented in legislation and regulation across the world. Less widely used is the notion of identifiability: data may or may not include traditional identifiers (e.g. name) and yet still uniquely identify a person.
Within the notion of identifiability, some types of data can reveal more personal information than others. With few notable exceptions, a phone number is less privacy-invasive than a unique identifier.
D Factors
Biographic
Demographic
Health
Education
Financial
Criminal
Identifiers
Behavioral
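As an illustration of that identifiability gradient, the data types could carry relative weights. The numbers below are hypothetical placeholders; only the ordering matters, echoing the point above that identifiers are more privacy-invasive than most other types:

```python
# Hypothetical relative weights for the data (D) factors. The values
# are illustrative, not from the model; the point is the ordering,
# with identifiers weighted most heavily.
D_WEIGHTS = {
    "biographic": 0.05,
    "demographic": 0.05,
    "education": 0.05,
    "financial": 0.10,
    "health": 0.15,
    "criminal": 0.15,
    "behavioral": 0.10,
    "identifiers": 0.35,
}
```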
Technology
Computers are generally accepted to be an effective tool for information management, used to organize, retrieve, acquire and maintain information. As technology evolves, it becomes cheaper and more convenient to store information for longer periods of time. Increasingly, machines can read information without human intervention. Yet for all the new technologies that continue to evolve, when it comes to managing information about an identifiable person, there is a discrete number of functions that computers can provide.
T Factors
Network
Hosting
Registration
Messaging
Backup
Software
Archiving
Websites / Portal
States of Privacy Model
Proposed Formalization
The state of privacy S_n in any given environment is calculated from the human factors H, the technology factors T, and the data types D as:
S_n = w_H f(H) + w_D f(D) + w_T f(T)
Where F represents any one of the factor sets H, T, D, the total for a given factor set can be represented by
f(F) = w_1 F_1 + w_2 F_2 + ... + w_n F_n
where F_1, F_2, ..., F_n each represent a factor within the set, and the result is normalized to (0, 1). The more positive the individual factors, the higher the total for the factor set, and the more likely the individual is to move to a lower state of privacy, S_m > S_n.
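A minimal computational sketch of these formulas; the factor scores and weights are placeholders, and the mapping of the raw score onto the nine states is our assumption, not part of the source formula:

```python
def factor_score(values, weights):
    """f(F) = w1*F1 + w2*F2 + ... + wn*Fn, normalized to (0, 1).

    values  -- each individual factor scored in [0, 1]
    weights -- one non-negative weight per factor
    """
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def privacy_state(f_h, f_d, f_t, w_h=1/3, w_d=1/3, w_t=1/3):
    """S_n = w_H*f(H) + w_D*f(D) + w_T*f(T).

    With set weights summing to 1 and each f(.) in (0, 1), the raw
    score lands in (0, 1); scaling it onto states 1..9 is assumed.
    """
    raw = w_h * f_h + w_d * f_d + w_t * f_t
    return max(1, min(9, 1 + round(raw * 8)))
```

For example, privacy_state(factor_score([1, 0, 1], [1, 1, 2]), 0.2, 0.4) evaluates to state 5 (De-identified) under equal set weights.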
Transitions
An individual may move along the finite state machine representing privacy in different ways: forward or backward, and jumping ahead by multiple states depending on the weight of the factors. With original state S_n and new (to-be) state S_m, the movement to a new state can be represented as S_m = S_n + (w_H H + w_T T + w_D D + action). In cases of actions taken to protect privacy, the action term will be negative.
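A sketch of this transition, assuming the weighted factor terms and the action are real-valued; rounding the step and clamping to the nine valid states are our assumptions:

```python
def transition(s_n, h, t, d, w_h=1.0, w_t=1.0, w_d=1.0, action=0.0):
    """S_m = S_n + (w_H*H + w_T*T + w_D*D + action).

    A privacy-protecting action contributes a negative term, pulling
    the result back toward a more private state. Rounding and clamping
    to states 1..9 are our assumptions.
    """
    s_m = s_n + round(w_h * h + w_t * t + w_d * d + action)
    return max(1, min(9, s_m))
```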
Information Disclosures
Where S_n is the starting privacy state, this movement can be represented as S_m = S_n + R, where S_n < S_m. The forward transition R is the result of an information disclosure; there are four possible disclosures:
IDx(x,p): I disclose information about myself
IDy(x,p): A third party discloses information about me
IDx(x,i): I disclose information about my property or my objects
IDy(x,i): A third party discloses information about my property or my objects
Information Protections (Reactive)
Where S_n is the starting privacy state, this movement can be represented as S_m = S_n + B, where S_n > S_m and B = (EP1 + EP2). The backward transition is the result of implementing a reactive information protection: there must be some action that protects privacy, resulting in a mitigation of the sum of the factors. There are two possible actions:
EP1: Redact Information
EP2: Create Legal Agreement
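A sketch pairing the two transition kinds, assuming each disclosure (an R term) and each protection (an EP term) carries a signed step size; the magnitudes below are placeholders, not values from the model:

```python
# Forward transitions: the four information disclosures (R terms),
# each pushing toward a less private state. Backward transitions: the
# two reactive protections (B = EP1 + EP2), each negative. The step
# sizes are hypothetical.
DISCLOSURES = {
    "IDx(x,p)": 2,  # I disclose information about myself
    "IDy(x,p)": 1,  # a third party discloses information about me
    "IDx(x,i)": 1,  # I disclose information about my property/objects
    "IDy(x,i)": 1,  # a third party discloses information about them
}
PROTECTIONS = {
    "EP1": -1,  # redact information
    "EP2": -1,  # create a legal agreement
}

def apply_events(state, events):
    """Apply a sequence of disclosure/protection events to a state."""
    for event in events:
        step = DISCLOSURES.get(event) or PROTECTIONS.get(event, 0)
        state = max(1, min(9, state + step))
    return state
```

For instance, apply_events(3, ["IDx(x,p)", "EP1"]) moves forward two states on the disclosure, then back one on the redaction, ending in state 4.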
Test 1
The model will run in the background of a defined online action, e.g. sending an email. Based on the defined factor set, the model will calculate and display a specific privacy state for the online action. A query will run asking the user whether the state seems reasonable. A second query will run asking the user to respond to a short series of questions intended to assess decision-making criteria, based on the H factors. The human decision-making factor weights, represented by w_1 H_1, w_2 H_2, ..., w_16 H_16, will be adjusted (if necessary) based on user feedback.
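One plausible sketch of the feedback step, assuming the user's own assessment of the state and their questionnaire answers drive a simple multiplicative update; the update rule and learning rate are our assumptions, not part of the test design:

```python
def adjust_weights(weights, answers, predicted_state, user_state, lr=0.1):
    """Nudge the 16 H-factor weights w1..w16 based on user feedback.

    weights         -- current w1..w16
    answers         -- the user's [0, 1] responses to the 16 questions
    predicted_state -- the state the model displayed (1..9)
    user_state      -- the state the user considered reasonable (1..9)

    If the user judged the displayed state too private (understated
    exposure), factors the user rated highly gain weight, and vice
    versa. This multiplicative rule is purely illustrative.
    """
    error = (user_state - predicted_state) / 8  # signed, in [-1, 1]
    return [w * (1 + lr * error * a) for w, a in zip(weights, answers)]
```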
Anticipated Results
The output of the numerical privacy state achieves two objectives: 1. It informs the individual about their own state of privacy in a given online environment, and 2. It standardizes the metric for evaluating privacy.
Future Work
Finalize factor sets
Weight individual factors
Create more tests