The EU General Data Protection Regulation (GDPR) regulates the use of personal data collected from European data subjects, including the activities of non-European companies that target European data subjects or process their personal data. Compliance with the regulation’s requirements can be challenging for many organizations, and its potential fines are daunting. This installment of The eData Guide to GDPR examines two methods for compliance.
Companies that regularly do business in Europe understandably look for ways to lessen their burden regarding compliance with the GDPR. Two potential methods are worthy of examination.
Anonymization eliminates personal data so that data subjects can no longer be identified. Anonymized data is excluded from GDPR regulation altogether because anonymized data is no longer “personal data.”
Pseudonymization replaces personal identifiers with nonidentifying references or keys so that anyone working with the data is unable to identify the data subject without the key. This type of data may enjoy fewer processing restrictions under the GDPR.
The challenge with either method is feasibility: whether the method is practical, can actually achieve compliance, and is ultimately less burdensome than other available means of processing or transferring data under the GDPR, as well as how to apply it.
According to GDPR Recital 26, anonymized data does not fall within the scope of the GDPR at all, because once anonymized it is no longer considered “personal data”:
The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.[1]
To achieve anonymization under the GDPR, re-identification of a data subject (even by the company that anonymized the data[2]) must be impossible. This sounds simple, but it is not, once one considers whether the exposed data (what remains after anonymization) can be paired with other available information to deduce the identity of data subjects. Recital 26 states:
To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.[3]
Achieving this level of anonymity is made more difficult by guidance published by the European Data Protection Board,[4] which states that the test for whether a person is “identifiable” does not necessarily mean that an actual subject is identified—just that it would be possible to identify the subject:
In general terms, a natural person can be considered as “identified” when, within a group of persons, he or she is “distinguished” from all other members of the group. Accordingly, the natural person is “identifiable” when, although the person has not been identified yet, it is possible to do it . . . .[5]
According to the guidelines, this possibility extends to “indirect” identification, i.e., if identification could be possible by using pieces of information to narrow down the group to which the person belongs (age, occupation, place of residence, IP address, etc.).[6]
The International Association of Privacy Professionals (IAPP) has also provided guidance on the definition of anonymization, noting that true anonymization may be nearly impossible to achieve, especially considering that a recent study showed that the majority of the United States population could be personally identified using just three data points (zip code, date of birth, and gender).[7]
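To make this indirect-identification risk concrete, the following is a minimal, hypothetical sketch (all names, values, and field names are invented for illustration) of a so-called linkage attack: a dataset stripped of names is joined against a publicly available register on the same three quasi-identifiers, and the individuals fall out immediately.

```python
# Hypothetical illustration of a linkage attack. A dataset with names removed
# can still be re-identified by joining it with a public register on shared
# quasi-identifiers (zip code, date of birth, gender). All records are invented.

anonymized_records = [
    {"zip": "60601", "dob": "1984-03-12", "gender": "F", "diagnosis": "asthma"},
    {"zip": "60601", "dob": "1990-11-02", "gender": "M", "diagnosis": "diabetes"},
]

public_register = [
    {"name": "Jane Doe", "zip": "60601", "dob": "1984-03-12", "gender": "F"},
    {"name": "John Roe", "zip": "60601", "dob": "1990-11-02", "gender": "M"},
]

def link(records, register, keys=("zip", "dob", "gender")):
    """Match supposedly anonymized records to named individuals via quasi-identifiers."""
    index = {tuple(person[k] for k in keys): person["name"] for person in register}
    return [
        {"name": index[tuple(rec[k] for k in keys)], **rec}
        for rec in records
        if tuple(rec[k] for k in keys) in index
    ]

for match in link(anonymized_records, public_register):
    print(match["name"], "->", match["diagnosis"])
```

This is precisely the Recital 26 question: not whether the dataset contains names, but whether means reasonably likely to be used (here, an ordinary join against other available data) would identify the individuals.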
Similarly, the United Kingdom Information Commissioner’s Office (ICO) provides a warning regarding companies attempting to meet the GDPR’s standard for anonymization, stating:
[Y]ou should exercise caution when attempting to anonymise personal data. Organisations frequently refer to personal data sets as having been “anonymised” when, in fact, this is not the case. You should therefore ensure that any treatments or approaches you take truly anonymise personal data. There is a clear risk that you may disregard the terms of the GDPR in the mistaken belief that you are not processing personal data. In order to be truly anonymised under the GDPR, you must strip personal data of sufficient elements that mean the individual can no longer be identified. However, if you could at any point use any reasonably available means to re-identify the individuals to which the data refers, that data will not have been effectively anonymised but will have merely been pseudonymised. This means that despite your attempt at anonymisation you will continue to be processing personal data. You should also note that when you do anonymise personal data, you are still processing the data at that point.[8]
Ireland’s Data Protection Commission has issued a similar warning regarding attempts to use anonymization as a way to circumvent the GDPR:
There is a lot of research currently underway in the area of anonymisation, and knowledge about the effectiveness of various anonymisation techniques is constantly changing. It is therefore impossible to say that a particular technique will be 100% effective in protecting the identity of data subjects . . . .[9]
Ireland’s guidance also provides a helpful analysis of three possible anonymization strategies:
Randomization
This technique can involve either of the following: adding “noise” to the data, so that attribute values are slightly altered and no longer accurately reflect any particular individual while remaining useful in the aggregate;[10] or permutation, i.e., shuffling attribute values between records so that they no longer correspond to the original data subjects.[11]
Generalization
This technique involves reducing the granularity of the data, so that only less precise data is disclosed: “For example, a data base containing the age of data subjects might be adjusted so that it is only recorded what band of ages an individual falls within (e.g. 18-25; 25-35; 35-45; etc.).”[12]
Masking
This technique involves removing obvious or direct personal identifiers from data. However, masking alone carries a very high risk of identification, so it is not considered anonymization by itself.[13]
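As a rough, hypothetical sketch of how these three strategies differ in practice (the field names, band width, and noise scale below are illustrative assumptions, not values prescribed by the guidance), each technique can be expressed as a simple transformation of a record:

```python
import random

# Hypothetical record; all values are invented for illustration.
record = {"name": "Jane Doe", "age": 29, "salary": 52000, "city": "Cork"}

def randomize(rec, noise_scale=0.05):
    """Randomization (noise addition): perturb numeric values so they are no
    longer exact for any individual while remaining useful in the aggregate."""
    out = dict(rec)
    out["salary"] = round(rec["salary"] * (1 + random.uniform(-noise_scale, noise_scale)))
    return out

def generalize(rec, band=10):
    """Generalization: reduce granularity, e.g. replace an exact age with an
    age band (the band boundaries here are arbitrary)."""
    out = dict(rec)
    low = (rec["age"] // band) * band
    out["age"] = f"{low}-{low + band - 1}"
    return out

def mask(rec):
    """Masking: drop obvious direct identifiers. On its own this is not
    anonymization, because the remaining attributes may still identify someone."""
    return {key: value for key, value in rec.items() if key != "name"}

print(mask(generalize(randomize(record))))
```

Even after all three transformations, whether the output counts as anonymized under Recital 26 turns on what other information it could be combined with, not on the transformations themselves.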
Even if an organization were able to apply generalization or randomization to a dataset and truly anonymize the data so that no one could currently re-identify a data subject, Ireland’s Data Protection Commission warns that the data might not remain anonymized in the future:
It is not possible to say with certainty that an individual will never be identified from a dataset which has been subjected to an anonymisation process. It is likely that more advanced data processing techniques than currently exist will be developed in the future that may diminish any current anonymisation techniques. It is also likely that more datasets will be released into the public domain, allowing for cross comparison between datasets. Both of these developments will make it more likely that individual records can be linked between datasets in spite of any anonymisation techniques employed, and ultimately that individuals can be identified.[14]
Given the available guidance, it appears that while anonymization may be a useful data minimization strategy under the GDPR, it should not be considered a technique for avoiding GDPR regulation altogether.
Unlike anonymized data, pseudonymized data falls within the GDPR’s regulatory reach. Pseudonymization can nonetheless be an effective security measure that helps companies comply with GDPR data minimization standards. For example, Article 25 lists pseudonymization as an “appropriate technical and organizational measure” to meet the requirements of the GDPR.[15] Likewise, Recital 78 lists pseudonymizing data as a method that can be used to meet the GDPR’s principles of “data protection by design and data protection by default.”[16]
Pseudonymized data also enjoys more freedom under the GDPR than non-pseudonymized, fully identified personal data. For instance, Article 6(4) of the GDPR lists pseudonymization (and encryption) as a possible exception to the general rule that a controller cannot process data for a purpose other than that for which it was collected.[17]
Also, unlike with anonymization, the GDPR contemplates that the subject of pseudonymized data may be re-identified using additional information held by another party. Article 4 defines pseudonymization as “[t]he processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information.”[18]
The ICO states that this can be achieved by, for example, replacing names or other identifiers with a reference number. A person with access to the relevant additional information (the key) could tie the reference number back to the individual, but a company can achieve pseudonymization by putting technical and organizational measures in place to ensure that this additional information is held separately.[19]
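A minimal sketch of that approach, with invented records and an assumed in-memory key table (in practice the key would sit in a separately controlled system), might look like this:

```python
import secrets

# Hypothetical driver records; all values are invented for illustration.
drivers = [
    {"name": "Jane Doe", "mileage": 1042, "journeys": 87},
    {"name": "John Roe", "mileage": 980, "journeys": 74},
]

def pseudonymize(records, identifier="name"):
    """Replace the direct identifier with a random reference number. The key
    table mapping references back to individuals must be held separately,
    under its own technical and organizational controls."""
    key_table = {}
    pseudonymized = []
    for rec in records:
        ref = secrets.token_hex(8)                 # non-identifying reference
        key_table[ref] = rec[identifier]           # held separately from the data
        out = {k: v for k, v in rec.items() if k != identifier}
        out["ref"] = ref
        pseudonymized.append(out)
    return pseudonymized, key_table

data_for_analysis, key_table = pseudonymize(drivers)
print(data_for_analysis)   # shareable with teams that do not need identities
# key_table stays with the team that must re-identify drivers (e.g., for expenses)
```

Because the key table still exists within the organization, the data remains pseudonymized personal data rather than anonymized data, which is the distinction the ICO’s courier example below illustrates.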
The ICO also provides a helpful example of effective pseudonymization under the GDPR:
A courier firm processes personal data about its drivers’ mileage, journeys and frequency in order to charge their customers for the service and to process expense claims for the mileage. Identifying the individual drivers is necessary for these purposes.
However, another team within the firm also uses the data to optimize efficiency. The identification of the individual couriers is NOT necessary for this purpose. Therefore, the firm ensures that the second team only has access to the data in a form that makes it impossible to identify the individual couriers (by replacing names, job titles, and locations with non-identifying reference numbers, for example). The firm also ensures that the second team only has access to the pseudonymized information and does not hold any information that would allow that team to link the data back to a particular data subject.
The above example is considered pseudonymization, rather than anonymization, because even though one team does not have access to information that could re-identify the data subjects, that additional information is still held within the company (making re-identification possible).
Anonymization and pseudonymization are both important data minimization techniques under the GDPR, and both can be used, whenever feasible, to help companies protect the personal data they hold. They are not, however, a panacea. Anonymizing and pseudonymizing data are themselves considered “data processing” under the GDPR; therefore, companies must still comply with Article 5(1)(b)’s “purpose limitation” before attempting either data minimization technique.
While truly “anonymized” data does not, by definition, fall within the scope of the GDPR, the standard for achieving anonymization is so rigorous that a data controller should be extremely cautious before attempting to use it as a way to circumvent the GDPR completely.
[1] GDPR Recital 26.
[2] IAPP, Looking to Comply with GDPR? Here's a Primer on Anonymization and Pseudonymization, Apr. 27, 2019.
[3] GDPR Recital 26.
[4] Formerly the Article 29 Working Party on Data Protection.
[5] Article 29 Working Party on Data Protection, Opinion 4/2007 on the Concept of Personal Data.
[6] Id.
[7] See supra note 2.
[8] ICO, What Is Personal Data?
[9] Ireland Data Protection Commission, Guidance Note: Guidance on Anonymisation and Pseudonymisation.
[10] Id.
[11] Id.
[12] Id.
[13] Id.
[14] Id.
[15] GDPR Art. 25, Data Protection by Design and by Default.
[16] GDPR Recital 78.
[17] GDPR Art. 6(4): “Where the processing for a purpose other than that for which the personal data have been collected is not based on the data subject's consent or on a Union or Member State law which constitutes a necessary and proportionate measure in a democratic society to safeguard the objectives referred to in Article 23(1), the controller shall, in order to ascertain whether processing for another purpose is compatible with the purpose for which the personal data are initially collected, take into account, inter alia . . . (e) the existence of appropriate safeguards, which may include encryption or pseudonymisation.”
[18] GDPR Art. 4(5).
[19] See supra note 8.