Reductio ad Absurdum – GDPR and Personal Data

The General Data Protection Regulation (GDPR) does not exist in a vacuum. In the UK, at least, it follows the Data Protection Act 1998 (DPA 1998). Under the DPA 1998, the Information Commissioner's Office (ICO) produced a reference guide, 'What is Personal Data?', which will continue to apply under GDPR until it is superseded: Reference Guide

When we break down the ICO's reasoning, we find some fallacies that we need to confront or define. Doing so lets us begin the process we all need to undergo, both to comply with GDPR and to meet our moral obligation to protect the privacy of the data subjects whose data we hold.

The ICO reference guide is structured like a flow diagram, moving from question to question and ultimately indicating whether the data held is personal or not. It may feel a bit flippant at first, but bear with me: imagine you have a spreadsheet listing the two biological genders, 'Male' and 'Female'.

Can a living individual be identified from the data, or, from the data and other information [in] your possession, or likely to come into your possession?

I think it would be a stretch to believe that having the two biological genders written on a spreadsheet could be considered a breach of the DPA 1998. However, with a phone book, I can get names, addresses and a reasonably accurate guess at gender. A phone book is in my possession, so the answer is technically 'yes': these genders must be personal data, because in conjunction with other information in my possession they can identify an individual (in fact, a whole heap of individuals). I can't see the flaw in my logic, yet the conclusion is obviously wrong.

Does the data ‘relate to’ the identifiable living individual, whether in personal or family life, business or profession?

Gender generally stays with an individual regardless of the context they are in, so the answer is again yes, confirming this to be personal data.

There are six further questions to ask if the answer to the second question is unclear, so let's continue and see how we fare:

Is the data ‘obviously about’ a particular individual?

Looking specifically at biological gender alone, I would argue no. Each of the two biological genders is shared by several billion living individuals. However, in the context of other data in our possession, the phone book, we get a name, an address and a high probability of gender to link to our biological gender. In many cases, this does single out an individual.

For example, suppose Matthew Davis and Louise Davis live at the same address and you hold the biological gender 'Male'. With a highly complex machine learning algorithm (sarcasm), you could conclude that your data relates to a specific identifiable living individual: me.

Is the data ‘linked to’ an individual so that it provides particular information about that individual?

When a household contains only two individuals with obviously gendered names, it is reasonable to conclude that one is male and one is female, which identifies both of them. We haven't even begun to make assumptions about these individuals, such as that they are a couple, most likely married. I believe that kind of data processing would go right off the GDPR scale.
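
To show how little "machine learning" this really takes, here is a minimal Python sketch of the phone book linkage described above. The phone book entries, the address '1 Example Street' and the first-name-to-gender lookup are all hypothetical, and real name data is far messier; the point is only that a bare 'Male' or 'Female' value, joined against a household listing, can single out one resident.

```python
from collections import defaultdict

# Hypothetical phone-book extract: (full name, address). Illustrative only.
PHONE_BOOK = [
    ("Matthew Davis", "1 Example Street"),
    ("Louise Davis", "1 Example Street"),
    ("Alex Smith", "2 Example Street"),
]

# Hypothetical first-name -> likely gender lookup (an assumption, not real data).
# Names missing from the lookup, such as "Alex", are simply never matched.
LIKELY_GENDER = {"Matthew": "Male", "Louise": "Female"}


def link_gender(phone_book, gender):
    """Group, by address, the residents whose first name suggests `gender`."""
    by_address = defaultdict(list)
    for name, address in phone_book:
        first_name = name.split()[0]
        if LIKELY_GENDER.get(first_name) == gender:
            by_address[address].append(name)
    return dict(by_address)


def singled_out(phone_book, gender):
    """Names that are the only likely match for `gender` at their address."""
    return [names[0]
            for names in link_gender(phone_book, gender).values()
            if len(names) == 1]


print(singled_out(PHONE_BOOK, "Male"))  # ['Matthew Davis']
```

Asking for 'Female' instead returns ['Louise Davis'], so between them the two values identify the whole household.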

Is the data used, or is it to be used, to inform or influence actions or decisions affecting an identifiable individual?

This question reflects old DPA 1998 thinking, which focused on how data is used in our decision-making. It is not a factor in the GDPR definition. Still, for this example, let us picture ourselves as insurers. No judgement, but I feel dirtier already.

If we are deciding on life insurance cover, then this data would be used to inform those decisions.

Does the data have any biographical significance in relation to the individual?

Gender is a major part of many individuals' lives, so we could reasonably assume that, yes, this is biographically significant.

Does the data focus or concentrate on the individual as its central theme rather than on some other person, or some object, transaction or event?

This data goes right to the core of the individual.

Does the data impact or have the potential to impact on an individual, whether in a personal, family, business or professional capacity?

It may very well have a significant impact on a person. Gender is a particularly deeply felt topic that has engendered a great deal of debate.
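
For anyone who prefers the flow diagram as logic, the following is a simplified Python sketch of the questions walked through above. It is not the ICO's own tool: the function name and answer labels are hypothetical, and it assumes that a 'no' at either of the first two questions means the data is not personal, and that any 'yes' among the six follow-up questions tips an unclear case towards personal data.

```python
from typing import Mapping, Optional


def ico_personal_data_check(identifiable: bool,
                            relates_to: Optional[bool],
                            follow_ups: Mapping[str, bool]) -> str:
    """Simplified, hypothetical reading of the ICO flow diagram.

    identifiable: answer to question 1.
    relates_to:   answer to question 2, or None if the answer is unclear.
    follow_ups:   answers to the six follow-up questions, consulted only
                  when question 2 is unclear.
    """
    if not identifiable:
        return "not personal data"
    if relates_to is True:
        return "personal data"
    if relates_to is False:
        return "not personal data"  # assumption: a clear 'no' ends the flow
    # Question 2 unclear: assume any 'yes' below points to personal data.
    if any(follow_ups.values()):
        return "likely personal data"
    return "not personal data"


# The answers given above for a spreadsheet of 'Male' and 'Female'.
ANSWERS = {
    "obviously about": False,
    "linked to": True,
    "informs decisions": True,
    "biographical significance": True,
    "central theme": True,
    "impact": True,
}

print(ico_personal_data_check(True, True, ANSWERS))  # personal data
print(ico_personal_data_check(True, None, ANSWERS))  # likely personal data
```

Either way the question is answered, a column holding only 'Male' and 'Female' comes out as personal data, which is exactly the absurdity this article is poking at.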

General Data Protection Regulation

So, under the DPA 1998, these two biological genders fit the definition of personal data. Turning to GDPR, Article 4(1) defines personal data as follows:

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person

By this definition, we must still accept that 'male' and 'female', whether written on a spreadsheet, kept in a database, held in a system, or even written on a piece of paper within the confines of a private company, can be used to identify a natural person indirectly through a common data set (the phone book).

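Article 9 goes further, setting out 'special categories' of personal data to which even more stringent rules apply. Gender as such is not on that list, but data concerning a person's sex life or sexual orientation is, and it is not hard to see how a gender field could drift towards that territory.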

Coming to a Point

This is obviously ridiculous. Nobody would be mistaken for a criminal if they were caught with a list of biological genders. But where is the cut-off point for the specificity of data? How do we look at a list of genders and say that it is laughable, yet a birthday is too specific? Or, if not a birthday, an address? On their own, gender, birthday and address do not indicate individuals; they can narrow the field, but not pinpoint anyone.
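
The narrowing effect can be made concrete by borrowing the idea behind k-anonymity: count how many people share each combination of values and look at the smallest group. The Python sketch below uses six invented records (the birthdays and postcodes are hypothetical); a smallest group of 1 means someone has been singled out.

```python
from collections import Counter

# Six hypothetical people; every value is invented for illustration only.
PEOPLE = [
    {"gender": "Male",   "birthday": "14 Feb", "postcode": "AB1"},
    {"gender": "Male",   "birthday": "14 Feb", "postcode": "CD2"},
    {"gender": "Female", "birthday": "14 Feb", "postcode": "AB1"},
    {"gender": "Female", "birthday": "3 Mar",  "postcode": "CD2"},
    {"gender": "Male",   "birthday": "3 Mar",  "postcode": "AB1"},
    {"gender": "Female", "birthday": "3 Mar",  "postcode": "AB1"},
]


def smallest_group(records, attributes):
    """Size of the smallest group of records sharing the same values for
    every attribute in `attributes` (the 'k' in k-anonymity)."""
    counts = Counter(tuple(record[a] for a in attributes) for record in records)
    return min(counts.values())


for attributes in [("gender",), ("birthday",), ("postcode",),
                   ("gender", "birthday", "postcode")]:
    print(attributes, smallest_group(PEOPLE, attributes))
# ('gender',) 3
# ('birthday',) 3
# ('postcode',) 2
# ('gender', 'birthday', 'postcode') 1
```

No single attribute picks anyone out of this small group, yet all three together do, which is precisely why it is so hard to say where the cut-off lies.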

It is the word 'indirectly' that forces us to consider the wealth of data sets available to us, which can turn data that might not otherwise be seen as personal into personal data. With these data sets, we can now hardly avoid identifying individuals. What we appear to lack is a tool or process that would let us define where information ceases to be about an individual and becomes simply non-personal data.

Suppose there were a picture of me taken on 14 February this year, showing a cross marked in black soot on my forehead: you might deduce that I had attended a Christian Ash Wednesday service. If this image were found on Facebook, posted by my mother rather than by me as the data subject, GDPR would (on a strict interpretation) treat it as personal data, and the implied religious belief would even fall within the Article 9 special categories. Logic dictates, however, that this does not make the date 14 February, or even the religious holiday of Ash Wednesday, off-limits data to hold, even though each is associated with an identifiable individual through the photograph and the implied religious affiliation.

The point of this article is that there is, as yet, no formal tool that allows us to make that reasonable assumption. Personally, I find it somewhat unconscionable that we tolerate this lack of certainty, or even the lack of any means of determining a reasonable stance, while allowing every GDPR consultancy to chant "€20m or 4% of your annual turnover."

As an alternative, I would suggest defining a grace period for penalties (at least until the UK Data Protection Bill has been passed and offers more definition), or producing more specific guidance, so that businesses can prepare themselves without the dangling sword of crippling fines for any misstep, which is the prevailing message of consultancy firms.

