Purpose limitation in Big Data: is it a maintainable concept?

Richard Spoelstra

“You had to live – did live, from habit that became instinct – in the assumption that every sound you made was overheard, and, except in darkness, every movement scrutinized.”[i] George Orwell wrote these dismal words in his book 1984. The question that arises is: have we arrived at the point where everything is monitored? The answer is yes; Big Data can be, and sometimes is, used to monitor everything. Big Data is here. There is no denying it, and it is here to stay. The questions that remain are: how bad is it, and how can we regulate it? While this article is too short to go into all of the aspects of Big Data, it shall endeavour to inform you about one small aspect of it: the principle of purpose limitation, a principle that is fundamental in European data protection law and that prohibits data controllers from abusing the data they collect. Although this article will primarily examine purpose limitation and its role in Big Data, it first gives a short introduction to both terms. After that, we shall see how compatible purpose limitation is with Big Data, and whether anything will change with the ongoing EU data protection review, in which it is proposed to replace the current Directive with a Regulation.[ii]

The essentials
Big Data is the collecting and storing of massive amounts of data, which are then analysed to see if there are any correlating patterns.[iii] While Big Data analyses are used in all sorts of areas, most of these analyses are not relevant for privacy and data protection laws.[iv] The data that are relevant with regard to privacy and data protection laws are personal data.[v] Such data can be used for “good”: Google has developed an algorithm that uses the search terms people enter to identify whether there is a flu outbreak.[vi] The data can also be used for “bad”: there is a case in the United States where Target, a retail chain, made a person’s family aware of her pregnancy through targeted advertising.[vii]

In the European Union, data protection is currently governed by Directive 95/46/EC. Article 6(1)(b) of the Directive lays down the principle of purpose specification. It says that data must be collected for specified, explicit, and legitimate purposes. Specified means that, before the collecting of data can start, the controller must know specifically what he wants to do with the data. Explicit means that the data controller needs to inform others of the data processing.[viii] The third criterion concerns the reason why data is going to be collected; this reason must be legitimate, and its legitimacy must go beyond that of article 7 of the Directive.[ix]

Like purpose specification, compatible use is laid down in article 6(1)(b). It says that data collected shall not be processed further in a way incompatible with the purpose for which it was originally collected. The fact that the provision prohibits incompatibility, rather than requiring compatibility, is especially important, because it allows the controller more freedom. Instead of having to prove compatibility, the controller only has to show that the further processing is not incompatible, which is a much less burdensome task.[x] The Article 29 Working Party introduced four factors that help assess compatibility; each is listed here before we check how well they work with Big Data. The first is the relation between the original purpose and the further processing. This should primarily be assessed on a substance level and not on a textual level, as a purely textual analysis would limit further processing much more than a test of substance. The second is what the data subject could reasonably have expected from the collecting of his data. Particular care should be given to the relation between the data subject and the controller. Did the subject consent to the data collecting or was it mandatory? What is the balance of power between data subject and controller? The third is the nature of the data and the impact of further processing. The more personal the data, the less likely it is that further processing will be compatible with the original purpose. The same goes for how the data is further processed. Is it the same controller or a different one, how many people can see the data, and is the same amount of data being used or is it supplemented with other data? The fourth is whether the controller implements additional safeguards to ensure that further processing does not negatively affect the data subject. These additional measures can then be used to mitigate some of the ‘damage’ that the lack of purpose limitation did at the beginning. Such measures could include, but are not limited to, additional consent, opt-in and opt-out schemes, and technical measures such as anonymisation and pseudonymisation.[xi]

The principle of purpose limitation was put into place to prevent data being used for other, creepier,[xii] purposes than those for which the data was originally collected, which would cost controllers the data subjects’ trust.[xiii]

Purpose limitation in Big Data
As stated, Big Data is the collecting of a massive amount of data, which is then analysed for patterns. However, purpose limitation prohibits collecting first and deciding on a purpose later, and as such it is at odds with the very definition of Big Data. The example Professor Moerel used in her oration at Tilburg University illustrates this well. She says that Google first collected the photos and only then built its Street View service.[xiv] The same will hold for a lot of other Big Data projects: knowing beforehand what you want to do with the data is no longer always possible, and even if it were, it would usually be near impossible to get everyone to consent.[xv] The same goes for compatibility. While the idea behind compatible use certainly broadens the playing field for data collectors, abiding by its factors would constitute an undue burden on the collectors. Most of these data are collected through automated means, which means that tons and tons of information are collected every day. Checking whether these data are used compatibly cannot be reliably done by a machine, and because of the vast amount of data it cannot be done by a person either.[xvi] This burden will not make collectors stop with Big Data but will instead make them stop abiding by the law.[xvii] Even if they want to, they simply cannot abide by all the rules for every piece of data they have. Add to this that the guidelines given by the Article 29 Working Party are not always effective. Take for example anonymisation and pseudonymisation as a way to minimise the negative effect further processing might have. While these would in theory reduce that effect, they are becoming less and less effective because of the increase in data: re-identification becomes easier with every additional piece of data that is available.[xviii] This, in turn, makes data protection laws lose their important function in society, as it widens the gap between the formal interpretation of the law and its actual implementation.[xix]

The Regulation
Having identified the problems with purpose limitation in relation to Big Data, one would assume that in the proposed Regulation these problems would be adequately addressed. However, one would assume wrong. Instead of opting for a less stringent set of criteria, the regulatory response was to make them stricter. While many provisions have been changed or adapted, only a few can be addressed in this brief article. The new Regulation treats transparency as one of its key factors, which can be directly linked to purpose specification and to the factors given by the Working Party to check for compatible use. The aim is noble: making data subjects understand in what way and for what purpose their data is analysed will make them more ready to consent to having their data processed. However, it will be problematic on two counts.[xx] First, no company is eager to share its processing algorithms and risk having them stolen. Second, data subjects do not always understand the information that is given to them.[xxi] Another point that is related to the principle of purpose limitation is the Data Protection Impact Assessment (DPIA). It requires companies, before and during processing, to assess what the impact of the processing is on the protection of personal data.[xxii] This requirement is probably going to work quite well, as it requires data processors to think more extensively about what they are processing and why. However, some of the proposed provisions are going to be quite difficult to comply with, given the nature of Big Data. Take for example the requirement that a DPIA be made when sensitive data is processed: one cannot always know when the use of Big Data involves the processing of sensitive data.[xxiii] All in all, the data protection Regulation has kept one of its key principles, one that will most certainly fail in its striving for the protection of personal data and will in turn do the opposite of what it was created for.[xxiv]

Conclusion
The purpose limitation principle is just not adequately equipped to deal with Big Data. Neither purpose specification, with its consent obligations, nor the compatibility doctrine, with its four-pronged compliance test, does anything to help the data subject. Instead, they seem to do quite the opposite by making it nearly impossible for data controllers to comply with the law. The proposed Regulation should have fixed the concerns raised by people in the field, but instead it increases both the weight that this principle holds in data protection law and the restrictions that are put on Big Data. This leaves only one way to go, and that is through the cracks, which will result in less transparency, less protection, and fewer benefits for the data subject. In the end, one is left with a piece of legislation that is not adequately equipped to handle Big Data from the get-go.


[i] G. Orwell, 1984, London: Secker and Warburg 1949.

[ii] 2012/0011 COD, Proposal for a REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL on the protection of individuals with regard to the processing of personal data and on the free movement of such data (General Data Protection Regulation). Currently there is Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data. In 2012 the European Commission put forward a proposal to replace the Directive, which is currently going through the legislative process.

[iii] Centre for Information Policy Leadership, Big data and analytics: Seeking Foundations for Effective Privacy Guidance. A Discussion Document. (Found at http://www.hunton.com/files/Uploads/Documents/News_files/Big_Data_and_Analytics_February_2013.pdf), 2013, p. 1 and I. Rubinstein, ‘Big Data: The End of Privacy or a New Beginning’, Public Law & Legal Theory Research Paper Series 2012, Working Paper NO. 12-56, p. 1.

[iv] Take for example Wikipedia: it holds a great deal of data, but privacy and data protection authorities do not care if data is collected about, for example, the ferocious winds on Jupiter. Care needs to be taken here. Some data might seem to be non-sensitive and non-personal, but when it is linked to other data, the combined result might suddenly become sensitive and personal data.

[v] For a short introduction on the concept of personal data see: R. Spoelstra, ‘Would you like some privacy with that, sir?’, Secjure 2014, Volume 29 Issue 1, p. 14 – 16.

[vi] http://www.google.org/flutrends/intl/en_gb/about/how.html.

[vii] R. Cumbley & P. Church, ‘Is “Big Data” creepy?’, Computer Law & Security Review 2013, Volume 29 Issue 5, p. 603.

[viii] Currently this notification is laid down in articles 10 and 18 of the Directive. Article 10 says that the data controller needs to inform the data subject about the purpose of the processing for which the data is collected. Article 18 lays down the requirement that the data controller notify the supervisory body of any data processing that takes place by wholly or partially automated means. Article 10 will come back in a new article 11. Article 18 will, however, change into a prior authorisation requirement laid down in article 34.

[ix] Article 29 Data Protection Working Party, Opinion 03/2013 on purpose limitation, 00569/13/EN WP 203.

[x] Examples are given in the Article 29 Data Protection Working Party, Opinion 03/2013 on purpose limitation, 00569/13/EN WP 203, p. 22 – 23.

[xi] Ibid, p. 20 – 27.

[xii] Google can theoretically use its technology to biometrically identify each individual in its picture database; however, the CEO of Google said this was too creepy. R. Cumbley & P. Church, ‘Is “Big Data” creepy?’, Computer Law & Security Review 2013, Volume 29 Issue 5, p. 603.

[xiii] Article 29 Data Protection Working Party, Opinion 03/2013 on purpose limitation, 00569/13/EN WP 203, p. 4.

[xiv] L. Moerel, Big Data Protection; How to Make the Draft EU Regulation on Data Protection Future Proof, (Oration Tilburg University), Tilburg: Tilburg University 2014, p. 54.

[xv] Ibid. A small note: consent is not always required, article 7 (b) to (f) lay down additional grounds on which data may be processed.

[xvi] A court in Italy has already ruled that checking whether a piece of data is sensitive implies a semantic, variable judgement that cannot be delegated to an IT process. http://www.jonesday.com/italian_appeals_court_overturns/.

[xvii] L. Moerel, Big Data Protection; How to Make the Draft EU Regulation on Data Protection Future Proof, (Oration Tilburg University), Tilburg: Tilburg University 2014, p. 53.

[xviii] Article 29 Data Protection Working Party, Opinion 05/2014 on Anonymisation Techniques, 0829/14/EN WP 216, p. 3.

[xix] R. Cumbley & P. Church, ‘Is “Big Data” creepy?’, Computer Law & Security Review 2013, Volume 29 Issue 5, p. 608.

[xx] I. Rubinstein, ‘Big Data: The End of Privacy or a New Beginning’, Public Law & Legal Theory Research Paper Series 2012, Working Paper NO. 12-56, p. 5 – 6.

[xxi] Ibid, p. 2.

[xxii] Article 33 of the 2012/0011 COD, Proposal for a REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL on the protection of individuals with regard to the processing of personal data and on the free movement of such data (General Data Protection Regulation).

[xxiii] C. Kuner, ‘The European Commission’s Proposed Data Protection Regulation: A Copernican Revolution in European Data Protection Law’, Bloomberg BNA Privacy & Security Law Report 2012, February 2012, p. 8.

[xxiv] I. Rubinstein, ‘Big Data: The End of Privacy or a New Beginning’, Public Law & Legal Theory Research Paper Series 2012, Working Paper NO. 12-56, p. 1.