The ICO is in the middle of a series of consultations looking at how data protection law should apply to generative AI.
Generative AI is the type of AI that allows machines to produce creative responses to prompts, such as answering natural language questions or creating images. It came to prominence when ChatGPT launched in November 2022, and since then organisations of all sizes have been thinking about how to realise its potential.
The ICO is looking at the following topics:
- The lawful basis for web scraping to train generative AI models
This consultation closed on 1 March. It is of interest because the developers of ‘large language models’ or LLMs build the enormous training datasets these models require by collecting any information publicly available on the internet. A number of court cases are ongoing in which copyright holders object to this, while the LLM creators claim it is allowable ‘fair use’ of the copyrighted material.
- Purpose limitation in the generative AI lifecycle
This consultation closed on 12 April. When data is collected, a privacy notice has to be provided. The data then can’t be used for any other purposes unless those purposes are ‘compatible’ with the original purpose. When data is collected to train an LLM, this happens automatically and the data scraping tool won’t know what the original purpose was, so cannot know whether the LLM training purpose is compatible with it. However, generative AI can only work if data is made available at this kind of scale.
- Accuracy of training data and model outputs
This consultation is currently open and closes on 10 May. ‘Accuracy’ means something different in data protection and artificial intelligence contexts. In data protection, it means that the data is a true reflection of reality at the right level of detail for the purpose for which the data was collected. In artificial intelligence, it means a level of statistical probability about whether an inference is correct or not. These are clearly very different. It is difficult to describe model accuracy in a user-friendly way, and if an individual does not conform to the most statistically probable expectation of them, AI may not work as intended for them.
- Information rights
This consultation is expected to launch in the next few weeks. It will be interesting because it is not currently clear how data protection rights can operate in the context of generative AI. Individuals have rights including: the right to be informed about how their data will be processed; the right to a copy of the personal data held about them and information about how it is used; the right to have a copy of certain types of data ‘ported’ to another organisation on request; the right to restrict how their data is processed; and the right to have their personal data deleted when it is no longer needed.
- Controllership in AI
This consultation is expected to launch this summer. It will be interesting because the question of who is responsible for generative AI outputs is a very hot topic. Generative AI is designed to behave very flexibly and to ‘learn’ and change over time. By design, developers will not know how the tool will be used, while users are unlikely to know how the tool works.
The consultation responses and the guidance the ICO produces will be very helpful to give Boards the assurance that they are using generative AI safely and lawfully and controlling risks appropriately. We would encourage anyone with an interest in generative AI to read the consultations and consider responding. The documents can be found here: ICO consultation series on generative AI and data protection | ICO