Image credit: Brett Styles via Pexels

We have submitted written evidence to the Information Commissioner’s Office (ICO) on the legal basis for web scraping to train generative AI models.

The ICO, the UK’s independent body set up to uphold information rights, had issued a call for evidence on the lawful basis for web scraping to train generative AI models.

Web scraping involves using automated software to ‘crawl’ web pages and gather, copy, or extract information from them. That information can be anything published on a website, such as images, videos, text, or contact details.

Most developers of generative AI rely on scraping publicly accessible sources for their training data. They either collect this data directly through web scraping, or obtain it indirectly from another organisation that has scraped it itself.

The use of web-scraped personal data raises significant questions about how data protection law can feasibly be applied and enforced.

While we do not disagree with the ICO’s regulatory approach, our submission asks for further clarification in several areas.

For example, it seems unlikely that individuals who place personal data on the internet would expect it to be scraped at scale, at some unknown point in the future, to train generative AI. Because individuals’ reasonable expectations are relevant to whether processing of their data is lawful, this muddies the legal picture.

We are also concerned about whether web scrapers can distinguish between ordinary personal data and special category data (such as data concerning health or revealing religious beliefs), which receive different legal protections.

Finally, we are concerned that this data is being used to build generative AI models without robust research into their potential effects on individuals and on societal trust in these technologies. We therefore urge the ICO to issue guidance supporting researcher access to data, so that independent researchers can study these models before they are placed on the market and throughout their lifecycle.

Our written evidence submission was prepared by Dr Ann Kristin Glenster.