The copyright implications of AI training are a hot topic at the moment, and we now have a first hint of how German courts will assess the matter. On 11 July 2024, the Hamburg Regional Court held a first hearing in what could become a landmark case on copyright and AI training.
Background on LAION e.V. / Robert Kneschke
The case involves LAION, a German non-profit dedicated to creating open-source AI models, and Robert Kneschke, a photo producer. Kneschke seeks to prevent LAION from using one of his images for AI training purposes, and the court had to assess whether and how he can do so as the copyright owner. The crucial points of the case are:
- Text and data mining (TDM) exception: German law allows text and data mining of copyright-protected works without the copyright owner’s consent. This is based on an EU directive (the Directive on Copyright in the Digital Single Market). The court now needs to decide whether the TDM exception applies to the training of AI models, which would make scraping copyright-protected content for AI training generally legal. While there is no definite answer from the court yet, it seems to lean towards a “Yes”.
- Machine-readable opt-out: If the TDM exception applies to AI training, copyright holders can opt out of the use of their works by giving a machine-readable notice. The court is now assessing whether such a notice requires a specific form (e.g. a machine-executable “robots.txt” file) or whether natural language is sufficient.
What this means for you
This case is crucial for AI startups that rely on scraped datasets, as it addresses the legal boundaries of using copyrighted material for training AI models. The outcome could set a precedent for how AI companies handle data and copyright issues, and it will affect how data for AI training needs to be sourced. Here’s what can be done already:
- Make use of the TDM exception: In Germany, relying on the TDM exception still seems to be possible for anyone training an AI model. Caution: The exception does not necessarily apply in other countries. Swiss startups in particular should be more careful about how and where they perform their AI training activities.
- Respect machine-readable opt-outs: When scraping training data, respect machine-readable opt-outs (“robots.txt” files, “NoAI” tags etc.). LEXR is of the opinion that a natural language opt-out by the copyright owner is not sufficient and that a machine-executable file is required for a legally binding AI training prohibition, but this will need to be confirmed by the court.
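For tech teams, the opt-out check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a statement of what the court will require: the crawler name `ExampleTrainingBot`, the helper function names and the `noai` / `noimageai` directive names are hypothetical choices made for this example. Only `urllib.robotparser` from the Python standard library is used.

```python
from urllib.robotparser import RobotFileParser

# Assumption: directive names treated as an AI-training opt-out in this sketch.
NO_AI_DIRECTIVES = {"noai", "noimageai"}


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt content lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def has_noai_tag(robots_tag_value: str) -> bool:
    """Return True if a comma-separated robots tag value contains a NoAI directive."""
    directives = {part.strip().lower() for part in robots_tag_value.split(",")}
    return bool(directives & NO_AI_DIRECTIVES)


def may_use_for_training(robots_txt: str, user_agent: str, url: str,
                         robots_tag_value: str = "") -> bool:
    """Combine both checks: allowed by robots.txt and no NoAI tag present."""
    return robots_allows(robots_txt, user_agent, url) and not has_noai_tag(robots_tag_value)


if __name__ == "__main__":
    # Hypothetical robots.txt that blocks one training crawler and allows everyone else.
    example_robots = (
        "User-agent: ExampleTrainingBot\n"
        "Disallow: /\n"
        "\n"
        "User-agent: *\n"
        "Allow: /\n"
    )
    print(may_use_for_training(example_robots, "ExampleTrainingBot",
                               "https://example.com/photo.jpg"))
```

A check like this only covers opt-outs expressed in a structured form. Whether a natural language reservation, for example in a website's terms of use, also counts is exactly the question the court still has to answer.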
What next?
AI startup founders should closely follow this case and consider how copyright laws impact their training data sourcing practices. A decision of the court can be expected on 27 September 2024. If you want to ensure your training processes are copyright-compliant now, our AI expert Thomas and his team are here to help with tailored AI compliance checks and workshops with your tech team. Contact us today to ensure your training data sourcing practices are compliant under both Swiss and German law.