An AI Christmas Miracle
https://communia-association.org/2023/12/18/an-ai-christmas-miracle/ (December 18, 2023)

With Christmas fast approaching, on December 8 the European Parliament wrapped up one of its biggest presents of the mandate: the AI Act, a landmark piece of legislation with the goal of regulating Artificial Intelligence while encouraging development and innovation. In keeping with the holiday theme, the last weeks of the negotiations included everything from near-breakdowns of the discussions, not too dissimilar to the explosive dynamics of festive family gatherings, to 20+ hour trilogue meetings, akin to last-minute Christmas shopping. But alas, it is done.

One of COMMUNIA's key priorities was the issue of transparency of training data. In April, we issued a policy paper calling on the EU to enact a reasonable and proportionate transparency requirement for developers of generative AI models. We followed that work up with several blog posts and a podcast, outlining ways to make the requirement work in practice without placing a disproportionate burden on ML developers.

From our perspective, the introduction of some form of transparency requirement was essential to uphold the EU's legal framework for ML training, while ensuring that creators can make an informed choice about whether or not to reserve their rights. Going by leaked versions of the final agreement, the co-legislators appear to have come to similar conclusions. The deal introduces two specific obligations on providers of general-purpose AI models that serve this objective: an obligation to implement a copyright compliance policy and an obligation to release a summary of the AI training content.

The copyright compliance obligation

In a leaked version, the obligation to adopt and enforce a copyright compliance policy reads as follows:

[Providers of general-purpose AI models shall] put in place a policy to respect Union copyright law in particular to identify and respect, including through state of the art technologies where applicable, the reservations of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790

Back in November, we suggested that instead of focusing on getting a summary of the copyrighted content used to train the AI model, the EU lawmaker should focus on the copyright compliance policies followed during the scraping and training stages, mandating developers of generative AI systems to release a list of the rights reservation protocols complied with during the data gathering process. We were therefore pleased to see the introduction of such an obligation, with a specific focus on the opt-outs from the general-purpose text and data mining exception.
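
For illustration, such a disclosure could be as simple as a small machine-readable document published alongside the model. The following Python sketch is purely hypothetical: neither the format nor the field names come from the AI Act, and the protocol names are placeholders for whichever opt-out mechanisms a developer actually honored.

import json

# Hypothetical machine-readable disclosure of the rights reservation
# protocols complied with during data gathering (illustrative only).
compliance_disclosure = {
    "model": "example-model-1.0",  # placeholder name
    "data_collection_period": "2022-01 to 2023-06",
    "rights_reservation_protocols_honored": [
        "robots.txt",              # crawler-level exclusions
        "TDMRep (W3C draft)",      # site-level TDM reservations
        "Spawning opt-out lists",  # aggregated per-work opt-outs
    ],
}

print(json.dumps(compliance_disclosure, indent=2))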

Interestingly, the leaked version contains a recital in which the co-legislators declare their intent to apply this obligation to “any provider placing a general-purpose AI model on the EU market (…) regardless of the jurisdiction in which the copyright-relevant acts underpinning the training of these foundation models take place”. While one can understand why the EU lawmakers would want to ensure that all AI models released on the EU market respect these EU product requirements, the fact that these are also copyright compliance obligations, which apply prior to the release of the model on the EU market, raises some legal concerns. It is not clear how the EU lawmakers intend to apply EU copyright law when the scraping and training take place outside the EU's borders, absent an appropriate international legal instrument.

The general transparency obligation

The text goes on to require that developers of general-purpose AI models make publicly available a sufficiently detailed summary about the AI training content:

[Providers of general-purpose AI models shall] draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office

While we have previously criticized the formulation “sufficiently detailed summary” due to the legal uncertainty it could cause, having an independent and accountable entity draw up a template for the summary (as we have argued here) could alleviate some of the vagueness and potential confusion.

We were also pleased to see that the co-legislators listened to our calls to extend this obligation to all training data. As we have said before, introducing a specific requirement only for copyrighted data would, on the one hand, add unnecessary legal complexity, since ML developers would first need to know which of their training materials are copyrightable; on the other hand, knowing more about the data feeding models that can generate content is essential for a variety of purposes, not all of them related to copyright.

We should also highlight that the co-legislators appear to share our understanding of how compliance with the transparency requirement could be achieved when AI developers use publicly available datasets. The leaked version contains a clarifying recital stating that “(t)his summary should be comprehensive in its scope instead of technically detailed, for example by listing the main data collections or sets that went into training the model, such as large private or public databases or data archives, and by providing a narrative explanation about other data sources used”. When the training dataset is not publicly accessible, we maintain that there should be a way to ensure conditional access to the dataset, namely through a data trust, to confirm legal compliance.
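
To make the recital concrete, a summary that is “comprehensive in its scope instead of technically detailed” might look something like the sketch below. The collection names are illustrative examples of well-known public corpora, not a claim about any particular model, and the structure is merely a guess at what an AI Office template could ask for.

# Illustrative training-content summary in the spirit of the recital:
# main collections listed by name, other sources described in prose.
training_summary = {
    "main_data_collections": [
        "Common Crawl (filtered web snapshot)",  # large public web archive
        "Wikipedia (multilingual dump)",
        "A licensed news archive",  # private database, named only at this level
    ],
    "narrative_explanation": (
        "In addition to the collections above, the model was trained on "
        "smaller curated sets of public-domain books and openly licensed "
        "code repositories."
    ),
}

for collection in training_summary["main_data_collections"]:
    print("-", collection)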

Taking these amendments into account, the compromise found by the co-legislators strikes a good balance between what is technically feasible and what is legally necessary.

Merry Christmas!

Statement on Transparency in the AI Act
https://communia-association.org/2023/10/23/statement-on-transparency-in-the-ai-act/ (October 23, 2023)

A fifth round of the trilogue negotiations on the Artificial Intelligence (AI) Act is scheduled for October 24, 2023. Together with Creative Commons and Wikimedia Europe, COMMUNIA has issued a statement calling on the co-legislators to take a holistic approach to AI transparency and agree on proportionate solutions.

As discussed in greater detail in our Policy Paper #15, COMMUNIA deems it essential that the flexibilities for text-and-data mining enshrined in Articles 3 and 4 of the Copyright in the Digital Single Market Directive are upheld. For this approach to work in practice, we welcome practical initiatives that provide greater transparency around AI training data, making it possible to understand whether opt-outs are being respected.

The full statement is provided below:

Statement on Transparency in the AI Act

The undersigned are civil society organizations advocating in the public interest and representing knowledge users and creative communities.

We are encouraged that the Spanish Presidency is considering how to tailor its approach to foundation models more carefully, including an emphasis on transparency. We reiterate that copyright is not the only prism through which reporting and transparency requirements should be seen in the AI Act.

General transparency responsibilities for training data

Greater openness and transparency in the development of AI models can serve the public interest and facilitate better sharing by building trust among creators and users. As such, we generally support more transparency around the training data for regulated AI systems, and not only on training data that is protected by copyright.

Copyright balance

We also believe that the existing copyright flexibilities for the use of copyrighted materials as training data must be upheld. The 2019 Directive on Copyright in the Digital Single Market and specifically its provisions on text-and-data mining exceptions for scientific research purposes and for general purposes provide a suitable framework for AI training. They offer legal certainty and strike the right balance between the rights of rightsholders and the freedoms necessary to stimulate scientific research and further creativity and innovation.

Proportionate approach

We support a proportionate, realistic, and practical approach to meeting the transparency obligation, which would put less onerous burdens on smaller players, including non-commercial players and SMEs, as well as on models developed using FOSS, in order not to stifle innovation in AI development. Too burdensome an obligation on such players may create significant barriers to innovation and drive market concentration, leaving AI development to occur only within a small number of large, well-resourced commercial operators.

Lack of clarity on copyright transparency obligation

We welcome the proposal to require AI developers to disclose the copyright compliance policies followed during the training of regulated AI systems. We remain concerned, however, about the lack of clarity on the scope and content of the obligation to provide a detailed summary of the training data. AI developers should not be expected to literally list every item in the training content. We maintain that such a level of detail is neither practical nor necessary for implementing opt-outs and assessing compliance with the general-purpose text-and-data mining exception. We would welcome further clarification by the co-legislators on this obligation. In addition, an independent and accountable entity, such as the foreseen AI Office, should develop processes to implement it.

Signatories

Defining best practices for opting out of ML training – time to act
https://communia-association.org/2023/09/29/defining-best-practices-for-opting-out-of-ml-training-time-to-act/ (September 29, 2023)

In April of this year we published our Policy Paper #15 on using copyrighted works for teaching the machine, which deals with the copyright policy implications of using copyrighted works for machine learning (ML) training. The paper highlights that the current European copyright framework provides a well-balanced regime for such uses in the form of the text and data mining exceptions in Articles 3 & 4 of the Copyright Directive.

In its new policy brief on defining best practices for opting out of ML training, published today, our member Open Future takes a closer look at a key element of the text and data mining exceptions: the rights reservation mechanism foreseen in Article 4(3) of the Directive, which allows authors and other rights holders to opt out of having their works used to train (generative) ML models.

The policy brief highlights that many open questions remain about the implementation of the opt-out mechanism: that is, how the machine-readable reservation of rights under Article 4 will work in practice. One of the key issues in this context is that there are currently no generally accepted standards or protocols for the machine-readable expression of such a reservation. The authors of the policy brief provide an overview of existing initiatives to provide standardized opt-outs, which include initiatives by Adobe and a publisher-led W3C community working group, as well as the artist-led project Spawning, which provides an API that aggregates various opt-out systems. In addition, they highlight a number of proprietary initiatives from model developers, including Google and OpenAI.
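
To illustrate where an aggregated opt-out service like Spawning's fits into a data pipeline, here is a minimal Python sketch. The endpoint URL and response format are hypothetical stand-ins rather than Spawning's actual API; the point is the pattern of checking every candidate URL against an opt-out registry before it enters a training set.

import requests  # third-party: pip install requests

OPTOUT_API = "https://optout-registry.example/check"  # placeholder endpoint

def is_opted_out(url: str) -> bool:
    """Ask the (hypothetical) registry whether the work at `url` is opted out."""
    response = requests.get(OPTOUT_API, params={"url": url}, timeout=10)
    response.raise_for_status()
    return response.json().get("opted_out", False)  # assumed response shape

candidates = ["https://example.org/image1.jpg", "https://example.org/text.html"]
training_set = [u for u in candidates if not is_opted_out(u)]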

Lack of a technical standard

According to the policy brief, one of the key problems facing creators and other rights holders who wish to opt out of ML training is that it is unclear whether and how their intention to opt out will be respected by ML model developers. According to the authors, this is deeply problematic and risks undermining the legal framework put in place by the 2019 Copyright Directive:

Continued lack of clarity on how to make use of the opt-out from Article 4 of the CDSM Directive creates the risk that the balanced regulatory approach adopted by the EU in 2019 might fail in practice, which would likely lead to a reopening of substantive copyright legislation during the next mandate. Given the length of EU legislative processes, this would prolong the status quo and, as a result, fail to provide protection for creators and other rightholders in the immediate future.

It seems clear that this scenario should be avoided, both in the interest of creators, who need to have meaningful tools to enforce their rights vis-à-vis commercial ML companies, and in order to preserve the hard-won compromise reflected in the TDM exceptions.

In line with this, the Open Future policy brief calls on the European Commission to “provide guidance on how to express machine-readable rights reservations”. According to the brief, the Commission needs to step in and “publicly identify data sources, protocols and standards that allow authors and rightholders to express a machine-readable rights reservation in accordance with Article 4(3) CDSM”. This guidance would provide important clarity about the availability of freely usable methods of reservation and certainty as to their functionality.
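
As one concrete example of what such guidance could point to, the draft TDM Reservation Protocol (TDMRep) developed in the W3C community group mentioned above lets a website express the Article 4(3) reservation through an HTTP header or a well-known JSON file. The Python sketch below reflects our reading of that draft and should be checked against the current specification before being relied on.

from urllib.parse import urlsplit
import requests  # third-party: pip install requests

def tdm_reserved(url: str) -> bool:
    """Best-effort check for a TDMRep-style rights reservation (sketch only)."""
    # 1. Look for a tdm-reservation header on the resource itself.
    head = requests.head(url, timeout=10, allow_redirects=True)
    if head.headers.get("tdm-reservation") == "1":
        return True
    # 2. Fall back to the site-wide well-known file.
    origin = "{0.scheme}://{0.netloc}".format(urlsplit(url))
    resp = requests.get(origin + "/.well-known/tdmrep.json", timeout=10)
    if resp.ok:
        # Simplified: the draft scopes each rule to a "location" path pattern,
        # which this sketch ignores.
        return any(rule.get("tdm-reservation") == 1 for rule in resp.json())
    return False

print(tdm_reserved("https://example.org/article.html"))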

According to the authors, such an intervention would allow the Commission to support creators and other rightholders seeking means to opt out of ML training, while at the same time providing “more certainty to ML developers seeking to understand what constitutes best efforts to comply with their obligations under Article 4(3) of the CDSM Directive”.

Time to act

The Open Future policy brief identifies an important shortcoming in the existing EU approach to the use of copyrighted works for ML training. Without clear guidelines for standardized machine-readable rights reservations, the opt-out mechanism foreseen in Article 4 is unlikely to work in practice. While a number of standards already exist, the fragmentation of this landscape creates tremendous uncertainty for creators.

As Open Future points out, it is up to the Commission, which is responsible for ensuring the proper implementation of the Directive's provisions, to intervene in this area and provide initial clarity to all stakeholders. In the longer term, it would be ideal to see the emergence of an open standard that is maintained independently of any direct stakeholders. Such an effort should not be limited to aggregating opt-outs; it should also be designed to ensure that works in the public domain, or works made available under licenses that allow and/or encourage reuse, are clearly identified as such (in line with our Policy Recommendation #20).
