COMMUNIA Association - copyright https://communia-association.org/tag/copyright/ Website of the COMMUNIA Association for the Public Domain

An AI Christmas Miracle https://communia-association.org/2023/12/18/an-ai-christmas-miracle/ Mon, 18 Dec 2023 08:30:41 +0000

The post An AI Christmas Miracle appeared first on COMMUNIA Association.

With Christmas fast approaching, on December 8, the European Parliament wrapped up one of its biggest presents of the mandate: the AI Act, a landmark piece of legislation that aims to regulate Artificial Intelligence while encouraging development and innovation. In keeping with the holiday theme, the last weeks of the negotiations included everything from near-breakdowns of the discussions, not too dissimilar to the explosive dynamics of festive family gatherings, to 20+ hour trilogue meetings, akin to last-minute Christmas shopping. But at last, it is done.

One of the key priorities for COMMUNIA was the issue of transparency of training data. In April, we issued a policy paper calling on the EU to enact a reasonable and proportionate transparency requirement for developers of generative AI models. We followed this up with several blog posts and a podcast, outlining ways to make the requirement work in practice without placing a disproportionate burden on ML developers.

From our perspective, the introduction of some form of transparency requirement was essential to uphold the legal framework that the EU has for ML training, while ensuring that creators can make an informed choice about whether to reserve their rights or not. Going by leaked versions of the final agreement, it appears that the co-legislators have come to similar conclusions. The deal introduces two specific obligations on providers of general-purpose AI models, which serve that objective: an obligation to implement a copyright compliance policy and an obligation to release a summary of the AI training content.

The copyright compliance obligation

In a leaked version, the obligation to adopt and enforce a copyright compliance policy reads as follows:

[Providers of general-purpose AI models shall] put in place a policy to respect Union copyright law in particular to identify and respect, including through state of the art technologies where applicable, the reservations of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790

Back in November, we suggested that instead of focussing on getting a summary of the copyrighted content used to train the AI model, the EU lawmaker should focus on the copyright compliance policies followed during the scraping and training stages, mandating developers of generative AI systems to release a list of the rights reservation protocols complied with during the data gathering process. We were therefore pleased to see the introduction of such an obligation, with a specific focus on the opt-outs from the general purpose text and data mining exception.

Interestingly, the leaked version contains a recital in which the co-legislators declare their intent to apply this obligation to “any provider placing a general-purpose AI model on the EU market (…) regardless of the jurisdiction in which the copyright-relevant acts underpinning the training of these foundation models take place”. While one can understand why the EU lawmakers would want to ensure that all AI models released on the EU market respect these EU product requirements, the fact that these are also copyright compliance obligations, which apply prior to the release of the model on the EU market, raises some legal concerns. It is not clear how the EU lawmakers intend to apply EU copyright law when the scraping and training take place outside the EU's borders, in the absence of an appropriate international legal instrument.

The general transparency obligation

The text goes on to require that developers of general-purpose AI models make publicly available a sufficiently detailed summary about the AI training content:

[Providers of general-purpose AI models shall] draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office

While we have previously criticized the formulation “sufficiently detailed summary” due to the legal uncertainty it could cause, having an independent and accountable entity draw up a template for the summary (as we have argued here) could alleviate some of the vagueness and potential confusion.

We were also pleased to see that the co-legislators listened to our calls to extend this obligation to all training data. As we have said before, introducing a specific requirement only for copyrighted data would add unnecessary legal complexity, since ML developers would first need to know which of their training materials are copyrightable. Moreover, knowing more about the data that feeds content-generating models is essential for a variety of purposes, not all of them related to copyright.

We should also highlight that the co-legislators appear to share our understanding of how compliance with the transparency requirement could be achieved when AI developers use publicly available datasets. The leaked version contains a clarifying recital stating that “(t)his summary should be comprehensive in its scope instead of technically detailed, for example by listing the main data collections or sets that went into training the model, such as large private or public databases or data archives, and by providing a narrative explanation about other data sources used”. When the training dataset is not publicly accessible, we maintain that there should be a way to ensure conditional access to the dataset, namely through a data trust, to confirm legal compliance.

Taking these amendments into account, the compromise found by the co-legislators manages to strike a good balance between what is technically feasible and what is legally necessary.

Merry Christmas!

A Digital Knowledge Act for Europe https://communia-association.org/2023/12/12/a-digital-knowledge-act-for-europe/ Tue, 12 Dec 2023 08:00:49 +0000

The post A Digital Knowledge Act for Europe appeared first on COMMUNIA Association.

As we’re approaching the European election season, COMMUNIA is rolling out its demands for the ‘24-’29 legislature. In an op-ed published on Euractiv, we ask the next Commission and Parliament to finally put the needs of Europe’s knowledge institutions, such as libraries, universities and schools front and center.

Over the next five years, we need to remove the barriers that prevent knowledge institutions from fulfilling their public mission in the digital environment. Specifically, we need a targeted legislative intervention – a Digital Knowledge Act – that enables knowledge institutions to offer the same services online as offline.

Such a regulation would require a few surgical interventions in copyright law, such as the introduction of a unified research exception (see our Policy Recommendation #9) and an EU-wide e-lending right (see our Policy Recommendation #10). However, it would mostly involve measures that fall outside of the scope of recent copyright reform discussions.

Above all, we’re envisioning a number of safeguards that would protect knowledge institutions against the abuse of property rights. Due to the complex and fragmented state of European copyright law, many institutions shy away from fully exercising their usage rights. We believe that an exemption from liability for those who act in good faith and believe that their activities are legal would mitigate this chilling effect (see our Policy Recommendation #17).

Another limiting factor for knowledge institutions in the digital realm is unfair licensing conditions. We believe that rightsholders should be obliged to license works to libraries as well as educational and research institutions under reasonable conditions.

Finally, knowledge institutions should be allowed to circumvent technological protection measures where locks prevent legitimate access and use of works, such as uses covered by limitations and exceptions (see our Policy Recommendation #13).

These demands are far from new and even the idea of a Digital Knowledge Act has been floating around in Brussels policy circles for a long time. Now it is up to the incoming legislators to show that they have the political will to tackle these problems in a comprehensive manner to unlock the full potential of Europe’s knowledge institutions.

Open letter on geo-blocking: Denying people access to culture benefits no-one https://communia-association.org/2023/12/11/geo-blocking-open-letter/ Mon, 11 Dec 2023 14:09:04 +0000

The post Open letter on geo-blocking: Denying people access to culture benefits no-one appeared first on COMMUNIA Association.

Today, we are publishing an open letter from civil society organizations to members of the European Parliament ahead of the plenary vote on the IMCO own-initiative report on the implementation of the 2018 Geo-blocking Regulation scheduled for Tuesday, December 12, 2023. The letter refutes a number of grossly exaggerated claims made by rightsholders in an attempt to undermine the report. For more background on the discussion on the review of the geo-blocking regulation, see our previous post.

Open letter to the European Parliament

Dear Members of the European Parliament,

This week, on December 12, the European Parliament is scheduled to vote on an own-initiative report on the implementation of the 2018 Geo-blocking Regulation, including for audio-visual content. The IMCO Committee adopted the report on October 25, 2023, with opinions from CULT and JURI and with the support of a broad, cross-party majority. We urge you to follow the committee’s vote, adopt the report and pave the way for a revision of the Geo-blocking Regulation during the Parliament’s ‘24-’29 term.

With regard to audiovisual content, the report “highlights potential benefits for consumers, notably in the availability of a wider choice of content across borders” (p. 4). It also asks for a report of the Commission’s stakeholder dialogue on the subject to be made public and presented to the Parliament.

Despite the report’s balanced nature, it has come under attack by rightsholders from the audio-visual industries. Over the course of the past weeks, the Creativity Works! coalition and others have engaged in a massive campaign against the report, advancing a number of false or overblown claims to undermine it, which can be easily debunked:

Misleading claim 1: Ending Geo-blocking of audio-visual content would harm “15 million creative sector jobs” and “jeopardise a €640 billion industry.”

There is no independent study that proves this statement. Contrary to what part of the copyright industry claims, the IMCO report does not challenge territorial licensing. In fact, it reaffirms the need to preserve it. What IMCO suggests – and we support – is that consumers and citizens should not be denied access to Europe’s rich cultural diversity. Territorial protectionism does not benefit anyone but incumbent industries profiteering from the unjustified partition of the Single Market.

Misleading claim 2: The IMCO report threatens territorial licensing and calls for EU-wide licensing for audiovisual services which would be prohibitively expensive for smaller players and limit cultural diversity in Europe.

This statement is factually incorrect as the IMCO report at no point makes any reference to prohibiting or discouraging territorial licensing. On the contrary, the need to safeguard territorial licensing is mentioned repeatedly throughout the report. Further, there are no demands to instate a system of EU-wide licences. Any predictions for the future of the European audio-visual sector based on these claims are severely misguided and paint a deceiving picture of the IMCO report.

While the campaign by Creativity Works! paints a dire picture of what would happen to the audiovisual sector should the legislator follow the report, there is little substance to these claims. Denying people access to culture, by contrast, is not in your, or anyone’s, interest. We encourage you to vote with confidence in favour of the report.

Signed,

COMMUNIA Association for the Public Domain

Creative Commons

Federal Union of European Nationalities (FUEN)

Vrijschrift

Wikimedia Deutschland

Xnet, Institute for Democratic Digitalisation

The transparency provision in the AI Act: What needs to happen after the 4th trilogue? https://communia-association.org/2023/11/07/the-transparency-provision-in-the-ai-act-what-needs-to-happen-after-the-4th-trilogue/ Tue, 07 Nov 2023 09:34:20 +0000

The post The transparency provision in the AI Act: What needs to happen after the 4th trilogue? appeared first on COMMUNIA Association.

Before the trilogue, COMMUNIA issued a statement, calling for a comprehensive approach on the transparency of training data in the Artificial Intelligence (AI) Act. COMMUNIA and the co-signatories of that statement support more transparency around AI training data, going beyond data that is protected by copyright. It is still unclear whether the co-legislators will be able to pass the regulation before the end of the current term. If they do, proportionate transparency obligations are key to realising the balanced approach enshrined in the text and data mining (TDM) exception of the Copyright Directive.

How can transparency work in practice?

As discussed in our Policy Paper #15, transparency is key to ensuring a fair balance between the interests of creators on the one hand and those of commercial AI developers on the other. A transparency obligation would empower creators, allowing them to assess whether the copyrighted materials used as AI training data have been scraped from lawful sources, as well as whether their decision to opt-out from AI training has been respected. At the same time, such an obligation needs to be fit-for-purpose, proportionate and workable for different kinds of AI developers, including smaller players.

While the European Parliament’s text has taken an important step towards improving transparency, it has been criticised for falling short in two key respects. First, the proposed text focuses exclusively on training data protected under copyright law, which arbitrarily limits the scope of the obligation in a way that may not be technically feasible. Second, the Parliament’s text remains very vague, calling only for a “sufficiently detailed summary” of the training data, which could lead to legal uncertainty for all actors involved, given how opaque the copyright ecosystem itself is.

As such, we are encouraged to see the recent work of the Spanish presidency on the topic of transparency, improving upon the Parliament’s proposed text. The presidency recognises that there is a need for targeted provisions that facilitate the enforcement of copyright rules in the context of foundation models and proposes that providers of foundation models should demonstrate that they have taken adequate measures to ensure compliance with the opt-out mechanism under the Copyright Directive. The Spanish presidency has also proposed that providers of foundation models should make information about their policies to manage copyright-related aspects public.

This proposal marks an important step in the right direction by expanding the scope of transparency beyond copyrighted material. Furthermore, requiring providers to share information about their policies for managing copyright-related aspects would clarify which opt-out methods are being honoured, giving creators certainty that their choices to protect works from TDM are being respected.

In search of a middle ground

Unfortunately, while the Spanish presidency has addressed one of our key concerns by removing the limitation to copyrighted material, ambiguity remains. Calling for a sufficiently detailed summary about the content of training data leaves a lot of room for interpretation and may lead to significant legal uncertainty going forward. Having said that, strict and rigid transparency requirements which force developers to list every individual entry inside of a training dataset would not be a workable solution either, due to the unfathomable quantity of data used for training. Furthermore, such a level of detail would provide no additional benefits when it comes to assessing compliance with the opt-out mechanism and the lawful access requirement. So what options do we have left?

First and foremost, the reference to “sufficiently detailed summary” must be replaced with a more concrete requirement. Instead of focussing on the content of training data sets, this obligation should focus on the copyright compliance policies followed during the scraping and training stages. Developers of generative AI systems should be required to provide a detailed explanation of their compliance policy including a list of websites and other sources from which the training data has been reproduced and extracted, and a list of the machine-readable rights reservation protocols/techniques that they have complied with during the data gathering process. In addition, the AI Act should allocate the responsibility to further develop transparency requirements to the to-be-established Artificial Intelligence Board (Council) or Artificial Intelligence Office (Parliament). This new agency, which will be set up as part of the AI Act, must serve as an independent and accountable actor, ensuring consistent implementation of the legislation and providing guidance for its application. On the subject of transparency requirements, an independent AI Board/Office would be able to lay down best-practices for AI developers and define the granularity of information that needs to be provided to meet the transparency requirements set out in the Act.
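To make the disclosure idea above more concrete, such a compliance policy could be published as a simple structured manifest. The sketch below is purely illustrative: the field names, source URLs and protocol identifiers are our own assumptions, not part of any template proposed by the co-legislators or the AI Office.

```python
import json

# Purely illustrative sketch of the kind of machine-readable compliance
# disclosure discussed above. All field names, URLs and protocol
# identifiers are hypothetical assumptions, not an agreed legislative
# template.
compliance_policy = {
    "provider": "example-model-developer",
    "data_sources": [
        # websites and other sources the training data was reproduced from
        "https://example.org/corpus-a",
        "https://example.net/web-crawl-2023",
    ],
    "rights_reservation_protocols_respected": [
        # machine-readable opt-out protocols honoured during data gathering
        "robots.txt (Robots Exclusion Protocol)",
        "TDM Reservation Protocol (TDMRep)",
    ],
}

# Publishing the disclosure as JSON keeps it machine-readable and auditable.
summary = json.dumps(compliance_policy, indent=2)
print(summary)
```

A disclosure at this level of granularity would let creators check which opt-out mechanisms a developer claims to honour without requiring an item-by-item listing of the training content.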

We understand that the deadline to find an agreement on the AI Act ahead of the next parliamentary term is very tight. However, this should not be an excuse for the co-legislators to rush the process by taking shortcuts through ambiguous language purely to find swift compromises, creating significant legal uncertainty in the long run. In order to achieve its goal to protect Europeans from harmful and dangerous applications of AI while still allowing for development and encouraging innovation in the sector, and to potentially serve as model legislation for the rest of the world, the AI Act must be robust and legally sound. Everything else would be a wasted opportunity.

Statement on Transparency in the AI Act https://communia-association.org/2023/10/23/statement-on-transparency-in-the-ai-act/ Mon, 23 Oct 2023 18:12:49 +0000

The post Statement on Transparency in the AI Act appeared first on COMMUNIA Association.

A fifth round of the trilogue negotiations on the Artificial Intelligence (AI) Act is scheduled for October 24, 2023. In a statement issued together with Creative Commons and Wikimedia Europe, COMMUNIA calls on the co-legislators to take a holistic approach to AI transparency and agree on proportionate solutions.

As discussed in greater detail in our Policy Paper #15, COMMUNIA deems it essential that the flexibilities for text-and-data mining enshrined in Articles 3 and 4 of the Copyright in the Digital Single Market Directive are upheld. For this approach to work in practice, we welcome practical initiatives for greater transparency around AI training data to understand whether opt-outs are being respected.

The full statement is provided below:

Statement on Transparency in the AI Act

The undersigned are civil society organizations advocating in the public interest and representing knowledge users and creative communities.

We are encouraged that the Spanish Presidency is considering how to tailor its approach to foundation models more carefully, including an emphasis on transparency. We reiterate that copyright is not the only prism through which reporting and transparency requirements should be seen in the AI Act.

General transparency responsibilities for training data

Greater openness and transparency in the development of AI models can serve the public interest and facilitate better sharing by building trust among creators and users. As such, we generally support more transparency around the training data for regulated AI systems, and not only on training data that is protected by copyright.

Copyright balance

We also believe that the existing copyright flexibilities for the use of copyrighted materials as training data must be upheld. The 2019 Directive on Copyright in the Digital Single Market and specifically its provisions on text-and-data mining exceptions for scientific research purposes and for general purposes provide a suitable framework for AI training. They offer legal certainty and strike the right balance between the rights of rightsholders and the freedoms necessary to stimulate scientific research and further creativity and innovation.

Proportionate approach

We support a proportionate, realistic, and practical approach to meeting the transparency obligation, which would put less onerous burdens on smaller players including non-commercial players and SMEs, as well as models developed using FOSS, in order not to stifle innovation in AI development. Too burdensome an obligation on such players may create significant barriers to innovation and drive market concentration, leading the development of AI to only occur within a small number of large, well-resourced commercial operators.

Lack of clarity on copyright transparency obligation

We welcome the proposal to require AI developers to disclose the copyright compliance policies followed during the training of regulated AI systems. However, we remain concerned by the lack of clarity on the scope and content of the obligation to provide a detailed summary of the training data. AI developers should not be expected to literally list every item in the training content. We maintain that such a level of detail is neither practical nor necessary for implementing opt-outs and assessing compliance with the general purpose text-and-data mining exception. We would welcome further clarification by the co-legislators on this obligation. In addition, an independent and accountable entity, such as the foreseen AI Office, should develop processes to implement it.

Signatories

We need to talk about AI and transparency! https://communia-association.org/2023/10/06/we-need-to-talk-about-ai-and-transparency/ Fri, 06 Oct 2023 13:26:06 +0000

The post We need to talk about AI and transparency! appeared first on COMMUNIA Association.

That’s what we thought when we agreed to join the latest episode of the AI lab. podcast, fittingly titled “AI & the quest for transparency”.

Over the course of 20 minutes, Teresa walks the listeners through some of the key questions when it comes to the training of generative AI models and how it affects the text and data mining (TDM) exception laid down in Articles 3 and 4 of the EU Copyright Directive (see also our Policy Paper #15).
YouTube Video

While the Directive clearly establishes the right to mine online content and use it to train machine learning algorithms, this right hinges on the possibility for rightsholders to opt their works out when the activity takes place in a commercial context. A key issue we are currently seeing is a lack of transparency around the training of generative AI models, which makes it impossible to tell whether such opt-outs are being respected.

In the discussion, Teresa highlighted the need for more transparency regarding opt-outs but also across the copyright ecosystem as a whole. COMMUNIA has long advocated for the creation of a database of copyrighted works, which would contribute to managing the system of opt-outs.

The discussion ended with a call on the European Commission to provide guidance and lead technical discussions towards establishing a clear, reliable and transparent framework for TDM opt-outs (see also Open Future’s recent policy brief on this issue). Only through dialogue between the concerned stakeholders, led by an independent third party, will we be able to establish best practices that uphold Articles 3 and 4 of the Copyright Directive while providing a fair and balanced framework for the training of machine learning models.

Defining best practices for opting out of ML training – time to act https://communia-association.org/2023/09/29/defining-best-practices-for-opting-out-of-ml-training-time-to-act/ Fri, 29 Sep 2023 11:50:04 +0000

The post Defining best practices for opting out of ML training – time to act appeared first on COMMUNIA Association.

In April of this year we published our Policy Paper #15 on using copyrighted works for teaching the machine, which deals with the copyright policy implications of using copyrighted works for machine learning (ML) training. The paper highlights that the current European copyright framework strikes a good balance for such uses through the text and data mining exceptions in Articles 3 and 4 of the Copyright Directive.

In its new policy brief on defining best practices for opting out of ML training, published today, our member Open Future takes a closer look at a key element of the text and data mining exception: the rights reservation mechanism foreseen in Article 4(3) of the Directive, which allows authors and other rightsholders to opt out of their works being used to train (generative) ML models.

The policy brief highlights that there are still many open questions relating to the implementation of the opt out mechanism. That is, how the machine-readable reservation of rights under Article 4 will work in practice. One of the key issues in this context is the fact that currently there are no generally accepted standards or protocols for the machine-readable expression of the reservation. The authors of the policy brief provide an overview of existing initiatives to provide standardized opt-outs which include initiatives by Adobe and a publisher-led W3C community working group, as well as the artist-led project Spawning, which provides an API that aggregates various opt-out systems. In addition, they highlight a number of proprietary initiatives from model developers, including Google and OpenAI.
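As a concrete illustration of how one of these protocols might be honoured in practice: the draft TDM Reservation Protocol developed in the W3C community group mentioned above signals a rights reservation via an HTTP response header. The minimal sketch below assumes the header semantics described in that draft ("1" meaning rights are reserved; "0" or an absent header meaning no reservation has been expressed); a production crawler would also need to consult the site's /.well-known/tdmrep.json file and robots.txt.

```python
def tdm_allowed(headers: dict) -> bool:
    """Return True if text-and-data mining appears permitted.

    Minimal sketch based on the draft TDM Reservation Protocol.
    Assumed semantics: a 'tdm-reservation' header of "1" means rights
    are reserved; "0" or an absent header means no reservation has
    been expressed. A real crawler would also check the site's
    /.well-known/tdmrep.json file and robots.txt.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    return normalised.get("tdm-reservation", "0") != "1"

# Illustrative responses from hypothetical servers:
print(tdm_allowed({"Content-Type": "text/html"}))  # no reservation expressed
print(tdm_allowed({"TDM-Reservation": "1"}))       # rights reserved, skip this page
```

The fragmentation problem the brief describes is visible even in this toy example: a crawler that only checks one signal can silently miss an opt-out expressed through another protocol.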

Lack of a technical standard

According to the policy brief, one of the key problems facing creators and other rights holders who wish to opt out of ML training is that it is unclear whether and how their intentions to opt-out will be respected by ML model developers. According to the authors of the policy brief, this is deeply problematic and risks undermining the legal framework put in place by the 2019 Copyright Directive:

Continued lack of clarity on how to make use of the opt-out from Article 4 of the CDSM Directive creates the risk that the balanced regulatory approach adopted by the EU in 2019 might fail in practice, which would likely lead to a reopening of substantive copyright legislation during the next mandate. Given the length of EU legislative processes, this would prolong the status quo and, as a result, fail to provide protection for creators and other rightholders in the immediate future.

It seems clear that this scenario should be avoided, both in the interest of creators, who need to have meaningful tools to enforce their rights vis-à-vis commercial ML companies, and in order to preserve the hard-won compromise reflected in the TDM exceptions.

In line with this, the Open Future policy brief calls on the European Commission to “provide guidance on how to express machine-readable rights reservations”. According to the brief, the Commission needs to step in and “publicly identify data sources, protocols and standards that allow authors and rightholders to express a machine-readable rights reservation in accordance with Article 4(3) CDSM”. This guidance would provide important clarity about the availability of freely usable methods of reservation and certainty as to their functionality.

According to the authors, such an intervention would allow the Commission to support creators and other rightholders seeking means to opt out of ML training, while at the same time providing “more certainty to ML developers seeking to understand what constitutes best efforts to comply with their obligations under Article 4(3) of the CDSM Directive”.

Time to act

The Open Future policy brief identifies an important shortcoming in the existing EU approach to the use of copyrighted works for ML training. Without clear guidelines for standardized machine-readable rights reservations, the opt-out mechanism foreseen in Article 4 is unlikely to work in practice. While a number of standards already exist, the fragmentation of this landscape causes tremendous uncertainty for creators.

As Open Future points out, it is up to the Commission, which is responsible for ensuring the proper implementation of the Directive’s provisions, to intervene in this area and provide initial clarity to all stakeholders. In the longer term, it would be ideal to see the emergence of an open standard that is maintained independently of any direct stakeholders. Such an effort should not be limited to aggregating opt-outs, but should also be designed to ensure that works in the public domain or made available under licenses that allow and/or encourage reuse are clearly identified as such (in line with our Policy Recommendation #20).

The post Defining best practices for opting out of ML training – time to act appeared first on COMMUNIA Association.

]]>
Do 90s rappers dream of electric pastiche? https://communia-association.org/2023/09/20/do-90s-rappers-dream-of-electric-pastiche/ Wed, 20 Sep 2023 10:28:43 +0000 https://communia-association.org/?p=6349

The post Do 90s rappers dream of electric pastiche? appeared first on COMMUNIA Association.

]]>
Last week Germany’s highest court, the Bundesgerichtshof (BGH), referred questions related to the Metall auf Metall case to the European Court of Justice for the second time in less than a decade. This time the BGH is asking the CJEU to explain the concept of pastiche so that it can determine whether the use of a two-second sample of Kraftwerk’s 1977 song Metall auf Metall in Sabrina Setlur’s 1997 song Nur Mir qualifies as such.

Last week’s referral is the newest development in the legal saga that started in 1999, when Kraftwerk sued Setlur’s producer Moses Pelham over the unauthorized use of the sample, and that has already seen Germany’s highest court deal with the matter five times. In response to the previous referral, the CJEU had established that the use of the sample was legal under Germany’s pre-2002 copyright rules but infringing under the post-2002 rules (which implemented the 2001 Copyright in the Information Society Directive). This conclusion was largely based on the finding that, following the adoption of the 2001 Copyright in the Information Society (InfoSoc) Directive, the concept of free use (“Freie Benutzung”) in German copyright law was contrary to EU law.

The new referral arises from the fact that, as part of its 2021 copyright revision and in order to bring German copyright law into compliance with the EU directives, Germany removed the free use provision and at the same time introduced a new exception for the purposes of caricature, parody and pastiche (§ 51a UrhG). The Hamburg Court of Appeals, to which the BGH had returned the case for a final determination, subsequently ruled that after the introduction of the new exception in 2021 the use of the sample was in fact legal again, as it constituted a use for the purpose of pastiche.

This decision has since been appealed by Kraftwerk, which is how the case came back to the BGH for another round. In the context of this appeal, the BGH has now again asked the CJEU for guidance, this time on the meaning of the term pastiche in Article 5(3)(k) of the 2001 InfoSoc Directive, from which the German exception is derived. This means that this time around the CJEU’s ruling will have much wider implications than for German copyright law alone: it is very likely to determine the EU legal regime for sampling.

The BGH’s referral contains two separate questions, which are described in the court’s press release (the text of the actual decision containing the questions has yet to be published by the BGH). According to the press release (translation ours)…

… the question first arises as to whether the restriction on use for the purpose of pastiche within the meaning of Article 5(3)(k) of Directive 2001/29/EC is a catch-all provision at least for an artistic treatment of a pre-existing work or other subject matter, including sampling, and whether restrictive criteria such as the requirement of humour, imitation of style or homage apply to the concept of pastiche.

The idea that uses for the purpose of pastiche serve as a sort of exception of last resort to safeguard artistic freedom is a welcome one, as it would protect the freedom to create at the EU level, as we recommend in our Policy Recommendation #7. Considering that the pastiche exception is already mandatory in the EU, a positive answer by the CJEU to the first part of that question would ensure harmonized protection of freedom of artistic expression across the EU.

The CJEU has been suggesting for a while now that the principles enshrined in the EU Charter of Fundamental Rights are already fully internalized by EU copyright law, namely through the existing list of EU exceptions. As we have noted in our Policy Paper #14 on fundamental rights as a limit to copyright during emergencies, that is not necessarily the case, as the existing exceptions do not appear to have exhausted all the fundamental rights considerations that are imposed by the Charter, and on the other hand not all of those balancing mechanisms have yet found full expression in the national laws of the EU Member States.

With this referral, however, the court will have the opportunity to analyze whether the EU copyright law is sufficiently taking into account artistic freedom considerations. In our view, an interpretation of the pastiche exception in light of that fundamental freedom should lead the Court to provide a broad scope that covers all forms of artistic treatment protected by the Charter.

In the press release, the BGH expresses a very similar concern, noting the inherent conflict between the rigid EU copyright system and the freedom of (artistic) expression:

The pastiche exception could be understood as a general exception for artistic freedom, which is necessary because the necessary scope of artistic freedom cannot be safeguarded in all cases by the immanent limitation of the scope of protection of exploitation rights to uses of works and performances in a recognisable form and the other exceptions such as, in particular, parody, caricature and quotation.

This understanding of the pastiche exception would also align with the intent of the German legislator when introducing it in 2021. In his 2022 study on the pastiche exception, conducted for the Gesellschaft für Freiheitsrechte, Till Kreutzer notes that

The German legislator has deliberately phrased the pastiche term in an open manner. It is clearly stated in the legislative materials that sec. 51a UrhG is intended to have a broad and dynamic scope of application. The pastiche exception serves to legitimize common cultural and communication practices on the internet, especially user-generated content and communication in social networks. It is supposed to be applied to remixes, memes, GIFs, mashups, fan art, fan fiction and sampling, among others.

In the context of this study Kreutzer proposes the following “copyright-specific definition” of pastiche and concludes that the concept covers the practice of sampling:

A pastiche is a distinct cultural and/or communicative artifact that borrows from and recognizably adopts the individual creative elements of published third-party works.

It will be interesting to see how the CJEU will approach the same task. In this context the second question formulated by the BGH is slightly more troubling. Here the BGH wants to know …

… whether the use “for the purpose” of a pastiche within the meaning of Article 5(3)(k) of Directive 2001/29/EC requires a finding of an intention on the part of the user to use an object of copyright protection for the purpose of a pastiche or whether the recognisability of its character as a pastiche is sufficient for someone who is aware of the copyright object referred to and who has the intellectual understanding required to perceive the pastiche.

Taking into account the facts of the Metall auf Metall case, this question does not make much sense. In 1997, when Nur Mir was recorded, the concept of pastiche did not exist in German copyright law (and neither did the InfoSoc Directive, which introduced the concept at the EU level). This makes it pretty much impossible for the record producers to have had the intention to use the snippet from Metall auf Metall for the purpose of pastiche, a purpose that, according to the BGH itself, still needs to be defined by the CJEU.

For reasons of legal certainty alone, the CJEU should reject the intention requirement and base any definition on the characteristics of the use alone, as suggested in the above-quoted definition developed by Kreutzer.

In any case, the new BGH referral is a very welcome development in the Metall auf Metall saga. It provides the CJEU with a much-needed opportunity to clarify this important concept, which played a major role in the recent discussions about Article 17 of the CDSM Directive. In order to secure a majority for the directive, the EU legislator made the pastiche exception mandatory in an effort to safeguard transformative uses of copyrighted works on user-generated content platforms.

It would only be fitting if the final legacy of Kraftwerk’s narrow-minded attempt to weaponize copyright against the creative expression of a subsequent generation of artists were, almost three decades later, a broad conceptualisation of pastiche that safeguards artistic expression across the EU.


]]>
The AI Act and the quest for transparency https://communia-association.org/2023/06/28/the-ai-act-and-the-quest-for-transparency/ Wed, 28 Jun 2023 07:00:33 +0000 https://communia-association.org/?p=6325

The post The AI Act and the quest for transparency appeared first on COMMUNIA Association.

]]>
Artificial intelligence (AI) has taken the world by storm, and people’s feelings towards the technology range from fascination with its capabilities to grave concerns about its implications. Meanwhile, legislators across the globe are trying to wrap their heads around how to regulate AI. The EU has proposed the so-called AI Act, which aims to protect European citizens from potentially harmful applications of AI while still encouraging innovation in the sector. The file, originally proposed by the European Commission in April 2021, has just entered trilogues and will be hotly debated over the coming months by the European Parliament and the Council.

One of the key issues for the discussions will most likely be how to deal with the rather recent phenomenon of generative AI systems (also referred to as foundation models), which are capable of producing content ranging from complex text to images, sound, computer code and much more with very limited human input.

The rise of generative AI

Within less than a year, generative AI technology went from having a select few, rather niche applications to becoming a global phenomenon. Perhaps no application represents this development like ChatGPT. Originally released in November 2022, ChatGPT broke all records by reaching one million users within just five days of its release with the closest competitors for this title, namely Instagram, Spotify, Dropbox and Facebook, taking several months to reach the same stage. Fast forward to today, approximately half a year later, and ChatGPT reportedly counts more than 100 million users.

One of the reasons for this “boom” of generative AI systems is that they are more than just a novelty. Some systems have established themselves as considerable competitors to human creators for certain types of creative expression, being able to write background music or produce stock images that would take humans many more hours to create. In fact, the quality of the output of some systems is already so high, and the cost of production so low, that they pose an existential risk to specific categories of creators, as well as the industries behind them.

But how do generative AI systems achieve this and what is the secret behind their ability to produce works that can comfortably compete with works of human creativity? Providing an answer to this question, even at surface level, is extremely difficult since AI systems are notoriously opaque, making it nearly impossible to fully understand their inner workings. Furthermore, developers of these systems have an obvious interest in keeping the code of their algorithm as well as the training data used secret. This being said, one thing is for certain: generative AI systems need data, and lots of it.

The pursuit of data

Creating an AI system is incredibly data intensive. Data is needed to train and test the algorithm throughout its entire lifecycle. Going back to the example of ChatGPT, the system was trained on numerous datasets throughout its iterations containing hundreds of gigabytes of data equating to hundreds of billions of words.

With so much data needed for training alone, this raises the question of how developers get their hands on this amount of information. As the sheer numbers make obvious, training data for AI systems is usually not collected manually. Instead, developers typically rely on two sources: curated databases which contain vast amounts of data, and so-called web crawlers which “harvest” the near-boundless information and data resources available on the open internet.

The copyright conundrum

Some of the data available in online databases or collected by web scraping tools will inevitably be copyrighted material, which raises some questions with regard to the application of copyright in the context of training AI systems. COMMUNIA has extensively discussed the interaction between copyright and text and data mining (TDM) in our Policy Paper #15, but as a short refresher on the framework established by the 2019 Copyright Directive:

Under Article 3, research organizations and cultural heritage institutions may scrape anything that they have legal access to, including content that is freely available online, for the purposes of scientific research. Under Article 4, this right is extended to anyone for any purpose, but rights holders may reserve their rights and opt out of text and data mining, most often through machine-readable means.

While this framework, in principle, provides appropriate and sufficient legal clarity on the use of copyrighted materials in AI training, the execution still suffers from the previously mentioned opacity of AI systems and the secrecy around training data as there is no real way for a rightsholder to check whether their attempt to opt out of commercial TDM has actually worked. In addition, there’s still a lot of uncertainty about the best technical way to effectively opt out.

Bringing light into the dark

Going back to the EU’s AI Act reveals that the European Parliament recognises this issue as well. The Parliament’s position foresees that providers of generative AI models should document and share a “sufficiently detailed” summary of the use of training data protected under copyright law (Article 28b). This is an encouraging sign and a step in the right direction. The proof is in the pudding, however. More clarity is needed with regards to what “sufficiently detailed” means and how this provision would look in practice.

Policy makers should not forget that the copyright ecosystem itself suffers from a lack of transparency. This means that AI developers will not be able – and therefore should not be required – to detail the author, the owner or even the title of the copyrighted materials that they have used as training data in their AI systems. This information simply does not exist out there for the vast majority of protected works and, unless right holders and those who represent them start releasing adequate information and attaching it to their works, it is impossible for AI developers to provide such detailed information.

AI developers also should not be expected to know which of their training materials are copyrightable. Introducing a specific requirement for this category of data adds legal complexity that is neither needed nor advisable. For that and other reasons, we recommend in our policy paper that AI developers be required to be transparent about all of their training data, not only about the data that is subject to copyright.

The fact that AI developers know so little about each of the materials used to train their models should not, however, be a reason to abandon the transparency requirement.

In our view, those that use publicly available datasets will probably comply with the transparency requirement simply by referring to the dataset, even if the dataset lacks detailed information on each work. Those that are willing to deposit their training data with a data trust that ensures the accessibility of the repository for purposes of assessing compliance with the law would probably also achieve a reasonable level of transparency.

The main problem lies with those that do not disclose any information about their training data, such as OpenAI. These need to be compelled to provide some form of public documentation and disclosure, and at the very least need to be able to show that they have not used copyrighted works that have an opt-out attached to them. And that raises the question: how can creators and other right holders effectively reserve their training rights and opt out of the commercial TDM exception?

Operationalizing the opt-out mechanism

In our recommendations for the national implementation of the TDM exceptions, we suggested that the proper technical way to facilitate web mining was the use of a protocol like robots.txt, which creates a binary “mine”/“don’t mine” rule. However, this technical protocol has some significant limitations when it comes to its application in the context of data mining for AI training data.
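To illustrate the binary nature of such a reservation, here is a minimal sketch using Python’s standard-library robots.txt parser. The crawler name “AIDataBot” and the URLs are illustrative assumptions for this example, not part of any established opt-out standard:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt published by a rights holder. "AIDataBot" is an
# invented crawler name used purely for illustration; the protocol only lets
# a site answer "mine" or "don't mine" per crawler and path.
ROBOTS_TXT = """\
User-agent: AIDataBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Binary outcome: the named TDM crawler is refused entirely,
# every other crawler may fetch everything.
print(parser.can_fetch("AIDataBot", "https://example.org/images/artwork.jpg"))  # False
print(parser.can_fetch("OtherBot", "https://example.org/images/artwork.jpg"))   # True
```

As the sketch shows, the protocol can only admit or refuse a crawler wholesale; it has no vocabulary for distinctions such as “mining for search indexing is fine, but mining for commercial AI training is not”, which is one of the limitations at issue here.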

Therefore, one of the recommendations in our policy paper is for the Commission to lead these technical discussions and provide guidance on how the opt-out is supposed to work in practice to end some of the uncertainty that exists among creators and other rights holders.

In order to encourage a fair and balanced approach to both the opt-out and the transparency issues, the Commission could convene a stakeholder dialogue and include all affected parties, namely AI developers, creators and rights holders as well as representatives of civil society and academia. The outcome of this dialogue should be a way to operationalise the opt-out system itself and the transparency requirements that will uphold such a system without placing a disproportionate burden on AI developers.

Getting this right would provide a middle ground that allows creators and other rights holders to protect their commercial AI training rights over their works while encouraging innovation and the development of generative AI models in the EU.


]]>
Something (the Public Domain) is rotting in the state of Italy https://communia-association.org/2023/06/20/something-the-public-domain-is-rotting-in-the-state-of-italy/ Tue, 20 Jun 2023 09:10:23 +0000 https://communia-association.org/?p=6316

The post Something (the Public Domain) is rotting in the state of Italy appeared first on COMMUNIA Association.

]]>
We certainly didn’t ask for this, but Italy appears to have made it its mission to show why our work at COMMUNIA is as relevant as ever: by launching an attack on the Public Domain. Since October last year, Italian courts have applied the country’s Cultural Heritage Code (hereinafter shortened to “the Code”) in a number of landmark cases to forbid the reuse of works of Italian Renaissance artists.

Il nuovo rinascimento [“the new Renaissance”]

We have covered the lawsuits against French fashion label Jean Paul Gaultier, for using Sandro Botticelli’s Birth of Venus on a collection, and German toy maker Ravensburger, for using Leonardo da Vinci’s Vitruvian Man on a jigsaw puzzle, on the COMMUNIA blog in the past months. Both Gaultier and Ravensburger were brought to court for violations of the Italian Cultural Heritage Code by the museums that host these works in their collections: the Uffizi in Florence and the Gallerie dell’Accademia in Venice, respectively. According to Art. 106 ff. of the Code, commercial uses of works require the authorization of the cultural heritage institution that holds the work in question in its collection, as well as the payment of a concession fee – even if that work is in the Public Domain.

More recently, the court of Florence has ruled in favour of the Gallerie dell’Accademia in Florence and the Italian Ministry of Culture for the use of the image of Michelangelo’s David on the cover of GQ Magazine Italy. The cover features a hologram, which, depending on the viewing angle, shows a photographic reproduction of Michelangelo’s statue or a bare-chested, muscular man posing in a similar fashion (see this interview with COMMUNIA member Deborah De Angelis as well as Eleonora Rosati’s post for the IPKat).

Copyright with a glued-on beard

All of the conclusions reached in these cases can be rebutted on the same grounds we have explained extensively in previous contributions. The reproduced works are clearly in the Public Domain, that is, they are completely free from any copyright restriction. Their creators Sandro Botticelli (1445-1510), Leonardo da Vinci (1452-1519) and Michelangelo (1475-1564) have all been dead for centuries. Even Michelangelo, the youngest of the bunch, lived long before any concrete notion of copyright existed. Yes, the Italian Cultural Heritage Code is an instrument of administrative law. The function of this section of the Code is so similar to copyright, however, that one must wonder whether its raison d’être isn’t simply to serve as a pseudo-copyright that the Italian state can use to generate income from Public Domain works. When new laws are created to negate the effect of a carefully yet imperfectly balanced copyright system in order to justify a dubious revenue model, we must react.

Because in doing so, the Code calls into question the social contract on which copyright is based. Copyright is granted for a limited period of time, allowing creators to extract monetary gain from their works for as long as they are copyright-protected. When a work’s term of protection ends, it enters the Public Domain and, as a rule, becomes free to use by everybody. Carving out Italian collections from this rule hinders access to our common European cultural heritage. The works in these collections belong to all of us in the sense that everyone should have access to them and be able to draw on them to create something new.

But this isn’t just a philosophical issue. It is also fundamentally at odds with copyright law and its intrinsic balance: that protection lasts for a limited time. As Roberto Caso comments on the Kluwer Copyright Blog: “The ex post facto judicial creation of an eternal and indefinite pseudo-intellectual property leads to the violation of the principle of the numerus clausus of intellectual property rights.” More specifically, the Code is incompatible with the spirit of Article 14 of the DSM Directive, which states that reproductions of works of visual art that are in the public domain cannot be subject to copyright or related rights, unless the reproduction itself is an original creative work (see Giulia Dore’s contribution to Kluwer).

Is there a method in the madness?

Equally egregious is the fact that the Italian Cultural Heritage Code establishes the Italian state as an arbiter that determines whether any given use of a work is appropriate. The idea that a state – more than 500 years after the creation of a work – claims to be able to determine what is an appropriate use of that work is not only frivolous, but dangerous for democracy, freedom of expression and participation in cultural life. There is no need for a state to determine whether something is an appropriate or inappropriate use. Leave that decision to creators, their audiences, and to society as a whole, whose members can engage in free and democratic debates.

As a side note: it is even more frivolous if we consider that the Italian Ministry of Tourism runs a campaign full of clichés featuring a cartoonish Venus as a modern-day influencer; ironically, the campaign is called “Open to Meraviglia” [English text in original, which translates into “Open to Marvel”]. To be clear, the Ministry of Tourism is well within its rights to do this, and it is a perfectly fine example of what parody might look like. So why should a public body be allowed to do it, but not a toy maker, a magazine or a fashion designer? These events set a very worrying precedent for artists and creators in Italy, Europe, and all over the world.

While it’s been fun writing about the absurdity of these lawsuits for some time, enough is enough. Italy must repeal this section of its cultural heritage code and ensure that Public Domain works can be freely reused by all.


]]>