COMMUNIA Association - Full copyright protection should only be granted to works that have been registered by their authors. https://communia-association.org/policy-recommendation/full-copyright-protection-should-only-be-granted-to-works-that-have-been-registered-by-their-authors/ Mon, 18 Dec 2023 12:24:00 +0000

An AI Christmas Miracle https://communia-association.org/2023/12/18/an-ai-christmas-miracle/ Mon, 18 Dec 2023 08:30:41 +0000

With Christmas fast approaching, on December 8 the European Parliament wrapped up one of the biggest presents of its mandate: the AI Act, a landmark piece of legislation that aims to regulate Artificial Intelligence while encouraging development and innovation. In keeping with the holiday theme, the final weeks of the negotiations included everything from near-breakdowns of the discussions, not too dissimilar to the explosive dynamics of festive family gatherings, to 20+ hour trilogue meetings, akin to last-minute Christmas shopping. But at last, it is done.

One of COMMUNIA's key priorities was the transparency of training data. In April, we issued a policy paper calling on the EU to enact a reasonable and proportionate transparency requirement for developers of generative AI models. We followed up with several blog posts and a podcast, outlining ways to make the requirement work in practice without placing a disproportionate burden on ML developers.

From our perspective, the introduction of some form of transparency requirement was essential to uphold the legal framework that the EU has for ML training, while ensuring that creators can make an informed choice about whether to reserve their rights or not. Going by leaked versions of the final agreement, it appears that the co-legislators have come to similar conclusions. The deal introduces two specific obligations on providers of general-purpose AI models, which serve that objective: an obligation to implement a copyright compliance policy and an obligation to release a summary of the AI training content.

The copyright compliance obligation

In a leaked version, the obligation to adopt and enforce a copyright compliance policy reads as follows:

[Providers of general-purpose AI models shall] put in place a policy to respect Union copyright law in particular to identify and respect, including through state of the art technologies where applicable, the reservations of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790

Back in November, we suggested that, instead of focusing on obtaining a summary of the copyrighted content used to train the AI model, the EU lawmaker should focus on the copyright compliance policies followed during the scraping and training stages, mandating developers of generative AI systems to release a list of the rights reservation protocols complied with during the data gathering process. We were therefore pleased to see the introduction of such an obligation, with a specific focus on the opt-outs from the general-purpose text and data mining exception.

Interestingly, the leaked version contains a recital in which the co-legislators declare their intent to apply this obligation to “any provider placing a general-purpose AI model on the EU market (…) regardless of the jurisdiction in which the copyright-relevant acts underpinning the training of these foundation models take place”. While one can understand why the EU lawmakers would want to ensure that all AI models released on the EU market respect these EU product requirements, the fact that these are also copyright compliance obligations, which apply prior to the release of the model on the EU market, raises some legal concerns. It is not clear how the EU lawmakers intend to apply EU copyright law when the scraping and training take place outside the EU's borders, absent an appropriate international legal instrument.

The general transparency obligation

The text goes on to require that developers of general-purpose AI models make publicly available a sufficiently detailed summary about the AI training content:

[Providers of general-purpose AI models shall] draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office

While we have previously criticized the formulation “sufficiently detailed summary” due to the legal uncertainty it could cause, having an independent and accountable entity draw up a template for the summary (as we have advocated) could alleviate some of the vagueness and potential confusion.

We were also pleased to see that the co-legislators listened to our calls to extend this obligation to all training data. As we have said before, introducing a specific requirement only for copyrighted data would add unnecessary legal complexity, since ML developers would first need to know which of their training materials are copyrightable. Moreover, knowing more about the data feeding models that can generate content is essential for a variety of purposes, not all of them related to copyright.

We should also highlight that the co-legislators appear to have an understanding similar to ours of how compliance with the transparency requirement could be achieved when AI developers use publicly available datasets. The leaked version contains a clarifying recital stating that “(t)his summary should be comprehensive in its scope instead of technically detailed, for example by listing the main data collections or sets that went into training the model, such as large private or public databases or data archives, and by providing a narrative explanation about other data sources used”. When the training dataset is not publicly accessible, we maintain that there should be a way to ensure conditional access to the dataset, namely through a data trust, to confirm legal compliance.
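Purely as an illustration of what such a template-based summary might look like in machine-readable form, the sketch below models the recital's two elements: a list of the main data collections and a narrative explanation of other sources. Every field name and entry is a hypothetical assumption of ours, since the AI Office template does not yet exist.

```python
import json

# Hypothetical sketch of a training-content summary following the recital's
# two elements: main data collections/sets, plus a narrative explanation of
# other sources. All field names and entries are illustrative assumptions,
# not the actual AI Office template.
training_content_summary = {
    "model_name": "example-model-v1",  # hypothetical identifier
    "main_data_collections": [
        {"name": "Common Crawl", "type": "public web archive"},
        {"name": "Wikipedia dumps", "type": "public database"},
    ],
    "other_sources_narrative": (
        "Additional text was collected from publicly accessible websites; "
        "opt-outs expressed under Article 4(3) CDSM were honoured."
    ),
}

# A summary at this level of granularity could simply be published as a
# JSON document alongside the model.
print(json.dumps(training_content_summary, indent=2))
```

A summary of this shape names the datasets without itemizing every work, which matches the recital's “comprehensive in its scope instead of technically detailed” standard.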

Taking these amendments into account, the compromise found by the co-legislators manages to strike a good balance between what is technically feasible and what is legally necessary.

Merry Christmas!

Statement on Transparency in the AI Act https://communia-association.org/2023/10/23/statement-on-transparency-in-the-ai-act/ Mon, 23 Oct 2023 18:12:49 +0000

A fifth round of the trilogue negotiations on the Artificial Intelligence (AI) Act is scheduled for October 24, 2023. Together with Creative Commons and Wikimedia Europe, COMMUNIA has issued a statement calling on the co-legislators to take a holistic approach to AI transparency and agree on proportionate solutions.

As discussed in greater detail in our Policy Paper #15, COMMUNIA deems it essential that the flexibilities for text-and-data mining enshrined in Articles 3 and 4 of the Copyright in the Digital Single Market Directive are upheld. For this approach to work in practice, we welcome practical initiatives for greater transparency around AI training data to understand whether opt-outs are being respected.

The full statement is provided below:

Statement on Transparency in the AI Act

The undersigned are civil society organizations advocating in the public interest and representing knowledge users and creative communities.

We are encouraged that the Spanish Presidency is considering how to tailor its approach to foundation models more carefully, including an emphasis on transparency. We reiterate that copyright is not the only prism through which reporting and transparency requirements should be seen in the AI Act.

General transparency responsibilities for training data

Greater openness and transparency in the development of AI models can serve the public interest and facilitate better sharing by building trust among creators and users. As such, we generally support more transparency around the training data for regulated AI systems, and not only on training data that is protected by copyright.

Copyright balance

We also believe that the existing copyright flexibilities for the use of copyrighted materials as training data must be upheld. The 2019 Directive on Copyright in the Digital Single Market and specifically its provisions on text-and-data mining exceptions for scientific research purposes and for general purposes provide a suitable framework for AI training. They offer legal certainty and strike the right balance between the rights of rightsholders and the freedoms necessary to stimulate scientific research and further creativity and innovation.

Proportionate approach

We support a proportionate, realistic, and practical approach to meeting the transparency obligation, which would put less onerous burdens on smaller players including non-commercial players and SMEs, as well as models developed using FOSS, in order not to stifle innovation in AI development. Too burdensome an obligation on such players may create significant barriers to innovation and drive market concentration, leading the development of AI to only occur within a small number of large, well-resourced commercial operators.

Lack of clarity on copyright transparency obligation

We welcome the proposal to require AI developers to disclose the copyright compliance policies followed during the training of regulated AI systems. We remain concerned, however, about the lack of clarity on the scope and content of the obligation to provide a detailed summary of the training data. AI developers should not be expected to literally list every item in the training content. We maintain that such a level of detail is neither practical nor necessary for implementing opt-outs and assessing compliance with the general-purpose text-and-data mining exception. We would welcome further clarification by the co-legislators on this obligation. In addition, an independent and accountable entity, such as the foreseen AI Office, should develop processes to implement it.

Signatories

The AI Act and the quest for transparency https://communia-association.org/2023/06/28/the-ai-act-and-the-quest-for-transparency/ Wed, 28 Jun 2023 07:00:33 +0000

Artificial intelligence (AI) has taken the world by storm and people’s feelings towards the technology range from fascination about its capabilities to grave concerns about its implications. Meanwhile, legislators across the globe are trying to wrap their heads around how to regulate AI. The EU has proposed the so-called AI Act, which aims to protect European citizens from potentially harmful applications of AI while still encouraging innovation in the sector. The file, originally proposed by the European Commission in April 2021, has just entered trilogues and will be hotly debated over the coming months by the European Parliament and the Council.

One of the key issues in the discussions will most likely be how to deal with the rather recent phenomenon of generative AI systems (also referred to as foundation models), which are capable of producing content ranging from complex text to images, sound, computer code and much more with very limited human input.

The rise of generative AI

Within less than a year, generative AI technology went from having a select few, rather niche applications to becoming a global phenomenon. Perhaps no application represents this development like ChatGPT. Originally released in November 2022, ChatGPT broke all records by reaching one million users within just five days of its release with the closest competitors for this title, namely Instagram, Spotify, Dropbox and Facebook, taking several months to reach the same stage. Fast forward to today, approximately half a year later, and ChatGPT reportedly counts more than 100 million users.

One of the reasons for this “boom” of generative AI systems is that they are more than just a novelty. Some systems have established themselves as considerable competitors to human creators for certain types of creative expression, being able to write background music or produce stock images that would take humans many more hours to create. In fact, the quality of the output of some systems is already so high, and the cost of production so low, that they pose an existential risk to specific categories of creators, as well as to the industries behind them.

But how do generative AI systems achieve this and what is the secret behind their ability to produce works that can comfortably compete with works of human creativity? Providing an answer to this question, even at surface level, is extremely difficult since AI systems are notoriously opaque, making it nearly impossible to fully understand their inner workings. Furthermore, developers of these systems have an obvious interest in keeping the code of their algorithm as well as the training data used secret. This being said, one thing is for certain: generative AI systems need data, and lots of it.

The pursuit of data

Creating an AI system is incredibly data intensive. Data is needed to train and test the algorithm throughout its entire lifecycle. Going back to the example of ChatGPT, the system was trained on numerous datasets throughout its iterations containing hundreds of gigabytes of data equating to hundreds of billions of words.

With so much data needed for training alone, the question arises of how developers get their hands on this amount of information. As the sheer numbers make obvious, training data for AI systems is usually not collected manually. Instead, developers typically rely on two sources: curated databases containing vast amounts of data, and so-called web crawlers, which “harvest” the near-boundless information and data resources available on the open internet.

The copyright conundrum

Some of the data available in online databases or collected by web scraping tools will inevitably be copyrighted material, which raises some questions regarding the application of copyright in the context of training AI systems. COMMUNIA has extensively discussed the interaction between copyright and text and data mining (TDM) in our Policy Paper #15, but as a short refresher on the framework established by the 2019 Copyright Directive:

Under Article 3, research organizations and cultural heritage institutions may, for the purposes of scientific research, scrape anything that they have lawful access to, including content that is freely available online. Under Article 4, this right is extended to anyone for any purpose, but rights holders may reserve their rights and opt out of text and data mining, most often through machine-readable means.

While this framework, in principle, provides appropriate and sufficient legal clarity on the use of copyrighted materials in AI training, the execution still suffers from the previously mentioned opacity of AI systems and the secrecy around training data as there is no real way for a rightsholder to check whether their attempt to opt out of commercial TDM has actually worked. In addition, there’s still a lot of uncertainty about the best technical way to effectively opt out.
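To illustrate what a verifiable, machine-readable opt-out check could look like, the sketch below assumes a webpage expresses its reservation through a TDMRep-style `<meta name="tdm-reservation" content="1">` tag, one of the candidate protocols under discussion. The helper name and the restriction to meta tags only (the protocol also foresees HTTP headers and a well-known file) are our simplifications.

```python
from html.parser import HTMLParser

class TDMReservationParser(HTMLParser):
    """Looks for a TDMRep-style <meta name="tdm-reservation" content="1"> tag.

    A minimal sketch: the real protocol also covers HTTP headers and a
    /.well-known file, which are not handled here.
    """
    def __init__(self):
        super().__init__()
        self.reserved = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") == "tdm-reservation"
                    and attributes.get("content") == "1"):
                self.reserved = True

def rights_reserved(html: str) -> bool:
    """Return True if the page opts out of TDM via the meta tag."""
    parser = TDMReservationParser()
    parser.feed(html)
    return parser.reserved

# A page carrying the tag has reserved its rights; one without it has not.
page = '<html><head><meta name="tdm-reservation" content="1"></head></html>'
print(rights_reserved(page))  # True
print(rights_reserved('<html><head><title>blog</title></head></html>'))  # False
```

A crawler honouring such a signal would skip (or license) pages where `rights_reserved` returns `True`, which is exactly the kind of compliance a rightsholder currently has no way to verify.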

Bringing light into the dark

Going back to the EU’s AI Act reveals that the European Parliament recognises this issue as well. The Parliament’s position foresees that providers of generative AI models should document and share a “sufficiently detailed” summary of the use of training data protected under copyright law (Article 28b). This is an encouraging sign and a step in the right direction. The proof is in the pudding, however. More clarity is needed with regards to what “sufficiently detailed” means and how this provision would look in practice.

Policy makers should not forget that the copyright ecosystem itself suffers from a lack of transparency. This means that AI developers will not be able – and therefore should not be required – to detail the author, the owner or even the title of the copyrighted materials that they have used as training data in their AI systems. This information simply does not exist for the vast majority of protected works and, unless right holders and those who represent them start releasing adequate information and attaching it to their works, it is impossible for AI developers to provide such detailed information.

AI developers also should not be expected to know which of their training materials are copyrightable. Introducing a specific requirement for this category of data adds legal complexity that is neither needed nor advisable. For that and other reasons, we recommend in our policy paper that AI developers be required to be transparent about all of their training data, and not only about the data that is subject to copyright.

The fact that AI developers know so little about each of the materials that is being used to train their models should not, however, be a reason to abandon the transparency requirement.

In our view, those that are using publicly available datasets will probably comply with the transparency requirement simply by referring to the dataset, even if the dataset lacks detailed information on each work. Those that are willing to submit training data to a data trust that would ensure the accessibility of the repository for purposes of assessing compliance with the law would probably also ensure a reasonable level of transparency.

The main problem lies with those that are not disclosing any information about their training data, such as OpenAI. These need to be compelled to provide some sort of public documentation and disclosure, and at least need to be able to show that they have not used copyrighted works that have an opt-out attached to them. And that raises the question: how can creators and other right holders effectively reserve their training rights and opt out of the commercial TDM exception?

Operationalizing the opt-out mechanism

In our recommendations for the national implementation of the TDM exceptions, we suggested that the proper technical way to facilitate web mining was the use of a protocol like robots.txt, which creates a binary “mine”/“don’t mine” rule. However, this technical protocol has some significant limitations when it comes to its application in the context of data mining for AI training data.
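The binary nature of that rule can be seen with Python's standard-library robots.txt parser; the crawler name “ExampleAIBot” and the rules below are hypothetical. Note that the protocol can only say whether a path may be fetched at all: there is no way to express “you may index this page but not train on it”, which is precisely its limitation for TDM opt-outs.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: a crawler identifying itself as "ExampleAIBot"
# is barred from /images/ but nothing else. The rule is binary per path:
# fetch or don't fetch, with no way to distinguish indexing from AI training.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /images/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The disallowed path is off limits; everything else is fair game.
print(parser.can_fetch("ExampleAIBot", "https://example.org/images/photo.jpg"))   # False
print(parser.can_fetch("ExampleAIBot", "https://example.org/articles/post.html")) # True
```

Because the answer is always a plain yes/no about fetching, a rightsholder who wants their work crawled for search but excluded from ML training cannot express that distinction with robots.txt alone.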

Therefore, one of the recommendations in our policy paper is for the Commission to lead these technical discussions and provide guidance on how the opt-out is supposed to work in practice to end some of the uncertainty that exists among creators and other rights holders.

In order to encourage a fair and balanced approach to both the opt-out and the transparency issues, the Commission could convene a stakeholder dialogue and include all affected parties, namely AI developers, creators and rights holders as well as representatives of civil society and academia. The outcome of this dialogue should be a way to operationalise the opt-out system itself and the transparency requirements that will uphold such a system without placing a disproportionate burden on AI developers.

Getting this right would provide a middle ground that allows creators and other rights holders to protect their commercial AI training rights over their works while encouraging innovation and the development of generative AI models in the EU.

Using Copyrighted Works for Teaching the Machine – New Policy Paper https://communia-association.org/2023/04/26/using-copyrighted-works-for-teaching-the-machine-new-policy-paper/ Wed, 26 Apr 2023 10:16:56 +0000

The surge of generative artificial intelligence has gone alongside a renewed interest in questions about the relationship between machine learning and copyright law. In our newly published policy paper #15 entitled “Using copyrighted works for teaching the machine” (also available as a PDF file), we are looking at the input side of the equation within the EU copyright framework.

We discuss the considerations of the use of copyright-protected works and other protected subject matter as training data for generative AI models, and provide two recommendations for lawmakers. Here, we leave aside questions relating to the output of AI models (e.g. whether the output of generative AI models is copyrightable and in how far such output can be infringing exclusive rights), which we will address in another, yet to be published paper.

This paper is without prejudice to the position of COMMUNIA or individual COMMUNIA members regarding this discussion in other jurisdictions.

Policy Paper #15 on using copyrighted works for teaching the machine https://communia-association.org/policy-paper/policy-paper-15-on-using-copyrighted-works-for-teaching-the-machine/ Wed, 26 Apr 2023 09:43:56 +0000

Background

We have witnessed a proliferation of (so-called) generative artificial intelligence (AI) models since OpenAI made DALL-E 2 available to the public in July 2022. This has gone alongside a renewed interest in questions about the relationship between machine learning (ML) and copyright law — as evidenced by a surge in publications on the topic both in scientific journals and general-audience media, as well as a number of lawsuits.

In this policy paper, we are looking at the input side of the equation within the EU copyright framework.1 We discuss the considerations of the use of copyright-protected works and other protected subject matter as training data for generative AI models, and provide two recommendations for lawmakers. Here, we leave aside questions relating to the output of AI models (e.g. whether the output of generative AI models is copyrightable and in how far such output can be infringing exclusive rights), which we will address in another, yet to be published paper.

The surge of generative AI raises concerns for creators and their livelihood. It also prompts broader questions about the implications of potentially inauthentic and untrustworthy AI-generated output for social cohesion as well as about the extraction and concentration of resources by only a few tech companies. We are mindful of these challenges, but copyright is not designed to address all of them in a way that does justice to the underlying grievances.

Training generative machine learning systems

This paper is based on the assumption that, in order to train generative ML models, their developers require access to large amounts of materials, including images and text, many of which – but far from all – are protected by copyright. Currently, the most prominent examples of such models are image generators (such as DALL-E, Midjourney and Stable Diffusion) and large language models (such as BLOOM, GPT and LLaMA) that are able to generate text and sometimes software code. But it is only a question of time until the same questions arise for music or video generators and the use of copyrighted musical or audiovisual materials.

Access to large amounts of training data enables developers of ML models to train their models so that they can generate output. Based on our understanding of the technology, we assume that copies of the works that have been used for training are in no way stored as part of the model weights2 or the model itself.

What we observe at a high level of abstraction is a situation where ML models are trained on very large numbers of works from a vast number of rightholders. Usually, the generated output will be based on the model as derived from the totality of the training data.

Traditionally, the copyright system has been badly equipped to deal with situations involving large numbers of underlying rightholders. Rights clearance for mass digitization projects, for instance, is greatly encumbered by the number of rightholders and the difficulty of obtaining permission from those who have never actively managed their rights or are no longer doing so. Training ML models on the open internet involves even greater numbers of rightholders.

All of this combined with the novelty and rapid development of ML technologies has resulted in significant legal uncertainty. Unsurprisingly, the use of copyrighted works as part of AI training is already subject to legal dispute in the US and UK.

Most of the discussion about copyright and ML training has been conducted within the parameters provided by the US framework. The question dominating this discussion has been: Do uses of copyrighted works for the purpose of training (generative) ML constitute fair use?3

The EU copyright framework does not provide for a fair use defence; users can rely on the system of exceptions and limitations to copyright when they use a work without express permission of rightholders. Therefore the situation is different here.

We argue that questions relating to the input side of ML are sufficiently addressed by the existing EU copyright framework. Since the adoption of the 2019 Copyright in the Digital Single Market Directive (CDSM Directive), the EU copyright framework contains a set of harmonised exceptions that are applicable to ML as described above. These are the exceptions for text and data mining (TDM) introduced in Articles 3 and 4 of the Directive.

Machine learning and text and data mining

Even though not directly referenced in the Directive, the fight over TDM exceptions during the legislative battle over the CDSM Directive has always been about the ML revolution that was already on the horizon at that time.4

The CDSM Directive defines TDM as “any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations.” This definition clearly covers current approaches to ML that rely heavily on correlations between observed characteristics of training data. The use of copyrighted works as part of the training data is exactly the type of use that was foreseen when the TDM exception was drafted and this has recently been confirmed by the European Commission in response to a parliamentary question.

Article 3 of the Directive allows text and data mining for the purposes of scientific research by research organisations and cultural heritage institutions, as long as they have lawful access to the works to be mined, including content that is freely available online.

The Article 4 exception – which is the result of extensive advocacy by researchers, research organisations, open access advocates and technology companies to broaden the scope of the TDM exception in Article 3 – allows anyone to use lawfully accessible works, as defined above, for text and data mining unless such use has been “expressly reserved by their rightholders in an appropriate manner, such as machine-readable means.”

In sum, these two articles provide a clear legal framework for the use of copyrighted works as input data for ML training in the EU. Researchers at academic research institutions and cultural heritage institutions are free to use all lawfully accessible works (e.g. the entire public Internet) to train ML applications. Everyone else – including commercial ML developers – can only use works that are lawfully accessible and for which their rightholders have not explicitly reserved use for TDM purposes.

Opt-out and the limits of the copyright framework

The EU’s approach constitutes a forward-looking framework for dealing with the issues raised by the mass scale use of copyrighted works for ML training. Importantly, it ensures a fair balance between the interests of rightholders on the one side and researchers and ML developers on the other.

The exception in Article 3 provides much needed clarity for academic researchers and ensures that they have access to all copyrighted works for TDM/ML training purposes. The more limited exception in Article 4 addresses the interests of creators and other rightholders who want to control the use of their works and those who don’t.

Creators and rightholders who want to control the use of their works can opt out from TDM/ML either to prevent their works from being used for this purpose or to establish a negotiation position for licensing such uses of their works either collectively or individually. Here, the European Commission should also play an active role in defining technical standards for reserving the right in machine-readable form in order to increase certainty for all parties involved.

What is equally important is that this differentiation also recognizes the fact that, for a significant number of works, the rights are not actively managed by their rightholders. This means that under a copyright-by-default regime (i.e. works can only be used on the basis of explicit opt-in) these works could not be used for ML training, since obtaining permission from large numbers of rightholders not actively managing their rights would be impossible. Future EU rulemaking should thus maintain the opt-out approach for ML training and ensure that permissionless use remains the default.

Recommendation 1: The EU must maintain the exceptions for text and data mining established in Articles 3 and 4 of the CDSM Directive. The existing opt-out model for commercial uses should be preserved as it establishes a balance between the interests of ML developers on the one hand and creators on the other.

Transparency

However, we should not stop there. For this approach to work in practice it is essential that ML development becomes more transparent. The EU legislator should enact provisions that require providers of generative ML models to publicly disclose the use of all materials used as training data, including copyright-protected works, in a reasonable and proportionate manner so as not to create an undue burden. Creators should be empowered to know which of their works have been used for training and how.

Such a requirement would improve the transparency of ML development and deployment. It would also allow anyone – particularly those rightholders who seek to control the use of their works – to verify adherence to the EU legal framework governing the use of copyrighted works for ML training.

Finally, a general transparency requirement contributes to the development of trustworthy and responsible AI and is in the public interest.

Recommendation 2: The EU should enact a robust general transparency requirement for developers of generative AI models. Creators need to be able to understand whether their works are being used as training data and how, so that they can make an informed choice about whether to reserve the right for TDM or not.

The two recommendations developed in this paper are closely tied to our Policy Recommendations #2 (Full copyright protection should only be granted to works that have been registered by their authors) and #16 (Creators should have the right to know their audience).

From a copyright perspective, the opt-out mechanism increases legal certainty for all parties involved. On the one hand, it allows creators and rightholders to indicate how their works are to be used in the context of ML training. On the other hand, it provides ML developers with the ability to ensure that their use of training data does not infringe copyright where applicable. Finally, by excluding scientific research from the scope of opt-outs, the system provides an important contribution to academic freedom and to ensuring that ML-related research can flourish in the EU.

We also maintain that transparency is paramount to a copyright and AI system that works for everyone. Creators should be able to track copyright-relevant uses of their works, so that they can make informed decisions and improve their negotiating position vis-à-vis other actors in the value chain. The public, in turn, needs transparency to ensure that as AI continues to progress, its benefits flow to society as a whole.

Endnotes

  1. This paper is without prejudice to the position of COMMUNIA or individual COMMUNIA members regarding this discussion in other jurisdictions.
  2. Model weights are an integral part of the model itself. The model consists of a structure or architecture that defines the way it processes the input data, and the weights are the parameters that are learned during the training process to optimise the model’s performance.
  3. For a recent overview, see: Henderson, P. et al. (2023) ‘Foundation Models and Fair Use’ [Unpublished]. Available at: https://arxiv.org/abs/2303.15715.
  4. The European Parliament’s summary published after the adoption of the Directive makes this explicit by noting that “the co-legislators agreed to enshrine in EU law another mandatory exception for general text and data mining (Article 4) in order to contribute to the development of data analytics and artificial intelligence.”
  5. This statement by 24 stakeholders stresses “the foundational role that TDM plays in Artificial Intelligence (AI).”

The post Policy Paper #15 on using copyrighted works for teaching the machine appeared first on COMMUNIA Association.
