AIIM - Association for Information and Image Management

24/07/2024 | News release | Distributed by Public on 24/07/2024 19:19

AIIM's Take on AI Input Transparency Policy

Copyright is intended to incentivize creativity to serve the purpose of enriching the public by providing access to creative work. Generative AI engines use content (aka information or unstructured data) to develop large language models. This content can and often does include copyrighted works.

Information management practitioners focus on the collection, processing, storage, security, retention, and accessibility of unstructured data in an organization. Information managers are also responsible for the accuracy and transparency of information. AIIM's membership includes AI technology leaders as well as users of AI. Our members have been working with AI for nearly two decades to help manage information and prepare information for use by AI and automation tools.

In this post, I will share a recap of AIIM's participation in a session at the U.S. Patent and Trademark Office as well as AIIM's position on AI input transparency.

AIIM Invited to USPTO Listening Session

AIIM was invited to attend a listening session at the U.S. Patent and Trademark Office (USPTO) in Alexandria, Virginia, USA on July 11, 2024. Led by the Director of the USPTO, Kathi Vidal, the meeting was attended by representatives from Adobe, IBM, the Copyright Alliance, the Motion Picture Association, the News Media Alliance, the Recording Industry Association of America, The Authors Guild, the Association of American Publishers, and the Copyright Clearance Center.

The invitation was the result of AIIM's response to the Library of Congress's request for comment in 2023 regarding copyright and AI. Download the "Navigating Copyright in the Age of AI" eBook by AIIM for more information.

What is input transparency?

USPTO defines AI input transparency as the disclosure of content ingested by AI. Regarding copyright, USPTO is mainly concerned with authorship, data scraping, and name/image likeness.

As USPTO develops policy to regulate input transparency, they recognize that policy could be determined by case law; voluntary commitments, like industry standards; or regulations.

USPTO Policy Making Process

Listening sessions with key stakeholders are a key part of the USPTO's policy making process. In her introductory remarks, Director Vidal explained that the USPTO's policy making process is shaped by:

  1. Public comments
  2. Collaboration with the Copyright Office
  3. Listening sessions

USPTO is also working closely with international counterparts.

Importantly, in the listening session, Director Vidal and USPTO staff shared that they were looking for specific actions to take under the U.S. President's Executive Order on AI. The USPTO is also closely monitoring court cases regarding AI and copyright. The USPTO shared that a new policy on patent eligibility for AI will be released soon and will be open for public comment this summer.

AI policies will take time to shape. While the Executive Order requires action within 180 days, USPTO staff clarified that each individual report or topic taken up by the USPTO is completed within 180 days, not all relevant policies at once.

What Copyright Holders Want

AIIM represents information management practitioners and solution providers. In other words, our members are most likely to be in the position to facilitate compliance with new policies regarding AI input transparency.

Most attendees at the session represented copyright holders, and it was interesting to hear their perspective. Copyright holders want the following:

  1. AI developers should maintain records of the sources of training data, as well as how they used the content.
  2. Copyright should be opt-in, not opt-out.
  3. Licensing should be used to compensate copyright holders for content usage.
  4. AI developers should disclose the data sources of training materials for large language models.
  5. AI developers should respect robots.txt files and paywalls.
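The record-keeping described in item 1 can be made concrete. As a rough sketch only (the schema and field names below are hypothetical, not a standard or anything endorsed by the USPTO), a minimal provenance entry for each ingested work might look like:

```python
from dataclasses import dataclass, asdict

@dataclass
class TrainingSourceRecord:
    """One provenance entry for a piece of ingested training content.
    All field names are illustrative, not an established schema."""
    source_url: str        # where the content was acquired
    copyright_holder: str  # rights holder, if known
    license_terms: str     # e.g. "CC-BY-4.0", "collective license", "unknown"
    acquired_on: str       # ISO date the content was ingested
    usage: str             # how the content was used in training

# Logged at ingestion time, before the content enters the training corpus.
record = TrainingSourceRecord(
    source_url="https://example.com/articles/some-essay",
    copyright_holder="Example Media Co.",
    license_terms="collective license (subscription)",
    acquired_on="2024-07-11",
    usage="pre-training corpus",
)

# Serializing to a plain dict makes the record easy to store and audit later.
print(asdict(record))
```

Even a lightweight record like this addresses both halves of the demand: where the content came from, and how it was used.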

Several attendees representing copyright holders advocated for aggregation of content licenses. In fact, shortly after the meeting, the Copyright Clearance Center announced a collective licensing subscription for AI systems. This type of license would allow AI developers to license and gain permission to several copyrighted works through a single license or subscription.

Regulatory Limitations

Attendees at the listening session, including AIIM, agreed that there need to be limitations on input transparency requirements.

  1. Regulations should only pertain to generative AI systems, which use large language models and multimodal models that are dependent on content.
  2. AI developers should not be required to disclose commercial licenses.
  3. Obligations for transparency should only pertain to public-facing AI systems and not internal systems.
  4. AI developers should not need to obtain licenses to use their own work.

AI developers should not be obliged to follow copyright policy when obligations would conflict with privacy law or contract law.

Right now, it is unclear what data sources AI developers will disclose or how they will disclose them. There are multiple bills in the U.S. which could influence disclosure requirements.

AIIM's Stance on AI Input Transparency Policy

Governments need to account for the operational impact of information management requirements.

Requiring licensure or recordkeeping of any kind is no small undertaking. AIIM and the information management industry are arguably built on compliance and regulatory requirements. While it's important for governments to protect intellectual property, it is also important for government and industry to understand the impact requirements or regulations may have on AI developers. AIIM members are information management practitioners and solution providers who help organizations manage their unstructured data, often to comply with regulatory requirements. AI developers will need storage and compute power to process licenses and metadata. They may also need information management systems and to develop internal information management skillsets and practices.

Leverage source citations.

AIIM believes that source citations are an important way to establish information provenance and to mitigate the risks of distributing and employing inaccurate, unverified, or unlawful information. If developers maintain the origins of AI-generated content, the training data becomes more transparent, easier to verify later in the iterative process, and end-user businesses are better protected.

We need a universal framework and clear regulations.

It's important for ease of business and consistency that regulations are universal. Businesses of all sizes have experienced the challenge of data privacy laws, which vary by country and even state/territory. The U.S. government should use existing AI legislation and standards to shape requirements to alleviate future burden on businesses.

Governments need to support AI developers.

AI development and innovation are important to economic growth. As governments consider mandating the retention of training material used to develop AI models, it's critical to support AI developers to ensure compliance while maintaining the pace of innovation. There is a cost to retaining records, including technology and talent costs.

While AIIM avidly supports records retention in this instance, it's important to understand the impact on and costs to business when considering regulations. AIIM supports government and private initiatives to aid AI developers to implement required record management and document management practices and systems. This includes providing education, funding through grants or tax incentives, and clear regulations.

Governments could also help educate and advance innovation by providing test datasets. Director Vidal noted that small companies need data and compute power. USPTO is working with NSF to develop datasets.

Leverage existing, tested solutions.

AIIM advocates for standardization and for leveraging existing technology and processes to ease the burden of implementation. AIIM recommends the approach that web search engines currently follow, where a site's robots.txt file indicates whether it is acceptable for the site's content to be indexed. Many AI models based on publicly available datasets could draw upon such an approach. Metadata and metadata automation could be used to facilitate an efficient opt-out approach that occurs before data is mined for training models.
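As a sketch of that existing mechanism, Python's standard library already parses robots.txt; a crawler gathering training data could consult a site's rules before fetching each page. The user-agent name and URLs below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt body; in practice a crawler would fetch this from
# https://example.com/robots.txt before mining the site for training data.
robots_txt = """\
User-agent: ai-training-bot
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The crawler consults the parsed rules before ingesting each URL.
print(parser.can_fetch("ai-training-bot", "https://example.com/articles/1"))  # True
print(parser.can_fetch("ai-training-bot", "https://example.com/private/x"))   # False
```

Because this convention is already universally understood by web crawlers, extending it to AI training (for example, with an agreed-upon user-agent convention for training bots) would impose far less implementation burden than a new mechanism built from scratch.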

Licensure potentially slows innovation.

AIIM's recommendation is to mandate the inclusion of source information as metadata in legislation, as opposed to requiring licensure to use content. Restrictions on content stymie innovation and human creativity. Generative AI has the potential to build upon existing human knowledge at a rapid scale by connecting and combining data from a vast number of sources. Licensure could slow the potential of generative AI, whereas source citations would support and grow the use of generative AI by building trust in AI outputs amongst users while providing credit to the original copyright owners.

What's Next?

While there are no clear or enforceable regulations on AI at present, the regulatory landscape is rapidly evolving. Developers and users of AI should become acquainted with potential AI regulations to begin anticipating future requirements. IAPP provides an AI legislation tracker and OECD provides a live repository of national AI policies and strategies.

If you are an AI developer, this is a good time to proactively assess how your company is acquiring and tracking content and learn more about information management. AIIM can help any organization develop robust, intelligent information management programs. Learn more about membership.