How OpenAI and Microsoft Stole Content from the New York Times

December 27, 2023

Picture by: https://library.rice.edu/features/unlimited-access-new-york-times

Introduction

Hi, I’m Fred, a journalist and an AI enthusiast. I’ve been following the developments of artificial intelligence (AI) for a long time, and I’m fascinated by its potential to transform various industries and domains, including journalism and media. However, I’m also aware of the challenges and risks that AI poses, especially when it comes to ethical and legal issues.

One of the most controversial cases of AI in journalism is the recent lawsuit filed by The New York Times against OpenAI and Microsoft, two of the leading companies in the field of generative AI. The lawsuit accuses them of using the Times’ content without permission to train their AI models, chiefly the large language models behind ChatGPT, OpenAI’s AI text generator, and Microsoft’s Copilot (formerly Bing Chat). The lawsuit claims that this violates the Times’ intellectual property rights, and that it could harm the quality and credibility of journalism and create unfair competition.

In this article, I will explore the background and details of this lawsuit, and examine the ethical and legal implications of using generative AI to create journalism. I will also discuss some of the possible solutions and best practices to address these issues, and how journalists and media outlets can leverage AI in a responsible and beneficial way.

AI in L&D — Image by: https://www.ibm.com/blog/what-is-generative-ai-what-are-foundation-models-and-why-do-they-matter/

What is generative AI and how does it work?

Generative AI is a branch of AI that focuses on creating new content, such as text, images, audio, or video, based on existing data. It relies on deep learning, a technique that trains neural networks on large amounts of data. Neural networks are mathematical models loosely inspired by the way neurons in the brain connect. During training, a network learns the patterns and features of its data; afterwards, it can generate new content that resembles that data, usually without being identical to it.
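To make the learn-then-generate idea concrete, here is a toy sketch in Python. It is not how ChatGPT or any neural network works internally: it swaps the network for a much simpler word-pair (bigram) counter, and the corpus, function names, and seed are all made up for illustration. The two phases are the same in spirit, though: first record patterns in existing text, then sample new text from those patterns.

```python
import random
from collections import defaultdict

def learn_patterns(text):
    """'Training': record which words follow each word in the text."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers

def generate(followers, start, length, seed=0):
    """'Generation': repeatedly sample a word that followed the current one."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    output = [start]
    for _ in range(length - 1):
        options = followers.get(output[-1])
        if not options:  # the last word was never followed by anything
            break
        output.append(rng.choice(options))
    return " ".join(output)

# Hypothetical miniature corpus, standing in for the training data.
corpus = "the court heard the case and the court set a date for the case"
model = learn_patterns(corpus)
print(generate(model, "the", 8))
```

Every word pair in the output was seen somewhere in the corpus, so with this little data the generated text tends to stay very close to the original. Real models are trained on vastly more text and generalize far beyond word pairs.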

One of the most popular and powerful generative AI models is ChatGPT, developed by OpenAI, an AI research company backed by Microsoft and other investors. ChatGPT is an AI text generator that can produce coherent and fluent text on almost any topic from a short prompt. It can write anything from essays, speeches, and stories to code, lyrics, and jokes. It is based on a neural network architecture called the Transformer, which can process large amounts of text and learn the relationships between words and sentences. Its underlying models were trained on a massive corpus of text from the internet, including news articles, books, blogs, social media posts, and more.

Another generative AI model developed by OpenAI is DALL-E, an AI image generator that can create realistic and diverse images based on text descriptions. For example, DALL-E can generate images of “a cat wearing a suit and tie”, “a snail made of harp”, or “a painting of a capybara sitting in a field at sunrise”. The original DALL-E was also based on the Transformer architecture, and was trained on a large dataset of text and image pairs from the internet.

What is the lawsuit about and why is it important?

The lawsuit filed by The New York Times against OpenAI and Microsoft is one of the first and most high-profile legal battles over the use of generative AI in journalism. The lawsuit alleges that OpenAI and Microsoft have infringed on the Times’ intellectual property rights by using the Times’ content to train their AI models, without obtaining the Times’ consent or paying any fees. The lawsuit claims that this amounts to “unlawful copying, distribution, and public display” of the Times’ content, and that it violates the Times’ terms of service, which prohibit the use of its content for the development of any software program, including AI systems.

The lawsuit also argues that the use of the Times’ content by OpenAI and Microsoft could damage the quality and credibility of journalism and create unfair competition. It states that generative AI models like ChatGPT could use the Times’ content to produce text that is similar to or derived from the Times’ original reporting and writing, and that could mislead, confuse, or deceive the public. It also asserts that such models could create new content that competes with or substitutes for the Times’ products and services, harming the Times’ reputation, brand, and revenue.

The lawsuit seeks an injunction to stop OpenAI and Microsoft from using the Times’ content to train their AI models, and to delete or destroy any copies of the Times’ content that they have obtained or generated. The lawsuit also seeks monetary damages, including statutory damages, which under U.S. copyright law can reach $150,000 per infringed work when the infringement is willful, as well as punitive damages and attorney’s fees.

The lawsuit is important because it raises several ethical and legal questions about the use of generative AI to create journalism, such as:

Who owns the intellectual property rights in content generated by AI models, and who is responsible for its accuracy, quality, and impact?
How can journalists and media outlets protect their content from being used without permission or compensation by AI companies or other parties?
How can journalists and media outlets ensure the transparency and accountability of the content generated by AI models, and how can they prevent or detect the misuse or abuse of AI-generated content?
How can journalists and media outlets leverage AI to enhance their reporting and storytelling, and to serve the public interest, without compromising their ethical standards and values?

OpenAI Venture — Photo by Mariia Shalabaieva on Unsplash

What are some of the possible solutions and best practices?

The lawsuit filed by The New York Times against OpenAI and Microsoft could set a precedent for the future of generative AI in journalism and influence the development of new laws and regulations on AI. However, litigation is not the only way to address the ethical and legal issues of using generative AI to create journalism. Journalists and media outlets can also adopt practices that help them use AI in a responsible and beneficial way, such as:

Establishing clear and consistent policies and guidelines on the use of AI-generated content, and communicating them to the public and other stakeholders.
Obtaining consent from, and paying any licensing fees to, the original content creators or owners before using their content to train AI models, and respecting their terms of service and intellectual property rights.
Disclosing the use of AI-generated content to readers and viewers, including the sources and methods behind it and the level of human involvement and verification.
Verifying and fact-checking the content generated by AI models, correcting any errors or inaccuracies, and acknowledging any limitations or uncertainties.
Monitoring and moderating the content generated by AI models, preventing or removing any harmful, offensive, or inappropriate content, and reporting any illegal or unethical activities.
Evaluating and measuring the impact and value of the content generated by AI models, and seeking feedback and input from audiences and experts.
Collaborating with other journalists and media outlets, as well as AI companies and researchers, to share best practices and to develop common standards and norms on the use of AI-generated content.
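One machine-readable way to express the consent principle in the list above is a robots.txt file, which tells crawlers what a site allows them to fetch. The sketch below uses Python's standard urllib.robotparser module to check such a file before collecting a page. The robots.txt content, the site address, and the may_crawl helper are hypothetical; GPTBot is the user-agent name OpenAI publishes for its web crawler. Compliance with robots.txt is voluntary, so this shows how a well-behaved crawler could check for permission, not a technical barrier against copying.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative news site
# (not any real outlet's file).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets this crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical article address on a placeholder domain.
article_url = "https://news.example.com/2023/12/27/some-article"

print(may_crawl(ROBOTS_TXT, "GPTBot", article_url))        # False: blocked site-wide
print(may_crawl(ROBOTS_TXT, "SomeOtherBot", article_url))  # True: falls under the * rule
```

A crawler that respects the publisher's wishes would skip any URL for which this check returns False.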

Conclusion

Generative AI is a powerful and promising technology that can create new content, such as text and images, based on existing data. However, generative AI also poses ethical and legal challenges, especially when it comes to journalism and media. The lawsuit filed by The New York Times against OpenAI and Microsoft is one of the first and most high-profile legal disputes over generative AI and journalism, and it highlights the issues of intellectual property rights, the quality and credibility of journalism, and unfair competition.

Litigation is not the only way to address these issues, though. Journalists and media outlets can also adopt practices that help them use generative AI in a responsible and beneficial way: establishing clear policies and guidelines, obtaining consent and paying fees, disclosing the use of AI, verifying and fact-checking, monitoring and moderating, evaluating and measuring, and collaborating with peers and AI developers. By following these principles, journalists and media outlets can leverage AI to enhance their reporting and storytelling, and to serve the public interest, without compromising their ethical standards and values.