Rising Opt-Out Trend Among Major Publishers
Many prominent publishers and social platforms are opting to exclude their data from Apple’s AI training.
This development comes less than three months after Apple introduced Applebot-Extended, a tool designed to give website owners the ability to opt out of having their data used to train Apple's AI models.
High-profile entities such as Facebook, Instagram, Craigslist, Tumblr, The New York Times, The Financial Times, The Atlantic, Vox Media, and WIRED’s parent company, Condé Nast, have taken advantage of this option.
The New York Times was among the first to block the new agent.
This reaction underscores a growing conflict over the use of web data to train AI systems and marks a shift in how web crawlers, long used to gather information for search engines and other internet services, are perceived.
The Evolution of Applebot and the Emergence of Applebot-Extended
Applebot, originally launched in 2015, was designed to enhance Apple's search functionalities, including Siri and Spotlight.
However, as Apple's AI initiatives expanded, so did the purpose of Applebot.
The data it collected began to be used to train Apple's foundation models.
To address concerns from publishers and content creators about how their data was being utilised, Apple introduced Applebot-Extended.
This new extension allows website owners to specifically request that their data not be used for AI training purposes.
Unlike the original Applebot, which continues to crawl websites for search functions, Applebot-Extended does not crawl pages at all; it simply signals whether data already gathered by Applebot may be used to train Apple's AI models.
Publisher Reactions and Data Insights
The reaction to Applebot-Extended has been significant, with many publishers opting to block it.
Data from Ontario-based AI-detection startup Originality AI shows that, as of last week, about 7 percent of high-traffic websites—primarily news and media outlets—were blocking Applebot-Extended.
This week, an analysis by Dark Visitors revealed that approximately 6 percent of websites had blocked the bot.
This relatively low percentage indicates that many website owners either do not yet perceive a conflict or remain unaware of the option to exclude Applebot-Extended.
Ben Welsh, a data journalist, found that just over a quarter of the news websites he surveyed were blocking Applebot-Extended.
This compares to 53 percent of news sites blocking OpenAI’s bot and nearly 43 percent blocking Google’s AI-specific bot, Google-Extended.
Welsh notes that the number of sites blocking Applebot-Extended has been "gradually moving" upward, suggesting increasing awareness and action.
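Surveys like these typically work by downloading each site's robots.txt file and checking whether it disallows the relevant user agent. The following Python sketch shows how such a check could be done with the standard library's robotparser module; it is illustrative only, not the methodology used by Originality AI, Dark Visitors, or Welsh, and the domains shown are placeholders.

from urllib.robotparser import RobotFileParser

def blocks_agent(domain: str, agent: str = "Applebot-Extended") -> bool:
    # Fetch and parse the site's robots.txt, then check whether the
    # given user agent is allowed to fetch the homepage.
    # Note: robotparser's user-agent matching is approximate, so treat
    # the result as a rough signal rather than a definitive answer.
    parser = RobotFileParser()
    parser.set_url(f"https://{domain}/robots.txt")
    parser.read()
    return not parser.can_fetch(agent, f"https://{domain}/")

# Placeholder domains for illustration only.
for site in ["example.com", "example.org"]:
    print(site, "blocks Applebot-Extended:", blocks_agent(site))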
Strategic Decisions and Partnerships
The decisions made by major publishers to block or allow Applebot-Extended often reflect broader strategic considerations.
Condé Nast, for instance, previously blocked OpenAI’s web crawlers but unblocked them following a recent partnership announcement.
This move suggests a business strategy where data access is negotiated as part of commercial agreements.
Vox Media has similarly opted to block Applebot-Extended and other AI scraping tools unless a partnership is in place, emphasising their intent to protect the value of their published content.
In contrast, The New York Times, which is currently engaged in a lawsuit against OpenAI over copyright issues, has criticised the opt-out nature of Applebot-Extended.
Charlie Stadtlander, NYT’s director of external communications, pointed out:
"As the law and The Times' own terms of service make clear, scraping or using our content for commercial purposes is prohibited without our prior written permission."
This stance highlights the ongoing debate over how content rights and AI training intersect.
How to Opt Out of Applebot-Extended
For website owners looking to opt out of Apple's AI training, the process is straightforward and uses the standard robots.txt mechanism.
First, locate or create the robots.txt file on your website.
To block the original Applebot entirely, which also stops Apple from crawling your site for its search features, add the following lines:
User-agent: Applebot
Disallow: /
To opt out of AI training only, while remaining available to Apple's search crawler, block Applebot-Extended instead:
User-agent: Applebot-Extended
Disallow: /
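To make the distinction explicit, the two rules can be combined in a single file: Applebot remains allowed so pages stay in Apple's search results, while Applebot-Extended is disallowed so the crawled data is not used for AI training. A minimal example, assuming your site needs no other crawler rules, looks like this:

User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Disallow: /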
Lastly, save the file and upload it to the root directory of your website.
With the Applebot-Extended rule in place, Apple will not use your site's data to train its AI models, though your content will still be accessible for search functions.
As Apple explains:
"Applebot-Extended does not crawl webpages. Webpages that disallow Applebot-Extended can still be included in search results. Applebot-Extended is only used to determine how to use the data crawled by the Applebot user agent."
This adjustment in the digital landscape reflects a broader debate over data rights and the evolving role of AI in content creation and distribution.
The future will likely bring further developments as publishers, tech companies, and AI developers navigate these complex issues.