OpenAI's unveiling of GPTBot, its latest web crawler, has stirred anticipation for the upcoming release of GPT-5, a name the company has recently filed to trademark.
The move, though aimed at enhancing AI training, has raised questions about consent and transparency.
OpenAI has introduced GPTBot to amass broader data sources for its next-generation AI systems.
The company intends to expand its dataset while taking steps to address privacy concerns and copyright issues.
GPTBot is designed to collect publicly accessible data from websites, adopting an opt-out system similar to those of popular search engines like Google, Bing, and Yandex.
It assumes a site's data is usable unless the site owner adds a "Disallow" rule for the crawler to the site's robots.txt file.
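As a sketch of how this opt-out works, a site owner who wants to exclude the crawler entirely can add the following group to the robots.txt file at the site's root, using the `GPTBot` user-agent token OpenAI has published for the crawler:

```
User-agent: GPTBot
Disallow: /
```

Sites that do nothing are treated as opted in, which is exactly the default that critics of the approach object to.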
OpenAI asserts that GPTBot will proactively scan gathered data to remove sensitive information and content that violates their policies.
Some technology ethicists express reservations about the opt-out approach, noting potential consent-related challenges.
While some users support OpenAI's need for comprehensive data, others voice concerns about proper attribution and transparency, comparing the practice to derivative works without citation.
The application for the trademark "GPT-5" adds weight to the assumption that OpenAI is preparing their next AI model for release.
This step suggests a shift towards a more expansive data collection approach, emphasizing the importance of updated and diverse training data.
ChatGPT already draws over 1.5 billion monthly active users.
Restricting GPTBot Access
Website owners who want to limit GPTBot's access to their site can do so by editing their robots.txt file.
They can block GPTBot from the entire website, or grant partial access by specifying which directories the crawler may and may not visit.
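As an illustration of the partial-access case, the robots.txt sketch below admits GPTBot to one directory while keeping it out of another; the directory names are hypothetical placeholders, not paths from OpenAI's documentation:

```
User-agent: GPTBot
Allow: /blog/
Disallow: /members/
```

Because the rules are grouped under the `GPTBot` user agent, other crawlers are unaffected and continue to follow whatever rules apply to them elsewhere in the file.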