Researchers Develop AI Tool to Monitor and Prevent Harmful Outputs in Language Models

21/11/2023 01:20

According to Cointelegraph, a team of researchers from artificial intelligence (AI) firm AutoGPT, Northeastern University, and Microsoft Research have developed a tool that monitors large language models (LLMs) for potentially harmful outputs and prevents them from executing. The agent is described in a preprint research paper titled “Testing Language Model Agents Safely in the Wild.” The research states that the agent is flexible enough to monitor existing LLMs and can stop harmful outputs such as code attacks before they happen. Existing tools for monitoring LLM outputs for harmful interactions seemingly work well in laboratory settings, but when applied to testing models already in production on the open internet, they often fall short of capturing the dynamic intricacies of the real world. This is largely due to the existence of edge cases and the impossibility of researchers imagining every possible harm vector before it happens in the field of AI. Even when humans interacting with AI have the best intentions, unexpected harm can arise from seemingly innocuous prompts. To train the monitoring agent, the researchers built a dataset of nearly 2,000 safe human/AI interactions across 29 different tasks ranging from simple text-retrieval tasks and coding corrections to developing entire webpages from scratch. They also created a competing testing dataset filled with manually-created adversarial outputs, including dozens designed to be unsafe. The datasets were then used to train an agent on OpenAI’s GPT 3.5 turbo, a state-of-the-art system, capable of distinguishing between innocuous and potentially harmful outputs with an accuracy factor of nearly 90%.

Bullish

Bearish

Live Updates

Yesterday
Ethereum Surpasses 3900 USDT With Modest 24-Hour Gain
Bullish
Bearish
Yesterday
Ethereum Surpasses 3900 USDT Mark With Modest Gains
Bullish
Bearish
Yesterday
ETH breaks through 3900 USDT, up 0.34% in 24 hours
Bullish
Bearish
Yesterday
Greeks.live: Currently, ETF inflows are strong, and the strength of US stocks may be good for cryptocurrencies
Bullish
Bearish
Yesterday
Magic Eden Co-founder: ME token is just the beginning, there are more plans in the future
Bullish1
Bearish
Yesterday
Trump may use Bitcoin as US reserve asset on 'day one' — How high will BTC price go?
Bullish
Bearish
Yesterday
On-Chain Analysts Water Down Ether Bulls $5,000 Price Expectations This Year
Bullish
Bearish
Yesterday
Ripple Takes On SEC's 'Lawless Tactics' — Top Lawyer Alderoty Reveals
Bullish
Bearish
Yesterday
[이주의 토큰 언락] ADA, ARB, APE, ENA, ID
Bullish
Bearish
Yesterday
5 عمليات فتح رموز يجب مراقبتها الأسبوع المقبل
Bullish
Bearish

Researchers Develop AI Tool to Monitor and Prevent Harmful Outputs in Language Models

Live Updates

Trending News

Telegram’s Gaming Token Notcoin ($NOT) Falls 70% After Launch, Could Solana’s TapSwap $TAP Overtake?

Binance Shunned from Forbes' Top 20: CZ's Impending Jail Term a Factor?

DeGods NFT Founder Says, 'I Was Being a Pussy' for Leaving Solana During Its Crash

American CPI Shocks! Fed’s "Hawk-Dove Battle" Unfolds, Middle East Conflict Erupts, Gold Reclaims $2410, Bitcoin Bulls Explode

Two Chinese Nationals Arrested and Arraigned in the U.S. – What’s Going On?

Biden Might Reconsider SAB 121 Vote Veto Amid Political Support for Cryptocurrency

Bitcoin May Start a New Bull Market, No Deep Corrections Expected Under Strong Support

China, US, and Europe Take Action! USDT Struggles to Decouple from Illegal Forex Transactions; Kraken Evaluates Delisting European Tether Pairs

SEC Approval Soon? Coinbase Releases Significant Signal! Ether Surges Nearly $200

Is Global Web3 Integration Imminent or Will Web2 Dominate Forever?