DeepSeek Introduces Transparent AI
China-based AI company DeepSeek has unveiled its latest AI system, DeepSeek-R1-Lite-Preview, marking a significant advance in reasoning and problem-solving capability.
The system, positioned as a competitor to OpenAI's o1, sets itself apart by enhancing transparency and improving the way it processes complex queries.
Unlike traditional models, which often gloss over nuance, DeepSeek-R1-Lite-Preview allocates more time to fact-checking and to considering a question thoroughly, reducing common errors.
Like OpenAI's o1, it plans its responses step by step, spending up to tens of seconds on complex queries to improve accuracy.
Commentators have pointed out the irony that a China-based lab is leading on transparency, openly displaying its model's reasoning while Western models still keep much of theirs hidden from users.
DeepSeek's latest version has already demonstrated impressive results on problem-solving benchmarks like the American Invitational Mathematics Examination (AIME) and MATH, which assess mathematical and logical proficiency.
This performance positions DeepSeek-R1 as a serious contender to OpenAI's ChatGPT and its specialised o1 model.
With generative AI advancing rapidly, the release of DeepSeek-R1-Lite-Preview and recent updates to Mistral AI's Le Chat signal growing competition in the AI space, pushing companies to address weaknesses and deliver more robust, transparent systems.
DeepSeek Wins in Step-by-Step Reasoning
DeepSeek highlights its AI's ability to provide step-by-step real-time reasoning, enhancing transparency and allowing users to better understand its thought process.
In addition to this feature, the company plans to open-source the model and release an API for developers in the near future.
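Since that API has not shipped yet, any integration details are necessarily speculative. Purely as an illustration, the sketch below assumes an OpenAI-compatible endpoint and a hypothetical model name, a convention many model providers follow, and streams the response so the step-by-step reasoning is visible as it arrives:

```python
# Illustrative sketch only: DeepSeek's API was not public at the time of
# writing. The base_url and model name are assumptions modelled on the
# OpenAI-compatible convention that many providers adopt.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

# Stream the completion so the model's reasoning is printed token by
# token, mirroring the real-time reasoning shown in DeepSeek's chat UI.
stream = client.chat.completions.create(
    model="deepseek-r1-lite-preview",  # hypothetical model identifier
    messages=[{"role": "user", "content": "If 3x + 7 = 22, what is x?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```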
A recent comparison chart shared by AI expert Andrew Curran shows DeepSeek-R1-Lite-Preview outperforming competitors such as OpenAI's o1-preview and Claude 3.5 Sonnet on several benchmarks, scoring 52.5 on AIME, 1450 on Codeforces, and 91.6 on the advanced problem-solving benchmark MATH-500.
It trails on GPQA Diamond (58.5) and Zebra Logic (56.6), however, where o1-preview scores 73.3 and 71.4 respectively.
These figures suggest that while DeepSeek's AI shows significant promise in certain advanced reasoning domains, there remains room for improvement in general knowledge and logical reasoning.
AI Models from Major Labs Show Slowing Gains
DeepSeek's AI has raised concerns due to its vulnerability to being jailbroken, allowing users to prompt the model in ways that bypass its safeguards.
For instance, one X (formerly known as Twitter) user successfully prompted the AI to provide a detailed meth recipe.
On the other hand, DeepSeek-R1 is notably sensitive to political queries, particularly those related to Chinese leadership, events like Tiananmen Square, or contentious geopolitical topics like Taiwan.
This behaviour likely stems from regulatory pressure in China, where AI models are required to adhere to the government's "core socialist values" and undergo scrutiny by the country's internet regulator.
Reports indicate that AI systems in China are often restricted from using certain sources, resulting in models that avoid responding to politically sensitive topics to ensure compliance with state mandates.
As these regulatory challenges unfold, the broader AI community is re-evaluating the long-standing concept of "scaling laws."
This theory posited that increasing data and computing power would continuously improve a model's performance.
However, recent reports suggest that models from major labs like OpenAI, Google, and Anthropic are no longer showing the rapid advancements they once did.
This shift has sparked a search for alternative AI approaches, architectures, and techniques, including test-time compute—an innovation seen in models like o1 and DeepSeek-R1.
Also known as inference compute, this method grants models additional processing time during task completion, offering a potential pathway to overcome the limitations of traditional scaling methods.
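To make this concrete, one well-known test-time compute technique is self-consistency sampling: query the model several times and take the most common final answer. The sketch below is a toy illustration of that general idea, with `generate` as a hypothetical stand-in for a real model call; it does not reflect the undisclosed internals of o1 or DeepSeek-R1.

```python
# Toy sketch of self-consistency, one test-time (inference) compute
# technique: spend extra compute sampling several reasoning chains,
# then majority-vote the final answer. This illustrates the general
# idea only; neither DeepSeek nor OpenAI has documented its mechanism.
import random
from collections import Counter

def generate(prompt: str) -> str:
    # Hypothetical stand-in for a real model call: simulates a model
    # whose reasoning reaches the correct answer 70% of the time.
    answer = "5" if random.random() < 0.7 else "6"
    return f"step-by-step reasoning...\nANSWER: {answer}"

def extract_answer(completion: str) -> str:
    # Assumes the prompt asks the model to finish with "ANSWER: <value>".
    return completion.rsplit("ANSWER:", 1)[-1].strip()

def self_consistency(prompt: str, n_samples: int = 8) -> str:
    # More samples means more inference-time compute and, usually, a
    # better chance of landing on the right answer.
    answers = [extract_answer(generate(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("If 3x + 7 = 22, what is x?"))  # usually "5"
```

Each additional sample buys accuracy with compute rather than with more training data or parameters, which is precisely the trade-off test-time compute methods exploit.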
The model's evasiveness extends beyond politics: asked whether it is better than OpenAI's ChatGPT, it dodged the question.
Diving into DeepSeek
DeepSeek, a company with plans to open-source its DeepSeek-R1 model and release an API, operates in a fascinating niche within the AI landscape.
Backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that leverages AI for trading decisions, DeepSeek's approach is both ambitious and strategic.
One of its early releases, the general-purpose DeepSeek-V2 model, prompted major competitors like ByteDance, Baidu, and Alibaba to lower their model usage fees and even make certain services entirely free.
High-Flyer, known for its sizable investments in AI infrastructure, builds its own server clusters for model training.
The latest iteration reportedly comprises 10,000 Nvidia A100 GPUs and cost close to 1 billion yuan (~$138 million).
Founded by computer science graduate Liang Wenfeng, High-Flyer aims to push the boundaries of AI with DeepSeek, targeting the development of "superintelligent" systems that could redefine the field.