Claimed to be an AI Agent tool that can “take over computers and liberate hands,” OpenClaw has recently become extremely popular in the tech circle.
It is praised as an “AI worker,” seemingly capable of writing articles, sending emails, or even ordering coffee on command. But is that really the case? Is it a genuine productivity powerhouse, or just a “toy” for tech enthusiasts to tinker with?
Recently, reporters from Daily Economic News, together with developers from Meiri Technology, conducted an in-depth test. We connected OpenClaw to five domestic large models—Qwen3-Max, Kimi-K2.5, MiniMax-M2.1, MiniMax-M2.5, and Zhipu GLM-4.7—as well as OpenAI’s GPT-5-mini, and tasked them with local file retrieval, web searches, article writing, and email sending, aiming to reveal the true capabilities of this “conductor.”
The results showed that some models performed poorly, especially in steps requiring browser control, such as web searches and email sending, where most failed. Experts bluntly stated that current OpenClaw is not only difficult to use and expensive but also a security nightmare.
Comparison of real tests: GPT-5-mini, MiniMax, and Zhipu complete the tasks, while the other two large models lack “actionability”
OpenClaw itself is not a large model; it functions more like a “conductor,” responsible for receiving user commands, calling tools, organizing workflows, and delegating understanding and specific tasks to the external large models it connects to.
Therefore, the capabilities, stability, and expression of the connected large models determine the final success or failure of the tasks.
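The “conductor” division of labor described above can be sketched in a few lines: the agent shell owns the tools and the execution loop, while the connected model only decides what to do next. This is an illustrative sketch of the pattern, not OpenClaw’s actual code; all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ToolCall:
    name: str       # which tool the model wants, e.g. "search"
    argument: str   # the argument it passes to that tool

def run_task(command: str,
             pick_next_step: Callable[[str, list], Optional[ToolCall]],
             tools: dict) -> list:
    """The 'conductor' loop: ask the model for the next tool call,
    execute it locally, feed the result back, stop when the model
    returns None. The model never touches the machine directly."""
    history: list = []
    while (call := pick_next_step(command, history)) is not None:
        result = tools[call.name](call.argument)  # conductor runs the tool
        history.append(f"{call.name}: {result}")
    return history
```

In this framing, a weak model fails not because the loop is broken but because it picks wrong tools or malformed arguments, which matches the test results below.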
Current large models supported by OpenClaw (Image source: OpenClaw configuration interface)
To better simulate real work scenarios, testers set a comprehensive task:
Have OpenClaw, connected to each large model in turn, locate a transcript of an interview with “Electric Vehicle Guru” Andy Palmer on the computer, summarize its content, combine the summary with web search results to write a feature article, and finally email the article to a specified address.
This task involves understanding commands, local file retrieval, browser control for web searches, information integration, article writing, and application control across multiple dimensions.
In the first round of testing, the performance of each model varied significantly.
● OpenClaw + Qwen3-Max
Qwen3-Max was tested first, and it struggled with local file retrieval. Even when testers explicitly pointed out the file’s location on the computer, Qwen3-Max spent about five minutes searching and still could not locate it accurately.
In subsequent isolated tests for email sending, Qwen3-Max also failed to execute, merely repeating commands without actual action.
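Locating a file by name is, in principle, a deterministic operation. For illustration, here is a minimal directory-walk fallback of the kind an agent tool could call instead of guessing paths; this is a sketch, not OpenClaw’s implementation.

```python
import os
from typing import Optional

def find_file(root: str, filename: str) -> Optional[str]:
    """Walk the directory tree under `root` and return the full path of
    the first file whose name matches `filename`, or None if absent."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            return os.path.join(dirpath, filename)
    return None
```

A tool like this either finds the file or reports that it does not exist, so a five-minute fruitless search suggests the model never issued such a deterministic lookup.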
● OpenClaw + Kimi-K2.5
Kimi-K2.5 performed slightly better, successfully retrieving the file within five minutes and summarizing its content. However, when performing web searches to supplement industry news, it triggered a “429 error” (usually indicating too many requests), failing to complete the search.
In the email sending step, Kimi-K2.5 could not successfully control the browser to send an email to the specified address.
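An HTTP 429 (“too many requests”) response is ordinarily handled by waiting and retrying with increasing delays rather than abandoning the task. A minimal sketch of that retry-with-backoff pattern follows; it is illustrative, not a description of how Kimi-K2.5 or OpenClaw actually reacts.

```python
import time

def with_backoff(request, max_retries: int = 4, base_delay: float = 1.0):
    """Call `request()` (any callable returning (status_code, body)).
    On a 429 status, sleep base_delay * 2**attempt seconds and retry,
    up to max_retries times; otherwise return immediately."""
    for attempt in range(max_retries + 1):
        status, body = request()
        if status != 429:
            return status, body
        if attempt < max_retries:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return status, body
```

An agent that applied even this simple policy would usually ride out transient rate limits instead of failing the search step outright.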
● OpenClaw + MiniMax-M2.1
MiniMax-M2.1 faced no obvious obstacles in file retrieval, web search, or writing. During email sending, it encountered difficulties controlling the browser but did not get stuck; instead, it proactively suggested a feasible solution.
Testers manually operated based on its hints, successfully resolving the issue and enabling email sending.
However, the email sent by MiniMax-M2.1 contained only “key quotes” from the article, not the full text.
● OpenClaw + MiniMax-M2.5
Although both come from MiniMax, MiniMax-M2.5, released on February 12, performed better than M2.1: it completed file retrieval, web search, writing, and email sending without manual intervention.
● OpenClaw + Zhipu GLM-4.7
Since OpenClaw has not yet integrated Zhipu’s latest model GLM-5 released on February 12, this test used GLM-4.7.
Results showed that during email sending, GLM-4.7 would input an incorrect email URL in the browser, causing page access failure and requiring manual correction.
Apart from that, GLM-4.7 processed other steps relatively quickly.
● OpenClaw + GPT-5-mini
GPT-5-mini performed more stably and smoothly. From file retrieval, content summarization, web search, data supplementation to email sending, almost no manual intervention or prompts were needed, with only occasional network instability.
To ensure rigorous testing, the entire process was repeated twice.
Second round of testing results:
● Kimi-K2.5: Successfully retrieved and read local files and supplemented them with web search data, but still failed at email sending. Its own report cited problems reading the webmail page’s code and locating the input-box nodes.
● Qwen3-Max: Successfully read files and supplemented web data, but experienced significant lag during email sending and failed.
● MiniMax-M2.1/2.5: Completed all steps.
● GLM-4.7: Completed all steps.
● GPT-5-mini: Completed all steps.
Third round of testing results:
● Kimi-K2.5: Successfully retrieved local files but encountered issues during web search (errors in reading webpage content, incorrect website URLs, inability to understand browser console commands), and still failed at email sending.
● Qwen3-Max: Read files successfully but could not control the browser for web searches, and failed at email sending.
● MiniMax-M2.1/2.5: Completed all steps.
● GLM-4.7: Completed all steps.
● GPT-5-mini: Completed all steps.
Industry perspective: OpenClaw’s upper limit depends on the large model it connects to; it has not yet become a qualified productivity tool.
This conclusion is also widely recognized in the industry.
A programmer who uses OpenClaw to help run an online store, designing posters and coupons, told the Meiri reporter that he usually connects it to OpenAI’s Codex-5.3 and Google’s Gemini 3 Pro, which in his experience far outperform domestic large models.
Several industry insiders and experienced users pointed out that OpenClaw is more like a “task framework,” whose ultimate performance heavily depends on the capabilities of the connected large models. Like a commander with clear instructions but limited ability, the strength of its “soldiers” (the large models) directly determines the outcome of the campaign.
Huan Jiazhen, head of research at Feifan Product, told Meiri: “The impact of models on OpenClaw really depends on task complexity. Top international large models have higher ceilings, but for ordinary tasks, domestic models like Zhipu GLM-4.7 and Kimi-K2.5 are quite good; after all, Claude is too expensive for most budgets.”
Although some large models show potential for complex tasks, OpenClaw still has a long way to go to become a truly effective productivity tool.
Zhang He, a former Xiaomi OS AI product expert and now founder of the overseas AI application company ExcelMaster.ai, said bluntly in an interview with Meiri: “OpenClaw in its current version is not a qualified productivity tool.” He sees OpenClaw as, to some extent, a wrapper around Anthropic’s popular programmer tool Claude Code: better packaged, with a chat interface and built-in skills, but no stronger in core capability. “I haven’t found many things OpenClaw can do that Claude Code can’t, and its data querying isn’t as good as Claude Code’s.”
He added, “Once large model capabilities improve further, OpenClaw will become better and more widespread. Even if it does nothing but wait for new models, the barrier will lower.” Zhang emphasized that the progress and adoption of OpenClaw fundamentally depend on breakthroughs in underlying large model technology.
Dr. Zhang Lu, a cloud and AI product manager at Akamai, shared a similar view. He believes that for OpenClaw to be used in real production, it must undergo secondary development and fine-tuning, as the current version is still “a bit immature and often stalls.”
High barriers, high costs, and high risks discourage ordinary users
In addition to dependence on large model capabilities, technical barriers, usage costs, and security risks make OpenClaw difficult for general users.
First, the deployment and usage require significant technical expertise. OpenClaw currently does not offer a “one-click” simplified deployment solution; users must operate via command line on their computers to configure local settings, dependencies, and permissions. Meiri’s tech developers said the process demands a certain technical background, at least basic development experience, which undoubtedly discourages most non-technical users. While cloud providers like Alibaba Cloud, Tencent Cloud, and Amazon Cloud offer cloud deployment services for OpenClaw, claiming easy setup on their configured servers, these cloud deployments do not provide control over the user’s local computer.
High costs are another reality. Since OpenClaw frequently calls large models during tasks, token consumption is huge, making it a “token burner.” Some users told Meiri that just 20 interactions with Zhipu GLM-4.7 cost around 200 yuan.
Dr. Zhang Lu also mentioned that he spent dozens of yuan in a day using DeepSeek. For more powerful models, bills could be staggering—hundreds of yuan daily.
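Why do bills climb so fast? Each user interaction typically fans out into many model calls, and each call re-sends the accumulated context, so token consumption grows far faster than the number of commands. The back-of-envelope sketch below uses purely illustrative numbers (the per-token rate, call count, and context size are assumptions, chosen only to show how 20 interactions could plausibly reach a 200-yuan bill):

```python
def estimate_cost(interactions: int,
                  calls_per_interaction: int,
                  avg_tokens_per_call: int,
                  yuan_per_1k_tokens: float) -> float:
    """Rough agent-usage cost: total tokens = interactions x internal
    model calls x average context size per call, priced per 1k tokens."""
    total_tokens = interactions * calls_per_interaction * avg_tokens_per_call
    return total_tokens / 1000 * yuan_per_1k_tokens

# Assumed figures: 20 interactions, ~25 internal calls each,
# ~20k tokens of context per call, 0.02 yuan per 1k tokens.
cost = estimate_cost(20, 25, 20_000, 0.02)  # 200.0 yuan
```

Under these assumptions, 20 interactions consume ten million tokens; the point is the multiplication, not the specific prices.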
High costs force many users to opt for free or cheaper models, but this impacts performance. Some users reported that they chose Qwen-8B due to cost, but OpenClaw only answered questions without executing commands.
More concerning than high barriers and costs are the inherent security risks. Since OpenClaw is designed to “do things” rather than just chat, it requires high system permissions to control local files and applications.
Amy Chang, head of Cisco’s AI Threat Research and Security Team, bluntly said, “From a security perspective, OpenClaw is a nightmare,” as it can run shell commands, read/write files, and execute scripts on the user’s machine with high privileges. If misconfigured or exploited by malicious instructions, the consequences could be disastrous.
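One standard mitigation for this class of risk is to refuse any shell command whose program is not on an explicit allowlist, so that even a hijacked model cannot run arbitrary binaries. A minimal sketch follows; the allowlist contents are illustrative, not a production recommendation, and OpenClaw is not known to work this way.

```python
import shlex

# Example read-only allowlist; real deployments would tailor this.
ALLOWED = {"ls", "cat", "grep", "find"}

def is_permitted(command: str) -> bool:
    """Return True only if the command's program is on the allowlist.
    Malformed quoting is refused outright rather than guessed at."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False
    return bool(tokens) and tokens[0] in ALLOWED
```

With such a gate, a prompt-injected “rm -rf” or script download is rejected before it ever reaches the shell.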
Cybersecurity firm Dvuln’s founder Jamieson O’Reilly demonstrated this risk, discovering vulnerabilities that could allow attackers to access months of private messages, account credentials, API keys, and other sensitive information. Even more alarming, bank accounts, crypto wallets, and API keys stored in plaintext for AI tasks could be stolen in seconds if hacked.
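A basic hygiene measure against the plaintext-credential problem is to keep secrets in the process environment rather than in files the agent can read and resend. The sketch below shows the pattern; the variable name is hypothetical.

```python
import os

def load_api_key(env_var: str = "AGENT_API_KEY") -> str:
    """Read a credential from the environment instead of a plaintext
    file in the agent's workspace; fail loudly if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; refusing to continue")
    return key
```

This does not stop a fully compromised process, but it keeps credentials out of the file tree an agent routinely reads, writes, and quotes back to the model.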
OpenClaw’s developer Peter Steinberger admitted that it is a free, amateur open-source project requiring careful configuration to ensure security. He clearly stated, “It’s not suitable for non-technical users.”
(Article source: Daily Economic News)
Is the so-called "AI Worker" OpenClaw worth using? In-depth testing by a reporter: can't find files, search errors, email sending gets stuck!