A pair of researchers have proven that Anthropic’s downloadable demo of its generative AI model Claude for developers completed an online transaction requested by one of them — in seemingly direct violation of the AI’s accumulated learning and baseline programming.

Sunwoo Christian Park, a researcher, Waseda School of Political Science and Economics in Tokyo and Koki Hamasaki, a research student at Bioresource and Bioenvironment at Kyushu University in Fukuoka, Japan found the discovery as part of a project evaluating the safeguards and ethical standards surrounding various AI models.

“Starting next year, AI agents will increasingly perform actions based on prompts, opening the door to new risks. In fact, many AI startups are planning to implement these models for military uses, which adds an alarming layer of potential harm if these agents can be easily exploited through prompt hacking,” explained Park in an email exchange.

In October, Claude was the first generative AI model that could be downloaded to a user’s desktop as demo for developer use. Anthropic assured developers — and users who jumped through the techie hoops to get the Claude download onto their systems — that the generative AI would take limited control of desktops to learn basic computer navigation skills and search the internet.

However, within two hours of downloading the Claude demo, Park says that he and Hamasaki were able to prompt the generative AI to visit Amazon.co.jp — the localized Japanese storefront of Amazon using this single prompt.

Not only were the researchers able to get Claude to visit the Amazon.co.jp website, locate a product and enter the product in the shopping cart — the basic prompt was enough to get Claude to ignore its learnings and algorithm — in favor of finishing the purchase.

A three-minute video of the entire transaction can be viewed below.

It’s interesting to see at the end of the video the notification from Claude alerting the researchers that it had completed the financial transaction — deviating from its underlying programming and aggregated training.

“Although we do not yet have a definitive explanation for why this worked, we speculate that our ‘jp.prompt hack’ exploits a regional inconsistency in Claude’s compute-use restrictions,” explained Park.

“While Claude is designed to restrict certain actions, such as making purchases on .com domains (e.g., amazon.com), our testing revealed that similar restrictions are not consistently applied to .jp domains (e.g., amazon.jp). This loophole allows unauthorized real world actions that Claude’s safeguards are explicitly programmed to prevent, suggesting a significant oversight in its implementation,” he added.

The researchers point out that they know that Claude is not supposed to make purchases on behalf of people because they asked Claude to make the same purchase on Amazon.com — the only change in the prompt was the URL for the U.S. storefront versus the Japan storefront. Here was the response Claude provided for the specific Amazon.com query.

The full video of the Amazon.com purchase attempt by researchers using the same Claude demo can be viewed below.

The researchers believe the issue is related to how the AI identifies various websites as it clearly differentiated between the two retail sites in different geographies, however, it’s unclear as to what may have triggered Claude’s inconsistent actions.

“Claude’s compute-use restrictions may have been fine tuned for .com domains due to their global prominence, but regional domains like .jp might not have undergone the same rigorous testing. This creates a vulnerability specific to certain geographic or domain-related contexts,” wrote Park.

“The absence of uniform testing across all possible domain variations and edge cases may leave regionally specific exploits undetected. This underscores the difficulty of accounting for the vast complexity of real world applications during model development,” he noted.

Anthropic did not provide comment to an email inquiry sent Sunday evening.

Park says that his current focus is on understanding if similar vulnerabilities exist across different e-commerce websites as well as raising awareness regarding the risks of this emerging technology.

“This research highlights the urgency of fostering safe and ethical AI practices. The evolution of AI technology is moving quickly, and it’s crucial that we don’t just focus on innovation for innovation’s sake, but also prioritize the safety and security of users,” he wrote.

“Collaboration between AI companies, researchers, and the broader community is vital to ensure that AI serves as a force for good. We must work together to make sure that the AI we develop will bring happiness, enhance lives, and not cause harm or destruction,” concluded Park.

Share.

Leave A Reply

Exit mobile version