AI Researchers Discover Method to Jailbreak Bard and ChatGPT
Researchers from Carnegie Mellon University in Pittsburgh and the Center for AI Safety in San Francisco have reportedly discovered a method to bypass the safety measures in place for AI chatbots like Google’s Bard and OpenAI’s ChatGPT, according to a Business Insider report.
The researchers have effectively “jailbroken” the chatbots, a term usually associated with software modifications that grant total system access. They took jailbreak tools typically used against open-source AI models and applied them to closed systems such as ChatGPT.
One primary tactic used in this process is an automated adversarial attack. By appending extra characters to the end of a user query, the researchers were able to get past the protective barriers installed by Google and OpenAI. This strategy could potentially be used to prompt the chatbots into generating harmful or misleading content.
According to the researchers, this method is entirely automated and could enable an almost limitless number of similar attacks. Google, OpenAI, and Anthropic have all been informed about these techniques.
In response, a Google spokesperson said that while such issues affect all large language models, the company has built critical safety features into Bard. The spokesperson added that these protections would continue to be improved over time.
Nevertheless, the researchers expressed some uncertainty about whether the companies developing AI will be able to fend off such attacks.