AI Agents and Hacking Threats: The New Frontier of Cybersecurity

Artificial Intelligence, Latest, News, Security, Technology & AI

Artificial intelligence agents—software tools that can act on behalf of humans online—are being hailed as the next great leap in automation. These AI agents can buy plane tickets, manage calendars, make reservations, or fetch data in response to plain-language commands. But experts now warn that the same ability that makes them powerful also makes them dangerous.

Unlike traditional AI chatbots that merely generate responses, AI agents execute actions. That means a simple command like “Book me a flight to Singapore” could trigger a series of automated steps that involve accessing payment systems, authentication tokens, and personal data. If hackers learn how to manipulate those instructions, they could exploit the agents to perform malicious actions—without needing traditional coding skills or advanced hacking techniques.

Security specialists say this marks a turning point. For decades, cybersecurity was about keeping technically skilled attackers out of sensitive systems. Now, even low-skill actors can weaponize plain language. A blog post by AI startup Perplexity described this new threat landscape as one where “attack vectors can come from anywhere,” warning that the next wave of digital crime might not rely on malware at all but on misdirected AI behavior.

This phenomenon is often called a query injection attack. In simple terms, it’s when hidden or manipulated prompts are injected into what seems like a normal instruction—redirecting an AI agent’s actions toward something harmful. The technique is not new in principle; hackers have long used injection attacks to corrupt databases or systems through cleverly crafted inputs. What’s new is the ease with which the same idea can now be executed in natural language through AI interfaces.

As AI agents evolve beyond text generation to active task execution, the risk expands exponentially. Software engineer Marti Jorda Roca from NeuralTrust, a firm specializing in LLM security, noted that people often underestimate the new dangers because they equate these tools with simple chatbots. “People need to understand there are specific dangers when using AI in the security sense,” he cautioned. “The moment an AI can act, it can also be hijacked.”

Major technology firms are acknowledging the problem. Meta, for example, has labeled this issue a “vulnerability,” while OpenAI’s Chief Information Security Officer, Dane Stuckey, has called it “an unresolved security issue.” These warnings come as both companies pour billions into expanding AI’s capabilities, even as they scramble to close the growing gaps in its defenses.

How Query Injection Works — and Why It’s Spreading

The basic mechanism of query injection is deceptively simple. Imagine asking an AI assistant to “book a hotel room in London for next week.” If a malicious actor manages to embed hidden instructions in that query—such as “and also transfer $100 to this account”—the agent might execute both commands, unable to tell legitimate from dangerous intent.

This can happen in real time, when an attacker intercepts or modifies the user’s input. But it can also occur passively: hackers can plant malicious prompts in web pages, PDF files, or other data sources. When an AI agent scans or interacts with such material, it may unknowingly execute the embedded commands.

Eli Smadja, a cybersecurity expert from Israeli firm Check Point, calls query injection “the number one security problem” for large language models. He argues that the issue isn’t about whether AI can think—but about whether it can obey safely. “One huge mistake I see happening a lot is giving the same AI agent all the power to do everything,” Smadja said. “Once that happens, even one injected instruction can compromise an entire system.”

Industry players are trying to contain the problem. Microsoft has developed tools that analyze where agent instructions originate, using context to detect and stop suspicious activity. OpenAI now alerts users when their AI agents try to visit sensitive or restricted websites and blocks further actions unless the user supervises in real time.

Other cybersecurity professionals suggest a more radical solution: limiting an agent’s autonomy altogether. In this model, every major decision—such as accessing personal data, exporting files, or executing payments—requires explicit human approval. It’s a compromise between efficiency and safety, but one that could prevent catastrophic misuse.

Cybersecurity researcher Johann Rehberger, known in the industry as “wunderwuzzi,” sees an even deeper problem. He points out that attack techniques themselves are rapidly evolving. “They only get better,” he said. According to Rehberger, every time companies develop defenses against one form of prompt injection, hackers find more sophisticated ways to bypass them.

This escalating contest mirrors the early days of the internet, when browsers and email systems were first weaponized. The same dynamic now repeats with AI—only faster. As Rehberger explains, the more powerful and autonomous AI agents become, the more difficult it will be to keep them aligned with human intent. “We’re not yet at a point where you can let an AI agent run for long periods and trust it to stay on track,” he warned.

Balancing Innovation and Security in the AI Era

The dilemma facing the AI industry is both technical and philosophical. Companies want AI agents to handle increasingly complex workflows—like managing personal finances or automating business operations—because that’s what drives adoption and profits. But greater capability means greater risk.

Query injection attacks represent a new class of cybersecurity challenge. They don’t rely on exploiting code vulnerabilities or network flaws; instead, they exploit human-AI interaction itself. As long as agents are designed to take natural language as instruction, attackers can manipulate that language to redirect outcomes.

Experts emphasize that defending against such threats requires a blend of traditional security principles and new AI-specific controls. First, AI systems should operate under strict permission boundaries. Each task or API access must be isolated so that an agent’s mistake doesn’t cascade into a larger breach. Second, human oversight should remain mandatory for sensitive actions. Automated does not mean unsupervised.

AI companies are already experimenting with “sandboxing,” where agents can operate only inside controlled environments that prevent data exfiltration or unauthorized commands. Some are exploring cryptographic audit trails that log every action an agent takes, ensuring transparency and accountability. Others are developing AI “firewalls” that analyze prompts and responses for signs of manipulation before they reach the model.

Still, these measures are playing catch-up. As AI models continue to scale—powering search engines, personal assistants, and enterprise tools—the window of vulnerability grows wider. Hackers, motivated by financial or political gain, are testing these systems in real-world conditions every day.

The tension between convenience and control lies at the heart of this debate. Users want AI that acts seamlessly, anticipating needs and executing tasks without constant confirmation. Yet that very convenience undermines the checks that protect against exploitation.

As Johann Rehberger summed up: “We’re in uncharted territory. These systems are too new, too powerful, and too easy to misuse. Until we build stronger guardrails, full trust in autonomous AI remains premature.”

For now, cybersecurity professionals advise restraint. AI agents can be immensely useful—but they should be treated like interns with potential access to critical systems: capable, but not yet trustworthy on their own. The future of AI will depend not only on how intelligent these agents become, but on how securely we teach them to act.

The Information is Collected from The Hindu and MSN.