In our first installment, we outlined key strategies for leveraging AI agents to improve enterprise efficiency. We explained how, unlike standalone AI models, agents use context and tools to iteratively refine tasks and improve outcomes such as code generation. We also discussed how multi-agent systems facilitate cross-departmental communication, create a unified user experience, and promote productivity, resiliency, and faster upgrades.
Building these systems successfully depends not only on mapping out roles and workflows, but also on establishing safeguards such as human oversight and error checking so they operate safely. Let's take a closer look at these important elements.
Safeguards and autonomy
Because agents have a degree of autonomy, multi-agent systems must incorporate safeguards to reduce errors, waste, legal exposure, and harm when agents operate autonomously. Applying all of these safeguards to all agents may be overkill and strain resources, but I highly recommend considering every agent in the system and consciously deciding which of these safeguards it needs. An agent should not be allowed to operate autonomously if any one of those conditions is met.
Explicitly defined human intervention conditions
A set of predefined rules determines the conditions under which a human needs to oversee the agent's behavior. These rules should be defined on a case-by-case basis and can be declared in the agent's system prompt, or, for more critical use cases, enforced using deterministic code external to the agent. One such rule, in the case of a purchasing agent, would be: "All purchasing should first be verified and confirmed by a human. Call your 'check_with_human' function and do not proceed until it returns a value."
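As a minimal sketch, here is what such a rule might look like when enforced in deterministic code outside the agent rather than in the system prompt. The agent step, the purchase action shape, and the `check_with_human` helper are illustrative assumptions, not part of any particular framework:

```python
# Hypothetical human-intervention rule enforced outside the agent itself.

def check_with_human(action: dict) -> bool:
    """Block until a human reviews the proposed action; return the approval."""
    print(f"Agent proposes: {action}")
    return input("Approve? [y/N] ").strip().lower() == "y"

def execute_purchase(action: dict) -> None:
    print(f"Executing purchase: {action}")

def run_purchasing_agent_step(proposed_action: dict) -> None:
    # Rule: all purchases must first be verified and confirmed by a human.
    if proposed_action.get("type") == "purchase":
        if not check_with_human(proposed_action):
            print("Purchase rejected by human reviewer; agent does not proceed.")
            return
    execute_purchase(proposed_action)

run_purchasing_agent_step({"type": "purchase", "item": "GPU server", "amount_usd": 12000})
```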
Safeguard agents
An agent can be paired with a safeguard agent whose role is to check for risky, unethical, or non-compliant behavior. The safeguard agent can be made to constantly check all, or certain elements, of the agent's behavior, and the agent may not proceed unless the safeguard agent returns the go-ahead.
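A hedged sketch of this pairing is below. The `call_llm` function is a stand-in for whatever model client you use, and the prompts and names are illustrative only:

```python
# Pair a worker agent with a safeguard agent that must return a go-ahead
# before any proposed action is executed.

def call_llm(system_prompt: str, user_message: str) -> str:
    raise NotImplementedError("Plug in your LLM client here.")

SAFEGUARD_PROMPT = (
    "You review a proposed agent action for risky, unethical, or non-compliant "
    "behavior. Reply with exactly APPROVE or REJECT, followed by a short reason."
)

def safeguard_approves(proposed_action: str) -> bool:
    verdict = call_llm(SAFEGUARD_PROMPT, f"Proposed action:\n{proposed_action}")
    return verdict.strip().upper().startswith("APPROVE")

def run_with_safeguard(worker_prompt: str, task: str) -> str | None:
    proposed_action = call_llm(worker_prompt, task)
    # The worker may not proceed unless the safeguard agent gives the go-ahead.
    if not safeguard_approves(proposed_action):
        return None
    return proposed_action
```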
Uncertainty
Our lab recently published a paper describing techniques that can provide a measure of uncertainty for what large language models (LLMs) generate. Given LLMs' propensity for confabulation (commonly known as hallucination), favoring the more certain outputs greatly improves agent reliability. Here, too, there is a cost to be paid: assessing uncertainty requires generating multiple outputs for the same request so they can be ranked by certainty and the behavior with the least uncertainty chosen. Because this can slow the system down and increase costs, it should be reserved for the more critical agents in the system.
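A rough sketch of the general idea follows: generate several outputs for the same request and prefer the one the model produces most consistently. This agreement-based proxy for certainty is a simplifying assumption, not the specific method from the paper mentioned above, and `sample_llm` is a hypothetical sampling call:

```python
from collections import Counter

def sample_llm(prompt: str, temperature: float = 0.8) -> str:
    raise NotImplementedError("Plug in your LLM client here.")

def most_certain_output(prompt: str, n_samples: int = 5) -> tuple[str, float]:
    # Generate multiple outputs for the same request and pick the most common.
    samples = [sample_llm(prompt).strip() for _ in range(n_samples)]
    answer, count = Counter(samples).most_common(1)[0]
    certainty = count / n_samples  # fraction of samples that agree
    return answer, certainty

# Usage: reserve this slower, costlier path for the more critical agents, and
# escalate to a human when certainty falls below a threshold.
# answer, certainty = most_certain_output("Summarize the contract risk ...")
```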
Disengage button
It may sometimes be necessary to stop all autonomous agent-based processes, either because consistency is required or because behavior has been detected in the system that needs to be halted while we figure out what is wrong and how to fix it. For more critical workflows and processes, it is important that this disengagement does not bring everything to a halt or make processes fully manual, so provisioning a deterministic fallback mode of operation is recommended.
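An illustrative sketch of a disengage flag with a deterministic fallback, so pulling the plug on autonomy does not halt the workflow entirely; the routing logic and function names are assumptions for the example:

```python
AUTONOMY_ENABLED = True  # flipped to False by an operator or monitoring system

def handle_request_autonomously(request: dict) -> dict:
    raise NotImplementedError("Full agent-driven path.")

def handle_request_deterministically(request: dict) -> dict:
    # Simple rule-based fallback: queue the request for manual handling or
    # apply fixed business rules instead of agent reasoning.
    return {"status": "queued_for_manual_review", "request": request}

def handle_request(request: dict) -> dict:
    if AUTONOMY_ENABLED:
        return handle_request_autonomously(request)
    return handle_request_deterministically(request)
```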
Agent-generated work order
Not every agent in an agent network needs to be fully integrated into apps and APIs. That integration can be time consuming and take several iterations to get right. My recommendation is to add a generic placeholder tool to agents (usually leaf nodes in the network) that simply issues a report or a work order with recommended actions to be taken manually on the agent's behalf. This is a great way to bootstrap and operate an agent network in an agile manner.
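A sketch of such a generic placeholder tool is below: instead of a full API integration, the leaf agent "performs" an action by emitting a work order for a human to carry out. The file format and field names are illustrative assumptions:

```python
import json
import time
import uuid

def issue_work_order(action: str, details: dict) -> str:
    """Generic tool exposed to a leaf agent in place of a real integration."""
    order = {
        "id": str(uuid.uuid4()),
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "recommended_action": action,
        "details": details,
        "status": "pending_manual_execution",
    }
    with open("work_orders.jsonl", "a") as f:
        f.write(json.dumps(order) + "\n")
    return order["id"]

# The agent calls this like any other tool; a human later executes the order.
# issue_work_order("Create purchase order in ERP", {"vendor": "Acme", "amount": 4200})
```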
Testing
With LLM-based agents, we gain robustness at the expense of consistency. Given the opaque nature of LLMs, we are also dealing with black-box nodes in the workflow. This means agent-based systems require a different testing regime than the one used for traditional software. The good news is that we are accustomed to testing such systems: we have been operating human-driven organizations and workflows since the dawn of industrialization.
Although the examples shown above have a single entry point, every agent in a multi-agent system has an LLM as its brain and can therefore act as an entry point to the system. We should use divide and conquer, testing subsets of the system by starting from different nodes in the hierarchy.
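As a small sketch of that divide-and-conquer approach, a sub-tree of the hierarchy can be exercised directly by entering at an arbitrary node. The agent registry and `run()` interface are hypothetical:

```python
def test_subtree(agents: dict, entry_node: str, query: str, expected_keywords: list[str]) -> bool:
    """Enter the system at an arbitrary node and check its reply for expected content."""
    reply = agents[entry_node].run(query)
    return all(keyword in reply.lower() for keyword in expected_keywords)

# Example: test only the invoice-handling branch, not the whole network.
# ok = test_subtree(agents, "invoice_agent",
#                   "Flag any invoices over $10,000 from last week.",
#                   ["invoice", "flag"])
```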
Generative AI can also be used to come up with test cases that can be run against the network to analyze its behavior and push it to reveal weaknesses.
Finally, I’m a big proponent of sandboxing. Such systems must first be launched on a small scale within a controlled and secure environment before being rolled out in stages to replace existing workflows.
Fine-tuning
A common misconception about gen AI is that it gets better the more you use it. This is clearly wrong: LLMs are pre-trained. That said, they can be fine-tuned to bias their behavior in various ways. Once a multi-agent system has been devised, we may choose to improve its behavior by taking the logs from each agent and labeling our preferences to build a fine-tuning corpus.
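A sketch of turning labeled agent logs into such a corpus is below. The log structure and the JSONL prompt/completion schema are assumptions; adapt them to whatever your fine-tuning pipeline expects:

```python
import json

def build_finetune_corpus(log_path: str, out_path: str) -> int:
    kept = 0
    with open(log_path) as logs, open(out_path, "w") as out:
        for line in logs:
            record = json.loads(line)  # e.g. {"input": ..., "output": ..., "label": "good" | "bad"}
            if record.get("label") != "good":
                continue  # keep only interactions we want to bias the agent toward
            out.write(json.dumps({
                "prompt": record["input"],
                "completion": record["output"],
            }) + "\n")
            kept += 1
    return kept

# build_finetune_corpus("purchasing_agent_logs.jsonl", "purchasing_agent_ft.jsonl")
```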
Pitfalls
Multi-agent systems can go into a tailspin, meaning that agents keep talking to each other and a query may never terminate. This requires some form of timeout mechanism. For example, we can check the communication history for the same query, and if it is growing too large or we detect repetitious behavior, we can terminate the flow and start over.
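A minimal sketch of such a termination check on a query's communication history follows; the thresholds and message format are illustrative assumptions:

```python
MAX_MESSAGES = 50

def should_terminate(history: list[str]) -> bool:
    if len(history) > MAX_MESSAGES:
        return True  # the conversation for this query has grown too large
    # Crude repetition check: the same message appearing several times recently.
    return any(history.count(msg) >= 3 for msg in history[-10:])

def route_message(history: list[str], new_message: str) -> bool:
    """Return False if the flow should be terminated and started over."""
    history.append(new_message)
    return not should_terminate(history)
```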
Another problem that can occur is what I call overloading: expecting too much from a single agent. With today's state-of-the-art LLMs, you cannot give agents long, detailed instructions and expect them to follow those instructions all the time. Did I also mention that these systems can be inconsistent?
A way to alleviate this is what I call granulation: splitting an agent into multiple connected agents. This reduces the load on each agent, makes agent behavior more consistent, and lowers the chance of tailspins. (An interesting research area our lab is working on is automating the granulation process.)
Another common problem in how multi-agent systems are designed is the tendency to define a coordinator agent that calls the various agents to complete a task. This introduces a single point of failure and can result in a rather complex web of roles and responsibilities. My suggestion in these cases is to treat the workflow as a pipeline, with one agent completing a portion of the work and handing it off to the next.
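A short sketch of that pipeline style is below, where each stage's output feeds the next instead of everything routing through a coordinator. The agent names and `run()` interface are hypothetical:

```python
def run_pipeline(request: str, stages: list) -> str:
    """Each stage is an agent exposing run(text) -> text; its output feeds the next stage."""
    work_product = request
    for agent in stages:
        work_product = agent.run(work_product)
    return work_product

# Example wiring: intake -> analysis -> drafting -> compliance review.
# result = run_pipeline(ticket_text, [intake_agent, analysis_agent, drafting_agent, review_agent])
```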
Multi-agent systems also have a tendency to cascade context down to other agents. This is often unnecessary and can overload and confuse the downstream agents. It is a good idea to allow each agent to keep its own context, and to reset that context when it sees it is handling a new request (somewhat like how sessions work on websites).
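A sketch of per-agent context with a session-style reset is below; using a request ID to detect a new request is a simplifying assumption:

```python
class AgentContext:
    """Keep context local to one agent and reset it when a new request arrives."""

    def __init__(self):
        self.request_id = None
        self.messages: list[str] = []

    def observe(self, request_id: str, message: str) -> list[str]:
        if request_id != self.request_id:
            # New request: start a fresh context, like a new web session.
            self.request_id = request_id
            self.messages = []
        self.messages.append(message)
        return self.messages
```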
Finally, it is important to note that the bar is relatively high for an LLM to serve as the brain of an agent. Smaller LLMs may require a lot of prompt engineering or fine-tuning to fulfill requests. The good news is that there are already several commercial and open-source agents, albeit relatively large ones, that clear the bar.
This means that cost and speed must be important considerations when building multi-agent systems at scale. We should also set the expectation that, while these systems are faster than humans, they will not be as fast as the software systems we are used to.
Babak Hodjat is CTO of AI at Cognizant.