Agents in a virtual world need a brain to provide fun to the players: the smarter the brain, the merrier. However, this smartness is bound by two limits: processing power and algorithms. As the game Republique has shown, processing power is not an issue as long as the level of detail is kept under control. On the algorithmic side, there are several alternatives now successfully in use, each with its own pros and cons, of course. These are hierarchical finite state machines (HFSM), behaviour trees (BT), goal-oriented action planning (GOAP, a variant of the STRIPS planner) and hierarchical task networks (HTN). Our first intention was to provide a comparison of these approaches to highlight their similarities and differences. However, we found two excellent summaries, in AI Game Dev and the Intrinsic Algorithm blog, that already fill this gap. So we would like to take a somewhat more academic approach and try to tackle the problem of a unified framework that could cover all these alternatives.
First, we would like to point to the OpenCourseWare provided by Professor Daniel Borrajo’s Planning and Learning Group, at the Carlos III University in Madrid. It contains a comprehensive guide to automated planning and the related areas that we will discuss in this post.
Behaviour trees and goal-oriented planning are, actually, just two parts of a bigger animal: planning algorithms. GOAP is exactly a small example of such algorithms: if you provide the planner with a goal and a set of possible actions, it will produce a plan, a sequence of actions that finally reaches the goal. This is commonly known as classical planning, the simplest form of planning. Each action has some preconditions (conditions that must be met before the action can be performed) and some postconditions (conditions that will be true once the action has been performed). The agent tries some sort of forward chaining (from the current situation to the goal) or backward chaining (from the goal to the current situation) in its “head” to find a path of actions that leads to the goal. As we are exploring the implicit graph made of states and the actions between them, this can take some time. A well-known algorithm, A*, is used in GOAP to accelerate this search. It is more commonly known as the main algorithm for navigation in a mesh, but it can be used on any kind of graph as long as we have a useful heuristic, an “optimistic” estimate of the distance between any state and the goal. This heuristic must always be less than or equal to the real distance between a state and the goal. In an environment with obstacles, for example, the straight-line distance between two points can be used, since any path avoiding the obstacles will be at least that long. Finding a heuristic for an abstract graph of a “state space” with actions is somewhat trickier, but there are alternatives, like solving a relaxed version of the problem in which some intermediate steps are ignored. Planning can get much more complex along several dimensions (see slide 4 from this set from Stanford), like rich temporal models, durative and nondeterministic actions, or partial observability of the results of our actions.
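To make the idea above concrete, here is a minimal GOAP-style forward planner: states are sets of facts, actions have preconditions and add/delete effects, and A* searches the implicit state graph with the number of unsatisfied goal facts as an optimistic heuristic. All the action names and facts are illustrative, not taken from any real game.

```python
import heapq, itertools

# Each action: (preconditions, facts added, facts deleted). Illustrative only.
ACTIONS = {
    "draw_weapon": (frozenset(), {"armed"}, set()),
    "approach":    (frozenset(), {"in_range"}, set()),
    "fire":        (frozenset({"armed", "in_range"}), {"target_down"}, set()),
}

def heuristic(state, goal):
    # Optimistic estimate: the number of goal facts not yet satisfied.
    return len(goal - state)

def plan(start, goal):
    """A* search over the implicit state graph, with unit action costs."""
    tie = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(heuristic(start, goal), 0, next(tie), frozenset(start), [])]
    seen = set()
    while frontier:
        _, cost, _, state, actions = heapq.heappop(frontier)
        if goal <= state:
            return actions
        if state in seen:
            continue
        seen.add(state)
        for name, (pre, add, delete) in ACTIONS.items():
            if pre <= state:  # preconditions hold: apply the action
                nxt = frozenset((state - delete) | add)
                heapq.heappush(frontier, (cost + 1 + heuristic(nxt, goal),
                                          cost + 1, next(tie), nxt,
                                          actions + [name]))
    return None  # no sequence of actions reaches the goal

print(plan(set(), {"target_down"}))  # → ['draw_weapon', 'approach', 'fire']
```

Note how the heuristic only guides the order of exploration; correctness comes from the precondition checks and the exhaustive search.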
Any FPS designer will immediately recognize a couple of limitations in this approach. What if an action fails? When do we consider an action finished? Let us approach each of these aspects separately.
First of all, we said that actions have a set of postconditions. This is not entirely true; they have a distinct set for each possible outcome of the action. Imagine the action of firing: we could hit the target, or we could miss. The outcome is different in each case, and depending on it we may need to take one action or another. Outcomes may also be probabilistic in nature, so that the end state depends on the current state, the action and some random number (formally, this is a Markov decision process). One possible approach is to find the sequence of actions that is best on average. Yet another approach is contingent planning: instead of finding a single sequence of actions, we compute, at least for some paths, corrective actions in case our actions fail. That leaves us with a tree of actions to execute when the main actions fail. That rings a bell, right?
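A contingent plan of this kind can be sketched as a small tree: each node names an action, and each outcome leads either to success, to failure, or to a contingency branch. The probabilities below are made up; the point is that the tree structure lets us compute the chance that some path through it succeeds.

```python
# A contingent plan as a tree. Each outcome maps to (probability, next node);
# None means the plan stops there. Numbers are illustrative.
contingent_plan = {
    "action": "fire",
    "outcomes": {
        "hit":  (0.6, None),             # goal reached
        "miss": (0.4, {                  # contingency: reposition and retry
            "action": "take_cover_and_fire",
            "outcomes": {
                "hit":  (0.8, None),
                "miss": (0.2, None),     # plan fails here
            },
        }),
    },
}

def success_probability(node, success_outcome="hit"):
    """Probability that some path through the tree ends in a hit."""
    total = 0.0
    for outcome, (p, child) in node["outcomes"].items():
        if outcome == success_outcome:
            total += p
        elif child is not None:
            total += p * success_probability(child, success_outcome)
    return total

print(success_probability(contingent_plan))  # 0.6 + 0.4 * 0.8 ≈ 0.92
```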
Exactly, a behaviour tree can be considered precisely this: a plan with additional contingent plans at several levels. Of course, we are missing the pre- and postcondition parts that would enable us to reason about plans. This prevents us from adapting the actions in the tree or discarding branches that we already know will fail. Maybe if we added pre- and postconditions to the actions and separated the tree declaration from its execution, we could unify the two approaches. But we will try that out in a later post.
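The "plan with contingencies" reading becomes visible in a minimal behaviour-tree sketch: a Selector tries children until one succeeds (later children are exactly the contingency branches), a Sequence runs children in order. The node classes and leaf actions below are an illustrative toy, not any particular BT library.

```python
class Sequence:
    """Succeeds only if every child succeeds, in order."""
    def __init__(self, *children): self.children = children
    def tick(self):
        return all(child.tick() for child in self.children)

class Selector:
    """Tries children in order; the first success wins (contingencies)."""
    def __init__(self, *children): self.children = children
    def tick(self):
        return any(child.tick() for child in self.children)

class Leaf:
    """A primitive action: any callable returning True (success) or False."""
    def __init__(self, name, action): self.name, self.action = name, action
    def tick(self):
        ok = self.action()
        print(f"{self.name}: {'ok' if ok else 'failed'}")
        return ok

# Main plan: fire from here; contingency: close the distance, then fire.
tree = Selector(
    Leaf("fire", lambda: False),                 # fails: out of range
    Sequence(Leaf("approach", lambda: True),
             Leaf("fire_close", lambda: True)),
)
print(tree.tick())  # → True, via the contingency branch
```

Without pre- and postconditions on the leaves, note that the only thing the tree can do with a failure is fall through to the next branch; it cannot reason about *why* the branch failed.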
And what about FSMs and HTNs? FSMs are a way of specifying traces of actions with mechanisms that are hard to build into behaviour trees, like arbitrary transitions back and forth between states. Imagine switching from the alert state to the relaxed state and vice versa in a behaviour tree. One of the states must be the parent of the other; adding a third state confuses things further, because we need to work out an artificial hierarchy among them. We might use sequence and loop nodes to organize actions, but state switching as described here remains hard to express in a behaviour tree.
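The alert/relaxed example is trivial as an FSM, precisely because transitions are a flat table with no hierarchy imposed. A minimal sketch, with made-up states and events:

```python
# Transition table: (current state, event) -> next state. Any state can
# reach any other, with no parent/child relationship. Names illustrative.
TRANSITIONS = {
    ("relaxed", "noise_heard"): "alert",
    ("alert",   "area_clear"):  "relaxed",
    ("alert",   "enemy_seen"):  "combat",
    ("combat",  "enemy_down"):  "alert",
}

def step(state, event):
    # Events with no matching transition leave the state unchanged.
    return TRANSITIONS.get((state, event), state)

state = "relaxed"
for event in ["noise_heard", "area_clear", "noise_heard", "enemy_seen"]:
    state = step(state, event)
print(state)  # → "combat"
```

Adding a third state here costs one table row per transition; in a behaviour tree it would force us to decide which state nests inside which.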
Regarding HTNs, they also come from the planning domain, as a sort of “hint” that alleviates the computational burden on the planner by specifying some conditions that must hold at certain steps. Instead of letting the planner explore the whole state space, the designer marks some spots that the agent has to reach: we do not fix actions this way, but rather subgoals. If we want to buy meat, we first need to go to the butcher, and then get the meat. The agent can go to the butcher walking, taking a taxi or driving a car, and fulfill these subgoals with quite different actions (even building a car, or hiring a driver to drive the taxi). The specific actions selected for a subgoal may depend on some metric, like probability of success or a utility measurement (e.g. being “green” instead of being fast). HTNs are probably a good middle ground among behaviour trees, GOAP and utility-based actions.
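The buy-meat example can be sketched as an HTN-style decomposition: compound tasks break down into ordered subtasks, a subtask may have several alternative methods, and we pick among them with a cost metric (here, lower cost standing in for a “greener” choice). Task names and costs are illustrative.

```python
# Compound tasks map to lists of alternative decompositions (methods);
# primitive tasks appear only in COST. All values are illustrative.
METHODS = {
    "buy_meat":      [["go_to_butcher", "get_meat"]],  # a single decomposition
    "go_to_butcher": [["walk"], ["take_taxi"], ["drive_car"]],
}
COST = {"walk": 1, "take_taxi": 5, "drive_car": 4, "get_meat": 1}

def decompose(task):
    """Expand a task into primitive actions, choosing the cheapest method."""
    if task in COST:  # primitive action: no further decomposition
        return [task], COST[task]
    best_plan, best_cost = None, None
    for subtasks in METHODS[task]:
        plan, cost = [], 0
        for sub in subtasks:
            p, c = decompose(sub)
            plan += p
            cost += c
        if best_cost is None or cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan, best_cost

print(decompose("buy_meat"))  # → (['walk', 'get_meat'], 2)
```

Note that the designer never enumerates full plans: only the decompositions, which is what keeps the search space small compared to plain GOAP.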
In a future post we will try to find some middle ground among all these. Warning! It may include Haskell, graph algorithms and cognitive architectures…