Working NLP on Morrowind dialogues

Jan 21 2019. Written by brainific

Morrowind, the third installment in the Elder Scrolls series by Bethesda Softworks, is an epic fantasy action RPG with an incredibly alien setting and a complex narrative. It includes a vast amount of in-game text describing past history and current events in minute detail, including conflicting accounts of critical events of truly historical importance in the game world. With twelve joinable factions, plus a few better-hidden ones, there is a huge number of quests to take, some of which lock away other choices. More interestingly, both main and side quests include lengthy dialogues with different NPCs about a whole range of topics, both quest-centered and background. Its graphics may not be as flashy as those of the latest games, but its story and setting still beat a lot of them by far.

Poor Fargoth lost his stash because of you
(via https://calezane.home.xs4all.nl/mw/mwmods.htm)

During FDG18, we approached Judith van Stegeren, a researcher (and now a friend of ours) interested in text generation in games and well versed in formal methods. Morrowind happens to be a favourite of Judith's too, so we decided to share some common NLP code with her to extract and analyze the dialogues in the game. Let’s start with the basics.

Extracting dialogue from Morrowind

Both the Fallout and Elder Scrolls games by Bethesda use a format called ESM, alongside a few other formats, to bundle the scripts and contents used in the game. Modders provide new ESM files, often (but not always) created with Bethesda’s own toolkits, that bundle quests, NPCs or terrain features, to name a few categories. In the case of Morrowind, we can use the included Elder Scrolls Construction Set (available here for the Steam GOTY edition) to dump all the dialogue lines in the game. After starting the tool, we must load the Morrowind ESM file:

The Object View window allows us to inspect and modify different objects in the game, like ingredients, books or NPCs:

By double-clicking an NPC, we can inspect their dialogue. Dialogue items are well explained in this article. They are presented in response to topic selection in the dialogue box, according to a list of rules evaluated in order from top to bottom, each having up to six slots for functions or conditions. Conditions can check quest progress or the identity of the player character (PC), among others.
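
The top-to-bottom evaluation can be pictured with a small sketch. `DialogueRule`, `pick_response` and the sample conditions are hypothetical names chosen for illustration, not the Construction Set's internal structures. The sketch also assumes that a rule applies only when all of its conditions hold and that the first matching rule wins.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

MAX_CONDITIONS = 6  # each rule has up to six function/condition slots

Context = Dict[str, object]


@dataclass
class DialogueRule:
    """Hypothetical stand-in for one dialogue item listed under a topic."""
    response: str
    conditions: List[Callable[[Context], bool]] = field(default_factory=list)

    def __post_init__(self):
        if len(self.conditions) > MAX_CONDITIONS:
            raise ValueError("a rule holds at most six conditions")

    def matches(self, context: Context) -> bool:
        # Assumption: a rule applies only when every condition holds
        return all(cond(context) for cond in self.conditions)


def pick_response(rules: List[DialogueRule], context: Context) -> Optional[str]:
    """Return the response of the first rule, top to bottom, that matches."""
    for rule in rules:
        if rule.matches(context):
            return rule.response
    return None


# Illustrative rules; the quest key, index and lines of dialogue are made up
rules = [
    DialogueRule("You found my stash!", [lambda c: c.get("stash_quest", 0) >= 10]),
    DialogueRule("Have you seen my ring?", [lambda c: c.get("pc_race") == "Wood Elf"]),
    DialogueRule("Welcome to Seyda Neen."),  # no conditions: always matches
]

print(pick_response(rules, {"stash_quest": 10, "pc_race": "Wood Elf"}))
print(pick_response(rules, {"pc_race": "Nord"}))
```

Because evaluation stops at the first match, the most specific rules have to sit above the generic fallback at the bottom of the list.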

But we are, in fact, interested in exporting all of the dialogue. To do this, we use the File menu in the main window and select the Export Data option. As you can see, we can also export other object classes that contain text, such as books. The resulting file contains all the items in the dialogue window, with their fields separated by tabs.

POS tagging and WordNet analysis

Now we need an NLP tool to work with. In this case we will use the widely known NLTK Python package together with Stanford’s CoreNLP, which is implemented in Java. First, we need to install NLTK itself, as shown here, and then load the packages we want into NLTK, as described in this other page. CoreNLP runs as an HTTP server that processes requests received on a configured port, and NLTK handles the communication for the user.
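
To give an idea of what NLTK hides from us, here is a minimal sketch of the kind of request a CoreNLP server accepts, using only the standard library. The helper names are ours, the port 9000 is just an assumed default, and the response fragment is written by hand for illustration rather than captured from a real server. The request is built but not sent, so the sketch runs without a server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_pos_request(text, host="localhost", port=9000):
    """Build (without sending) a POS-tagging request for a CoreNLP server."""
    properties = {"annotators": "tokenize,ssplit,pos", "outputFormat": "json"}
    query = urlencode({"properties": json.dumps(properties)})
    url = "http://{}:{}/?{}".format(host, port, query)
    # The text to annotate travels as the POST body
    return Request(url, data=text.encode("utf-8"), method="POST")


def tokens_with_tags(response):
    """Flatten a decoded JSON response into (word, tag) pairs."""
    return [(tok["word"], tok["pos"])
            for sentence in response["sentences"]
            for tok in sentence["tokens"]]


request = build_pos_request("Fargoth lost his stash.")
print(request.get_method(), request.full_url)

# Hand-written response fragment for illustration, not real server output
sample = {"sentences": [{"tokens": [{"word": "Fargoth", "pos": "NNP"},
                                     {"word": "lost", "pos": "VBD"}]}]}
print(tokens_with_tags(sample))
```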

Using Stanford’s CoreNLP we can perform POS tagging. We will use the grammatical category to select the verbs and cluster them in a tree using the existing categories in WordNet. The idea is to try and analyze the semantics with a double purpose:

  • Separate mechanics from “flavour” verbs. We want our agents to understand and use in-game actions using verbs, like in “I believe that Fargoth will attack you if you find his stash” (a sentence that can be formalized with a slightly more complex logic than Dynamic Epistemic Logic). But, at the same time, we want to add evocative background content that does not tie into mechanics, e.g. describing Fargoth’s family or upbringing.
  • Find useful mechanics that are described by semantically related words. Verbs like “give”, “take”, “lend” and “buy” indicate a “property” mechanic, possibly extended with the possibility of changing the ownership of an object, maybe in exchange for other goods. The number of times these verbs appear in the dialogue might also give us an idea of how important they are in terms of mechanics.
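
As a rough illustration of the second point, the sketch below counts how often verbs from each candidate mechanic appear. The grouping in `MECHANIC_VERBS` and the list of lemmas are made up for the example; only the “property” verbs come from the text above, and in practice the lemmas would come from the POS tagging step.

```python
from collections import Counter

# Hypothetical grouping of verb lemmas into candidate mechanics
MECHANIC_VERBS = {
    "property": {"give", "take", "lend", "buy"},
    "combat": {"attack", "kill"},
}


def count_mechanic_verbs(verb_lemmas, mechanic_verbs=MECHANIC_VERBS):
    """Count verb occurrences per mechanic; unmatched verbs count as 'flavour'."""
    lookup = {verb: mech
              for mech, verbs in mechanic_verbs.items()
              for verb in verbs}
    counts = Counter()
    for lemma in verb_lemmas:
        counts[lookup.get(lemma, "flavour")] += 1
    return counts


# Made-up lemma stream standing in for verbs extracted from the dialogue
lemmas = ["give", "take", "remember", "buy", "attack", "give", "wander"]
counts = count_mechanic_verbs(lemmas)
print(counts.most_common())
```

A flat lookup like this only works once the verb groups are known; finding those groups is what the WordNet step is for.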

But first we need to convert the exported files into a proper Python object. The code we have used for these examples is stored in a git repository for you to use freely. The items are strings separated by tabs, so a simple split will handle the values. The DialogueItem class takes the list of strings from the split and organizes the values for the dialogue item configuration.

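def read_items():
    # MWDialogueReader and DialogueItem are defined in the repository mentioned above
    reader = MWDialogueReader('csmwinddial.txt')
    ditems = []
    with open(reader.fname, 'r') as f:
        for line in f:
            # Fields are separated by tabs; the last field is dropped
            terms = [term.strip('"') for term in line.split('\t')[:-1]]
            # Assumed completion, following the description of DialogueItem above
            ditems.append(DialogueItem(terms))
    return ditems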

Once we have all the dialogue items, we only need to extract the lines from each one and use a POS (“part-of-speech”) tagger to obtain the grammatical category for each word. In this example, we will focus on verbs, passing them to the next stage.

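from nltk.parse.corenlp import CoreNLPParser
from nltk.stem import WordNetLemmatizer

parser = CoreNLPParser(url=''.join(['http://localhost:', str(nlp_port)]))
[...]
lemmatizer = WordNetLemmatizer()
[...]
for sentence in parser.parse_text(parag):
    # Each parsed sentence is a tree; pos() returns its (word, tag) pairs
    postagged = sentence.pos()
    for (word, categ) in postagged:
        # Penn Treebank verb tags all start with 'V' (VB, VBD, VBG...)
        if categ.startswith('V'):
            words_per_category[categ].add(lemmatizer.lemmatize(word.lower(), pos='v'))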

For each word, we obtain the WordNet categories associated with it. WordNet provides a hierarchical semantic classification in which the meaning of a word becomes narrower at each step. For example, “obtain” (come into possession of) is a specialization of a general “get” meaning (come into the possession of something concrete or abstract), as distinct from, for example, “accept” (receive willingly something given or offered). These meanings are combined into a tree formed by these specializations. However, manual intervention is needed at this point, since the specific sense intended among the possible options is difficult to select automatically. Part of this tree is reproduced as an example.

"accept.v.03": [
  "give an affirmative reply to; respond favorably to",
  [
    "=accept",
    2,
    {
      "agree.v.02": [
        "consent or assent to a condition, or agree to do something",
        [
          "=agree",
          0,
          {}
        ]
      ],
      "permit.v.01": [
        "consent to, give permission",
        [
          "=allow",
          4,
          {
[...]

Without explaining the complete output format, we can see that “agree” and “allow” are present in the game dialogue, possibly with a more specific meaning than the parent term “accept”, which is also present. It is then up to the designer to decide whether these verbs can be tied to a game mechanic, suggesting some kind of negotiation action whereby a proposal or a piece of information is presented for either permission or agreement, or whether they are just part of some narrative text presented to the player but not represented in the game mechanics.
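
As a rough idea of how such a tree can be assembled, the sketch below merges root-to-leaf paths of synset names into a nested dictionary. It is a simplified stand-in and does not reproduce the output format shown above. `build_tree` is a name of our own, and the two paths are copied by hand from the excerpt; with NLTK they could come from something like `Synset.hypernym_paths()`.

```python
import json


def build_tree(paths):
    """Merge root-to-leaf paths into a nested dict keyed by synset name."""
    tree = {}
    for path in paths:
        node = tree
        for name in path:
            # Reuse the branch if it already exists, otherwise create it
            node = node.setdefault(name, {})
    return tree


# Paths copied by hand from the "accept" excerpt, general meaning first
paths = [
    ["accept.v.03", "agree.v.02"],
    ["accept.v.03", "permit.v.01"],
]
tree = build_tree(paths)
print(json.dumps(tree, indent=2))
```

Merging by shared prefix makes verbs with a common, more general meaning end up as siblings under the same parent, which is what lets a designer spot groups of related verbs.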

We hope that you liked this short introduction to NLP tools for game dialogue analysis. Further articles will delve into mechanics for negotiation and belief for game agents, as well as additional language tools. Stay tuned!
