Gamalyzer

By Joseph C. Osborn, advised by Prof. Michael Mateas.

Documentation for CMPS 261, Fall 2013.

Fig. 1. Gamalyzer applied to play traces from the puzzle game Refraction. Each trace is a polyline beginning at the bottom, each input is a colored glyph, and similar traces are grouped together. Thick lines stand in for several similar traces.

This page describes the Gamalyzer, a tool for the exploratory visualization of arbitrary games which clusters together similar game play traces (sequences of user decisions). Our two main contributions are a game-independent play trace similarity metric based on edit distance (the Gamalyzer metric) and a visualization technique (the Gamalyzer Visualization) which uses this metric to meaningfully show similarities and differences among large numbers of play traces.

Unlike previous play trace visualization tools, Gamalyzer is more concerned with trajectories than with states. Gamalyzer shows play traces directly and leaves spatiotemporal and other displays of game state up to game- or genre-specific auxiliary visualizations. The Gamalyzer metric can be applied to player and opponent modeling, goal recognition, adaptive tutorials, or even domains outside of games: anywhere one might want to compare sequences of (goal-directed) actions. The Gamalyzer visualization can quickly and inexpensively show designers the strategic landscape of their (arbitrary) game, without requiring specialist statistical or machine learning knowledge on the part of the designer.

We validate the Gamalyzer metric against one synthetic and two real-world data sets, finding that our general-purpose metric is nearly as good as hand-tuned feature selection for discerning designer-relevant differences between play traces.

The Gamalyzer Metric

Gamalyzer is based on the Gamalyzer Metric, an application of Chhieng and Wong's Constraint Continuous Edit Distance (CCED) to play traces. The key differences between CCED and regular edit distance are that edit paths are constrained to prevent matches of early parts of one sequence against later parts of the other, and that change operations replace match operations. These changes may be of variable cost; Gamalyzer's are defined based on an input-versus-input dissimilarity function grounded in a lexicographically weighted product order.

That's a mouthful. Basically, the input to Gamalyzer is a set of sequences of game events. Each game event (also called an input) has four properties: a time point, a player identifier, a determinant, and a value. The determinant is a token which identifies an input type—a regular move versus castling in chess—and the value is a compound term comprised of round lists (terminated with parentheses), square lists (terminated with square brackets), keywords, and numbers. The difference between two keywords is 1.0, the difference between two numbers depends on their distance (normalized with respect to their observed range), the difference between two square lists is the average distance of the corresponding members, and the difference between two round lists is a weighted sum with larger weights near the front of the list. The difference between two events with different determinants is also 1.0.

As is traditional, we implement CCED with a bottom-up dynamic programming algorithm. Because we adopt the Sakoe-Chiba Band (the constraint mentioned earlier), we have an algorithm which is basically linear with respect to the length of the first sequence of the pair being compared.

The Gamalyzer metric emits a time-series of dissimilarity matrices for every pair of play traces up to a given edit operation, and the Gamalyzer visualization makes use of these data.

The Gamalyzer Visualization

Fig. 2. Gamalyzer applied to data from the Mario AI Benchmark — Fig. 2. Gamalyzer applied to downsampled human play traces from the Mario AI Benchmark.

Fig. 3. The overall similarity of the same traces shown in Fig. 2. — Fig. 3. The overall similarity of the same traces shown in Fig 2.

Fig. 2 shows a typical Gamalyzer display for data from the Mario AI Benchmark. The most prominent features are the colored circles, one for each input. Each polyline connecting those inputs represents a play trace, and time proceeds from the bottom of the image towards the top. The Mario AI Benchmark receives 24 inputs per second and games last up to one minute; this has been downsampled to one aggregate action per second to simplify the visual display. Similar traces are nearer to each other than they are to dissimilar traces, and as traces become more and less similar over time they converge and diverge. The degree to which the incremental similarities influence the layout can be controlled by the user; Fig. 3 shows the case where only overall similarity is used. To faciliate quick visual comparison, different determinants are arbitrarily assigned different glyphs (see Fig. 1) and different value combinations are arbitrarily assigned different colors. When the user mouses over one or more inputs, their contents are shown in a hovering information box.

Using the Gamalyzer

This is academic prototype software at the moment, so configuration unfortunately requires some coding. Also, the data sets cannot be packaged with Gamalyzer; please email me and I can share them on a limited instructional basis.

Make sure you have a working JVM and Leiningen installation.
Check out the Github repository (link in the top right corner).
To try out one of the built-in data sets, change the mode definition in cli/ui/core.cljs to one of :mario, :refraction, or :synthetic, then run lein cljsbuild auto.
Background that process or open a new terminal on the repository, then run lein ring server and you're off to the races! Your server is at localhost:3000.
To use your own data sets, either output them in Gamalyzer's native format (see resources/traces/refraction for examples) or implement a new reader (see the modules in srv/gamalyzer/read/). Update srv/gamalyzer/rsrc/data.clj to handle your data set and pass along your path or call your reader, then modify core.cljs to make a request against your data set.

Native trace format

Terms in ALL-CAPITALS are standins.

[:properties ([:game :GAME-NAME] [:root :MAIN-GAME-PROC] [:events (:i)])]
[:log_start "UUID"]
[:i :PLAYER-ID :GAME-PROC DETERMINANT-BASE DETERMINANT-FINAL VALUE]
...
:log_end
[:log_start "UUID"]
[:i :PLAYER-ID :GAME-PROC DETERMINANT-BASE DETERMINANT-FINAL VALUE]
...
:log_end
...

The time point of each input is inferred by the event ordering. "Procs" are not necessary for Gamalyzer, but they can be a mnemonic for the designer; Chess might only have two procs ("move" and "promote"), and a game like Mario only one ("game"). The "main proc" is just a hook for establishing a little hierarchy; this trace format is a specialized version of a more generic play trace format, and it drops out all game events except for (:i) inputs.

This format also splits the determinant into a base and final part. The base part should be a round list, the final part should be a :keyword, and the value should be either a round or a square list. Apologies for the awkwardness of the format.

Fork me on GitHub