Video Game Machine Learning
Full Report

Predicting market trends of virtual video game markets with neural networks.

Joshua Xu, Wayne Xun
Machine Learning (EECS 349) : Spring 2016
Professor Douglas Downey
Northwestern University



Introduction

Motivation

In this project, we apply machine learning techniques to predict the prices of goods in a video game’s virtual economy. This task resembles the classic task of predicting stock prices, with the added novelty of being tied into the video game’s well-defined dynamics. For example, while stocks in the real marketplace exhibit chaotic and uncertain relationships with one another, video game market items have pre-programmed and well-documented interactions and relationships (e.g. metal ore → bar → weapons/armor). These relationships are seldom leveraged to make predictions on the virtual market. Choosing a game-based virtual economy gives us an opportunity to pioneer the entire machine learning process, from data acquisition and processing through feature and learner selection, in a classically familiar yet unique space.

In our investigation, we focus on RuneScape’s virtual economy because the game centralizes trading in a well-documented virtual marketplace where data is aggregated and historical records can be accessed through provided APIs. When we initially set out, we had a specific metric for prediction validity in mind, but as machine learning projects tend to go, every step proved to be a pathfinding process of its own. Ultimately, we settled on predicting items’ day-to-day percent change.


Methodology

Data

The RuneScape API provides price and category data, but no other pricing or trading metrics. Scraping this API daily, we arrived at a time series of prices for all 4,106 tradeable virtual items over the course of 205 days, each item tagged with a category attribute.
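
To make the acquisition step concrete, here is a minimal Node.js sketch of the scraper. It assumes the Grand Exchange graph endpoint (services.runescape.com/m=itemdb_rs/api/graph/&lt;itemId&gt;.json) and a response shaped like { daily: { "&lt;ms timestamp&gt;": price } }; both the path and the shape should be read as assumptions rather than documented guarantees, and item id 4151 is just an example.

    // Fetch the daily price history for one item from the Grand Exchange API.
    // Endpoint path and response shape are assumptions, not guarantees.
    async function fetchDailyPrices(itemId) {
      const url = `https://services.runescape.com/m=itemdb_rs/api/graph/${itemId}.json`;
      const res = await fetch(url); // built-in fetch in Node 18+
      if (!res.ok) throw new Error(`HTTP ${res.status} for item ${itemId}`);
      const body = await res.json();
      // Convert { msTimestamp: price } into a time-ordered array.
      return Object.entries(body.daily)
        .map(([ts, price]) => ({ ts: Number(ts), price }))
        .sort((a, b) => a.ts - b.ts);
    }

    fetchDailyPrices(4151).then((series) => {
      console.log(`Got ${series.length} days of prices`);
    });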

Feature Set - Initial (Traditional)

Initially, we constructed a feature set using Node.js and Ruby scripts to mimic what stock sites offer in classical applications of stock market prediction. This traditional feature set contains the following summary variables for each data point:

  • current price

  • yesterday’s price

  • price 1 week ago

  • price 30 days ago

  • average price over past 30 days

  • minimum price over past 30 days

  • maximum price over past 30 days

Applying various regression learners in Weka to this set, we tried to predict the price of any given item 1 day into the future. However, we found that although the virtual market displays interesting trends, the majority of items are very static, with very little day-to-day change. Even with learners that offer relatively little insight, such as 1-nearest neighbor, we achieved R > 0.99 on the market predictions. This is high accuracy but far from insightful, since predicting no change at all yielded similar accuracy, so we adjusted our approach following this discovery.
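
For reference, a sketch of how this traditional feature vector can be computed from a raw price series (plain Node.js; the field names are our own illustration, and the window boundaries are one reasonable reading of the definitions above):

    // Build the traditional feature vector for day t from a price array
    // (prices[0] = oldest day). Requires at least 30 days of prior history
    // and one following day for the regression label.
    function traditionalFeatures(prices, t) {
      const window = prices.slice(t - 29, t + 1); // past 30 days, incl. today
      return {
        current: prices[t],
        yesterday: prices[t - 1],
        oneWeekAgo: prices[t - 7],
        thirtyDaysAgo: prices[t - 30],
        avg30: window.reduce((sum, p) => sum + p, 0) / window.length,
        min30: Math.min(...window),
        max30: Math.max(...window),
        label: prices[t + 1], // regression target: tomorrow's price
      };
    }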

Feature Set - Iteration (Delta)

Thereafter, we tried a more machine-learning-oriented approach to feature generation. To provide the learner with as much relevant trend information as possible, we created a new feature set based on daily changes. This delta feature set contains the day-over-day percent changes across the past 30 days, as follows:

  • % change from yesterday to today

  • % change from 2 days ago to yesterday

  • …

  • % change from 29 days ago to 28 days ago

Using this set, we pivoted away from predicting the future price of an item and instead tried predicting the direction of the change (e.g. up/down/same), and also the exact percent change (e.g. -2%, 0%, +1%), for the next day. This way, we get to test both regression and classification, and see how trading off prediction exactness (predicting a direction without giving a magnitude) affects accuracy. We reformulated our feature set into this delta approach at this point. For our learners, we tried Decision Trees, nearest neighbor, and Multilayer Perceptrons.
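
A matching sketch for the delta feature set, producing the 29 day-over-day percent changes plus both prediction targets (again plain Node.js, with names of our own choosing):

    // Build the delta feature vector: the % change for each consecutive
    // pair of days in the past 30 days, plus next-day targets.
    function pctChange(from, to) {
      return ((to - from) / from) * 100;
    }

    function deltaFeatures(prices, t) {
      const deltas = [];
      for (let i = 0; i < 29; i++) {
        // deltas[0] compares today with yesterday;
        // deltas[28] compares 28 days ago with 29 days ago.
        deltas.push(pctChange(prices[t - i - 1], prices[t - i]));
      }
      const next = pctChange(prices[t], prices[t + 1]);
      const direction = next > 0 ? "up" : next < 0 ? "down" : "same";
      return { deltas, regressionTarget: next, classTarget: direction };
    }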

Enhancing Model-Generated Predictions

After building this model, we wanted to test how well it could predict farther into the future. To do that, we created a long-term predictor that iteratively feeds results back through the 1-day regressor to predict prices up to 25 days out, outputting only the final change direction rather than a percent, since the actual change has a wider variance and would result in lower correlation values. This approach lets us compare predictions for next-day prices (fast-flip, higher risk/reward investments) against predictions about a month out (longer-term investments).
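
The iterative scheme itself is simple; a sketch follows, where predictOneDay stands in for the trained 1-day Weka regressor (a hypothetical interface here, since in practice we ran this through Weka rather than custom code):

    // Roll a 1-day % change regressor forward `horizon` days by feeding
    // each prediction back in as the newest delta in the window.
    function predictLongTerm(deltas, predictOneDay, horizon = 25) {
      let window = deltas.slice(); // most recent delta first
      let cumulative = 1.0;
      for (let day = 0; day < horizon; day++) {
        const predicted = predictOneDay(window); // predicted % change
        cumulative *= 1 + predicted / 100;
        window = [predicted, ...window.slice(0, window.length - 1)];
      }
      // Report only the direction: the predicted magnitude has too much
      // variance to be useful this far out.
      return cumulative > 1 ? "up" : cumulative < 1 ? "down" : "same";
    }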

Testing and Training

After observing the data, we found the “Crafting Materials” category to be of particular interest because of its high frequency of trading and thus volatile prices. In game, these are raw materials that players process into goods such as food, weapons, and armor, and that are bought in bulk to train in-game skill levels, so they are traded heavily. This makes items in this category the most meaningful test set, not only because price volatility makes them harder to predict, but also because of the potential impact a good model could make.

As a result, we used this category as our main test set, and also observed whether these items behave differently. Here, we trained models both on all the data and on the category alone, and used this category for testing. We adjusted our classifier and data split settings in Weka before deciding on a final format. For classifiers, we used accuracy and F-measure as performance metrics; for regressors, we used the correlation coefficient R. For our 1-day models on crafting alone, we used 10-fold cross validation on our first 180 days of data. For our 1-day models trained on all data, we used a 75/25 split based on time, training on the first 135 days of data and validating on the last 45 days of crafting material prices. For the long-term changes, we trained a regressor on the first 180 days, fed in the 181st day, compared to the actual result on the 205th day, and calculated an F-measure.
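
For clarity, the time-based 75/25 split amounts to the following (a sketch; examples are assumed ordered oldest first, so no future prices leak into training):

    // Chronological 75/25 split: with 180 days of examples, train on
    // days 0-134 and validate on days 135-179.
    function timeSplit(examples) {
      const cut = Math.floor(examples.length * 0.75);
      return {
        train: examples.slice(0, cut),
        validation: examples.slice(cut),
      };
    }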


Results

Traditional set regressor, crafting only

Learner                  R Value
1-nearest neighbor       0.2442
3-nearest neighbor       0.3100
REPTree                  0.1700
Multilayer Perceptron    0.0328

The best result came from 3-nearest neighbor, with an R value of 0.3100 (up from 0.2442 with 1-nearest). REPTree gave an R value of only 0.1700, and the Multilayer Perceptron only 0.0328.

Traditional set classifier, crafting only

Learner                  Accuracy    F-Measure
ZeroR                    56.685%     0.410
10-Nearest Neighbor      70.5627%    0.690
J48 Decision Trees       70.0979%    0.684
Multilayer Perceptron    56.685%     0.037

Baseline of ZeroR: 56.685% accuracy and 0.410 F-measure. The best results came from 10-nearest neighbor and J48 decision trees, with accuracy/F-measure of 70.5627%/0.690 and 70.0979%/0.684 respectively. The Multilayer Perceptron matched ZeroR at 56.685% accuracy, doing no better than the baseline.

Although R and F-measure aren’t directly comparable, the R values above show that our regressors explain very little of the actual variance (R = 0.31 corresponds to R² ≈ 0.096, i.e. under 10%), while the classifiers do a much better job at roughly 70% accuracy. The perceptron does a very poor job, likely because our features are in absolute price terms while the prediction classes are relative percent changes.

Delta set regressor, crafting only

Learner                                  R Value
Additive Regressor (stumps)              0.4507
Additive Regressor (depth-3 REPTrees)    0.4713
1-Nearest Neighbor                       0.2695
Multilayer Perceptron                    0.3015

The best result came from an additive regressor over depth-3 REPTrees, with an R value of 0.4713 (up from 0.4507 with stumps). Nearest neighbor gave an R value of only 0.2695 (which didn’t improve with more neighbors), and the Multilayer Perceptron gave 0.3015.

Delta set classifier, crafting only

Learner                  Accuracy    F-Measure
ZeroR                    56.685%     0.410
J48                      74.8443%    0.743
Multilayer Perceptron    69.6051%    0.081
k-Nearest Neighbor       66.7737%    0.064

Baseline of ZeroR: 56.685% accuracy and 0.410 F-measure. The best result came from J48 decision trees with reduced-error pruning, at 74.8443% accuracy and 0.743 F-measure. The Multilayer Perceptron reached 69.6051% accuracy, and kNN fell to 66.7737%.

With the additional attributes, accuracy improves almost universally, although nearest neighbor suffers because the extra dimensions dilute its distance measure.

1-day models trained on all data

Although all values increased marginally, the changes were statistically negligible.

Long term predictions, crafting and all

Learner                      Accuracy (All Data)    Accuracy (Crafting Only)
Perceptron                   41.3%                  32.7%
REPTree                      52.3%                  52.3%
Bagging REPTree              44.2%                  48.1%
Random Tree (depth of 15)    56.7%                  61.5%
ZeroR                        60.6%                  58.1%

Most learners actually did worse over the long term than ZeroR; indeed, these results fare worse than our 1-day predictions. We didn’t have many examples for this test, but the method we created may simply not work well here. Since we rely only on price patterns, and the game itself restricts how prices can change in order to prevent market manipulation, it’s possible we just don’t have enough data to predict with appreciable accuracy, although further investigation is needed.


Conclusion

Analysis

Good Learners from Validation Comparison

[Figure 1: learner predictions vs. the validation set]

When plotting the learners’ predictions against the validation set, we see that different learners perform quite differently. Here, the Random Tree learner surprisingly outperforms the others.

Inconclusive Learners from Validation Comparison

[Figure 2: inconclusive learner predictions vs. the validation set]

Here, no learner predicts accurately enough to confidently set it apart from the others. This illustrates the limitations of machine learning on data with no clear trends, and also how heavily most machine learning techniques rely on large data sets to be effective.

Long Term Failure

[Figure 3: long-term predictions on a clear trend]

Even in simple cases with clear trends over time, all learners fail at long-term prediction; a simple exponential fit would have been sufficient here.

Future Work

We experimented extensively with the price data, explored many machine learning techniques, and got decent results: a learner based on price patterns alone can perform fairly well. However, we had additional data available that we weren’t yet able to incorporate, such as the RuneScape Wiki’s data. One idea is to use semantic relatedness to create trade groups or indexes, combining the features of items within a group for each data point.

Another attribute we’d like to consider is the set of soft price floors built into the game’s market system. An in-game spell allows a player to convert an item into a specified amount of gold, which effectively creates a price floor that market prices rarely drop below.
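
One simple way to encode such a floor as a feature would be the relative distance between the market price and the spell's conversion value (a sketch; alchValue is a hypothetical field we would have to scrape separately, e.g. from the RuneScape Wiki):

    // Distance to the spell-induced price floor, relative to the floor.
    // `alchValue` (gold returned by converting the item) is a hypothetical
    // field; it is not in our current data set.
    function floorFeature(price, alchValue) {
      return (price - alchValue) / alchValue; // 0 means price is at the floor
    }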

Additionally, since we mainly care about investing, we may want a performance model that focuses on the positive predictions we would act on (the buys), weighted by the actual change. This would allow the project to ultimately deliver a model that is not only insightful but also practical.
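
Such a metric might look like the following sketch, which scores only the examples the model would have bought and weights each by the realized change (our own definition, not a standard measure):

    // Average realized % change over "buy" signals only, so the score
    // approximates return on investment rather than raw accuracy.
    function investmentScore(predictedChanges, actualChanges) {
      let total = 0;
      let buys = 0;
      predictedChanges.forEach((pred, i) => {
        if (pred > 0) { // we would have bought this item
          total += actualChanges[i]; // realized % change, signed
          buys += 1;
        }
      });
      return buys === 0 ? 0 : total / buys;
    }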

