In this project, we apply machine learning techniques to predict the prices of goods in a video game’s virtual economy. This task is similar to the classic task of predicting prices on the stock market, yet has the added novelty of being tied into the video game’s well-defined dynamics. For example, while stocks in the real marketplace exhibit chaotic and uncertain relationships with one another, video game market items have pre-programmed and well-documented interactions and relationships (e.g. metal ore → bar → weapons/armor). These relationships are seldom leveraged to run predictions on the virtual market. Choosing a virtual game-based economy gives us an opportunity to pioneer the entire machine learning process, from data acquisition and processing to feature and learner selection, in a classically familiar yet unique space.
In our investigation, we focus on RuneScape’s virtual economy because the game centralizes its economy in a well-documented virtual marketplace where data is aggregated and historical records can be accessed through provided APIs. When we initially set out, we had a specific metric for prediction validity in mind, but as machine learning projects tend to go, every step of the project proved to be a pathfinding process of its own. Ultimately, we settled on aiming to predict items’ day-to-day % change.
The RuneScape API provides price and category data, but no other pricing or trading metrics. Using the provided API, we scraped these data and arrived at a daily-updated time series of prices for all 4106 tradeable virtual items over the course of 205 days, each with a category attribute attached.
Initially, we constructed a feature set using Node.js and Ruby scripts to mimic what stock sites offer in classical applications of stock market prediction. This traditional feature set contains the following summary variables for each data point:
current price
yesterday’s price
price 1 week ago
price 30 days ago
average price over past 30 days
minimum price over past 30 days
maximum price over past 30 days
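As a sketch, these summary variables can be computed from a raw daily price series as follows (the function and field names, and the window conventions, are our own illustration, not taken from the original Node.js/Ruby scripts):

```python
def traditional_features(prices):
    """Stock-style summary features for the most recent day of a price
    series (oldest first, at least 31 entries). Treating the 30-day
    window as inclusive of today is our assumption."""
    window = prices[-30:]  # the past 30 days, including today
    return {
        "current": prices[-1],
        "yesterday": prices[-2],
        "week_ago": prices[-8],
        "month_ago": prices[-31],
        "avg_30d": sum(window) / len(window),
        "min_30d": min(window),
        "max_30d": max(window),
    }
```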
Applying various regression learners in Weka to this set, we tried to predict the price of any given item one day into the future. However, we found that although the virtual market displays interesting trends, the majority of items are very static, with little day-to-day change. Even with relatively simple learners such as 1-nearest neighbor, we achieved R > 0.99 on the market predictions. This accuracy is high but far from insightful, since predicting no change at all yielded similar accuracy, so we adjusted our approach following this discovery.
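Why a no-change baseline scores so well is worth making concrete: on a series that is static or drifts smoothly, simply echoing yesterday’s price correlates almost perfectly with today’s. A small illustration (our own, not from the Weka runs):

```python
def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# A smoothly drifting price series: "predict no change" means using
# yesterday's price as today's prediction.
prices = list(range(100, 150))
preds, actuals = prices[:-1], prices[1:]
r = pearson_r(preds, actuals)  # near-perfect correlation, zero insight
```

The high correlation is dominated by the overall price level, not by the day-to-day movement we actually care about.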
Thereafter, we tried a more machine-learning-oriented approach to feature generation. To provide the learner with as much relevant trend information as possible, we created a new feature set based on daily changes. This delta-approach feature set summarizes each data point as the sequence of daily % changes over the past 30 days:
% change from today to yesterday
% change from yesterday to 2 days ago
…
% change from 28 days ago to 29 days ago
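A minimal sketch of this delta feature set, assuming each feature is the percent change between consecutive days (the function name is ours):

```python
def delta_features(prices):
    """Daily % changes over the most recent 30 days of a price series
    (oldest first), yielding 29 delta features per data point."""
    recent = prices[-30:]
    return [
        (recent[i + 1] - recent[i]) / recent[i] * 100.0
        for i in range(len(recent) - 1)
    ]
```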
Using this set, we pivoted from predicting an item’s future price to predicting the direction of the change (up/down/same), and also the exact % change (e.g. −2%, 0, +1%), for the next day. This way we get to test both regression and classification, and see how trading prediction exactness (a direction with no magnitude) affects accuracy. For our learners, we tried decision trees, nearest neighbor, and multilayer perceptrons.
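For the direction classes, the next-day % change has to be discretized; one way to do it is sketched below (the ±0.5% band for “same” is our illustrative assumption, not a value from the experiments):

```python
def direction_label(pct_change, threshold=0.5):
    """Discretize a next-day % change into the up/down/same classes.
    The +/-0.5% band for "same" is an illustrative assumption."""
    if pct_change > threshold:
        return "up"
    if pct_change < -threshold:
        return "down"
    return "same"
```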
After building this model, we wanted to test how well it could predict farther into the future. To do that, we created a long-term predictor that iteratively fed results back through the 1-day regressor to predict prices up to 25 days ahead, outputting only the final change direction rather than a percentage, since the actual change has a wider variance and would result in lower correlation values. With this approach, we also compare predictions for next-day prices (fast-flip, higher risk/reward investments) with those for about a month out (longer-term investments).
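The long-term predictor can be sketched as follows, assuming the 1-day regressor maps a window of recent % changes to the predicted next-day % change, compounding the predictions forward and reporting only the final direction (the function names are ours):

```python
def predict_long_term(regressor, recent_changes, horizon=25):
    """Roll a 1-day regressor forward `horizon` days by feeding each
    prediction back in, then report only the final change direction.

    `regressor` maps a window of daily % changes (oldest first) to the
    predicted next-day % change.
    """
    window = list(recent_changes)
    factor = 1.0  # cumulative price multiplier over the horizon
    for _ in range(horizon):
        next_change = regressor(window)
        factor *= 1.0 + next_change / 100.0
        window = window[1:] + [next_change]  # slide the window forward a day
    if factor > 1.0:
        return "up"
    if factor < 1.0:
        return "down"
    return "same"
```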
After observing the data, we found the “Crafting Materials” category to be of particular interest because its items trade frequently and thus have volatile prices. In game, these are raw materials that players can process into goods such as food, weapons, and armor; they are also bought in bulk for training in-game skill levels, so they are commonly traded. This makes the category the most meaningful test set, not only because its price volatility makes prediction harder, but because of the potential impact a good model could make.
As a result, we used this category as our main test set in an additional experiment to observe whether these items behave differently. Here, we trained models both on all items and on the category alone, and used this category for testing. We adjusted our classifier and data split settings in Weka before deciding on a final format. For classifiers, we used accuracy and F-measure as performance metrics; for regressors, the correlation coefficient R. For our 1-day models on crafting alone, we used 10-fold cross-validation on the first 180 days of data. For our 1-day models trained on all data, we used a 75/25 split in time, training on the first 135 days of data and validating on the last 45 days of crafting-material prices. For the long-term changes, we trained a regressor on the first 180 days, fed in the 181st day, compared against the actual result on the 205th day, and calculated an F-measure.
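The 75/25 split is chronological rather than shuffled, so no future prices leak into training; a minimal sketch:

```python
def time_split(series, train_len=135):
    """Chronological split: train on the first `train_len` points and
    validate on the rest. Shuffling a time series here would leak
    future prices into training."""
    return series[:train_len], series[train_len:]

# 180 days -> first 135 for training, last 45 for validation (75/25 in time).
train, val = time_split(list(range(180)))
```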
Learner | R Value |
---|---|
1-nearest neighbor | 0.2442 |
3-nearest neighbor | 0.3100 |
REPTree | 0.1700 |
Multilayer Perceptron | 0.0328 |
The best result came from 3-nearest neighbor with an R value of 0.3100 (up from 0.2442 with 1-nearest). REPTree gave an R value of only 0.1700, and the multilayer perceptron only 0.0328.
Learner | Accuracy | F-Measure |
---|---|---|
ZeroR | 56.685% | 0.410 |
10-Nearest Neighbor | 70.5627% | 0.690 |
J48 Decision Trees | 70.0979% | 0.684 |
Multilayer Perceptron | 56.685% | 0.037 |
Baseline of ZeroR: 56.685% accuracy and 0.410 F-measure. The best results came from 10-nearest neighbor and J48 decision trees, with accuracy/F-measure of 70.5627%/0.690 and 70.0979%/0.684 respectively. The multilayer perceptron only matched ZeroR’s 56.685% accuracy, while scoring far worse on F-measure.
Although R and F-measure aren’t directly comparable, the R values above show that our regressors explain very little (less than 10%) of the actual variance, while the classifiers do a much better job at roughly 70% accuracy. The perceptron performs very poorly, likely because our features are in absolute terms while the prediction classes are relative to one another, in percent.
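The “less than 10%” figure follows from squaring the correlation coefficient, since R² gives the fraction of variance a regressor explains:

```python
# R^2 is the fraction of variance explained by a regressor.
best_r = 0.3100  # 3-nearest neighbor, the best R from the table above
variance_explained = best_r ** 2  # ~0.096, i.e. under 10%
```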
Learner | R Value |
---|---|
Additive Regressor (Stumps) | 0.4507 |
Additive Regressor (depth of 3 REPTrees) | 0.4713 |
1-Nearest Neighbor | 0.2695 |
Multilayer Perceptron | 0.3015 |
The best result came from an additive regressor over depth-3 REPTrees, giving an R value of 0.4713 (up from 0.4507 with stumps). Nearest neighbor gave an R value of only 0.2695 (which didn’t improve with more neighbors), and the multilayer perceptron gave 0.3015.
Learner | Accuracy | F-Measure |
---|---|---|
ZeroR | 56.685% | 0.410 |
J48 | 74.8443% | 0.743 |
Multilayer Perceptron | 69.6051% | 0.081 |
k-Nearest Neighbor | 66.7737% | 0.064 |
Baseline of ZeroR: 56.685% accuracy and 0.410 F-measure. The best result came from J48 decision trees with reduced-error pruning, at an accuracy/F-measure of 74.8443%/0.743. The multilayer perceptron had 69.6051% accuracy, and kNN fell to 66.7737%.
With the additional attributes, we see that the accuracy improves almost universally, although the nearest neighbor suffers from the additional dimensions diluting the distance measure.
Although all values increased marginally, the change was statistically negligible.
Learner | Accuracy on All Data | Accuracy on Crafting Data Only |
---|---|---|
Perceptron | 41.3% | 32.7% |
REPTree | 52.3% | 52.3% |
Bagging REPTree | 44.2% | 48.1% |
Random Tree (depth of 15) | 56.7% | 61.5% |
ZeroR | 60.6% | 58.1% |
Most learners actually did worse over the long term than ZeroR; in fact, these results fare worse than our 1-day predictions. We didn’t have many examples for this test, but it appears this method may not work well. Since we rely on price patterns alone, and the game itself restricts how prices can change in order to prevent price manipulation, it’s possible that we simply don’t have enough data available to predict with appreciable accuracy, although further investigation is needed.
When plotting the learners’ predictions against the validation set, we see the learners perform differently. Here, the Random Tree learner surprisingly outperforms the others.
Here, no learner predicts accurately enough to confidently set them apart from the others. This illustrates the limitations of machine learning on data with no clear trends, but also illustrates the reliance of most machine learning techniques on large data sets to be effective.
Even in simple cases of clear trends over time, we see that all learners fail at long-term predictions. Fitting even a simple exponential function would have been sufficient here.
We experimented extensively with the price data, explored many machine learning techniques, and got decent results. We can see that a learner based on price patterns alone can perform reasonably well. However, we still had additional data available that we weren’t yet able to incorporate, such as the RuneScape Wiki’s data. One idea we had was to use semantic relatedness to create trade groups or indexes, combining the features of items within a group per data point.
Another attribute we’d like to consider is the set of soft price floors built into the game/market system. There’s an in-game spell that allows a player to convert an item into a specified amount of gold, which effectively creates a price floor that market prices rarely drop below.
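If we did incorporate it, the floor could enter the feature set as proximity to the conversion value (a hypothetical feature design; `alch_value` stands for the gold the spell returns for the item):

```python
def floor_features(price, alch_value):
    """Hypothetical features encoding how close an item trades to its
    spell-conversion price floor; `alch_value` is the gold the
    conversion spell returns for the item."""
    return {
        "above_floor_pct": (price - alch_value) / alch_value * 100.0,
        "at_floor": price <= alch_value,
    }
```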
Additionally, since we mainly care about investing, we may want a different performance model that focuses on the positive predictions we would buy on, weighted by the actual change. This would allow our machine learning project to ultimately deliver a model that is not only insightful, but also practical.
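One simple form such a metric could take is averaging the realized % change over only the “buy” signals (a hypothetical profit-oriented score, not one we computed):

```python
def investment_score(predicted_changes, actual_changes):
    """Hypothetical profit-oriented metric: mean realized % change over
    only the days we would have bought (predicted change > 0)."""
    picks = [actual for pred, actual in zip(predicted_changes, actual_changes)
             if pred > 0]
    return sum(picks) / len(picks) if picks else 0.0
```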