Ultimate Stock Price Prediction System (USPPS)

Select a stock to examine.

Random forest settings:

Performance metrics of the asset


All performance metrics above are calculated from simple (not log) daily returns on the adjusted close values of the asset, with a risk-free rate and a target rate of 0%. For the omega ratio, linear interpolation of the cumulative distribution function of the daily returns is used. For more information, see: http://cran.r-project.org/web/packages/PerformanceAnalytics/
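As a rough illustration of the inputs behind these metrics, the sketch below computes simple daily returns and a basic omega ratio at a 0% target. It is written in Python for illustration only (the system itself is implemented in R), the function names are hypothetical, and the omega here is the plain discrete form (sum of gains over sum of losses relative to the target), not the interpolated-CDF variant mentioned above.

```python
def simple_returns(prices):
    """Simple (not log) returns: r_t = P_t / P_(t-1) - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def omega_ratio(returns, target=0.0):
    """Discrete omega ratio: gains above the target divided by losses below it.

    Assumption: this is the simple discrete estimator, not the
    interpolated-CDF method the system description refers to.
    """
    gains = sum(max(r - target, 0.0) for r in returns)
    losses = sum(max(target - r, 0.0) for r in returns)
    if losses == 0.0:
        return float("inf")  # no returns below the target
    return gains / losses


# Made-up adjusted close values, for illustration only.
adj_close = [100.0, 102.0, 99.96, 101.9592]
returns = simple_returns(adj_close)
omega = omega_ratio(returns, target=0.0)
```

An omega ratio above 1 means that, relative to the target, the total gains outweigh the total losses in the sample.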

Price chart of the asset

Distribution of daily returns (adj. close)

Plots for daily returns are generated using simple daily returns (not log) for adjusted close prices.

QQ-plot of daily returns (adj. close)

ACF of daily returns (adj. close)

Forecast statistics


In the statistics listed above, the abbreviation 'PI' means prediction interval, and 'Correctly predicted changes' means the percentage of correct predictions of whether the price rises or falls over the prediction horizon. Confidence intervals for correctly predicted changes are calculated empirically by simulating how many correct predictions would be generated by random guessing (uniform distribution, binary values).
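The random-guess baseline described above can be sketched as follows. This is a minimal Python illustration, not the system's R code: the function names, the number of simulations, and the 95% level are assumptions. Each simulated guess is a fair coin flip, so it matches the true direction with probability 0.5.

```python
import random


def hit_rate(actual_changes, predicted_changes):
    """Share of predictions whose direction (up vs. not up) matches the actual change."""
    hits = sum((a > 0) == (p > 0) for a, p in zip(actual_changes, predicted_changes))
    return hits / len(actual_changes)


def random_guess_interval(n_predictions, n_simulations=10000, level=0.95, seed=0):
    """Empirical interval for the hit rate of uniform binary random guessing."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_simulations):
        # A uniform binary guess is correct with probability 0.5.
        hits = sum(rng.random() < 0.5 for _ in range(n_predictions))
        rates.append(hits / n_predictions)
    rates.sort()
    alpha = (1.0 - level) / 2.0
    lower = rates[int(alpha * n_simulations)]
    upper = rates[min(n_simulations - 1, int((1.0 - alpha) * n_simulations))]
    return lower, upper


lower, upper = random_guess_interval(n_predictions=200, n_simulations=2000)
observed = hit_rate([1.0, -1.0, 1.0, -1.0], [0.5, 0.2, 0.3, -0.1])
```

A model's hit rate is only distinguishable from guessing if it falls outside the simulated interval.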

Forecast & price chart

In the chart above, the red dashed line represents predictions for the test set and the red zone represents prediction intervals. Prediction intervals are calculated empirically from the error distribution of the test set.
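One way to read "calculated empirically from the error distribution" is to shift each prediction by the lower and upper quantiles of the observed errors. The Python sketch below illustrates that reading only. The helper names, the interpolated quantile rule, and the 95% default level are assumptions and not taken from the system's source.

```python
def empirical_quantile(sorted_values, q):
    """Quantile with linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def prediction_interval(prediction, errors, level=0.95):
    """Interval around a prediction, where errors are (actual - predicted) on the test set."""
    errs = sorted(errors)
    alpha = (1.0 - level) / 2.0
    return (prediction + empirical_quantile(errs, alpha),
            prediction + empirical_quantile(errs, 1.0 - alpha))


# Made-up test-set errors, for illustration only.
test_errors = [-2.0, -1.0, 0.0, 1.0, 2.0]
low, high = prediction_interval(100.0, test_errors, level=0.5)
```

Because the quantiles come straight from the observed errors, the interval can be asymmetric when the errors are skewed.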

Predictor histograms

Predictor-residual scatter plots

Residual statistics


ACF of residuals

Distribution of residuals

Residuals-predicted scatter plot

Lag plots of residuals

What is this system?

The aim of this system is to be a simple analysis and visualization tool for financial time series data. The system was developed by Juho Uusi-Luomalahti during the data analysis project course at Tampere University of Technology. The system is not intended to produce accurate predictions by any means, since predicting the future is commonly considered witchcraft.

Predictions of the system are generated with the Random Forest machine learning algorithm. The number of trees used in the forest is 1000, and the number of variables randomly sampled at each split is sqrt(p), where p is the total number of predictors.
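To make the sqrt(p) rule concrete, the Python sketch below draws the candidate predictors for a single split. It illustrates only the sampling step, not a random forest. The function name is hypothetical, and rounding sqrt(p) down to an integer is an assumption.

```python
import math
import random


def sample_split_candidates(predictor_names, rng):
    """Pick floor(sqrt(p)) predictors, without replacement, as candidates for one split."""
    mtry = max(1, math.floor(math.sqrt(len(predictor_names))))
    return rng.sample(predictor_names, mtry)


# 25 hypothetical predictor names, for illustration only.
predictors = [f"x{i}" for i in range(25)]
candidates = sample_split_candidates(predictors, random.Random(42))
```

With 25 predictors, each split considers 5 of them. This decorrelates the trees in the forest.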

Financial time series data are fetched from Quandl.com and divided into training and test sets using an 80 % / 20 % split.
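A minimal Python sketch of such a split is shown below. It assumes the rows are in chronological order and that the earliest 80 % form the training set, which is the usual choice for time series. The description above does not say how the split is ordered, and the function name is hypothetical.

```python
def split_train_test(rows, train_fraction=0.8):
    """Split ordered rows into a leading training part and a trailing test part."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Ten placeholder observations in time order.
observations = list(range(10))
train, test = split_train_test(observations)
```

Keeping the test set strictly after the training set avoids evaluating the model on data that precedes what it was trained on.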

Predictors (input variables) fed into the random forest are generated by calculating daily returns in a sliding window for the adjusted close, adjusted open, adjusted low, adjusted high and adjusted volume.
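The exact window construction is not spelled out here, so the Python sketch below shows one plausible reading: for each day, the predictors are the most recent `window` daily returns of every series. The function names, the column naming scheme, and the window length of 3 are all assumptions.

```python
def simple_returns(values):
    """Simple (not log) returns: r_t = v_t / v_(t-1) - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(values, values[1:])]


def sliding_window_features(series_by_name, window=3):
    """Build one row per day containing the last `window` returns of each series."""
    returns = {name: simple_returns(vals) for name, vals in series_by_name.items()}
    n = min(len(r) for r in returns.values())
    rows = []
    for end in range(window, n + 1):
        row = {}
        for name, r in returns.items():
            for lag in range(1, window + 1):
                # lag 1 is the most recent return inside the window
                row[f"{name}_lag{lag}"] = r[end - lag]
        rows.append(row)
    return rows


# Made-up values for a single series, for illustration only.
data = {"adj_close": [100.0, 110.0, 121.0, 133.1, 146.41]}
rows = sliding_window_features(data, window=3)
```

With five series and a window of 3, each row would hold 15 predictors.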

The system is implemented with R and Shiny. The following R packages have been used: Quandl, dygraphs, ggplot2, forecast, ggfortify, reshape2, pastecs, PerformanceAnalytics and caret.

15th April 2017