The subject of this post is the use of LSTM models for time series analysis, and stock price prediction in particular. In February this year, I took the Udemy course “PyTorch for Deep Learning with Python Bootcamp” by Jose Portilla. I would like to mention that this is a good introductory course on some Deep Learning topics. As with most introductory courses, though, after taking it one will definitely need to work with other material to improve their knowledge of the subject.
I finished the course at the end of February and decided to create my own projects beyond the course material. One of these projects was an LSTM model for time series analysis, and my work on it is discussed here.
The full code used in the project is not included in this post; providing all of it would make the post too cumbersome to read and would distract from the main points I would like to make here. The entire code can be found on my GitHub page: https://github.com/marin-stoytchev/data-science-projects/tree/master/lstm_pytorch_project
I am also not going to attempt an explanation of how LSTMs work; that is well beyond the scope of this post. For a good and detailed explanation of LSTM, I would refer the reader to the post most often cited on Quora: http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
Here, I will provide only brief descriptions of the data used and the LSTM model, and will spend some time discussing the results. After that, I will go through three examples of LSTM being used for stock price prediction that I have found on the web and share my opinions on them.
My LSTM Project
Data used
In my LSTM project two different time series data sets were used. The first data set was the monthly sales of used cars in the USA from 1992-01-01 to 2019-12-01 (https://fred.stlouisfed.org/series/MRTSSM44112USN). The second data set was the daily price of gold from 2015-02-23 to 2020-02-21 (https://fred.stlouisfed.org/series/GOLDPMGBD228NLBM).
I intentionally selected these two data sets to examine the performance of the same LSTM model on time series with dramatically different behavior.
Below is a plot of the time series for car sales.
As one can see from the plot, the data is very periodic in nature, with a period of 12 months (one year). The dip around the 2008 mark corresponds to the 2008 recession. This data set is very similar in nature to the one used in the example project from the course by Jose Portilla mentioned at the beginning.
And here is the plot of the daily gold price.
The behavior here is completely different. There is no clear periodic pattern. To use a term from my Physics days, I would say that it exhibits “random walk” behavior.
The values from the first data set excluding the last 12 data points were used for training the LSTM model. The last 12 points (one year) were used for validation of the model.
The values from the second data set excluding the last 30 data points were used for training the LSTM model. The last 30 points (one month) were used for validation of the model.
Both training sets were scaled using MinMaxScaler from sklearn.
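For reference, the scaling step looks roughly like the sketch below (variable names such as y and test_size are illustrative here, not the exact code from the project):

```python
from sklearn.preprocessing import MinMaxScaler

# y is the raw time series as a 1-D NumPy array;
# the last test_size points are held out for validation
test_size = 12                      # 12 for the car sales data, 30 for the gold data
train_data = y[:-test_size]

# MinMaxScaler expects a 2-D array, so reshape before fitting;
# the (-1, 1) feature range is an illustrative choice
scaler = MinMaxScaler(feature_range=(-1, 1))
train_scaled = scaler.fit_transform(train_data.reshape(-1, 1))
```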
Brief model description:
In the project, an LSTM model with input_size = 1, hidden_size = 256, and output_size = 1 was used. MSELoss was used as the cost function, and Adam was used as the optimizer.
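For readers unfamiliar with how such a model is set up in PyTorch, a minimal sketch is shown below. It illustrates the general structure only; the actual code is in the GitHub repository linked above.

```python
import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_size=1, hidden_size=256, output_size=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size, output_size)
        # hidden and cell states; reset before each new sequence
        self.hidden = (torch.zeros(1, 1, hidden_size),
                       torch.zeros(1, 1, hidden_size))

    def forward(self, seq):
        # seq is a 1-D tensor; reshape to (seq_len, batch=1, input_size=1)
        lstm_out, self.hidden = self.lstm(seq.view(len(seq), 1, -1), self.hidden)
        pred = self.linear(lstm_out.view(len(seq), -1))
        return pred[-1]   # only the prediction for the next point is used
```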
The model was trained on each data set for 200 epochs, after which a prediction was made.
The model was not optimized in any way with regard to hidden layer size, learning rate, cost function, or choice of optimizer.
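The training loop follows the standard windowed approach: slide a fixed-size window over the scaled training data, predict the point that follows each window, and backpropagate the MSE loss. Below is a sketch under the same assumptions as above; the window size and learning rate shown are illustrative choices, not necessarily those used in the project.

```python
model = LSTMModel()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

window_size = 12   # e.g. one full yearly cycle for the car sales data

# build (window, next value) training pairs from the scaled training data
train_tensor = torch.FloatTensor(train_scaled).view(-1)
train_pairs = [(train_tensor[i:i + window_size], train_tensor[i + window_size])
               for i in range(len(train_tensor) - window_size)]

for epoch in range(200):
    for seq, label in train_pairs:
        optimizer.zero_grad()
        # reset the hidden and cell states for each training sequence
        model.hidden = (torch.zeros(1, 1, model.hidden_size),
                        torch.zeros(1, 1, model.hidden_size))
        y_pred = model(seq)
        loss = criterion(y_pred, label.reshape(1))
        loss.backward()
        optimizer.step()
```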
Prediction results:
The predictions from the model were re-normalized to match the original data. For each data set the predictions were plotted together with the original data for comparison/validation.
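For completeness, the re-normalization is simply the inverse of the earlier scaling step (again using the illustrative names from the sketches above):

```python
import numpy as np

# preds holds the model outputs in scaled units
true_predictions = scaler.inverse_transform(np.array(preds).reshape(-1, 1))
```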
Predictions for data set 1
The comparison plot for the car sales data is shown below.
For a closer look we zoom in on the last few cycles of the data.
This result closely resembles the results from the LSTM example project in the course. I remember thinking at the time: yes, the prediction is close to the real data, but the data has such a well-defined periodic pattern that one does not need a sophisticated model to make a similar prediction. It is clear from the data that what I would call a “copy and paste” technique would give a similar result. To elaborate on the “copy and paste” technique: 1) pick the data points from each of the previous three cycles separately; 2) add the appropriate deltas to these three sets so that the first point of each cycle has the same value; 3) average over the three cycles; and 4) translate the result to match the starting point of the test data. The outcome would be very close to the LSTM prediction.
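To make the “copy and paste” idea concrete, here is roughly what such a naive baseline would look like (a sketch only; the function name and parameters are illustrative and not part of the project code):

```python
import numpy as np

def copy_paste_forecast(series, first_test_value, period=12, n_cycles=3):
    """Naive seasonal baseline following steps 1)-4) above."""
    series = np.asarray(series, dtype=float)
    # 1) pick the data points from each of the previous n_cycles cycles
    cycles = [series[len(series) - (i + 1) * period : len(series) - i * period]
              for i in range(n_cycles)]
    # 2) shift each cycle so that its first point has the same value (zero)
    aligned = [c - c[0] for c in cycles]
    # 3) average over the aligned cycles
    mean_cycle = np.mean(aligned, axis=0)
    # 4) translate the result to match the starting point of the test data
    return first_test_value + mean_cycle
```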
Predictions for data set 2
The comparison plot for the gold prices data looked like this:
After zooming in:
In contrast with the first data set, here we can see that the future forecast deviates significantly from the true data values.
This study confirmed what I had already begun to suspect while taking this portion of the Udemy course:
- LSTM is good at future forecasting in cases of time series which have well-defined periodic behavior.
- For time series of the random walk type, such as stock prices, LSTM does not have good accuracy (this will be confirmed in the example discussions below). The reason is that without a well-defined pattern in the data the model cannot learn the behavior of the time series and, thus, cannot make accurate predictions for the future.
The last statement applies only to LSTM models used on their own, with the time series data points as the only input. I am still new to this field and I am continuing my studies and research on the subject. So, it is possible that an LSTM combined with another algorithm, or given additional inputs, could perform better. However, at this point in my journey into the field of Data Science and Machine Learning, I have not encountered good examples of this.
The research I have done on the subject has been focused on studies which are close to the approach I have used above. Below I share three examples of such studies and my thoughts on those.
Examples of LSTM use for stock price predictions
LSTM Example #1:
The first example is of LSTM being applied to predict the fluctuations of S&P500: https://www.altumintelligence.com/articles/a/Time-Series-Prediction-Using-LSTM-Deep-Neural-Networks
Here, I am presenting only the results from the LSTM model and the author’s conclusion based on these results, as they are most relevant to my findings. Although this is only a small part of the presentation, nothing has been taken out of context or misrepresented in any way.
Here is a plot from that study which most closely relates to the results from my project. Please, note that the stock price presented here is normalized.
Although the predictions in this plot appear smoother than the results for the gold price data, many of the predictions behave similarly: they diverge significantly from the true data. I took the liberty of placing arrows in the plot pointing to these predictions. The marked predictions either completely miss the data trend or miss significant changes in the data behavior within that time interval (the two dashed arrows belong to the latter group).
When I saw this plot, I was very happy that another study appeared to confirm my findings. However, the author’s conclusions based on these results were the complete opposite of my own. Below is the exact quote related to this particular result:
“We can see from the multi-sequence predictions that the network does appear to be correctly predicting the trends (and amplitude of trends) for a good majority of the time series. Whilst not perfect, it does give an indication of the usefulness of LSTM deep neural networks in sequential and time series problems. Greater accuracy could most certainly be achieved with careful hyperparameter tuning.”
Let’s break down this statement into its parts:
1) “The network does appear to be correctly predicting the trends (and amplitude of trends) for a good majority of the time series.”
Using terms like “good majority” is not really in the spirit of a scientific study. One could say that in X% of the cases we find good agreement between the predictions and the real data; saying “good majority” simply avoids providing a quantitative measure. So, if we agree that the predictions marked with arrows are inaccurate, then the prediction accuracy is only 46% (with seven inaccurate predictions there are six accurate predictions out of 13 total, and, thus, the accuracy is 6/13). Even if we do not count the two dashed arrows as inaccurate predictions (which I would not do), the prediction accuracy only rises to 62% (eight accurate out of 13 predictions). That is hardly a good majority of the data.
2) “Whilst not perfect, it does give an indication of the usefulness of LSTM deep neural networks in sequential and time series problems.”
I don’t argue about the usefulness of LSTM in general. However, this particular study deals with stock price predictions and its results show that the model’s predictions are inaccurate more than half of the time. So, the study here does not provide any evidence in support of the above claim, which makes this claim baseless.
3) “Greater accuracy could most certainly be achieved with careful hyperparameter tuning.”
Drawing on my long experience in science and engineering, statements like this have always raised red flags for me. The expression “could most certainly” is itself contradictory: “could” implies possibility, which is far from certainty. One cannot claim certainty without proof. As many scientists and engineers would say: show me the data. If greater accuracy can be achieved, why not prove it? Also, what exactly does careful hyperparameter tuning mean, and which hyperparameters are being considered? If one were to suggest such an approach, one should at least provide some details on what it entails.
LSTM Example #2:
The second example is from a tutorial at datacamp.com by a DataCamp instructor: https://www.datacamp.com/community/tutorials/lstm-python-stock-market
As in the first example, we are not going to discuss the entire presentation in detail; we will focus on the results most closely related to our own study.
After performing optimization of the LSTM model, the author presents the best predictions in the following plot (the plot from the post is intentionally truncated here to provide better details). Also, as in the first example, the stock price presented is normalized.
Here again we can see that there is a large number of predictions which deviate significantly from the real data. Yet, the author’s comment on the result is: “Though not perfect, LSTMs seem to be able to predict stock price behavior correctly most of the time.”
Let’s test that claim. There are a total of 20 predictions, 12 of which are inaccurate, while the remaining eight predictions are (somewhat) accurate. Thus, the prediction accuracy is 40% (8 out of 20). Once again, this is not what one would call predicting correctly most of the time. In reality, the plot reveals that the model’s predictions are incorrect most of the time (60% of it).
Looking at the plot, one could make the argument that the predictions more or less follow the stock price behavior (imagine using a wide marker and roughly connecting all the predictions). However, this is a misleading observation. The fact is that each forecast is made with knowledge of all the data prior to the forecast interval, i.e., each forecast starts immediately after the last preceding point of the true data. This explains the big vertical jumps between some consecutive forecasts.
Before moving on to the third example, I would like to make a brief comment regarding the first two examples. In both of them, the authors have correctly applied LSTM and their results are correct. The problem is not with the results from their studies, but with the gross misrepresentation of the accuracy of the predictions and the usefulness of LSTM in predicting short-term behavior in stock prices.
LSTM Example #3:
The third example uses several different algorithms to make predictions based on the same stock price data (https://www.analyticsvidhya.com/blog/2018/10/predicting-stock-price-machine-learningnd-deep-learning-techniques-python/).
The majority of data is used for training; a small portion at the end of the time series is used for model validation.
Since I was eager to see the results for the LSTM predictions, I quickly jumped to the corresponding plot. That plot is shown below.
When I saw this plot, my first reaction was complete disbelief. How was this possible? The predictions matched the real validation data almost perfectly. I thought the author had found a gold mine.
For comparison, this is what the predictions plot for one of the other models, Prophet, looked like.
This plot made much more sense to me based on my own study and on the studies I have previously found in my research. So, the question became: How come the LSTM predictions are so accurate?
That’s when I started examining more closely the code used for the other models and for the LSTM model. The mystery was soon revealed. For all the other models, the machine did not have any knowledge of the validation data, which is how predictions should be made. For the LSTM model, however, the predictions were made using the entire set of validation data. Here is the code which shows that.
The first marked line shows that the inputs include all validation data points plus the preceding 60 data points (those 60 points are needed to predict the first validation point). The second marked block of code shows that the test data is built from all input points as a series of 60-point windows moving forward one data point at a time. Since all input points (except the last) are used, the test set includes all validation points as well. Finally, the third marked line shows that the predictions are indeed made using the test data constructed as described above, i.e., with full knowledge of the validation data.
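Since the original code is not reproduced here, the sketch below reconstructs the pattern just described for illustration. It is not the exact code from that post; dataset, valid, scaler, and model stand in for the corresponding objects in the post.

```python
import numpy as np

# inputs include every validation point plus the 60 points preceding it
inputs = dataset[len(dataset) - len(valid) - 60:]
inputs = scaler.transform(inputs.reshape(-1, 1))

# each test sample is a 60-point window of *real* values, sliding forward
# one point at a time; later windows are built almost entirely from validation data
X_test = []
for i in range(60, inputs.shape[0]):
    X_test.append(inputs[i - 60:i, 0])
X_test = np.array(X_test).reshape(-1, 60, 1)

# every prediction therefore "knows" the true values right up to the point
# being predicted: this is look-ahead leakage, not genuine forecasting
predicted_prices = model.predict(X_test)   # Keras-style predict, as in that post's framework
```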
After discovering this, I no longer wondered how the LSTM predictions came to be so accurate. The answer is that the model was using 60 real data points, including points from the validation set, to predict each next data point. Unfortunately, this is not how future forecasting of stock prices is supposed to work. A future forecast means the data does not exist yet. Thus, when validating the model, we must treat the validation data in exactly the same way: as if it did not exist, so that the machine has no knowledge of it when making predictions. When forecasting one point into the future, the machine uses only the available data; to move forward, it makes the next prediction by appending the point it has just predicted, and so on, until the very last predictions are based (almost) entirely on previously predicted values. Predicting the next day's value based on knowledge of the previous 60 real days is not the goal, and it is not what was done with the other models in that post.
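For contrast, a genuine multi-step forecast feeds the model's own predictions back into the input window. Below is a minimal sketch in the spirit of my own project, reusing the illustrative names from the earlier sketches:

```python
n_future = 30   # number of points to forecast, e.g. the length of the validation set

model.eval()
# seed the window with the last real points of the *training* data only
preds = list(train_scaled[-window_size:].flatten())

with torch.no_grad():
    for _ in range(n_future):
        seq = torch.FloatTensor(preds[-window_size:])
        model.hidden = (torch.zeros(1, 1, model.hidden_size),
                        torch.zeros(1, 1, model.hidden_size))
        # append the prediction itself, never a true validation value
        preds.append(model(seq).item())

# the forecast is the newly generated points preds[window_size:],
# re-scaled with scaler.inverse_transform as shown earlier
```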
In addition, I would like to mention that despite the seemingly amazing agreement of the LSTM predictions with the data (achieved by knowing the future), the author makes the baffling statement that one most certainly cannot trust LSTM predictions of stock prices. What, then, was the goal of this exercise? This indicates to me that the author had no clear idea of what exactly they were trying to solve or demonstrate, or why the different models were implemented differently.
This impression was reinforced by the author’s responses to several comments below the post, which pointed out the fundamental flaw in the LSTM predictions. In my opinion, the author did not answer these questions in a straightforward manner, instead providing vague and inadequate answers.
In Conclusion
LSTM can be a valuable tool for tackling various sequential and time series problems. However, it is my opinion that it is not suitable for time series data of the random walk type. LSTM is a recurrent neural network architecture which learns from historical data to make predictions about the future. When the historical data has no bearing on the short-term future outcome, LSTM does not work well. So, my conclusion is that LSTM alone is not a suitable tool for predicting short-term stock price behavior.
I would like to finish with the following appeal in regard to teaching and writing (online and on paper) about scientific subjects, particularly in the rapidly growing fields of Data Science and Machine Learning, where the number of articles and opinions is becoming overwhelming.
To the teachers, lecturers and instructors: In addition to providing examples where a certain model, method or technique shines, please include cases that reveal its weaknesses. Teach both the advantages and disadvantages of a technique and illustrate them with examples. As a particular example regarding LSTM, please do not limit your lectures to predicting the future of a sine wave or a similarly behaved time series. That is a highly biased way of presenting such models and paints far too optimistic a picture.
To the professionals (examples 1 and 2): Please do not fall into the trap of overselling something you believe to be good. Be objective when discussing what certain tools can and cannot do, and the implications of the results of your work. Also, quantifying your findings is far more useful than making generalized, vague statements. There are thousands of students out there who take your words as gospel, and in some cases they are being seriously misled as a result.
To the enthusiasts (example 3): Write and publish only what you are convinced is true and has value. Otherwise, your posts will hurt the many others who follow your writing in the hope of learning something valuable. Pay attention to comments and questions and, if necessary, correct your statements. Learn from your mistakes instead of brushing them aside and deflecting or ignoring questions from others.
And to all: keep learning!