Kisaragi · Aug 23, 2020

Bilibili Stock Prediction Based On LSTM · Complete Tutorial

Personal Blog: ksrgtech.com

This article is split into 8 parts as below:

  1. Problem statement
  2. Preparation for Python module
  3. Data preparation
  4. Model building
  5. Model fitting
  6. Model prediction
  7. Results visualization
  8. Limitations of this article

As a supplement, readers who want to understand exactly how an LSTM works can take a look at colah’s blog:
English original:
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Chinese translation version:
https://www.jianshu.com/p/95d5c461924c

Let’s get started!!

1. Problem statement

What is Bilibili? You can think of it as a Chinese version of YouTube, but more focused on ACG (anime, comics, and games). It launched in 2009 and is now a promising, fast-growing platform with over 170 million users.

In this article, we will apply an extension of the Recurrent Neural Network (RNN) called Long Short-Term Memory (LSTM) to Bilibili (NASDAQ: BILI) stock data. We will use 80% of the data to train the model and the remaining 20% to make and verify predictions.

It is impossible to predict precise stock prices. Neural network algorithms can only help explore the general trend of a stock, so exploring that trend is what this article aims to do.

2. Preparation for Python module

If you haven’t installed some modules, such as pandas_datareader, you can enter the following code in the command window to download it.

pip install pandas_datareader

Almost every Python module or package can be installed the same way.

pip install [Module Name]

Next, load all the modules we are going to use in this article.

#For loading data
from pandas_datareader import data as pdr
#For data preprocessing
import math
from sklearn.preprocessing import MinMaxScaler
import numpy as np
#For model building
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
#For data visualization
import missingno as msno
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)

3. Data preparation

3.1 Import data

#The data is obtained from Yahoo Finance
#Bilibili ticker symbol: BILI
df=pdr.get_data_yahoo('BILI',start='2018-03-28',end='2020-08-18')
#View the data
df

Output:

[Image by author: preview of the BILI dataframe]

The data has 603 rows and 6 columns; that is, the sample size is 603 and the feature number is 6.
We will only be using the Close column in this article.

3.2 Missing value processing

Next, we need to check for missing values. If there are any, we need to handle them accordingly, deciding whether to delete or fill them depending on where they are.

#Visualize missing values
msno.matrix(df)

Output:

[Image by author: missingno matrix of the dataframe]

As can be seen from the figure, the data is complete and there are no missing values, so we can continue with the next steps.
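
If you prefer a numeric check over the visualization, pandas can report the same information directly (a small supplement of mine, not part of the original workflow):

#Count missing values in each column
df.isnull().sum()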

3.3 Split training set and test set

We will take the first 80% of the data as the training set to train the model, and the last 20% of the data as the test set to test the model’s predictive ability.

#Arrange the dataframe in ascending order by index
data = df.sort_index(ascending=True, axis=0)
#We will only be using the Close column
dataset=data[['Close']].values
#Take 80% of the data as the training set
training_data_len=math.ceil(len(dataset)*.8)
train_data=dataset[0:training_data_len,:]
#Take the remaining data as the test set
#To predict the first value in the test period we need the previous 60 days of data, so the test set starts 60 days earlier
test_data = dataset[training_data_len-60:,:]
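
As a quick sanity check (my addition, not in the original), you can confirm the split sizes; since the test set reaches back 60 extra days, the two lengths sum to len(dataset)+60:

#With 603 samples: 483 training rows and 180 test rows
print(len(train_data), len(test_data))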

3.4 Data normalization

In this step, the data will be mapped to the [0,1] interval; the purpose is to speed up the model’s convergence and improve its accuracy.
The method used here is called min-max normalization, or 0–1 normalization:

x_scaled = (x - x_min) / (x_max - x_min)

#Feature scaling
scaler=MinMaxScaler(feature_range=(0,1))
scaled_train=scaler.fit_transform(train_data)
#Use transform (not fit_transform) here so the test set is scaled with the parameters learned from the training set
scaled_test=scaler.transform(test_data)

3.5 Reconstructing the data

The purpose of this section is to make the data meet the input requirements of the algorithm. There are three steps.

1. Split the dataset into two parts, x and y, and use x to predict y.

[Image by author: sliding-window illustration of x and y]

For simplicity of the illustration, suppose we take 3 days as a timestep: we would use the data of the previous three days (large orange rectangle, large green rectangle, x) to predict the data of the fourth day (small orange rectangle, small green rectangle, y). In this blog, the actual timestep is 60; that is, the data of the previous 60 days is used to predict the data of the 61st day.

2. Convert list-type data to arrays for use in the third step.

3. Convert two-dimensional data into three-dimensional data.
The input of an LSTM must be a three-dimensional array (samples, timesteps, features), which refer to the sample size, time steps, and feature number respectively.
x_train and x_test will be used as the input data of the LSTM, and for now they are still two-dimensional (samples, timesteps=60). We need to add one more dimension to each, and because we only take the closing price for analysis, we set features=1.

#Reconstruction of training set
#1 separate x and y
x_train=[]
y_train=[]
for i in range(60,len(scaled_train)):
    x_train.append(scaled_train[i-60:i,0])
    y_train.append(scaled_train[i,0])
#2 Convert list type data into array
x_train,y_train=np.array(x_train),np.array(y_train)
#3 Turn 2D data into 3D data
x_train=np.reshape(x_train,(x_train.shape[0],x_train.shape[1],1))
#Test set reconstruction
#1 separate x and y
x_test = []
y_test = dataset[training_data_len:,:]
for i in range(60,len(scaled_test)):
    x_test.append(scaled_test[i-60:i,0])
#2 Convert list type data into array
x_test = np.array(x_test)
#3 Turn 2D data into 3D data
x_test = np.reshape(x_test, (x_test.shape[0],x_test.shape[1],1))
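
Before moving on, it is worth verifying (my addition) that the arrays now have the (samples, timesteps, features) shape the LSTM expects; with the 603-row dataset from section 3.1, the shapes come out as follows:

#Expect (423, 60, 1) for the training input and (120, 60, 1) for the test input
print(x_train.shape)
print(x_test.shape)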

4. Model building

#Initialize model
model = Sequential()
#LSTM layer
model.add(LSTM(units=50, return_sequences=True,input_shape=(x_train.shape[1],1)))
#Dropout layer
model.add(Dropout(.2))
#LSTM layer
model.add(LSTM(units=50, return_sequences=False))
#Dropout layer
model.add(Dropout(.2))
#Fully connected layer
model.add(Dense(units=1))
#Model compilation
model.compile(optimizer='adam', loss='mean_squared_error')

🔸 Initialize the neural network: Sequential()

🔸 LSTM layer: LSTM()
Generally, a two-layer LSTM can fit the data well. More LSTM layers can improve the fit, but they also increase the complexity of the model and the difficulty of training.

The parameter units=50 means that the layer has 50 LSTM neurons, and the output of this layer is a 50-dimensional vector.
The parameter input_shape describes the shape of a single sample: (timesteps, features).
The parameter return_sequences is used to set whether to return an array containing timesteps.

  • True returns a three-dimensional array (batch size, timesteps, number of units).
  • False returns a two-dimensional array (batch size, number of units).

🔸 The dropout layer to prevent overfitting: Dropout()
Dropout(0.2) means that 20% of the units output by the previous layer are randomly dropped during training.

🔸 Fully connected neural network layer: Dense()
Also called a densely connected layer, it means that each node of this layer is connected to every node of the previous layer. The parameter units=1 means that there is one neuron in this layer.

🔸 Neural network compilation: compile()
Use the Adam optimizer, with MSE (mean squared error) as the loss function.
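For reference, MSE is the average squared difference between the predicted and true values: MSE = (1/n) Σ (y_i - ŷ_i)^2.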

The structure of the overall neural network is shown in the figure:

[Image by author: diagram of the network structure]
#Model structure
model.summary()

Output:

[Image by author: output of model.summary()]

We can see the output data shape of each layer and the number of parameters in each layer.

5. Model fitting

Use the training set data x_train and y_train to train the model. This article does not cover parameter tuning; here I just arbitrarily assign the values 25 and 10 to the parameters batch_size and epochs.

  • batch_size: If the dataset is too large, it’s impractical to feed it all in at once. One method is to divide the complete data into multiple inputs; the sample size of one input is called the batch_size.
  • epochs: One complete pass of the data through the neural network is called an epoch.

In fact, these two values have a great influence on how well we can fit the model.
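
The article does not show the fitting call itself; assuming the pairing in the sentence above (batch_size=25, epochs=10), a minimal sketch would be:

#Train the model on the training set
model.fit(x_train, y_train, batch_size=25, epochs=10)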

6. Model prediction

After training the model with the training set, use the test set to make predictions.
Since the model works on normalized data, its predictions are also on the [0,1] scale, and we need the inverse_transform() function to map them back to prices.

#prediction
predictions = model.predict(x_test)
#Reverse the scaling
predictions = scaler.inverse_transform(predictions)
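
The article judges the fit visually in the next section; as a supplement (my addition), a numeric error such as RMSE can be computed against the true prices in y_test:

#Root mean squared error between predicted and true closing prices
rmse = np.sqrt(np.mean((predictions - y_test)**2))
print(rmse)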

7. Results visualization

#Create a dataframe for plotting
train = data[:training_data_len]
#copy() avoids pandas' SettingWithCopyWarning when we add a column below
valid = data[training_data_len:].copy()
#Add a new column to valid, and assign predictions to the new column
valid['Predictions'] = predictions
#Start plotting
#set figure size
plt.figure(figsize=(16,8))
#set title
plt.title('Model')
#set x-axis name
plt.xlabel('Date', fontsize=18)
#set y-axis name
plt.ylabel('Close Price USD ($)', fontsize=18)
#Draw the line chart for training set
plt.plot(train['Close'])
#Draw the line graph of the true value and the predicted value separately
plt.plot(valid[['Close','Predictions']])
#Display legend
plt.legend(['Train','Val','Predictions'], loc='lower right')
plt.show()

Output:

[Image by author: plot of training data, true values, and predictions]

It can be seen from the figure that the LSTM can indeed track the trend of the stock quite closely.

8. Limitations of this article

  • This article only uses one method, LSTM, to make predictions. In fact, there are many other methods, such as the traditional ARIMA, and machine learning methods such as k-nearest neighbors.
  • This article does not involve parameter tuning. There are many ways to tune parameters; in addition to manual adjustment, you can use algorithms such as Grid Search, Random Search, and Bayesian methods.
  • After predicting a value, when we want to predict the next value, the 60-day window used to predict that “next value” should include our predicted value; this article instead uses only real data for every prediction (see the sketch after this list).
  • …(If you find more shortcomings of this article, please leave comments~~)
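
For the third point, a rolling forecast that feeds each prediction back into the input window could be sketched like this (a rough illustration reusing the model and scaler above; the variable names and 5-step horizon are my own):

#Rolling multi-step forecast: each new prediction joins the 60-day window
window = scaled_test[-60:, 0].tolist()
future = []
for _ in range(5):
    x = np.array(window[-60:]).reshape(1, 60, 1)
    pred = model.predict(x)[0, 0]  #prediction on the [0,1] scale
    window.append(pred)
    future.append(pred)
#Map the scaled predictions back to prices
future_prices = scaler.inverse_transform(np.array(future).reshape(-1, 1))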

Thanks for your time~🌸
