
Chapter 17. RNN Algorithm

Recommended Post: 【Algorithm】 Algorithm Index


1. Overview

2. Considerations

3. Type 1. Regression Models Before RNN

4. Type 2. LSTM

5. Type 3. GRU



1. Overview

⑴ Definition

① RNN (Recurrent Neural Network): An algorithm in which, within a multilayer perceptron composed of an input layer, a hidden layer, and an output layer, the hidden layer has a recurrent structure, i.e. the hidden state is fed back into the hidden layer at the next time step.

② It can be expressed as h_t = tanh(W_x x_t + W_h h_{t-1} + b), where W_x and W_h are weight matrices and b is a bias vector (a minimal sketch of this recurrence follows this list).

③ Advantages

○ The model is relatively simple, and the network structure can accept inputs and outputs regardless of sequence length.

○ The structure can be designed in various flexible ways depending on the need.

④ Disadvantages

○ Relatively slow computation speed

○ Long-term dependency problem

○ Vanishing gradient problem
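As a concrete illustration of the recurrence h_t = tanh(W_x x_t + W_h h_{t-1} + b), below is a minimal NumPy sketch of a vanilla RNN cell. The sizes (3 input features, 4 hidden units, sequence length 5) and the random weights are arbitrary assumptions chosen only to make the example runnable; the same parameters are reused at every time step, which is why the sequence length is not fixed.

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # One recurrence step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 3, 4, 5                    # arbitrary example sizes
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                                    # initial hidden state h_0
for x_t in rng.normal(size=(seq_len, input_dim)):
    h = rnn_step(x_t, h, W_x, W_h, b)                       # same parameters at every step
print(h.shape)                                              # (4,)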

⑵ Structure


image

Figure 1. Structure of the RNN Algorithm


⑶ Applications

① Natural Language: Uses preceding and following words in text

② Speech Signals

③ Time-Series Data: Uses past and future values along with the current value

Predicting the future by observing how data change over time is challenging but very important, and such problems can also be solved with deep learning algorithms. By providing many training examples in which x(t), x(t − τ), …, x(t − kτ) are the input values and x(t + τ) is the target value, a network representing the prediction function f can be trained, so that the predicted value is y = f(x(t), x(t − τ), …, x(t − kτ)). The network is not limited to a single predicted value; it can also produce several predictions at once. In that case the prediction vector is y = F(x(t), x(t − τ), …, x(t − kτ)) = (y_1, …, y_m), and the corresponding target vector G(t + τ) = (x_1, …, x_m) contains the actual future values.
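As a sketch of this sliding-window formulation, the snippet below builds (input, target) pairs of the form x(t), x(t − τ), …, x(t − kτ) → x(t + τ). The sine-wave series and the choices k = 3, τ = 1 are hypothetical values used only for illustration; the resulting pairs could be fed to any regression model or recurrent network.

import numpy as np

series = np.sin(np.linspace(0, 20, 200))        # hypothetical time series x(t)
k, tau = 3, 1                                   # number of past samples and sampling interval

X, y = [], []
for t in range(k * tau, len(series) - tau):
    X.append(series[t - k * tau : t + 1 : tau]) # x(t - k*tau), ..., x(t - tau), x(t)
    y.append(series[t + tau])                   # target x(t + tau)
X, y = np.array(X), np.array(y)
print(X.shape, y.shape)                         # (196, 4) (196,)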



2. Considerations

⑴ Long-term dependency problem

① As data points get farther back in time from the current moment, it becomes harder to process the context.

⑵ Gradient vanishing (GV)

① As more layers are added, during backpropagation the gradient values tend to converge toward zero as they move closer to the input layer, so parameters are not updated effectively.

② Neural network research stagnated due to the limited performance of multi-layer networks and the slow speed of computers.

③ Solution: Use activation functions such as ReLU instead of sigmoid (a small numeric check follows this list).

○ Sigmoid: sig(t) = 1 / (1 + exp(-t))

○ Its derivative is largest at t=0, but the maximum value is 0.25, which is less than 1.

○ Therefore, through repeated computations across layers, the gradient tends to shrink and converge toward 0.

○ ReLU: ReLU(t) = max(0, t)

○ Helps address the vanishing gradient problem

○ Faster computation

○ Because it outputs 0 for all negative inputs, some neurons can become permanently inactive (the “dying ReLU” problem)

○ A commonly tried first-choice activation function in deep learning
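A quick numeric check of these points (a sketch, not taken from the original post): the sigmoid derivative never exceeds 0.25, so its product over many layers collapses toward 0, while the ReLU derivative is exactly 1 for positive inputs and is preserved.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sigmoid_grad(t):
    s = sigmoid(t)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))                  # 0.25, the maximum value of the derivative
print(sigmoid_grad(0.0) ** 20)            # ~9.1e-13: a 20-layer chain of such factors vanishes

relu_grad = lambda t: (t > 0).astype(float)
print(relu_grad(np.array([2.0])) ** 20)   # [1.]: the gradient survives for positive inputs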

⑶ Gradient exploding (GE)

① A problem where gradients grow larger and larger, causing weights to be updated to abnormally large values.

② Gradient clipping: A technique that sets an upper bound on gradient values (e.g. on their norm) to prevent gradient explosion (a sketch follows below).

③ Why gradient clipping speeds up model training
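Below is a minimal sketch of norm-based gradient clipping in NumPy; the threshold of 1.0 and the example gradient are arbitrary assumptions. Deep learning frameworks provide the same operation, e.g. torch.nn.utils.clip_grad_norm_ in PyTorch.

import numpy as np

def clip_by_norm(grad, max_norm=1.0):
    # Rescale the gradient if its L2 norm exceeds max_norm (gradient clipping)
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([3.0, 4.0])       # example gradient with L2 norm 5
print(clip_by_norm(g))         # [0.6 0.8], rescaled so that the norm is 1.0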



3. Type 1. Regression Models Before RNN

⑴ Moving Average Model


image


⑵ Autoregressive Model


image


⑶ ARMA (Autoregressive Moving Average)


image


⑷ ARMAX (Autoregressive Moving Average with Exogenous Inputs): Concerning the external variable x,


image


⑸ Issues: the assumption of linearity, and the lack of a criterion for how many past values should be used (illustrated in the sketch below).
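For reference, these models are commonly written in the following standard textbook forms, where y_t is the output, e_t is white noise, and x_t is the exogenous input (the notation is a generic convention and not necessarily that of the figures above):

MA(q):   y_t = e_t + \theta_1 e_{t-1} + \cdots + \theta_q e_{t-q}
AR(p):   y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + e_t
ARMA:    y_t = \sum_{i=1}^{p} \phi_i y_{t-i} + e_t + \sum_{j=1}^{q} \theta_j e_{t-j}
ARMAX:   y_t = \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{l=1}^{r} \beta_l x_{t-l} + e_t + \sum_{j=1}^{q} \theta_j e_{t-j}

To make the issues in ⑸ concrete, here is a sketch of fitting an AR(p) model by ordinary least squares with NumPy. The synthetic series and the choice p = 3 are assumptions for illustration; the model is strictly linear in the past values, and nothing in the procedure itself says how large p should be.

import numpy as np

rng = np.random.default_rng(0)
n, p = 300, 3                                 # series length and (arbitrary) lag order
x = np.zeros(n)
for t in range(1, n):                         # synthetic series with AR(1)-like dynamics
    x[t] = 0.8 * x[t - 1] + rng.normal(scale=0.1)

# Linear regression x_t ≈ a_1 x_{t-1} + ... + a_p x_{t-p}
X = np.column_stack([x[p - i - 1 : n - i - 1] for i in range(p)])
y = x[p:]
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)                                 # estimated AR coefficients a_1 ... a_p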



4. Type 2. LSTM

⑴ LSTM (Long Short-Term Memory): A neural network algorithm designed to address the long-term dependency problem of RNNs.

⑵ Structure: LSTM consists of an input gate, a forget gate, and an output gate (a sketch of one LSTM step follows the steps below).

Step 1. Cell State: the pathway along which information is carried forward largely unchanged.

Step 2. Forget Gate: if the sigmoid output is 1, the information is retained; if 0, it is discarded.

Step 3. Input Gate: determines which new information will be stored in the cell state.

Step 4. Cell State Update: updates the cell state.

Step 5. Output Gate: determines the output.
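Below is a minimal NumPy sketch of one LSTM step that follows the five stages above. The weight shapes, random initialization, and dimensions are assumptions made only so that the example runs.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def lstm_step(x_t, h_prev, c_prev, W, b):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(W["f"] @ z + b["f"])       # Step 2: forget gate (1 keeps, 0 discards)
    i = sigmoid(W["i"] @ z + b["i"])       # Step 3: input gate (which new information to store)
    g = np.tanh(W["g"] @ z + b["g"])       # Step 3: candidate values for the cell state
    c = f * c_prev + i * g                 # Steps 1 and 4: cell state carried over and updated
    o = sigmoid(W["o"] @ z + b["o"])       # Step 5: output gate
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4               # arbitrary example sizes
W = {k: rng.normal(scale=0.1, size=(hidden_dim, input_dim + hidden_dim)) for k in "figo"}
b = {k: np.zeros(hidden_dim) for k in "figo"}
h, c = lstm_step(rng.normal(size=input_dim), np.zeros(hidden_dim), np.zeros(hidden_dim), W, b)
print(h.shape, c.shape)                    # (4,) (4,)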



5. Type 3. GRU

⑴ GRU (Gated Recurrent Unit): Similar to LSTM but with a simpler structure.

⑵ Structure


image

Figure 2. Structure of GRU

image


① When r_t is close to 0, the intermediate (candidate) memory unit ignores h_{t-1}.

② When z_t is close to 1, h_t ignores x_t and tends to maintain the value of h_{t-1} (see the sketch after this list).

③ Between the input (text) and the output (sentiment: positive, negative, or neutral) there are hidden layers, and the model’s performance depends on how these hidden layers are structured.
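Below is a minimal NumPy sketch of one GRU step that makes ① and ② concrete. The gate convention follows the description above (z_t near 1 keeps h_{t-1}), and the sizes and random weights are assumptions for illustration only.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gru_step(x_t, h_prev, W, b):
    v = np.concatenate([x_t, h_prev])
    r = sigmoid(W["r"] @ v + b["r"])       # reset gate r_t
    z = sigmoid(W["z"] @ v + b["z"])       # update gate z_t
    # r_t near 0: the intermediate (candidate) memory ignores h_{t-1}
    h_tilde = np.tanh(W["h"] @ np.concatenate([x_t, r * h_prev]) + b["h"])
    # z_t near 1: h_t keeps h_{t-1} and largely ignores the new input
    return z * h_prev + (1.0 - z) * h_tilde

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4               # arbitrary example sizes
W = {k: rng.normal(scale=0.1, size=(hidden_dim, input_dim + hidden_dim)) for k in "rzh"}
b = {k: np.zeros(hidden_dim) for k in "rzh"}
h = gru_step(rng.normal(size=input_dim), np.zeros(hidden_dim), W, b)
print(h.shape)                             # (4,)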



Posted: 2023.06.27 00:35
