
Chapter 17. RNN Algorithm

Recommended Post: 【Algorithm】 Algorithm Index


1. Overview

2. Considerations

3. Type 1. Regression Models Before RNN

4. Type 2. LSTM

5. Type 3. GRU



1. Overview

⑴ Definition

① RNN (Recurrent Neural Network): An algorithm in which, within a multilayer perceptron composed of an input layer, a hidden layer, and an output layer, the hidden layer has a recurrent structure, i.e. the hidden state is fed back into the hidden layer at the next time step.

② It can be expressed as h_t = tanh(W_x x_t + W_h h_{t-1} + b), where W_x and W_h are weight matrices and b is a bias vector (a minimal sketch of this recurrence follows this list).

③ Advantages

○ The model is relatively simple, and the network structure can accept inputs and outputs regardless of sequence length.

○ The structure can be designed in various flexible ways depending on the need.

④ Disadvantages

○ Relatively slow computation speed

○ Long-term dependency problem

○ Vanishing gradient problem
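As a concrete illustration of the recurrence h_t = tanh(W_x x_t + W_h h_{t-1} + b), below is a minimal NumPy sketch of a vanilla RNN cell. The sizes (3 input features, 4 hidden units, sequence length 5) and the random weights are arbitrary assumptions chosen only to make the example runnable; the same parameters are reused at every time step, which is why the sequence length is not fixed.

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # One recurrence step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 3, 4, 5                    # arbitrary example sizes
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                                    # initial hidden state h_0
for x_t in rng.normal(size=(seq_len, input_dim)):
    h = rnn_step(x_t, h, W_x, W_h, b)                       # same parameters at every step
print(h.shape)                                              # (4,)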

⑵ Structure


image

Figure 1. Structure of the RNN Algorithm


⑶ Applications

① Natural Language: Uses preceding and following words in text

② Speech Signals

③ Time-Series Data: Uses past and future values along with the current value

Predicting the future by observing how data change over time is challenging but very important, and such problems can also be solved with deep learning algorithms. By providing many training examples in which x(t), x(t − τ), …, x(t − kτ) are the input values and x(t + τ) is the target value, a network representing the prediction function f can be trained, so that the predicted value is y = f(x(t), x(t − τ), …, x(t − kτ)). The network is not limited to a single predicted value; it can also produce several predictions at once. In that case the prediction vector is y = F(x(t), x(t − τ), …, x(t − kτ)) = (y_1, …, y_m), and the corresponding target vector G(t + τ) = (x_1, …, x_m) contains the actual future values.
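As a sketch of this sliding-window formulation, the snippet below builds (input, target) pairs of the form x(t), x(t − τ), …, x(t − kτ) → x(t + τ). The sine-wave series and the choices k = 3, τ = 1 are hypothetical values used only for illustration; the resulting pairs could be fed to any regression model or recurrent network.

import numpy as np

series = np.sin(np.linspace(0, 20, 200))        # hypothetical time series x(t)
k, tau = 3, 1                                   # number of past samples and sampling interval

X, y = [], []
for t in range(k * tau, len(series) - tau):
    X.append(series[t - k * tau : t + 1 : tau]) # x(t - k*tau), ..., x(t - tau), x(t)
    y.append(series[t + tau])                   # target x(t + tau)
X, y = np.array(X), np.array(y)
print(X.shape, y.shape)                         # (196, 4) (196,)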



2. Considerations

⑴ Long-term dependency problem

① As data points get farther back in time from the current moment, it becomes harder to process the context.

⑵ Gradient vanishing (GV)

① As more layers are added, during backpropagation the gradient values tend to converge toward zero as they move closer to the input layer, so parameters are not updated effectively.

② Neural network research stagnated due to the limited performance of multi-layer networks and the slow speed of computers.

③ Solution: Use activation functions such as ReLU instead of sigmoid (a small numeric check follows this list).

○ Sigmoid: sig(t) = 1 / (1 + exp(-t))

○ Its derivative is largest at t=0, but the maximum value is 0.25, which is less than 1.

○ Therefore, through repeated computations across layers, the gradient tends to shrink and converge toward 0.

○ ReLU: ReLU(t) = max(0, t)

○ Helps address the vanishing gradient problem

○ Faster computation

○ Because it outputs 0 for all negative inputs, some neurons can become permanently inactive (the “dying ReLU” problem)

○ A commonly tried first-choice activation function in deep learning
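A quick numeric check of these points (a sketch, not taken from the original post): the sigmoid derivative never exceeds 0.25, so its product over many layers collapses toward 0, while the ReLU derivative is exactly 1 for positive inputs and is preserved.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sigmoid_grad(t):
    s = sigmoid(t)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))                  # 0.25, the maximum value of the derivative
print(sigmoid_grad(0.0) ** 20)            # ~9.1e-13: a 20-layer chain of such factors vanishes

relu_grad = lambda t: (t > 0).astype(float)
print(relu_grad(np.array([2.0])) ** 20)   # [1.]: the gradient survives for positive inputs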

⑶ Gradient exploding (GE)

① A problem where gradients grow larger and larger, causing weights to be updated to abnormally large values.

② Gradient clipping: A technique that sets an upper bound on gradient values (e.g. on their norm) to prevent gradient explosion (a sketch follows below).

③ Why gradient clipping speeds up model training
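Below is a minimal sketch of norm-based gradient clipping in NumPy; the threshold of 1.0 and the example gradient are arbitrary assumptions. Deep learning frameworks provide the same operation, e.g. torch.nn.utils.clip_grad_norm_ in PyTorch.

import numpy as np

def clip_by_norm(grad, max_norm=1.0):
    # Rescale the gradient if its L2 norm exceeds max_norm (gradient clipping)
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([3.0, 4.0])       # example gradient with L2 norm 5
print(clip_by_norm(g))         # [0.6 0.8], rescaled so that the norm is 1.0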



3. Type 1. Regression Models Before RNN

⑴ Moving Average Model


image


⑵ Autoregressive Model


image


⑶ ARMA (Autoregressive Moving Average)


image


⑷ ARMAX (Autoregressive Moving Average with Exogenous Inputs): Concerning the external variable x,


image


⑸ Issues: the assumption of linearity, and the lack of a criterion for how many past values should be used (illustrated in the sketch below).
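For reference, these models are commonly written in the following standard textbook forms, where y_t is the output, e_t is white noise, and x_t is the exogenous input (the notation is a generic convention and not necessarily that of the figures above):

MA(q):   y_t = e_t + \theta_1 e_{t-1} + \cdots + \theta_q e_{t-q}
AR(p):   y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + e_t
ARMA:    y_t = \sum_{i=1}^{p} \phi_i y_{t-i} + e_t + \sum_{j=1}^{q} \theta_j e_{t-j}
ARMAX:   y_t = \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{l=1}^{r} \beta_l x_{t-l} + e_t + \sum_{j=1}^{q} \theta_j e_{t-j}

To make the issues in ⑸ concrete, here is a sketch of fitting an AR(p) model by ordinary least squares with NumPy. The synthetic series and the choice p = 3 are assumptions for illustration; the model is strictly linear in the past values, and nothing in the procedure itself says how large p should be.

import numpy as np

rng = np.random.default_rng(0)
n, p = 300, 3                                 # series length and (arbitrary) lag order
x = np.zeros(n)
for t in range(1, n):                         # synthetic series with AR(1)-like dynamics
    x[t] = 0.8 * x[t - 1] + rng.normal(scale=0.1)

# Linear regression x_t ≈ a_1 x_{t-1} + ... + a_p x_{t-p}
X = np.column_stack([x[p - i - 1 : n - i - 1] for i in range(p)])
y = x[p:]
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)                                 # estimated AR coefficients a_1 ... a_p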



4. Type 2. LSTM

⑴ LSTM (Long Short-Term Memory): A neural network algorithm designed to address the long-term dependency problem of RNNs.

⑵ Structure: LSTM consists of an input gate, a forget gate, and an output gate (a sketch of one LSTM step follows the steps below).

Step 1. Cell State: the pathway along which information is carried forward largely unchanged.

Step 2. Forget Gate: if the sigmoid output is 1, the information is retained; if 0, it is discarded.

Step 3. Input Gate: determines which new information will be stored in the cell state.

Step 4. Cell State Update: updates the cell state.

Step 5. Output Gate: determines the output.
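Below is a minimal NumPy sketch of one LSTM step that follows the five stages above. The weight shapes, random initialization, and dimensions are assumptions made only so that the example runs.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def lstm_step(x_t, h_prev, c_prev, W, b):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(W["f"] @ z + b["f"])       # Step 2: forget gate (1 keeps, 0 discards)
    i = sigmoid(W["i"] @ z + b["i"])       # Step 3: input gate (which new information to store)
    g = np.tanh(W["g"] @ z + b["g"])       # Step 3: candidate values for the cell state
    c = f * c_prev + i * g                 # Steps 1 and 4: cell state carried over and updated
    o = sigmoid(W["o"] @ z + b["o"])       # Step 5: output gate
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4               # arbitrary example sizes
W = {k: rng.normal(scale=0.1, size=(hidden_dim, input_dim + hidden_dim)) for k in "figo"}
b = {k: np.zeros(hidden_dim) for k in "figo"}
h, c = lstm_step(rng.normal(size=input_dim), np.zeros(hidden_dim), np.zeros(hidden_dim), W, b)
print(h.shape, c.shape)                    # (4,) (4,)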



5. Type 3. GRU

⑴ GRU (Gated Recurrent Unit): Similar to LSTM but with a simpler structure.

⑵ Structure


image

Figure 2. Structure of GRU

image


① When r_t is close to 0, the intermediate (candidate) memory unit ignores h_{t-1}.

② When z_t is close to 1, h_t ignores x_t and tends to maintain the value of h_{t-1} (see the sketch after this list).

③ Between the input (text) and the output (sentiment: positive, negative, or neutral) there are hidden layers, and the model’s performance depends on how these hidden layers are structured.
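Below is a minimal NumPy sketch of one GRU step that makes ① and ② concrete. The gate convention follows the description above (z_t near 1 keeps h_{t-1}), and the sizes and random weights are assumptions for illustration only.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gru_step(x_t, h_prev, W, b):
    v = np.concatenate([x_t, h_prev])
    r = sigmoid(W["r"] @ v + b["r"])       # reset gate r_t
    z = sigmoid(W["z"] @ v + b["z"])       # update gate z_t
    # r_t near 0: the intermediate (candidate) memory ignores h_{t-1}
    h_tilde = np.tanh(W["h"] @ np.concatenate([x_t, r * h_prev]) + b["h"])
    # z_t near 1: h_t keeps h_{t-1} and largely ignores the new input
    return z * h_prev + (1.0 - z) * h_tilde

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4               # arbitrary example sizes
W = {k: rng.normal(scale=0.1, size=(hidden_dim, input_dim + hidden_dim)) for k in "rzh"}
b = {k: np.zeros(hidden_dim) for k in "rzh"}
h = gru_step(rng.normal(size=input_dim), np.zeros(hidden_dim), W, b)
print(h.shape)                             # (4,)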



Posted: 2023.06.27 00:35
