
[2025 National Civil Service Exam (Technical) Question 2]

Recommended Post: 【Artificial Intelligence】 AI Technical Exam Table of Contents


a. Stochastic Control Theory

b. Reinforcement Learning



Q.

The following is a 4 × 4 grid board on which a robot can move around. Cells 1 and 16, marked in gray, are terminal points. The robot can move in four directions—east, west, south, and north—from its current position. If it chooses a direction that leads outside the board, it stays in place and receives a reward of ‘-1’. If it chooses a direction that stays within the board, it moves in that direction and receives a reward of ‘0’. When it enters a terminal point, it receives a reward of ‘5’. Answer the following questions.


[Figure: (a) the 4 × 4 grid board; (b) an example movement path from cell 10 to cell 1]
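The dynamics described above can be captured in a small step function. This is a minimal sketch, assuming cells are numbered 1–16 row by row (left to right, top to bottom); the function and constant names are illustrative, not from the exam.

```python
# Assumed layout: cells 1..16 filled row by row on a 4 x 4 board;
# cells 1 and 16 are terminal (gray) states.
TERMINAL = {1, 16}
MOVES = {"east": 1, "west": -1, "south": 4, "north": -4}

def step(cell, action):
    """Return (next_cell, reward) for one move on the board."""
    if cell in TERMINAL:
        return cell, 0  # at a terminal point the robot waits
    row, col = divmod(cell - 1, 4)
    # A move that would leave the board: stay in place, reward -1
    if action == "east" and col == 3:
        return cell, -1
    if action == "west" and col == 0:
        return cell, -1
    if action == "south" and row == 3:
        return cell, -1
    if action == "north" and row == 0:
        return cell, -1
    nxt = cell + MOVES[action]
    # Entering a terminal point yields 5; any other legal move yields 0
    return nxt, (5 if nxt in TERMINAL else 0)
```

For example, `step(10, "north")` returns `(6, 0)`, `step(13, "west")` returns `(13, -1)` since the robot would leave the board, and `step(5, "north")` returns `(1, 5)` because cell 1 is terminal.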


Q1.

(b) shows an example of a movement path from the starting position, cell 10 (marked with a square), to the terminal point, cell 1. Find the state set, action set, and reward set for this movement problem.

A1.

State set: {Move, Terminal}

Action set: {East, West, South, North, Wait} (Note: East, West, South, North correspond to the ‘Move’ state, while Wait corresponds to the ‘Terminal’ state.)

Reward set: {-1, 0, 5}

Q2.

Calculate the cumulative reward (total reward) of ⒝.

A2.

Running total after each step: 0 → 0 → 0 → 0 → 0 → -1 → -1 → 4

Therefore, the cumulative reward (total reward) is 4.
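The running totals can be reproduced by accumulating per-step rewards. A sketch follows; the exact per-step sequence depends on the path in figure (b), so the `rewards` list here is one sequence consistent with the totals above, not a transcription of the figure.

```python
from itertools import accumulate

# Assumed per-step rewards: five legal moves (0 each), one move that
# hits the board edge (-1), one more legal move (0), then entry into
# the terminal point (+5).
rewards = [0, 0, 0, 0, 0, -1, 0, 5]

running = list(accumulate(rewards))
print(running)       # [0, 0, 0, 0, 0, -1, -1, 4]
print(sum(rewards))  # cumulative reward: 4
```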




Practice Problem.

Controlled Markov Chain Formulation (Source: ECE 558, UMich)

Problem.

Suppose a system evolves as a discrete-time Markov chain on a finite state space S = {1, 2, …, I} according to a fixed transition law Pt(j | i) at times t = 0, 1, 2, …, and it incurs a cost ct(i) when it is in state i. At any stage, the decision maker may either let the system evolve uninterrupted or intervene and choose an action u ∈ U, where U is finite, which leads to the transition law Pt(j | i, u) and the cost ct(i, u) at time t. Formulate this as a Controlled Markov Chain (a.k.a. Markov Decision Process). Clearly identify the action sets, costs, and transition probabilities.

Answer.

Denote the decision-maker's choice not to interrupt the system as action 0, and append it to the action set 𝒰 to obtain the new action set Ũ = {0} ∪ 𝒰; when the decision-maker chooses to interrupt the system, they do so by choosing a specific action u ∈ 𝒰. Then define Pt(j | i, 0) = Pt(j | i) for all i, j ∈ S and t ∈ ℤ₊, and ct(i, 0) = ct(i) for all i ∈ S and t ∈ ℤ₊. The new problem with state space S, action space Ũ, transition kernels {Pt(j | i, u) : i, j ∈ S, u ∈ Ũ, t ∈ ℤ₊}, and cost functions {ct(i, u) : i ∈ S, u ∈ Ũ, t ∈ ℤ₊} is a Markov Decision Process (MDP) formulation of the original problem.
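The construction in the answer can be sketched in code: action 0 reuses the uncontrolled law and cost, while any u ∈ 𝒰 uses the controlled ones. All names here (`make_controlled_chain`, `P_free`, `P_ctrl`, and the nested-dict encoding of the kernels) are assumptions made for illustration.

```python
def make_controlled_chain(P_free, c_free, P_ctrl, c_ctrl, U):
    """Build the augmented MDP (U_tilde, P, c) from the answer above.

    P_free[t][i][j], c_free[t][i]   : uncontrolled law P_t(j|i) and cost c_t(i)
    P_ctrl[t][i][u][j], c_ctrl[t][i][u] : law P_t(j|i,u) and cost c_t(i,u)
    """
    U_tilde = [0] + list(U)  # action 0 = "do not intervene"

    def P(t, i, u, j):
        # Under action 0 the chain follows its uncontrolled transition law
        return P_free[t][i][j] if u == 0 else P_ctrl[t][i][u][j]

    def c(t, i, u):
        # Under action 0 the stage cost is the uncontrolled cost
        return c_free[t][i] if u == 0 else c_ctrl[t][i][u]

    return U_tilde, P, c
```

A tiny usage example: with one stage, one state, and one intervention `'a'`, `P(0, i, 0, j)` returns the uncontrolled probability while `P(0, i, 'a', j)` returns the controlled one, matching the definitions Pt(j | i, 0) = Pt(j | i) and ct(i, 0) = ct(i).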



Posted: 2025.07.14 01:32
