
[2025 National Civil Service Exam (Technical) Question 2]

Recommended Post: 【Artificial Intelligence】 AI Technical Exam Table of Contents


a. Stochastic Control Theory

b. Reinforcement Learning



Q.

The following is a 4 × 4 grid board on which a robot can move around. Cells 1 and 16, marked in gray, are terminal points. The robot can move in four directions—east, west, south, and north—from its current position. If it chooses a direction that leads outside the board, it stays in place and receives a reward of ‘-1’. If it chooses a direction that stays within the board, it moves in that direction and receives a reward of ‘0’. When it enters a terminal point, it receives a reward of ‘5’. Answer the following questions.


[Figure: (a) the 4 × 4 grid board; (b) an example movement path from cell 10 to cell 1]
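The dynamics described above can be captured in a small step function. This is a minimal sketch, assuming cells are numbered 1–16 row by row (left to right, top to bottom); the function and constant names are illustrative, not from the exam.

```python
# Assumed layout: cells 1..16 filled row by row on a 4 x 4 board;
# cells 1 and 16 are terminal (gray) states.
TERMINAL = {1, 16}
MOVES = {"east": 1, "west": -1, "south": 4, "north": -4}

def step(cell, action):
    """Return (next_cell, reward) for one move on the board."""
    if cell in TERMINAL:
        return cell, 0  # at a terminal point the robot waits
    row, col = divmod(cell - 1, 4)
    # A move that would leave the board: stay in place, reward -1
    if action == "east" and col == 3:
        return cell, -1
    if action == "west" and col == 0:
        return cell, -1
    if action == "south" and row == 3:
        return cell, -1
    if action == "north" and row == 0:
        return cell, -1
    nxt = cell + MOVES[action]
    # Entering a terminal point yields 5; any other legal move yields 0
    return nxt, (5 if nxt in TERMINAL else 0)
```

For example, `step(10, "north")` returns `(6, 0)`, `step(13, "west")` returns `(13, -1)` since the robot would leave the board, and `step(5, "north")` returns `(1, 5)` because cell 1 is terminal.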


Q1.

(b) shows an example of a movement path from the starting position, cell 10 (marked with a square), to the terminal point, cell 1. Find the state set, action set, and reward set for this movement problem.

A1.

State set: {Move, Terminal}

Action set: {East, West, South, North, Wait} (Note: East, West, South, North correspond to the ‘Move’ state, while Wait corresponds to the ‘Terminal’ state.)

Reward set: {-1, 0, 5}

Q2.

Calculate the cumulative reward (total reward) of ⒝.

A2.

Running total after each step: 0 → 0 → 0 → 0 → 0 → -1 → -1 → 4

Therefore, the cumulative reward (total reward) is 4.
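The running totals can be reproduced by accumulating per-step rewards. A sketch follows; the exact per-step sequence depends on the path in figure (b), so the `rewards` list here is one sequence consistent with the totals above, not a transcription of the figure.

```python
from itertools import accumulate

# Assumed per-step rewards: five legal moves (0 each), one move that
# hits the board edge (-1), one more legal move (0), then entry into
# the terminal point (+5).
rewards = [0, 0, 0, 0, 0, -1, 0, 5]

running = list(accumulate(rewards))
print(running)       # [0, 0, 0, 0, 0, -1, -1, 4]
print(sum(rewards))  # cumulative reward: 4
```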




Practice Problem.

Controlled Markov Chain Formulation (Source: ECE 558, UMich)

Problem.

Suppose a system evolves as a discrete-time Markov chain on a finite state space S = {1, 2, …, I} according to a fixed transition law Pt(j | i) at times t = 0, 1, 2, …, and it incurs a cost ct(i) when it is in state i. At any stage, the decision maker may either let the system evolve uninterrupted or intervene and choose an action u ∈ U, where U is finite, which leads to the transition law Pt(j | i, u) and the cost ct(i, u) at time t. Formulate this as a Controlled Markov Chain (a.k.a. Markov Decision Process). Clearly identify the action sets, costs, and transition probabilities.

Answer.

Denote the decision-maker's choice not to interrupt the system as action 0, and append it to the action set 𝒰 to obtain the new action set Ũ = {0} ∪ 𝒰; when the decision-maker chooses to interrupt the system, they do so by choosing a specific action u ∈ 𝒰. Then define Pt(j | i, 0) = Pt(j | i) for all i, j ∈ S and t ∈ ℤ₊, and ct(i, 0) = ct(i) for all i ∈ S and t ∈ ℤ₊. The new problem with state space S, action space Ũ, transition kernels {Pt(j | i, u) : i, j ∈ S, u ∈ Ũ, t ∈ ℤ₊}, and cost functions {ct(i, u) : i ∈ S, u ∈ Ũ, t ∈ ℤ₊} is a Markov Decision Process (MDP) formulation of the original problem.
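The construction in the answer can be sketched in code: action 0 reuses the uncontrolled law and cost, while any u ∈ 𝒰 uses the controlled ones. All names here (`make_controlled_chain`, `P_free`, `P_ctrl`, and the nested-dict encoding of the kernels) are assumptions made for illustration.

```python
def make_controlled_chain(P_free, c_free, P_ctrl, c_ctrl, U):
    """Build the augmented MDP (U_tilde, P, c) from the answer above.

    P_free[t][i][j], c_free[t][i]   : uncontrolled law P_t(j|i) and cost c_t(i)
    P_ctrl[t][i][u][j], c_ctrl[t][i][u] : law P_t(j|i,u) and cost c_t(i,u)
    """
    U_tilde = [0] + list(U)  # action 0 = "do not intervene"

    def P(t, i, u, j):
        # Under action 0 the chain follows its uncontrolled transition law
        return P_free[t][i][j] if u == 0 else P_ctrl[t][i][u][j]

    def c(t, i, u):
        # Under action 0 the stage cost is the uncontrolled cost
        return c_free[t][i] if u == 0 else c_ctrl[t][i][u]

    return U_tilde, P, c
```

A tiny usage example: with one stage, one state, and one intervention `'a'`, `P(0, i, 0, j)` returns the uncontrolled probability while `P(0, i, 'a', j)` returns the controlled one, matching the definitions Pt(j | i, 0) = Pt(j | i) and ct(i, 0) = ct(i).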



Posted: 2025.07.14 01:32
