
Data Analysis: Kaplan-Meier Survival Curve

Recommended Article : 【Bioinformatics】 Table of Contents for Bioinformatics Analysis


1. Number of patients who died this time

2. # of censored

3. Probability of having survived so far

4. R code

5. Python code



| Total Time (t) | Number of Patients Died (d) | Number of Patients Alive Until Now (n) | Number of Censored | Probability of Death in this Period (d/n) | Probability of Survival in this Period (1 - d/n) | Probability of Being Alive Up to Now (L) |
|---|---|---|---|---|---|---|
| 6 | 1 | 23 | 0 | 0.0435 | 0.9565 | 0.9565 |
| 12 | 1 | 22 | 0 | 0.0455 | 0.9545 | 0.9130 |
| 21 | 1 | 21 | 0 | 0.0476 | 0.9524 | 0.8695 |
| 27 | 1 | 20 | 0 | 0.0500 | 0.9500 | 0.8260 |
| 32 | 1 | 19 | 0 | 0.0526 | 0.9474 | 0.7826 |
| 39 | 1 | 18 | 0 | 0.0556 | 0.9444 | 0.7391 |
| 43 | 2 | 17 | 1 | 0.1176 | 0.8824 | 0.6522 |
| 89 | 1 | 14 | 5 | 0.0714 | 0.9286 | 0.6056 |
| 261 | 1 | 8 | 0 | 0.1250 | 0.8750 | 0.5299 |
| 263 | 1 | 7 | 0 | 0.1429 | 0.8571 | 0.4542 |
| 270 | 1 | 6 | 1 | 0.1667 | 0.8333 | 0.3785 |
| 311 | 1 | 4 | . | 0.2500 | 0.7500 | 0.2839 |

Table 1. Example Kaplan-Meier Survival Curve


1. Number of patients who died this time (number of patients died)

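⑴ The 23 in the first row means that 23 patients were alive and under observation from time 0 until just before the first death at time 6.

⑵ The 22 in the second row means that 22 patients were still alive after the death at time 6.

⑶ The number of patients still alive (the number at risk) is sometimes displayed alongside the curve, for example as a risk table, and sometimes omitted.

⑷ Each row's n is the previous row's n minus the deaths and censored patients of that previous row. For example, n at time 89 is 17 - 2 - 1 = 14.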
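
The bookkeeping behind column n in Table 1 can be sketched in a few lines of Python. This is only a minimal illustration, not part of the original analysis: the function name `numbers_at_risk` is made up for this example, and the censored count shown as "." in the last row is treated as 0, which is an assumption (it does not affect the values printed here).

```python
# Columns d (deaths) and censored from Table 1, in time order.
deaths = [1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1]
# The last entry is "." in Table 1; 0 is an assumption made for this sketch.
censored = [0, 0, 0, 0, 0, 0, 1, 5, 0, 0, 1, 0]


def numbers_at_risk(n_start, deaths, censored):
    """Return the number at risk at the start of each row."""
    at_risk = []
    n = n_start
    for d, c in zip(deaths, censored):
        at_risk.append(n)
        # Both deaths and censored patients leave the risk set.
        n -= d + c
    return at_risk


at_risk = numbers_at_risk(23, deaths, censored)
print(at_risk)  # [23, 22, 21, 20, 19, 18, 17, 14, 8, 7, 6, 4]
```

The drop from 17 to 14 and from 14 to 8 shows that censored patients reduce n even though they are not counted as deaths.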



2. # of censored

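⑴ A censored patient is one for whom there is no further record after a certain time, for reasons such as no longer being hospitalized. Whether the patient died afterward is unknown.

⑵ It is assumed that censored patients have the same survival probability as the patients who remain under observation.

⑶ Censored observations are typically marked on the graph with a symbol such as ⊕ or a + tick.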
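
In software, censoring is usually encoded per patient as a pair (time, status), with status 1 for a death and 0 for a censored observation. The Python sketch below expands the summary rows of Table 1 into such records. It rests on a few assumptions: the function name `expand_records` is made up for this example, the "." in the last row is treated as 0 censored, and the 3 patients still alive after time 311 are left out because Table 1 does not say when their follow-up ended.

```python
times = [6, 12, 21, 27, 32, 39, 43, 89, 261, 263, 270, 311]
deaths = [1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1]
# The last entry is "." in Table 1; 0 is an assumption made for this sketch.
censored = [0, 0, 0, 0, 0, 0, 1, 5, 0, 0, 1, 0]


def expand_records(times, deaths, censored):
    """Turn per-time counts into one (time, status) record per patient."""
    records = []
    for t, d, c in zip(times, deaths, censored):
        records += [(t, 1)] * d  # status 1 = death observed at time t
        records += [(t, 0)] * c  # status 0 = censored at time t
    return records


records = expand_records(times, deaths, censored)
print(len(records))                          # 20
print(sum(status for _, status in records))  # 13
```

To reproduce column n of Table 1 from such records, the 3 remaining patients would also have to be added as censored records at some time of 311 or later.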



3. Probability of having survived so far

⑴ Probability of surviving until time 12 = 0.9130 = Probability of surviving until time 6 × Probability of surviving this period (6 ~ 12) = 0.9565 × 0.9545

⑵ Probability of surviving until time 21 = 0.8695 = Probability of surviving until time 12 × Probability of surviving this period (12 ~ 21) = 0.9130 × 0.9524
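
The same multiplication can be carried down the whole table. The Python sketch below recomputes column L from columns d and n of Table 1; the helper name `cumulative_survival` is an assumption of this example. Because it does not round at each step, the results can differ from the table in the fourth decimal place, for example 0.8696 instead of 0.8695 at time 21.

```python
times = [6, 12, 21, 27, 32, 39, 43, 89, 261, 263, 270, 311]
deaths = [1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1]
at_risk = [23, 22, 21, 20, 19, 18, 17, 14, 8, 7, 6, 4]


def cumulative_survival(deaths, at_risk):
    """Multiply the per-period survival probabilities (1 - d/n) row by row."""
    survival = 1.0
    result = []
    for d, n in zip(deaths, at_risk):
        survival *= 1 - d / n
        result.append(survival)
    return result


L = cumulative_survival(deaths, at_risk)
for t, s in zip(times, L):
    print(f"t = {t:>3}  L = {s:.4f}")
```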



4. R code

⑴ R code for the survival curve (taking censoring into account)


install.packages("survival")
install.packages("survminer")
library(survival)
library(survminer)

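surv_data <- data.frame(
  # 13 deaths and 7 censored patients from Table 1, plus the 3 patients still at risk
  # after t = 311. Table 1 gives no follow-up time for those 3, so they are ASSUMED
  # to be censored at t = 311 (last three entries); without them n would start at 20, not 23.
  time = c(6, 12, 21, 27, 32, 39, 43, 43, 43, 89, 89, 89, 89, 89, 89, 261, 263, 270, 270, 311, 311, 311, 311),
  status = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0) # 1 = death, 0 = censored
)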

surv_obj <- Surv(time = surv_data$time, event = surv_data$status)

fit <- survfit(surv_obj ~ 1, data = surv_data)

ggsurvplot(
  fit, 
  data = surv_data, 
  xlab = "Time", 
  ylab = "Survival probability", 
  title = "Kaplan-Meier Survival Curve",
  surv.median.line = "hv", # add median survival time lines
  ggtheme = theme_minimal(), # set the theme
  risk.table = TRUE, # add the risk table
  palette = "Dark2" # set the color palette
)
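
The option surv.median.line = "hv" marks the median survival time, which for a Kaplan-Meier curve is the first time at which the estimated survival probability falls to 0.5 or below. As a rough cross-check against Table 1, the Python sketch below reads that time off column L; the function name `median_survival_time` is an assumption of this example.

```python
times = [6, 12, 21, 27, 32, 39, 43, 89, 261, 263, 270, 311]
# Column L of Table 1 (probability of being alive up to each time).
L = [0.9565, 0.9130, 0.8695, 0.8260, 0.7826, 0.7391,
     0.6522, 0.6056, 0.5299, 0.4542, 0.3785, 0.2839]


def median_survival_time(times, survival):
    """Return the first time at which survival is <= 0.5, or None."""
    for t, s in zip(times, survival):
        if s <= 0.5:
            return t
    return None  # the curve never drops to 0.5


median_time = median_survival_time(times, L)
print(median_time)  # 263
```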




⑵ Adding a p-value: note that the following code is hypothetical and unrelated to the table above.


library(survival)
library(survminer)

# Inputs assumed to exist already (hypothetical), as three vectors of equal length:
#   condition           : continuous variable
#   overall_surv_time   : follow-up time
#   overall_surv_status : "DECEASED", "LIVING", or NA
print(condition)
print(overall_surv_time)
print(overall_surv_status)

# Convert Overall Survival Status to a binary variable, where 1 = event occurred (DECEASED) and 0 = censored (LIVING or NA)
# %in% returns FALSE for NA, so NA is coded as 0 rather than propagated as NA
overall_surv_status_binary <- as.integer(overall_surv_status %in% "DECEASED")

# Find the median value of the condition
median_condition <- median(condition, na.rm = TRUE)

# Build the data frame and split patients into two groups at the median condition
data_overall <- data.frame(
  Time = overall_surv_time,
  Status = overall_surv_status_binary,
  Group = ifelse(condition >= median_condition, "High", "Low")
)

# Fit one Kaplan-Meier curve per group
fit_overall <- survfit(Surv(Time, Status) ~ Group, data = data_overall)

# Overall Survival Analysis
ggsurvplot(
  fit_overall,
  data = data_overall,
  pval = TRUE, # show the log-rank test p-value
  risk.table = TRUE,
  title = "Overall Survival based on Median Condition",
  legend.title = "Condition vs. median"
)



5. Python code


from lifelines import KaplanMeierFitter
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Simulate some survival data for demonstration purposes
np.random.seed(42)  # for reproducibility
n_patients = 100
study_duration = 1000  # days

# Generate survival times from an exponential distribution with a mean of 365 days
survival_times = np.random.exponential(scale=365, size=n_patients)

# Generate random censoring times, uniform over the study period (e.g. dropout at a random time)
censoring_times = np.random.uniform(low=0, high=study_duration, size=n_patients)

# The observed time is the minimum of the survival time and censoring time
observed_times = np.minimum(survival_times, censoring_times)

# The event is observed if the survival time is less than or equal to the censoring time
events_observed = (survival_times <= censoring_times).astype(int)

# Create a DataFrame
df_patients = pd.DataFrame({
    'duration': observed_times,
    'event_observed': events_observed
})

# Fit the Kaplan-Meier survival estimator on the data
kmf = KaplanMeierFitter()
kmf.fit(df_patients['duration'], event_observed=df_patients['event_observed'])

# Plot the survival function
plt.figure(figsize=(10, 6))
kmf.plot_survival_function()
plt.title('Kaplan-Meier Survival Curve')
plt.xlabel('Days since Start of Study')
plt.ylabel('Survival Probability')
plt.grid(True)
plt.show()





Posted : 2021.04.13 17:29

Updated : 2024.03.11 21:42
