Korean, Edit

Data Analysis: Kaplan-Meier Survival Curve

Recommended Article : 【Bioinformatics】 Table of Contents for Bioinformatics Analysis


1. Number of patients who died this time

2. # of censored

3. Probability of having survived so far

4. R code

5. Python code

6. Analysis metric



Total Time (t) Number of Patients Died (d) Number of Patients Alive Until Now (n) Number of Censored Probability of Death in this Period (d/n) Probability of Survival in this Period (1 - d/n) Probability of Being Alive Up to Now (L)
6 1 23 0 0.0435 0.9565 0.9565
12 1 22 0 0.0455 0.9545 0.9130
21 1 21 0 0.0476 0.9524 0.8695
27 1 20 0 0.0500 0.9500 0.8260
32 1 19 0 0.0526 0.9474 0.7826
39 1 18 0 0.0556 0.9444 0.7391
43 2 17 1 0.1176 0.8824 0.6522
89 1 14 5 0.0714 0.9286 0.6056
261 1 8 0 0.1250 0.8750 0.5299
263 1 7 0 0.1429 0.8571 0.4542
270 1 6 1 0.1667 0.8333 0.3785
311 1 4 . 0.2500 0.7500 0.2839

Table 1. Example Kaplan-Meier Survival Curve


1. Number of patients who died this time (number of patients died)

⑴ The 23 people in the first row mean that 23 were alive when the time was 0.

⑵ The 22 people in the second row mean that 22 were alive when the time was 6.

⑶ The number of patients still alive (number at risk) is sometimes shown and sometimes not.



2. # of censored

⑴ Censored means there is no further record for reasons such as no longer being hospitalized.

⑵ It is assumed that censored patients have the same survival probability as other survivors.

⑶ Typically marked on the graph with a ⊕ symbol.



3. Probability of having survived so far

⑴ Probability of surviving until time 12 = 0.9130 = Probability of surviving until time 6 × Probability of surviving this period (6 ~ 12) = 0.9565 × 0.9545

⑵ Probability of surviving until time 21 = 0.8695 = Probability of surviving until time 12 × Probability of surviving this period (12 ~ 21) = 0.9130 × 0.9524



4. R code

⑴ R code related to the survival curve (considering # of censored)


install.packages("survival")
install.packages("survminer")
library(survival)
library(survminer)

surv_data <- data.frame(
  time = c(6, 12, 21, 27, 32, 39, 43, 43, 43, 89, 89, 89, 89, 89, 89, 261, 263, 270, 270, 311),
  status = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1) # 사망은 1, censoring은 0
)

surv_obj <- Surv(time = surv_data$time, event = surv_data$status)

fit <- survfit(surv_obj ~ 1, data = surv_data)

ggsurvplot(
  fit, 
  data = surv_data, 
  xlab = "Time", 
  ylab = "Survival probability", 
  title = "Kaplan-Meier Survival Curve",
  surv.median.line = "hv", # 중앙값 생존 시간 선 추가
  ggtheme = theme_minimal(), # 테마 설정
  risk.table = TRUE, # 위험 테이블 추가
  palette = "Dark2" # 색상 팔레트 설정
)


스크린샷 2024-04-13 오전 10 51 57


⑵ If a p-value is added : Note, the following code is hypothetical and unrelated to the above table.


library(survival)
library(survminer)

# Prepare Condition, Overall Survival Time, and Overall Survival Status
print(condition) # continuous variable
print(overall_surv_time)
print(overall_surv_status)

# Convert Overall Survival Status to a binary variable, where 1 = event occurred (DECEASED) and 0 = censored (LIVING or NA)
overall_surv_status_binary <- ifelse(overall_surv_status == "DECEASED", 1, 0)

# Create survival objects
surv_obj_overall <- Surv(time = overall_surv_time, event = overall_surv_status_binary)

# Find the median value of the condition
median_condition <- median(condition, na.rm = TRUE)

# Split data based on the median condition
data_overall <- data.frame(surv_obj_overall, Status = overall_surv_status_binary, Condition = condition)

# Overall Survival Analysis
ggsurvplot(
  survfit(surv_obj_overall ~ Condition >= median_condition, data = data_overall),
  data = data_overall,
  pval = TRUE,
  risk.table = TRUE,
  title = "Overall Survival based on Median Condition",
  legend.title = "Condition >= Median"
)



5. Python code


from lifelines import KaplanMeierFitter
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Simulate some survival data for demonstration purposes
np.random.seed(42)  # for reproducibility
n_patients = 100
study_duration = 1000  # days

# Generate some survival times with a predefined survival rate (we will use an exponential distribution)
survival_times = np.random.exponential(scale=365, size=n_patients)  # mean survival time

# Generate censoring times, assuming that if a patient hasn't had an event by the end of the study, they are censored
censoring_times = np.random.uniform(low=0, high=study_duration, size=n_patients)

# The observed time is the minimum of the survival time and censoring time
observed_times = np.minimum(survival_times, censoring_times)

# The event is observed if the survival time is less than or equal to the censoring time
events_observed = (survival_times <= censoring_times).astype(int)

# Create a DataFrame
df_patients = pd.DataFrame({
    'duration': observed_times,
    'event_observed': events_observed
})

# Fit the Kaplan-Meier survival estimator on the data
kmf = KaplanMeierFitter()
kmf.fit(df_patients['duration'], event_observed=df_patients['event_observed'])

# Plot the survival function
plt.figure(figsize=(10, 6))
kmf.plot_survival_function()
plt.title('Kaplan-Meier Survival Curve')
plt.xlabel('Days since Start of Study')
plt.ylabel('Survival Probability')
plt.grid(True)
plt.show()


스크린샷 2024-04-13 오전 10 52 36



6. Analysis metric

⑴ Test statistics for group differences

① Log-rank test (Mantel–Cox test)

② Wilcoxon (Breslow) test

③ Tarone–Ware test

⑵ Metrics for evaluating the survival curve itself

① Concordance Index (C-index)

② Integrated Brier Score (IBS)

③ AUC

Clustering comparison metrics (e.g., AMI)



Input : 2021.04.13 17:29

Updated : 2024.03.11 21:42

results matching ""

    No results matching ""