LAMP for Survival Analysis

View the Project on GitHub rtrelator/SurvivalLAMP

Survival LAMP is an extended version of LAMP (Terada et al 2013) for performing multiple testing correction in finding combinatorial markers using log-rank test in survival analysis. Details and usage of the original LAMP can be found here.

How to run LAMP for survival analysis:

  1. Clone or download from GitHub.
  2. Compile: run make in corresponding directory
  3. Run:
    python lampSA.py -p logrank [item_file] [status_file] [significance_level] -t [time_file] > [output_file]
    

    EXAMPLE:

    python lampSA.py -p logrank sample/sample_logrank_item.csv sample/sample_logrank_status.csv 0.05 -t sample/sample_logrank_time.csv > sample/sample_logrank_output.txt
    

Input Files:

  1. ITEM FILE
    • similar format to the original LAMP item file, this file contains association information of samples/individuals and markers
    • first row as header (commented first entry, contains marker identifiers)
    • first column contains the list of sample/individual IDs (one sample/individual per row)
    • succeeding columns represent markers (e.g. gene, SNPs, etc.) (one marker per column)
    • entries are 1 or 0 (integer type)
    • file format: csv
    • example: sample/sample_logrank_item.csv
  2. STATUS FILE
    • similar format to the original LAMP value file, except that it contains the status of samples/individuals
    • first row as header (commented first entry similar to item file)
    • first column must be the same as the item file
    • second column contains status of corresponding samples/individuals (1 = event, 0 = censored)
    • file format: csv
    • example: sample/sample_logrank_status.csv
  3. TIME FILE
    • contains survival time information of samples/individuals
    • first row as header (commented first entry similar to item and status files)
    • first column must be the same as the item and value files
    • second column contains survival time of corresponding samples/individuals (e.g. in months, years, or days)
    • file format: csv
    • example: sample/sample_logrank_time.csv

Output File:

Log-rank Test and Kaplan-Meier Curves in R

Performing log-rank test and generating KM plots for the combination results can be implemented using the survival package in R:

Log-rank Test

library(survival)

# read the three input files for LAMP
df = read.table("sample/sample_logrank_item.csv", sep = ",", header = T, comment.char = "", check.names = F, stringsAsFactors = F)
df_status = read.table("sample/sample_logrank_status.csv", sep = ",", header = T, comment.char = "", check.names = F, stringsAsFactors = F)
df_time = read.table("sample/sample_logrank_time.csv", sep = ",", header = T, comment.char = "", check.names = F, stringsAsFactors = F)

# can join into one data frame, otherwise use respective columns of each data frame separately
df$TIME = df_time$TIMErecurrence; df$STATUS = df_status$EVENTrecurrence

# compute for combination value by getting the product for each sample/individual, add to data frame as column 'COMB'
# example for combination r60_n9,G3PDH_570
df$COMB = apply(df[,c("r60_n9","G3PDH_570")], 1, prod)

# perform log-rank test
lr = survdiff(Surv(TIME, STATUS) ~ COMB, data = df)
print(lr)

Call:
survdiff(formula = Surv(TIME, STATUS) ~ COMB, data = df)

         N Observed Expected (O-E)^2/E (O-E)^2/V
COMB=0 281       91    98.14      0.52      18.5
COMB=1  14       10     2.86     17.87      18.5

 Chisq= 18.5  on 1 degrees of freedom, p= 1.74e-05

In the R results:

Plotting KM Curves

# fit Kaplan-Meier and plot curves
fit = survfit(Surv(TIME, STATUS) ~ COMB, data = df)
plot(fit)

# add tick marks for censored data and color legends
plot(fit, mark.time = T, col = c("blue", "red"), xlab = "Time", ylab = "Survival Probability", lwd = 2, lty = 1, cex.lab = 1.5, cex.axis = 1.5, cex.main = 1.5, cex.sub = 1.5)
legend("topright", c("without combination", "with combination"), lty = 1, col = c("blue","red"), lwd = 1.5, cex = 1.5)

# getting p-value from log-rank test result
pval = 1 - pchisq(lr$chisq, 1)
title(sprintf("r60_n9,G3PDH_570\np = %.4e", pval), cex.main = 1.5) 

The code above will produce the following plot: sample_plot

Reference

Raissa T. Relator, Aika Terada, Jun Sese. Identifying statistically significant combinatorial markers for survival analysis. BMC Medical Genomics. 2018 11(Suppl 2):31.