Wonderful Wednesday April 2020

survival data Kaplan-Meier Wonderful Wednesdays

In this webinar we discuss the first submissions looking at some great ways to visualise a categorical response variable over time. We talk about what the SIG members like about each plot and give some pointers on areas we think could be improved. More specifically, we talk about: the value of a title with conclusions, the use of colour, the benefits of animation to display time, ways to declutter your graph and examples of clean design, enclosure to highlight important parts of the data, the benefits of the sankey diagram, how different charts help to answer different questions, and many more aspects of great visualizations.

Rachel Phillips, Steve Mallett https://www.psiweb.org/sigs-special-interest-groups/visualisation
04-09-2020

This month’s challenge involved a classic survival dataset example. You can download the dataset here. This simulated example dataset was based on a phase III clinical trial in breast cancer, you can read more about this study here. To summarise, this was a four-arm study, where participants were randomised to receive either one of two monotherapies or one of two combination therapies for 1 year with 4 years post treatment follow-up. The primary outcome was progression free survival and we wanted you to explore different ways of visualising this time-to-event outcome. The dataset contained information that allowed you to explore subgroups, patterns of events and individual patient profiles. More information on the dataset is available here.

We received a variety of submissions in this round including both static and interactive plots and we would like to thank everyone for their contributions. In the webinar members of the VIS SIG discussed the merits of each and picked out key elements they felt would make the ‘ultimate plot’. We summarise the discussions for each submission below.

Example 1

Plot 1
Plot 1

link to code
high-resolution image

On the left this plot displays the proportion of participants that are event free over time by treatment arm and on the right we can see the cumulative number of events over time by treatment arm. Inclusion of both of these figures into one image gives a view of the data which we don’t often see. Inclusion of uncertainty is also not often seen in Kaplan-Meier plots (but has been advocated in this recent publication from Morris et al.) but is extremely useful and the clutter from the overlap of the confidence bands is perhaps less problematic with inclusion of the cumulative plot on the right. The team liked the inclusion of the tables of numbers at risk and the cumulative number of events at the bottom of each plot, this is important information, and we felt this approach didn’t overload or create a cluttered image. We thought there was a nice use of colour.

Whilst there is clear labelling of the axis and the inclusion of a common legend ensures clarity this image would benefit from inclusion of some extra details to allow it to stand-alone, for example it would benefit from inclusion of a title. In terms of creating a clearer image we also felt that the dashes to indicate censoring in the image on the left could be omitted given the information on numbers at risk are available at the bottom of the plot.

This is a nice example of an exploratory type plot that statisticians can use to understand the data but if the aim is to communicate a message to stakeholders then we would suggest presenting less information in a simpler graphic.

Example 2

Plot 2
Plot 2

link to code
high-resolution image

This presentation of the data displays the cumulative proportion of participants experiencing an event. It includes a separate plot for each treatment arm and the inclusion of subtitles ensures clarity. This plot is being used to tell a story, which the title guides us towards. Inclusion of a reference line at an important time point is helpful to guide the audience’s attention. Labelling the x and y-axis only once prevents clutter and in combination with the inclusion of the reference lines makes this a clear presentation of the data. This demonstrates an effective use of graduation of saturation and clear annotation.

The team noted that there are six colours of grey in the legend but that it was difficult to pick out all six groups in the image. This was because one of the groups is very small. However, it’s tricky to pick out which group this is so we caution potential users to be mindful of this in their own plots.

This is an original, well thought through presentation of survival data which we commend and think would work well where the is a natural ordering of the groups.

Example 3

Plot 3
Plot 3

link to code
high-resolution image

This is a very nice, interactive visual presentation of a growing or developing Kaplan-Meier plot link. This image included some really nice design points that update as the image progresses that can help to tell a story. These included:

The team felt that this interactive approach could help understanding and interpretation amongst those not so familiar with Kaplan-Meier plots. It’s a potentially useful way to force the viewer to take the entire time period into consideration and not just the final time point when there is often the most uncertainty.

With some additional information on the right y-axis such as estimates of treatment effects this could be a really powerful design choice. A minor point was that it would be nice to include treatment labels with the numbers on the right y-axis so the audience do not have to keep referring back to the legend as the image progresses.

Example 4

Plot 4
Plot 4

link to code
high-resolution image

This is a very flexible tool to display Kaplan-Meier plots that explores the subgroup element of the challenge link. It allows the user to change what is presented e.g. displaying information on different treatment groups, different subgroups, or restricting information presented by the age range of participants. Including the legend within the plot and use of grid lines both help audiences to interpret this image.

This is a nice way to present the data for exploratory purposes. It would be nice to include sample sizes and confidence intervals within this tool to represent the uncertainty and minimise the risk of making incorrect conclusions.

Example 5

Plot 5
Plot 5

link to code
link to html file
high-resolution image

The use of a heat map to present survival data is a new idea to the group. At first some of us found it difficult to understand but with some thought it definitely becomes clearer. Given the novelty of this plot it probably needs more information to explain what is going on but could be very useful to help deliver a message to audiences. Inclusion of a clear message in the title and the ordering of the data in a way that the image portrays this really helps i.e. the white stripe representing the median helps show differences between treatment arms.

As well as a ‘big picture’ message from the changes in colour there is also an interactive element to this plot allowing more detailed information to be provided, as the user hovers over different points on the plot information on the survival function at that time point is displayed. Similarly hovering over the reference lines displays further information.

Example 6

Plot 6
Plot 6

link to code
high-resolution image

This image displays a Kaplan-Meier plot for each treatment arm with accompanying confidence bands with inclusion of other treatment arms in grey for comparison. This is simple, clear plot exhibiting use of good data visualisation principles. We liked the clear message in the title, inclusion of numbers at risk for each plot rather than in a table and the use of grid lines. The team felt that this plot would be good for communicating a message.

Minor adjustments we recommend are: including a more prominent grid line at 52 weeks, including a legend clearly explaining what the red line and reference bands are and changing the scale from days to weeks or months.

Code

Example 1 (R)

### Purpose: create TTE KM plots

### Load packages
library(readr)
library(survival)
library(survminer)
library(stringr)

### Load data
setwd('C:/Users/XXXXXXXXXXXXX')
ADTTE <- read_delim('2020-04-08-psi-vissig-adtte.csv', delim=';')

### AVAL is given in days; transform into weeks
ADTTE$AVAL_wk = ADTTE$AVAL / 7

### Create survival curves
surv_curve <- survfit(Surv(AVAL_wk, CNSR == 0) ~ TRT01P, data = ADTTE)

### Adjust treatment group labels in surv_curve
names(surv_curve$strata) <- str_remove(names(surv_curve$strata), "TRT01P=") 

### Theme for plots
theme_table <- theme_minimal(base_size = 18) %+replace% 
  theme(panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank(),  # remove vertical gridlines
        panel.grid.major.y = element_blank(),  # remove horizontal gridlines
        axis.title.y = element_blank(), axis.title.x = element_blank(),  # remove axis titles
        axis.text.x = element_blank(), axis.ticks.x = element_blank())  # remove x-axis text and ticks

theme_gg <- theme_minimal(base_size = 18) %+replace% 
  theme(panel.grid.minor.y = element_blank(), panel.grid.minor.x = element_blank(),  # remove minor gridlines
        legend.title = element_blank())  # remove legend title

### Plot 
plots <- list()
# Proportion event-free
plots[[1]] <- ggsurvplot(fit = surv_curve,  # survfit object
                         data = ADTTE,  # data used to fit survival curves. 
                         break.y.by = 0.2,  # y-axis breaks
                         risk.table = T,  # show risk table.
                         conf.int = T,  # show confidence intervals
                         xlim = c(0, 160),  # present narrower X axis
                         break.time.by = 20,  # x-axis breaks
                         ggtheme = theme_gg,  # use theme defined above
                         risk.table.y.text = F, # show bars instead of names in text annotations in legend of risk table
                         xlab = "Week",
                         ylab = "Proportion event-free",
                         legend = c(0.7, 0.15),  # legend position
                         tables.theme = theme_table)  # use them defined above for risk table

# Cumulative proportion with event
plots[[2]] <- ggsurvplot(fit = surv_curve,
                         data = ADTTE,
                         fun = "event",  # show event probability instead of survival probability
                         ylim = c(0, 0.6),  # do not include full scale (0-1) to allow to better discriminate between treatments
                         break.y.by = 0.1,
                         cumevents = T,  # show cumulative number of events
                         conf.int = F,
                         xlim = c(0, 160),
                         break.time.by = 20,
                         ggtheme = theme_gg,
                         cumevents.y.text = FALSE,
                         xlab = "Week",
                         ylab = "Cumulative proportion with event",
                         legend = "none",
                         tables.theme = theme_table)

# Save plot
ggsave(filename = "2020-04-08-markusv_km.png",
       plot = arrange_ggsurvplots(plots, print = T, ncol = 2, nrow = 1),
       width = 19.3, height = 9, units = "in")

Back to blog

Example 2 (R)

### Purpose: create TTE area plot

### Load packages
library(readr)
library(dplyr)
library(ggplot2)
library(zoo)

### Load data
setwd('C:/Users/XXXXXXX')
ADTTE <- read_delim('2020-04-08-psi-vissig-adtte.csv', delim=';')

### AVAL is given in days; transform into weeks
ADTTE$AVAL_wk = ADTTE$AVAL / 7

### Derive number of subjects per treatment group
ADTTE_ntrt <- ADTTE %>% group_by(TRT01PN, TRT01P) %>% mutate(N=n())

### Derive cumulative number (percentage) of subjects with event / censoring
ADTTE_cum <- ADTTE_ntrt %>% group_by(TRT01PN, TRT01P, N, EVNTDESC, AVAL_wk) %>% 
  summarise(n = n()) %>% mutate(n = cumsum(n), pct = n / N)

### Create grid to ensure all groups have a value at each timepoint (necessary for geom_area())
grid <- with(ADTTE_cum, expand.grid(TRT01PN = unique(TRT01PN), EVNTDESC = unique(EVNTDESC), 
                                    AVAL_wk = unique(AVAL_wk)))
ADTTE_grid <- merge(grid, ADTTE_cum, by = c("TRT01PN", "EVNTDESC", "AVAL_wk"), all = T)

### Fill in NA values with last non-NA value and remove leading NA rows
ADTTE_locf <- ADTTE_grid %>% group_by(TRT01PN, EVNTDESC) %>% do(na.locf(.))

### Control order (show events at bottom and censoring above)
ADTTE_locf$EVNTDESC <- factor(ADTTE_locf$EVNTDESC, levels = c("Lost to follow-up", "No next-line therapy initiated",
                                                              "Second next-line therapy initiated",
                                                              "Ongoing on first next-line therapy", "PD", "Death"))

### Add N to treatment label and order by TRT01PN
ADTTE_locf$TRT01P_label <- with(ADTTE_locf, paste0(TRT01P, " (N=", N, ")"))
ADTTE_locf$TRT01P_label <- reorder(ADTTE_locf$TRT01P_label, ADTTE_locf$TRT01PN)

### Stacked area chart
ggplot(data = ADTTE_locf, aes(x = AVAL_wk, y = pct, fill = EVNTDESC)) +
  geom_area(alpha = 0.6) +
  facet_wrap(TRT01P_label ~ .) +
  scale_x_continuous(limits = c(0, 160), breaks = seq(0, 160, by = 20), name = "Week") +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, by = 0.2), 
                     name = "Cumulative proportion with event / censoring", expand = c(0, 0)) +
  geom_vline(xintercept = 52, alpha = 0.3) +
  theme_minimal(base_size = 18) +
  scale_fill_grey(start = 0.8, end = 0.2) +
  theme(legend.position = c(0.15, 0.85), legend.title = element_blank(),  # remove legend title and set legend position
        panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank(),  # remove vertical gridlines
        panel.grid.minor.y = element_blank()) +  # remove horizontal minor gridlines
  ggtitle("Combination of tablemab + vismab leads to less deaths during first 52 weeks of study treatment")

ggsave(filename = "2020-04-08-markusv_area.png", width = 19.3, height = 9, units = "in")

Back to blog

Example 3 (SAS)

filename urlFile url 'https://raw.githubusercontent.com/VIS-SIG/Wonderful-Wednesdays/master/data/2020/2020-04-08/2020-04-08-psi-vissig-adtte.csv';
*--------------------------------
Import Procedure
---------------------------------;


data ADTTE 
    (label='Time-to-Event'
     compress=yes
);

infile urlFile 
  delimiter = ',' 
  missover 
  dsd 
  lrecl=32767 
  firstobs=2;
attrib 
    STUDYID length=$20 format=$20. informat=$20. label='the study identifier'
    SUBJID length=8 format=8. informat=8. label='subject identifier'
    USUBJID length=$40 format=$40. informat=$40. label='unique subject iddentifier'
    AGE length=8 format=8. informat=8. label='age at randomisation (years)'
    STR01 length=$8 format=$8. informat=8. label='Hormone receptor status at randomisation'
    STR01N length=3 format=3. informat=3. label='Hormone receptor positive (Numeric)'
    STR01L length=$30 format=$30. informat=$30. label='Hormone receptor positive (Long format)'
    STR02 length=$30 format=$30. informat=$30. label='Prior Radiotherapy at randomisation'
    STR02N length=3 format=3. informat=3. label='Prior Radiotherapy at randomisation (Numeric)'
    STR02L length=$40 format=$40. informat=$40. label='Prior Radiotherapy at randomisation (Long format)'
    TRT01P length=$40 format=$40. informat=$40. label='Planned treatment assigned at randomisation'
    TRT01PN length=8 format=8. informat=8. label='Planned treatment assigned at randomisation (Numeric)'
    PARAM length=$40 format=$40. informat=40. label='Analysis parameter - Progression free survival'
    PARAMCD length=$10 format=$10. informat=$10. label='Analysis parameter code'
    AVAL length=8 format=8. informat=8. label='Analysis value (time to event [days])'
    CNSR length=3 format=3. informat=3. label='Censoring (0 = Event, 1 = censored)'
    EVNTDESC length= $34 format=$34. informat=$34. label='Event description'
    CNSDTDSC length=$33 format=$33. informat=$34. label='Censoring description'
    DCTREAS length=$25 format=$25. informat=34. label='Discontinuation from study reason'
;
input
  STUDYID SUBJID USUBJID AGE STR01 STR01N STR01L    
  STR02 STR02N STR02L   TRT01P TRT01PN PARAM PARAMCD    
  AVAL CNSR EVNTDESC CNSDTDSC DCTREAS;
run;


data TIME_TO_EVENT
( keep =  Patient Age_at_Start Age_Group Age_at_End Survival_Days Survival_Weeks
          Full_Years Hormone_Receptor_Status Prior_Radiotherapy
        Treatment_No Treatment Censoring Death Discontinuation Discontinuation_Reason
);

attrib
    Patient length=8 
    Age_at_Start length=8
    Age_Group length=$5
    Age_at_End length=8
    Survival_Days length=8 label='Survival Days'
    Full_Years length=8
    Hormone_Receptor_Status length=$8 
    Prior_Radiotherapy length=$1
    Treatment_No length=8
    Treatment length=$40
    Censoring length=8
    Death length=8
    Discontinuation length=$1
    Discontinuation_Reason length=$25 
;
set RECD.ADTTE;

    Patient = SUBJID;
    Age_at_Start = AGE;
    Survival_Days = AVAL;
    Survival_Weeks = int(Survival_Days/7);
    Full_Years = int(Survival_Days/365.25);
    Age_at_End = Age_at_Start + Full_Years;
    Hormone_Receptor_Status = STR01;
    Prior_Radiotherapy = ifc(STR02N=1,'Y','N');
    Treatment_No = TRT01PN;
    Treatment = TRT01P;
    Censoring = CNSR;
    Death = ifc(EVNTDESC = 'Death',1,0);
    Discontinuation = ifc(DCTREAS = 'NA','N','Y');
    Discontinuation_Reason = propcase(DCTREAS);

    if Discontinuation = 'N' then Discontinuation_Reason = 'Not applicable';

    if Age_at_Start <= 50 then Age_Group = '25-50';
    else if 51 <= Age_at_Start <= 60 then Age_Group = '51-60' ;
    else if 61 <= Age_at_Start <= 70 then Age_Group = '61-70' ;
    else if 71 <= Age_at_Start <= 80 then Age_Group = '71-80' ;
    else  Age_Group = '81+' ;
run;


ods _all_ close;
ods graphics on;

ods output Survivalplot=SurvivalPlotData;
proc lifetest data=TIME_TO_EVENT plots=survival(atrisk=0 to 2000 by 10); 
    time Survival_Days * Death(0);
    strata Treatment/ test=logrank adjust=sidak;
run; 






proc sql;
create table time
as
select distinct time
from SurvivalPlotData;
quit;


data time;
set time end=eof;
i=_n_;
if eof then call symput('n',_n_);
run;


%macro km;
%do i=1 %to &n+10;

proc sql noprint;
select max(Time) into: Time
from Time
where i <= &i.;
quit;

data SPD;
set SurvivalPlotData (where=(Time le &Time.));
run;

proc sql;
create table Survival_Pct
as
select      StratumNum,
            min(Survival) as Survival,
            compress(put(min(Survival),percent8.1)) as Survival_txt
from        SPD 
where       Survival not=.
group by    StratumNum;
quit;

data _null_;
set Survival_Pct;
if StratumNum = 1 then do;
    call symput('Trt_val_1',Survival);
    call symputx('Trt_txt_1',Survival_txt);
end;
if StratumNum = 2 then do;
    call symput('Trt_val_2',Survival);
    call symputx('Trt_txt_2',Survival_txt);
end;
if StratumNum = 3 then do;
    call symput('Trt_val_3',Survival);
    call symputx('Trt_txt_3',Survival_txt);
end;
if StratumNum = 4 then do;
    call symput('Trt_val_4',Survival);
    call symputx('Trt_txt_4',Survival_txt);
end;
run;

title j=center font=calibri height=10pt "Survival Day = &Time";
proc sgplot data=SPD noborder nowall;
 styleattrs datacontrastcolors=(CX003469 CX69b937 CXef7d04 CX00a6cb);
 step x=time y=survival / group=stratum name='s';
  scatter x=time y=censored / markerattrs=(symbol=plus) name='c';
  scatter x=time y=censored / markerattrs=(symbol=plus) GROUP=stratum;

  keylegend 'c' / location=inside position=topright valueattrs = (family=calibri size=10pt);
  keylegend 's' / linelength=20 valueattrs = (family=calibri size=10pt);
  xaxis
    values = (0 to 2000 by 30)
    valueattrs = (family=calibri size=10pt) 
    labelattrs = (family=calibri size=10pt) 
    label = 'x - Survival Days / y - *Survival Probability'
  ;
  yaxis 
    valueattrs = (family=calibri size=10pt) 
    labelattrs = (family=calibri size=10pt) 
    values = (0 to 1 by 0.1)
    labelpos = top
    label='*'
  ;

%if &Time. ge 84 %then %do;
refline 84 / axis=x 
    lineattrs=(pattern=dash color=lightgrey) label='12 Weeks' labelattrs=(family=calibri size=10pt);
%end;

%if &Time. ge 365 %then %do;
refline 365 / axis=x 
    lineattrs=(pattern=dash color=lightgrey) label='52 Weeks / 1 Year' labelattrs=(family=calibri size=10pt);
%end;

%if &Time. ge 730 %then %do;
refline 730 / axis=x  
    lineattrs=(pattern=dash color=lightgrey) label='2 Years' labelattrs=(family=calibri size=10pt);
%end;

%if &Time. ge 1095 %then %do;
refline 1095 / axis=x 
    lineattrs=(pattern=dash color=lightgrey) label='3 Years' labelattrs=(family=calibri size=10pt)  ;
%end;

%if &Time. ge 1461 %then %do;
refline 1461 / axis=x 
    lineattrs=(pattern=dash color=lightgrey) label='4 Years' labelattrs=(family=calibri size=10pt)  ;
%end;

%if &Time. ge 1826 %then %do;
refline 1826 / axis=x 
    lineattrs=(pattern=dash color=lightgrey) label='5 Years' labelattrs=(family=calibri size=10pt)  ;
%end;

refline &Trt_val_1. / axis=y
    lineattrs=(pattern=thindot color=CX003469) label="&Trt_txt_1" labelattrs=(family=calibri size=10pt color=CX003469);

refline &Trt_val_2. / axis=y
    lineattrs=(pattern=thindot color=CX69b937) label="&Trt_txt_2" labelattrs=(family=calibri size=10pt color=CX69b937);

refline &Trt_val_3. / axis=y
    lineattrs=(pattern=thindot color=CXef7d04) label="&Trt_txt_3" labelattrs=(family=calibri size=10pt color=CXef7d04);

refline &Trt_val_4. / axis=y
    lineattrs=(pattern=thindot color=CX00a6cb) label="&Trt_txt_4" labelattrs=(family=calibri size=10pt color=CX00a6cb);
run;

title;


%end;
%mend;

/* Create the combined graph. */


options papersize=a4 printerpath=gif animation=start animduration=0.05 animloop=no noanimoverlay nodate nonumber spool;
ods printer file="c:\temp\KM_Treatment.gif";

ods graphics / width=10in height=10in  scale=on imagefmt=GIF;
%km;

options printerpath=gif animation=stop;
ods printer close;

Back to blog

Example 4 (R and readme.txt)

## PSI_April_KM_v1_0

## load in packages
library(shiny)
library(shinyWidgets)
library(tidyverse)
library(shinycssloaders)
library(broom)
library(survival)
library(survminer)
library(plotly)


tick <- icon("check")

Orange <- "#EF7F04"
Green <- "#68B937"
Blue <- "#00A6CB"
Grey <- "#4E5053"
Darkblue <- "#003569"
Yellow <- "#FFBB2D"

ADTTE <- read_csv('2020-04-08-psi-vissig-adtte_v2.csv')

ADTTE$EVNTDESC <- as.factor(ADTTE$EVNTDESC)
ADTTE$STR01L <- as.factor(ADTTE$STR01L)
ADTTE$STR02L <- as.factor(ADTTE$STR02L)
ADTTE$TRT01P <- as.factor(ADTTE$TRT01P)
ADTTE$AGE <- as.integer(ADTTE$AGE)

ADTTE2 <- ADTTE
# Define UI for app that draws a bar chart in ggplot ----

# Use a fluid Bootstrap layout
ui <- fluidPage(    
  
  # Give the page a title
  titlePanel("Interactive Kaplan-Meier Plot"),

  # Generate a row with a sidebar
  sidebarLayout(

    # Define the sidebar with inputs for each plot separately
    sidebarPanel(
      ## Conditional panels- only see first set of filters on first tab
      conditionalPanel(condition = "input.tabset == 'withn'",
                       strong("Filters for Plot:"),
                       prettyCheckboxGroup("Strat1", "Hormone receptor status:", 
                                           choices=levels(ADTTE$STR01L), 
                                           selected = levels(ADTTE$STR01L),
                                           inline = TRUE, icon = tick),
                       prettyCheckboxGroup("Strat2", "Prior radiotherapy at randomisation:",
                                           choices= levels(ADTTE$STR02L),
                                           selected = levels(ADTTE$STR02L),
                                           inline = TRUE, icon = tick),
                       prettyCheckboxGroup("Trt", "Treatment Arm:",
                                           choices= levels(ADTTE$TRT01P),
                                           selected = levels(ADTTE$TRT01P),
                                           inline = TRUE, icon = tick),
                       sliderInput("ptAGE", "Age Range:",
                                   min = min(ADTTE$AGE), max = max(ADTTE$AGE),
                                   value = c("min","max"))),
      
      ## Conditional panels- only see second set of filters on second tab
      conditionalPanel(
        condition = "input.tabset == 'QuantComp'",
        strong("Filters for Plot A:"),
        prettyCheckboxGroup("Strat1a", "Hormone receptor status:", 
                            choices=levels(ADTTE$STR01L), 
                            selected = levels(ADTTE$STR01L),
                            inline = TRUE, icon = tick),
        prettyCheckboxGroup("Strat2a", "Prior radiotherapy at randomisation:",
                            choices= levels(ADTTE$STR02L),
                            selected = levels(ADTTE$STR02L),
                            inline = TRUE, icon = tick),
        prettyCheckboxGroup("Trta", "Treatment Arm:",
                            choices= levels(ADTTE$TRT01P),
                            selected = levels(ADTTE$TRT01P),
                            inline = TRUE, icon = tick),
        sliderInput("ptAGEa", "Age Range:",
                    min = min(ADTTE$AGE), max = max(ADTTE$AGE),
                    value = c("min","max")),
        br(),
        hr(),
        br(),
        
        strong("Filters for Plot B:"),
        prettyCheckboxGroup("Strat1b", "Hormone receptor status:", 
                            choices=levels(ADTTE$STR01L), 
                            selected = levels(ADTTE2$STR01L),
                            inline = TRUE, icon = tick),
        prettyCheckboxGroup("Strat2b", "Prior radiotherapy at randomisation:",
                            choices= levels(ADTTE$STR02L),
                            selected = levels(ADTTE$STR02L),
                            inline = TRUE, icon = tick),
        prettyCheckboxGroup("Trtb", "Treatment Arm:",
                            choices= levels(ADTTE$TRT01P),
                            selected = levels(ADTTE$TRT01P),
                            inline = TRUE, icon = tick),
        sliderInput("ptAGEb", "Age Range:",
                    min = min(ADTTE$AGE), max = max(ADTTE$AGE),
                    value = c("min","max")),
        
      ),
      "Events are defined as death or disease progression."
    ),
    
    
    
    
    
    # Create a spot for the plot
    mainPanel(
      ## Creates two separate panels- one for KM plot and number of patients, one for two KM plots to allow comparisons
      tabsetPanel(type = "tabs",
                  id = "tabset",
                  tabPanel("KM plot with number of patients", plotOutput("PlotA"), br(), hr(), br(), plotOutput("PlotB"), value = "withn"),
                  
                  
                  tabPanel("KM plots to compare between subgroups", plotOutput("PlotC"), br(), hr(), br(), plotOutput("PlotD"), value = "QuantComp")
                  
                  
      )
      
    )  
  )
  
)  


# Define a server for the Shiny app


server <- function(input, output) {
  
  # Create four separate plots based on inputs
  
  ## Fit model and plot KM curve for first tab
  output$PlotA <- renderPlot({
    
    #Filter and Subset Data
    
    plotdata <- ADTTE %>% filter(STR01L %in% input$Strat1, STR02L %in% input$Strat2,
                                 TRT01P %in% input$Trt)
    plotdata2 <- subset(plotdata,
                        AGE  >= input$ptAGE[1] & AGE <= input$ptAGE[2])  
    

    
    fita <- survfit(Surv(AVAL, CNSR == 0) ~ TRT01P, data = plotdata2)
    plot <-        ggsurvplot(fita,
                              risk.table = TRUE,
                              data = plotdata2,
                              # size = 0.0001,
                              censor.shape = "",
                              palette = c("TRT01P=tablemab + vismab 52 weeks" = Darkblue,"TRT01P=tablemab x 12 week -> vismab 34 weeks" = Green, 
                                          "TRT01P=tablemab x 52 weeks" = Blue, "TRT01P=vismab x 52 weeks" = Orange),
                              # risk.table = 'nrisk_cumcensor',
                              # tables.theme = theme_cleantable(),
                              # risk.table.col = "strata",
                              # # cumevents = TRUE,
                              title = "Interactive Kaplan-Meier plot: Explore time to event across 4 treatment arms",
                              xlab = "Time (days)",
                              ylab = "Event free survival",
                              legend = c(.18,.25),
                              legend.title = "Treatment Group",
                              
                              # legend.labs = c("Control", "Treatment"),
                              break.x.by = 90,
                              xlim = c(0, 1980),
                              ggtheme = theme_classic()) 
    plot$plot + ggplot2::geom_vline(xintercept=365, linetype='dotted', col = "black") +
      ggplot2::geom_vline(xintercept=1095, linetype='dashed', col = "black") +
      ggplot2::geom_vline(xintercept=1825, linetype='solid', col = "black") +
      ggplot2::annotate(geom = "text", x = 375, y = 0, label = "Year 1", hjust = "left", size = 4.5) +
      ggplot2::annotate(geom = "text", x = 1105, y = 0, label = "Year 3", hjust = "left", size = 4.5) +
      ggplot2::annotate(geom = "text", x = 1835, y = 0, label = "Year 5", hjust = "left", size = 4.5) 
    
  })
  
  ## Filter data and pull out number of patients for n's at bottom of first tab
  output$PlotB <- renderPlot({
    plotdata <- ADTTE %>% filter(STR01L %in% input$Strat1, STR02L %in% input$Strat2,
                                 TRT01P %in% input$Trt)
    plotdata2 <- subset(plotdata,
                        AGE  >= input$ptAGE[1] & AGE <= input$ptAGE[2])  
    
    
    
    fita <- survfit(Surv(AVAL, CNSR == 0) ~ TRT01P, data = plotdata2)
    plot2 <-        ggsurvplot(fita,
                               data = plotdata2,
                               # size = 0.0001,
                               censor.shape = "",
                               palette = c("TRT01P=tablemab + vismab 52 weeks" = Darkblue,"TRT01P=tablemab x 12 week -> vismab 34 weeks" = Green, 
                                           "TRT01P=tablemab x 52 weeks" = Blue, "TRT01P=vismab x 52 weeks" = Orange),
                               # risk.table = 'nrisk_cumcensor',
                               # tables.theme = theme_cleantable(),
                               # risk.table.col = "strata",
                               # # cumevents = TRUE,
                               #title = "Interactive Kaplan-Meier plot: Explore time to event across 4 treatment arms",
                               title = "Interactive Kaplan-Meier plot: Explore time to event across 4 treatment arms",
                               xlab = "Time (days)",
                               ylab = "Event free survival",
                               legend = c(.18,.25),
                               legend.title = "Treatment Group",
                               break.x.by = 180,
                               xlim = c(0, 1980),
                               ggtheme = theme_classic(),
                               risk.table = TRUE, 
                               risk.table.col = "strata",  
                               risk.table.title="Patients remaining in study"
    ) 
    plot2$table + theme(
      axis.text.y = element_blank(),
      axis.ticks.y = element_blank(),
      legend.position = "none"
    )
    
  })  
## Filter data and create KM plot for top of second tab
  output$PlotC <- renderPlot({
    
    #Filter and Subset Data
    
    plotdatc <- ADTTE %>% filter(STR01L %in% input$Strat1a, STR02L %in% input$Strat2a,
                                 TRT01P %in% input$Trta)
    plotdata2c <- subset(plotdatc,
                         AGE  >= input$ptAGEa[1] & AGE <= input$ptAGEa[2])  
    
    
    
    fitc <- survfit(Surv(AVAL, CNSR == 0) ~ TRT01P, data = plotdata2c)
    plotb <-        ggsurvplot(fitc,
                               risk.table = TRUE,
                               data = plotdata2c,
                               # size = 0.0001,
                               censor.shape = "",
                               censor = FALSE,
                               palette = c("TRT01P=tablemab + vismab 52 weeks" = Darkblue,"TRT01P=tablemab x 12 week -> vismab 34 weeks" = Green, 
                                           "TRT01P=tablemab x 52 weeks" = Blue, "TRT01P=vismab x 52 weeks" = Orange),
                               # risk.table = 'nrisk_cumcensor',
                               # tables.theme = theme_cleantable(),
                               # risk.table.col = "strata",
                               # # cumevents = TRUE,
                               title = "Plot A:",
                               xlab = "Time (days)",
                               ylab = "Event free survival",
                               legend = c(.18,.25),
                               legend.title = "Treatment Group",
                               
                               # legend.labs = c("Control", "Treatment"),
                               break.x.by = 90,
                               xlim = c(0, 1980),
                               ggtheme = theme_classic()) 
    plotb$plot + ggplot2::geom_vline(xintercept=365, linetype='dotted', col = "black") +
      ggplot2::geom_vline(xintercept=1095, linetype='dashed', col = "black") +
      ggplot2::geom_vline(xintercept=1825, linetype='solid', col = "black") +
      ggplot2::annotate(geom = "text", x = 375, y = 0, label = "Year 1", hjust = "left", size = 4.5) +
      ggplot2::annotate(geom = "text", x = 1105, y = 0, label = "Year 3", hjust = "left", size = 4.5) +
      ggplot2::annotate(geom = "text", x = 1835, y = 0, label = "Year 5", hjust = "left", size = 4.5)
    
  })
  
  
## Filter data and create KM plot for bottom of second tab  
  output$PlotD <- renderPlot({
    
    #Filter and Subset Data
    
    plotdatad <- ADTTE %>% filter(STR01L %in% input$Strat1b, STR02L %in% input$Strat2b,
                                  TRT01P %in% input$Trtb)
    plotdata2d <- subset(plotdatad,
                         AGE  >= input$ptAGEb[1] & AGE <= input$ptAGEb[2])  
    
    
    
    fitd <- survfit(Surv(AVAL, CNSR == 0) ~ TRT01P, data = plotdata2d)
    plotd <-        ggsurvplot(fitd,
                               risk.table = TRUE,
                               data = plotdata2d,
                               censor = FALSE,
                               # size = 0.0001,
                               censor.shape = FALSE,
                               palette = c("TRT01P=tablemab + vismab 52 weeks" = Darkblue,"TRT01P=tablemab x 12 week -> vismab 34 weeks" = Green, 
                                           "TRT01P=tablemab x 52 weeks" = Blue, "TRT01P=vismab x 52 weeks" = Orange),
                               # risk.table = 'nrisk_cumcensor',
                               # tables.theme = theme_cleantable(),
                               # risk.table.col = "strata",
                               # # cumevents = TRUE,
                               title = "Plot B:",
                               xlab = "Time (days)",
                               ylab = "Event free survival",
                               legend = c(.18,.25),
                               legend.title = "Treatment Group",
                               # legend.labs = c("Control", "Treatment"),
                               break.x.by = 90,
                               xlim = c(0, 1980),
                               ggtheme = theme_classic()) 
    plotd$plot +  ggplot2::geom_vline(xintercept=365, linetype='dotted', col = "black") +
      ggplot2::geom_vline(xintercept=1095, linetype='dashed', col = "black") +
      ggplot2::geom_vline(xintercept=1825, linetype='solid', col = "black") +
      ggplot2::annotate(geom = "text", x = 375, y = 0, label = "Year 1", hjust = "left", size = 4.5) +
      ggplot2::annotate(geom = "text", x = 1105, y = 0, label = "Year 3", hjust = "left", size = 4.5) +
      ggplot2::annotate(geom = "text", x = 1835, y = 0, label = "Year 5", hjust = "left", size = 4.5)
    
    
    
    
  })
}    





# Create Shiny app ----
shinyApp(ui = ui, server = server)

Back to blog

Example 5 (R and readme.txt)

## Heatmap for PSI Wonderful Wednesday - April

## Load required packages ====

library(tidyverse)
library(plotly)
library(survival)

## Read in data and assign colours ====

Orange <- "#EF7F04"
Green <- "#68B937"
Blue <- "#00A6CB"
Grey <- "#4E5053"
Darkblue <- "#003569"
Yellow <- "#FFBB2D"

## Assumes data saved in same working directory
ADTTE <- read_csv('2020-04-08-psi-vissig-adtte_v2.csv')

## change variable types as necessary
ADTTE$STR01L <- as.factor(ADTTE$STR01L)
ADTTE$STR02L <- as.factor(ADTTE$STR02L)
ADTTE$TRT01P <- as.factor(ADTTE$TRT01P)
ADTTE$DCTREAS <- as.factor(ADTTE$DCTREAS)
ADTTE$EVNTDESC <- as.factor(ADTTE$EVNTDESC)


## Fit survival model and reformat ====
fita <- survfit(Surv(AVAL, CNSR == 0) ~ TRT01P, data = ADTTE)

## Pull out survival function data
surv <- as.data.frame(fita$surv)

## Add treatment to survival data
treatment <- c(rep("TMB+VMB 52wks", times = 433), 
               rep("TMB 12wks,VMB 34wks", times = 454), 
               rep("TMB 52wks", times = 427), 
               rep("VMB 52wks", times = 445))

## Combine survival function data, treatment arm and time
surv_time <- surv %>%
  add_column(Days=fita$time) %>%
  add_column(Treatment_Arm=treatment) %>%
  rename(survival=`fita$surv`) %>%
  arrange(Days)

## create shell with all possible days in

shell <- tibble(Days = rep(1:1944, times = 4),
                Treatment_Arm = c(rep("TMB+VMB 52wks", times = 1944), 
                                  rep("TMB 12wks,VMB 34wks", times = 1944), 
                                  rep("TMB 52wks", times = 1944), 
                                  rep("VMB 52wks", times = 1944)))

## merge on from surv time for when we have a survival value
## Add 1 for first day at each (nobody's dead yet)
## Copy survival function values down into empty rows to replicate KM curve horizontal line

full_surv <- shell %>% 
  left_join(surv_time) %>% 
  mutate(survival = if_else(Days == 1, 1, survival)) %>% ## add 1 for first day as all alive
  group_by(Treatment_Arm) %>%  ## make sure we don't copy from one trt to the next
  arrange(Days, .by_group = TRUE) %>%  ## make sure in correct analysis day order
  fill(survival) %>%  ## fills in empty survival function values
  ungroup()

testdf <- full_surv %>% 
  pivot_wider(id_cols = Days, names_from = Treatment_Arm, values_from = survival)

Flag <-  case_when(
  testdf$`TMB 12wks,VMB 34wks`> testdf$`TMB 52wks` & 
    testdf$`TMB 12wks,VMB 34wks`> testdf$`TMB+VMB 52wks` & 
    testdf$`TMB 12wks,VMB 34wks`> testdf$`VMB 52wks` ~ 3,
  
  testdf$`TMB 52wks`> testdf$`TMB 12wks,VMB 34wks` & 
    testdf$`TMB 52wks`> testdf$`TMB+VMB 52wks` & 
    testdf$`TMB 52wks`> testdf$`VMB 52wks` ~ 1,
  
  testdf$`TMB+VMB 52wks`> testdf$`TMB 12wks,VMB 34wks` & 
    testdf$`TMB+VMB 52wks`> testdf$`TMB 52wks` & 
    testdf$`TMB+VMB 52wks`> testdf$`VMB 52wks` ~ 4,
  
  testdf$`VMB 52wks`> testdf$`TMB 12wks,VMB 34wks` & 
    testdf$`VMB 52wks`> testdf$`TMB 52wks` & 
    testdf$`VMB 52wks`> testdf$`TMB+VMB 52wks` ~ 2
)



## Creating text labels for hover functionality ====
## Add in text for labels, adding extra blank space for alignment
full_surv2 <- full_surv %>% 
  mutate(text1 = paste(Treatment_Arm, round(survival, digits = 4), sep = ": ")) %>% 
  mutate(text2 = if_else(condition = Treatment_Arm == "TMB 12wks,VMB 34wks", paste(Treatment_Arm, round(survival, digits = 4), sep = ":"),
                         false = if_else(condition = Treatment_Arm == "TMB+VMB 52wks", paste(Treatment_Arm, round(survival, digits = 4), sep = ":          "),
                                         false = if_else(Treatment_Arm == "VMB 52wks", paste(Treatment_Arm, round(survival, digits = 4), sep = ":                   "),
                                                         false = paste(Treatment_Arm, round(survival, digits = 4), sep = ":                   ")))))


## Remove survival values (now in text2) and reformat data to wide format
full_surv3 <- full_surv2 %>% 
  select(-survival) %>% 
  pivot_wider(id_cols = Days, names_from = Treatment_Arm, values_from = text2) 

full_surv3$Flag <- Flag

full_surv3 <- full_surv3 %>% 
  mutate(Flag = replace_na(Flag, 99))

## Merge formatted labels (full_surv3) back to data with survival values (so all labels for each observation) 
## Combines label with /n separator to ensure each on different line
## Makes the value that is hovered over bold

full_surv4 <- full_surv2 %>% 
  left_join(full_surv3) %>% 
  mutate(text2 = paste(`TMB 52wks`,`VMB 52wks`,`TMB 12wks,VMB 34wks`,  `TMB+VMB 52wks`, sep = "\n")) %>% 
  mutate(boldtrt = paste("<b>", if_else(Flag ==1, `TMB 52wks`, 
                                        false = if_else(Flag==2, `VMB 52wks`, 
                                                        false = if_else(Flag==3, `TMB 12wks,VMB 34wks`,
                                                                        false = if_else(Flag==4,`TMB+VMB 52wks`,
                                                                                        false = "")))),"</b>", sep = "")) %>% 
  mutate(text3 = if_else(Flag == 1, paste(boldtrt, `VMB 52wks`,`TMB 12wks,VMB 34wks`,  `TMB+VMB 52wks`, sep = "\n"),
                         false = if_else(Flag == 2, paste(`TMB 52wks`,boldtrt,`TMB 12wks,VMB 34wks`,  `TMB+VMB 52wks`, sep = "\n"),
                                         false = if_else(Flag == 3, paste(`TMB 52wks`,`VMB 52wks`,boldtrt,  `TMB+VMB 52wks`, sep = "\n"),
                                                         false = if_else(Flag == 4, paste(`TMB 52wks`, `VMB 52wks`,`TMB 12wks,VMB 34wks`,  boldtrt, sep = "\n"),
                                                                         false = paste(`TMB 52wks`,`VMB 52wks`,`TMB 12wks,VMB 34wks`, `TMB+VMB 52wks`, sep = "\n"))))))

## Reorder factors so that axis will match with label order ====
full_surv4$Treatment_Arm <- as.factor(full_surv4$Treatment_Arm)

full_surv4$Treatment_Arm <- fct_relevel(full_surv4$Treatment_Arm, "TMB+VMB 52wks", "TMB 12wks,VMB 34wks", "VMB 52wks","TMB 52wks")

## Create plot ====

## Create ggplot object
plot <- ggplot(data = full_surv4, aes(x = Days, y = Treatment_Arm, text = text3)) +
  ggplot2::scale_fill_gradient2(
    low = Darkblue,
    high = Green,
    mid = 'white', 
    midpoint = 0.5,
    limits = c(0, 1),
    name = "Survival\nFunction") +
  labs(y = "Treatment Arm") +
  theme_minimal() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

## Create static heatmap and add titles
plot <- plot + geom_tile(aes(fill = survival))

plot <-  plot + 
  annotate(geom = "text", x = 1000, y = 5.2, label = " ", hjust = "centered", vjust = "top", size=7) + 
  annotate(geom = "text", x = 1000, y = 5.05, label = "Combining therapies leads to better outcomes after one year", hjust = "centered", vjust = "top", size=7) +
  annotate(geom = "text", x = 1000, y = 4.6, label = "...but simultaneous treatment has a better overall median survival", hjust = "centered", vjust = "top", size=6) +
  geom_segment(x=365, xend=365, y=0.5, yend=4.5, linetype='dotted', col = "black") +
  geom_segment(x=1095, xend=1095, y=0.5, yend=4.5, linetype='dashed', col = "black") +
  geom_segment(x=1825, xend=1825, y=0.5, yend=4.5, linetype='solid', col = "black") +
  annotate(geom = "text", x = 375, y = 0.4, label = "Year 1", hjust = "left", vjust = "bottom", size = 3) +
  annotate(geom = "text", x = 1105, y = 0.4, label = "Year 3", hjust = "left", vjust = "bottom", size = 3) +
  annotate(geom = "text", x = 1835, y = 0.4, label = "Year 5", hjust = "left", vjust = "bottom", size = 3) +
  annotate(geom = "text", x = 1835, y = 0.3, label = " ", hjust = "left", vjust = "bottom", size = 3) 


## Transform to plotly to add hover text

ggplotly(plot, tooltip = c("text3")) %>% layout(hoverlabel = list(align = "left"))

Back to blog

Example 5 (R)

library(tidyverse)    
library(broom)
library(survival)
library(here)

# load data
ADTTE <- read_csv(here("data", '2020-04-08-psi-vissig-adtte.csv'))


# Set up meta data for the report
#-------------------------------------------------
title <- "Evidence of improved progression-free survival for combo over monotherapy"
subtitle <- "Kaplan-Meier estimates over time including 95% uncertainty interval"
source <- "*The number of patients at risk (and events) are displayed the time point reference.
Data source: https://github.com/VIS-SIG/Wonderful-Wednesdays/tree/master/data/2020/2020-04-08"
y_axis <- "Progression free survival"
x_axis <- "Time [days]*"


############################################
## Survival model for KM and risk set
############################################

fit <- survfit(Surv(AVAL, CNSR == 0) ~ TRT01P, data = ADTTE, conf.type = 'log-log' ) 
sumfit <- summary(fit, times = c(0, 250, 500, 750, 1000, 1250, 1500, 1750, 2000)) 


# tidy KM curve by treatment for plotting
#-------------------------------------------
km <-
  survfit(Surv(AVAL, CNSR == 0) ~ TRT01P,
          data = ADTTE,
          conf.type = 'log-log')  %>%
  broom::tidy(fit) %>%
  dplyr::mutate(group = stringr::str_remove(strata, "TRT01P=")) %>%
  dplyr::mutate(group = factor(
    group,
    levels = c(
      "tablemab + vismab 52 weeks",
      "tablemab x 12 week -> vismab 34 weeks",
      "vismab x 52 weeks",
      "tablemab x 52 weeks"
    )
  ))

# tidy KM curve by treatment for small multiples
# this is a second data set to pass in for plotting
# ghost lines of each treatment 
#-------------------------------------------
km_sm <- km %>%
  mutate(group2 = group)


# Calculate risk set for annotations
#------------------------------------------
risk_data <-
  tibble(
    time =  sumfit$time,
    group = sumfit$strata,
    n.risk =   sumfit$n.risk,
    n.events = sumfit$n.event
  ) %>%
  dplyr::mutate(
    group = stringr::str_remove(group, "TRT01P="),
    label = paste0(n.risk, " (", n.events, ")"),
    y_pos = 0.01
  )


############################################
## Create the plot
############################################

plot <-
  km %>% ggplot(aes(x = time, y = estimate, group = group)) +
  ## draw the ghost lines of each treatment by facet. Force the group facet to null
  geom_step(
    data = transform(td2, group = NULL),
    aes(time, estimate, group = group2),
    size = 0.75,
    color = "#000000",
    alpha = 0.15
  ) +
  
  ## draw km lines
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.1, fill = "red") +
  geom_step(color = "red") +
  
  ## draw risk set
  geom_text(data = risk_data, mapping = aes(x = time, y = y_pos, label = label, group = group, fill = NULL), size = 2.5) +
  
  ## asthetics of the plot
  scale_x_continuous(breaks = c(0, 250, 500, 750, 1000, 1250, 1500, 1750, 2000)) +
  scale_y_continuous(breaks = c(0, 0.25, 0.50, 0.75, 1), limits = c(0, 1)) +

  ## annotations
  labs(title = title,
       subtitle = subtitle,
       caption = source) +
  xlab(x_axis) +
  ylab(y_axis) +
  
  # set up basic theme 
  theme_minimal(base_size = 12) +
  theme(
    panel.grid.minor = element_blank(),
    axis.title.y = element_blank(),
    legend.position = "none"
  ) +
  
  # Set the entire chart region to a light gray color
  theme(panel.background = element_rect(fill = color_background, color = color_background)) +
  theme(plot.background = element_rect(fill = color_background, color = color_background)) +
  theme(panel.border = element_rect(color = "grey", fill = NA, size = 0.35)) +
  facet_wrap(~ group, scales = 'free', ncol = 2) 

# Save plot to file
#-----------------------------------------
ggsave(
  file = here("figs", "ww-km-plot.png"),
  width = 35, height = 20, units = "cm")

Back to blog

Citation

For attribution, please cite this work as

Mallett (2020, April 9). VIS-SIG Blog: Wonderful Wednesday April 2020. Retrieved from https://graphicsprinciples.github.io/posts/2020-09-10-wonderful-wednesday-april-2020/

BibTeX citation

@misc{mallett2020wonderful,
  author = {Mallett, Rachel Phillips, Steve},
  title = {VIS-SIG Blog: Wonderful Wednesday April 2020},
  url = {https://graphicsprinciples.github.io/posts/2020-09-10-wonderful-wednesday-april-2020/},
  year = {2020}
}