Zachary Skrivanek guides through a number of data visualisations explaining a mediated treatment effect on patient reported quality of life. In addition the problem of missing data should be handled within the graphical representation. A Lollipop plot and a bar chart were presented as well as multiplot solutions using correlation plots or a combination of scatter plots, modelling plots and distribution plots. A point of discussion was the usability of a parallel coordinates plot. Another proposal used a Bayesian model displaying the results in an impressive grable – a combination of graphic and table. The last approach used an innovative way of storytelling with data called scrollytelling.
The purpose of this exercise is to develop data visualization techniques to illustrate that one variable is a mediator for a treatment effect and that this mediator effect is stronger than in other variables. A more detailed description and link to the data can be found here.
This graphic addresses the question of which variable is mediating the treatment effect the most by plotting the reduction in treatment effect after accounting for the variable. One shortcoming of lollipop charts is that they do not allow for a visulization of uncertainty. link to code
The html file can be found here.
This is a well done “grapple” - a graphic embeded in a table. This table summarizes all essential information in the data to address the mediaton effect of all three variables being considered. The posterior distrubtions are relevant and to the point. It is well annotated, uses color and bold face judiciously to highlight key findings and is organized to let the reader draw the correct conclusion without having to spell it out in the title. This graphic is ready for a journal. The question is: What type of journal? If it is a medical journal then the author might want to exclude the sensitivity analysis results (LOCF or completers) and just pick the appropriate method for handling missing data since this is not of interest to a medical audience. On the other hand if this is for a statistics journal this is quite appropriate. The author is not only showing the key results but the robustness of the results. (And part of the exercise was to account for missingness, so we cannot really blame the author for including this.)
This graphic is similar to the first one except it uses a barchart instead of a lollipop chart and it includes a completers analysis. The title explains the results and the reference line for the marginal treatment effect provides a good reference for the reader.
The parallel coordinate plot is a classic way to analyze multi-variate data. The correlation between itch and DLQI is apparent by the fact that there are so many horizontal parallel lines connecting the two axes. This is a plot where two key parameters have to be fine tuned to bring out any patterns: the order of the axes and the scaling. By separating Rx and placebo into two different panels and scaling them separately we lose the treatment effect on DLQI. They look quite similar in fact, but that is because they were scaled independently. The author chose to color by LOCF, but this revealed no clear pattern. It would be interesting to combine the two treatment groups and color by treatment to see if that creates a distinct grouping for Rx vs placebo.
The advantage of this graphic over the barchart and lollipop chart is that we can see the individual variation. But this can be a downside because it can lead to overplotting. Making this interactive where the user could select a set of lines and move the axes could make this plot very powerful.
This is a succinct summary of the correlations between all of the variables, whether they are based on all data including LOCF imputed or just completers. The use of color in addition to shave of the ellipses is a nice application of redundancy. You do not need to read a legend to realize that blue encodes a positive correlation and red encodes a negative correlation and the level of transparency encodes the magnitude of the correlation.
high-resolution image high-resolution image high-resolution image
These 3 graphics provide an exhaustive amount of information. The 3 graphics address the different ways of treating the missing data. Each graph is a matrix where the upper triangle models the data by treatment and covariate, the lower triangle shows the individual data and the diagonal shows the marginal distributions. The title clearly leads the reader to the conclusion the DLQI is highly correlated with itch and when the reader looks at the panel with two splines, one for each treatment, of DLQI against itch the reader can see that the treatment effect on DLQI is neglible once itch is included in the model.
library(tidyverse)
library(dplyr)
library(tidyr)
library(mediation)
library(grid)
library(gridExtra)
library(haven)
library(ggplot2)
library(cowplot)
data <- read_csv("/shared/175/arenv/arwork/gsk1278863/mid209676/present_2020_01/code/mediation/TRT.csv")
# Summarise data
sum <- data %>%
group_by(TRT) %>%
summarise(avg=mean(DLQI))
sort <- c(1,2)
sum <- cbind(sum, sort)
# Plot Mean DLQI
plot01 <- ggplot(data=sum) +
geom_text(aes(x=-5, y=sort, label=TRT, color=TRT), size=10) +
geom_segment(aes(x=0, xend=avg, y=sort, yend=sort, color=TRT)) +
geom_vline(aes(xintercept=0)) +
geom_point(aes(x=avg, y=sort, color=TRT), alpha=0.7, size=10) +
scale_x_continuous("Mean DLQI at Week 24",
labels=c(" ", "0", "10", "20", "30"),
breaks=c(-10, 0, 10, 20, 30),
limits=c(-10, 30)) +
scale_y_continuous(" ",
labels=c(" ", " ", " "),
breaks=c(1,2,3),
limits=c(0, 3)) +
labs(title="Rx reduces DLQI by approx. 5 units. The treatment effect is \n
reduced by approx. 4 units when Itch is added to the model. \n
Therefore the treatment effect of Rx is mediated primarily by itch.\n") +
theme_minimal() +
theme(legend.position="none",
text = element_text(size = 20),
axis.ticks.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.x = element_text(size = 20),
plot.title = element_text(hjust = 0.5, size = 25),
panel.border = element_rect(colour = "black", fill=NA, size=0.25),
panel.grid = element_blank())
# Mediation analysis (itch)
model.I <- lm(itch ~ TRT, data)
model.Yi <- lm(DLQI ~ TRT + itch, data)
med.itch <- mediate(model.I, model.Yi, treat='TRT', mediator='itch',
boot=TRUE, sims=500)
itch <- (med.itch$d0)*(-1)
# Mediation analysis (BSA)
model.B <- lm(BSA ~ TRT, data)
model.Yb <- lm(DLQI ~ TRT + BSA, data)
med.BSA <- mediate(model.B, model.Yb, treat='TRT', mediator='BSA',
boot=TRUE, sims=500)
BSA <- (med.BSA$d0)
# Mediation analysis (redness)
model.R <- lm(redness ~ TRT, data)
model.Yr <- lm(DLQI ~ TRT + redness, data)
med.redness <- mediate(model.R, model.Yr, treat='TRT', mediator='redness',
boot=TRUE, sims=500)
redness <- (med.redness$d0)
# Combine ACME values
sort <- c(1, 2, 3)
V1 <- " "
A <- data.frame(cbind(itch, BSA, redness))
A2 <- gather(A, var, val)
A3 <- cbind(A2, sort, V1)
# Overall treatment effect
var = "Effect"
sort = 1
coeff <- data.frame(cbind(summary(model.0 <- lm(DLQI ~ TRT, data))$coefficients[2,1]*(-1), var)) %>%
mutate(val=as.numeric(V1))
coeff2 <- cbind (coeff, sort)
# Plot treatment effect
plot02 <- ggplot() +
geom_vline(aes(xintercept=0)) +
geom_text(data=coeff2, aes(x=-0.75, y=sort), label="Rx Effect", color="blue", size=10) +
geom_segment(data=coeff2, aes(x=0, xend=val, y=sort, yend=sort), color="blue") +
geom_point(data=coeff2, aes(x=val, y=sort), color="Blue", alpha=0.7, size=10) +
scale_x_continuous("Treatment Effect (placebo - Rx)",
labels=c(" ", "0", "1", "2", "3", "4", "5", "6"),
breaks=c(-1, 0, 1, 2, 3, 4, 5, 6),
limits=c(-1, 6)) +
scale_y_continuous(" ",
labels=c(" "),
breaks=c(1),
limits=c(1)) +
theme_minimal() +
theme(text = element_text(size = 20),
axis.ticks.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.x = element_text(size = 20),
plot.title = element_text(hjust = 0.5, size = 25),
panel.border = element_rect(colour = "black", fill=NA, size=0.25),
panel.grid = element_blank(),
plot.caption=element_text(hjust = 0))
# Plot mediation effect
plot03 <- ggplot() +
geom_text(data=A3, aes(x=-0.75, y=sort, label=var), color="black", size=10) +
geom_segment(data=A3, aes(x=0, xend=val, y=sort, yend=sort)) +
geom_point(data=A3, aes(x=val, y=sort), color="black", alpha=0.7, size=10, shape=1) +
scale_x_continuous("Reduction in Treatment Effect (Mediation Effect)",
labels=c(" ", "0", "1", "2", "3", "4", "5", "6"),
breaks=c(-1, 0, 1, 2, 3, 4, 5, 6),
limits=c(-1, 6)) +
scale_y_continuous(" ",
labels=c(" ", " ", " "),
breaks=c(1, 2, 3),
limits=c(0, 4)) +
theme_minimal() +
theme(text = element_text(size = 20),
axis.ticks.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.x = element_text(size = 20),
plot.title = element_text(hjust = 0.5, size = 25),
panel.border = element_rect(colour = "black", fill=NA, size=0.25),
panel.grid = element_blank(),
plot.caption=element_text(hjust = 0)) +
ggtitle(label = " ")
p <- plot_grid(plot01, plot02, plot03, align = "v", nrow = 3, rel_heights = c(1.5, 0.6, 1.2))
ggsave("/shared/175/arenv/arwork/gsk1278863/mid209676/present_2020_01/code/mediation/DLQI_mediation_mallett.png", p, width=12, height=12, dpi=300)
The code can be found here.
Note that this is an R markdown file and you might need a proper editor to open it.
A txt file with the code can be found here.
# Baplots to show the reduction in treatment effect on DLQI.
# ==========================================================
# Read in the data set:
dat <- read.csv("mediation_data.csv")
fit <- lm(DLQI ~ TRT, data = dat)
t.val.pure <- coef(summary(fit))[2, 3]
t.val.vec <- numeric(3)
j <- 1
for (i in c(2, 4, 6)) {
fit <- lm(dat$DLQI ~ dat[, i] + dat$TRT)
t.val.vec[j] <- coef(summary(fit))[3, 3]
j <- j + 1
}
# Redo without imputed data:
t.val.vec.re <- numeric(3)
j <- 1
for (i in c(2, 4, 6)) {
dat.re <- dat[which(dat[, i+1] == F), ]
fit <- lm(dat.re$DLQI ~ dat.re[, i] + dat.re$TRT)
t.val.vec.re[j] <- coef(summary(fit))[3, 3]
j <- j + 1
}
# Combine both vectors:
t.vals <- c(t.val.vec.re[3], t.val.vec[3],
t.val.vec.re[2], t.val.vec[2],
t.val.vec.re[1], t.val.vec[1])
# Calculate difference:
t.vals - t.val.pure
# Calculate the difference:
t.val.diff <- t.vals - t.val.pure
t.val.mat <- matrix(t.val.diff, ncol = 3)
fit <- lm(DLQI ~ TRT, dat)
t.val.mat.pr <- t.val.mat/abs(coef(summary(fit))[2, 3]) * 100
png("barplot.png", width = 7, height = 5, res = 300, units = "in")
par(xpd = T, cex.main = 0.9)
barplot(t.val.diff, horiz = T, col = c("darkcyan", "blue"), xlim = c(0, 5),
space = rep(c(0.25, 0), 3),
main = "Adjusting for itch leads to the greatest reduction
in the absolute standardized treatment effect on DLQI (unadjusted effect: 4.7).
Removing the LOCF imputed data diminishes the differences.",
xlab = "Reduction in the absolute standardized treatment effect")
y.coord <- c(1.25, 3.5, 5.75)
text(-0.25, y.coord[3], "itch")
text(-0.25, y.coord[2], "BSA")
text(-0.35, y.coord[1], "redness")
for (i in 1:nrow(t.val.mat)) {
for (j in 1:ncol(t.val.mat)) {
t.val <- paste0(format(round(t.val.mat[i, j], 1), nsmall = 1), " (",
format(round(t.val.mat.pr[i, j], 1), nsmall = 1), "%)")
text(t.val.mat[i, j] + .45, y.coord[j] + i - 1.5, t.val,
col = c("darkcyan", "blue")[i])
}
}
legend("topright", legend = c("LOCF", "Observed"), fill = c("blue", "darkcyan"), bty = "n")
dev.off()
## function to scale data
scale_this <- function(x) {
(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
library(tidyverse)
## read in data
final_in <- read_csv("mediation_data.csv")
## check data
final_in %>% glimpse()
## scale data and put in long format
data <-
final_in %>%
select(!c("itch_LOCF", "BSA_LOCF", "redness_LOCF", "DLQI_LOCF")) %>%
mutate(id = row_number(),
itch = scale_this(itch),
BSA = scale_this(BSA),
redness = scale_this(redness),
DLQI = scale_this(DLQI)) %>%
pivot_longer(!c(TRT, id), names_to = "var", values_to = "val")
### add an indicator for LOCF variables
missing <-
final_in %>%
select(c("TRT", "itch_LOCF", "BSA_LOCF", "redness_LOCF", "DLQI_LOCF")) %>%
mutate(id = row_number()) %>%
pivot_longer(!c(TRT, id), names_to = "var", values_to = "LOCF") %>%
mutate(var = str_remove(var, "_LOCF"))
## join to main data set
data <-
data %>%
left_join(missing)
## check data set
data %>% glimpse()
## check LOCF vals
table(data$LOCF)
## plot data
data %>%
mutate(
name = fct_relevel(var,
"DLQI", "itch", "redness",
"BSA"),
TRT = fct_relevel(TRT, "Rx", "placebo")
) %>%
ggplot(aes(
x = name,
y = val,
group = id,
colour = LOCF
)) +
geom_hline(yintercept = 0, colour = "black", alpha = 0.4, size = 1.1) +
geom_point(alpha = 0.7, size = 0.5) +
geom_line(alpha = 0.25, size = 0.5) +
labs(title = "Relationship between outcome, treatment and LOCF imputation",
subtitle = "Measurements are scaled (lower is a better) by outcome",
caption = "\n The solid black line at zero represents the mean outcome (irrespective of treatment).\nA larger number of patients reported a better BSA and redness profile in the Rx arm.\nThe missing data pattern differs across groups, requiring further investigation.") +
xlab("") +
ylab("") +
facet_wrap( ~ TRT, ncol = 1) +
theme_minimal() +
theme(legend.position = "bottom")
## Save plot
page_width <- 200
page_height <- 150
d_dpi <- 300
ggsave(file = paste0("parallel_coords.png"),
width = page_width, height = page_height,
units = "mm", dpi = d_dpi)
#########################################
## Warning this code requires a re-factor
## and put repeated steps in to a function
#########################################
library(corrplot)
library(tidyverse)
## read in data
final_in <- read_csv("mediation_data.csv")
#plot on one page
par(mfrow = c(2, 3))
par(cex = 0.75)
##-----------------------------------------------------
## Overall correlations
title <- "How do all outcomes relate overall?"
corrs <- final_in %>%
dplyr::select("itch", "BSA", "redness", "DLQI") %>%
filter(complete.cases(.)) %>%
dplyr::mutate_all(as.numeric)
M <- cor(corrs)
col <-
colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(
M,
method = "ellipse",
col = col(200), tl.cex = 1/par("cex"),
type = "upper",
order = "hclust",
number.cex = .7,
title = title,
addCoef.col = "black",
# Add coefficient of correlation
tl.col = "black",
tl.srt = 90,
# Text label color and rotation
# hide correlation coefficient on the principal diagonal
diag = FALSE,
mar = c(0, 0, 3, 0)
)
##-----------------------------------------------------
### - By Rx arm
title <- "How do all outcomes relate within Rx?"
corrs <- final_in %>%
filter(TRT == "Rx") %>%
dplyr::select("itch", "BSA", "redness", "DLQI") %>%
dplyr::mutate_all(as.numeric)
M <- cor(corrs)
col <-
colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(
M,
method = "ellipse",
col = col(200), tl.cex = 1/par("cex"),
type = "upper",
order = "hclust",
number.cex = .7,
title = title,
addCoef.col = "black",
# Add coefficient of correlation
tl.col = "black",
tl.srt = 90,
diag = FALSE,
mar = c(0, 0, 3, 0)
)
##-----------------------------------------------------
## By Placebo
corrs <- final_in %>%
filter(TRT == "placebo") %>%
dplyr::select("itch", "BSA", "redness", "DLQI") %>%
dplyr::mutate_all(as.numeric)
M <- cor(corrs)
col <-
colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
title <- "How do all outcomes relate within Placebo?"
corrplot(
M,
method = "ellipse",
col = col(200), tl.cex = 1/par("cex"),
type = "upper",
order = "hclust",
number.cex = .7,
title = title,
addCoef.col = "black",
# Add coefficient of correlation
tl.col = "black",
tl.srt = 90,
# Text label color and rotation
# hide correlation coefficient on the principal diagonal
diag = FALSE,
mar = c(0, 0, 3, 0)
)
##-----------------------------------------------------
## Overall and complete cases
corrs <- final_in %>%
filter(itch_LOCF == FALSE &
BSA_LOCF == FALSE & redness_LOCF == FALSE & DLQI_LOCF == FALSE) %>%
filter(complete.cases(.)) %>%
dplyr::select("itch", "BSA", "redness", "DLQI") %>%
dplyr::mutate_all(as.numeric)
M <- cor(corrs)
col <-
colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
title <-
"How do all outcomes relate overall\n(excluding patients with imputed data)?"
corrplot(
M,
method = "ellipse",
col = col(200), tl.cex = 1/par("cex"),
type = "upper",
order = "hclust",
number.cex = .7,
title = title,
addCoef.col = "black",
# Add coefficient of correlation
tl.col = "black",
tl.srt = 90,
# Text label color and rotation
# hide correlation coefficient on the principal diagonal
diag = FALSE,
mar = c(0, 0, 3, 0)
)
##-----------------------------------------------------
### By Rx and complete cases
corrs <- final_in %>%
filter(TRT == "Rx") %>%
filter(itch_LOCF == FALSE &
BSA_LOCF == FALSE & redness_LOCF == FALSE & DLQI_LOCF == FALSE) %>%
filter(complete.cases(.)) %>%
dplyr::select("itch", "BSA", "redness", "DLQI") %>%
dplyr::mutate_all(as.numeric)
M <- cor(corrs)
col <-
colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
title <- "How do all outcomes relate within Rx?\n(Complete cases)"
corrplot(
M,
method = "ellipse",
col = col(200), tl.cex = 1/par("cex"),
type = "upper",
order = "hclust",
number.cex = .7,
title = title,
addCoef.col = "black",
# Add coefficient of correlation
tl.col = "black",
tl.srt = 90,
# Text label color and rotation
# hide correlation coefficient on the principal diagonal
diag = FALSE,
mar = c(0, 0, 3, 0)
)
##-----------------------------------------------------
### Placebo and complete cases
corrs <- final_in %>%
filter(TRT == "placebo") %>%
filter(itch_LOCF == FALSE &
BSA_LOCF == FALSE & redness_LOCF == FALSE & DLQI_LOCF == FALSE) %>%
filter(complete.cases(.)) %>%
dplyr::select("itch", "BSA", "redness", "DLQI") %>%
dplyr::mutate_all(as.numeric)
M <- cor(corrs)
col <-
colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
title <-
"How do all outcomes relate within placebo?\n(Complete cases)"
corrplot(
M,
method = "ellipse",
col = col(200), tl.cex = 1/par("cex"),
type = "upper",
order = "hclust",
number.cex = .7,
title = title,
addCoef.col = "black",
# Add coefficient of correlation
tl.col = "black",
tl.srt = 90,
# Text label color and rotation
# hide correlation coefficient on the principal diagonal
diag = FALSE,
mar = c(0, 0, 3, 0)
)
library(tidyverse)
library(ggforce)
## Save plot
page_width <- 350
page_height <- 250
d_dpi <- 400
## read in data
final_in <- read_csv("mediation_data.csv")
## Plot overall
ggplot(final_in, aes(x = .panel_x, y = .panel_y, colour = TRT, fill = TRT)) +
geom_autopoint(alpha = 0.6) +
geom_autodensity(alpha = 0.2) +
geom_smooth(method = lm, formula = y ~ splines::bs(x, 3)) +
facet_matrix(vars(DLQI, itch, redness, BSA), layer.diag = 2, layer.upper = 3,
grid.y.diag = FALSE) +
labs(title = "Itch has a strong positive association with DLQI.",
subtitle = "There is evidence that the DLQI treatment effect is mediated through itch.",
caption = "\n Cubic splines are presented in the top-right layer regressing y on x.\nScatter-plots of the same relationship are displayed in the bottom-left later.\nThe marginal distribution by treatment are displayed on the diagonal.") +
theme_light(base_size = 14) +
theme(legend.position = "bottom")
## save plot
ggsave(file = paste0("scatter-matrix-final.png"),
width = page_width, height = page_height,
units = "mm", dpi = d_dpi)
library(tidyverse)
library(ggforce)
## Save plot
page_width <- 350
page_height <- 250
d_dpi <- 400
## read in data
final_in <- read_csv("mediation_data.csv")
## Plot overall
ggplot(final_in, aes(x = .panel_x, y = .panel_y, colour = TRT, fill = TRT)) +
geom_autopoint(alpha = 0.6) +
geom_autodensity(alpha = 0.2) +
geom_smooth(method = lm, formula = y ~ splines::bs(x, 3)) +
facet_matrix(vars(DLQI, itch, redness, BSA), layer.diag = 2, layer.upper = 3,
grid.y.diag = FALSE) +
labs(title = "Investigating the assocation between outcomes and treatment",
subtitle = "With LOCF imputation",
caption = "\n Cubic splines are presented in the top-right layer regressing y on x.\nScatter-plots of the same relationship are displayed in the bottom-left later.\nThe marginal distribution by treatment are displayed on the diagonal.") +
theme_light(base_size = 14) +
theme(legend.position = "bottom")
## save plot
ggsave(file = paste0("scatter-matrix-locf.png"),
width = page_width, height = page_height,
units = "mm", dpi = d_dpi)
## Plot complete cases
final_in %>%
filter(itch_LOCF == FALSE & BSA_LOCF == FALSE & redness_LOCF == FALSE & DLQI_LOCF == FALSE) %>%
ggplot(aes(x = .panel_x, y = .panel_y, colour = TRT, fill = TRT)) +
geom_autopoint(alpha = 0.6) +
geom_autodensity(alpha = 0.2) +
geom_smooth(method = lm, formula = y ~ splines::bs(x, 3)) +
facet_matrix(vars(DLQI, itch, redness, BSA), layer.diag = 2, layer.upper = 3,
grid.y.diag = FALSE) +
labs(title = "Investigating the assocation between outcomes and treatment",
subtitle = "Complete cases analysis",
caption = "\n Cubic splines are presented in the top-right layer regressing y on x.\nScatter-plots of the same relationship are displayed in the bottom-left later.\nThe marginal distribution by treatment are displayed on the diagonal.") +
theme_light(base_size = 14) +
theme(legend.position = "bottom")
## save plot
ggsave(file = paste0("scatter-matrix-cc.png"),
width = page_width, height = page_height,
units = "mm", dpi = d_dpi)
For attribution, please cite this work as
Skrivanek (2020, Nov. 11). VIS-SIG Blog: Wonderful Wednesdays November 2020. Retrieved from https://graphicsprinciples.github.io/posts/2020-11-10-wonderful-wednesdays-november-2020/
BibTeX citation
@misc{skrivanek2020wonderful, author = {Skrivanek, Zachary}, title = {VIS-SIG Blog: Wonderful Wednesdays November 2020}, url = {https://graphicsprinciples.github.io/posts/2020-11-10-wonderful-wednesdays-november-2020/}, year = {2020} }