My scripts:
About myself: www.linkedin.com/in/ThomasWs-Mopfair
“Wonderful Wednesdays” are a monthly webinar organised by the
Visualisation Special Interest Group of Statisticians in the
Pharmaceutical Industry (PSI).
This script was written for the VIS-SIG Wonderful Wednesdays challenge
13 Nov 24 (https://github.com/VIS-SIG/Wonderful-Wednesdays/tree/master/data/2024/2024-10-09).
The task involved summary data of a randomized study, and consisted of
suggesting a better visualisation than the example chart.
# - r libraries and data - #
library(dplyr) # version 1.1.4 # general grammar
library(purrr) # version 1.0.2 # set_names, list_rbind
library(tidyr) # version 1.3.1 # pivot_longer, pivot_wider
library(ggpubr) # version 0.6.0, includes ggplot2 3.4.0 # stat_compare_means, ggtexttable, ggarrange
library(ggrepel) # version 0.9.6 # geom_text_repel
library(RColorBrewer) # versin 1.1-3 # scale_color_brewer (optional)
exampleData <- read.csv("https://raw.githubusercontent.com/VIS-SIG/Wonderful-Wednesdays/refs/heads/master/data/2024/2024-10-09/WW_Nov2024.csv")
exampleData
## TREATMENT X1 X2 X3 X4 X5 X6 X7
## 1 1 6.2 5.7 5.2 4.8 4.1 6.0 5.2
## 2 2 2.2 3.8 4.0 4.2 4.7 2.3 2.8
## 3 3 5.5 4.6 4.2 5.3 5.4 6.7 4.3
## 4 4 6.1 6.4 7.3 7.2 8.0 5.8 5.1
## 5 5 7.0 7.2 8.3 8.5 9.3 8.1 6.4
## 6 6 8.5 8.1 9.0 8.3 7.4 6.3 5.6
## 7 7 2.5 2.1 1.8 2.3 4.1 9.4 10.0
## 8 8 7.6 8.1 7.4 7.9 6.1 5.5 5.7
Additional information provided for the challenge:
# - Chunk_1 - #
# - Requires R-libraries and object exampleData (r libraries and data) - #
exampleData_int <-
exampleData %>%
set_names(
"Treatment",
paste(c(1:7), sep = ", " )
) %>%
# Add a column with differences between the medians of disease activity at visits before and after the intervention
mutate(
Change_intervention =
apply(., 1, function(row) {
median(row[c("6", "7")]) -
median(row[c("1", "2", "3", "4", "5")])
}
)
) %>%
# Optional: replace treatment numbers with letters, to differentiate from the index for visits
mutate(
Treatment = LETTERS[1:8]
) %>%
# Re-arrange dataframe
arrange (Change_intervention) %>%
pivot_longer(
cols = paste(1:7),
names_to = "Visit",
values_to = "Mean_Disease_Index"
) %>%
mutate(.after = Visit,
Visit = as.numeric(Visit),
Intervention = ordered(
rep(
c(rep("before", 5), rep("after", 2)),
8),
levels = c("before", "after")
),
Treatment = factor(
Treatment,
levels = unique(Treatment)
)
)
# Plot
p_visits <-
exampleData_int %>%
mutate(facet_strip = "") %>% # Empty facet label for aligning the y-axis scales of the two subplots
ggplot(
aes(
x = Visit,
y = Mean_Disease_Index,
color = Treatment
)
) +
scale_color_brewer(palette="Dark2") +
geom_text_repel(
data = subset(
exampleData_int,
Visit == "7" & (Treatment %in% LETTERS[1:8] )
),
aes(label = Treatment),
nudge_x = 0.5,
color = "black",
size = 5
) +
facet_grid(.
~ facet_strip
) +
geom_line (lwd = 1.5, alpha = 0.6) +
scale_x_continuous(
breaks = 1:7,
minor_breaks = NULL,
expand = c(0, 0), limits = c(0.5, 7.5),
labels = c(paste(1:6), "7\n")
) +
scale_y_continuous(
breaks = seq(0, 10, 2),
minor_breaks = NULL,
expand = c(0, 0), limits = c(0, 10.5)
) +
geom_vline(
aes( xintercept = 5.5),
linetype = "dashed",
col = "blue"
) +
annotate(
"rect",
xmin = 5.4, xmax = 5.6,
ymin = -Inf, ymax=Inf,
fill="white", alpha=0.6) +
annotate(
"text",
x = 5.5, y = 0.5,
label = "Intervention",
size = 4,
color = "blue"
) +
theme_light() +
theme(
legend.position = "none",
strip.background = element_blank(),
strip.text.x=element_text(
colour="black",
size = 12
),
) +
labs(
y = "Mean Disease Index Score",
title = "Disease Activity vs. Visit",
vjust = -0.1
)
p_intervention <-
exampleData_int %>%
ggplot(
aes(
x = Intervention,
y = Mean_Disease_Index,
color = Treatment,
)
) +
scale_color_brewer(palette="Dark2") +
scale_y_continuous( # Same for both subplots
breaks = seq(0, 10, 2),
minor_breaks = NULL,
expand = c(0, 0),
limits = c(0, 10.5)
) +
scale_x_discrete(
labels = c("before", "\nafter")
)+
facet_grid(.
~ Treatment
) +
geom_point(
size = 3
) +
stat_summary(
fun = median,
shape = 5,
size = 0.8,
color = "darkgrey"
) +
stat_summary(
fun = median,
group = 1,
geom = "path",
color = "darkgrey", alpha = 0.5,
lwd = 1.3
) +
theme_light() +
theme(
axis.title.y = element_blank(),
axis.text.y = element_blank(),
strip.background = element_blank(),
strip.text.x=element_text(
colour="black",
size = 12
),
legend.position = "none"
) +
labs(
title = "Change following Intervention",
vjust = -0.1
)
ggarrange(
p_visits,
p_intervention,
ncol = 2,
widths = c(0.6, 1)
)
The original data used numbers for indexing visits (X1-X7) as well as treatment groups (1-8). For better distinction, I changed the names of the treatment groups to upper case letters (A = 1, B = 2, … H = 8).
“Mean Disease Index vs Visit” (left subplot)
Disease activity varied widely between groups and visits, in the periods
before and after the intervention. There was a sharp rise in disease
activity in group “G” after the intervention, and a notable drop in
group “F”. However, both of these trends seemed to have started already
at visits 4 and 5.
“Change following Intervention” (right subplot)
Nevertheless, I added a summary of mean disease index scores before and
after intervention. My assumption was that visits 1-5 were meant to
establish a baseline value, to which levels at visits 6 and 7 were to be
compared. Median values of the mean disease indices before and after
intervention are shown as grey open diamonds, their changes are
highlighted by grey lines. Treatment groups are shown left to right in
order of greatest decrease (group “F”, left) to greatest the increase
(group “G”, right).
# - Chunk_2 - #
# - Requires R-libraries (r libraries and data) and object exampleData_int (chunk_1) - #
## Additional parameters, not supplied with the challenge
# Intervals at which visits 1-7 took place (e.g. weeks)
visitIntervals <- c(0, 2, 4, 8, 12, 16, 24)
interventionTime <- mean (c(visitIntervals[5], visitIntervals[6]))
# Clinically meaningful target for mean disease index score
diseaseScore_target <- 3.5
## Plot
exampleData_int <-
exampleData_int %>%
mutate(
Time_of_visits = rep(visitIntervals, 8)
)
p_visits <-
exampleData_int %>%
mutate(facet_strip = "") %>% # Empty facet label for aligning the y-axis scales of the two subplots
ggplot(
aes(
x = Time_of_visits,
y = Mean_Disease_Index,
color = Treatment
)
) +
scale_color_brewer(palette="Dark2") +
geom_text_repel(
data = subset(
exampleData_int,
Time_of_visits ==
max(visitIntervals) & (Treatment %in% LETTERS[1:8]
)
),
aes(label = Treatment),
nudge_x = 0.5,
color = "black",
size = 5
) +
annotate(
"rect",
xmin = -Inf, xmax = Inf,
ymin = diseaseScore_target, ymax=Inf,
fill="lightgrey", alpha=0.2) +
facet_grid(.
~ facet_strip
) +
geom_line (lwd = 1.5, alpha = 0.6) +
scale_x_continuous(
breaks = visitIntervals,
minor_breaks = NULL,
expand = c(0, 0),
limits =
range(visitIntervals) +
max(visitIntervals)/14 * c(-1, 1),
labels = c(
visitIntervals[1:6], paste0(visitIntervals[7],"\n") )
) +
scale_y_continuous(
breaks = seq(0, 10, 2),
minor_breaks = NULL,
expand = c(0, 0), limits = c(0, 10.5)
) +
geom_vline(
aes(xintercept = interventionTime ),
linetype = "dashed",
col = "blue"
) +
annotate(
"text",
x = 0.67 - max(visitIntervals)/14,
y = 1.8,
label = "Target Range",
size = 4,
angle = 90,
color = "blue"
) +
annotate(
"rect",
xmin = interventionTime - 0.1,
xmax = interventionTime + 0.1,
ymin = -Inf, ymax=Inf,
fill="white", alpha=0.6) +
annotate(
"text",
x = interventionTime,
y = 0.5,
label = "Intervention",
size = 4,
color = "blue"
) +
theme_light() +
theme(
legend.position = "none",
strip.background = element_blank(),
strip.text.x=element_text(
colour="black",
size = 12
),
) +
labs(
y = "Mean Disease Index Score",
x = "Time of Visit [week]",
title = "Disease Activity vs. Time of Visit",
vjust = -0.1
)
p_intervention <-
exampleData_int %>%
ggplot(
aes(
x = Intervention,
y = Mean_Disease_Index,
color = Treatment,
)
) +
scale_color_brewer(palette="Dark2") +
annotate(
"rect",
xmin = -Inf, xmax = Inf,
ymin = 3.5, ymax=Inf,
fill="lightgrey", alpha=0.2) +
scale_y_continuous( # Same for both subplots
breaks = seq(0, 10, 2),
minor_breaks = NULL,
expand = c(0, 0),
limits = c(0, 10.5)
) +
scale_x_discrete(
labels = c("before", "\nafter")
)+
facet_grid(.
~ Treatment
) +
geom_point(
size = 3
) +
stat_summary(
fun = median,
shape = 5,
size = 0.8,
color = "darkgrey"
) +
stat_summary(
fun = median,
group = 1,
geom = "path",
color = "darkgrey", alpha = 0.5,
lwd = 1.3
) +
theme_light() +
theme(
axis.title.y = element_blank(),
axis.text.y = element_blank(),
strip.background = element_blank(),
strip.text.x=element_text(
colour="black",
size = 12
),
legend.position = "none"
) +
labs(
title = "Change following Intervention",
vjust = -0.1
)
ggarrange(
p_visits,
p_intervention,
ncol = 2,
widths = c(0.7, 1)
)
In this version of the figure, I added two parameters that were not
supplied with the example data but which could improve the figure:
The first was a hypothetical target range for the disease index. The aim of treatment/intervention in this scenario would have been to keep index scores below the values in the shaded areas of the plots. For the target shown here, the intervention achieved a meaningful benefit only in group “B”, whereas group “G” became significantly worse.
My second additional parameter were the intervals at which the visits
were scheduled. This should give a better impression of the variability
of the disease index over time, in particular if the visits were not
evenly spaced.
My difficulty with this task was understanding the data, and what messages the figure should convey. It was not clear whether the treatment and/or intervention had a meaningful effect, except for the worsening of disease in group “G”.
Was the “intervention” given between visits 5 and 6 the initiation of treatments A-H? Or were these treatments started before the first visit, followed by an additional intervention that was identical for all groups? If treatments were initiated between visits 5 and 6, and mean disease indices at prior visits represented the baselines for supposedly randomised groups, their large variability should be a concern. Alternatively, the study could have been undertaken to assess the effect of a common intervention in patients on different treatments. However, there were no suggestions of this in the supplied data and “bad” figure.
Redrawing the figure was further limited by having only basic summary
data as a source, i.e., mean scores without even margins of error. I
therefore compared the medians of those means before and after
“intervention” although this is probably an inappropriate method.
Far better would have been an analysis of covariance 1.
However, this would have required patient-level data.
Statistics
notes: Analysing controlled trials with baseline and follow up
measurements.
Vickers AJ, Altman DG. BMJ. 2001 Nov 10;323(7321):1123-4.
… more bad chards