Code
library(ggplot2)
library(gridExtra)
library(reshape2)
library(swt)
library(readxl)
library(testit)
library(rms)Reproducible, high-quality statistical reporting.
swt.
This cookbook introduces the Swisstransplant R package swt to make high-quality reports and publication-ready graphics in our in-house style.
The source code for swt is maintained on GitHub and can be accessed here. Additionally, a package manual is available for reference.
library(ggplot2)
library(gridExtra)
library(reshape2)
library(swt)
library(readxl)
library(testit)
library(rms)The swt package includes the Swisstransplant template for creating Quarto documents. Quarto documents enable dynamic reporting and unite written text, statistical programming code, results, tables, and figures into a reproducible document.
To create a new Swisstransplant project in RStudio, click New Project…, select New Directory, and then choose Swisstransplant Document from the bottom of the list.
The Swisstransplant Quarto document is written in Markdown; an overview can be found in the Markdown Basics. It is possible to add Callout Blocks to highlight important information.
::: {.callout-note appearance="simple"}
Note that there are five types of callouts, including:
`note`, `warning`, `important`, `tip`, and `caution`.
:::Note that there are five types of callouts: note, warning, important, tip, and caution.
Something important.
The Quarto document introduces a simple data analysis workflow structured into eight chapters, each using second-level headings (##). The chapters are as follows:
Each chapter can be further expanded to support more complex analysis projects, as illustrated in the flowchart below.
Quarto documents that follow a structured analysis workflow enhance transparency, support reproducibility, and build trust in statistical analyses.
flowchart TB O(Objectives) --> I(Data import) I --> P(Data processing) P --> Q(Quality control) Q --> D(Descriptive statistics) D --> A(Primary analysis) A --> S(Secondary analyses) S --> C(Computing information) O --- OC[Study objectives<br>Primary outcome<br>Secondary outcomes<br>Methods] I --- IC[Custom R functions<br>Load libraries<br>Two-pass reading when necessary<br>Data class conversion<br>Data cleaning<br>Reshape<br>Merge<br>Aggregate] P --- PC[Data overview<br>Definition of outcomes<br>Consent status<br>Inclusion, exclusion and subsets<br>Missing data and imputations<br>Transform<br>Matching<br>Create analysis data set] Q --- QC[Data checks<br>Assert logical expressions<br>Variable distributions] D --- DC[Sample size<br>Table 1<br>Consort Diagram] A --- AC[Results<br>Tables<br>Figures] S --- SC[Results<br>Tables<br>Figures] C --- CC[Runtime<br>Environment<br>Package names and versions]
To support professional-looking output, the swt package provides a custom color palette and styling for creating ggplot2 figures.
The color scheme and ggplot2 styling are used in the official Swisstransplant national statistics to provide a professional and consistent visual presentation.
The swt_colors() function creates an object for user-friendly access to the Swisstransplant color scheme:
col = swt_colors()
col$blue.darkThe following colors are available:
col = swt_colors()
par(mfrow=c(1,1), mai=c(0.5,0.1,0.2,0.1))
barplot(rep(1,12), axes=FALSE, col=c(col$blue.dark,
col$blue.alt,
col$turkis.tpx,
col$yellow.donation,
col$strongred.akzent,
col$pink.heart,
col$red.liver,
col$darkyellow.kidney,
col$green.pancreas,
col$lightblue.lungs,
col$beige.intestine,
col$purple.alt
),
names.arg = c("blue\ndark", "blue\nalt", "turkis\ntpx", "yellow\ndonat",
"strongr\nakzent", "pink\nheart", "red\nliver", "darkylw\nkidney",
"green\npancr", "lightb\nlungs", "beige\nintest", "purple\nalt"),
cex.names = 0.8
)The swt_colors object also includes single hue palettes with three additional color strengths: 75%, 50%, and 25%. Color palettes can be accessed as follows:
col$pal.blue.dark[1] # 100%, same as col$blue.swt
col$pal.blue.dark[2] # 75%
col$pal.blue.dark[3] # 50%
col$pal.blue.dark[4] # 25%par(mfrow=c(12,1), mai=c(0.1,0.1,0.2,0.1))
barplot(rep(1,4), axes=FALSE, col=c(col$pal.blue.swt[1],
col$pal.blue.swt[2],
col$pal.blue.swt[3],
col$pal.blue.swt[4])
)
barplot(rep(1,4), axes=FALSE, col=c(col$pal.blue.alt[1],
col$pal.blue.alt[2],
col$pal.blue.alt[3],
col$pal.blue.alt[4])
)
barplot(rep(1,4), axes=FALSE, col=c(col$pal.turkis.tpx[1],
col$pal.turkis.tpx[2],
col$pal.turkis.tpx[3],
col$pal.turkis.tpx[4])
)
barplot(rep(1,4), axes=FALSE, col=c(col$pal.yellow.donation[1],
col$pal.yellow.donation[2],
col$pal.yellow.donation[3],
col$pal.yellow.donation[4])
)
barplot(rep(1,4), axes=FALSE, col=c(col$pal.strongred.akzent[1],
col$pal.strongred.akzent[2],
col$pal.strongred.akzent[3],
col$pal.strongred.akzent[4])
)
barplot(rep(1,4), axes=FALSE, col=c(col$pal.pink.heart[1],
col$pal.pink.heart[2],
col$pal.pink.heart[3],
col$pal.pink.heart[4])
)
barplot(rep(1,4), axes=FALSE, col=c(col$pal.red.liver[1],
col$pal.red.liver[2],
col$pal.red.liver[3],
col$pal.red.liver[4])
)
barplot(rep(1,4), axes=FALSE, col=c(col$pal.darkyellow.kidney[1],
col$pal.darkyellow.kidney[2],
col$pal.darkyellow.kidney[3],
col$pal.darkyellow.kidney[4])
)
barplot(rep(1,4), axes=FALSE, col=c(col$pal.green.pancreas[1],
col$pal.green.pancreas[2],
col$pal.green.pancreas[3],
col$pal.green.pancreas[4])
)
barplot(rep(1,4), axes=FALSE, col=c(col$pal.lightblue.lungs[1],
col$pal.lightblue.lungs[2],
col$pal.lightblue.lungs[3],
col$pal.lightblue.lungs[4])
)
barplot(rep(1,4), axes=FALSE, col=c(col$pal.beige.intestine[1],
col$pal.beige.intestine[2],
col$pal.beige.intestine[3],
col$pal.beige.intestine[4])
)
barplot(rep(1,4), axes=FALSE, col=c(col$pal.purple.alt[1],
col$pal.purple.alt[2],
col$pal.purple.alt[3],
col$pal.purple.alt[4]),
names.arg = c("100%", "75%", "50%", "25%"),
cex.names = 0.8
)ggplot2Below are a few examples of creating various types of data plots in the SWT color scheme. The function swt_style() adds the correct styling to the plot.
set.seed(1980)
n = 100
var1 = c(rnorm(n/2, mean=0), rnorm(n/2, mean=3) )
d = data.frame(var1 = var1,
var2 = var1 + rnorm(n, sd = 0.4),
group = as.factor(rep(c("abc", "mno" ), each=n/2))
)
p1 = ggplot(d, aes(x=group, y=var1, group=group, col=group)) +
geom_boxplot(fill=col$grey.bg) +
geom_point(size=2, shape=1, position = position_jitter(height = 0, width = 0.25),
col=c(rep(col$pal.blue.swt[1], n/2),
rep(col$pal.red.liver[1], n/2))) +
labs(title = "Title", tag = "A") +
scale_color_manual(values = c(col$blue.swt,
col$red.liver)) +
swt_style(legend_position = "none", grey_theme = TRUE, font_size = 14, title_size = 16) +
theme(plot.tag = element_text(size = 16, face = "bold"))
p2 = ggplot(d, aes(x=group, y=var1, group=group, col=group)) +
geom_boxplot() +
geom_point(size=2, position = position_jitter(height = 0, width = 0.25),
col=c(rep(col$pal.turkis.tpx[3], n/2),
rep(col$pal.darkyellow.kidney[3], n/2))) +
scale_color_manual(values = c(col$turkis.tpx,
col$darkyellow.kidney)) +
swt_style(legend_position = "none")
p3 = ggplot(d, aes(x=var2, fill=group)) +
geom_histogram(position = "identity", bins=20, alpha=0.5) +
scale_fill_manual(values = c(col$lightblue.lungs,
col$beige.intestine)) +
swt_style(grey_theme = FALSE)
p4 = ggplot(d, aes(x=var2, y=var1, group=group, col=group)) +
geom_point(size=2, alpha=0.5) +
scale_color_manual(values = c(col$blue.swt,
col$yellow.donation)) +
labs(title = "Title", tag = "D") +
swt_style(legend_position = "bottom", grey_theme = TRUE)
grid.arrange(p1, p2, p3, p4, nrow = 2, ncol = 2)# Data taken from Annual Report 2020 (p. 32)
table3.2 = t(array(c(
13.3,17.2,18.6,18.4,17.0,
11.5,12.6,14.9,11.7,11.2,
1.8,4.6,3.8,6.7,5.8), dim = c(5,3)))
colnames(table3.2) = 2016:2020
rownames(table3.2) = c("Total", "DBD", "DCD")
data3.2 = melt(table3.2, varnames = c("Gruppe", "Jahr"), value.name = "Anzahl")number_height = -0.8
ggplot(data3.2, aes(x=Jahr, y=Anzahl, col=Gruppe, group=Gruppe)) +
# plot line with numbers
geom_line(data = data3.2, linewidth=1) +
geom_text(data = data3.2, aes(label=Anzahl), vjust=number_height,
col="black", size=4) +
# some adjustments (colors, axes, etc)
scale_color_manual(values=c(col$strongred.akzent,
col$yellow.donation,
col$blue.swt)) +
scale_y_continuous(breaks = seq(0,22,4), limits = c(0,22)) +
ylab("pmp") +
labs(title="Title",
subtitle = "Subtitle"
) +
swt_style(grey_theme = TRUE) +
theme(legend.position = "top")# Data taken from Annual Report 2020 (p. 31)
table3.1 = t(array(c(
96,106,126,100,96,
15,39,32,57,50), dim = c(5,2)))
table3.1.totals = array(colSums(table3.1), dim = c(1,5))
colnames(table3.1) = 2016:2020
colnames(table3.1.totals) = 2016:2020
rownames(table3.1) = c("DBD", "DCD")
rownames(table3.1.totals) = c("Total")
data3.1 = melt(table3.1, varnames = c("Gruppe", "Jahr"), value.name = "Anzahl")
data3.1.totals = melt(table3.1.totals, varnames = c("Gruppe", "Jahr"),
value.name = "Anzahl")number_height = -0.8
bar_with = 0.5
ggplot(data3.1, aes(x=Jahr, y=Anzahl, fill=Gruppe, group=Gruppe)) +
# plot bars with numbers
geom_bar(data = data3.1, stat="identity", position="dodge", width=bar_with) +
geom_text(data = data3.1, aes(label=Anzahl), vjust=number_height,
position = position_dodge(width=bar_with)) +
# plot line with numbers
geom_line(data = data3.1.totals, col = col$strongred.akzent, linewidth=1) +
geom_text(data = data3.1.totals, aes(label=Anzahl), vjust=number_height,
position = position_dodge(width=bar_with), col=col$strongred.akzent) +
# some adjustments (colors, axes, etc)
scale_fill_manual(values=c(col$yellow.donation,
col$blue.swt,
col$strongred.akzent)) +
scale_y_continuous(breaks = seq(0,180,20), limits = c(0,180)) +
ylab("Personen") +
swt_style(grey_theme = TRUE) Several helper functions are implemented to support statistical computing.
count_perc() for the count and percentmean_sd() for the mean and standard deviationmedian_irq() for the median and interquartile rangemiss_perc() for the count and percent for missing data (NA)tidy_pvalues() for formatting p-valuesset.seed(1980)
data = data.frame(age = rnorm(n = 200, mean = 50, sd = 10))
data$hypertension = rbinom(n = 200, size = 1, prob = 0.20)
tab = data.frame(all = c(mean_sd(data$age),
count_perc(data$hypertension))
)
colnames(tab) = "Descriptives"
rownames(tab) = c("Age in years, mean (SD)", "Hypertension, count (%)")
tab| Descriptives | |
|---|---|
| Age in years, mean (SD) | 50.2 (10.7) |
| Hypertension, count (%) | 57 (28.5) |
The function tidy_missing() displays missing data of all the variables in a data frame.
data$age[1:5] = NA
tidy_missing(data)| Missing | |
|---|---|
| age | 5 (2.5%) |
| hypertension | 0 (0.0%) |
| TOTAL | 5 (2.5) |
The swt package provides two functions for advanced handling of date and time.
num2date()converts Excel days since origin to POSIXct data type (date/time).date2num() converts POSIXct data type (date/time) to Excel days since origin.Using num2date() for all date variables will also handle the time zone consistently using CET time (and CEST during summer). Note that having mixed time zones, UTC and CET, can cause offsets of 1 hour.
If date and time data are provided in a consistent way, such as from SOAS, these functions are not necessary, and date and time can be handled in the usual way. However, date and time are sometimes entered inconsistently. If we force the data type to be a date using the option col_types, inconsistent values are discarded and show up as NA, which is not ideal. We also get a warning, see below.
data = as.data.frame(read_xlsx(path = "../data/dates.xlsx", col_types = "date"))Warning: Expecting date in A4 / R4C1: got '11.03.1980 11:11:11'
data| mydate |
|---|
| 2021-08-22 16:43:01 |
| 2023-02-17 22:12:02 |
| NA |
An alternative is to import them as data type text. The dates show up as numbers and, if not recognized, as characters. The numbers are the number of days since 1899-12-30; this is an Excel convention.
data = as.data.frame(read_xlsx(path = "../data/dates.xlsx", col_types = "text"))
data| mydate |
|---|
| 44430.696539351855 |
| 44974.925023148149 |
| 11.03.1980 11:11:11 |
We can use the function num2date() to convert numbers into dates. There is also a built-in filter that recognizes the alternative format, which can also be modified via the option; see ?num2date.
data$mydate = num2date(data$mydate, format = "%d.%m.%Y %H:%M:%OS", round = FALSE)
data| mydate |
|---|
| 2021-08-22 16:43:01 |
| 2023-02-17 22:12:02 |
| 1980-03-11 11:11:10 |
One disadvantage of handling dates as numbers is when performing data cleaning. For example, manually correcting dates or adding missing data may be necessary. This has to be done in numbers before the conversion into the POSIXct data type. However, the function date2num()can be used for this purpose.
data = as.data.frame(read_xlsx(path = "../data/dates.xlsx", col_types = "text"))
data$mydate[3] = date2num("1980-03-11 02:10:00")
data$mydate = num2date(data$mydate)
data| mydate |
|---|
| 2021-08-22 16:43:01 |
| 2023-02-17 22:12:02 |
| 1980-03-11 02:10:00 |
The swt package implemented a number of clinical calculators, for example, from published clinical prediction models.
Quick examples are shown below.
egfr = egfr_ckd_epi(SCr = c(40, 110),
age = c(40, 50),
sex = c("M", "F"),
units = "SI")
data.frame(eGFR = egfr)| eGFR |
|---|
| 136 |
| 53 |
assert(egfr == c(136, 53))egfr = egfr_schwartz(SCr = c(100, 110),
height = c(100, 120),
units = "SI")
data.frame(eGFR = egfr)| eGFR |
|---|
| 37 |
| 40 |
assert(egfr == c(37, 40))kdri = optn_kdri(D_age = c(45, 15, 70),
D_height = c(183, 183, 183),
D_weight = c(75, 90, 100),
D_hypertension = c(TRUE, FALSE, TRUE),
D_diabetes = c(TRUE, FALSE, TRUE),
D_CVA = c(TRUE, FALSE, TRUE),
D_SCr = c(1, 1.8, 1.8),
D_DCD = c(TRUE, FALSE, TRUE))
kdri = round(kdri, digits = 2)
data.frame(KDRI = kdri)| KDRI |
|---|
| 1.36 |
| 0.56 |
| 2.10 |
assert(kdri == c(1.36, 0.56, 2.10))kdri = uk_kdri(D_age = 45,
D_height = 180,
D_hypertension = TRUE,
D_female = TRUE,
D_CMV = TRUE,
D_eGFR = 100,
D_days_hosp = 5)
kdri = round(kdri, digits = 2)
data.frame(KDRI = kdri)| KDRI |
|---|
| 0.94 |
assert(kdri == 0.94)score = uk_dcd_score(D_age = c(61, 60),
D_BMI = c(26, 25),
fWIT = c(31, 20),
CIT = c(7, 6),
R_age = c(61, 60),
R_MELD = c(26, 25),
retpx = c(TRUE, FALSE))
data.frame(Score = score)| Score |
|---|
| 27 |
| 0 |
assert(score == c(27, 0))The tidy_rmsfit() function converts regression model results from the package rms into a tidy, publication-ready table. This works for
ols() linear regression model (ordinary least squares)lmr() logistic regression modelcph() Cox proportional hazards modelset.seed(2025)
d = data.frame(x1 = runif(200),
x2 = runif(200),
x3 = as.factor(rbinom(200, 2, 0.5))
)
dd = datadist(d)
options(datadist="dd")
d$y <- d$x1 + d$x2 + rnorm(200)
fit = rms::ols(y ~ x1 + x2 + x3, data = d)Regression model results are typically displayed in an untidy format by summary() and anova(), as shown below:
summary(fit) Effects Response : y
Factor Low High Diff. Effect S.E. Lower 0.95 Upper 0.95
x1 0.23003 0.78329 0.55326 0.670250 0.13141 0.41108 0.92942
x2 0.25219 0.77842 0.52623 0.681250 0.12475 0.43521 0.92728
x3 - 0:1 2.00000 1.00000 NA -0.196800 0.17276 -0.53752 0.14392
x3 - 2:1 2.00000 3.00000 NA 0.081503 0.17186 -0.25744 0.42045
anova(fit) Analysis of Variance Response: y
Factor d.f. Partial SS MS F P
x1 1 25.740338 25.7403376 26.01 <.0001
x2 1 29.506389 29.5063894 29.82 <.0001
x3 2 2.097779 1.0488895 1.06 0.3484
REGRESSION 4 57.355444 14.3388609 14.49 <.0001
ERROR 195 192.949861 0.9894865
The function tidy_rmsfit() combines the effect estimates and the test statistics into one single publication-ready table:
tidy_rmsfit(fit, x3 = 0)| Interquartile difference | Effect estimate (95%-CI) | F-value | d.f. | p-value | |
|---|---|---|---|---|---|
| x1 | 0.55 (from 0.23 to 0.78) | 0.67 (from 0.41 to 0.93) | 26 | 1 | < 0.001 *** |
| x2 | 0.53 (from 0.25 to 0.78) | 0.68 (from 0.44 to 0.93) | 29.8 | 1 | < 0.001 *** |
| x3 | – | – | 1.06 | 2 | 0.35 |
| x3 1 | – | 0.20 (from -0.14 to 0.54) | – | – | – |
| x3 2 | – | 0.28 (from -0.11 to 0.67) | – | – | – |
| TOTAL | – | – | 14.5 | 4 | < 0.001 *** |
| ERROR | – | – | – | 195 | – |
sessionInfo()R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)
Matrix products: default
LAPACK version 3.12.1
locale:
[1] LC_COLLATE=English_Switzerland.utf8 LC_CTYPE=English_Switzerland.utf8
[3] LC_MONETARY=English_Switzerland.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Switzerland.utf8
time zone: Europe/Zurich
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rms_8.0-0 Hmisc_5.2-3 testit_0.13 readxl_1.4.5 swt_0.3
[6] reshape2_1.4.4 gridExtra_2.3 ggplot2_3.5.2
loaded via a namespace (and not attached):
[1] gtable_0.3.6 xfun_0.52 htmlwidgets_1.6.4 lattice_0.22-7
[5] vctrs_0.6.5 tools_4.5.1 generics_0.1.4 sandwich_3.1-1
[9] tibble_3.3.0 cluster_2.1.8.1 pkgconfig_2.0.3 Matrix_1.7-3
[13] data.table_1.17.6 checkmate_2.3.2 RColorBrewer_1.1-3 lifecycle_1.0.4
[17] compiler_4.5.1 farver_2.1.2 stringr_1.5.1 MatrixModels_0.5-4
[21] codetools_0.2-20 SparseM_1.84-2 quantreg_6.1 htmltools_0.5.8.1
[25] yaml_2.3.10 htmlTable_2.4.3 Formula_1.2-5 pillar_1.11.0
[29] MASS_7.3-65 rpart_4.1.24 multcomp_1.4-28 nlme_3.1-168
[33] tidyselect_1.2.1 digest_0.6.37 mvtnorm_1.3-3 polspline_1.1.25
[37] stringi_1.8.7 dplyr_1.1.4 labeling_0.4.3 splines_4.5.1
[41] fastmap_1.2.0 grid_4.5.1 colorspace_2.1-1 cli_3.6.5
[45] magrittr_2.0.3 base64enc_0.1-3 survival_3.8-3 TH.data_1.1-3
[49] foreign_0.8-90 withr_3.0.2 scales_1.4.0 backports_1.5.0
[53] segmented_2.1-4 lubridate_1.9.4 timechange_0.3.0 rmarkdown_2.29
[57] nnet_7.3-20 cellranger_1.1.0 zoo_1.8-14 evaluate_1.0.4
[61] knitr_1.50 rlang_1.1.6 Rcpp_1.1.0 glue_1.8.0
[65] rstudioapi_0.17.1 jsonlite_2.0.0 R6_2.6.1 plyr_1.8.9