Swisstransplant R cookbook

Reproducible, high-quality statistical reporting.

Author

Simon Schwab

Published

July 7, 2025

Abstract
The Swisstransplant R cookbook is a practical guide for generating reproducible, high-quality statistical reports using our custom R package swt.

Introduction

This cookbook introduces the Swisstransplant R package swt to make high-quality reports and publication-ready graphics in our in-house style.

The source code for swt is maintained on GitHub, and a package manual is available for reference.

Code
library(ggplot2)
library(gridExtra)
library(reshape2)
library(swt)
library(readxl)
library(testit)
library(rms)

Quarto document

The swt package includes the Swisstransplant template for creating Quarto documents. Quarto documents enable dynamic reporting and unite written text, statistical programming code, results, tables, and figures into a reproducible document.

To create a new Swisstransplant project in RStudio, click New Project…, select New Directory, and then choose Swisstransplant Document from the bottom of the list.

Writing in Markdown

The Swisstransplant Quarto document is written in Markdown; an overview can be found in the Markdown Basics. It is possible to add Callout Blocks to highlight important information.

::: {.callout-note appearance="simple"}
Note that there are five types of callouts, including:
`note`, `warning`, `important`, `tip`, and `caution`.
:::

Note that there are five types of callouts: note, warning, important, tip, and caution.

Something important.

Data analysis workflow

The Quarto document introduces a simple data analysis workflow structured into eight chapters, each using second-level headings (##). The chapters are as follows:

  • Objectives
  • Data import
  • Data processing
  • Quality control
  • Descriptive statistics
  • Primary analysis
  • Secondary analysis
  • Computing information

Each chapter can be further expanded to support more complex analysis projects, as illustrated in the flowchart below.

Quarto documents that follow a structured analysis workflow enhance transparency, support reproducibility, and build trust in statistical analyses.

flowchart TB
O(Objectives) --> I(Data import)
I --> P(Data processing)
P --> Q(Quality control)
Q --> D(Descriptive statistics)
D --> A(Primary analysis)
A --> S(Secondary analysis)
S --> C(Computing information)
O --- OC[Study objectives<br>Primary outcome<br>Secondary outcomes<br>Methods]
I --- IC[Custom R functions<br>Load libraries<br>Two-pass reading when necessary<br>Data class conversion<br>Data cleaning<br>Reshape<br>Merge<br>Aggregate]
P --- PC[Data overview<br>Definition of outcomes<br>Consent status<br>Inclusion, exclusion and subsets<br>Missing data and imputations<br>Transform<br>Matching<br>Create analysis data set]
Q --- QC[Data checks<br>Assert logical expressions<br>Variable distributions]
D --- DC[Sample size<br>Table 1<br>CONSORT diagram]
A --- AC[Results<br>Tables<br>Figures]
S --- SC[Results<br>Tables<br>Figures]
C --- CC[Runtime<br>Environment<br>Package names and versions]
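
The quality-control step in the flowchart (data checks and asserting logical expressions) could look like the minimal sketch below. It uses base R's stopifnot() instead of a dedicated assertion package, and the data frame and its column names are hypothetical.

```r
# Hypothetical analysis data set; the column names are illustrative only
qc_data = data.frame(id  = 1:5,
                     age = c(34, 51, 47, 62, 29),
                     sex = factor(c("F", "M", "F", "M", "F")))

# Data checks: every expression must be TRUE, otherwise execution stops
# and the name of the failing check is reported as the error message
stopifnot(
  "IDs must be unique"       = !anyDuplicated(qc_data$id),
  "Age must be 0-120 years"  = all(qc_data$age >= 0 & qc_data$age <= 120),
  "No missing values in sex" = !anyNA(qc_data$sex)
)

# Variable distributions at a glance
summary(qc_data)
```

Because a failed check stops rendering, a report built this way cannot be produced from data that violate the stated assumptions.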

Corporate design

To support professional-looking output, the swt package provides a custom color palette and styling for creating ggplot2 figures.

The color scheme and ggplot2 styling are used in the official Swisstransplant national statistics to provide a professional and consistent visual presentation.

Colors

The swt_colors() function creates an object for user-friendly access to the Swisstransplant color scheme:

col = swt_colors()
col$blue.dark

The following colors are available:

Code
col = swt_colors()
par(mfrow=c(1,1), mai=c(0.5,0.1,0.2,0.1))
barplot(rep(1,12), axes=FALSE, col=c(col$blue.dark,
                                     col$blue.alt,
                                     col$turkis.tpx,
                                     col$yellow.donation, 
                                     col$strongred.akzent,
                                     col$pink.heart,
                                     col$red.liver,
                                     col$darkyellow.kidney,
                                     col$green.pancreas,
                                     col$lightblue.lungs,
                                     col$beige.intestine,
                                     col$purple.alt),
        names.arg = c("blue\ndark", "blue\nalt", "turkis\ntpx", "yellow\ndonat", 
                      "strongr\nakzent", "pink\nheart", "red\nliver", "darkylw\nkidney", 
                      "green\npancr", "lightb\nlungs", "beige\nintest", "purple\nalt"),
        cex.names = 0.8
)

Color palettes

The swt_colors object also includes a single-hue palette for each color, with three additional color strengths: 75%, 50%, and 25%. The palettes can be accessed as follows:

col$pal.blue.dark[1] # 100%, same as col$blue.swt
col$pal.blue.dark[2] #  75%
col$pal.blue.dark[3] #  50%
col$pal.blue.dark[4] #  25%
Code
par(mfrow=c(12,1), mai=c(0.1,0.1,0.2,0.1))

barplot(rep(1,4), axes=FALSE, col=c(col$pal.blue.swt[1],
                                    col$pal.blue.swt[2],
                                    col$pal.blue.swt[3],
                                    col$pal.blue.swt[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.blue.alt[1],
                                    col$pal.blue.alt[2],
                                    col$pal.blue.alt[3],
                                    col$pal.blue.alt[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.turkis.tpx[1],
                                    col$pal.turkis.tpx[2],
                                    col$pal.turkis.tpx[3],
                                    col$pal.turkis.tpx[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.yellow.donation[1],
                                    col$pal.yellow.donation[2],
                                    col$pal.yellow.donation[3],
                                    col$pal.yellow.donation[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.strongred.akzent[1],
                                    col$pal.strongred.akzent[2],
                                    col$pal.strongred.akzent[3],
                                    col$pal.strongred.akzent[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.pink.heart[1],
                                    col$pal.pink.heart[2],
                                    col$pal.pink.heart[3],
                                    col$pal.pink.heart[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.red.liver[1],
                                    col$pal.red.liver[2],
                                    col$pal.red.liver[3],
                                    col$pal.red.liver[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.darkyellow.kidney[1],
                                    col$pal.darkyellow.kidney[2],
                                    col$pal.darkyellow.kidney[3],
                                    col$pal.darkyellow.kidney[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.green.pancreas[1],
                                    col$pal.green.pancreas[2],
                                    col$pal.green.pancreas[3],
                                    col$pal.green.pancreas[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.lightblue.lungs[1],
                                    col$pal.lightblue.lungs[2],
                                    col$pal.lightblue.lungs[3],
                                    col$pal.lightblue.lungs[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.beige.intestine[1],
                                    col$pal.beige.intestine[2],
                                    col$pal.beige.intestine[3],
                                    col$pal.beige.intestine[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.purple.alt[1],
                                    col$pal.purple.alt[2],
                                    col$pal.purple.alt[3],
                                    col$pal.purple.alt[4]),
        names.arg = c("100%", "75%", "50%", "25%"), 
        cex.names = 0.8
)
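
To illustrate what a single-hue palette with 100%, 75%, 50%, and 25% strengths can look like, here is a minimal sketch that mixes a base color with white. This is an assumption about how such tints may be derived, not the swt implementation. The helper tint() and the hex value are hypothetical placeholders.

```r
# Hypothetical helper: lighten a color by mixing it with white.
# strength = 1 returns the base color; strength = 0.25 keeps 25% of it.
tint = function(color, strength) {
  rgb_base = col2rgb(color)[, 1]                  # red, green, blue in 0..255
  rgb_mix  = round(strength * rgb_base + (1 - strength) * 255)
  rgb(rgb_mix[1], rgb_mix[2], rgb_mix[3], maxColorValue = 255)
}

base_color = "#004B87"  # placeholder, not an official Swisstransplant color
pal = sapply(c(1, 0.75, 0.5, 0.25), function(s) tint(base_color, s))

# Show the four strengths side by side
barplot(rep(1, 4), axes = FALSE, col = pal,
        names.arg = c("100%", "75%", "50%", "25%"))
```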

Plots with ggplot2

Below are a few examples of creating various types of data plots in the SWT color scheme. The function swt_style() adds the correct styling to the plot.

Scientific plots

Code
set.seed(1980)
n = 100
var1 = c(rnorm(n/2, mean=0), rnorm(n/2, mean=3) )
d = data.frame(var1 = var1,
               var2 = var1 + rnorm(n, sd = 0.4),
               group = as.factor(rep(c("abc", "mno" ), each=n/2))
)

p1 = ggplot(d, aes(x=group, y=var1, group=group, col=group)) + 
  geom_boxplot(fill=col$grey.bg) + 
  geom_point(size=2, shape=1, position = position_jitter(height = 0, width = 0.25),
             col=c(rep(col$pal.blue.swt[1], n/2),
                   rep(col$pal.red.liver[1], n/2))) +
  labs(title = "Title", tag = "A") +
  scale_color_manual(values = c(col$blue.swt,
                                col$red.liver)) +
  swt_style(legend_position = "none", grey_theme = TRUE, font_size = 14, title_size = 16) + 
  theme(plot.tag = element_text(size = 16, face = "bold"))

p2 = ggplot(d, aes(x=group, y=var1, group=group, col=group)) + 
  geom_boxplot() + 
  geom_point(size=2, position = position_jitter(height = 0, width = 0.25),
             col=c(rep(col$pal.turkis.tpx[3], n/2), 
                   rep(col$pal.darkyellow.kidney[3], n/2))) +
  scale_color_manual(values = c(col$turkis.tpx,
                                col$darkyellow.kidney)) +
  swt_style(legend_position = "none")

p3 = ggplot(d, aes(x=var2, fill=group)) + 
  geom_histogram(position = "identity", bins=20, alpha=0.5) +
  scale_fill_manual(values = c(col$lightblue.lungs,
                               col$beige.intestine)) +
  swt_style(grey_theme = FALSE)

p4 = ggplot(d, aes(x=var2, y=var1, group=group, col=group)) + 
  geom_point(size=2, alpha=0.5) +
  scale_color_manual(values = c(col$blue.swt,
                                col$yellow.donation)) +
  labs(title = "Title", tag = "D") +
  swt_style(legend_position = "bottom", grey_theme = TRUE)

grid.arrange(p1, p2, p3, p4, nrow = 2, ncol = 2)

Line plot

Code
# Data taken from Annual Report 2020 (p. 32)
table3.2 = t(array(c(
  13.3,17.2,18.6,18.4,17.0,
  11.5,12.6,14.9,11.7,11.2,
  1.8,4.6,3.8,6.7,5.8), dim = c(5,3)))

colnames(table3.2) = 2016:2020
rownames(table3.2) = c("Total", "DBD", "DCD")

data3.2 = melt(table3.2, varnames = c("Gruppe", "Jahr"), value.name = "Anzahl")
Code
number_height = -0.8

ggplot(data3.2, aes(x=Jahr, y=Anzahl, col=Gruppe, group=Gruppe)) +
  
  # plot line with numbers
  geom_line(data = data3.2, linewidth=1) +
  geom_text(data = data3.2, aes(label=Anzahl), vjust=number_height,
            col="black", size=4) +
  
  # some adjustments (colors, axes, etc)
  scale_color_manual(values=c(col$strongred.akzent,
                              col$yellow.donation,
                              col$blue.swt)) +
  scale_y_continuous(breaks = seq(0,22,4), limits = c(0,22)) +
  
  ylab("pmp") +
  labs(title="Title",
       subtitle = "Subtitle"
  ) +
  swt_style(grey_theme = TRUE) +
  theme(legend.position = "top")

Barplot with lines

Code
# Data taken from Annual Report 2020 (p. 31)
table3.1 = t(array(c(
  96,106,126,100,96,
  15,39,32,57,50), dim = c(5,2)))

table3.1.totals = array(colSums(table3.1), dim = c(1,5))
colnames(table3.1) = 2016:2020
colnames(table3.1.totals) = 2016:2020
rownames(table3.1) = c("DBD", "DCD")
rownames(table3.1.totals) = c("Total")

data3.1 = melt(table3.1, varnames = c("Gruppe", "Jahr"), value.name = "Anzahl")
data3.1.totals = melt(table3.1.totals, varnames = c("Gruppe", "Jahr"),
                      value.name = "Anzahl")
Code
number_height = -0.8
bar_width = 0.5

ggplot(data3.1, aes(x=Jahr, y=Anzahl, fill=Gruppe, group=Gruppe)) +
  
  # plot bars with numbers
  geom_bar(data = data3.1, stat="identity", position="dodge", width=bar_width) +
  geom_text(data = data3.1, aes(label=Anzahl), vjust=number_height,
            position = position_dodge(width=bar_width)) +
  
  # plot line with numbers
  geom_line(data = data3.1.totals, col = col$strongred.akzent, linewidth=1) +
  geom_text(data = data3.1.totals, aes(label=Anzahl), vjust=number_height,
            position = position_dodge(width=bar_width), col=col$strongred.akzent) +
  
  # some adjustments (colors, axes, etc)
  scale_fill_manual(values=c(col$yellow.donation, 
                             col$blue.swt,
                             col$strongred.akzent)) +
  scale_y_continuous(breaks = seq(0,180,20), limits = c(0,180)) +
  ylab("Personen") +
  swt_style(grey_theme = TRUE) 

Tiny little helpers

Several helper functions are implemented to support statistical computing.

Descriptive statistics and formatting

  • count_perc() for the count and percent
  • mean_sd() for the mean and standard deviation
  • median_irq() for the median and interquartile range
  • miss_perc() for the count and percent for missing data (NA)
  • tidy_pvalues() for formatting p-values
Code
set.seed(1980)
data = data.frame(age = rnorm(n = 200, mean = 50, sd = 10))
data$hypertension = rbinom(n = 200, size = 1, prob = 0.20)

tab = data.frame(all = c(mean_sd(data$age),
                         count_perc(data$hypertension))
)

colnames(tab) = "Descriptives"
rownames(tab) = c("Age in years, mean (SD)", "Hypertension, count (%)")
tab
Descriptives
Age in years, mean (SD) 50.2 (10.7)
Hypertension, count (%) 57 (28.5)
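
For readers without the swt package at hand, the kind of formatted strings shown above can be approximated in base R. The two functions below are hypothetical stand-ins written for this illustration. They are not the swt implementations and may differ in rounding and in how missing values are handled.

```r
# Hypothetical stand-in for a "mean (SD)" formatter
fmt_mean_sd = function(x, digits = 1) {
  m = formatC(mean(x, na.rm = TRUE), format = "f", digits = digits)
  s = formatC(sd(x, na.rm = TRUE), format = "f", digits = digits)
  paste0(m, " (", s, ")")
}

# Hypothetical stand-in for a "count (%)" formatter of a 0/1 variable;
# the percentage is taken over the non-missing values
fmt_count_perc = function(x, digits = 1) {
  n = sum(x == 1, na.rm = TRUE)
  p = formatC(100 * n / sum(!is.na(x)), format = "f", digits = digits)
  paste0(n, " (", p, ")")
}

fmt_mean_sd(c(1, 2, 3))
fmt_count_perc(c(1, 0, 0, 1))
```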

Missing data

The function tidy_missing() summarizes the number and percentage of missing values for each variable in a data frame.

Code
data$age[1:5] = NA
tidy_missing(data)
Missing
age 5 (2.5%)
hypertension 0 (0.0%)
TOTAL 5 (2.5)
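
A comparable per-variable summary can be sketched in base R with is.na() and colSums(). This only illustrates the idea and is not how tidy_missing() is implemented. The example data frame is made up.

```r
# Made-up data with two missing ages
miss_data = data.frame(age = c(NA, 51, 47, NA),
                       hypertension = c(0, 1, 0, 0))

n_miss = colSums(is.na(miss_data))        # missing values per variable
perc   = 100 * n_miss / nrow(miss_data)   # as a percentage of all rows

miss_tab = data.frame(Missing = sprintf("%d (%.1f%%)", n_miss, perc),
                      row.names = names(miss_data))
miss_tab
```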

Handling date and time

The swt package provides two functions for advanced handling of date and time.

  • num2date() converts Excel days since origin to the POSIXct data type (date/time).
  • date2num() converts the POSIXct data type (date/time) to Excel days since origin.

Using num2date() for all date variables also handles the time zone consistently, using CET (and CEST during summer). Note that mixing time zones, for example UTC and CET, shifts times by 1 hour (2 hours during CEST).
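
The offset caused by mixed time zones can be demonstrated in base R. In this small sketch the same wall-clock time is interpreted once as UTC and once as Swiss local time. The example timestamps are arbitrary.

```r
# Same wall-clock time, read in two different time zones (winter, CET)
winter = "2021-01-15 12:00:00"
offset_winter = as.numeric(difftime(as.POSIXct(winter, tz = "UTC"),
                                    as.POSIXct(winter, tz = "Europe/Zurich"),
                                    units = "hours"))

# The same comparison in summer (CEST)
summer = "2021-07-15 12:00:00"
offset_summer = as.numeric(difftime(as.POSIXct(summer, tz = "UTC"),
                                    as.POSIXct(summer, tz = "Europe/Zurich"),
                                    units = "hours"))

c(winter = offset_winter, summer = offset_summer)
```

The offsets are 1 hour in winter and 2 hours in summer, which is why all date variables should be converted with the same time zone.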

If date and time data are provided in a consistent way, such as from SOAS, these functions are not necessary, and date and time can be handled in the usual way. However, date and time are sometimes entered inconsistently. If we force the data type to be a date using the option col_types, inconsistent values are discarded and show up as NA, which is not ideal. The import also produces a warning, as shown below.

Code
data = as.data.frame(read_xlsx(path = "../data/dates.xlsx", col_types = "date"))
Warning: Expecting date in A4 / R4C1: got '11.03.1980 11:11:11'
Code
data
mydate
2021-08-22 16:43:01
2023-02-17 22:12:02
NA

An alternative is to import them as data type text. The dates show up as numbers and, if not recognized, as characters. The numbers are the number of days since 1899-12-30; this is an Excel convention.

Code
data = as.data.frame(read_xlsx(path = "../data/dates.xlsx", col_types = "text"))
data
mydate
44430.696539351855
44974.925023148149
11.03.1980 11:11:11
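
The Excel convention (days since 1899-12-30, with the time of day as the fractional part) can be reproduced with a few lines of base R. The sketch below is a simplified illustration and not the code of num2date() or date2num(). The helper names are hypothetical, and the clock time is interpreted as UTC to keep the example independent of the local time zone.

```r
excel_origin = as.POSIXct("1899-12-30", tz = "UTC")

# Hypothetical helper: Excel serial number (numeric or text) to POSIXct.
# Seconds are rounded to avoid floating-point results such as 16:43:00.999.
serial_to_datetime = function(x) {
  excel_origin + round(as.numeric(x) * 86400)
}

# Hypothetical helper: POSIXct back to an Excel serial number
datetime_to_serial = function(t) {
  as.numeric(difftime(t, excel_origin, units = "days"))
}

t1 = serial_to_datetime("44430.696539351855")
format(t1, "%Y-%m-%d %H:%M:%S")
datetime_to_serial(t1)
```

The first value of the example file converts to 2021-08-22 16:43:01, and converting it back returns the original serial number up to rounding.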

We can use the function num2date() to convert the numbers into dates. It also recognizes entries that were stored as text in an alternative date format; the expected format can be changed via the format argument (see ?num2date).

Code
data$mydate = num2date(data$mydate, format = "%d.%m.%Y %H:%M:%OS", round = FALSE)
data
mydate
2021-08-22 16:43:01
2023-02-17 22:12:02
1980-03-11 11:11:10

One disadvantage of handling dates as numbers arises during data cleaning. For example, it may be necessary to correct dates manually or to fill in missing values, and this has to be done in numbers before the conversion into the POSIXct data type. The function date2num() can be used for this purpose.

Code
data = as.data.frame(read_xlsx(path = "../data/dates.xlsx", col_types = "text"))
data$mydate[3] = date2num("1980-03-11 02:10:00")
data$mydate = num2date(data$mydate)
data
mydate
2021-08-22 16:43:01
2023-02-17 22:12:02
1980-03-11 02:10:00

Clinical calculators

The swt package implements several clinical calculators, for example, from published clinical prediction models.

  • eGFR according to CKD-EPI 2021
  • eGFR for pediatric patients according to Schwartz
  • OPTN KDRI (2024 version)
  • UK KDRI (2019 version)
  • UK DCD Risk Score

Quick examples are shown below.

CKD-EPI eGFR 2021

Code
egfr = egfr_ckd_epi(SCr = c(40, 110), 
                    age = c(40, 50), 
                    sex = c("M", "F"),
                    units = "SI")
data.frame(eGFR = egfr)
eGFR
136
53
Code
assert(egfr == c(136, 53))
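
For transparency, the published CKD-EPI 2021 creatinine equation can be written out in a few lines of base R. This is a sketch of the published formula, not the source code of egfr_ckd_epi(). The function name ckd_epi_2021() is hypothetical, and the conversion factor 88.4 from µmol/L to mg/dL is the conventional one.

```r
# Sketch of the CKD-EPI 2021 creatinine equation (race-free version)
ckd_epi_2021 = function(SCr_umol, age, sex) {
  SCr    = SCr_umol / 88.4                   # micromol/L to mg/dL
  female = sex == "F"
  kappa  = ifelse(female, 0.7, 0.9)
  alpha  = ifelse(female, -0.241, -0.302)
  142 * pmin(SCr / kappa, 1)^alpha *
    pmax(SCr / kappa, 1)^(-1.200) *
    0.9938^age *
    ifelse(female, 1.012, 1)
}

egfr_check = round(ckd_epi_2021(SCr_umol = c(40, 110),
                                age = c(40, 50),
                                sex = c("M", "F")))
egfr_check
```

With the same inputs as above, the rounded results are 136 and 53.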

Revised Schwartz Equation 2009

Code
egfr = egfr_schwartz(SCr = c(100, 110), 
                     height = c(100, 120),
                     units = "SI")
data.frame(eGFR = egfr)
eGFR
37
40
Code
assert(egfr == c(37, 40))
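
The revised (bedside) Schwartz equation is simple enough to verify by hand: eGFR = 0.413 × height in cm / serum creatinine in mg/dL. The sketch below is an illustration of that formula, not the code of egfr_schwartz(). The function name schwartz_2009() is hypothetical.

```r
# Sketch of the bedside Schwartz equation (2009)
schwartz_2009 = function(SCr_umol, height_cm) {
  SCr = SCr_umol / 88.4      # micromol/L to mg/dL
  0.413 * height_cm / SCr
}

egfr_ped = round(schwartz_2009(SCr_umol = c(100, 110), height_cm = c(100, 120)))
egfr_ped
```

With the same inputs as above, the rounded results are 37 and 40.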

OPTN KDRI

Code
kdri = optn_kdri(D_age = c(45, 15, 70),
                 D_height = c(183, 183, 183),
                 D_weight = c(75, 90, 100), 
                 D_hypertension = c(TRUE, FALSE, TRUE),
                 D_diabetes = c(TRUE, FALSE, TRUE),
                 D_CVA = c(TRUE, FALSE, TRUE),
                 D_SCr = c(1, 1.8, 1.8),
                 D_DCD = c(TRUE, FALSE, TRUE))

kdri = round(kdri, digits = 2)
data.frame(KDRI = kdri)
KDRI
1.36
0.56
2.10
Code
assert(kdri == c(1.36, 0.56, 2.10))

UK KDRI

Code
kdri = uk_kdri(D_age = 45,
               D_height = 180,
               D_hypertension = TRUE,
               D_female = TRUE,
               D_CMV = TRUE,
               D_eGFR = 100,
               D_days_hosp = 5)

kdri = round(kdri, digits = 2)
data.frame(KDRI = kdri)
KDRI
0.94
Code
assert(kdri == 0.94)

UK DCD Risk Score

Code
score = uk_dcd_score(D_age = c(61, 60),
                     D_BMI = c(26, 25),
                     fWIT = c(31, 20),
                     CIT = c(7, 6),
                     R_age = c(61, 60),
                     R_MELD = c(26, 25),
                     retpx = c(TRUE, FALSE))

data.frame(Score = score)
Score
27
0
Code
assert(score == c(27, 0))

Statistical models

The tidy_rmsfit() function converts regression model results from the package rms into a tidy, publication-ready table. This works for

  • ols() linear regression model (ordinary least squares)
  • lrm() logistic regression model
  • cph() Cox proportional hazards model
Code
set.seed(2025)

d = data.frame(x1 = runif(200),
               x2 = runif(200),
               x3 = as.factor(rbinom(200, 2, 0.5))
)

dd = datadist(d)
options(datadist="dd")  

d$y <- d$x1 + d$x2 + rnorm(200)

fit = rms::ols(y ~ x1 + x2 + x3, data = d)

Regression model results are typically displayed in an untidy format by summary() and anova(), as shown below:

Code
summary(fit)
             Effects              Response : y 

 Factor   Low     High    Diff.   Effect    S.E.    Lower 0.95 Upper 0.95
 x1       0.23003 0.78329 0.55326  0.670250 0.13141  0.41108   0.92942   
 x2       0.25219 0.77842 0.52623  0.681250 0.12475  0.43521   0.92728   
 x3 - 0:1 2.00000 1.00000      NA -0.196800 0.17276 -0.53752   0.14392   
 x3 - 2:1 2.00000 3.00000      NA  0.081503 0.17186 -0.25744   0.42045   
Code
anova(fit)
                Analysis of Variance          Response: y 

 Factor     d.f. Partial SS MS         F     P     
 x1           1   25.740338 25.7403376 26.01 <.0001
 x2           1   29.506389 29.5063894 29.82 <.0001
 x3           2    2.097779  1.0488895  1.06 0.3484
 REGRESSION   4   57.355444 14.3388609 14.49 <.0001
 ERROR      195  192.949861  0.9894865             

The function tidy_rmsfit() combines the effect estimates and the test statistics into one single publication-ready table:

Code
tidy_rmsfit(fit, x3 = 0)
       Interquartile difference   Effect estimate (95%-CI)    F-value  d.f.  p-value
x1     0.55 (from 0.23 to 0.78)   0.67 (from 0.41 to 0.93)    26       1     < 0.001 ***
x2     0.53 (from 0.25 to 0.78)   0.68 (from 0.44 to 0.93)    29.8     1     < 0.001 ***
x3                                                            1.06     2     0.35
x3 1                              0.20 (from -0.14 to 0.54)
x3 2                              0.28 (from -0.11 to 0.67)
TOTAL                                                         14.5     4     < 0.001 ***
ERROR                                                                  195
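
The idea of combining effect estimates and test statistics in one table can also be sketched with base R alone. The example below uses lm(), confint(), and drop1() on simulated data. It illustrates the concept only and does not reproduce tidy_rmsfit(): the estimates are per unit change, not interquartile effects, and all object names are made up for this example.

```r
set.seed(2025)
sim = data.frame(x1 = runif(200), x2 = runif(200))
sim$y = sim$x1 + sim$x2 + rnorm(200)

fit_lm = lm(y ~ x1 + x2, data = sim)

# Effect estimates with 95% confidence intervals (intercept dropped)
est = cbind(estimate = coef(fit_lm), confint(fit_lm))[-1, , drop = FALSE]

# Partial F tests for each term (the "<none>" row is dropped)
tests = drop1(fit_lm, test = "F")[-1, ]

tab = data.frame(
  estimate = sprintf("%.2f (from %.2f to %.2f)", est[, 1], est[, 2], est[, 3]),
  F.value  = round(tests[["F value"]], 1),
  d.f.     = tests[["Df"]],
  p.value  = format.pval(tests[["Pr(>F)"]], digits = 2, eps = 0.001),
  row.names = rownames(est)
)
tab
```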

Computing information

Code
sessionInfo()
R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_Switzerland.utf8  LC_CTYPE=English_Switzerland.utf8   
[3] LC_MONETARY=English_Switzerland.utf8 LC_NUMERIC=C                        
[5] LC_TIME=English_Switzerland.utf8    

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rms_8.0-0      Hmisc_5.2-3    testit_0.13    readxl_1.4.5   swt_0.3       
[6] reshape2_1.4.4 gridExtra_2.3  ggplot2_3.5.2 

loaded via a namespace (and not attached):
 [1] gtable_0.3.6       xfun_0.52          htmlwidgets_1.6.4  lattice_0.22-7    
 [5] vctrs_0.6.5        tools_4.5.1        generics_0.1.4     sandwich_3.1-1    
 [9] tibble_3.3.0       cluster_2.1.8.1    pkgconfig_2.0.3    Matrix_1.7-3      
[13] data.table_1.17.6  checkmate_2.3.2    RColorBrewer_1.1-3 lifecycle_1.0.4   
[17] compiler_4.5.1     farver_2.1.2       stringr_1.5.1      MatrixModels_0.5-4
[21] codetools_0.2-20   SparseM_1.84-2     quantreg_6.1       htmltools_0.5.8.1 
[25] yaml_2.3.10        htmlTable_2.4.3    Formula_1.2-5      pillar_1.11.0     
[29] MASS_7.3-65        rpart_4.1.24       multcomp_1.4-28    nlme_3.1-168      
[33] tidyselect_1.2.1   digest_0.6.37      mvtnorm_1.3-3      polspline_1.1.25  
[37] stringi_1.8.7      dplyr_1.1.4        labeling_0.4.3     splines_4.5.1     
[41] fastmap_1.2.0      grid_4.5.1         colorspace_2.1-1   cli_3.6.5         
[45] magrittr_2.0.3     base64enc_0.1-3    survival_3.8-3     TH.data_1.1-3     
[49] foreign_0.8-90     withr_3.0.2        scales_1.4.0       backports_1.5.0   
[53] segmented_2.1-4    lubridate_1.9.4    timechange_0.3.0   rmarkdown_2.29    
[57] nnet_7.3-20        cellranger_1.1.0   zoo_1.8-14         evaluate_1.0.4    
[61] knitr_1.50         rlang_1.1.6        Rcpp_1.1.0         glue_1.8.0        
[65] rstudioapi_0.17.1  jsonlite_2.0.0     R6_2.6.1           plyr_1.8.9