Swisstransplant R cookbook

Reproducible, high-quality statistical reporting.

Author

Simon Schwab

Published

July 7, 2025

Abstract

The Swisstransplant R cookbook is a practical guide for generating reproducible, high-quality statistical reports using our custom R package swt.

Introduction

This cookbook introduces the Swisstransplant R package swt to make high-quality reports and publication-ready graphics in our in-house style.

The source code for swt is maintained on GitHub and can be accessed here. Additionally, a package manual is available for reference.

Code

library(ggplot2)
library(gridExtra)
library(reshape2)
library(swt)
library(readxl)
library(testit)
library(rms)

Quarto document

The swt package includes the Swisstransplant template for creating Quarto documents. Quarto documents enable dynamic reporting and unite written text, statistical programming code, results, tables, and figures into a reproducible document.

To create a new Swisstransplant project in RStudio, click New Project…, select New Directory, and then choose Swisstransplant Document from the bottom of the list.

Writing in Markdown

The Swisstransplant Quarto document is written in Markdown; an overview can be found in the Markdown Basics. It is possible to add Callout Blocks to highlight important information.

::: {.callout-note appearance="simple"}
Note that there are five types of callouts, including:
`note`, `warning`, `important`, `tip`, and `caution`.
:::

Note that there are five types of callouts: note, warning, important, tip, and caution.

Something important.

Data analysis workflow

The Quarto document introduces a simple data analysis workflow structured into eight chapters, each using second-level headings (##). The chapters are as follows:

Objectives
Data import
Data processing
Quality control
Descriptive statistics
Primary analysis
Secondary analysis
Computing information

Each chapter can be further expanded to support more complex analysis projects, as illustrated in the flowchart below.

Quarto documents that follow a structured analysis workflow enhance transparency, support reproducibility, and build trust in statistical analyses.

flowchart TB
O(Objectives) --> I(Data import)
I --> P(Data processing)
P --> Q(Quality control)
Q --> D(Descriptive statistics)
D --> A(Primary analysis)
A --> S(Secondary analyses)
S --> C(Computing information)
O --- OC[Study objectives<br>Primary outcome<br>Secondary outcomes<br>Methods]
I --- IC[Custom R functions<br>Load libraries<br>Two-pass reading when necessary<br>Data class conversion<br>Data cleaning<br>Reshape<br>Merge<br>Aggregate]
P --- PC[Data overview<br>Definition of outcomes<br>Consent status<br>Inclusion, exclusion and subsets<br>Missing data and imputations<br>Transform<br>Matching<br>Create analysis data set]
Q --- QC[Data checks<br>Assert logical expressions<br>Variable distributions]
D --- DC[Sample size<br>Table 1<br>Consort Diagram]
A --- AC[Results<br>Tables<br>Figures]
S --- SC[Results<br>Tables<br>Figures]
C --- CC[Runtime<br>Environment<br>Package names and versions]

Corporate design

To support professional-looking output, the swt package provides a custom color palette and styling for creating ggplot2 figures.

The color scheme and ggplot2 styling are used in the official Swisstransplant national statistics to provide a professional and consistent visual presentation.

Colors

The swt_colors() function creates an object for user-friendly access to the Swisstransplant color scheme:

col = swt_colors()
col$blue.dark

The following colors are available:

Code

col = swt_colors()
par(mfrow=c(1,1), mai=c(0.5,0.1,0.2,0.1))
barplot(rep(1,12), axes=FALSE, col=c(col$blue.dark,
                                     col$blue.alt,
                                     col$turkis.tpx,
                                     col$yellow.donation, 
                                     col$strongred.akzent,
                                     col$pink.heart,
                                     col$red.liver,
                                     col$darkyellow.kidney,
                                     col$green.pancreas,
                                     col$lightblue.lungs,
                                     col$beige.intestine,
                                     col$purple.alt
                                     
),
names.arg = c("blue\ndark", "blue\nalt", "turkis\ntpx", "yellow\ndonat", 
              "strongr\nakzent", "pink\nheart", "red\nliver", "darkylw\nkidney", 
              "green\npancr", "lightb\nlungs", "beige\nintest", "purple\nalt"),
cex.names = 0.8

)

Color palettes

The swt_colors object also includes single hue palettes with three additional color strengths: 75%, 50%, and 25%. Color palettes can be accessed as follows:

col$pal.blue.dark[1] # 100%, same as col$blue.swt
col$pal.blue.dark[2] #  75%
col$pal.blue.dark[3] #  50%
col$pal.blue.dark[4] #  25%

Code

par(mfrow=c(12,1), mai=c(0.1,0.1,0.2,0.1))

barplot(rep(1,4), axes=FALSE, col=c(col$pal.blue.swt[1],
                                    col$pal.blue.swt[2],
                                    col$pal.blue.swt[3],
                                    col$pal.blue.swt[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.blue.alt[1],
                                    col$pal.blue.alt[2],
                                    col$pal.blue.alt[3],
                                    col$pal.blue.alt[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.turkis.tpx[1],
                                    col$pal.turkis.tpx[2],
                                    col$pal.turkis.tpx[3],
                                    col$pal.turkis.tpx[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.yellow.donation[1],
                                    col$pal.yellow.donation[2],
                                    col$pal.yellow.donation[3],
                                    col$pal.yellow.donation[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.strongred.akzent[1],
                                    col$pal.strongred.akzent[2],
                                    col$pal.strongred.akzent[3],
                                    col$pal.strongred.akzent[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.pink.heart[1],
                                    col$pal.pink.heart[2],
                                    col$pal.pink.heart[3],
                                    col$pal.pink.heart[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.red.liver[1],
                                    col$pal.red.liver[2],
                                    col$pal.red.liver[3],
                                    col$pal.red.liver[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.darkyellow.kidney[1],
                                    col$pal.darkyellow.kidney[2],
                                    col$pal.darkyellow.kidney[3],
                                    col$pal.darkyellow.kidney[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.green.pancreas[1],
                                    col$pal.green.pancreas[2],
                                    col$pal.green.pancreas[3],
                                    col$pal.green.pancreas[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.lightblue.lungs[1],
                                    col$pal.lightblue.lungs[2],
                                    col$pal.lightblue.lungs[3],
                                    col$pal.lightblue.lungs[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.beige.intestine[1],
                                    col$pal.beige.intestine[2],
                                    col$pal.beige.intestine[3],
                                    col$pal.beige.intestine[4])
)

barplot(rep(1,4), axes=FALSE, col=c(col$pal.purple.alt[1],
                                    col$pal.purple.alt[2],
                                    col$pal.purple.alt[3],
                                    col$pal.purple.alt[4]),
        names.arg = c("100%", "75%", "50%", "25%"), 
        cex.names = 0.8
)

Plots with `ggplot2`

Below are a few examples of creating various types of data plots in the SWT color scheme. The function swt_style() adds the correct styling to the plot.

Scientific plots

Code

set.seed(1980)
n = 100
var1 = c(rnorm(n/2, mean=0), rnorm(n/2, mean=3) )
d = data.frame(var1 = var1,
               var2 = var1 + rnorm(n, sd = 0.4),
               group = as.factor(rep(c("abc", "mno" ), each=n/2))
)

p1 = ggplot(d, aes(x=group, y=var1, group=group, col=group)) + 
  geom_boxplot(fill=col$grey.bg) + 
  geom_point(size=2, shape=1, position = position_jitter(height = 0, width = 0.25),
             col=c(rep(col$pal.blue.swt[1], n/2),
                   rep(col$pal.red.liver[1], n/2))) +
  labs(title = "Title", tag = "A") +
  scale_color_manual(values = c(col$blue.swt,
                                col$red.liver)) +
  swt_style(legend_position = "none", grey_theme = TRUE, font_size = 14, title_size = 16) + 
  theme(plot.tag = element_text(size = 16, face = "bold"))

p2 = ggplot(d, aes(x=group, y=var1, group=group, col=group)) + 
  geom_boxplot() + 
  geom_point(size=2, position = position_jitter(height = 0, width = 0.25),
             col=c(rep(col$pal.turkis.tpx[3], n/2), 
                   rep(col$pal.darkyellow.kidney[3], n/2))) +
  scale_color_manual(values = c(col$turkis.tpx,
                                col$darkyellow.kidney)) +
  swt_style(legend_position = "none")

p3 = ggplot(d, aes(x=var2, fill=group)) + 
  geom_histogram(position = "identity", bins=20, alpha=0.5) +
  scale_fill_manual(values = c(col$lightblue.lungs,
                               col$beige.intestine)) +
  swt_style(grey_theme = FALSE)

p4 = ggplot(d, aes(x=var2, y=var1, group=group, col=group)) + 
  geom_point(size=2, alpha=0.5) +
  scale_color_manual(values = c(col$blue.swt,
                                col$yellow.donation)) +
  labs(title = "Title", tag = "D") +
  swt_style(legend_position = "bottom", grey_theme = TRUE)

grid.arrange(p1, p2, p3, p4, nrow = 2, ncol = 2)

Line plot

Code

# Data taken from Annual Report 2020 (p. 32)
table3.2 = t(array(c(
  13.3,17.2,18.6,18.4,17.0,
  11.5,12.6,14.9,11.7,11.2,
  1.8,4.6,3.8,6.7,5.8), dim = c(5,3)))

colnames(table3.2) = 2016:2020
rownames(table3.2) = c("Total", "DBD", "DCD")

data3.2 = melt(table3.2, varnames = c("Gruppe", "Jahr"), value.name = "Anzahl")

Code

number_height = -0.8

ggplot(data3.2, aes(x=Jahr, y=Anzahl, col=Gruppe, group=Gruppe)) +
  
  # plot line with numbers
  geom_line(data = data3.2, linewidth=1) +
  geom_text(data = data3.2, aes(label=Anzahl), vjust=number_height,
            col="black", size=4) +
  
  # some adjustments (colors, axes, etc)
  scale_color_manual(values=c(col$strongred.akzent,
                              col$yellow.donation,
                              col$blue.swt)) +
  scale_y_continuous(breaks = seq(0,22,4), limits = c(0,22)) +
  
  ylab("pmp") +
  labs(title="Title",
       subtitle = "Subtitle"
  ) +
  swt_style(grey_theme = TRUE) +
  theme(legend.position = "top")

Barplot with lines

Code

# Data taken from Annual Report 2020 (p. 31)
table3.1 = t(array(c(
  96,106,126,100,96,
  15,39,32,57,50), dim = c(5,2)))

table3.1.totals = array(colSums(table3.1), dim = c(1,5))
colnames(table3.1) = 2016:2020
colnames(table3.1.totals) = 2016:2020
rownames(table3.1) = c("DBD", "DCD")
rownames(table3.1.totals) = c("Total")

data3.1 = melt(table3.1, varnames = c("Gruppe", "Jahr"), value.name = "Anzahl")
data3.1.totals = melt(table3.1.totals, varnames = c("Gruppe", "Jahr"),
                      value.name = "Anzahl")

Code

number_height = -0.8
bar_with = 0.5

ggplot(data3.1, aes(x=Jahr, y=Anzahl, fill=Gruppe, group=Gruppe)) +
  
  # plot bars with numbers
  geom_bar(data = data3.1, stat="identity", position="dodge", width=bar_with) +
  geom_text(data = data3.1, aes(label=Anzahl), vjust=number_height,
            position = position_dodge(width=bar_with)) +
  
  # plot line with numbers
  geom_line(data = data3.1.totals, col = col$strongred.akzent, linewidth=1) +
  geom_text(data = data3.1.totals, aes(label=Anzahl), vjust=number_height,
            position = position_dodge(width=bar_with), col=col$strongred.akzent) +
  
  # some adjustments (colors, axes, etc)
  scale_fill_manual(values=c(col$yellow.donation, 
                             col$blue.swt,
                             col$strongred.akzent)) +
  scale_y_continuous(breaks = seq(0,180,20), limits = c(0,180)) +
  ylab("Personen") +
  swt_style(grey_theme = TRUE)

Tiny little helpers

Several helper functions are implemented to support statistical computing.

Descriptive statistics and formatting

count_perc() for the count and percent
mean_sd() for the mean and standard deviation
median_irq() for the median and interquartile range
miss_perc() for the count and percent for missing data (NA)
tidy_pvalues() for formatting p-values

Code

set.seed(1980)
data = data.frame(age = rnorm(n = 200, mean = 50, sd = 10))
data$hypertension = rbinom(n = 200, size = 1, prob = 0.20)

tab = data.frame(all = c(mean_sd(data$age),
                         count_perc(data$hypertension))
)

colnames(tab) = "Descriptives"
rownames(tab) = c("Age in years, mean (SD)", "Hypertension, count (%)")
tab

	Descriptives
Age in years, mean (SD)	50.2 (10.7)
Hypertension, count (%)	57 (28.5)

Missing data

The function tidy_missing() displays missing data of all the variables in a data frame.

Code

data$age[1:5] = NA
tidy_missing(data)

	Missing
age	5 (2.5%)
hypertension	0 (0.0%)
TOTAL	5 (2.5)

Handling date and time

The swt package provides two functions for advanced handling of date and time.

num2date()converts Excel days since origin to POSIXct data type (date/time).
date2num() converts POSIXct data type (date/time) to Excel days since origin.

Using num2date() for all date variables will also handle the time zone consistently using CET time (and CEST during summer). Note that having mixed time zones, UTC and CET, can cause offsets of 1 hour.

If date and time data are provided in a consistent way, such as from SOAS, these functions are not necessary, and date and time can be handled in the usual way. However, date and time are sometimes entered inconsistently. If we force the data type to be a date using the option col_types, inconsistent values are discarded and show up as NA, which is not ideal. We also get a warning, see below.

Code

data = as.data.frame(read_xlsx(path = "../data/dates.xlsx", col_types = "date"))

Warning: Expecting date in A4 / R4C1: got '11.03.1980 11:11:11'

Code

data

mydate
2021-08-22 16:43:01
2023-02-17 22:12:02
NA

An alternative is to import them as data type text. The dates show up as numbers and, if not recognized, as characters. The numbers are the number of days since 1899-12-30; this is an Excel convention.

Code

data = as.data.frame(read_xlsx(path = "../data/dates.xlsx", col_types = "text"))
data

mydate
44430.696539351855
44974.925023148149
11.03.1980 11:11:11

We can use the function num2date() to convert numbers into dates. There is also a built-in filter that recognizes the alternative format, which can also be modified via the option; see ?num2date.

Code

data$mydate = num2date(data$mydate, format = "%d.%m.%Y %H:%M:%OS", round = FALSE)
data

mydate
2021-08-22 16:43:01
2023-02-17 22:12:02
1980-03-11 11:11:10

One disadvantage of handling dates as numbers is when performing data cleaning. For example, manually correcting dates or adding missing data may be necessary. This has to be done in numbers before the conversion into the POSIXct data type. However, the function date2num()can be used for this purpose.

Code

data = as.data.frame(read_xlsx(path = "../data/dates.xlsx", col_types = "text"))
data$mydate[3] = date2num("1980-03-11 02:10:00")
data$mydate = num2date(data$mydate)
data

mydate
2021-08-22 16:43:01
2023-02-17 22:12:02
1980-03-11 02:10:00

Clinical calculators

The swt package implemented a number of clinical calculators, for example, from published clinical prediction models.

eGFR according to CKD-EPI 2021
eGFR for pediatric patients according to Schwartz
OPTN KDRI (2024 version)
UK KDRI (2019 version)
UK DCD Risk Score

Quick examples are shown below.

CKD-EPI eGFR 2021

Code

egfr = egfr_ckd_epi(SCr = c(40, 110), 
                    age = c(40, 50), 
                    sex = c("M", "F"),
                    units = "SI")
data.frame(eGFR = egfr)

eGFR
136
53

Code

assert(egfr == c(136, 53))

Revised Schwartz Equation 2009

Code

egfr = egfr_schwartz(SCr = c(100, 110), 
                     height = c(100, 120),
                     units = "SI")
data.frame(eGFR = egfr)

eGFR
37
40

Code

assert(egfr == c(37, 40))

OPTN KDRI

Code

kdri = optn_kdri(D_age = c(45, 15, 70),
                 D_height = c(183, 183, 183),
                 D_weight = c(75, 90, 100), 
                 D_hypertension = c(TRUE, FALSE, TRUE),
                 D_diabetes = c(TRUE, FALSE, TRUE),
                 D_CVA = c(TRUE, FALSE, TRUE),
                 D_SCr = c(1, 1.8, 1.8),
                 D_DCD = c(TRUE, FALSE, TRUE))

kdri = round(kdri, digits = 2)
data.frame(KDRI = kdri)

KDRI
1.36
0.56
2.10

Code

assert(kdri == c(1.36, 0.56, 2.10))

UK KDRI

Code

kdri = uk_kdri(D_age = 45,
               D_height = 180,
               D_hypertension = TRUE,
               D_female = TRUE,
               D_CMV = TRUE,
               D_eGFR = 100,
               D_days_hosp = 5)

kdri = round(kdri, digits = 2)
data.frame(KDRI = kdri)

KDRI
0.94

Code

assert(kdri == 0.94)

UK DCD Risk Score

Code

score = uk_dcd_score(D_age = c(61, 60),
                     D_BMI = c(26, 25),
                     fWIT = c(31, 20),
                     CIT = c(7, 6),
                     R_age = c(61, 60),
                     R_MELD = c(26, 25),
                     retpx = c(TRUE, FALSE))

data.frame(Score = score)

Score
27
0

Code

assert(score == c(27, 0))

Statistical models

The tidy_rmsfit() function converts regression model results from the package rms into a tidy, publication-ready table. This works for

ols() linear regression model (ordinary least squares)
lmr() logistic regression model
cph() Cox proportional hazards model

Code

set.seed(2025)

d = data.frame(x1 = runif(200),
               x2 = runif(200),
               x3 = as.factor(rbinom(200, 2, 0.5))
)

dd = datadist(d)
options(datadist="dd")  

d$y <- d$x1 + d$x2 + rnorm(200)

fit = rms::ols(y ~ x1 + x2 + x3, data = d)

Regression model results are typically displayed in an untidy format by summary() and anova(), as shown below:

Code

summary(fit)

             Effects              Response : y 

 Factor   Low     High    Diff.   Effect    S.E.    Lower 0.95 Upper 0.95
 x1       0.23003 0.78329 0.55326  0.670250 0.13141  0.41108   0.92942   
 x2       0.25219 0.77842 0.52623  0.681250 0.12475  0.43521   0.92728   
 x3 - 0:1 2.00000 1.00000      NA -0.196800 0.17276 -0.53752   0.14392   
 x3 - 2:1 2.00000 3.00000      NA  0.081503 0.17186 -0.25744   0.42045

Code

anova(fit)

                Analysis of Variance          Response: y 

 Factor     d.f. Partial SS MS         F     P     
 x1           1   25.740338 25.7403376 26.01 <.0001
 x2           1   29.506389 29.5063894 29.82 <.0001
 x3           2    2.097779  1.0488895  1.06 0.3484
 REGRESSION   4   57.355444 14.3388609 14.49 <.0001
 ERROR      195  192.949861  0.9894865

The function tidy_rmsfit() combines the effect estimates and the test statistics into one single publication-ready table:

Code

tidy_rmsfit(fit, x3 = 0)

	Interquartile difference	Effect estimate (95%-CI)	F-value	d.f.	p-value
x1	0.55 (from 0.23 to 0.78)	0.67 (from 0.41 to 0.93)	26	1	< 0.001 ***
x2	0.53 (from 0.25 to 0.78)	0.68 (from 0.44 to 0.93)	29.8	1	< 0.001 ***
x3	–	–	1.06	2	0.35
x3 1	–	0.20 (from -0.14 to 0.54)	–	–	–
x3 2	–	0.28 (from -0.11 to 0.67)	–	–	–
TOTAL	–	–	14.5	4	< 0.001 ***
ERROR	–	–	–	195	–

Computing information

Code

sessionInfo()

R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_Switzerland.utf8  LC_CTYPE=English_Switzerland.utf8   
[3] LC_MONETARY=English_Switzerland.utf8 LC_NUMERIC=C                        
[5] LC_TIME=English_Switzerland.utf8    

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rms_8.0-0      Hmisc_5.2-3    testit_0.13    readxl_1.4.5   swt_0.3       
[6] reshape2_1.4.4 gridExtra_2.3  ggplot2_3.5.2 

loaded via a namespace (and not attached):
 [1] gtable_0.3.6       xfun_0.52          htmlwidgets_1.6.4  lattice_0.22-7    
 [5] vctrs_0.6.5        tools_4.5.1        generics_0.1.4     sandwich_3.1-1    
 [9] tibble_3.3.0       cluster_2.1.8.1    pkgconfig_2.0.3    Matrix_1.7-3      
[13] data.table_1.17.6  checkmate_2.3.2    RColorBrewer_1.1-3 lifecycle_1.0.4   
[17] compiler_4.5.1     farver_2.1.2       stringr_1.5.1      MatrixModels_0.5-4
[21] codetools_0.2-20   SparseM_1.84-2     quantreg_6.1       htmltools_0.5.8.1 
[25] yaml_2.3.10        htmlTable_2.4.3    Formula_1.2-5      pillar_1.11.0     
[29] MASS_7.3-65        rpart_4.1.24       multcomp_1.4-28    nlme_3.1-168      
[33] tidyselect_1.2.1   digest_0.6.37      mvtnorm_1.3-3      polspline_1.1.25  
[37] stringi_1.8.7      dplyr_1.1.4        labeling_0.4.3     splines_4.5.1     
[41] fastmap_1.2.0      grid_4.5.1         colorspace_2.1-1   cli_3.6.5         
[45] magrittr_2.0.3     base64enc_0.1-3    survival_3.8-3     TH.data_1.1-3     
[49] foreign_0.8-90     withr_3.0.2        scales_1.4.0       backports_1.5.0   
[53] segmented_2.1-4    lubridate_1.9.4    timechange_0.3.0   rmarkdown_2.29    
[57] nnet_7.3-20        cellranger_1.1.0   zoo_1.8-14         evaluate_1.0.4    
[61] knitr_1.50         rlang_1.1.6        Rcpp_1.1.0         glue_1.8.0        
[65] rstudioapi_0.17.1  jsonlite_2.0.0     R6_2.6.1           plyr_1.8.9

Introduction

Quarto document

Writing in Markdown

Data analysis workflow

Corporate design

Colors

Color palettes

Plots with ggplot2

Scientific plots

Line plot

Barplot with lines

Tiny little helpers

Descriptive statistics and formatting

Missing data

Handling date and time

Clinical calculators

CKD-EPI eGFR 2021

Revised Schwartz Equation 2009

OPTN KDRI

UK KDRI

UK DCD Risk Score

Statistical models

Computing information

Plots with `ggplot2`