Overview

In this tutorial we will go through the basic steps of importing blood metabolomics data into R, joining them with phenotypes or endpoint data, and performing basic epidemiological analysis. The tutorial uses simulated demo datasets from Nightingale’s NMR platform.

Prerequisites

This tutorial uses R, which is free, compiles and runs in most operating systems. If you don’t have it already installed in your system, you may install it by following the instructions here. Additionally, we recommend using the RStudio IDE (Integrated Development Environment), which can be downloaded and installed following the instructions here for the free “RStudio Desktop (Open Source License)”.

You should be able to reproduce easily the code in this tutorial, even with very little knowledge in programming. If you have never programmed in R before and are intrested to learn more, you may find the following book useful, Hands on Programming with R by Garrett Grolemund.

Throughout the tutorial, we will be using the collection of tidyverse R packages and try to abide by their philosophy. You can install the complete tidyverse by typing the following line of code in the R console:

We also note the usage of the pipe %>% symbol throughout the code of this tutorial. The pipe is a way to write a series of operations on an R object, e.g. a dataset, in an easy-to-read way. As an example, the operation x %>% f(y) effectivly means f(x, y). You can read more on the pipe here.

Overview of Nightingale blood biomarker data

The NMR biomarker concentrations are provided in xlsx, csv and tsv formats. R can read either but in this tutorial we will use the csv format.

For the purposes of this tutorial we will use an example csv file. Download the zipped folder here and unzip it. It contains two files, one with metabolomics data and one with corresponding clinical data that we will use later on.

Type the commands below in your R session to load the metabolomics data and view the available biomarkers. Mind to put the correct path in the file parameter below.

Print the names of all the variables in the dataset for inspection.

names(df_nmr_results)
#>   [1] "Sample_id"                       "Unidentified_small_molecule_c"  
#>   [3] "Unidentified_macromolecules"     "Aminocaproic_acid"              
#>   [5] "Below_limit_of_quantification"   "Citrate_plasma"                 
#>   [7] "Low_protein"                     "Low_ethanol"                    
#>   [9] "Low_glutamine_or_high_glutamate" "Gluconolactone"                 
#>  [11] "High_ethanol"                    "High_lactate"                   
#>  [13] "High_pyruvate"                   "Isopropyl_alcohol"              
#>  [15] "Low_glucose"                     "Medium_ethanol"                 
#>  [17] "EDTA_plasma"                     "Polysaccharides"                
#>  [19] "N_methyl_2_pyrrolidone"          "Unexpected_amino_acid_signals"  
#>  [21] "Unidentified_small_molecule_a"   "Unidentified_small_molecule_b"  
#>  [23] "Total_C"                         "non_HDL_C"                      
#>  [25] "Remnant_C"                       "VLDL_C"                         
#>  [27] "Clinical_LDL_C"                  "LDL_C"                          
#>  [29] "HDL_C"                           "Total_TG"                       
#>  [31] "VLDL_TG"                         "LDL_TG"                         
#>  [33] "HDL_TG"                          "Total_PL"                       
#>  [35] "VLDL_PL"                         "LDL_PL"                         
#>  [37] "HDL_PL"                          "Total_CE"                       
#>  [39] "VLDL_CE"                         "LDL_CE"                         
#>  [41] "HDL_CE"                          "Total_FC"                       
#>  [43] "VLDL_FC"                         "LDL_FC"                         
#>  [45] "HDL_FC"                          "Total_L"                        
#>  [47] "VLDL_L"                          "LDL_L"                          
#>  [49] "HDL_L"                           "Total_P"                        
#>  [51] "VLDL_P"                          "LDL_P"                          
#>  [53] "HDL_P"                           "VLDL_size"                      
#>  [55] "LDL_size"                        "HDL_size"                       
#>  [57] "Phosphoglyc"                     "TG_by_PG"                       
#>  [59] "Cholines"                        "Phosphatidylc"                  
#>  [61] "Sphingomyelins"                  "ApoB"                           
#>  [63] "ApoA1"                           "ApoB_by_ApoA1"                  
#>  [65] "Total_FA"                        "Unsaturation"                   
#>  [67] "Omega_3"                         "Omega_6"                        
#>  [69] "PUFA"                            "MUFA"                           
#>  [71] "SFA"                             "LA"                             
#>  [73] "DHA"                             "Omega_3_pct"                    
#>  [75] "Omega_6_pct"                     "PUFA_pct"                       
#>  [77] "MUFA_pct"                        "SFA_pct"                        
#>  [79] "LA_pct"                          "DHA_pct"                        
#>  [81] "PUFA_by_MUFA"                    "Omega_6_by_Omega_3"             
#>  [83] "Ala"                             "Gln"                            
#>  [85] "Gly"                             "His"                            
#>  [87] "Total_BCAA"                      "Ile"                            
#>  [89] "Leu"                             "Val"                            
#>  [91] "Phe"                             "Tyr"                            
#>  [93] "Glucose"                         "Lactate"                        
#>  [95] "Pyruvate"                        "Citrate"                        
#>  [97] "Glycerol"                        "bOHbutyrate"                    
#>  [99] "Acetate"                         "Acetoacetate"                   
#> [101] "Acetone"                         "Creatinine"                     
#> [103] "Albumin"                         "GlycA"                          
#> [105] "XXL_VLDL_P"                      "XXL_VLDL_L"                     
#> [107] "XXL_VLDL_PL"                     "XXL_VLDL_C"                     
#> [109] "XXL_VLDL_CE"                     "XXL_VLDL_FC"                    
#> [111] "XXL_VLDL_TG"                     "XL_VLDL_P"                      
#> [113] "XL_VLDL_L"                       "XL_VLDL_PL"                     
#> [115] "XL_VLDL_C"                       "XL_VLDL_CE"                     
#> [117] "XL_VLDL_FC"                      "XL_VLDL_TG"                     
#> [119] "L_VLDL_P"                        "L_VLDL_L"                       
#> [121] "L_VLDL_PL"                       "L_VLDL_C"                       
#> [123] "L_VLDL_CE"                       "L_VLDL_FC"                      
#> [125] "L_VLDL_TG"                       "M_VLDL_P"                       
#> [127] "M_VLDL_L"                        "M_VLDL_PL"                      
#> [129] "M_VLDL_C"                        "M_VLDL_CE"                      
#> [131] "M_VLDL_FC"                       "M_VLDL_TG"                      
#> [133] "S_VLDL_P"                        "S_VLDL_L"                       
#> [135] "S_VLDL_PL"                       "S_VLDL_C"                       
#> [137] "S_VLDL_CE"                       "S_VLDL_FC"                      
#> [139] "S_VLDL_TG"                       "XS_VLDL_P"                      
#> [141] "XS_VLDL_L"                       "XS_VLDL_PL"                     
#> [143] "XS_VLDL_C"                       "XS_VLDL_CE"                     
#> [145] "XS_VLDL_FC"                      "XS_VLDL_TG"                     
#> [147] "IDL_P"                           "IDL_L"                          
#> [149] "IDL_PL"                          "IDL_C"                          
#> [151] "IDL_CE"                          "IDL_FC"                         
#> [153] "IDL_TG"                          "L_LDL_P"                        
#> [155] "L_LDL_L"                         "L_LDL_PL"                       
#> [157] "L_LDL_C"                         "L_LDL_CE"                       
#> [159] "L_LDL_FC"                        "L_LDL_TG"                       
#> [161] "M_LDL_P"                         "M_LDL_L"                        
#> [163] "M_LDL_PL"                        "M_LDL_C"                        
#> [165] "M_LDL_CE"                        "M_LDL_FC"                       
#> [167] "M_LDL_TG"                        "S_LDL_P"                        
#> [169] "S_LDL_L"                         "S_LDL_PL"                       
#> [171] "S_LDL_C"                         "S_LDL_CE"                       
#> [173] "S_LDL_FC"                        "S_LDL_TG"                       
#> [175] "XL_HDL_P"                        "XL_HDL_L"                       
#> [177] "XL_HDL_PL"                       "XL_HDL_C"                       
#> [179] "XL_HDL_CE"                       "XL_HDL_FC"                      
#> [181] "XL_HDL_TG"                       "L_HDL_P"                        
#> [183] "L_HDL_L"                         "L_HDL_PL"                       
#> [185] "L_HDL_C"                         "L_HDL_CE"                       
#> [187] "L_HDL_FC"                        "L_HDL_TG"                       
#> [189] "M_HDL_P"                         "M_HDL_L"                        
#> [191] "M_HDL_PL"                        "M_HDL_C"                        
#> [193] "M_HDL_CE"                        "M_HDL_FC"                       
#> [195] "M_HDL_TG"                        "S_HDL_P"                        
#> [197] "S_HDL_L"                         "S_HDL_PL"                       
#> [199] "S_HDL_C"                         "S_HDL_CE"                       
#> [201] "S_HDL_FC"                        "S_HDL_TG"                       
#> [203] "XXL_VLDL_PL_pct"                 "XXL_VLDL_C_pct"                 
#> [205] "XXL_VLDL_CE_pct"                 "XXL_VLDL_FC_pct"                
#> [207] "XXL_VLDL_TG_pct"                 "XL_VLDL_PL_pct"                 
#> [209] "XL_VLDL_C_pct"                   "XL_VLDL_CE_pct"                 
#> [211] "XL_VLDL_FC_pct"                  "XL_VLDL_TG_pct"                 
#> [213] "L_VLDL_PL_pct"                   "L_VLDL_C_pct"                   
#> [215] "L_VLDL_CE_pct"                   "L_VLDL_FC_pct"                  
#> [217] "L_VLDL_TG_pct"                   "M_VLDL_PL_pct"                  
#> [219] "M_VLDL_C_pct"                    "M_VLDL_CE_pct"                  
#> [221] "M_VLDL_FC_pct"                   "M_VLDL_TG_pct"                  
#> [223] "S_VLDL_PL_pct"                   "S_VLDL_C_pct"                   
#> [225] "S_VLDL_CE_pct"                   "S_VLDL_FC_pct"                  
#> [227] "S_VLDL_TG_pct"                   "XS_VLDL_PL_pct"                 
#> [229] "XS_VLDL_C_pct"                   "XS_VLDL_CE_pct"                 
#> [231] "XS_VLDL_FC_pct"                  "XS_VLDL_TG_pct"                 
#> [233] "IDL_PL_pct"                      "IDL_C_pct"                      
#> [235] "IDL_CE_pct"                      "IDL_FC_pct"                     
#> [237] "IDL_TG_pct"                      "L_LDL_PL_pct"                   
#> [239] "L_LDL_C_pct"                     "L_LDL_CE_pct"                   
#> [241] "L_LDL_FC_pct"                    "L_LDL_TG_pct"                   
#> [243] "M_LDL_PL_pct"                    "M_LDL_C_pct"                    
#> [245] "M_LDL_CE_pct"                    "M_LDL_FC_pct"                   
#> [247] "M_LDL_TG_pct"                    "S_LDL_PL_pct"                   
#> [249] "S_LDL_C_pct"                     "S_LDL_CE_pct"                   
#> [251] "S_LDL_FC_pct"                    "S_LDL_TG_pct"                   
#> [253] "XL_HDL_PL_pct"                   "XL_HDL_C_pct"                   
#> [255] "XL_HDL_CE_pct"                   "XL_HDL_FC_pct"                  
#> [257] "XL_HDL_TG_pct"                   "L_HDL_PL_pct"                   
#> [259] "L_HDL_C_pct"                     "L_HDL_CE_pct"                   
#> [261] "L_HDL_FC_pct"                    "L_HDL_TG_pct"                   
#> [263] "M_HDL_PL_pct"                    "M_HDL_C_pct"                    
#> [265] "M_HDL_CE_pct"                    "M_HDL_FC_pct"                   
#> [267] "M_HDL_TG_pct"                    "S_HDL_PL_pct"                   
#> [269] "S_HDL_C_pct"                     "S_HDL_CE_pct"                   
#> [271] "S_HDL_FC_pct"                    "S_HDL_TG_pct"

The first column, Sample_id, contains the sample identifiers. Columns 2 to 19 contain information on different tags that a sample may contain. Explanation on any tags present in an actual result file is given along with the analysis report but here all possible tag columns are provided for demonstarational purposes. The rest of the columns correspond to machine readable abbreviations of the NMR-quantified blood biomarkers.

Let us drop the tag columns as they are not used in this demo.

Rename alternative biomarker names

If you have a Nightingale result file with alternative biomarker names, for example XXL-VLDL-C_% instead of XXL_VLDL_C_pct or LA/FA instead of LA_pct, you may utilize the variable alternative_ids in the ggforestplot::df_NG_biomarker_metadata dataset to convert from one type to the other. An example is provided below, where we transform a dataset with alternative biomarker name (as long as they are found in the ggforestplot::df_NG_biomarker_metadata$alternative_ids list) to their ggforestplot::df_NG_biomarker_metadata$machine_readable_name.

Association analysis

Join the blood biomarker data with other variables

Next we join the blood biomarker data above with other variables we may have, such as patient basic information (e.g. gender, age, e.t.c.) or patient outcomes (e.g. a disease event).

The second dataset you downloaded earlier (from here), called clinical_data.csv, has additional to the NMR information corresponding to the same fictional individuals. It has information on gender, age, body mass index (BMI) as well as incident T2D diabetes.

We will join the two datasets using the IDs, i.e. column Sample_id in the NMR data and column identifier in the clinical data. Except for the columns having different names in the two files, the IDs themselves are not the same either (which is also often the case in real life). Sample_id in the NMR is a character column with the prefix ResearchInstitutionProjectLabId_ while identifier in clinical data is a character consisting of 4 numbers.

Below we read the clinical data and join the two datasets after a few adjustments.

Linear regression

We will first plot the distribution for a specific biomarker to inspect the data more closely. We choose glycoprotein acetyls (abbrev. GlycA) which reflects the amount of N-acetyl groups in circulating glycoproteins, many of which are involved in the acute-phase inflammatory response.

We first add a column to the dataset, marking the subjects as obese or not-obese. We then plot the GlycA distributions for the two sets as boxplots.

GlycA distribution for obese and not-obese subjects

GlycA distribution for obese and not-obese subjects

In the box plot above, the lower, middle and higher hinges correspond to 25%, 50% and 75% quantiles, respectively. Therefore, we see that the GlycA distribution for obese subjects is clearly shifted to higher values as compared to non-obese subjects, indicating that this marker is likely positively associated with obesity.

We will now estimate associations of each biomarker to BMI via linear regression:

\[\begin{align*} y = \beta * x + \alpha \end{align*}\]

where the blood biomarker is the outcome, \(y\), and BMI is the exposure \(x\). The association of \(x\) with \(y\) refers to the beta coefficient (\(\beta\)).

Below, we use function ggforestplot::discovery_regression(), with input parameter model set to 'lm' to estimate multiple, adjusted, linear associations, for all the biomarkers. Note that we scale the biomarkers so that the magnitude of associations are directly comparable across the different biomarkers. Log-transformation is done in order to satisfy the assumption of linear regression for normal distribution of the residuals.

# Extract names of NMR biomarkers
nmr_biomarkers <- dplyr::intersect(
  ggforestplot::df_NG_biomarker_metadata$machine_readable_name,
  colnames(df_nmr_results)
)

# NMR biomarkers here should be 250
stopifnot(length(nmr_biomarkers) == 250)

# Select only variables to be used for the model and collapse to a long data 
# format
df_long <-
  df_full_data %>%
  # Select only model variables
  dplyr::select(tidyselect::all_of(nmr_biomarkers), gender, baseline_age, BMI) %>%
  # Log-transform and scale biomarkers
  dplyr::mutate_at(
    .vars = dplyr::vars(tidyselect::all_of(nmr_biomarkers)),
    .funs = ~ .x %>% log1p() %>% scale %>% as.numeric()
  ) %>%
  # Collapse to a long format
  tidyr::gather(
    key = biomarkerid,
    value = biomarkervalue,
    tidyselect::all_of(nmr_biomarkers)
  )

# Estimate sex- and age-adjusted associations of metabolite to BMI
df_assoc_per_biomarker_bmi <-
  ggforestplot::discovery_regression(
    df_long = df_long,
    model = "lm",
    formula =
      formula(
       biomarkervalue ~ BMI + factor(gender) + baseline_age
      ),
    key = biomarkerid,
    predictor = BMI
  ) %>% 
  # Join this dataset with the grouping data in order to choose a different 
  # biomarker naming option
  left_join(
    select(
      df_NG_biomarker_metadata, 
      name,
      biomarkerid = machine_readable_name
    ), 
    by = "biomarkerid")

head(df_assoc_per_biomarker_bmi)
#> # A tibble: 6 x 5
#>   biomarkerid    estimate      se   pvalue name          
#>   <chr>             <dbl>   <dbl>    <dbl> <chr>         
#> 1 Total_C          0.0169 0.00544 1.91e- 3 Total-C       
#> 2 non_HDL_C        0.0293 0.00536 5.01e- 8 non-HDL-C     
#> 3 Remnant_C        0.0303 0.00535 1.79e- 8 Remnant-C     
#> 4 VLDL_C           0.0428 0.00526 7.63e-16 VLDL-C        
#> 5 Clinical_LDL_C   0.0251 0.00541 3.71e- 6 Clinical LDL-C
#> 6 LDL_C            0.0277 0.00539 2.94e- 7 LDL-C

The data frame df_assoc_per_biomarker_bmi above, contains 250 rows, one for each blood biomarker, and 5 variables (columns): the biomarkerid (same as variable machine_readable_name in df_NG_biomarker_metadata) which has the biomarker abbreviations in the same format as in the input csv file, the biomarker name which is a bit more descriptive than biomarkerid, and the variables estimate, se and pvalue that correspond to the linear regression coefficient \(\beta\), the standard error and the p-value, respectively.

Forest plot

Let’s now visualize the results. For the purposes of this demo, we will plot a selected subset of the biomarkers in a forestplot layout. At the end of the tutorial, you will find an example of how to plot associations for all Nightingale biomarkers in a pdf (see also vignette("ggforestplot") for more details on the usage of ggforestplot::forestplot()).

The ggforestplot package includes a data frame, called ggforestplot::df_NG_biomarker_metadata, with metadata on the Nightingale blood biomarkers, such as different naming options, descriptions and group/sugroup information. We will use the group (or sugroup) information in this data frame to decide which groups of biomarkers and in what order we want to visualize here.

# Display blood biomarker groups
ggforestplot::df_NG_biomarker_metadata %>% 
  pull(group) %>% 
  unique()
#>  [1] "Cholesterol"                              
#>  [2] "Triglycerides"                            
#>  [3] "Phospholipids"                            
#>  [4] "Cholesteryl esters"                       
#>  [5] "Free cholesterol"                         
#>  [6] "Total lipids"                             
#>  [7] "Lipoprotein particle concentrations"      
#>  [8] "Lipoprotein particle sizes"               
#>  [9] "Other lipids"                             
#> [10] "Apolipoproteins"                          
#> [11] "Fatty acids"                              
#> [12] "Amino acids"                              
#> [13] "Glycolysis related metabolites"           
#> [14] "Ketone bodies"                            
#> [15] "Fluid balance"                            
#> [16] "Inflammation"                             
#> [17] "Lipoprotein subclasses"                   
#> [18] "Relative lipoprotein lipid concentrations"

# # You may wish, alternatively, to display blood biomarker subgroups and plot 
# # according to that. You can see which subgroups are available in the 
# # grouping data by uncommenting the following lines. In several cases, subgroup
# # is identical to group. 
# ggforestplot::df_NG_biomarker_metadata %>% 
#   pull(subgroup) %>% 
#   unique()

# Choose the groups you want to plot and define the order with which group 
# categories will appear in the plot
group_order <- c(
  "Branched-chain amino acids",
  "Aromatic amino acids",
  "Amino acids",
  "Fluid balance",
  "Inflammation",
  "Fatty acids",
  "Triglycerides",
  "Other lipids",
  "Cholesterol"
)
  
# Extract a subset of the df_NG_biomarker_metadata, with the desired group
# order, to set the order of biomarkers in the forestplot later on
df_with_groups <- 
  ggforestplot::df_NG_biomarker_metadata %>% 
  # Select subset of variables
  select(name = name,
         group) %>% 
  # Filter and arrange for the wanted groups
  filter(group %in% group_order) %>%
  arrange(factor(group, levels = group_order))
Statistical significance for multiple testing

We’d also like to add to the forrestplot a Bonferroni correction to account for multiple testing. Here we have a comparison of 250 biomarkers. If we suppose the common significance threshold \(\alpha = 0.05\), the Bonferroni correction for each individual hypothesis, assuming the 250 tests are independent, is \(\alpha = 0.05 / 250 \approx 0.0002\). However, for biologically correlated measures this correction is too strict; Here, instead of correcting for the total number of biomarkers we will use the number of principal components that explain 99% of the variance in the data.

Therefore, our significance threshold here will be \(\alpha = 0.05 / 41 \approx 0.001\)

Below we plot the results. In the forestplot function, our main input is the data frame df_assoc_per_biomarker_bmi, while we use the previously constructed data frame df_with_groups and its variable group to impose the grouping and the order of biomarkers in the plot. For the statistical significance, we input the significance threshold as the parameter psignif = 0.001 and we define explicitly the variable name in df_assoc_per_biomarker_bmi that contains the p-values of the linear regression, in this case the column pvalue.

The figure shows that BMI is significantly associated with several of the plotted blood biomarkers. Specifically, higher values of branched chain amino acids, aromatic amino acids, omega-6 and glycoprotein acetylation are significantly associated with obesity, as has been previously seen in Würtz P et al. Metabolic Signatures of Adiposity in Young Adults: Mendelian Randomization Analysis and Effects of Weight Change., PLoS Med. 2014.

Cox proportional hazards regression

Similar analysis may be performed for other phenotypic traits. Two other very common regression approaches in epidemiology are logistic regression for studying the association to binary outcomes and Cox proportional hazards for time-to-event outcomes.

In similar spirit as above, we utilize ggforestplot::discovery_regression() to demonstrate an example of fitting Cox proportional hazards in all of the biomarkers to estimate the gender-, age- and BMI- adjusted hazard ratios for type 2 diabetes.

# Select only variables to be used for the model and collapse to a long data 
# format
df_long <-
  df_full_data %>%
  # Select only model variables
  dplyr::select(tidyselect::all_of(nmr_biomarkers), gender, baseline_age, BMI, age_at_diabetes, incident_diabetes) %>%
  # Log-transform and scale biomarkers
  dplyr::mutate_at(
    .vars = dplyr::vars(tidyselect::all_of(nmr_biomarkers)),
    .funs = ~ .x %>% log1p() %>% scale %>% as.numeric()
  ) %>%
  # Collapse to a long format
  tidyr::gather(
    key = biomarkerid,
    value = biomarkervalue,
    tidyselect::all_of(nmr_biomarkers)
  )

# Estimate sex-adjusted associations of metabolite to T2D
# Notice that parameter model below is set to 'coxph'
df_assoc_per_biomarker_diab <-
  discovery_regression(
    df_long = df_long, 
    model = "coxph",
    formula =
      formula(
        survival::Surv(
          time = baseline_age,
          time2 = age_at_diabetes,
          event = incident_diabetes
        ) ~ biomarkervalue + factor(gender) + BMI
      ),
    key = biomarkerid,
    predictor = biomarkervalue
  ) %>% 
  # Join this dataset with the grouping data in order to choose a different 
  # biomarker naming option
  left_join(
    select(
      df_NG_biomarker_metadata, 
      name,
      biomarkerid = machine_readable_name
    ), 
    by = "biomarkerid")

head(df_assoc_per_biomarker_diab)
#> # A tibble: 6 x 5
#>   biomarkerid    estimate     se  pvalue name          
#>   <chr>             <dbl>  <dbl>   <dbl> <chr>         
#> 1 Total_C          0.0161 0.0690 0.815   Total-C       
#> 2 non_HDL_C        0.0524 0.0715 0.464   non-HDL-C     
#> 3 Remnant_C        0.0876 0.0708 0.216   Remnant-C     
#> 4 VLDL_C           0.238  0.0725 0.00101 VLDL-C        
#> 5 Clinical_LDL_C  -0.0229 0.0700 0.744   Clinical LDL-C
#> 6 LDL_C            0.0207 0.0707 0.770   LDL-C

Plot results for fatty acids.

Notice that the main difference in the latter forestplot as compared to the forestplot for linear associations above, is parameter logodds = TRUE. In this case, the beta values, here log hazard ratios, are exponentiated inside the forestplot to obtain hazard ratios and plotted in a logarithmic scale. The confidence intervals for the odds ratios are estimated using the standard errors of the log hazard ratios, as follows \(\text{CI}_\text{low} = \exp(\beta - 1.96 * \text{SE})\) and \(\text{CI}_\text{high} = \exp(\beta + 1.96 * \text{SE})\). If you wish to use some confidence interval other than the default 95%, you may specify this in the parameter ci of the function.

A forestplot for all Nightingale blood biomarkers

Finally, the ggforestplot package provides a custom function that plots and prints all Nightingale blood biomarkers in a single pdf file, called plot_all_NG_biomarkers(). There are two layouts available for 2016 and 2020 versions of the biomarker platform. Alternatively, one can also input a custom layout. Its usage is straightforward.

You may view the output of this function here.

Final remarks

The purpose of this demo is to familiarize the user with the Nightingale Health biomarker dataset and to showcase simple association analysis using linear and Cox proportional hazards regression.

Further details on data analysis approaches may be found in the following example publications.