TLDR

Load packages

##update/install MLHO
devtools::install_github("clai-group/mlho")
Downloading GitHub repo clai-group/mlho@HEAD
globals (0.17.0 -> 0.18.0) [CRAN]
future  (1.40.0 -> 1.49.0) [CRAN]
Installing 2 packages: globals, future

The downloaded binary packages are in
    /var/folders/tk/fzb3c9wj2zn6bztrz6kd_mxr0000gq/T//RtmpSnxDFX/downloaded_packages
── R CMD build ─────────────────────────────────────────────────────────────────
* checking for file ‘/private/var/folders/tk/fzb3c9wj2zn6bztrz6kd_mxr0000gq/T/RtmpSnxDFX/remotes106af12ff0e2e/clai-group-MLHO-44e016f/DESCRIPTION’ ... OK
* preparing ‘mlho’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
Removed empty directory ‘mlho/entropy’
Removed empty directory ‘mlho/results’
  NB: this package now depends on R (>= 3.5.0)
  WARNING: Added dependency on R >= 3.5.0 because serialized objects in
  serialize/load version 3 cannot be read in older versions of R.
  File(s) containing such objects:
    ‘mlho/data/incident_data.RData’ ‘mlho/data/pHE_map.RData’
    ‘mlho/data/syntheticmass.RData’
* building ‘mlho_0.1.1.tar.gz’
# Load MLHO. Afterwards, source the MSMR.lite.R file to overwrite the package's MSMR.lite function
# with the updated one (the encounter functionality is only available in the R file).
library(mlho)

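# load and install required dependencies
# (plyr is loaded before dplyr/tidyverse so that plyr does not mask dplyr verbs such as mutate() and summarise())
pacman::p_load(data.table, devtools, backports, Hmisc, plyr, tidyr, dplyr, ggplot2, scales, readr, httr, DT, lubridate, DALEX, tidyverse, reshape2, foreach, doParallel, caret, gbm, praznik)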
library(counterfactuals)
Warning: package 'counterfactuals' was built under R version 4.3.3
library(iml)

Prepare the data

We load several datasets from the MLHO package, including incident data and demographic information.

dbmart is a long-format table with one row per recorded event: the patient ID (patient_num), the feature code (phenx) with its DESCRIPTION, and the event date (start_date). Each patient can have many rows, covering different diagnostic events or conditions.

labelDT includes the patient ID (patient_num), the start date of each event (start_date), and a binary label (label) indicating the outcome of interest. The code below also adds a simulated outcome date (o_date).

dems contains demographic information for each patient.

dbmart <- mlho::incident_dbmart
labelDT <- mlho::incident_labeldt
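# simulate an outcome date (o_date): rows with label == 0 keep start_date, rows with label == 1 get
# start_date plus a random offset of 1-40 days; sample(1:40, 1) draws a single offset shared by all such rows
labelDT <- labelDT %>% mutate(o_date = case_when(label == 0 ~ start_date, label == 1 ~ start_date + sample(1:40, 1)))
labelDT$start_date <- as.Date(labelDT$start_date)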
dems <- mlho::incident_dems
head(dbmart)
# A tibble: 6 × 4
  patient_num                          phenx     DESCRIPTION          start_date
  <chr>                                <chr>     <chr>                <date>    
1 478a4846-0ae6-4ec8-8155-019708911526 76601001  Intramuscular injec… 2019-08-24
2 478a4846-0ae6-4ec8-8155-019708911526 76601001  Intramuscular injec… 2019-11-23
3 478a4846-0ae6-4ec8-8155-019708911526 76601001  Intramuscular injec… 2020-02-22
4 478a4846-0ae6-4ec8-8155-019708911526 261352009 Face mask (physical… 2020-03-11
5 478a4846-0ae6-4ec8-8155-019708911526 65200003  Insertion of intrau… 2020-05-01
6 478a4846-0ae6-4ec8-8155-019708911526 76601001  Intramuscular injec… 2020-05-23
head(labelDT)
# A tibble: 6 × 4
  patient_num                          start_date label o_date    
  <chr>                                <date>     <dbl> <date>    
1 478a4846-0ae6-4ec8-8155-019708911526 2019-08-24     1 2019-09-04
2 478a4846-0ae6-4ec8-8155-019708911526 2019-11-23     1 2019-12-04
3 478a4846-0ae6-4ec8-8155-019708911526 2020-02-22     1 2020-03-04
4 478a4846-0ae6-4ec8-8155-019708911526 2020-03-11     0 2020-03-11
5 478a4846-0ae6-4ec8-8155-019708911526 2020-05-01     0 2020-05-01
6 478a4846-0ae6-4ec8-8155-019708911526 2020-05-23     1 2020-06-03

Splitting data into training and testing sets using a 70-30 ratio

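We extract the unique patient IDs (“patient_num”) from dbmart and randomly assign 30% of these patients to the test set; the remaining 70% form the training set. Splitting by patient rather than by row keeps all of a patient's events in the same set, so no patient contributes to both training and testing.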

uniqpats <- c(as.character(unique(dbmart$patient_num)))


test_ind <- sample(uniqpats,
                   round(.3*length(uniqpats)))
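Because sample() is random, the split above changes on every run. A minimal base R sketch of a reproducible, patient-level split follows; the toy_dbmart table and the seed value 2024 are illustrative assumptions, not part of MLHO. The final check confirms that no patient appears in both sets.

```r
# Toy stand-in for dbmart: 10 patients with 1-3 events each (illustrative data only)
toy_dbmart <- data.frame(
  patient_num = rep(paste0("pat", 1:10), times = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1)),
  phenx = "76601001"
)

set.seed(2024)  # fix the random number generator so the split is reproducible

# split on unique patient IDs, not on rows, so each patient's events stay together
uniqpats <- unique(as.character(toy_dbmart$patient_num))
test_ind <- sample(uniqpats, round(0.3 * length(uniqpats)))
train_pats <- setdiff(uniqpats, test_ind)

# sanity checks: the two sets are disjoint and together cover every patient
stopifnot(length(intersect(train_pats, test_ind)) == 0,
          setequal(union(train_pats, test_ind), uniqpats))
```

The same set.seed() call placed before the tutorial's own sample() line would make its split repeatable as well.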

Transform train data

After the split, we transform the training data into the format expected by the MLHO modeling functions: a long table with patient_num, phenx, and value columns.

dat.train  <- subset(dbmart,!(dbmart$patient_num %in% c(test_ind)))
data.table::setDT(dat.train)
#values must be in column named value
dat.train[,value := 1]
uniqpats.train <- c(as.character(unique(dat.train$patient_num)))

We use the MSMR.me function from the MLHO package to transform the data and select features. MSMR.me assigns each feature a time label, “history”, “past”, or “last”, depending on when it occurred relative to the patient's encounters. Its parameters control sparsity screening (sparsity), feature selection with the joint mutual information criterion (jmi), the number of top features to keep (topn), and other aspects of how the data are processed.

The figure above illustrates how MSMR.me labels features. Each patient's encounters are processed in chronological order. Events that occur before the first encounter are labeled “history.” For each encounter, events between the previous encounter and the current one are labeled “last,” and, if there is an earlier encounter, events between that encounter and the previous one are labeled “past.” The timeBufffer argument adds a time interval to each label's window, which makes it possible to account for an infection period; for example, the buffer could cover the 14 days of a COVID-19 infection.
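To make the windowing concrete, here is a small base R sketch of one reading of this scheme. It is not MSMR.me's implementation: the helper label_events(), its inclusive/exclusive boundary choices, and the decision to fold events older than the “past” window into “history” are all assumptions, and it ignores timeBufffer.

```r
# Hypothetical helper, NOT MLHO's MSMR.me. For target encounter i (enc_dates sorted),
# each event dated on or before enc_dates[i] is labeled:
#   "last"    if it falls after encounter i-1 (only when i > 1),
#   "past"    if it falls after encounter i-2 and on/before encounter i-1 (only when i > 2),
#   "history" otherwise.
# Events after the target encounter get NA because they are not used for encounter i.
label_events <- function(event_dates, enc_dates, i) {
  enc_dates <- sort(enc_dates)
  # number of encounters strictly before each event date
  n_before <- findInterval(as.numeric(event_dates), as.numeric(enc_dates),
                           left.open = TRUE)
  lab <- rep("history", length(event_dates))
  if (i > 1) lab[n_before == i - 1] <- "last"
  if (i > 2) lab[n_before == i - 2] <- "past"
  lab[event_dates > enc_dates[i]] <- NA_character_
  lab
}

enc <- as.Date(c("2020-01-10", "2020-02-10", "2020-03-10"))
ev  <- as.Date(c("2019-12-01", "2020-01-20", "2020-02-15", "2020-03-10", "2020-04-01"))
label_events(ev, enc, i = 3)
# returns "history" "past" "last" "last" NA
```

Under these assumptions, an event on the encounter date itself counts toward that encounter's “last” window.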

Once the data are labeled, the history, past, and last features are merged and reshaped into a wide format: each row (a patient, or a patient encounter when encounterLevel = TRUE) holds the count of each labeled feature, with column names such as <phenx>_last.
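The reshaping into a wide table can be illustrated with base R alone. This sketch assumes feature columns are named <phenx>_<label> (matching the suffixes stripped from the training column names later in this tutorial) and uses table() to count events per encounter; the toy events data are made up, and MSMR.me's own implementation may differ.

```r
# Toy labeled events: one row per (encounter, phenx, time label); illustrative data only
events <- data.frame(
  encounter  = c("p1_2020-03-10", "p1_2020-03-10", "p1_2020-03-10",
                 "p1_2020-03-10", "p2_2020-05-01"),
  phenx      = c("76601001", "76601001", "76601001", "261352009", "76601001"),
  time_label = c("last", "last", "past", "history", "history")
)

# build the feature name, e.g. "76601001_last"
events$feature <- paste(events$phenx, events$time_label, sep = "_")

# cross-tabulate: one row per encounter, one column per feature, cells are counts
wide <- as.data.frame.matrix(table(events$encounter, events$feature))

# presence/absence version (presumably what binarize = TRUE asks for)
wide_bin <- wide
wide_bin[] <- lapply(wide_bin, function(x) as.integer(x > 0))
wide
```

In wide, encounter p1_2020-03-10 has a count of 2 for 76601001_last; in wide_bin that cell becomes 1.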

MLHO.dat <- dat.train
labels <- labelDT
patients <- uniqpats.train
binarize <- TRUE
sparsity <- 0.05   # threshold = sample size * sparsity; very small values keep rare features and risk overfitting
jmi <- TRUE        # use joint mutual information (JMI) for feature selection
topn <- 50         # number of top-ranked features to keep
multicore <- FALSE
encounterLevel <- TRUE
valuesToMerge <- TRUE
timeBufffer <- c(h = 0, p = 0, l = 0, o = -30)  # per-label time buffers in days
dat.train <- MSMR.me(MLHO.dat,
                     labels,
                     binarize,
                     sparsity,
                     jmi,
                     topn,
                     patients = patients,
                     multicore = multicore,
                     encounterLevel = encounterLevel,
                     valuesToMerge = valuesToMerge,
                     timeBufffer)
[1] "step - 1: sparsity screening!"
[1] "Applying encounter based transformations!"
[1] "step 2: JMI dimensionality reduction!"
rank = dat.train$rank
dat.train = dat.train$AVR
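The sparsity screening reported as step 1 in the log above can be approximated as follows. This is a hypothetical base R sketch that reads the sparsity comment (sample size * sparsity) as a minimum number of patients in which a phenx must occur; MSMR.me's exact rule may differ, and sparsity_screen() is not an MLHO function.

```r
# Hypothetical sparsity screen (not MLHO code): keep phenx observed in at least
# sparsity * (number of patients) distinct patients; drop the rest.
sparsity_screen <- function(dat, sparsity) {
  n_pat <- length(unique(dat$patient_num))
  # number of distinct patients per phenx
  pats_per_phenx <- tapply(dat$patient_num, dat$phenx,
                           function(p) length(unique(p)))
  keep <- names(pats_per_phenx)[pats_per_phenx >= sparsity * n_pat]
  dat[dat$phenx %in% keep, , drop = FALSE]
}

# toy data: phenx "A" occurs in all 20 patients, "B" in only one (illustrative)
toy <- data.frame(
  patient_num = c(paste0("pat", 1:20), "pat1"),
  phenx       = c(rep("A", 20), "B"),
  value       = 1
)
screened <- sparsity_screen(toy, sparsity = 0.1)  # threshold: 0.1 * 20 = 2 patients
unique(screened$phenx)
```

With sparsity = 0.1 the single-patient feature "B" is removed; with sparsity = 0.05 the threshold drops to one patient and "B" survives, which illustrates why very small values let rare features through.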

Transform test data

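We apply the same transformation to the test set, with two differences: sparsity screening and JMI feature selection are switched off, and only the phenx retained on the training data are kept, so that the test features match the training features.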

dat.test <- subset(dbmart,dbmart$patient_num %in% c(test_ind))
uniqpats.test <- c(as.character(unique(dat.test$patient_num)))
# keep only the phenx needed to build the selected encounter-based features:
# strip the _history, _past and _last suffixes from the training column names to recover the phenx codes
dat.train.colnames <- vapply(strsplit(colnames(dat.train),"_"),`[`, 1, FUN.VALUE=character(1))
dat.test <- subset(dat.test,dat.test$phenx %in% dat.train.colnames)
setDT(dat.test)
#values must be in column named value
dat.test$value <- 1

MLHO.dat.test = dat.test

# important to have a value and phenx column to merge
dat.test <- MSMR.me(MLHO.dat = dat.test,
                    patients = uniqpats.test,
                    sparsity = NA,
                    jmi = FALSE,
                    labels = labelDT,
                    encounterLevel = TRUE,
                    valuesToMerge = TRUE,
                    binarize = FALSE,
                    timeBufffer = timeBufffer)  # pass by name so the buffer cannot bind to another positional argument
[1] "Applying encounter based transformations!"
# remove sparse and not relevant _past, _last _history phenx according to the train data
dat.test <- dat.test %>% select(one_of(colnames(dat.train)))
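one_of() only selects the training columns that exist in the test table: a feature that never occurs among the test patients is skipped with an "Unknown columns" warning, and the test table then no longer has the same columns as the training table. A hypothetical helper (not part of MLHO) that instead adds such columns, filled with 0 on the assumption that 0 means "feature absent", could look like this:

```r
# Hypothetical helper (not part of MLHO): reorder the test columns to match the
# training columns, adding any training feature missing from the test data as 0.
align_to_train <- function(test_df, train_cols, fill = 0) {
  test_df <- as.data.frame(test_df)        # plain data.frame for simple column indexing
  missing_cols <- setdiff(train_cols, colnames(test_df))
  for (col in missing_cols) test_df[[col]] <- fill
  test_df[, train_cols, drop = FALSE]      # same columns, same order as training
}

train_cols <- c("patient_num", "76601001_last", "65200003_past", "label")
toy_test <- data.frame(patient_num = c("p1_2020-03-10", "p2_2020-05-01"),
                       `76601001_last` = c(1, 0),
                       `261352009_history` = c(1, 1),
                       label = c(0, 1),
                       check.names = FALSE)
aligned <- align_to_train(toy_test, train_cols)
```

Here 65200003_past is added as a column of zeros, and 261352009_history, which was not selected on the training data, is dropped. Identifier and label columns are expected to exist in both tables, so they are never filled.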

Update demographics and labels data

The dems dataset, which contains demographic information, is updated with the encounters in labelDT: we merge both datasets by “patient_num”, which repeats each patient's demographics for every encounter, and then append “start_date” to “patient_num” to create a unique identifier for each patient encounter.

dems <- dems %>%
  merge(labelDT,by = "patient_num") %>%
  mutate(patient_num = paste0(patient_num,"_" ,start_date)) %>%
  select(-start_date, -label)

Similarly, labelDT is updated to concatenate “patient_num” with “start_date” to create a unique identifier for each patient’s encounter, which simplifies subsequent merging and data handling processes. The “start_date” column is then removed to clean up the dataset:

# merge patientnum and encounter date in labelDT
labelDT <- labelDT %>%
  mutate(patient_num = paste0(patient_num,"_" ,start_date))  %>%
  select(-start_date)
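Because dems has one row per patient while labelDT has one row per encounter, the merge yields one demographics row per encounter. The following base R sketch on made-up data mirrors the dplyr code above and shows the resulting <patient_num>_<start_date> keys; the gender column is a hypothetical demographic field.

```r
# Toy inputs (illustrative): one demographics row per patient, several encounters per patient
dems_toy <- data.frame(patient_num = c("p1", "p2"), gender = c("F", "M"))
labels_toy <- data.frame(
  patient_num = c("p1", "p1", "p2"),
  start_date  = as.Date(c("2020-01-10", "2020-02-10", "2020-03-01")),
  label       = c(1, 0, 1)
)

# merge: each patient's demographics are repeated for every encounter
enc_dems <- merge(dems_toy, labels_toy, by = "patient_num")
# encounter-level key: "<patient_num>_<start_date>"
enc_dems$patient_num <- paste0(enc_dems$patient_num, "_", enc_dems$start_date)
enc_dems <- enc_dems[, setdiff(names(enc_dems), c("start_date", "label")), drop = FALSE]

# the same key built on the label table keeps the two tables joinable
labels_toy$patient_num <- paste0(labels_toy$patient_num, "_", labels_toy$start_date)
labels_toy$start_date <- NULL
```

Patient p1 now appears twice in enc_dems, once per encounter, under keys that match the rewritten labels table.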

Train model

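We use the mlearn function to train the model on dat.train and evaluate it on dat.test. Here classifier = "gbm" fits a gradient boosting machine with 5-fold cross-validation (cv = "cv", nfold = 5), while calSHAP and counterfactual switch on the SHAP and counterfactual explanation steps. The warning that the "Accuracy" metric was not in the result set means that ROC is used instead to choose among the candidate models; the iteration tables that follow are the gbm training logs printed for each fit.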

model.test <- mlearn(dat.train,
                     dat.test,
                     dems = NULL,
                     save.model = FALSE,
                     classifier = "gbm",
                     note = "mlho_test_run",
                     cv = "cv",
                     nfold = 5,
                     aoi = "random phenx from dbmart",
                     multicore = FALSE,
                     calSHAP = TRUE,
                     counterfactual = TRUE,
                     save.model.counterfactual = FALSE)
[1] "the modeling!"
Warning in train.default(x, y, weights = w, ...): The metric "Accuracy" was not
in the result set. ROC will be used instead.
Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9510             nan     0.1000    0.0134
     2        0.9286             nan     0.1000    0.0104
     3        0.9123             nan     0.1000    0.0082
     4        0.8977             nan     0.1000    0.0066
     5        0.8868             nan     0.1000    0.0053
     6        0.8779             nan     0.1000    0.0042
     7        0.8611             nan     0.1000    0.0070
     8        0.8536             nan     0.1000    0.0037
     9        0.8464             nan     0.1000    0.0032
    10        0.8385             nan     0.1000    0.0025
    20        0.7875             nan     0.1000    0.0022
    40        0.7366             nan     0.1000    0.0008
    60        0.7091             nan     0.1000    0.0003
    80        0.6914             nan     0.1000    0.0002
   100        0.6796             nan     0.1000   -0.0001
   120        0.6701             nan     0.1000   -0.0000
   140        0.6627             nan     0.1000    0.0000
   150        0.6604             nan     0.1000   -0.0002

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9397             nan     0.1000    0.0203
     2        0.9078             nan     0.1000    0.0166
     3        0.8843             nan     0.1000    0.0126
     4        0.8622             nan     0.1000    0.0102
     5        0.8451             nan     0.1000    0.0074
     6        0.8301             nan     0.1000    0.0076
     7        0.8150             nan     0.1000    0.0070
     8        0.8034             nan     0.1000    0.0060
     9        0.7939             nan     0.1000    0.0047
    10        0.7851             nan     0.1000    0.0044
    20        0.7269             nan     0.1000    0.0019
    40        0.6744             nan     0.1000    0.0005
    60        0.6482             nan     0.1000   -0.0001
    80        0.6332             nan     0.1000   -0.0002
   100        0.6214             nan     0.1000   -0.0002
   120        0.6157             nan     0.1000   -0.0001
   140        0.6104             nan     0.1000   -0.0002
   150        0.6090             nan     0.1000   -0.0000

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9316             nan     0.1000    0.0226
     2        0.8948             nan     0.1000    0.0175
     3        0.8650             nan     0.1000    0.0154
     4        0.8404             nan     0.1000    0.0116
     5        0.8208             nan     0.1000    0.0097
     6        0.8018             nan     0.1000    0.0081
     7        0.7868             nan     0.1000    0.0070
     8        0.7729             nan     0.1000    0.0067
     9        0.7595             nan     0.1000    0.0062
    10        0.7469             nan     0.1000    0.0047
    20        0.6840             nan     0.1000    0.0018
    40        0.6372             nan     0.1000    0.0003
    60        0.6172             nan     0.1000    0.0002
    80        0.6052             nan     0.1000   -0.0001
   100        0.5977             nan     0.1000   -0.0001
   120        0.5912             nan     0.1000   -0.0002
   140        0.5865             nan     0.1000   -0.0003
   150        0.5841             nan     0.1000   -0.0001

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9550             nan     0.1000    0.0126
     2        0.9350             nan     0.1000    0.0098
     3        0.9185             nan     0.1000    0.0077
     4        0.9057             nan     0.1000    0.0054
     5        0.8932             nan     0.1000    0.0065
     6        0.8821             nan     0.1000    0.0052
     7        0.8730             nan     0.1000    0.0042
     8        0.8654             nan     0.1000    0.0032
     9        0.8552             nan     0.1000    0.0051
    10        0.8487             nan     0.1000    0.0033
    20        0.7954             nan     0.1000    0.0024
    40        0.7412             nan     0.1000    0.0011
    60        0.7098             nan     0.1000    0.0004
    80        0.6897             nan     0.1000    0.0003
   100        0.6758             nan     0.1000   -0.0001
   120        0.6670             nan     0.1000    0.0002
   140        0.6588             nan     0.1000    0.0002
   150        0.6559             nan     0.1000   -0.0004

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9353             nan     0.1000    0.0199
     2        0.9045             nan     0.1000    0.0141
     3        0.8804             nan     0.1000    0.0116
     4        0.8605             nan     0.1000    0.0096
     5        0.8444             nan     0.1000    0.0070
     6        0.8298             nan     0.1000    0.0062
     7        0.8164             nan     0.1000    0.0065
     8        0.8053             nan     0.1000    0.0050
     9        0.7952             nan     0.1000    0.0052
    10        0.7862             nan     0.1000    0.0045
    20        0.7325             nan     0.1000    0.0013
    40        0.6808             nan     0.1000    0.0005
    60        0.6518             nan     0.1000    0.0005
    80        0.6324             nan     0.1000    0.0002
   100        0.6218             nan     0.1000    0.0000
   120        0.6149             nan     0.1000   -0.0002
   140        0.6076             nan     0.1000   -0.0002
   150        0.6042             nan     0.1000   -0.0002

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9323             nan     0.1000    0.0229
     2        0.8968             nan     0.1000    0.0169
     3        0.8676             nan     0.1000    0.0142
     4        0.8428             nan     0.1000    0.0119
     5        0.8206             nan     0.1000    0.0105
     6        0.8042             nan     0.1000    0.0080
     7        0.7902             nan     0.1000    0.0074
     8        0.7778             nan     0.1000    0.0052
     9        0.7671             nan     0.1000    0.0045
    10        0.7555             nan     0.1000    0.0054
    20        0.6904             nan     0.1000    0.0011
    40        0.6387             nan     0.1000    0.0001
    60        0.6169             nan     0.1000   -0.0001
    80        0.6001             nan     0.1000   -0.0001
   100        0.5920             nan     0.1000   -0.0002
   120        0.5863             nan     0.1000   -0.0000
   140        0.5806             nan     0.1000   -0.0003
   150        0.5794             nan     0.1000   -0.0003

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9511             nan     0.1000    0.0134
     2        0.9296             nan     0.1000    0.0106
     3        0.9132             nan     0.1000    0.0084
     4        0.8998             nan     0.1000    0.0068
     5        0.8887             nan     0.1000    0.0054
     6        0.8761             nan     0.1000    0.0060
     7        0.8673             nan     0.1000    0.0046
     8        0.8568             nan     0.1000    0.0045
     9        0.8496             nan     0.1000    0.0037
    10        0.8423             nan     0.1000    0.0037
    20        0.7903             nan     0.1000    0.0017
    40        0.7357             nan     0.1000    0.0006
    60        0.7040             nan     0.1000    0.0007
    80        0.6862             nan     0.1000    0.0005
   100        0.6705             nan     0.1000    0.0003
   120        0.6606             nan     0.1000   -0.0001
   140        0.6535             nan     0.1000   -0.0000
   150        0.6508             nan     0.1000   -0.0000

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9405             nan     0.1000    0.0205
     2        0.9103             nan     0.1000    0.0159
     3        0.8852             nan     0.1000    0.0123
     4        0.8631             nan     0.1000    0.0106
     5        0.8454             nan     0.1000    0.0085
     6        0.8291             nan     0.1000    0.0079
     7        0.8169             nan     0.1000    0.0049
     8        0.8046             nan     0.1000    0.0052
     9        0.7947             nan     0.1000    0.0049
    10        0.7854             nan     0.1000    0.0041
    20        0.7281             nan     0.1000    0.0015
    40        0.6689             nan     0.1000    0.0003
    60        0.6457             nan     0.1000   -0.0001
    80        0.6294             nan     0.1000   -0.0001
   100        0.6164             nan     0.1000   -0.0001
   120        0.6084             nan     0.1000   -0.0001
   140        0.6018             nan     0.1000   -0.0006
   150        0.5993             nan     0.1000   -0.0003

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9348             nan     0.1000    0.0232
     2        0.8976             nan     0.1000    0.0181
     3        0.8665             nan     0.1000    0.0155
     4        0.8418             nan     0.1000    0.0114
     5        0.8198             nan     0.1000    0.0096
     6        0.8003             nan     0.1000    0.0087
     7        0.7837             nan     0.1000    0.0072
     8        0.7705             nan     0.1000    0.0059
     9        0.7576             nan     0.1000    0.0062
    10        0.7469             nan     0.1000    0.0048
    20        0.6837             nan     0.1000    0.0025
    40        0.6320             nan     0.1000    0.0003
    60        0.6095             nan     0.1000   -0.0001
    80        0.5977             nan     0.1000   -0.0001
   100        0.5861             nan     0.1000   -0.0001
   120        0.5800             nan     0.1000   -0.0001
   140        0.5756             nan     0.1000   -0.0002
   150        0.5740             nan     0.1000   -0.0002

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9549             nan     0.1000    0.0124
     2        0.9343             nan     0.1000    0.0099
     3        0.9177             nan     0.1000    0.0075
     4        0.9060             nan     0.1000    0.0060
     5        0.8935             nan     0.1000    0.0061
     6        0.8851             nan     0.1000    0.0034
     7        0.8751             nan     0.1000    0.0053
     8        0.8659             nan     0.1000    0.0042
     9        0.8594             nan     0.1000    0.0025
    10        0.8521             nan     0.1000    0.0035
    20        0.7979             nan     0.1000    0.0024
    40        0.7413             nan     0.1000    0.0009
    60        0.7105             nan     0.1000    0.0008
    80        0.6901             nan     0.1000    0.0002
   100        0.6776             nan     0.1000    0.0000
   120        0.6663             nan     0.1000   -0.0001
   140        0.6594             nan     0.1000    0.0000
   150        0.6579             nan     0.1000   -0.0002

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9418             nan     0.1000    0.0194
     2        0.9106             nan     0.1000    0.0148
     3        0.8857             nan     0.1000    0.0103
     4        0.8637             nan     0.1000    0.0102
     5        0.8475             nan     0.1000    0.0073
     6        0.8321             nan     0.1000    0.0079
     7        0.8185             nan     0.1000    0.0062
     8        0.8074             nan     0.1000    0.0050
     9        0.7961             nan     0.1000    0.0050
    10        0.7879             nan     0.1000    0.0034
    20        0.7331             nan     0.1000    0.0020
    40        0.6753             nan     0.1000    0.0004
    60        0.6445             nan     0.1000   -0.0002
    80        0.6280             nan     0.1000    0.0003
   100        0.6176             nan     0.1000   -0.0002
   120        0.6101             nan     0.1000    0.0000
   140        0.6043             nan     0.1000    0.0000
   150        0.6016             nan     0.1000   -0.0001

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9334             nan     0.1000    0.0219
     2        0.8962             nan     0.1000    0.0175
     3        0.8658             nan     0.1000    0.0146
     4        0.8419             nan     0.1000    0.0117
     5        0.8233             nan     0.1000    0.0094
     6        0.8051             nan     0.1000    0.0085
     7        0.7887             nan     0.1000    0.0067
     8        0.7761             nan     0.1000    0.0058
     9        0.7654             nan     0.1000    0.0052
    10        0.7544             nan     0.1000    0.0047
    20        0.6865             nan     0.1000    0.0016
    40        0.6314             nan     0.1000    0.0014
    60        0.6079             nan     0.1000    0.0007
    80        0.5936             nan     0.1000    0.0008
   100        0.5841             nan     0.1000    0.0007
   120        0.5789             nan     0.1000   -0.0001
   140        0.5749             nan     0.1000   -0.0004
   150        0.5724             nan     0.1000   -0.0001

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9538             nan     0.1000    0.0121
     2        0.9339             nan     0.1000    0.0092
     3        0.9191             nan     0.1000    0.0073
     4        0.9048             nan     0.1000    0.0059
     5        0.8922             nan     0.1000    0.0061
     6        0.8819             nan     0.1000    0.0049
     7        0.8731             nan     0.1000    0.0042
     8        0.8635             nan     0.1000    0.0040
     9        0.8567             nan     0.1000    0.0027
    10        0.8497             nan     0.1000    0.0034
    20        0.7960             nan     0.1000    0.0017
    40        0.7410             nan     0.1000    0.0006
    60        0.7096             nan     0.1000    0.0006
    80        0.6900             nan     0.1000    0.0003
   100        0.6767             nan     0.1000    0.0000
   120        0.6674             nan     0.1000   -0.0002
   140        0.6598             nan     0.1000   -0.0001
   150        0.6571             nan     0.1000    0.0000

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9399             nan     0.1000    0.0193
     2        0.9120             nan     0.1000    0.0132
     3        0.8861             nan     0.1000    0.0133
     4        0.8643             nan     0.1000    0.0101
     5        0.8462             nan     0.1000    0.0089
     6        0.8313             nan     0.1000    0.0074
     7        0.8183             nan     0.1000    0.0062
     8        0.8067             nan     0.1000    0.0057
     9        0.7975             nan     0.1000    0.0043
    10        0.7884             nan     0.1000    0.0044
    20        0.7331             nan     0.1000    0.0019
    40        0.6824             nan     0.1000    0.0007
    60        0.6537             nan     0.1000    0.0003
    80        0.6338             nan     0.1000   -0.0000
   100        0.6236             nan     0.1000   -0.0002
   120        0.6143             nan     0.1000    0.0001
   140        0.6095             nan     0.1000    0.0001
   150        0.6039             nan     0.1000    0.0004

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9324             nan     0.1000    0.0232
     2        0.8981             nan     0.1000    0.0175
     3        0.8693             nan     0.1000    0.0136
     4        0.8443             nan     0.1000    0.0124
     5        0.8241             nan     0.1000    0.0095
     6        0.8070             nan     0.1000    0.0090
     7        0.7921             nan     0.1000    0.0070
     8        0.7796             nan     0.1000    0.0054
     9        0.7661             nan     0.1000    0.0062
    10        0.7553             nan     0.1000    0.0050
    20        0.6895             nan     0.1000    0.0009
    40        0.6392             nan     0.1000    0.0007
    60        0.6185             nan     0.1000    0.0002
    80        0.6039             nan     0.1000    0.0001
   100        0.5938             nan     0.1000   -0.0002
   120        0.5877             nan     0.1000   -0.0000
   140        0.5834             nan     0.1000   -0.0001
   150        0.5818             nan     0.1000   -0.0002

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9342             nan     0.1000    0.0227
     2        0.8990             nan     0.1000    0.0170
     3        0.8688             nan     0.1000    0.0149
     4        0.8425             nan     0.1000    0.0110
     5        0.8207             nan     0.1000    0.0109
     6        0.8023             nan     0.1000    0.0073
     7        0.7867             nan     0.1000    0.0076
     8        0.7726             nan     0.1000    0.0062
     9        0.7612             nan     0.1000    0.0055
    10        0.7513             nan     0.1000    0.0045
    20        0.6866             nan     0.1000    0.0015
    40        0.6432             nan     0.1000    0.0003
    60        0.6171             nan     0.1000   -0.0002
    80        0.6020             nan     0.1000   -0.0001
   100        0.5941             nan     0.1000    0.0001
   120        0.5899             nan     0.1000   -0.0003
   140        0.5849             nan     0.1000   -0.0001
   150        0.5837             nan     0.1000   -0.0003
Warning in train.default(x, y, weights = w, ...): The metric "Accuracy" was not
in the result set. ROC will be used instead.
Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9535             nan     0.1000    0.0122
     2        0.9353             nan     0.1000    0.0097
     3        0.9198             nan     0.1000    0.0078
     4        0.9069             nan     0.1000    0.0064
     5        0.8931             nan     0.1000    0.0065
     6        0.8820             nan     0.1000    0.0052
     7        0.8737             nan     0.1000    0.0042
     8        0.8658             nan     0.1000    0.0023
     9        0.8580             nan     0.1000    0.0034
    10        0.8492             nan     0.1000    0.0040
    20        0.7949             nan     0.1000    0.0022
    40        0.7377             nan     0.1000    0.0008
    60        0.7045             nan     0.1000    0.0006
    80        0.6847             nan     0.1000    0.0004
   100        0.6712             nan     0.1000   -0.0001
   120        0.6602             nan     0.1000    0.0002
   140        0.6531             nan     0.1000    0.0001
   150        0.6503             nan     0.1000    0.0001

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9399             nan     0.1000    0.0195
     2        0.9082             nan     0.1000    0.0155
     3        0.8845             nan     0.1000    0.0121
     4        0.8637             nan     0.1000    0.0098
     5        0.8464             nan     0.1000    0.0080
     6        0.8299             nan     0.1000    0.0070
     7        0.8178             nan     0.1000    0.0059
     8        0.8077             nan     0.1000    0.0046
     9        0.7972             nan     0.1000    0.0050
    10        0.7889             nan     0.1000    0.0044
    20        0.7311             nan     0.1000    0.0016
    40        0.6790             nan     0.1000    0.0009
    60        0.6466             nan     0.1000    0.0001
    80        0.6307             nan     0.1000   -0.0000
   100        0.6183             nan     0.1000   -0.0001
   120        0.6069             nan     0.1000   -0.0001
   140        0.6015             nan     0.1000   -0.0002
   150        0.5998             nan     0.1000   -0.0001

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9339             nan     0.1000    0.0217
     2        0.8964             nan     0.1000    0.0174
     3        0.8657             nan     0.1000    0.0144
     4        0.8410             nan     0.1000    0.0114
     5        0.8199             nan     0.1000    0.0099
     6        0.8024             nan     0.1000    0.0080
     7        0.7877             nan     0.1000    0.0067
     8        0.7748             nan     0.1000    0.0060
     9        0.7632             nan     0.1000    0.0052
    10        0.7525             nan     0.1000    0.0047
    20        0.6859             nan     0.1000    0.0019
    40        0.6374             nan     0.1000    0.0003
    60        0.6107             nan     0.1000    0.0001
    80        0.5985             nan     0.1000   -0.0001
   100        0.5893             nan     0.1000    0.0001
   120        0.5813             nan     0.1000   -0.0002
   140        0.5777             nan     0.1000   -0.0001
   150        0.5762             nan     0.1000   -0.0001

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9511             nan     0.1000    0.0131
     2        0.9315             nan     0.1000    0.0104
     3        0.9149             nan     0.1000    0.0085
     4        0.9026             nan     0.1000    0.0052
     5        0.8888             nan     0.1000    0.0070
     6        0.8781             nan     0.1000    0.0057
     7        0.8681             nan     0.1000    0.0047
     8        0.8601             nan     0.1000    0.0037
     9        0.8518             nan     0.1000    0.0044
    10        0.8453             nan     0.1000    0.0031
    20        0.7942             nan     0.1000    0.0013
    40        0.7379             nan     0.1000    0.0005
    60        0.7063             nan     0.1000    0.0000
    80        0.6872             nan     0.1000    0.0004
   100        0.6731             nan     0.1000    0.0001
   120        0.6638             nan     0.1000   -0.0001
   140        0.6574             nan     0.1000   -0.0000
   150        0.6546             nan     0.1000    0.0000

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9397             nan     0.1000    0.0201
     2        0.9071             nan     0.1000    0.0148
     3        0.8813             nan     0.1000    0.0117
     4        0.8591             nan     0.1000    0.0100
     5        0.8408             nan     0.1000    0.0086
     6        0.8270             nan     0.1000    0.0065
     7        0.8139             nan     0.1000    0.0061
     8        0.8042             nan     0.1000    0.0046
     9        0.7936             nan     0.1000    0.0051
    10        0.7860             nan     0.1000    0.0031
    20        0.7278             nan     0.1000    0.0019
    40        0.6739             nan     0.1000    0.0010
    60        0.6461             nan     0.1000    0.0002
    80        0.6301             nan     0.1000    0.0002
   100        0.6171             nan     0.1000    0.0009
   120        0.6103             nan     0.1000   -0.0004
   140        0.6047             nan     0.1000   -0.0000
   150        0.6011             nan     0.1000   -0.0000

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9338             nan     0.1000    0.0225
     2        0.8945             nan     0.1000    0.0175
     3        0.8668             nan     0.1000    0.0134
     4        0.8420             nan     0.1000    0.0117
     5        0.8216             nan     0.1000    0.0104
     6        0.8025             nan     0.1000    0.0090
     7        0.7862             nan     0.1000    0.0068
     8        0.7730             nan     0.1000    0.0064
     9        0.7606             nan     0.1000    0.0057
    10        0.7505             nan     0.1000    0.0043
    20        0.6852             nan     0.1000    0.0022
    40        0.6338             nan     0.1000    0.0008
    60        0.6085             nan     0.1000   -0.0000
    80        0.5987             nan     0.1000    0.0000
   100        0.5896             nan     0.1000   -0.0004
   120        0.5833             nan     0.1000   -0.0001
   140        0.5787             nan     0.1000   -0.0004
   150        0.5758             nan     0.1000   -0.0003

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9535             nan     0.1000    0.0130
     2        0.9331             nan     0.1000    0.0103
     3        0.9168             nan     0.1000    0.0083
     4        0.9034             nan     0.1000    0.0067
     5        0.8918             nan     0.1000    0.0046
     6        0.8812             nan     0.1000    0.0056
     7        0.8725             nan     0.1000    0.0046
     8        0.8648             nan     0.1000    0.0038
     9        0.8556             nan     0.1000    0.0044
    10        0.8487             nan     0.1000    0.0026
    20        0.7960             nan     0.1000    0.0013
    40        0.7390             nan     0.1000    0.0007
    60        0.7087             nan     0.1000    0.0005
    80        0.6902             nan     0.1000   -0.0001
   100        0.6748             nan     0.1000    0.0000
   120        0.6637             nan     0.1000    0.0001
   140        0.6574             nan     0.1000   -0.0002
   150        0.6549             nan     0.1000   -0.0000

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9394             nan     0.1000    0.0200
     2        0.9087             nan     0.1000    0.0155
     3        0.8841             nan     0.1000    0.0123
     4        0.8645             nan     0.1000    0.0097
     5        0.8470             nan     0.1000    0.0078
     6        0.8323             nan     0.1000    0.0073
     7        0.8178             nan     0.1000    0.0065
     8        0.8055             nan     0.1000    0.0065
     9        0.7944             nan     0.1000    0.0048
    10        0.7884             nan     0.1000    0.0025
    20        0.7290             nan     0.1000    0.0019
    40        0.6782             nan     0.1000    0.0005
    60        0.6519             nan     0.1000    0.0004
    80        0.6284             nan     0.1000    0.0002
   100        0.6185             nan     0.1000   -0.0000
   120        0.6112             nan     0.1000   -0.0001
   140        0.6060             nan     0.1000   -0.0002
   150        0.6036             nan     0.1000    0.0000

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9349             nan     0.1000    0.0237
     2        0.8977             nan     0.1000    0.0186
     3        0.8697             nan     0.1000    0.0128
     4        0.8430             nan     0.1000    0.0129
     5        0.8211             nan     0.1000    0.0112
     6        0.8014             nan     0.1000    0.0088
     7        0.7866             nan     0.1000    0.0068
     8        0.7733             nan     0.1000    0.0054
     9        0.7614             nan     0.1000    0.0057
    10        0.7511             nan     0.1000    0.0050
    20        0.6861             nan     0.1000    0.0008
    40        0.6347             nan     0.1000    0.0004
    60        0.6114             nan     0.1000    0.0001
    80        0.5967             nan     0.1000   -0.0003
   100        0.5885             nan     0.1000   -0.0002
   120        0.5819             nan     0.1000   -0.0002
   140        0.5771             nan     0.1000   -0.0003
   150        0.5742             nan     0.1000   -0.0004

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9538             nan     0.1000    0.0127
     2        0.9328             nan     0.1000    0.0098
     3        0.9174             nan     0.1000    0.0078
     4        0.9040             nan     0.1000    0.0063
     5        0.8905             nan     0.1000    0.0063
     6        0.8798             nan     0.1000    0.0052
     7        0.8700             nan     0.1000    0.0049
     8        0.8619             nan     0.1000    0.0038
     9        0.8520             nan     0.1000    0.0045
    10        0.8457             nan     0.1000    0.0029
    20        0.7929             nan     0.1000    0.0011
    40        0.7399             nan     0.1000    0.0008
    60        0.7091             nan     0.1000    0.0006
    80        0.6895             nan     0.1000    0.0001
   100        0.6778             nan     0.1000    0.0004
   120        0.6678             nan     0.1000    0.0000
   140        0.6615             nan     0.1000    0.0002
   150        0.6589             nan     0.1000   -0.0001

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9387             nan     0.1000    0.0199
     2        0.9066             nan     0.1000    0.0160
     3        0.8810             nan     0.1000    0.0123
     4        0.8607             nan     0.1000    0.0103
     5        0.8439             nan     0.1000    0.0083
     6        0.8291             nan     0.1000    0.0075
     7        0.8167             nan     0.1000    0.0052
     8        0.8060             nan     0.1000    0.0051
     9        0.7969             nan     0.1000    0.0038
    10        0.7903             nan     0.1000    0.0026
    20        0.7268             nan     0.1000    0.0017
    40        0.6735             nan     0.1000    0.0009
    60        0.6443             nan     0.1000    0.0003
    80        0.6270             nan     0.1000   -0.0001
   100        0.6168             nan     0.1000   -0.0001
   120        0.6106             nan     0.1000   -0.0001
   140        0.6056             nan     0.1000   -0.0006
   150        0.6029             nan     0.1000   -0.0002

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9374             nan     0.1000    0.0204
     2        0.8989             nan     0.1000    0.0185
     3        0.8675             nan     0.1000    0.0151
     4        0.8428             nan     0.1000    0.0125
     5        0.8202             nan     0.1000    0.0108
     6        0.8041             nan     0.1000    0.0084
     7        0.7873             nan     0.1000    0.0079
     8        0.7735             nan     0.1000    0.0063
     9        0.7610             nan     0.1000    0.0061
    10        0.7521             nan     0.1000    0.0040
    20        0.6864             nan     0.1000    0.0014
    40        0.6356             nan     0.1000    0.0001
    60        0.6128             nan     0.1000    0.0004
    80        0.5997             nan     0.1000   -0.0001
   100        0.5918             nan     0.1000   -0.0001
   120        0.5849             nan     0.1000   -0.0003
   140        0.5804             nan     0.1000   -0.0002
   150        0.5778             nan     0.1000    0.0000

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9538             nan     0.1000    0.0123
     2        0.9353             nan     0.1000    0.0098
     3        0.9193             nan     0.1000    0.0079
     4        0.9062             nan     0.1000    0.0056
     5        0.8933             nan     0.1000    0.0064
     6        0.8812             nan     0.1000    0.0050
     7        0.8734             nan     0.1000    0.0040
     8        0.8646             nan     0.1000    0.0045
     9        0.8569             nan     0.1000    0.0033
    10        0.8505             nan     0.1000    0.0027
    20        0.8004             nan     0.1000    0.0013
    40        0.7450             nan     0.1000    0.0010
    60        0.7149             nan     0.1000   -0.0000
    80        0.6969             nan     0.1000    0.0005
   100        0.6843             nan     0.1000    0.0000
   120        0.6741             nan     0.1000   -0.0002
   140        0.6676             nan     0.1000   -0.0001
   150        0.6641             nan     0.1000    0.0001

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9373             nan     0.1000    0.0199
     2        0.9090             nan     0.1000    0.0146
     3        0.8845             nan     0.1000    0.0118
     4        0.8647             nan     0.1000    0.0088
     5        0.8465             nan     0.1000    0.0084
     6        0.8323             nan     0.1000    0.0067
     7        0.8205             nan     0.1000    0.0058
     8        0.8113             nan     0.1000    0.0039
     9        0.8010             nan     0.1000    0.0047
    10        0.7912             nan     0.1000    0.0041
    20        0.7337             nan     0.1000    0.0018
    40        0.6861             nan     0.1000    0.0002
    60        0.6547             nan     0.1000    0.0004
    80        0.6394             nan     0.1000   -0.0002
   100        0.6263             nan     0.1000   -0.0001
   120        0.6183             nan     0.1000   -0.0000
   140        0.6104             nan     0.1000   -0.0002
   150        0.6090             nan     0.1000   -0.0002

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9345             nan     0.1000    0.0221
     2        0.8993             nan     0.1000    0.0162
     3        0.8714             nan     0.1000    0.0141
     4        0.8468             nan     0.1000    0.0115
     5        0.8273             nan     0.1000    0.0089
     6        0.8083             nan     0.1000    0.0094
     7        0.7926             nan     0.1000    0.0068
     8        0.7782             nan     0.1000    0.0059
     9        0.7665             nan     0.1000    0.0051
    10        0.7567             nan     0.1000    0.0043
    20        0.6949             nan     0.1000    0.0012
    40        0.6483             nan     0.1000    0.0004
    60        0.6191             nan     0.1000    0.0003
    80        0.6081             nan     0.1000    0.0000
   100        0.5983             nan     0.1000   -0.0005
   120        0.5909             nan     0.1000    0.0000
   140        0.5862             nan     0.1000   -0.0002
   150        0.5842             nan     0.1000   -0.0004

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9309             nan     0.1000    0.0233
     2        0.8955             nan     0.1000    0.0174
     3        0.8667             nan     0.1000    0.0136
     4        0.8406             nan     0.1000    0.0130
     5        0.8201             nan     0.1000    0.0105
     6        0.8029             nan     0.1000    0.0086
     7        0.7892             nan     0.1000    0.0065
     8        0.7756             nan     0.1000    0.0061
     9        0.7644             nan     0.1000    0.0047
    10        0.7524             nan     0.1000    0.0053
    20        0.6864             nan     0.1000    0.0014
    40        0.6406             nan     0.1000    0.0010
    60        0.6186             nan     0.1000    0.0002
    80        0.6040             nan     0.1000    0.0001
   100        0.5942             nan     0.1000    0.0000
   120        0.5880             nan     0.1000   -0.0002
   140        0.5835             nan     0.1000   -0.0003
   150        0.5806             nan     0.1000   -0.0000
Loading required package: pROC
Type 'citation("pROC")' for a citation.

Attaching package: 'pROC'
The following objects are masked from 'package:stats':

    cov, smooth, var
Loading required package: PRROC
Warning: package 'PRROC' was built under R version 4.3.3
Loading required package: rlang
Warning: package 'rlang' was built under R version 4.3.3

Attaching package: 'rlang'
The following objects are masked from 'package:purrr':

    %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
    flatten_raw, invoke, splice
The following object is masked from 'package:backports':

    %||%
The following object is masked from 'package:data.table':

    :=
Loading required package: ModelMetrics

Attaching package: 'ModelMetrics'
The following object is masked from 'package:pROC':

    auc
The following objects are masked from 'package:caret':

    confusionMatrix, precision, recall, sensitivity, specificity
The following object is masked from 'package:base':

    kappa
Setting levels: control = N, case = Y
Setting direction: controls < cases
Preparation of a new explainer is initiated
  -> model label       :  gbm 
  -> data              :  5336  rows  51  cols 
  -> target variable   :  0  values 
  -> target variable   :  length of 'y' is different than number of rows in 'data' (  WARNING  ) 
  -> predict function  :  yhat.train  will be used (  default  )
  -> predicted values  :  No value for predict function target column. (  default  )
  -> model_info        :  package caret , ver. 7.0.1 , task classification (  default  ) 
  -> predicted values  :  numerical, min =  0.011907 , mean =  0.1909481 , max =  0.9730879  
  -> residual function :  difference between y and yhat (  default  )
Warning in min(residuals): no non-missing arguments to min; returning Inf
Warning in max(residuals): no non-missing arguments to max; returning -Inf
  -> residuals         :  numerical, min =  Inf , mean =  NaN , max =  -Inf  
  A new explainer has been created!  

Visualize results

Here we plot the feature importance score of each top predictor identified by MLHO.

First, map the concept codes to their human-readable descriptions. This is why we kept the fourth column, DESCRIPTION, in dbmart.

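features <- model.test$features
# concept code = text before the first "_"; label = text after the last "_"
features$features.new <- sub("_.*", "", features$features)
features$label <- sub(".*_", "", features$features)
# one row per concept code, with its human-readable description
dbmart.concepts <- dbmart[!duplicated(dbmart$phenx), c("phenx", "DESCRIPTION")]
# inner join: features whose concept code has no description are dropped
mlho.features <- data.frame(merge(features, dbmart.concepts, by.x = "features.new", by.y = "phenx"))
mlho.features$feature.desc <- paste(mlho.features$DESCRIPTION, mlho.features$label, sep = "_")
# interactive, filterable table of the translated features and their importance
datatable(dplyr::select(mlho.features, feature.desc, `Feature importance` = Overall),
          options = list(pageLength = 5), filter = 'bottom')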
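The two `sub()` calls split each feature name at its underscores. As a minimal sketch of what they do, the snippet below applies the same regexes and the same inner `merge()` to a few feature names in the `<concept code>_<n>_<suffix>` form seen in the SHAP output further down. The descriptions and the toy lookup table are made up; they only stand in for `dbmart.concepts`.

```r
# Hypothetical feature names in the <concept code>_<n>_<suffix> form
feature_names <- c("1000126_1_last", "104326007_1_past", "15777000_1_last")

# Text before the first "_": the concept code used to join with dbmart$phenx
concept_code <- sub("_.*", "", feature_names)
# Text after the last "_": the suffix appended to the description
suffix <- sub(".*_", "", feature_names)

# Toy stand-in for dbmart.concepts (descriptions are made up)
concepts <- data.frame(phenx = c("1000126", "15777000"),
                       DESCRIPTION = c("Concept A", "Concept B"),
                       stringsAsFactors = FALSE)

features <- data.frame(features = feature_names,
                       features.new = concept_code,
                       label = suffix,
                       stringsAsFactors = FALSE)

# Inner join (merge's default): codes without a description are dropped
merged <- merge(features, concepts, by.x = "features.new", by.y = "phenx")
merged$feature.desc <- paste(merged$DESCRIPTION, merged$label, sep = "_")
print(merged[, c("features", "feature.desc")])
```

Two things are worth noticing: the middle field (the `1` in `1000126_1_last`) is discarded by both regexes, and because `merge()` defaults to an inner join, a feature whose concept code has no description (here `104326007`) silently disappears from the table.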

Now visualize the feature importance scores as a lollipop chart:

(plot <- ggplot(mlho.features) +
    # one segment per feature, from 0 to its importance score
    geom_segment(
      aes(y = 0,
          x = reorder(feature.desc, Overall),
          yend = Overall,
          xend = feature.desc),
      linewidth = 0.5, alpha = 0.5) +
    # a point at the end of each segment
    geom_point(
      aes(x = reorder(feature.desc, Overall), y = Overall),
      alpha = 0.5, size = 2, color = "red") +
    theme_minimal() +
    coord_flip() +
    labs(y = "Feature importance", x = ""))
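`reorder(feature.desc, Overall)` is what sorts the chart. A small sketch with made-up importance scores shows the factor levels it produces: ascending by `Overall`, so after `coord_flip()` the most important feature is drawn at the top.

```r
# Made-up importance scores for three features
imp <- data.frame(feature.desc = c("Concept A_last", "Concept B_past", "Concept C_last"),
                  Overall = c(40, 100, 15),
                  stringsAsFactors = FALSE)

# reorder() turns feature.desc into a factor whose levels are sorted by the
# mean Overall per level (ascending); with one row per feature that is Overall itself
ordered_desc <- reorder(imp$feature.desc, imp$Overall)
print(levels(ordered_desc))
```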

SHAP values and visualization

When calSHAP = TRUE is set, MLHO also computes SHAP values, which explain how each feature contributes to an individual model prediction, as shown below.
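For intuition about what a single `contribution` value represents, here is a brute-force computation of exact Shapley values for a hypothetical three-feature model, using one fixed baseline instance as the reference. This is a simplified illustration, not how the values below are produced: the explainer messages above suggest MLHO uses DALEX, which (as we understand it) estimates SHAP values by averaging break-down contributions over random feature orderings against the explainer's background data.

```r
# Toy model (hypothetical): a logistic score over three binary features
f <- function(x) plogis(-1 + 0.8 * x[1] + 0.5 * x[2] - 0.3 * x[3] + 0.4 * x[1] * x[2])

x <- c(1, 1, 0)         # instance to explain
baseline <- c(0, 0, 0)  # hypothetical reference instance

# Value of coalition S: features in S take the instance's values, all others the baseline's
v <- function(S) {
  z <- baseline
  z[S] <- x[S]
  f(z)
}

# Exact Shapley values by enumerating every coalition of the other features
shapley <- function(p) {
  phi <- numeric(p)
  for (j in seq_len(p)) {
    others <- setdiff(seq_len(p), j)
    for (k in 0:length(others)) {
      # Shapley weight |S|! (p - |S| - 1)! / p! for coalitions of size k
      w <- factorial(k) * factorial(p - k - 1) / factorial(p)
      idx_sets <- if (k == 0) list(integer(0)) else combn(length(others), k, simplify = FALSE)
      for (idx in idx_sets) {
        S <- others[idx]
        # marginal contribution of feature j when added to coalition S
        phi[j] <- phi[j] + w * (v(c(S, j)) - v(S))
      }
    }
  }
  phi
}

phi <- shapley(length(x))
print(round(phi, 4))
# Efficiency: the contributions add up to f(x) - f(baseline)
print(all.equal(sum(phi), f(x) - f(baseline)))
```

The last check is the efficiency property: the contributions sum to the difference between the explained prediction and the reference prediction. The third feature has the same value in `x` and `baseline`, so its contribution is exactly zero.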

shap_value <- model.test$shap
head(as.data.frame(shap_value))
                             variable  contribution    variable_name
1000126_1_last     1000126_1_last = 0 -0.0055755209   1000126_1_last
1000126_1_past     1000126_1_past = 0  0.0029849985   1000126_1_past
104326007_1_past 104326007_1_past = 0  0.0002794491 104326007_1_past
104375008_1_past 104375008_1_past = 0  0.0000000000 104375008_1_past
117010004_1_past 117010004_1_past = 0  0.0000000000 117010004_1_past
15777000_1_last   15777000_1_last = 0 -0.0039708992  15777000_1_last
                 variable_value sign label B
1000126_1_last                0   -1   gbm 0
1000126_1_past                0    1   gbm 0
104326007_1_past              0    1   gbm 0
104375008_1_past              0    0   gbm 0
117010004_1_past              0    0   gbm 0
15777000_1_last               0   -1   gbm 0
plot(shap_value)

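# lookup table from full feature names to readable descriptions, renamed to match dbmart's columns
dbmart.concepts.new <- mlho.features %>% dplyr::select(features, feature.desc)
colnames(dbmart.concepts.new) <- c("phenx", "DESCRIPTION")
# waterfall plot of the top 6 contributions
mshapviz(shap_value, dbmart.concepts.new, plot_type = "waterfall", top_n = 6, num = 1)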
Selecting by abs_S
Selecting by abs_S
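The `Selecting by abs_S` messages are printed by dplyr's `top_n()`, which suggests that `mshapviz` ranks features by absolute SHAP value before drawing the waterfall. We have not checked its source, so the base R sketch below only illustrates the idea on made-up contributions with the same `variable_name` and `contribution` columns as the shap output above: keep the `n_top` largest absolute contributions, lump the rest into one bar, and stack the bars starting from a hypothetical baseline prediction.

```r
# Made-up SHAP contributions for one patient
shap_df <- data.frame(
  variable_name = c("f1", "f2", "f3", "f4", "f5"),
  contribution  = c(-0.0056, 0.0030, 0.0003, -0.0040, 0.0000),
  stringsAsFactors = FALSE
)
baseline <- 0.19  # hypothetical mean model prediction (start of the waterfall)
n_top <- 3

# Rank by absolute contribution and keep the n_top largest
ranked <- shap_df[order(-abs(shap_df$contribution)), ]
top <- head(ranked, n_top)
# Lump the remaining features into one "other" bar so the total is preserved
rest <- sum(ranked$contribution[-seq_len(n_top)])
bars <- rbind(top, data.frame(variable_name = "other", contribution = rest,
                              stringsAsFactors = FALSE))

# Waterfall: each bar starts where the previous one ended
bars$end   <- baseline + cumsum(bars$contribution)
bars$start <- c(baseline, head(bars$end, -1))
print(bars)
```

Because the leftover features are summed rather than dropped, the last bar still ends at the baseline plus the sum of all contributions.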

mshapviz_all(shap_value, dbmart.concepts.new, top_n = 5)
Selecting by mean_S
Selecting by mean_S
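`mshapviz_all` summarizes SHAP values across instances (its message mentions `mean_S`). One common way to do this, shown here as a hedged sketch rather than MLHO's exact computation, is to rank features by their mean absolute SHAP value over all explained patients. The long-format data frame below is made up.

```r
# Made-up SHAP contributions for 3 patients x 3 features, in long format
shap_long <- data.frame(
  patient = rep(1:3, each = 3),
  variable_name = rep(c("f1", "f2", "f3"), times = 3),
  contribution = c(0.10, -0.02, 0.00,
                   -0.08, 0.03, 0.01,
                   0.12, -0.01, -0.02),
  stringsAsFactors = FALSE
)

# Global importance: mean absolute contribution per feature across patients
global_imp <- aggregate(contribution ~ variable_name, data = shap_long,
                        FUN = function(s) mean(abs(s)))
names(global_imp)[2] <- "mean_abs_shap"
global_imp <- global_imp[order(-global_imp$mean_abs_shap), ]

n_top <- 2
print(head(global_imp, n_top))
```

Taking absolute values before averaging keeps large positive and large negative contributions from cancelling out, so a feature that pushes some predictions up and others down still ranks as important.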