INTRODUCTION
Dealing with missing data and resolving ambiguity in the input is an important aspect of creating analysis data sets. Typical examples include the imputation of treatment administration times that are not explicitly captured in the input, substitution of missing demographic data with best guesses from the overall population, or making assumptions about pharmacokinetic observations that are below the detection limit of the method.
In most cases, there are different options for resolving such data inconsistencies, and in fact, analysts use individual imputation strategies based on the scientific question, certain conventions or even personal preferences. The nif package accounts for these different approaches by giving the data programmer the choice of different pre-defined imputation rule sets, or the option to define custom imputation rules.
This vignette outlines the inner workings of the two functions that
are central to the dataset generation pipeline, i.e.,
add_administration() and add_observation(),
including how data imputations can be adjusted.
BASELINE AND DEMOGRAPHIC PARAMETERS
The subjects’ age is derived from the ‘AGE’ field in the ‘DM’ domain of the input SDTM data. If this column is missing, age is derived as the difference between ‘RFSTDTC’ and ‘BRTHDTC’.
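The fallback derivation can be illustrated with a minimal base R sketch. The `dm` data frame below is hypothetical, and the conversion to completed years (365.25 days per year, rounded down) is an assumption made for illustration; the rounding used by nif itself may differ.

```r
# Hypothetical DM excerpt without an AGE column
dm <- data.frame(
  USUBJID = c("001", "002"),
  RFSTDTC = c("2000-12-31T10:18", "2001-01-05T09:00"),
  BRTHDTC = c("1960-06-15", "1975-01-05"),
  stringsAsFactors = FALSE
)

# Use only the date part (first 10 characters) of the ISO 8601 strings
ref_date   <- as.Date(substr(dm$RFSTDTC, 1, 10))
birth_date <- as.Date(substr(dm$BRTHDTC, 1, 10))

# Age in completed years (assumption: 365.25 days per year, rounded down)
dm$AGE <- floor(as.numeric(ref_date - birth_date) / 365.25)
```

With these example values, the derived ages are 40 and 26 years.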
Besides age, the subjects’ height and body weight are included, if
possible, as standard fields in the nif object. Both values are derived
from the ‘VS’ domain of the SDTM input, where the baseline time point is
identified as either VISIT == "SCREENING" or
VSBLFL == "Y". If multiple measurements fulfilling these
conditions are found for a given subject, their mean value is used.
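As a rough illustration of this baseline rule, the following base R sketch selects the qualifying rows and averages them per subject. The `vs` data frame is made up, and the use of the standard SDTM columns ‘VSTESTCD’ and ‘VSSTRESN’ is an assumption; the code is not taken from the nif package.

```r
# Hypothetical VS excerpt
vs <- data.frame(
  USUBJID  = c("001", "001", "001", "002", "002"),
  VSTESTCD = c("WEIGHT", "WEIGHT", "WEIGHT", "WEIGHT", "HEIGHT"),
  VSSTRESN = c(80, 82, 90, 65, 170),
  VISIT    = c("SCREENING", "DAY 1", "DAY 8", "SCREENING", "SCREENING"),
  VSBLFL   = c("", "Y", "", "Y", "Y"),
  stringsAsFactors = FALSE
)

# Baseline rows: screening visit or baseline flag set
baseline <- vs[vs$VISIT == "SCREENING" | vs$VSBLFL == "Y", ]

# Mean value per subject and test code
bl_mean <- aggregate(VSSTRESN ~ USUBJID + VSTESTCD, data = baseline, FUN = mean)
```

Subject 001 has two qualifying weight measurements (80 and 82), so the baseline weight becomes their mean, 81.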
TREATMENT ADMINISTRATIONS
In general, drug administration events are added to a nif object in the following way:
library(dplyr)
library(nif)
my_nif <- nif() |>
add_administration(examplinib_sad, extrt = "EXAMPLINIB", analyte = "RS2023")
Note that add_administration() has an optional argument,
imputation, that defines a set of imputation rules,
essentially a list of functions that are applied at different stages.
The default is ‘imputation_rules_standard’. More on this further
below.
Subject filtering
By default, add_administration() excludes screening failures
and subjects who were not treated (see the default ‘subject_filter’
string of !ACTARMCD %in% c('SCRNFAIL', 'NOTTRT')). Other exclusion
filters can be used, if needed.
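To show what the default filter string selects, the sketch below applies it to a made-up DM excerpt. How nif evaluates the string internally is not covered here; the use of eval(parse()) is an assumption chosen purely for illustration.

```r
# Hypothetical DM excerpt
dm <- data.frame(
  USUBJID  = c("001", "002", "003"),
  ACTARMCD = c("C1", "SCRNFAIL", "NOTTRT"),
  stringsAsFactors = FALSE
)

# Default filter string as quoted in the text
subject_filter <- "!ACTARMCD %in% c('SCRNFAIL', 'NOTTRT')"

# Evaluate the filter expression with the DM columns in scope
keep <- eval(parse(text = subject_filter), envir = dm)

included <- dm[keep, ]
```

Only subject 001 remains, since the other two subjects are a screening failure and a subject who was not treated.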
Time imputations
Depending on the study type (single vs. multiple administrations), ‘EX’ may define administration episodes spanning multiple administrations, i.e., from ‘EXSTDTC’ to ‘EXENDTC’. A typical example is shown below (some columns omitted for clarity):
#> STUDYID USUBJID EXDOSE EXSTDTC EXENDTC
#> 1 2023000001 20230000011010001 5 2000-12-31T10:18 2000-12-31T10:18
In general, add_administration() expands administration
episodes to individual rows for each administration event.
Administration episodes in ‘EX’ do not include time information for
individual treatment administrations but only for the first and last
ones (i.e., as reflected in ‘EXSTDTC’ and ‘EXENDTC’). In addition, the
time parts of ‘EXSTDTC’ or ‘EXENDTC’ may be missing. This is often the
case when preliminary and not fully cleaned SDTM data are used to
generate an analysis data set. Since precise time information for
administration events is usually essential for modeling analyses, a
series of time imputations is performed by
add_administration().
The following section describes the steps performed by the default imputation rule set, ‘imputation_rules_standard’, and the alternative pre-defined rule set, ‘imputation_rules_1’. Writing custom rule sets is possible and encouraged, but the details are beyond the scope of this vignette.
In either case, a first set of data imputations is performed on the non-expanded EX domain, with the aim of ensuring that each episode has valid ‘EXSTDTC’ and ‘EXENDTC’ fields that can then be expanded to individual rows. A second set of imputations is performed after expansion. The focus there is to ensure that all administration events include the most precise time information, either derived from the available data sources, or imputed in a consistent way.
OBSERVATIONS
To be completed.
1. Missing last EXENDTC
If the last administration episode for a subject (and a treatment)
has an empty ‘EXENDTC’, it is replaced with the date/time provided by
‘DM.RFENDTC’, if available, i.e., the subject’s reference end date. See
the documentation for the (internal) function
impute_exendtc_to_rfendtc() for further details.
2. Ongoing treatment
If after the above imputation attempt, the last administration
episode still has no ‘EXENDTC’ entry, it is replaced with
cut_off_date. This situation is often found in interim
analyses where some subjects are still on treatment. The
cut_off_date parameter can be specified in the call to
add_administration(), or, if not specified, is set to the
last administration event found in the whole dataset (refer to the
documentation of impute_exendtc_to_cutoff() for
details).
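The fallback chain described in steps 1 and 2 can be sketched in base R as follows. The data frames and the cut-off value are hypothetical, each subject has a single (and therefore last) episode for simplicity, and the code is an illustration under these assumptions, not the implementation of impute_exendtc_to_rfendtc() or impute_exendtc_to_cutoff().

```r
# Hypothetical EX excerpt: one (last) episode per subject
ex <- data.frame(
  USUBJID = c("001", "002", "003"),
  EXSTDTC = c("2001-01-01T08:00", "2001-01-01T08:10", "2001-01-03T09:00"),
  EXENDTC = c("2001-01-14T08:00", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical DM excerpt: subject 003 is still on treatment
dm <- data.frame(
  USUBJID = c("001", "002", "003"),
  RFENDTC = c("2001-01-14T08:00", "2001-01-10T08:30", NA),
  stringsAsFactors = FALSE
)

cut_off_date <- "2001-01-20T00:00"  # hypothetical cut-off date

# Step 1: fill a missing EXENDTC from DM.RFENDTC, if available
rfendtc <- dm$RFENDTC[match(ex$USUBJID, dm$USUBJID)]
missing_end <- is.na(ex$EXENDTC)
ex$EXENDTC[missing_end] <- rfendtc[missing_end]

# Step 2: anything still missing is set to the cut-off date
still_missing <- is.na(ex$EXENDTC)
ex$EXENDTC[still_missing] <- cut_off_date
```

Subject 002 receives the reference end date, while subject 003, who has no ‘RFENDTC’, receives the cut-off date.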
3. Missing EXENDTC in other administration episodes
If, in an unclean data set, ‘EXENDTC’ is missing in episodes that are
not the last episode for a given subject and treatment, it is replaced
with the day before the start (‘EXSTDTC’) of the subsequent
administration episode. It should be understood that this reflects a
rather strong assumption, i.e., that the treatment was continued into
the next administration episode. Consider this a last-resort imputation
that should be avoided by prior data cleaning whenever possible. This
imputation, if conducted, therefore issues a warning that cannot be
suppressed with silent = TRUE (see the documentation of
impute_missing_exendtc() for details).
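A minimal base R sketch of this rule is given below. It assumes a hypothetical EX excerpt with date-only values that is already sorted by subject and start date; how nif treats the time part in this step is not covered here, and the warning mentioned above is omitted from the sketch.

```r
# Hypothetical EX excerpt, sorted by subject and episode start
ex <- data.frame(
  USUBJID = c("001", "001", "001"),
  EXSTDTC = c("2001-01-01", "2001-01-08", "2001-01-15"),
  EXENDTC = c(NA, "2001-01-14", "2001-01-21"),
  stringsAsFactors = FALSE
)

# Subject and start date of the following row (NA for the last row)
next_subject <- c(ex$USUBJID[-1], NA)
next_start   <- c(ex$EXSTDTC[-1], NA)

# Episodes with missing end that are followed by another episode
# of the same subject, i.e., that are not the last episode
fill <- is.na(ex$EXENDTC) & !is.na(next_subject) & next_subject == ex$USUBJID

# Replace with the day before the next episode start
ex$EXENDTC[fill] <- format(as.Date(substr(next_start[fill], 1, 10)) - 1)
```

The first episode ends up with an ‘EXENDTC’ of 2001-01-07, the day before the second episode starts.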
4. Expansion of treatment administration episodes
All administration episodes, i.e., the intervals between ‘EXSTDTC’ and ‘EXENDTC’ for a given row in EX, are expanded into a sequence of rows with one administration day per row. The administration times for all rows except for the last are taken from the time information in EXSTDTC, whereas the time for the last administration event in the respective episode is taken from the time information in EXENDTC.
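The expansion of a single episode can be sketched as follows. The episode values and the column names ‘DATE’ and ‘TIME’ are hypothetical, and both date/time fields are assumed to be complete; this illustrates the described logic and is not the code used by add_administration().

```r
# Hypothetical administration episode with complete date/time fields
episode <- list(EXSTDTC = "2001-01-01T08:00", EXENDTC = "2001-01-04T09:30")

# Split date and time parts of the ISO 8601 strings
start_date <- as.Date(substr(episode$EXSTDTC, 1, 10))
end_date   <- as.Date(substr(episode$EXENDTC, 1, 10))
start_time <- substr(episode$EXSTDTC, 12, 16)
end_time   <- substr(episode$EXENDTC, 12, 16)

# One row per administration day
days <- seq(start_date, end_date, by = "day")

# Time from EXSTDTC for all days, except the last day (time from EXENDTC)
times <- rep(start_time, length(days))
times[length(days)] <- end_time

admin <- data.frame(DATE = format(days), TIME = times, stringsAsFactors = FALSE)
```

The four-day episode yields four rows, with 08:00 on the first three days and 09:30 on the last day.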
5. Impute administration time from PCRFTDTC
For administration days for which PK sampling events are recorded in
PC, the administration time is taken from PC.PCRFTDTC, if this field is
available. Time information derived during expansion (see 4.) is
overwritten during this process. See the documentation of
impute_admin_times_from_pcrftdtc() for details.
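The overwriting of expanded administration times can be illustrated with the base R sketch below. The `admin` and `pc` data frames and the column names ‘DATE’, ‘TIME’ and ‘REF_TIME’ are hypothetical; the code shows the matching idea only and is not the implementation of impute_admin_times_from_pcrftdtc().

```r
# Hypothetical expanded administrations (one row per day)
admin <- data.frame(
  USUBJID = "001",
  DATE    = c("2001-01-01", "2001-01-02", "2001-01-03"),
  TIME    = c("08:00", "08:00", "08:00"),
  stringsAsFactors = FALSE
)

# Hypothetical PC excerpt: two samples on day 2 with a reference time
pc <- data.frame(
  USUBJID  = c("001", "001"),
  PCRFTDTC = c("2001-01-02T08:17", "2001-01-02T08:17"),
  stringsAsFactors = FALSE
)

# One reference administration time per subject and day
ref <- unique(data.frame(
  USUBJID  = pc$USUBJID,
  DATE     = substr(pc$PCRFTDTC, 1, 10),
  REF_TIME = substr(pc$PCRFTDTC, 12, 16),
  stringsAsFactors = FALSE
))

# Overwrite the time on days for which a reference time exists
idx <- match(paste(admin$USUBJID, admin$DATE), paste(ref$USUBJID, ref$DATE))
has_ref <- !is.na(idx)
admin$TIME[has_ref] <- ref$REF_TIME[idx[has_ref]]
```

Only the administration on 2001-01-02 is changed (to 08:17); the other days keep the time derived during expansion.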