ModelValidation {iCARE} | R Documentation |
This function is used to validate absolute risk models.
ModelValidation(study.data, total.followup.validation = FALSE, predicted.risk = NULL, predicted.risk.interval = NULL, linear.predictor = NULL, iCARE.model.object = list(model.formula = NULL, model.cov.info = NULL, model.snp.info = NULL, model.log.RR = NULL, model.ref.dataset = NULL, model.ref.dataset.weights = NULL, model.disease.incidence.rates = NULL, model.competing.incidence.rates = NULL, model.bin.fh.name = NA, apply.cov.profile = NULL, apply.snp.profile = NULL, n.imp = 5, use.c.code = 1, return.lp = TRUE, return.refs.risk = TRUE), number.of.percentiles = 10, reference.entry.age = NULL, reference.exit.age = NULL, predicted.risk.ref = NULL, linear.predictor.ref = NULL, linear.predictor.cutoffs = NULL, dataset = "Example Dataset", model.name = "Example Risk Prediction Model")
study.data |
Data frame which includes the variables below.
|
total.followup.validation |
logical; TRUE if risk validation is performed over the total followup, for all other cases (e.g., 5 year or 10 year risk validation) it is FALSE |
predicted.risk |
vector of predicted risks; should be supplied if risk prediction is done by some
method other than that implemented in |
predicted.risk.interval |
scalar or vector denoting the number of years after entering
the study over which risk validation is desired (e.g., 5 for validating a model for 5 year risk)
if |
linear.predictor |
vector of risk scores for each subject, i.e. |
iCARE.model.object |
A named list containing the input arguments to the function |
number.of.percentiles |
the number of percentiles of the risk score that determines the number of strata over which the risk prediction model is to be validated, default = 10 |
reference.entry.age |
age of entry to be specified for computing absolute risk of the reference population |
reference.exit.age |
age of exit to be specified for computing absolute risk of the reference population |
predicted.risk.ref |
predicted absolute risk in the reference population assuming the entry age to be
as specified in |
linear.predictor.ref |
vector of risk scores for the reference population |
linear.predictor.cutoffs |
user specified cut-points for the linear predictor to define categories for absolute risk calibration and relative risk calibration |
dataset |
name and type of dataset to be displayed in the output, e.g., "PLCO Full Cohort" or "Full Cohort Simulation" |
model.name |
name of the model to be displayed in output, e.g., "Synthetic Model" or "Simulation Setting" |
This function returns a list of the following objects:
Subject_Specific_Observed_Outcome
: observed outcome after adjusting the observed followup according
to the risk prediction interval: 1 if disease has occurred by the end of followup, 0 if censored
Risk_Prediction_Interval
: Character object showing the interval of risk prediction
(e.g., 5 years). If the risk prediction is over the total followup of the study,
this reads "Observed Followup"
Adjusted.Followup
: followup time (in years) after adjusting the observed followup according to the risk
prediction interval
Subject_Specific_Predicted_Absolute_Risk
: predicted absolute risk of disease for each subject
Reference_Absolute_Risk
: predicted absolute risk in the reference population
Subject_Specific_Risk_Score
: estimated risk score for each subject; the missing covariates are handled
internally using the imputation in iCARE
Reference_Risk_Score
: risk score for the reference population
Population_Incidence_Rate
: age specific disease incidence rate in the population
Study_Incidence_Rate
: estimated age specific incidence rate in the study
Category_Results
: observed and predicted absolute risks and observed and predicted relative risks
in each category defined by the risk score
Category_Specific_Observed_Absolute_Risk
: Observed absolute risk in each category defined by the risk score
Category_Specific_Predicted_Absolute_Risk
: Predicted absolute risk in each category defined by the risk score
Category_Specific_Observed_Relative_Risk
: Observed relative risk in each category defined by the risk score
Category_Specific_Predicted_Relative_Risk
: Predicted relative risk in each category defined by the risk score
Variance_Matrix_Absolute_Risk
: Variance-covariance matrix of the vector of cateogry specific absolute risks
Variance_Matrix_LogRelative_Risk
: Variance-covariance matrix of the vector of cateogry specific relative risks
Hosmer_Lemeshow_Results
: results of the Hosmer-Lemeshow type chisquare test comparing the observed and predicted absolute risks
HL_pvalue
: pvalue of the Hosmer-Lemeshow type chisquare test
RR_test_result
: results of the chisquare test comparing the observed and predicted relative risks
RR_test_pvalue
: pvalue of the chisquare test of relative risk
AUC
: estimate of the Area Under the Curve (AUC) defined as the probability that for a randomly sampled
case-control pair the case has a higher risk score than the control; for the full cohort setting we compute
the empirical proportion and for the nested case-control setting we compute the inverse probability weighted estimator
Variance_AUC
: estimate of the variance of Area Under the Curve (AUC): for the full cohort setting the regular
asymptotic variance is estimated and for the nested case-control setting the influence function based variance
estimate of the inverse probability weighted variance estimator is computed
CI_AUC
: 95 percent Wald based confidence interval of Area Under the Curve (AUC) using the asymptotic variance
Overall_Expected_to_Observed_Ratio
: The overall ratio of the expected risk to the observed risk
CI_Overall_Expected_to_Observed_Ratio
: 95 percent Wald based confidence interval of the overall ratio of the expected risk to the observed risk
data(bc_data, package="iCARE") validation.cohort.data$inclusion = 0 subjects_included = intersect(validation.cohort.data$id, validation.nested.case.control.data$id) validation.cohort.data$inclusion[subjects_included] = 1 validation.cohort.data$observed.followup = validation.cohort.data$study.exit.age - validation.cohort.data$study.entry.age selection.model = glm(inclusion ~ observed.outcome * (study.entry.age + observed.followup), data = validation.cohort.data, family = binomial(link = "logit")) validation.nested.case.control.data$sampling.weights = selection.model$fitted.values[validation.cohort.data$inclusion == 1] set.seed(50) bc_model_formula = observed.outcome ~ famhist + as.factor(parity) data = validation.nested.case.control.data risk.model = list(model.formula = bc_model_formula, model.cov.info = bc_model_cov_info, model.snp.info = bc_15_snps, model.log.RR = bc_model_log_or, model.ref.dataset = ref_cov_dat, model.ref.dataset.weights = NULL, model.disease.incidence.rates = bc_inc, model.competing.incidence.rates = mort_inc, model.bin.fh.name = "famhist", apply.cov.profile = data[,all.vars(bc_model_formula)[-1]], apply.snp.profile = data[,bc_15_snps$snp.name], n.imp = 5, use.c.code = 1, return.lp = TRUE, return.refs.risk = TRUE) ModelValidation(study.data = data, total.followup.validation = TRUE, predicted.risk.interval = NULL, iCARE.model.object = risk.model, number.of.percentiles = 10)