Research Design, Biostatistics, and Literature Evaluation
A 95% CI is constructed so that, if the study were repeated on 100 random samples, about 95 of the resulting
intervals would contain the true population value (the parameter that the point estimate approximates).
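The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the population mean, standard deviation, sample size, and seed are assumed values, and the interval uses the normal approximation (mean ± about 1.96 standard errors) rather than a t-based interval.

```python
import random
import statistics


def mean_ci_95(sample):
    """Approximate 95% CI for a mean: mean +/- z * standard error."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5    # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    return mean - z * se, mean + z * se


def coverage(true_mean=10.0, true_sd=2.0, n=50, repeats=2000, seed=1):
    """Fraction of repeated-sample CIs that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        low, high = mean_ci_95(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / repeats


print(f"Proportion of intervals containing the true mean: {coverage():.3f}")
```

The printed proportion should be close to 0.95. It is usually slightly lower with small samples because the normal approximation is a little too narrow.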
Correlation: Describes the association between two variables
The correlation coefficient (r) ranges from -1 to +1. An r value of -1 or +1 indicates a perfect negative or
positive relationship, respectively, and an r value of 0 indicates no linear relationship. The closer the value
is to -1 or +1 (i.e., the further from 0), the stronger the relationship between the two variables.
Spearman rank: Nonparametric continuous data or ordinal data
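To illustrate how r is calculated and why a rank-based coefficient suits ordinal or nonparametric data, the sketch below computes both a standard product-moment r and a Spearman rank coefficient. The data are hypothetical, and the function names are chosen for this example only.

```python
def pearson_r(x, y):
    """Product-moment correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: the product-moment r of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y always rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"r = {r:.2f}, Spearman rho = {rho:.2f}")
```

Because the relationship is perfectly monotonic, the rank-based coefficient is 1, whereas r is lower because the relationship is not linear.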
Regression Analysis: Describes whether an independent variable can predict an outcome or the causal effect
of an exposure on an outcome in the presence of possible confounders (Table 2). Can be used to describe the
strength of the association between a predictor variable and a dependent variable
Linear regression
Used when the dependent variable is continuous (e.g., length of stay)
A single predictor variable (continuous or categorical) can be tested in a simple linear regression
model.
More than one predictor variable (continuous or categorical) can be tested at a time in a multivariable
linear regression model.
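A simple linear regression can be sketched with ordinary least squares. The example below is a minimal illustration with hypothetical ages and lengths of stay; a multivariable model would normally be fit with a statistics package rather than by hand.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: age in years (predictor), length of stay in days (outcome).
age = [30, 40, 50, 60, 70]
length_of_stay = [3.0, 4.0, 4.5, 6.0, 7.5]

intercept, slope = simple_linear_regression(age, length_of_stay)
print(f"Predicted length of stay = {intercept:.2f} + {slope:.2f} * age")
```

With these made-up values, the slope is 0.11, meaning that each additional year of age is associated with a predicted 0.11-day increase in length of stay.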
Logistic regression
Used when the dependent variable is a categorical variable (e.g., mortality)
A single predictor variable (continuous or categorical) can be tested in a simple logistic regression
model.
More than one predictor variable (continuous or categorical) can be tested at a time in a multivariable
logistic regression model.
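A simple logistic regression models the log-odds of the outcome as a linear function of the predictor. The sketch below fits one predictor by gradient ascent on the log-likelihood. This is an assumption made to keep the example dependency-free; statistical software typically uses other fitting algorithms. The data, learning rate, and iteration count are all hypothetical.

```python
import math


def fit_simple_logistic(x, y, learning_rate=0.1, iterations=5000):
    """Fit log-odds(y = 1) = b0 + b1 * x by gradient ascent."""
    n = len(x)
    b0 = b1 = 0.0
    for _ in range(iterations):
        grad0 = grad1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # predicted probability
            grad0 += yi - p
            grad1 += (yi - p) * xi
        # Step uphill on the mean log-likelihood.
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1


# Hypothetical data: centered severity score (predictor), death (1) or survival (0).
score = [-4, -3, -2, -1, 1, 2, 3, 4]
died = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_simple_logistic(score, died)
odds_ratio = math.exp(b1)  # odds ratio per 1-point increase in the score
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, odds ratio = {odds_ratio:.2f}")
```

Exponentiating the slope gives the odds ratio for a 1-unit increase in the predictor; a value above 1 indicates higher odds of the outcome as the predictor increases.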
Considerations
Selection of variables may depend on the primary goal of the regression model (predictive vs.
explanatory).
Variables for models that examine causal associations should be identified as possible
confounders of the exposure of interest that may affect the outcome.
(a) A priori determination through literature review and anecdotes
(b) R-squared, Mallows Cp statistic, manual or automated step-type selection (i.e., backward elimination or
forward selection)
(c) Directed acyclic graphs and causal networks
ii. Selecting variables for inclusion solely on the basis of correlation between each independent
variable and outcome of interest, p values on univariate analysis, or automatic variable selection
(e.g., forward or backward stepwise selection) may result in overfitting or misconceptions about
true confounders.
iii. Avoid collinear variables (e.g., septic shock and vasopressor requirement may be too similar
[i.e., multicollinearity]).
Goodness of fit may be assessed in multiple ways (R², Hosmer-Lemeshow test, etc.).
Must be critically assessed because a goodness-of-fit measure may not fully reflect important outliers, and
an apparently good fit may be a sign of overfitting.
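As one example of a goodness-of-fit measure, R-squared for a linear model is the proportion of the variation in the outcome that the model's predictions account for. The sketch below uses hypothetical observed and predicted lengths of stay and is not tied to any particular software package.

```python
def r_squared(observed, predicted):
    """R-squared = 1 - (residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and model-predicted lengths of stay (days).
observed = [3.0, 4.0, 4.5, 6.0, 7.5]
predicted = [2.8, 3.9, 5.0, 6.1, 7.2]

r2 = r_squared(observed, predicted)
print(f"R-squared = {r2:.3f}")
```

A single summary value such as this says nothing about which individual observations are poorly predicted, which is one reason residuals and outliers should be reviewed alongside it.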
Regression analyses make several key assumptions, some of which are as follows:
i. Linear relationships between independent variables and outcome
ii. No multicollinearity
iii. Random component of the model is normally distributed
iv. Appropriate number of variables for study sample size (e.g., at least around 10–20 cases per
independent variable in the model)
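The sample size rule of thumb in item iv can be worked through as simple arithmetic. The helper below is a hypothetical illustration; the function name and the example count of 120 cases are assumptions, not values from this chapter.

```python
def max_predictors(n_cases, cases_per_variable=10):
    """Largest number of independent variables the rule of thumb would support."""
    return n_cases // cases_per_variable


n_cases = 120  # hypothetical number of cases available
liberal = max_predictors(n_cases, cases_per_variable=10)
conservative = max_predictors(n_cases, cases_per_variable=20)
print(f"{n_cases} cases support about {conservative} to {liberal} independent variables")
```

With 120 cases, the rule suggests limiting the model to roughly 6 to 12 independent variables, depending on whether 20 or 10 cases per variable is required.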