Page 19/32 · Research Design, Biostatistics & Literature Evaluation

Julie E. Farrar ~3 min read Module 2 of 20

A 95% CI is constructed so that, if the study were repeated on many random samples, about 95 of every 100 such intervals would contain the true population value.
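The repeated-sampling interpretation above can be checked with a small simulation. This is a minimal sketch, not a prescribed method: the population mean and SD, the sample size, the number of simulated studies, and the function name `ci_coverage` are all illustrative assumptions, and the interval uses the normal quantile 1.96 rather than a t quantile.

```python
import random
import statistics

def ci_coverage(true_mean=50.0, sd=10.0, n=40, n_studies=2000, seed=1):
    """Fraction of simulated studies whose 95% CI contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z = 1.96  # assumption: normal approximation for the 95% interval
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1
    return hits / n_studies

coverage = ci_coverage()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95. Any single interval either contains the true value or does not; the 95% describes the procedure across repeated samples.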

Correlation: Describes the association between two variables

The correlation coefficient (r) ranges from -1 to +1. An r value of -1 or +1 indicates a perfect negative or positive relationship, respectively. The farther r is from 0 (i.e., the closer its absolute value is to 1), the stronger the relationship between the two variables.

Pearson: Parametric continuous data

Spearman rank: Nonparametric continuous data or ordinal data
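To illustrate the difference between the two coefficients, the sketch below computes both by hand on a small made-up data set in which the response rises steadily with dose but not in a straight line. The data and the function names `pearson_r`, `ranks`, and `spearman_rho` are illustrative assumptions; Spearman is computed here as Pearson applied to average ranks.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean rank for the tied group i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: monotonic but nonlinear relationship
dose = [1, 2, 3, 4, 5, 6]
response = [1, 8, 27, 64, 125, 216]

r = pearson_r(dose, response)
rho = spearman_rho(dose, response)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because the ordering of the two variables agrees perfectly, the rank-based coefficient reaches 1, while Pearson's r stays below 1 because the relationship is not linear.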

Regression Analysis: Describes whether an independent variable can predict an outcome, or estimates the causal effect of an exposure on an outcome in the presence of possible confounders (Table 2). It can also be used to describe the strength of the association between a predictor variable and a dependent variable.

Linear regression

Used when the dependent variable is continuous (e.g., length of stay)

A single predictor variable (continuous or categorical) can be tested in a simple linear regression model.

More than one predictor variable (continuous or categorical) can be tested at a time in a multivariable linear regression model.
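A simple linear regression with one continuous predictor can be sketched with ordinary least squares. The data below (a severity score and length of stay in days) and the function name `fit_simple_linear` are hypothetical and chosen only for illustration.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data: severity score (predictor) and length of stay in days (outcome)
severity = [2, 4, 5, 7, 9, 11]
length_of_stay = [3.0, 4.5, 6.0, 7.0, 9.5, 11.0]

intercept, slope = fit_simple_linear(severity, length_of_stay)
print(f"Estimated change in length of stay per 1-point increase: {slope:.2f} days")
print(f"Predicted length of stay at score 6: {intercept + slope * 6:.2f} days")
```

The slope is read as the average change in the dependent variable for a one-unit increase in the predictor. A multivariable model adds further predictors and is normally fitted with a statistics package rather than by hand.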

Logistic regression

Used when the dependent variable is categorical, typically binary (e.g., mortality)

A single predictor variable (continuous or categorical) can be tested in a simple logistic regression model.

More than one predictor variable (continuous or categorical) can be tested at a time in a multivariable logistic regression model.
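As a minimal sketch of a simple logistic regression, the code below fits one binary predictor to a binary outcome by Newton-Raphson and reports the odds ratio. The counts (exposed and unexposed patients, deaths and survivors) and the function name `fit_simple_logistic` are hypothetical. In practice a statistics package would be used, and this hand-rolled fit assumes the data are not perfectly separated.

```python
import math

def fit_simple_logistic(x, y, iterations=25):
    """Fit log-odds(y = 1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix entries (weights are p * (1 - p))
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical cohort: exposed 20 deaths / 30 survivors, unexposed 10 deaths / 40 survivors
exposure = [1] * 50 + [0] * 50
death = [1] * 20 + [0] * 30 + [1] * 10 + [0] * 40

b0, b1 = fit_simple_logistic(exposure, death)
odds_ratio = math.exp(b1)
print(f"Odds ratio for death with exposure: {odds_ratio:.2f}")
```

With a single binary predictor, the fitted odds ratio matches the cross-product ratio of the 2x2 table, (20 × 40) / (30 × 10), which gives a quick sanity check on the fit.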

Considerations

i. Selection of variables may depend on the primary goal of the regression model (predictive vs. explanatory). Variables for models that examine causal associations should be identified as possible confounders of the exposure of interest that may affect the outcome.

(a)	A priori determination through literature review and anecdotes
(b)	R-squared, Mallows' Cp statistic, or manual or automated stepwise procedures (i.e., backward elimination or forward selection)
(c)	Directed acyclic graphs and causal networks

ii. Selecting variables for inclusion solely on the basis of correlation between each independent variable and the outcome of interest, p values on univariate analysis, or automatic variable selection (e.g., forward or backward stepwise selection) may result in overfitting or misconceptions about true confounders.

iii. Avoid collinear variables (e.g., septic shock and vasopressor requirement may be too similar [i.e., multicollinearity]).
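One common way to screen for collinearity is to look at the correlation between candidate predictors and the variance inflation factor (VIF), which for a model with only two predictors reduces to 1 / (1 - r²). The sketch below applies this to made-up indicator variables for septic shock and vasopressor requirement. The data and names are hypothetical, and no particular VIF cutoff is implied, since rules of thumb vary.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model contains only these two."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

# Hypothetical indicators for 10 patients (1 = present, 0 = absent)
septic_shock = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
vasopressor = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

r = pearson_r(septic_shock, vasopressor)
vif = vif_two_predictors(septic_shock, vasopressor)
print(f"r = {r:.2f}, VIF = {vif:.1f}")
```

A high correlation and an inflated VIF suggest the two variables carry largely the same information, so including both can make the coefficient estimates unstable.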

Goodness of fit may be assessed in multiple ways (R², Hosmer-Lemeshow test, etc.).

It must be critically assessed because goodness-of-fit measures may not fully reflect important outliers, and an apparently excellent fit may reflect overfitting.
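For a linear model, R² can be computed directly from observed and predicted values, as in the sketch below. The values and the function name `r_squared` are hypothetical. The per-observation residuals are listed next to the summary statistic because a single number can hide an individual poorly fitted observation.

```python
def r_squared(observed, predicted):
    """Proportion of outcome variance explained: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical observed outcomes and model predictions
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

r2 = r_squared(observed, predicted)
residuals = [o - p for o, p in zip(observed, predicted)]
print(f"R-squared = {r2:.2f}")
print("Residuals:", residuals)
```

A high R² on the data used to build the model does not by itself show that the model will perform well on new patients, which is why the fit statistic is read alongside the residuals and the number of predictors.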

Regression analyses make several key assumptions, some of which are as follows:

i. Linear relationships between independent variables and outcome

ii. No multicollinearity

iii. Random component of the model is normally distributed

iv. Appropriate number of variables for study sample size (e.g., at least about 10–20 cases per independent variable in the model)
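The sample-size rule of thumb in the last item is simple arithmetic, sketched below. The function name `max_predictors` and the count of 85 cases are hypothetical, and what counts as a "case" (for example, total observations or outcome events) depends on the model and is left as an assumption here.

```python
def max_predictors(n_cases, cases_per_variable=10):
    """Largest number of independent variables the rule of thumb supports."""
    return n_cases // cases_per_variable

n_cases = 85  # hypothetical number of cases available for the model
liberal = max_predictors(n_cases, cases_per_variable=10)
conservative = max_predictors(n_cases, cases_per_variable=20)
print(f"With {n_cases} cases: {conservative} to {liberal} independent variables")
```

With 85 cases, the rule supports roughly 4 to 8 independent variables, depending on whether 20 or 10 cases per variable is required.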
