Preferred terminology

From NorthShore Analytics
indicator variable
a variable with only two possible values, coded as 0 and 1 or as FALSE and TRUE
binary variable
see indicator variable
dummy variable
see indicator variable
categorical variable
a variable with two or more outcomes whose values cannot be meaningfully converted into comparable numbers, e.g., ethnicity, gender, geographical region; see also nominal variable and ordinal variable
ordinal variable
a categorical variable whose values can be meaningfully ordered but not quantitatively compared, e.g., stage of cancer, education level, degree of satisfaction; see also continuous variable
interval variable
a numeric variable whose values are equally spaced but whose zero point is arbitrarily set, so that differences are meaningful but ratios are not, e.g., IQ scores, temperature in degrees Celsius, calendar year; see also continuous variable
continuous variable
a variable whose values can take on any (real) number, e.g., body mass index, systolic blood pressure, hemoglobin
null hypothesis
a hypothesis stating that the measurements of two nominally different descriptors or outcomes of two different series of events are substantially identical
alternative hypothesis
a hypothesis stating that the measurements of two nominally different descriptors or outcomes of two different series of events are not identical
Type I error
an incorrect rejection of the null hypothesis; see also false positive
Type II error
an incorrect failure to reject the null hypothesis, i.e., retaining a null hypothesis that is actually false; see also false negative
t-test
Student's t-test
true positive
an event correctly classified as having an outcome of interest
false positive
an event incorrectly classified as having an outcome of interest; see also Type I error
true negative
an event correctly classified as not having an outcome of interest
false negative
an event incorrectly classified as not having an outcome of interest; see also Type II error
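The four classification outcomes defined above can be tallied from paired actual and predicted labels. The following is a minimal sketch, assuming binary labels coded 1 for "has the outcome of interest" and 0 otherwise; the function name `confusion_counts` and the sample data are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count (TP, FP, TN, FN) for binary labels, where 1 = outcome of interest."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1  # true positive: predicted positive, actually positive
        elif p == 1 and a == 0:
            fp += 1  # false positive: predicted positive, actually negative
        elif p == 0 and a == 0:
            tn += 1  # true negative: predicted negative, actually negative
        else:
            fn += 1  # false negative: predicted negative, actually positive
    return tp, fp, tn, fn


# Illustrative data only (not from any real study).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn)
```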
positive predictive value
also precision \[PPV=\frac{\text{number of true positives}}{\text{number of predicted positives}} =\frac{\text{number of true positives}}{\text{number of true positives + number of false positives}}\]

negative predictive value
\[NPV=\frac{\text{number of true negatives}}{\text{number of predicted negatives}} =\frac{\text{number of true negatives}}{\text{number of true negatives + number of false negatives}}\]
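As an illustration of the two predictive values, the sketch below computes PPV and NPV from the four confusion counts. The function name and the counts are hypothetical, and returning NaN when a denominator is zero is an assumption of this sketch, not part of the definitions.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion counts."""
    # PPV = TP / (TP + FP); undefined (NaN here, by assumption) if nothing was predicted positive.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    # NPV = TN / (TN + FN); undefined (NaN here, by assumption) if nothing was predicted negative.
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Illustrative counts only.
ppv, npv = predictive_values(tp=40, fp=10, tn=45, fn=5)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```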
survival probability
probability that a member of a population survives from time 0 to time t
relative survival probability
survival probability of a group under consideration relative to that of a benchmark group, e.g., survival probability of cancer patients relative to the general population of the same age
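A small worked example may help separate the two survival terms. This sketch assumes the simplest setting, in which every member is followed for the whole interval (no censoring); the function names and all numbers are hypothetical.

```python
def survival_probability(n_at_time_0, n_alive_at_t):
    """Fraction of a fully followed cohort still alive at time t (no censoring assumed)."""
    return n_alive_at_t / n_at_time_0


def relative_survival(group_survival, benchmark_survival):
    """Survival probability of the group divided by that of the benchmark group."""
    return group_survival / benchmark_survival


# Illustrative numbers only: 200 patients, 130 alive at time t;
# the benchmark (e.g., general population of the same age) is assumed to have 0.90 survival.
s_group = survival_probability(200, 130)
s_relative = relative_survival(s_group, 0.90)
print(f"survival = {s_group:.3f}, relative survival = {s_relative:.3f}")
```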
survival analysis
a branch of statistics (or, more broadly, applied mathematics) concerned with analyzing and predicting the time until failure events (e.g., death or equipment breakdown) in a given population or group of technical objects
time since first diagnosis
interval of time between the first recorded or implied diagnosis assigned to the patient and the time t of interest (e.g., current time)
true positive rate
also sensitivity, hit rate, recall \( TPR=\frac{\text{number of true positives}}{\text{number of true positives + number of false negatives}} = \frac{\text{number of true positives}}{\text{total positive outcomes}} \)
specificity
also true negative rate \( TNR= \frac{\text{true negatives}}{\text{false positives + true negatives}}=\frac{\text{true negatives}}{\text{total negative outcomes}}\)
false positive rate
also fallout \( FPR=\frac{\text{false positives}}{\text{false positives + true negatives}}=\frac{\text{false positives}}{\text{total negative outcomes}}\)
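The three rates above share the same inputs, so they can be computed together. The sketch below uses hypothetical counts and a hypothetical function name, and assumes there is at least one actual positive and one actual negative; it also shows that FPR equals 1 minus specificity.

```python
def classification_rates(tp, fp, tn, fn):
    """Return (TPR, TNR, FPR); assumes tp + fn > 0 and fp + tn > 0."""
    tpr = tp / (tp + fn)  # sensitivity, hit rate, recall
    tnr = tn / (fp + tn)  # specificity
    fpr = fp / (fp + tn)  # fallout
    return tpr, tnr, fpr


# Illustrative counts only.
tpr, tnr, fpr = classification_rates(tp=40, fp=10, tn=45, fn=5)
print(f"TPR = {tpr:.3f}, TNR = {tnr:.3f}, FPR = {fpr:.3f}")
```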
receiver operating characteristic (ROC) curve
A curve that plots true positive rate against false positive rate as the classification threshold is varied, visualizing the accuracy of a classification algorithm as a trade-off between the two
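One way to obtain the points of an ROC curve is to sweep a threshold over the classifier's scores and record the false positive rate and true positive rate at each step. This is a minimal sketch, assuming a record is classified positive when its score is at or above the threshold; `roc_points` and the sample scores are hypothetical.

```python
def roc_points(actual, scores):
    """Return (FPR, TPR) pairs, one per distinct score threshold, starting at (0, 0)."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is classified positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Illustrative data only: 1 = outcome of interest, higher score = more likely positive.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(actual, scores)
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```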
lift curve
A curve that visualizes the relationship between true positive rate and the fraction of the population targeted by the response solicitation campaign. It is a variation on the receiver operating characteristic (ROC) curve
lift
\[Lift =\frac{\text{percentage of outcomes of interest in the population selected by the model}}{\text{percentage of outcomes of interest in the whole population}}\]
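The lift ratio can be illustrated with a short calculation. In this sketch, `actual` marks records with the outcome of interest and `selected` marks records chosen by the model; both lists and the function name are hypothetical.

```python
def lift(actual, selected):
    """Rate of outcomes of interest among selected records divided by the overall rate."""
    n_selected = sum(selected)
    hits_in_selected = sum(a for a, s in zip(actual, selected) if s == 1)
    rate_selected = hits_in_selected / n_selected
    rate_overall = sum(actual) / len(actual)
    return rate_selected / rate_overall


# Illustrative data only: 3 of 10 records have the outcome; the model selects 4 records, 3 of them hits.
actual = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
selected = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
value = lift(actual, selected)
print(f"lift = {value:.2f}")
```

A lift of 1 means the model's selection is no better than choosing records at random.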
\(F_1\) score
\[F_1 = 2 \frac{\text{positive predictive value} \times \text{true positive rate}}{\text{positive predictive value} + \text{true positive rate}}\]
Matthews' correlation coefficient
\[MCC= \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}},\]

where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives and FN is the number of false negatives
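Both the F_1 score and Matthews' correlation coefficient can be computed directly from the four counts. The sketch below uses hypothetical function names and counts; returning 0.0 for MCC when the denominator is zero is a common convention adopted here as an assumption.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of positive predictive value and true positive rate."""
    ppv = tp / (tp + fp)
    tpr = tp / (tp + fn)
    return 2 * ppv * tpr / (ppv + tpr)


def mcc(tp, fp, tn, fn):
    """Matthews' correlation coefficient from confusion counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumption: treat the undefined case as 0
    return (tp * tn - fp * fn) / denominator


# Illustrative counts only.
f1 = f1_score(tp=40, fp=10, fn=5)
coefficient = mcc(tp=40, fp=10, tn=45, fn=5)
print(f"F1 = {f1:.3f}, MCC = {coefficient:.3f}")
```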

interaction term
also cross term. In a (generalized) linear model, a nonlinear term of the form

\(\prod_{i=1}^{m} X_i\), where \(X_i\) is the i-th predictive variable, m the order of nonlinearity; the simplest nontrivial (m=2) case being \(X_1 \times X_2\)
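To make the product form concrete, the sketch below appends an m=2 interaction column \(X_1 \times X_2\) to a small set of predictor rows. The function name and the data are hypothetical; `math.prod` requires Python 3.8 or later.

```python
from math import prod


def interaction_term(*values):
    """Product of the given predictor values; m = len(values) is the order."""
    return prod(values)


# Illustrative predictor rows (X1, X2) only.
rows = [(1.0, 2.0), (3.0, 0.5), (0.0, 4.0)]

# Each output row holds X1, X2, and the interaction term X1 * X2.
design = [(x1, x2, interaction_term(x1, x2)) for x1, x2 in rows]
for row in design:
    print(row)
```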