Preferred terminology
From NorthShore Analytics
- indicator variable
- a variable with only two outcomes, e.g., 0 and 1, or FALSE and TRUE
- binary variable
- see indicator variable
- dummy variable
- see indicator variable
- categorical variable
- a variable with two or more outcomes whose values cannot be meaningfully converted into comparable numbers, e.g., ethnicity, gender, geographical region; see also nominal variable and ordinal variable
- ordinal variable
- a categorical variable whose values can be meaningfully ordered but not quantitatively compared, e.g., stage of cancer, education level, degree of satisfaction; see also continuous variable
- interval variable
- a discrete variable whose values are equidistant and whose zero is set arbitrarily, e.g., IQ scores, number of hospital admissions, number of children in a family; see also continuous variable
- continuous variable
- a variable whose values can take on any (real) number, e.g., body mass index, systolic blood pressure, hemoglobin
- null hypothesis
- a hypothesis stating that the measurements of two nominally different descriptors or outcomes of two different series of events are substantially identical
- alternative hypothesis
- a hypothesis stating that the measurements of two nominally different descriptors or outcomes of two different series of events are not identical
- Type I error
- an incorrect rejection of the null hypothesis; see also false positive
- Type II error
- an incorrect failure to reject the null hypothesis, i.e., retaining the null hypothesis when the alternative hypothesis is true; see also false negative
- t-test
- Student's t-test
- true positive
- an event correctly classified as having an outcome of interest
- false positive
- an event incorrectly classified as having an outcome of interest; see also Type I error
- true negative
- an event correctly classified as not having an outcome of interest
- false negative
- an event incorrectly classified as not having an outcome of interest; see also Type II error
- positive predictive value
- also precision \begin{equation} PPV=\frac{\text{number of true positives}}{\text{number of predicted positives}} =\frac{\text{number of true positives}}{\text{number of true positives + number of false positives}} \end{equation}
- negative predictive value
- \[NPV=\frac{\text{number of true negatives}}{\text{number of predicted negatives}} =\frac{\text{number of true negatives}}{\text{number of true negatives + number of false negatives}}\]
- survival probability
- probability that a member of a population survives from time 0 to time t
- relative survival probability
- survival probability of a group under consideration relative to that of a benchmark group, e.g., survival probability of cancer patients relative to the general population of the same age
- survival analysis
- a branch of statistics (or, more broadly, applied mathematics) concerned with predicting failure events among a given population or group of technical objects
- time since first diagnosis
- interval of time between the first recorded or implied diagnosis assigned to the patient and the time t of interest (e.g., current time)
- true positive rate
- also sensitivity, hit rate, recall \( TPR=\frac{\text{number of true positives}}{\text{number of true positives + number of false negatives}} = \frac{\text{number of true positives}}{\text{total positive outcomes}} \)
- specificity
- also true negative rate \( TNR= \frac{\text{true negatives}}{\text{false positives + true negatives}}=\frac{\text{true negatives}}{\text{total negative outcomes}}\)
- false positive rate
- also fallout \( FPR=\frac{\text{false positives}}{\text{false positives + true negatives}}=\frac{\text{false positives}}{\text{total negative outcomes}}\)
- receiver operating characteristic (ROC) curve
- a curve that visualizes the performance of a classification algorithm by plotting true positive rate against false positive rate as the decision threshold varies
- lift curve
- a curve that visualizes the relationship between true positive rate and the fraction of the population targeted by the response solicitation campaign; a variation on the receiver operating characteristic (ROC) curve
- lift
- \[\text{Lift} =\frac{\text{percentage of outcomes of interest in the population selected by the model}}{\text{percentage of outcomes of interest in the whole population}}\]
- \(F_1\) score
- \[F_1 = 2 \frac{\text{positive predictive value} \times \text{true positive rate}}{\text{positive predictive value} + \text{true positive rate}}\]
- Matthews' correlation coefficient
- \[MCC= \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}},\]
where \(TP\) is the number of true positives, \(TN\) is the number of true negatives, \(FP\) is the number of false positives, and \(FN\) is the number of false negatives
- interaction term
- also cross term; in a (generalized) linear model, a nonlinear term of the form \(\prod_{i=1}^{m} X_i\), where \(X_i\) is the \(i\)-th predictive variable and \(m\) is the order of nonlinearity; the simplest nontrivial case (\(m=2\)) is \(X_1 \times X_2\)
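
Illustrative sketch: the classification ratios defined in this glossary (positive and negative predictive value, true positive rate, specificity, false positive rate, \(F_1\) score, Matthews' correlation coefficient) can all be computed from the four confusion-matrix counts. The Python below is a minimal sketch, not a reference implementation. The function name `confusion_metrics` and the example counts are hypothetical, and the sketch assumes every denominator is nonzero.

```python
from math import sqrt


def confusion_metrics(tp, fp, tn, fn):
    """Compute the glossary's ratios from raw confusion-matrix counts.

    tp, fp, tn, fn: numbers of true positives, false positives,
    true negatives, and false negatives. Assumes no denominator is zero.
    """
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    tpr = tp / (tp + fn)   # true positive rate (sensitivity, recall)
    tnr = tn / (fp + tn)   # specificity (true negative rate)
    fpr = fp / (fp + tn)   # false positive rate (fallout)
    f1 = 2 * ppv * tpr / (ppv + tpr)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"PPV": ppv, "NPV": npv, "TPR": tpr, "TNR": tnr,
            "FPR": fpr, "F1": f1, "MCC": mcc}


# Hypothetical counts, chosen only for illustration.
m = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because false positive rate and specificity share the denominator (total negative outcomes), they always sum to 1, which is a convenient sanity check on any implementation.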