Improving Validity Measures for Alcohol-Related Models

Period of Performance: 09/30/2004 - 08/31/2005


Phase 2 SBIR

Recipient Firm

Martingale Research Corporation
Plano, TX 75074
Principal Investigator


DESCRIPTION (provided by applicant): Improved goodness-of-fit (GOF) measures that support specification testing for binary logistic, multinomial logistic, and linear regression models would be invaluable to investigators in alcohol-related epidemiological research and to the broader clinical trials and health services research communities. Such models are used extensively to identify patterns of alcohol-related symptoms, define criteria for alcohol use disorders, and evaluate policies regulating the use and distribution of alcoholic beverages. However, many regression models are inevitably misspecified (i.e., the model does not contain the true distribution that generated the observed data), which may lead to incorrect inferences when standard inferential techniques are applied. In practice, currently available statistical software relies on GOF summary measures and statistical diagnostics that are not specifically designed to detect model misspecification and that can be shown to yield incorrect statistical inferences in the presence of common sources of misspecification. Accordingly, researchers have few statistical tools for reliably evaluating whether a fitted model is correctly specified. Information matrix (IM) tests (White, 1982, 1994, 1998) are a type of model misspecification test specifically intended to address these problems. Phase II research will extend Phase I findings to develop and implement new IM statistical tests for: 1) multinomial logistic regression, and 2) linear regression on independent and identically distributed as well as locally correlated observations. The Phase II experimental design will use Monte Carlo simulation bootstrapping methods to evaluate the new IM tests using representative NIAAA databases.
Specifically, the simulation studies will empirically characterize the appropriateness of the large-sample assumptions as well as the specificity (i.e., control of the Type I error rate) and sensitivity (i.e., statistical power) of the new IM tests. These simulation methodologies, together with the new IM tests, will be integrated into a prototype user-friendly standalone software package to support epidemiological and health-related regression modeling. In summary, Phase II research will establish the essential technical foundation for Phase III commercialization, with the long-term objective of providing a suite of model specification tests as an advanced statistical tool for regression modeling that improves epidemiological and health-related research.
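
As an illustration only, the kind of Monte Carlo evaluation described here might be sketched as follows for the simplest case, a binary logistic regression with one covariate. The sketch is not the project's method: it assumes the outer-product-of-gradients (n times uncentered R-squared) form of the IM test, a fixed 5% chi-square critical value with 3 degrees of freedom, and synthetic data in place of NIAAA databases. All function names and simulation settings are hypothetical.

```python
import numpy as np

# 95th percentile of a chi-square distribution with 3 degrees of freedom
# (3 = number of distinct elements of x x^T for an intercept plus one covariate).
CHI2_3DF_95 = 7.815


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic fit via a fixed number of Newton steps."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def im_test_opg(x, y):
    """IM test statistic in its OPG form (an assumed variant, see lead-in)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    p = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, y)))
    resid = y - p
    scores = resid[:, None] * X
    # For binary y, (y - p)^2 - p(1 - p) simplifies to (y - p)(1 - 2p).
    Z = np.column_stack([np.ones(n), x, x ** 2])
    indicators = (resid * (1.0 - 2.0 * p))[:, None] * Z
    G = np.hstack([scores, indicators])
    ones = np.ones(n)
    coef, *_ = np.linalg.lstsq(G, ones, rcond=None)
    rss = np.sum((ones - G @ coef) ** 2)
    return n - rss  # n times the uncentered R-squared


def simulate_correct(rng, n):
    """Data generated by the fitted model class (null hypothesis holds)."""
    x = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-(0.5 + x)))
    return x, (rng.random(n) < p).astype(float)


def simulate_misspecified(rng, n):
    """Illustrative alternative: the fitted model omits a quadratic term."""
    x = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-(0.5 + x + 0.75 * x ** 2)))
    return x, (rng.random(n) < p).astype(float)


def rejection_rate(simulate, n_obs=500, n_reps=200, seed=0):
    """Fraction of simulated data sets on which the test rejects at 5%."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_reps):
        x, y = simulate(rng, n_obs)
        if im_test_opg(x, y) > CHI2_3DF_95:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    print("empirical size :", rejection_rate(simulate_correct))
    print("empirical power:", rejection_rate(simulate_misspecified))
```

Under correct specification, the rejection rate estimates the test's actual size; comparing it with the nominal 5% level indicates how well the large-sample chi-square approximation holds at a given sample size. Under the misspecified generator, the rejection rate estimates power against that particular alternative only.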