Statistical Methods for Incomplete Data with Measurement Errors

Period of Performance: 05/01/2016 - 04/30/2017


Phase 2 SBIR

Recipient Firm

Data Numerica Institute, Inc.
Bellevue, WA 98006
Principal Investigator


DESCRIPTION (provided by applicant): Missing data, censored data, and surrogate markers are common incomplete-data problems in biomedical data analysis. In this project, we are interested in statistical methods for experimental, observational, and genetic studies in which missing data, measurement errors, and surrogate markers occur. Examples include health surveys with non-responders or missing items and surrogate marker data with measurement errors. Applications include longitudinal clinical trials, multilevel community studies, genetic marker studies, and health surveys. The incomplete data may enter a model as the response or as predictors, giving rise to non-ignorable missing responses, missing covariates, and covariate measurement errors. The most complicated scenarios combine these difficulties, e.g. missing responses with covariate measurement errors, or censored data with surrogate markers and measurement errors. The ultimate results of this project will be two statistical packages for longitudinal and survival responses: 1) MiMe: statistical methods for missing data and measurement errors, and 2) Laso: joint modeling methods for longitudinal and survival outcomes in the study of surrogate markers for clinical event times. Functional and structural approaches will be developed; they are applicable to many other areas, e.g. genetic marker association studies. The results from this project include innovative statistical methods, sensitivity analysis, graphical methods, case studies, software tools, and publications. An R version will be available, and advanced users may use it for comparison studies against other approaches or customize it for further extensions. A second version will incorporate the tools from this research into our online data analysis platform, the Longit Informatics Center. Subscribers can access many statistical packages, modules, and dynamic graphics in Longit for data analysis.
For commercialization, we will deliver online and offline versions, i.e. internet, intraweb, and desktop versions. We will also license our API version for integration with other analytic systems in business and other non-biomedical fields. One example is integrating Longit with Alteryx, a commercial data mining tool for big data analysis.