Enabling scientific discovery from complex data at extreme scales

Period of Performance: 02/21/2017 - 02/20/2018


Phase 1 STTR

Recipient Firm

5184 Tehachapi Way
Antioch, CA 94531
Firm POC
Principal Investigator

Research Institution

Lawrence Berkeley National Laboratory
One Cyclotron Road, 971-SP
Berkeley, CA 94720
Institution POC


Statistical machine learning has had a substantial impact on many business areas, including finance, supply chain management, cybersecurity, and bioengineering; it is at the core of companies such as Google, Facebook, and TrueCar. However, existing tools for predictive analytics provide little or no insight into the underlying processes. Hence, while machine learning has unarguably been enormously beneficial to industry, it has yet to enable engineering-level insights into complex systems. A central challenge is to develop “open box” learning machines that provide deep insights into the systems they model, enabling the application of marketplace engineering principles to all industrial sectors. Preminon LLC, in collaboration with the Brown Group at Lawrence Berkeley National Laboratory, will develop a new, indefinitely scalable algorithm for feature discovery in supervised, unsupervised, and semi-supervised regimes on massively multi-dimensional, hybrid (structured and unstructured) data, in both streaming and static settings. Our techniques build on our previous work on iterative Random Forests (iRF, https://github.com/sumbose/iRF). Obtaining engineering-level insights from big data on nonlinear complex systems requires identifying and mapping nonlinear interactions. Such mapping is conventionally done with forward approaches, which become computationally intractable at relatively low orders of interaction. Our approach, for the first time, decouples the order of an interaction from the cost of detecting it: we have developed importance sampling in the space of all-subsets nonparametric regression. We aim to generate HPC-compatible open box learning machines that substantially improve the informativeness of predictive analytics. We will commercialize this technology as licensable software in the rapidly growing business intelligence sector, currently at $2.7B and projected to reach $9.7B by 2020.
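
The intractability of low-order interaction search noted above can be illustrated with a simple count. The sketch below is not part of the proposed method. It only tallies how many candidate feature subsets of a given order exist, which is the space that an exhaustive search would have to cover and that forward approaches try to prune. The function names and the feature count of 10,000 are assumptions chosen for illustration.

```python
from math import comb


def candidate_interactions(num_features: int, order: int) -> int:
    """Number of distinct feature subsets containing exactly `order` features."""
    return comb(num_features, order)


def cumulative_candidates(num_features: int, max_order: int) -> int:
    """Number of feature subsets of size 1 through `max_order`."""
    return sum(comb(num_features, k) for k in range(1, max_order + 1))


if __name__ == "__main__":
    p = 10_000  # hypothetical feature count, not taken from the abstract
    for k in range(1, 6):
        # Candidate count grows roughly as p**k / k! while k is small relative to p.
        print(f"order {k}: {candidate_interactions(p, k):.3e} candidate subsets")
```

Under these assumptions the number of candidates grows by roughly three to four orders of magnitude with each additional order of interaction, which is why a detection cost that does not depend on interaction order matters.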