Sci-Score, a tool to support rigor and transparency guidelines

Period of Performance: 08/03/2017 - 08/02/2018

$222K

Phase 1 SBIR

Recipient Firm

Scicrunch, Inc.
SAN DIEGO, CA 92122
Principal Investigator

Abstract

Project Summary: While standards for reporting scientific methods are critical to producing reproducible science, meeting such standards is difficult. Checklists and instructions are hard to follow, often resulting in low and inconsistent compliance. Scientific journals and societies, as well as the National Institutes of Health, are now actively proposing general guidelines to address reproducibility issues, particularly in the reporting of methods (e.g., http://www.cell.com/star-methods), but the harder part will be training the biomedical community to use these standards to effectively improve how scientific methods are communicated. To support new standards in methods reporting, specifically the RRID standard for Rigor and Transparency of Key Biological Resources, we propose to build Sci-Score, a text-mining-based tool suite to help authors meet the standard. Sci-Score will provide an automated check on compliance with the RRID standard, which is already implemented by over 100 journals, including Cell, Journal of Neuroscience, and eLife. The innovation behind Sci-Score is the provision of a score, obtainable by individual investigators, that reflects a numerical validation of the quality of their methods reporting. We posit that the score will serve as a tool that investigators and journals can use to compete with themselves and each other, or at the very least allow them to see how close they are to the average in meeting quality requirements. Recently, our group developed a text mining algorithm that has been used successfully to detect software tools and databases from the SciCrunch Registry in published papers. Digital tools are one of four resource types that the RRID standard identifies. We propose to extend this approach to the other three types of entities: antibodies, cell lines, and model organisms.
Resource identification, along with other quality metrics, will be used to train an algorithm to score the overall quality of the methods document. If successful, the tool could be used by editors, reviewers, and investigators to increase the number of RRIDs, and therefore the quality of descriptors of key biological resources, in published papers. This SBIR project will build a set of algorithms similar to the resource-finding pipeline and develop it into an industrially robust and reconfigurable software system. Our Phase I specific aims are 1) creating gold sets of data for each resource type and training a set of algorithms for each resource type; 2) designing and evaluating the scoring system; and 3) designing and evaluating a report-generating system based on the previous aims. In Phase II, we will develop a scalable backend infrastructure to serve the needs of scientific publishers and the research community.
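
As a rough illustration of the kind of check described in the abstract, the sketch below finds RRID-style identifiers in a methods passage and turns the result into a toy score. It is not the project's algorithm: the regular expression, the prefix-to-type mapping, the function names `find_rrids` and `toy_score`, the 0 to 10 scale, and the placeholder identifiers in the sample text are all assumptions made for this example. The count of resources mentioned is supplied by hand here, whereas the proposal would obtain it from trained text-mining models.

```python
import re

# Assumed pattern: "RRID:" followed by an authority prefix, an underscore,
# and an accession. Real RRIDs are more varied than this simple pattern.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+)_([A-Za-z0-9:_-]*[A-Za-z0-9])")

# Hypothetical mapping from identifier prefix to resource type.
PREFIX_TO_TYPE = {
    "AB": "antibody",
    "SCR": "digital tool",
    "CVCL": "cell line",
}


def find_rrids(text):
    """Return (rrid, resource_type) pairs in order of appearance."""
    results = []
    for match in RRID_PATTERN.finditer(text):
        prefix, accession = match.group(1), match.group(2)
        rrid = f"RRID:{prefix}_{accession}"
        results.append((rrid, PREFIX_TO_TYPE.get(prefix.upper(), "other")))
    return results


def toy_score(num_resources_mentioned, num_rrids_found):
    """Fraction of mentioned resources that carry an RRID, scaled to 0-10."""
    if num_resources_mentioned <= 0:
        return 0.0
    fraction = min(num_rrids_found / num_resources_mentioned, 1.0)
    return round(10 * fraction, 1)


# Sample text with made-up placeholder identifiers (not real registry entries).
methods = (
    "Sections were stained with a primary antibody (RRID:AB_0000001). "
    "Images were analyzed with an imaging package (RRID:SCR_0000002). "
    "A cell line was cultured without any identifier reported."
)

found = find_rrids(methods)
# Three resources are mentioned in the sample; two of them carry an RRID.
score = toy_score(num_resources_mentioned=3, num_rrids_found=len(found))
print(found, score)
```

In this sketch the hard part of the proposal, detecting resources that were mentioned without an identifier, is replaced by a hand-supplied count; that detection step is what the gold sets and trained algorithms in the Phase I aims are meant to provide.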