By: Taha A. Kass-Hout, M.D., M.S.
Last year, I worked with a group of colleagues throughout the Food and Drug Administration (FDA) on a project that is critical for the agency’s future: the modernization of our information technology platforms to prepare for the influx of "Big Data" – the enormous data sets we receive daily from manufacturers, health care providers, regulatory bodies, scientists and others.
These data sets are not only larger than ever before; they are also arriving more frequently and varying enormously in format and quality.
This year alone, we expect to receive somewhere between 1.5 and 2 million submissions through our eSubmission Gateway – and some submissions can now be as large as a terabyte (one trillion bytes) in size. This is the very definition of big data.
At FDA, we view this as both an opportunity and a challenge. To meet both, we are building an innovative technology environment that can handle vast amounts of data and provide powerful tools to collect, store and analyze it, and to identify and extract the information we need.
A key example is our recent adoption of cloud computing.
"Cloud computing" is, basically, computing on demand. Think of how you use water or electricity at the same time as your neighbors and millions of others. You pay only for what you use, and service is always guaranteed. You don’t have to wait until your neighbor has finished using the washer or dryer, as you would if there were only enough electrical capacity to serve one household at a time.
The same is true of cloud computing, which stores and processes data on remote servers reached over the Internet, rather than on the hard drives of our own computers. In essence, it gives us the ongoing, simultaneous capacity to collect, control and analyze enormous data sets.
For example, FDA, partnering with state and local health organizations, identifies thousands of foodborne pathogen contaminants every year. We sequence, store and analyze this data to understand, locate, and contain life-threatening outbreaks. Again, cloud computing aids us in this effort.
Finally, FDA has some of the world’s most valuable data stores about human health and medicine. Through OpenFDA, our newest IT program, we are making some of our existing publicly available data sets more easily accessible to the public and to our regulatory stakeholders. The data will be offered in a structured, computer-readable format so that technology specialists, such as mobile application creators, web developers, data visualization artists and researchers, can quickly search, query or pull massive amounts of public information directly from FDA datasets on an as-needed basis. OpenFDA is beginning with a pilot program involving the millions of reports of drug adverse events and medication errors submitted to FDA from 2004 to 2013. It will later be expanded to include the agency’s databases on product recalls and product labeling.
OpenFDA promotes data sharing, data access, and transparency in our regulatory and safety processes, and spurs innovative ideas for mining the data and promoting the public health.
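To give developers a feel for what "search, query, or pull" might look like in practice, here is a minimal sketch in Python that builds a query for drug adverse event reports. It rests on assumptions not stated in this post: that the data is served from an endpoint at https://api.fda.gov/drug/event.json, that it accepts `search` and `limit` parameters, and that reports expose `patient.drug.medicinalproduct` and `receivedate` fields. The function name `build_adverse_event_query` is purely illustrative. Check the current OpenFDA documentation before relying on any of these details.

```python
from urllib.parse import urlencode

# Assumed endpoint for drug adverse event reports (not given in this post).
BASE_URL = "https://api.fda.gov/drug/event.json"


def build_adverse_event_query(drug_name: str, start: str, end: str, limit: int = 5) -> str:
    """Build a query URL for adverse event reports that mention a drug.

    `start` and `end` are dates in YYYYMMDD form. The field names used
    below are assumptions about the report schema.
    """
    search = (
        f'patient.drug.medicinalproduct:"{drug_name}" '
        f"AND receivedate:[{start} TO {end}]"
    )
    # urlencode percent-escapes quotes, colons and brackets, and turns spaces into "+".
    return BASE_URL + "?" + urlencode({"search": search, "limit": limit})


if __name__ == "__main__":
    # Print the URL only; fetching it is left to any HTTP client.
    print(build_adverse_event_query("aspirin", "20040101", "20131231"))
```

The sketch only constructs the URL, so it runs without network access. Requesting that URL with a browser or HTTP library would, under the assumptions above, return matching reports as structured JSON.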
Big data is important to the way we carry out regulatory science, which is the science of developing new tools and approaches to assess the safety, efficacy, quality, and performance of FDA-regulated products. Through innovative methods such as cloud computing, we are taking advantage of this flood tide of new information to continue to protect and promote the public health.
Taha A. Kass-Hout, M.D., M.S., is FDA’s Chief Health Informatics Officer and Director of FDA’s Office of Informatics and Technology Innovation.