'Big data' advances could help solve health, energy challenges
Two teams of Arizona State University computer science researchers are working to develop the next generations of data-driven predictive systems to improve our ability to respond to epidemics and more effectively manage buildings and their energy systems.
Both teams are led by K. Selçuk Candan, a professor in the School of Computing, Informatics, and Decision Systems Engineering, one of ASU’s Ira A. Fulton Schools of Engineering.
Candan has been awarded two National Science Foundation (NSF) grants to support the research, as well as a grant from Johnson Controls, Inc., a global company that provides products and services to optimize building operations, including energy systems.
His team is striving to devise better ways to analyze, integrate and index large volumes of data that will be used to produce simulations. Researchers use the simulations to derive accurate information and predictions necessary to design more effective systems.
Candan’s team for the building and energy management systems project includes Maria Luisa Sapino, an adjunct professor of computer science at ASU and a professor at the University of Torino, Italy, and Youngchoon Park, a technical fellow with Johnson Controls, Inc.
The epidemic management team includes Sapino and Gerardo Chowell-Puente, an associate professor in ASU’s School of Human Evolution and Social Change, whose expertise includes epidemiology, mathematics, computer modeling and statistics.
According to the U.S. Energy Information Administration, buildings consume more energy than any other sector, accounting for 48.7 percent of overall energy consumption. In addition, building energy consumption is projected to grow faster than consumption by industry and transportation sectors.
Candan’s team hopes to create a new building energy data management system (e-SDMS) that helps reduce energy dependency, consumption and costs. Accomplishing that goal will help remove major obstacles to environmentally sustainable development, particularly in developing countries, Candan says.
Computational models for the spatio-temporal dynamics of emerging infectious diseases, and data- and model-driven computer simulations of the spread of diseases, are increasingly critical in predicting the geo-temporal evolution of epidemics, Candan says. These models are used to effectively manage such health emergencies through a diverse set of pharmaceutical and non-pharmaceutical control measures.
The new data-driven epidemic simulation system (epiDMS) the team is developing will be part of a system to address the key data challenges in epidemic-spread simulation that hinder real-time analysis and decision-making during outbreaks. Such problems slow the response to fast-spreading epidemics such as swine flu and severe acute respiratory syndrome (SARS).
The two NSF grants are providing $500,000 for each of the two projects – the building/energy management system and the epidemic management system.
The Johnson Controls grant of $50,000 to ASU’s Center for Embedded Systems – an NSF Industry/University Cooperative Research Center – will also provide the Center and Candan with research data and building energy systems domain expertise, and help to deploy the project.
Candan’s work focuses on solving the “big data” computational challenges that arise from the need to model, index, search, visualize and analyze – in a scalable manner – large volumes of data sets from observations and simulations.
While very powerful simulation software exists, Candan explains, it presents two major challenges: both creating the models that support simulations and analyzing the simulation results are extremely costly. Simulations involve hundreds of parameters, affected by complex dynamic processes operating at different spatial and temporal resolutions, he says. This means simulations and observations cover days to months of data, and may be considered at different granularities of space and time.
New parameters, new contexts
Building energy simulations, for example, take building models as input. These models describe the building structure, materials used, cooling/heating units, heat-transfer characteristics and energy costs. A single building model may involve hundreds of parameters tracked for hundreds of thousands of time steps. Multiple simulation results, with varying parameter settings, often need to be interpreted and possibly compared with real-world observations to make effective decisions, Candan says.
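To give a rough sense of that scale, here is a minimal sketch (an illustration, not the team's software) that counts the values recorded by one simulation run and averages a fine-grained series into coarser time windows. The parameter count, step count, sample readings and the `run_size` and `coarsen` helpers are all hypothetical, chosen only for illustration.

```python
# Illustrative only: the figures and helpers below are assumptions,
# not values or code from the e-SDMS project.

def run_size(num_parameters, num_time_steps):
    """Number of values recorded by one simulation run."""
    return num_parameters * num_time_steps


def coarsen(series, window):
    """Average consecutive groups of `window` values into one value each."""
    return [
        sum(series[i:i + window]) / len(series[i:i + window])
        for i in range(0, len(series), window)
    ]


# Hypothetical run: 300 parameters tracked over 200,000 time steps.
values_per_run = run_size(300, 200_000)

# Hypothetical temperature readings at a fine time resolution,
# averaged in pairs to view the same data at a coarser granularity.
readings = [20.0, 22.0, 21.0, 23.0, 24.0, 26.0]
coarse = coarsen(readings, 2)
```

Under these assumed figures, a single run already holds tens of millions of values, which is why comparing many runs at several granularities becomes a data management problem.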
Candan’s team is developing systems to support data-driven simulations that can potentially guide design decisions and management strategies, and enable experts to explore and analyze models and simulations from diverse parameters and at multiple scales.
He says the data-management software will enable significant savings in modeling, execution and analysis through modular re-use of existing simulation results in new settings – such as recontextualization of models and simulation results under new parameters and new contexts. The data encoding, partitioning and analysis algorithms will be efficiently computable and leverage massive parallelism to tackle scalability challenges.
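The simplest form of result re-use can be sketched as a cache keyed by parameter settings, so that a repeated request does not trigger a new simulation. This is a deliberately simplified illustration: it re-uses results only for exactly matching parameters, which is far less than the recontextualization described above. The `SimulationCache` class and the toy model are hypothetical and are not part of the project's software.

```python
# Illustrative only: a toy stand-in for an expensive simulation and a
# cache that re-uses results for identical parameter settings.

class SimulationCache:
    def __init__(self, simulate):
        self._simulate = simulate   # function that runs one simulation
        self._results = {}          # parameter settings -> stored result
        self.runs = 0               # how many simulations actually ran

    def get(self, **params):
        # Sort the items so that argument order does not change the key.
        key = tuple(sorted(params.items()))
        if key not in self._results:
            self._results[key] = self._simulate(**params)
            self.runs += 1
        return self._results[key]


def toy_energy_model(floor_area, setpoint):
    """Hypothetical stand-in for a costly building energy simulation."""
    return floor_area * (setpoint - 15) // 10


cache = SimulationCache(toy_energy_model)
first = cache.get(floor_area=1000, setpoint=21)   # runs the model
again = cache.get(setpoint=21, floor_area=1000)   # re-uses stored result
other = cache.get(floor_area=1000, setpoint=23)   # new settings, new run
```

Here three requests lead to only two simulation runs, because the second request matches settings that were already computed.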
Producing more ‘big data’ experts
Candan is also helping to develop new graduate-level computer science studies with concentrations in “big data” systems. The program will help meet the growing need for data scientists and engineers who can design, build, implement and manage large data systems for industry and scientific discovery, he says.
The “big data” concentrations will enable students to gain expertise in designing scalable (parallel, distributed and real-time) systems for acquiring, storing, securing and accessing large-scale heterogeneous multi-source data over its life cycle, and will teach them to use analytical tools to mine information from the data.
Courses will include research, case studies and presentations from industry and government experts who can provide students diverse perspectives on the course topics.
The projects Candan’s teams are working on with support from the three new grants will also have an impact on these computer science concentrations. The challenges his teams face and the outcomes of their research will be incorporated into the curricula.
These studies will introduce computer science students to “big data” management, indexing and analysis, and parallel data processing, as well as familiarize them with challenges in the areas of energy, sustainability and epidemic response management.
Written by Mayank Prasad and Joe Kullman