Bringing disparate data together with new computing theories and tools
We live in a world of "big data."
A world where we can capture a continuous stream of information from all kinds of systems. However, as the scope and sources of data expand, it becomes harder to model and analyze that data in order to improve these systems.
Complex applications such as manufacturing, social network analysis, healthcare and fraud detection generate data that exhibit different forms of variability, a property known as heterogeneity.
Current computing models that analyze data to predict the desired output of a process can accurately handle one type of heterogeneity, but today's applications collect data that span multiple types of heterogeneity. Existing models can no longer keep up.
Arizona State University assistant professor Jingrui He is working to address this challenge. She first encountered the problem of analyzing multiple types of heterogeneous data while at IBM Research, where she was tasked with creating novel tools and algorithms for semiconductor manufacturers, and again when friends at Facebook and Yahoo asked for similar tools for social media networks. When she looked for papers describing techniques to handle such data, she found few.
“The state of the art cannot address this problem,” He said. “In the past people have worked on modeling a single type of heterogeneous data, and we now understand how this model performs given certain conditions, but what if we have more than one type of heterogeneity? Does this affect the model? Under what conditions will the additional heterogeneity affect the model’s performance and how the different types interact with each other?”
He set out to answer these questions by creating new algorithms and theories that will advance the ability to analyze multiple types of heterogeneous data.
Her efforts have earned He a National Science Foundation CAREER Award, which will support her research goals. The award, given to young engineers seen as research and educational leaders in their fields, provides $500,441 in funding over five years.
Pioneering a new field of data analysis
He’s research has three goals. The first is to create a suite of effective and efficient algorithms for modeling the interaction of multiple types of heterogeneity. The second is to theoretically characterize how the models’ performance is affected by that interaction. The third is to systematically evaluate the algorithms and theories on real applications.
As part of He’s research, she and her team of five graduate students are collaborating with IBM Research, Facebook and Yahoo, which represent industries where the work can have timely and measurable impact in areas such as manufacturing and security. The students will also have the opportunity to work on real problems of high importance to these companies.
In semiconductor manufacturing, for example, manufacturers analyze data from the multiple chambers used to fabricate chips. Each chamber produces different types of data, such as temperature or pressure readings, and defects can be introduced at any step of the process. These datasets therefore include multiple types of heterogeneity. Multiple competing expert opinions on which manufacturing processes should be used add further complexity.
“A major advantage of using our technique is to significantly reduce cost by boosting the accuracy of the prediction model,” He said.
In semiconductor manufacturing, if manufacturers can predict a chip’s quality by analyzing its heterogeneous data with the proposed model, they no longer need to physically test the chip, a process that destroys it. Avoiding those destructive tests lowers production costs.
Sharing new techniques and knowledge
Part of He’s CAREER Award research involves integrating the new techniques into the curriculum and into K-12 outreach efforts.
In her Statistical Machine Learning class, she plans to make available software tools developed in her research for analyzing different combinations of heterogeneous data, helping students carry out their class projects.
She also plans to create a new undergraduate class that introduces machine learning, its applications in social media analysis and healthcare, and how heterogeneous data analysis can address challenges in these applications.
He’s outreach efforts introduce K-12 students to computer science and machine learning concepts related to her research.
She began her outreach efforts at the Association for the Advancement of Artificial Intelligence Open House on Feb. 13, where the student news organization Channel One News recorded content for a series introducing artificial intelligence to high school students.
He also plans to present her research to K-12 students at ASU's Night of the Open Door and to high school students in Fulton Summer Academy programs. Last year at Fulton Summer Academy, two high school students worked on machine learning problems that she and her graduate students presented to them. She also hopes that her outreach with young students will suggest new directions for her research.