Every second, approximately 6,000 tweets are posted on Twitter. Every minute, 360,000 tweets. Every hour, almost 22 million tweets. Every day, more than 500 million tweets. That’s a significant amount of data — and it represents only one social media platform out of hundreds.
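The figures above follow from simple multiplication. As a quick check, and assuming the approximate rate of 6,000 tweets per second holds steady around the clock, the arithmetic can be sketched in a few lines of Python:

```python
# Assumption: a constant average rate of about 6,000 tweets per second.
TWEETS_PER_SECOND = 6_000

per_minute = TWEETS_PER_SECOND * 60   # 60 seconds in a minute
per_hour = per_minute * 60            # 60 minutes in an hour
per_day = per_hour * 24               # 24 hours in a day

print(f"Per minute: {per_minute:,}")  # 360,000
print(f"Per hour:   {per_hour:,}")    # 21,600,000 (almost 22 million)
print(f"Per day:    {per_day:,}")     # 518,400,000 (more than 500 million)
```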
Social media offers an enormous volume of unstructured data that can generate knowledge and help make better decisions on a larger scale. While humans are clearly efficient data generators, computers are having a difficult time processing and analyzing the sheer volume of data.
Arizona State University Associate Professor Ming Zhao has taken the driver’s seat in developing the Energy Efficient Big Data Research System, called GEARS. The new computing infrastructure was created by a consortium of interdisciplinary researchers who are turning the noise of social media data into useful data sources. Those sources can improve machine learning and help detect security threats or important but hidden incidents, like disease outbreaks or crimes, in real time.
But GEARS’ functionality isn’t limited to social media. The team is ready to clutch the increasingly challenging and diverse big data applications present in today’s world full of sensors and the internet of things, which are generating data ranging from brain signals to activity in deep space.
“Scientific discoveries are driven by data, not just experimentation anymore,” said Zhao, a faculty member in the ASU Ira A. Fulton Schools of Engineering. “But how do we make use of that data?”
In order to support new discoveries through data, we need new, higher-performance systems. However, power consumption has become a limiting factor for big data systems, so improvements in energy efficiency are also important.
Zhao’s efforts to solve both the performance and energy-efficiency challenges of big data technologies, in a project titled “GEARS — An Infrastructure for Energy-Efficient Big Data Research on Heterogeneous and Dynamic Data,” are funded by the National Science Foundation through a three-year, $750,000 grant.
More diverse hardware for more diverse big data tasks
Zhao aims to meet performance and efficiency goals through heterogeneous computing, which combines multiple processor and storage types. Though the term isn’t yet widely known, heterogeneous computing is fairly common in our everyday devices.
One example of heterogeneous computing is the iPhone X’s processor, which has four cores optimized for performance and two cores optimized for power efficiency. While these general-purpose cores can run a variety of apps and operating system duties, the processor also features a dual-core neural engine that is specialized for machine learning tasks and operates Face ID, a face-recognition application requiring significant computing horsepower to run quickly.
The neural engine’s circuitry is specially designed to handle the complex computing involved in recognizing a user’s face. A general-purpose processor core, by contrast, has a one-size-fits-most structure that’s good enough for simple applications.
Heterogeneous computing on the storage side can also be seen in many computers, which are likely to include both hard-disk drives (HDDs) for inexpensive, high-capacity storage and solid-state drives (SSDs) for storage that’s speedy to access.
GEARS is taking a similar approach but at a much larger scale. The system is expanding beyond general-purpose computer processors (central processing units) to incorporate accelerators (graphics processing units and field-programmable gate arrays, or FPGAs) and a deep hierarchy of storage tiers (dynamic random-access memory; non-volatile memory, or NVM; HDDs; and various SSD technologies). Each of these components has its own advantages and disadvantages depending on a given application’s characteristics.
The inclusion of heterogeneous hardware such as FPGAs and NVMs allows GEARS to tackle tough big data problems, such as those that cannot be easily parallelized and that are sensitive to delays. In many cases, these less traditional hardware designs also consume less power, contributing to the energy-efficiency goals of GEARS.
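To illustrate the idea that different storage tiers suit different applications, here is a minimal Python sketch that picks the cheapest tier able to meet an application's latency requirement. It is not the GEARS software. The `Tier` class, the `pick_tier` function, and all latency and cost figures are hypothetical, with the numbers chosen only to reflect rough orders of magnitude.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    latency_us: float   # rough access latency in microseconds (illustrative)
    cost_per_gb: float  # relative cost units (illustrative, not real prices)


# Hypothetical tiers, ordered from fastest and priciest to slowest and cheapest.
TIERS = [
    Tier("DRAM", 0.1, 8.0),
    Tier("NVM", 1.0, 4.0),
    Tier("SSD", 100.0, 0.2),
    Tier("HDD", 10_000.0, 0.03),
]


def pick_tier(max_latency_us: float) -> Optional[Tier]:
    """Return the cheapest tier whose latency meets the requirement, or None."""
    candidates = [t for t in TIERS if t.latency_us <= max_latency_us]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.cost_per_gb)


if __name__ == "__main__":
    # Delay-sensitive workloads land on fast tiers; tolerant ones on cheap tiers.
    for need in (0.5, 50.0, 500.0, 50_000.0):
        tier = pick_tier(need)
        print(f"latency budget {need} us -> {tier.name if tier else 'no tier'}")
```

A real system would weigh many more factors, such as capacity, bandwidth, access patterns and power draw, but the trade-off is the same: faster tiers cost more, so data is placed on the slowest tier an application can tolerate.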
Easy-to-use software for heavy-duty hardware
GEARS incorporates software components that help optimize the use of the various processor and accelerator types and storage resources.
“It’s easy to buy the heterogeneous hardware (components) and put them together, but it’s up to the software system to make good use of the devices,” said Zhao, who is director of the Research Laboratory for Virtualized Infrastructures, Systems and Applications, which started the development of GEARS’ underlying technology.
While some researchers on the GEARS team are focusing on developing the hardware and software infrastructure, others are developing new algorithms to make efficient use of the infrastructure and to make it user-friendly for other data scientists.
“Usability is important, so we want to make it really easy for users to develop applications for the heterogeneous hardware of GEARS,” Zhao said.
One way GEARS researchers are achieving this is by developing extensions to popular data analytics platforms such as Apache Spark. Data scientists can develop an application with Spark as they normally would, then apply the application to the high-performance, energy-efficient, optimized GEARS infrastructure.
Another example is extending widely used machine learning platforms such as TensorFlow, which will allow researchers to conveniently deploy their algorithms on GEARS and benefit from its heterogeneous computing power.
GEARS wants your big data challenges
Now that they have the system in place, the GEARS team is eager to take on diverse big data challenges beyond the realm of computer science.
“Essentially, anything that requires big data could potentially benefit from GEARS,” Zhao said.
So far, GEARS has helped with several interdisciplinary projects in collaboration with researchers at ASU, other universities and companies across the country, and even around the globe. These include projects related to neuroscience, sustainability, medicine, aerospace, botany and geography.
For example, Assistant Professor Fengbo Ren, a co-principal investigator of the GEARS project, is helping researchers from ASU’s School of Geographical Sciences and Urban Planning develop a deep learning system with GEARS that automatically classifies terrain features from an unstructured big data source of remote sensor data and photographs. Outside of the university, researchers from Phoenix Children’s Hospital are working with another GEARS co-PI, Professor K. Selçuk Candan, to develop deep phenotyping for physiologic biomarkers of post-traumatic epilepsy in children.
Zhao says the team is happy to support anyone at ASU by hosting their big data applications on the GEARS hardware in the ASU Research Computing high-performance computing data center. For collaborators outside ASU, the team is happy to share the GEARS technology and open source software.
Zhao’s team will transfer this new technology beyond ASU, benefit a wider community of data scientists and engage with industry partners through ASU’s Center for Assured and Scalable Data Engineering, an Industry-University Cooperative Research Center.
“People can learn from our technologies and our lessons and experiences we got from building GEARS to make their own version of GEARS,” Zhao said.