ASU team receives half-million-dollar NSF grant for web-based data-categorization app


Screenshot of a CatMapper search page.

A search page allows users to search for specific categories and guides users to a page where they can view a range of contextual information on each category. Photo courtesy the CatMapper team

You’re a social scientist trying to do a study on health, wealth and discrimination across hundreds of ethnicities worldwide. Until recently, this would have taken a lot of time and manual calculations with plenty of room for error.

There was no way to easily connect data on health and discrimination for ethnicities across thousands of datasets from countries around the world, explains Daniel Hruschka, professor and associate director at the School of Human Evolution and Social Change at Arizona State University.

Out of frustration, Hruschka and colleagues teamed up and started building CatMapper three years ago. The program helps scientists map categories between different datasets, which inspired the name "CatMapper."

“The number of datasets with information about ethnicities, religions, languages and geographic districts has exploded in recent years, and there is great potential for new analyses that bring these diverse datasets together,” Hruschka said. “However, bringing them together is a wicked problem, because every dataset has a different way of labeling the same thing. To make matters worse, there are thousands of religions, tens of thousands of ethnicities, languages and dialects, and hundreds of thousands of geographic districts to map across datasets.”

The app builds bridges between datasets that represent the same information but categorize, or name, it in different ways. This makes it much easier to unlock and combine data from a far larger number of sources.
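To make the idea concrete, here is a minimal sketch of what bridging two datasets through a shared category key might look like. It is not CatMapper's code or API; the dataset names, labels, category IDs and values are all invented for illustration, and the `bridge` function is a hypothetical helper.

```python
# Hypothetical crosswalk: (dataset name, local label) -> shared category ID.
# Every name, ID and value below is invented for illustration only.
CROSSWALK = {
    ("survey_a", "Alpha people"): "CAT001",
    ("survey_b", "ALPHA (grp.)"): "CAT001",
    ("survey_a", "Beta people"): "CAT002",
    ("survey_b", "Betas"): "CAT002",
}

survey_a = [
    {"label": "Alpha people", "health_index": 0.71},
    {"label": "Beta people", "health_index": 0.64},
]
survey_b = [
    {"label": "ALPHA (grp.)", "wealth_index": 0.52},
    {"label": "Betas", "wealth_index": 0.47},
    {"label": "Unlisted group", "wealth_index": 0.30},
]


def bridge(datasets, crosswalk):
    """Combine rows from several datasets under shared category IDs.

    Returns (merged, unmatched): merged maps each category ID to the
    variables collected for it; unmatched lists the (dataset, label)
    pairs that the crosswalk does not cover.
    """
    merged = {}
    unmatched = []
    for name, rows in datasets.items():
        for row in rows:
            cat_id = crosswalk.get((name, row["label"]))
            if cat_id is None:
                # Record the gap instead of guessing at a match.
                unmatched.append((name, row["label"]))
                continue
            record = merged.setdefault(cat_id, {})
            for key, value in row.items():
                if key != "label":
                    record[key] = value
    return merged, unmatched


merged, unmatched = bridge(
    {"survey_a": survey_a, "survey_b": survey_b}, CROSSWALK
)
print(merged)
print(unmatched)
```

In this sketch, labels that the crosswalk does not cover are reported rather than silently dropped, so each matching decision stays visible and can be revisited later.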

“CatMapper helps users sort through this Wild West of categories when bringing data together from different sources,” Hruschka said.

Recently, the project received a $550,000 National Science Foundation grant to continue building and expanding the platform. Year to date, CatMapper has drawn more than 40,000 views and has helped build new datasets for a number of projects.

Currently, CatMapper houses two applications that help with two kinds of categories. SocioMap handles sociopolitical categories and ArchaMap handles archaeological artifact types, explained Robert Bischoff, an anthropology graduate student. Bischoff, the primary developer of the applications, has written all of the code and manages the database.

“I never thought I'd be managing websites and Linux servers as an archaeologist, but not only have I learned how to do these things working on CatMapper, I'm now in a full-time position with the Center for Archaeology and Society where I'm using these same skills as the database manager,” Bischoff said.

Professor Daniel Hruschka

The applications have four functions: exploring information on hundreds of thousands of categories; translating categories from new datasets; bridging data across datasets; and documenting and sharing users’ prior work. 

“The big thing is how do you determine what counts as the same across different datasets when bringing data together?” Hruschka said. “Other people have done comparative studies like this where they have brought data together, but it's a challenging task and it involves tons of decisions. And these decisions are usually not well-documented. So if you want to try and replicate what someone has done in the past, it's almost impossible.”

The web-based applications are free, and Hruschka said users include scholars and policymakers, as well as both graduate and undergraduate students. The team also aims to make them useful for a wider range of everyday users.

The CatMapper team also includes Matthew Peeples, associate professor at the School of Human Evolution and Social Change, and Sharon Hsiao, assistant professor at the School of Engineering at Santa Clara University.
