Big Data at the University of Minnesota

April 10, 2017

Today, we are squarely in the era of “Big Data” or “Data as King,” and Computer Science is at the heart of this revolution. Realizing long ago that the amount of data being collected far exceeds what humans can analyze without assistance, computer scientists have invented new methods for analyzing very large and dynamic datasets, helping to explain the phenomena underlying the data in science, engineering, and business.

Minnesota CS&E is one of the undisputed leaders in this field, having established its credentials years before Big Data and Data Science became fashionable. According to Microsoft Academic Search (academic.research.microsoft.com), Minnesota is ranked 8th worldwide out of 4,796 institutions for data science, and 4th worldwide among academic institutions.

For years, CS&E faculty have embraced “Data as King” as a guiding research principle. What stands out is the breadth and depth of our research on the core challenges across the entire “big data pipeline,” which spans algorithms for data analysis, infrastructure, and applications.

Big Data Methods

To uncover insights hidden in vast amounts of data across a variety of domains, Minnesota CS&E faculty have developed a wide range of powerful data analysis methods.

  • Data Mining: The goal is to find salient and persistent patterns in data, as well as unusual or outlying data points within a large data set (“a needle in a haystack”).
  • Clustering: The goal is to find groups of similar data points, according to suitable distance/similarity measures, geometric properties, or other representations of data objects.
  • Large Scale Optimization: Scaling up models and methods to billions of data points has emerged as a key challenge. Optimization is a core technology at the heart of many machine learning and data mining problems.
  • Predictive Analytics: In recent years, important advances have been made to better predict future possibilities and trends.
  • Visualization and Visual Analytics: Visualization is a key tool to extract patterns and intuition from large, complex data sets and the fastest way to convey big data concepts.

Big Data Infrastructure

To cope with the massive amounts of data inherent in data science domains, Minnesota CS&E faculty have developed computer systems infrastructure to enable the scalable storage, transmission, and computing of data to support data analysis methods.

  • Storage: CS&E research is focused on the technology needed to handle and preserve large-volume data, including technologies like NVRAM, SSD, SWD, Tiera, and Spatial Hadoop.
  • Computation: CS&E faculty are working on the computational infrastructure needed to support distributed data-intensive computing, including work on Nebula and NoSQL cloud data storage systems.
  • Networking: Our faculty's work promises to help guide the evolution of future Internet services, with the aim of improving quality of experience and system efficiency.

Big Data Applications

Minnesota CS&E faculty actively work on big data applications as lead collaborators in many areas, including environmental science, genomics, social networks, social computing and business intelligence, and smart health. They lead big data projects in collaboration with scientists from the medical school, business school, and school of biological sciences at the University of Minnesota and from other institutions.

  • Genomics: CS&E faculty work on big data analytics for a variety of genomic data, studying biomedical applications such as the evolution of pathogens affecting humans, gut microbiomes, cancer biology, and chemical-genetic interactions in drug design.
  • Smart Health: Through collaborations with U of M’s Health Science Center, our faculty are analyzing millions of unstructured electronic health records (EHRs) that often represent an incomplete sampling of information about patients’ health issues and the way in which they are treated. 
  • Business Intelligence & Social Computing: Minnesota CS&E is a leader in the data-intensive field of social computing. In particular, we have been a long-time innovator in important areas such as recommender systems, market basket analysis, social network analysis, and text mining.
  • Environmental Sciences: A wealth of climate and ecosystem data is now available from satellite and ground-based sensors, while climate model simulations offer huge potential for understanding the behavior of the Earth’s ecosystem and for advancing the science of climate change.

Leadership in Big Data

Minnesota CS&E is a leader not only in the technical aspects of data science but also in the growth and expansion of the field through numerous initiatives, well-regarded labs and projects, and highly visible national collaborations.

Data Science Initiatives

Labs and Selected Projects

Large-Scale Data Science Collaborations

Big Data Around the U

Big Data Faculty Experts