Storage Systems for Efficient Processing and Storing Big Data

With the prevalence of big data applications, huge volumes of data are collected, and processing, preserving, and storing these data to enable such applications has become very challenging. The goal of this project is to push the boundaries of file and storage systems by exploring and developing new technologies and techniques to improve the usability, scalability, security, reliability, and performance of storage systems for efficiently processing and storing big data. Current research includes how to take advantage of new memory/storage technologies such as NVRAM (Non-Volatile RAM), SSDs (Solid State Drives), and SWDs (Shingled Write Disks); how to integrate these technologies into existing and emerging memory/storage hierarchies such as cloud storage and backup/archive platforms; and how to explore new storage research issues such as long-term data preservation, software-defined storage, storage support for MapReduce/Hadoop, and object storage devices. More Information:
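One common way to integrate NVRAM, SSDs, and SWDs into a single hierarchy is a temperature-based placement policy: hot data goes to the fastest tier, cold data to the densest. The sketch below illustrates that idea only; the thresholds and tier names are illustrative assumptions, not the project's actual policy.

```python
# Minimal sketch of a temperature-based tier-placement policy.
# Thresholds (100 and 1 accesses/day) are hypothetical, chosen only
# to illustrate the hot/warm/cold split across NVRAM, SSD, and SWD.

def choose_tier(accesses_per_day: float) -> str:
    """Place hot data on NVRAM, warm data on SSD, cold data on SWD."""
    if accesses_per_day >= 100:
        return "NVRAM"
    if accesses_per_day >= 1:
        return "SSD"
    return "SWD"

# Toy workload: object name -> observed access frequency.
objects = {"log.0": 500.0, "photo.jpg": 3.0, "archive.tar": 0.01}
placement = {name: choose_tier(freq) for name, freq in objects.items()}
print(placement)
# -> {'log.0': 'NVRAM', 'photo.jpg': 'SSD', 'archive.tar': 'SWD'}
```

A real system would also migrate objects between tiers as their access frequency changes, and would account for SWD's poor random-write behavior when demoting data.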

Intelligent and High Performance Systems

This research group focuses on computer architecture, high-performance parallel computing, computer systems performance analysis, computing with emerging technologies, approximate computing, and storage systems. We have a particular emphasis on the interaction of software and compilers with computer architecture, and the interaction of computer architecture and circuits. Example projects:

1. Work with researchers at the Center for Research in Intelligent Storage on new ideas for hardware and software systems for storing, processing, and communicating large data sets.

2. Using appropriate statistical tools and methods, analyze a large corpus of publications in a specific research field to expose and understand historical trends and to potentially predict future innovations.
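The corpus-analysis project in item 2 can be illustrated with a minimal trend count: how often a topic keyword appears in paper titles per year. The corpus and keyword below are made-up toy data, not results from any real study.

```python
# Hedged sketch of a historical-trend analysis over a publication corpus:
# count papers per year whose title mentions a given keyword.
# The five (year, title) pairs are fabricated toy data.
from collections import Counter

papers = [
    (2010, "a study of disk scheduling"),
    (2012, "flash storage performance"),
    (2014, "flash translation layers"),
    (2015, "nvram file systems"),
    (2016, "nvram-aware caching"),
]

def yearly_counts(corpus, keyword):
    """Return a Counter mapping year -> number of matching titles."""
    return Counter(year for year, title in corpus if keyword in title)

print(yearly_counts(papers, "nvram"))
# -> Counter({2015: 1, 2016: 1})
```

A real analysis would work on abstracts or full text, normalize terms, and fit the resulting time series with statistical models rather than raw counts.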

More Information:

Big data processing in mobile cloud platforms

We are developing an intelligent ubiquitous cloud platform that can support end-users running latency-sensitive big data applications on resource-limited, low-powered edge devices such as mobile phones and Google Glass. The project has two primary components: the edge cloud Nebula, which can harness nearby edge resources to form a geographically dispersed cloud with low latency to large data sources; and a smart middleware that uses machine learning to identify user similarities and opportunities for caching and speculative execution to optimize application performance. The big data aspects of this project include: (1) efficient storage and retrieval of large data streams from edge devices or external sources (e.g., websites, data repositories, and online services); (2) data analysis and mining of these data streams to identify optimization opportunities; and (3) scheduling of cloud computing resources for efficient and timely execution of data-intensive applications.
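The scheduling aspect can be sketched in miniature: a latency-sensitive scheduler prefers the compute node closest (in network latency) to the data source. This is an illustrative toy, not the Nebula scheduler; the node names and latency figures are invented.

```python
# Illustrative latency-aware placement (not the Nebula implementation):
# choose the node with the lowest estimated latency to the data source.

def pick_node(nodes, latencies_ms):
    """Return the node minimizing measured latency to the data source."""
    return min(nodes, key=lambda n: latencies_ms[n])

# Hypothetical candidates: two nearby edge nodes and a distant datacenter.
nodes = ["edge-a", "edge-b", "cloud-dc"]
latencies_ms = {"edge-a": 8.0, "edge-b": 3.5, "cloud-dc": 60.0}
print(pick_node(nodes, latencies_ms))  # -> edge-b
```

A production scheduler would weigh latency against node load, available storage, and the caching and speculative-execution opportunities the middleware identifies.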

More Information:

Automated out-of-core execution of parallel message-passing applications

Big Data analytics represent a range of computational methods that are designed to harness the power of the vast amounts of data generated and collected in various scientific, engineering, military, commercial, and educational domains. The sheer size of the data to be analyzed, and in many cases the complexity of the analysis, have made out-of-core (OOC) distributed computing approaches, which primarily store their data on disks, the preferred computational methodology. Converting existing applications to operate efficiently in an OOC distributed computing fashion is a non-trivial task, as it requires a significant software re-engineering effort. Moreover, the current high-level frameworks for developing new OOC distributed computing applications support restrictive computational models, which limit the achievable performance for many classes of computations. This project is developing the methods and software tools to allow message-passing distributed applications to efficiently solve problems whose data and/or memory requirements far exceed the memory available in the underlying computer system. It will develop an OOC distributed computing framework that couples scalable distributed-memory message-passing programs with a runtime system that facilitates OOC distributed execution.
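The core OOC idea, processing a dataset larger than memory by streaming fixed-size chunks from disk, can be shown in a few lines. This is a single-process toy reduction, not the project's message-passing framework; the file format and chunk size are assumptions for the example.

```python
# Toy out-of-core reduction: sum a file of float64 values that (in a
# real setting) would not fit in memory, reading one chunk at a time.
import os
import struct
import tempfile

def write_dataset(path, values):
    """Write float64 values to a flat binary file (assumed format)."""
    with open(path, "wb") as f:
        for v in values:
            f.write(struct.pack("d", v))

def ooc_sum(path, chunk_elems=1024):
    """Stream the file chunk by chunk; only one chunk is ever in memory."""
    total = 0.0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_elems * 8):  # 8 bytes per float64
            n = len(chunk) // 8
            total += sum(struct.unpack(f"{n}d", chunk))
    return total

path = os.path.join(tempfile.mkdtemp(), "data.bin")
write_dataset(path, [1.0] * 10000)
print(ooc_sum(path))  # -> 10000.0
```

The framework described above generalizes this pattern: each message-passing process spills and restores its partition of the data, with the runtime deciding what resides in memory at any moment.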

More Information: