Analyzing Alzheimer's Disease Neuroimaging Initiative (ADNI) data

The ADNI datasets typically include several patient-related features, such as age, sex, measurements of physical and mental condition, and disease state, as well as images. One challenge among many is how to integrate information from such diverse sources to explain the progression of Alzheimer's disease. Several components of the data are high-dimensional, and a sound analysis must also quantify uncertainty and provide quality guarantees.

Bridge Clinical Practice with Data Science

Develop new data science methodologies and adapt existing techniques to clinical data in order to discover new knowledge and improve care delivery. Clinical applications of data science often pose challenges that other applications do not. Typical examples include an unusually high standard of external validity, i.e., the expectation that a model can be transferred from one health care system to another; a shift from association toward causation; and a focus on correlated longitudinal data whose values are missing informatively and longitudinally.

Compare the performance of climate simulation models

There are many facets to this research; here are three examples. (a) Downscaling: many global climate models produce output on a coarse spatial scale, owing to computational, physics-related, and other limitations. To use such outputs, we need climate forecasts at a finer spatial resolution. How do we obtain them? (b) Global climate models have varying strengths and weaknesses, depending on the details of how they are run, the physics behind them, and so on. How do we create an ensemble for future climate projections based on all the model outputs? (c) How do we check whether a climate model can replicate complex features already known to characterize the Earth's climate?

Natural Language Processing in Health Sciences: Work on the analysis of electronic health record (EHR) big data to improve the quality of patient care, and on the discovery of pharmacovigilance knowledge by mining large-scale biomedical literature.

Natural Language Processing in Health Sciences:

There is a significant amount of unstructured data (i.e., free text) in the healthcare domain, including clinical notes in electronic health records, biomedical literature, and social media. Natural language processing (NLP) provides the techniques for analyzing such large-scale textual data. Examples of research in this area include: (1) extracting clinical information (e.g., symptoms) from clinical notes; (2) developing algorithms to identify patients' phenotypes in electronic health records; (3) discovering novel drug-drug interactions from biomedical literature; and (4) understanding adverse events related to drugs or supplements in social media.

Statistical inference in high dimensions

When we analyze high-dimensional data, what assumptions do we need to make to guarantee that our analysis is credible? What are the related uncertainty measures? How does modern high-dimensional analysis fit into traditional Bayesian or non-Bayesian statistical paradigms? When, where, why, and how do cross-validation, randomization, and permutation methods work? How can we conduct the analysis in a way that is both statistically and computationally efficient? This project involves theoretical statistics as well as computation.

Statistical Machine Learning in Biology: Develop statistical machine learning methods for analyzing high-dimensional data sets arising from genomics, neuroscience, and other areas of biology. More specifically, develop unsupervised learning methods, such as probabilistic graphical models, cluster analysis, and dimension reduction, to uncover patterns in massive data sets.

Statistical Methods for Education

This research involves the design and analysis of educational measurements. Examples of research questions include: (1) How can we design an efficient computerized adaptive test, so that fewer test items are needed to assess students' proficiency? (2) How do we model students' responses to the test items, and how do we make inferences about the students and the items? (3) How do we model students' learning paths and help them improve their performance?

Statistical modeling, Machine Learning, and Causal Analysis Methods in the Field of Biology and Medicine
(1) Devise and implement new causal discovery methods specifically tailored to the characteristics of biomedical data; (2) benchmark novel and existing causal discovery and predictive modeling methods to evaluate their efficacy on biomedical data; and (3) design analytical experiments that discover critical factors contributing to pathologies and diseases from multimodal, high-dimensional, high-volume data, to aid the development of diagnostic technologies and the identification of potential treatment targets.