Explainable and Generalizable AI with Attention + Reasoning + Language

We are interested in AI systems that perceive, learn, and reason to perform real-world tasks. We aim to develop such systems that are explainable, generalizable and trustworthy. Below are a few current topics:
- Attention and reasoning: our research shows the importance of attention in a variety of tasks (e.g., action recognition, sentiment prediction, social relationship recognition, and video storytelling). It has also pioneered a series of studies that iteratively select information (i.e., attend) and reason over it to perform AI tasks (e.g., AiR, IQVA); a minimal sketch of this attend-and-reason loop follows this list.
- Explainable AI: our research develops explainability methods that encode and present the decision-making process of deep learning models, alleviating the black-box issue in machine learning (e.g., neural module networks, reasoning-aware attention). We also develop large-scale explanation datasets and models.
- Generalizable AI: our research further enables AI to perceive and reason with commonsense or domain-specific knowledge as humans do (e.g., in saliency prediction and human-object interaction). Our recent knowledge incorporation and augmentation methods achieve state-of-the-art performance in visual reasoning.
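
To make the attend-and-reason idea concrete, below is a minimal PyTorch sketch of an iterative attention loop over region features: at each step the model attends conditioned on its current reasoning state, then updates that state with the attended evidence. The module names, dimensions, and GRU-based state update are illustrative assumptions, not the actual AiR or IQVA implementations.

```python
# Minimal attend-and-reason sketch: iteratively attend over region features
# and update a reasoning state with the attended evidence.
# Dimensions and module names are illustrative, not those of AiR or IQVA.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttendAndReason(nn.Module):
    def __init__(self, feat_dim=2048, hid_dim=512, num_steps=3, num_answers=1000):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hid_dim)    # project region features
        self.query = nn.Linear(hid_dim, hid_dim)    # attention query from the state
        self.state = nn.GRUCell(hid_dim, hid_dim)   # reasoning-state update
        self.classifier = nn.Linear(hid_dim, num_answers)
        self.num_steps = num_steps

    def forward(self, regions, question):
        # regions: (B, R, feat_dim) region features; question: (B, hid_dim) encoding
        feats = self.proj(regions)                  # (B, R, hid_dim)
        h = question                                # initialize state with the question
        for _ in range(self.num_steps):
            scores = torch.bmm(feats, self.query(h).unsqueeze(2)).squeeze(2)  # (B, R)
            attn = F.softmax(scores, dim=1)         # attention over regions
            attended = (attn.unsqueeze(2) * feats).sum(dim=1)                 # (B, hid_dim)
            h = self.state(attended, h)             # reason with the attended evidence
        return self.classifier(h), attn             # answer logits + last attention map
```

Returning the attention weights alongside the answer is what makes the decision process inspectable: each step's attention map can be compared against human attention or a reasoning annotation.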

SALICON: Data-driven Approaches to Saliency Research

We study saliency and make a number of data and model contributions to unaddressed issues in saliency research (see here for details). SALICON is an ongoing effort to understand and predict attention in complex natural scenes with a big-data paradigm and data-driven approaches. So far we have:
- Introduced a new experimental method for gaze collection that enables crowdsourcing of gaze data (for both computational and clinical studies) and studies with populations that are difficult to test with eye trackers (e.g., young children, chimpanzees).
- Collected the largest-to-date gaze dataset for training and benchmarking saliency models, and for encouraging methods that leverage the multiple annotation modalities of MS COCO.
- Developed a deep neural network-based model that bridges the "semantic gap" in predicting where people look and currently ranks at the top of the MIT Saliency Benchmark (a simplified sketch of such a model follows this list).
- Developed an adversarial network to anticipate future gaze in videos.
- Co-hosted the yearly large-scale saliency challenge in the LSUN workshop at CVPR.
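
Below is a simplified sketch of a fully convolutional saliency predictor of the kind referenced above: a convolutional backbone produces semantic features, a 1x1 convolution reads out a single-channel map, and the map is upsampled and normalized into a fixation distribution. The backbone (a randomly initialized ResNet-18 here), dimensions, and KL-divergence loss are common practice used for illustration, not the exact SALICON model.

```python
# Illustrative fully convolutional saliency predictor; not the SALICON model.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class SaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet18()                                            # pretrained weights are typical
        self.features = nn.Sequential(*list(backbone.children())[:-2])   # drop avgpool and fc
        self.readout = nn.Conv2d(512, 1, kernel_size=1)                  # 1x1 readout to a saliency map

    def forward(self, image):
        # image: (B, 3, H, W) -> per-pixel saliency normalized to a spatial distribution
        feats = self.features(image)                                     # (B, 512, H/32, W/32)
        logits = self.readout(feats)                                     # (B, 1, H/32, W/32)
        logits = F.interpolate(logits, size=image.shape[-2:],
                               mode='bilinear', align_corners=False)
        b = logits.size(0)
        return F.softmax(logits.view(b, -1), dim=1).view_as(logits)

def kld_loss(pred, target, eps=1e-8):
    # KL divergence between predicted and ground-truth fixation distributions
    return (target * ((target + eps) / (pred + eps)).log()).sum(dim=(1, 2, 3)).mean()
```

Using high-level backbone features is what lets such a model pick up semantic cues (faces, text, objects) rather than only low-level contrast, which is the "semantic gap" the bullet above refers to.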

Deep Learning and Unsupervised Representation Learning

We propose a number of theoretical and applied innovations in machine learning. We study methods to make deep neural networks more efficient, generalizable, and trustworthy. We develop representations and architectures for visual, behavioral, and neural data, addressing domain-specific needs and challenges. We are also interested in visualizing and interpreting deep neural networks.
- Trustworthiness prediction
- Enhancing congruency in machine learning
- Learning to learn from noisy labeled data
- Efficient deep neural networks
- Unsupervised representation learning (a minimal autoencoder-style sketch follows this list)
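
As an illustration of the unsupervised representation learning direction, here is a generic autoencoder sketch: an encoder compresses inputs into a low-dimensional code and a decoder reconstructs them, so the representation is learned without labels. The architecture and reconstruction objective are standard textbook choices, not a specific model from our papers.

```python
# Generic autoencoder for unsupervised representation learning (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, code_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x):
        code = self.encoder(x)                # learned representation
        return self.decoder(code), code

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)                       # stand-in for a batch of flattened images
opt.zero_grad()
recon, code = model(x)
loss = F.mse_loss(recon, x)                   # reconstruction objective, no labels needed
loss.backward()
opt.step()
```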

Artificial Intelligence for Mental Health

We develop neural networks and other AI technologies to understand neurodevelopmental and neuropsychiatric disorders for automatic screening and improved personalized care. For example, atypical attention and behaviors play a significant role in the development of impaired skills (e.g., social skills) and reflect a number of common developmental and psychiatric disorders, including ASD, ADHD, and OCD. Our studies integrate behavioral experiments, fMRI, and computational modeling to characterize the heterogeneity in these disorders and to develop clinical solutions. We are lucky to work with leading scientists and clinicians including Ralph Adolphs, Jed Elison, Suma Jacob, Christine Conelea, Sherry Chan, and Kelvin Lim.
- Identifying people with autism with deep learning and multi-modal distillation
- Classifying ages of toddlers based on machine learning and eye-tracking
- Atypical attention in autism quantified through model-based eye tracking
- Revealing the world of autism through the lens of a camera

Artificial Intelligence for Neural Decoding

We are interested in areas bridging artificial intelligence and human functions. We have developed a neural decoder that infers human motor intention from peripheral nerve recordings and demonstrated the first 15-degree-of-freedom motor decoding with amputee patients (a simplified decoder sketch follows the list below). It has also shown an example of a truly bi-directional human-machine interface with mutual learning and adaptation: the human learns over time to better express their intent to mind-control prosthetic hand movements, and the machine learns from these choices and refines its model. We collaborate with talented engineers, scientists, and clinicians on this exciting topic: Zhi Yang, Edward Keefer, and Jonathan Cheng.
- Artificial intelligence enables real-time and intuitive control of prostheses via nerve interface
- A bioelectric neural interface towards intuitive prosthetic control for amputees
- Human motor decoding from neural signals: a review
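
The sketch below illustrates the general shape of such a motor-intention decoder: a recurrent network maps a sequence of per-window neural features to continuous degree-of-freedom (DoF) commands, with 15 outputs matching the 15-DoF demonstration. The feature dimensions, GRU architecture, and regression readout are assumptions for illustration, not the decoder described in our publications.

```python
# Illustrative motor-intention decoder: neural-feature windows -> 15 DoF commands.
import torch
import torch.nn as nn

class MotorDecoder(nn.Module):
    def __init__(self, n_channels=64, hid_dim=128, n_dof=15):
        super().__init__()
        self.rnn = nn.GRU(n_channels, hid_dim, batch_first=True)
        self.readout = nn.Linear(hid_dim, n_dof)

    def forward(self, x):
        # x: (B, T, n_channels) per-window neural features (e.g., spike-band power)
        out, _ = self.rnn(x)
        return self.readout(out)              # (B, T, n_dof) predicted joint commands

decoder = MotorDecoder()
features = torch.randn(1, 100, 64)            # one 100-window sequence of 64-channel features
dof_trajectory = decoder(features)            # continuous 15-DoF control signal over time
```

In a bi-directional interface, a model of this kind would be retrained periodically on new sessions as the user's control strategy changes, which is the mutual learning and adaptation described above.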


Previous Research