Attention, Vision, Language, and Reasoning

We are interested in the intersection of attention, vision, language, and reasoning.
We study the relationships among these components and develop models that integrate these modalities to acquire, learn, and reason with information. For example, our research shows the importance of attention and its integration with a number of visual and language tasks, such as sentiment prediction, action recognition, object recognition (alleviating adversarial examples), human-object interaction, social relationship recognition, and video storytelling.
Our recent work also leverages attention as an interface to understand and diagnose the reasoning process leading to task outcomes, and develops neural networks with enhanced interpretability and performance. So far we have developed datasets and models for image captioning and visual question answering in 360° environments.
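To make the role of attention concrete, below is a minimal sketch of generic scaled dot-product attention in Python. It illustrates the mechanism only and is not any of our specific models; inspecting the returned weights is one simple way attention can serve as an interface for diagnosing what a model focused on.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Generic scaled dot-product attention (illustrative, not a specific model).

    Q: (n_queries, d) query vectors
    K: (n_keys, d)    key vectors
    V: (n_keys, d_v)  value vectors
    Returns the attended output and the attention weights, which can be
    inspected to see which inputs each query focused on.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (n_queries, n_keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V, weights

# Toy example: 2 queries attending over 3 visual/language features.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
output, weights = scaled_dot_product_attention(Q, K, V)
print(weights)  # rows sum to 1; these weights are the "interface" one can inspect
```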

SALICON: Data-driven Approaches to Saliency Research

We study saliency and have made a number of data and model contributions to previously unaddressed issues in saliency research (see here for details). SALICON is an ongoing effort to understand and predict attention in complex natural scenes with a big-data paradigm and data-driven approaches. So far we have:
- Introduced a new experimental method for gaze collection that enables crowdsourcing of gaze data (for both computational and clinical studies) and studies with populations that are difficult to test with eye trackers (e.g., children, chimpanzees).
- Collected the largest gaze dataset to date for training and benchmarking saliency models, as well as for encouraging methods that leverage the multiple annotation modalities of MS COCO.
- Developed a deep neural network-based model that bridges the "semantic gap" in predicting where people look and currently ranks at the top of the MIT Saliency Benchmark (a toy sketch of this style of model follows this list).
- Developed an adversarial network to anticipate future gaze in videos.
- Co-hosted the annual large-scale saliency challenge in the LSUN workshop at CVPR.
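As a rough illustration of the fully convolutional, deep-network approach to saliency prediction mentioned above, here is a minimal sketch in PyTorch. The architecture, layer sizes, and class name are illustrative assumptions, not the SALICON model; real saliency networks typically build on a pretrained image backbone and are trained against human fixation maps.

```python
import torch
import torch.nn as nn

class TinySaliencyNet(nn.Module):
    """Illustrative fully convolutional saliency predictor (not the SALICON model).

    Maps an RGB image to a single-channel saliency map at the same resolution.
    """
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(32, 1, kernel_size=1)  # per-pixel saliency logit

    def forward(self, x):
        return torch.sigmoid(self.head(self.encoder(x)))

model = TinySaliencyNet()
image = torch.rand(1, 3, 64, 64)          # dummy image batch
saliency_map = model(image)               # (1, 1, 64, 64), values in [0, 1]
print(saliency_map.shape)
```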

Deep Learning and Unsupervised Representation Learning

We propose a number of theoretical and applied innovations in machine learning. We study methods to make deep neural networks more efficient, scalable, and generalizable. We develop fundamental representations and architectures for visual, behavioral, and neural data, addressing domain-specific needs and challenges. We are also interested in visualizing and interpreting deep neural networks.
- Direction concentration learning: enhancing congruency in machine learning
- Learning to learn from noisy labeled data
- Efficient deep neural networks
- Unsupervised representation learning
- Multi-task learning
Below are examples of deep learning methods for attention and vision studies, as well as for emerging healthcare and brain-science research.
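As one small example of the themes listed above, here is a minimal multi-task learning sketch with a shared encoder and per-task heads. The class, dimensions, and loss weighting are hypothetical and only illustrate the shared-representation idea; they do not correspond to a particular published architecture.

```python
import torch
import torch.nn as nn

class SharedBackboneMultiTask(nn.Module):
    """Illustrative multi-task model: one shared encoder, one head per task."""
    def __init__(self, in_dim=128, hidden=64, n_classes_a=10, n_outputs_b=1):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head_a = nn.Linear(hidden, n_classes_a)   # e.g., a classification task
        self.head_b = nn.Linear(hidden, n_outputs_b)   # e.g., a regression task

    def forward(self, x):
        z = self.encoder(x)                            # shared representation
        return self.head_a(z), self.head_b(z)

model = SharedBackboneMultiTask()
x = torch.randn(8, 128)
logits_a, pred_b = model(x)
# Training would typically combine weighted per-task losses, e.g.
# loss = ce(logits_a, labels_a) + lambda_b * mse(pred_b, targets_b)
```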

Artificial Intelligence for Mental Health

We develop neural networks and other AI technologies to understand neurodevelopmental and neuropsychiatric disorders, toward automatic screening and improved personalized care. For example, atypical attention and behaviors play a significant role in impaired skill development (e.g., social skills) and reflect a number of common developmental and psychiatric disorders, including ASD, ADHD, and OCD. Our studies integrate behavioral, fMRI, and computational modeling techniques to characterize the heterogeneity in these disorders and to develop clinical solutions. We are fortunate to work with leading scientists and clinicians, including Ralph Adolphs, Jed Elison, Suma Jacob, Christine Conelea, Sherry Chan, and Kelvin Lim.
- Classifying individuals with ASD through facial emotion recognition and eye-tracking
- Identifying people with autism using deep learning and multi-modal distillation
- Classifying ages of toddlers based on machine learning and eye-tracking
- Atypical attention in autism quantified through model-based eye tracking
- Revealing the world of autism through the lens of a camera
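To make the eye-tracking-based screening idea concrete, below is a toy sketch of training and cross-validating a classifier on gaze-derived features. The features, labels, and classifier choice are hypothetical placeholders (random data stands in for real participants); this is not the published pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical gaze features per participant, e.g., fixation count,
# mean fixation duration, proportion of time on faces, saccade amplitude.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 4))          # 60 participants, 4 gaze features
y = rng.integers(0, 2, size=60)       # toy labels: 0 = typically developing, 1 = ASD

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)   # cross-validated screening accuracy
print(scores.mean())
```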

Artificial Intelligence for Neural Decoding

We are interested in areas bridging artificial intelligence and human function. We have developed a neural decoder that infers human motor intention from peripheral nerve recordings and demonstrated the first 15-degree-of-freedom motor decoding with amputee patients. We collaborate with talented engineers and scientists on this exciting topic, including Zhi Yang and Edward Keefer.

- Human motor decoding from neural signals: a review
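For a rough sense of what such decoding involves, here is a toy sketch that fits a linear multi-output decoder mapping windowed neural features to 15 motor outputs. The data are synthetic and the ridge-regression decoder is a generic stand-in, not the decoder described in our work.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical setup: decode 15 degrees of freedom (e.g., finger joint angles)
# from windowed neural features; random data stands in for real recordings.
rng = np.random.default_rng(0)
n_windows, n_channels, n_dof = 500, 32, 15
X = rng.normal(size=(n_windows, n_channels))                 # per-window neural features
W_true = rng.normal(size=(n_channels, n_dof))
Y = X @ W_true + 0.1 * rng.normal(size=(n_windows, n_dof))   # synthetic motor intents

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
decoder = Ridge(alpha=1.0).fit(X_train, Y_train)   # linear multi-output decoder
print(decoder.score(X_test, Y_test))               # R^2 averaged across the 15 outputs
```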


Previous Research