Yu Liu's academic page

Founder, a tiny stealth venture
Industrial doctoral supervisor, CUHK
liuyuisanai@gmail.com
Google Scholar

News

  • [2024] My team published 21 papers in NeurIPS/CVPR/TMLR/ICML/ECCV in 2024.

  • [2024] Our AIGC product MiaoHua gained over 3,000,000 users, with DAU exceeding 530,000, within 9 days of launch.

  • [2024] We were awarded the Wu Wenjun Award (吴文俊奖), First Prize for Scientific and Technological Progress.

  • [2024] Our OpenDILab achieves 21,000+ stars on GitHub!

  • [2023] My team published 20 papers in TPAMI/ICCV/ICLR/NeurIPS/IROS/AAAI in 2023.

  • [2023] My team won the CARLA Autonomous Driving Challenge.

  • [2022] My team published 9 papers at ECCV/CoRL/NeurIPS/AAAI in 2022.

  • [2022] We released DI-Star, an implementation of AlphaStar in PyTorch, beating pro players with 6000+ MMR.

  • [2021] My team won the best paper award at the ICCV21 MFR workshop.

  • [2021] My team won 3 championships in the ICCV21 Masked Face Recognition Challenge.

  • [2021] We released OpenDILab, an open-source decision intelligence platform.

  • [2020] 5 papers (1 oral) published at CVPR/ECCV 2020.

  • [2020] My team won 2 championships of ActivityNet on the Spatio-temporal Action Localization (AVA) track and the Trimmed Activity Recognition (Kinetics) track

  • [2020] My team won the championship of NIST FRVT 1:N, a 12-million-level commercial facial recognition benchmark held by the US government.

  • [2019] 7 papers (4 oral presentations) published at ICCV/CVPR/AAAI in 2019.

  • [2019] I won 4 championships in 4 ICCV AI challenges: MMIT (solutions), OpenImage Instance Segmentation Challenge (solutions), OpenImage Object Detection Challenge (solutions), LFR 2019 (model and report).

  • [2019] I was granted the Google PhD Fellowship 2019.

  • [2018] 4 papers published in CVPR/ECCV/IJCV/AAAI in 2018.

  • [2017] 6 papers published in CVPR/ICCV/NIPS/AAAI/T-PAMI in 2017.

  • [2016] We won 1st place in the ECCV-MOT16 Challenge! (code)

  • [2016] We won 1st place in the ImageNet Challenge 2016! (code)

About me

I am in the process of launching a startup venture. Previously, I was the Executive Director of Research and GM at SenseTime Group, spearheading large-scale AIGC and multi-modal interactive models, where I led a team of approximately 100 top-tier researchers and developers, utilizing over 4,000 GPUs to drive innovative technology and products. I hold a PhD from MMLab, CUHK, supervised by Prof. Xiaogang Wang, and have won multiple international AI competitions, along with a Google PhD Fellowship.

Working Experience

Publications

See full list at Google Scholar.
*equal contribution +corresponding author

Large model for Multi-modal generation, AIGC

  • MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines
    Dongzhi Jiang, Renrui Zhang, Ziyu Guo, Yanmin Wu, Jiayi Lei, Pengshuo Qiu, Pan Lu, Zehui Chen, Guanglu Song, Peng Gao, Yu Liu, Chunyuan Li, Hongsheng Li
    2025 ICLR

  • SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction
    Yang Zhou, Hao Shao, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu
    2025 ICLR

  • Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
    Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu+, Hongsheng Li
    (Spotlight) 2024 NeurIPS

  • Instruction-Guided Visual Masking
    Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu+, Jingjing Liu, Xianyuan Zhan
    2024 NeurIPS

  • MoVA: Adapting Mixture of Vision Experts to Multimodal Context
    Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu+
    2024 NeurIPS

  • Phased Consistency Model
    Fu-Yun Wang, Zhaoyang Huang, Alexander William Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu+, Hongsheng Li, Xiaogang Wang
    2024 NeurIPS

  • CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
    Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu+, Hongsheng Li
    2024 NeurIPS

  • Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
    Bingqi Ma, Zhuofan Zong, Guanglu Song, Hongsheng Li, Yu Liu+
    2024 NeurIPS

  • FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
    Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu+, Hongsheng Li
    2024 ECCV

  • Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
    Xiaoshi Wu, Yiming Hao, Manyuan Zhang, Keqiang Sun, Zhaoyang Huang, Guanglu Song, Yu Liu+, Hongsheng Li
    2024 ECCV

  • ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
    Fu-Yun Wang, Zhaoyang Huang, Qiang Ma, Guanglu Song, Xudong Lu, Weikang Bian, Yijin Li, Yu Liu+, Hongsheng Li
    (Oral) 2024 ECCV

  • Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
    Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu+, Hongsheng Li
    2024 ECCV

  • Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks
    Manyuan Zhang, Guanglu Song, Xiaoyu Shi, Yu Liu+, Hongsheng Li
    2024 ECCV

  • Enhancing Vision-Language Model with Unmasked Token Alignment
    Jihao Liu, Jinliang Zheng, Boxiao Liu, Yu Liu+, Hongsheng Li
    2024 TMLR

  • Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
    Dazhong Shen, Guanglu Song, Zeyue Xue, Fu-Yun Wang, Yu Liu+
    2024 CVPR

  • EasyDrag: Efficient Point-based Manipulation on Diffusion Models
    Xingzhong Hou, Boxiao Liu, Yi Zhang, Jihao Liu, Yu Liu+, Haihang You
    2024 CVPR

  • GLID: Pre-training a Generalist Encoder-Decoder Vision Model
    Jihao Liu, Jinliang Zheng, Yu Liu+, Hongsheng Li
    2024 CVPR

  • RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
    Zeyue Xue, Guanglu Song, Qiushan Guo, Boxiao Liu, Zhuofan Zong, Yu Liu+, Ping Luo
    2023 NeurIPS

  • Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction
    Zhuofan Zong, Dongzhi Jiang, Guanglu Song, Zeyue Xue, Jingyong Su, Hongsheng Li, Yu Liu+
    2023 ICCV

  • Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding
    Jihao Liu, Tai Wang, Boxiao Liu, Qihang Zhang, Yu Liu+, Hongsheng Li
    2023 ICCV

  • UniFormer: Unifying Convolution and Self-Attention for Visual Recognition
    Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu+, Hongsheng Li, Yu Qiao
    2023 TPAMI

  • MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers
    Jihao Liu, Xin Huang, Jinliang Zheng, Yu Liu+, Hongsheng Li
    2023 CVPR

Large-scale Reinforcement Learning, Embodied AI

  • MaskMA: Towards Zero-Shot Multi-Agent Decision Making with Mask-Based Collaborative Learning
    Jie Liu, Yinmin Zhang, Chuming Li, Zhiyuan You, Zhanhui Zhou, Chao Yang, Yaodong Yang, Yu Liu+, Wanli Ouyang
    2024 TMLR

  • DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning
    Jianxiong Li, Jinliang Zheng, Yinan Zheng, Liyuan Mao, Xiao Hu, Sijie Cheng, Haoyi Niu, Jihao Liu, Yu Liu+, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan
    2024 ICML

  • SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
    Yang Zhou, Hao Shao, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu+
    2024 CVPR

  • LMDrive: Closed-Loop End-to-End Driving with Large Language Models
    Hao Shao, Yuxuan Hu, Letian Wang, Guanglu Song, Steven L. Waslander, Yu Liu+, Hongsheng Li
    2024 CVPR

  • Critic-Guided Decision Transformer for Offline Reinforcement Learning
    Yuanfu Wang, Chao Yang, Ying Wen, Yu Liu+, Yu Qiao
    2024 AAAI

  • A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
    Yinmin Zhang, Jie Liu, Chuming Li, Yazhe Niu, Yaodong Yang, Yu Liu+, Wanli Ouyang
    2024 AAAI

  • LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios
    Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu+
    (Spotlight) 2023 NeurIPS

  • Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning
    Chuming Li, Ruonan Jia, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu+, Wanli Ouyang
    2023 ECAI

  • Accelerating Reinforcement Learning for Autonomous Driving using Task-Agnostic and Ego-Centric Motion Skills
    Tong Zhou, Letian Wang, Ruobing Chen, Wenshuo Wang, Yu Liu+
    2023 IROS

  • Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning
    Chuming Li, Ruonan Jia, Jiawei Yao, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu+, Wanli Ouyang
    2023 IJCAI - PRL workshop

  • Efficient Reinforcement Learning for Autonomous Driving with Parameterized Skills and Priors
    Letian Wang, Jie Liu, Hao Shao, Wenshuo Wang, Ruobing Chen, Yu Liu+, Steven L. Waslander
    2023 RSS

  • ReasonNet: End-to-End Driving with Temporal and Global Reasoning
    Hao Shao, Letian Wang, Ruobing Chen, Steven L. Waslander, Hongsheng Li, Yu Liu+
    (CARLA 2022 Champion) 2023 CVPR

  • GoBigger: A Scalable Platform for Cooperative-Competitive Multi-Agent Interactive Simulation
    Ming Zhang, Shenghan Zhang, Zhenjie Yang, Lekai Chen, Jinliang Zheng, Chao Yang, Chuming Li, Hang Zhou, Yazhe Niu, Yu Liu+
    2023 ICLR

  • Cooperative Multi-agent Q-learning with Bidirectional Action-dependency
    Chuming Li, Jie Liu, Yinmin Zhang, Yuhong Wei, Yazhe Niu, Yaodong Yang, Yu Liu+, Wanli Ouyang
    2023 AAAI

  • Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer
    Hao Shao, Letian Wang, Ruobing Chen, Hongsheng Li, Yu Liu+
    2022 CoRL

Large-scale Optimization, Computer Vision

  • Teach-DETR: Better Training DETR with Teachers
    Linjiang Huang, Kaixin Lu, Guanglu Song, Liang Wang, Si Liu, Yu Liu+, Hongsheng Li
    2023 TPAMI

  • DETRs with Collaborative Hybrid Assignments Training
    Zhuofan Zong, Guanglu Song, Yu Liu+
    2023 ICCV

  • Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection
    Manyuan Zhang, Guanglu Song, Yu Liu+, Hongsheng Li
    2023 ICCV

  • Masked Autoencoders Are Stronger Knowledge Distillers for Object Detectors
    Shanshan Lao, Guanglu Song, Boxiao Liu, Yu Liu+, Yujiu Yang
    2023 ICCV

  • UniKD: Universal Knowledge Distillation for Mimicking Homogeneous or Heterogeneous Object Detectors
    Shanshan Lao, Guanglu Song, Boxiao Liu, Yu Liu+, Yujiu Yang
    2023 ICCV

  • Large-batch Optimization for Dense Visual Predictions
    Zeyue Xue, Jianming Liang, Guanglu Song, Zhuofan Zong, Liang Chen, Yu Liu+, Ping Luo+
    2022 NeurIPS

  • UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
    Jihao Liu, Xin Huang, Guanglu Song, Hongsheng Li+, Yu Liu+
    2022 ECCV

  • Self-slimmed Vision Transformer
    Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu+
    2022 ECCV

  • TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers
    Jihao Liu, Boxiao Liu, Hang Zhou, Hongsheng Li+, Yu Liu+
    2022 ECCV

  • Unifying Visual Perception by Dispersible Points Learning
    Jianming Liang, Guanglu Song, Biao Leng, Yu Liu+
    2022 ECCV

  • Towards Robust Face Recognition with Comprehensive Search
    Manyuan Zhang, Guanglu Song, Yu Liu+, Hongsheng Li
    2022 ECCV

  • Rethinking Robust Representation Learning Under Fine-grained Noisy Faces
    Bingqi Ma, Guanglu Song, Boxiao Liu, Yu Liu+
    2022 ECCV

  • UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning
    Kunchang Li, Yali Wang, Peng Gao, Guanglu Song, Yu Liu+, Hongsheng Li, Yu Qiao
    2022 ICLR

  • Switchable K-class Hyperplanes for Noise-robust Representation Learning
    Boxiao Liu, Guanglu Song, Manyuan Zhang, Haihang You, Yu Liu+
    2021 ICCV

  • Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization
    Junting Pan, Siyu Chen, Mike Zheng Shou, Yu Liu+, Jing Shao, Hongsheng Li
    2021 CVPR

  • Discriminability Distillation in Group Representation Learning
    Manyuan Zhang, Guanglu Song, Hang Zhou, Yu Liu+
    2020 ECCV

  • Learning Where to Focus for Efficient Video Object Detection
    Zhengkai Jiang, Yu Liu+, Ceyuan Yang, Jihao Liu, Peng Gao, Qian Zhang, Shiming Xiang, Chunhong Pan
    2020 ECCV

  • Search to Distill: Pearls are Everywhere but not the Eyes
    Yu Liu+, Xuhui Jia, Mingxing Tan, Raviteja Vemulapalli, Yukun Zhu, Bradley Green, Xiaogang Wang
    (Oral) 2020 CVPR

  • Revisiting the Sibling Head in Object Detector
    Guanglu Song, Yu Liu+, Xiaogang Wang
    (OpenImage 2019 Champion) 2020 CVPR

  • Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images
    Hang Zhou, Jihao Liu, Ziwei Liu, Yu Liu+, Xiaogang Wang
    2020 CVPR

  • KPNet: Towards Minimal Face Detector
    Guanglu Song, Yu Liu+, Yuhang Zang, Xiaogang Wang, Biao Leng, Qingsheng Yuan
    (Oral) 2020 AAAI

  • Temporal Interlacing Network
    Hao Shao, Shengju Qian, Yu Liu+
    2020 AAAI

  • Differentiable Kernel Evolution
    Yu Liu+, Jihao Liu, Ailing Zeng, Xiaogang Wang
    2019 IEEE ICCV

  • Towards Flops-constrained Face Recognition
    Yu Liu*, Guanglu Song*, Manyuan Zhang*, Jihao Liu*, Yucong Zhou, Junjie Yan
    (Top-1 Solution) 2019 ICCV Lightweight Face Recognition Challenge & Workshop

  • Gradient Harmonized Single-stage Detector
    Buyu Li*, Yu Liu*, Xiaogang Wang
    (Oral) 2019 AAAI

  • Conditional Adversarial Generative Flow for Controllable Image Synthesis
    Rui Liu, Yu Liu, Xinyu Gong, Xiaogang Wang, Hongsheng Li
    2019 CVPR

  • Exploring Disentangled Feature Representation Beyond Face Identification
    Yu Liu+, Fangyin Wei, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang
    2018 CVPR

  • Transductive Centroid Projection for Semi-supervised Large-scale Recognition
    Yu Liu+, Guanglu Song, Jing Shao, Xiao Jin, Xiaogang Wang
    2018 ECCV

  • Rethinking Feature Discrimination and Polymerization for Large-scale Recognition
    Yu Liu, Hongyang Li, Xiaogang Wang
    2017 NIPS deep learning workshop

  • Recurrent Scale Approximation for Object Detection in CNN
    Yu Liu, Hongyang Li, Junjie Yan et al.
    2017 IEEE ICCV

  • Beyond Trade-off: Accelerate FCN-based Face Detector with Higher Accuracy
    Guanglu Song*, Yu Liu*, Ming Jiang, Yujie Wang, Junjie Yan, Biao Leng
    2018 CVPR

  • Quality Aware Network for Set to Set Recognition
    Yu Liu, Junjie Yan, Wanli Ouyang
    2017 CVPR

  • Knowledge Distillation via Route Constrained Optimization
    Xiao Jin, Baoyun Peng, Yichao Wu, Yu Liu, Jiaheng Liu, Ding Liang, Junjie Yan, Xiaolin Hu
    (Oral) 2019 IEEE ICCV

  • Correlation Congruence for Knowledge Distillation
    Baoyun Peng, Xiao Jin, Jiaheng Liu, Shunfeng Zhou, Yichao Wu, Yu Liu, Dongsheng Li, Zhaoning Zhang
    2019 IEEE ICCV

  • Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
    Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, Xiaogang Wang
    (Oral) 2019 AAAI

  • Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection
    Hongyang Li, Yu Liu, W Ouyang, X Wang
    2018 IJCV

  • Region-based Quality Estimation Network for Large-Scale Person Re-identification
    Guanglu Song, Biao Leng, Yu Liu, Congrui Hetang, Shaofan Cai
    2018 AAAI

  • Learning Deep Features via Congenerous Cosine Loss for Person Recognition
    Yu Liu, Hongyang Li, Xiaogang Wang
    arXiv:1702.06890, 2017

  • Scale-Aware Face Detection
    Zekun Hao, Yu Liu, Hongwei Qin, Junjie Yan
    2017 CVPR

  • POI: Multiple Object Tracking with High Performance Detection and Appearance Feature
    F Yu, W Li, Q Li, Y Liu, X Shi, J Yan
    (Top-1 Solution) 2016 ECCV workshop

  • Crafting GBD-Net for Object Detection
    X Zeng, W Ouyang, J Yan, H Li, T Xiao, K Wang, Y Liu, Y Zhou, B Yang, ...
    T-PAMI

  • 3D object understanding with 3D Convolutional Neural Networks
    B Leng, Y Liu, K Yu, X Zhang, Z Xiong
    Information Sciences 366, 188-201, 2016

Honors and Awards

  • Won the 1st prize for scientific and technological progress, CAAI, 2024

  • Won the 1st place in CARLA Autonomous Driving Challenge 2022

  • Won the 1st place in ActivityNet 2020, AVA track

  • Won the 1st place in ActivityNet 2020, Kinetics track

  • Won the 1st place in NIST FRVT held by the US government in 2020, 2021 and 2022

  • Won the 1st place in ICCV19 Multi-Moments in Time (MIT) Challenge

  • Won the 1st place in Google OpenImage Object Detection Challenge 2019

  • Won the 1st place in Google OpenImage Instance Segmentation Challenge 2019

  • Google PhD Fellowship in 2019 (1/China, 50/world)

  • Won the 1st place in ICCV19 Lightweight Face Recognition Challenge

  • Won the 1st place in NIST-FRVT threshold based 1:N track 2018

  • Won the 1st place in Multiple Objects Tracking Challenge (MOT16) in 2016

  • Won the 1st place in detection track of ImageNet (ILSVRC) in 2016

  • Won the best undergraduate dissertation in 2016 (1/230)

  • IEEE-Microsoft Undergraduate Fellowship in 2016 (40/world)

  • The Outstanding Winner of Challenge Cup in 2015 (top 1/China)

Projects & Datasets

GitHub

  • OpenDILab, a generalized Decision Intelligence engine.

  • X-Temporal, a toolkit for easily implementing SOTA video understanding methods with PyTorch on multiple machines and GPUs

  • CaffeMex v2.3, a multi-GPU & memory-reduced MAT-Caffe on Linux and Windows

  • Labeled Pedestrian in the Wild, a large-scale pedestrian re-identification benchmark