Biography

I am currently a Research Assistant at Westlake University, advised by Prof. Peidong Liu. Previously, I obtained a B.Eng. in Information Engineering from Guangdong University of Technology in 2024. In summer 2023, I visited the University of Cambridge to study deep learning and computer vision. Under the supervision of Prof. Wei Meng, I studied robotics and SLAM, gaining extensive hands-on experience in debugging and deploying real robotic systems.

My research interests lie in spatial intelligence, 3D/4D vision, multimodal learning, and world models. More specifically, I aim to explore how AI systems can learn robust and structured spatial information from visual observations, so that they can better represent geometry, semantics, dynamics, and affordances in the physical world. Ultimately, I hope these spatial representations can serve as a foundation for downstream VLMs, VLAs, and world models, enabling more effective reasoning, planning, and action for robotic systems in real-world environments.

Publications

* denotes equal contribution or advising; † denotes corresponding author

E-MoFlow: Learning Egomotion and Optical Flow from Event Data via Implicit Regularization
NeurIPS 2025
Wenpu Li*, Bangyan Liao*, Yi Zhou, Qi Xu, Pian Wan, Peidong Liu†

E-MoFlow jointly learns 6-DoF egomotion and optical flow from events in a fully unsupervised paradigm, without explicit depth estimation.

Event streams to motion and flow: qualitative results on the MVSEC and DSEC benchmarks.

SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment
NeurIPS 2025 Spotlight
Qi Xu*, Dongxu Wei*†, Lingzhe Zhao, Wenpu Li, Zhangchi Huang, Shunping Ji†, Peidong Liu†

SIU3R is a feed-forward framework for simultaneous scene understanding and reconstruction from unposed images, unifying reconstruction with semantic, instance, panoptic, and text-referred segmentation.

Unposed images to 3D understanding and reconstruction: real-world qualitative results, with an overview and RGB, semantic (Sem), and instance (Ins) panels.

Casual3DHDR: Deblurring High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos
ACM MM 2025
Shucheng Gong*, Lingzhe Zhao*, Wenpu Li*, Hong Xie†, Yin Zhang, Shiyu Zhao, Peidong Liu†

Casual3DHDR reconstructs sharp HDR Gaussians from videos, jointly optimizing exposure time, CRF, camera motion, and the HDR scene.

Casual video to HDR 3D scene: novel-view spiral rendering from the reconstructed HDR 3D scene, and sharp LDR video rendered from the HDR 3D scene with varying exposure time.

BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream
ECCV 2024
Wenpu Li*, Pian Wan*, Peng Wang*, Jinghang Li, Yi Zhou, Peidong Liu†

From a single blurry image and event stream, BeNeRF recovers neural radiance fields and camera motion, then decodes the scene into a clear, vivid novel-view video stream.

Single blur to vivid video stream. Original blurry images vs. decoded sharp videos for four scenes: Lego (real-world), Toys (real-world), Living Room (synthetic), and Pink Castle (synthetic).

Robotics Experience

Humanoid Policy Reproduction and Real-Robot Deployment
Xiang Liu*, Wenpu Li*, Lingzhe Zhao*

Reproduced GMT and UH-1 end to end, covering motion retargeting, policy inference, and real-robot deployment, then deployed the policies on the Unitree G1, gaining hands-on experience with sim-to-real transfer, whole-body control, and text-to-motion control.

Wheeled Robot Equipped with an Arm for Automatic Storage
Qingrui Zhu*, Wenpu Li*, Wenbin Zheng, Yong Zhang, Junyao Li

Built a wheeled robot with an onboard arm for automated warehouse sorting. Integrated STM32, Raspberry Pi, and OpenMV hardware; implemented forward and inverse kinematics for arm control; designed communication and motion control for the mobile chassis; and applied visual recognition algorithms.

Self-Localization UAV Leveraging Visual Odometry
Wenpu Li*, Guohua Zhang*

Deployed RealSense T265 and ZED cameras for UAV visual odometry, and evaluated ORB-SLAM2, Stereo-DSO, and other SLAM algorithms in real-world flight scenarios.

Open-Source Project

LMMs-Eval

A unified evaluation toolkit for large multimodal models across text, image, video, and audio tasks, with an emphasis on reproducible, efficient, and trustworthy evaluation.

  • Added the VSI-SUPER benchmark and Cambrian-S model support for spatial and video understanding tasks. #1267 #1268
  • Resolved Qwen3-VL and Qwen2.5-VL video bugs affecting timestamp alignment and temporal position encoding. #1261 #1269 #1260 #1244
  • Fixed bugs across vLLM, InternVL, OCRBench, and VSI-Bench to improve inference backend stability and benchmark correctness.

Languages

  • Chinese: Native Language
  • Japanese: Native Language — I was born in Japan and lived there for eight years.
  • English: Intermediate Level
  • Cantonese: Entry Level — I became deeply fond of the language while studying in Guangzhou.

Awards

  • National Bronze Medal in Automatic Storage/Retrieval System at RoboCup China Open, 2021
  • National Second Prize in Automatic Storage/Retrieval System at RoboCup China Open, 2022
  • National Second Prize in iCAN Innovation Contest, 2022
  • National Third Prize in Blue Bridge Cup, 2022
  • First Prize in Contemporary Undergraduate Mathematical Contest in Modeling (Guangdong Division), 2022
  • Third Prize in National Undergraduate Electronics Design Contest (Guangdong Division), 2021

Academic Services

  • Journal Reviewer: T-PAMI
  • Conference Reviewer: CVPR, NeurIPS, ECCV, IROS

Acknowledgements

Thanks to Wenyi Zhang for taking the portrait for my homepage.