I am currently a Research Assistant at Westlake University, advised by Prof. Peidong Liu. Previously, I obtained a B.Eng. in Information Engineering from Guangdong University of Technology in 2024. In summer 2023, I visited the University of Cambridge to study deep learning and computer vision. Under the supervision of Prof. Wei Meng, I studied robotics and SLAM, gaining extensive hands-on experience in debugging and deploying real robotic systems.
My research interests lie in spatial intelligence, 3D/4D vision, multimodal learning, and world models. More specifically, I aim to explore how AI systems can learn robust and structured spatial information from visual observations, so that they can better represent geometry, semantics, dynamics, and affordances in the physical world. Ultimately, I hope these spatial representations can serve as a foundation for downstream VLMs, VLAs, and world models, enabling more effective reasoning, planning, and action for robotic systems in real-world environments.
* denotes equal contribution or advising; † denotes corresponding author
E-MoFlow jointly learns 6-DoF egomotion and optical flow from events in a fully unsupervised paradigm, without explicit depth estimation.
SIU3R is a feed-forward framework for simultaneous scene understanding and reconstruction from unposed images, unifying reconstruction with semantic, instance, panoptic, and text-referred segmentation.
Casual3DHDR reconstructs sharp HDR Gaussians from videos, jointly optimizing exposure time, CRF, camera motion, and the HDR scene.
From a single blurry image and event stream, BeNeRF recovers neural radiance fields and camera motion, then decodes the scene into a clear, vivid novel-view video stream.
Blurry input
Reproduced GMT and UH-1 across motion retargeting, policy inference, and real-robot deployment pipelines, then deployed policies on the Unitree G1 to gain hands-on experience with sim-to-real transfer, whole-body control, and text-to-motion control.
Built a wheeled robot with an onboard arm for automated warehouse sorting. Integrated STM32, Raspberry Pi, and OpenMV hardware, implemented forward and inverse kinematics for arm control, designed communication and motion control for the mobile chassis, and applied visual recognition algorithms.
Deployed RealSense T265 and ZED cameras for UAV visual odometry, and evaluated ORB-SLAM2, Stereo-DSO, and other SLAM algorithms in real-world flight scenarios.
A unified evaluation toolkit for large multimodal models across text, image, video, and audio tasks, with an emphasis on reproducible, efficient, and trustworthy evaluation.
Thanks to Wenyi Zhang for taking the portrait for my homepage.