I am Zhangyang Qi (nickname: Alex Chi, Chinese name: 戚张扬), a third-year Ph.D. student in computer science at The University of Hong Kong (HKU) since Sep 2022, advised by Prof. Hengshuang Zhao and Prof. Yizhou Yu. I also work as a Research Intern at Shanghai AI Laboratory, supervised by Jiaqi Wang and Tong Wu.

My research interests center on multimodal language models for 3D scene understanding and interaction, including:

  • Video language models
  • 3D point language models
  • Large language models
  • 3D scene understanding

I am set to graduate in August 2026 and am actively exploring career opportunities. Feel free to reach out via WeChat: openainvidia. My English Resume and Chinese Resume are attached for your reference.

🔥 News

  • 2024.03: 🎉🎉 GPT4Point was accepted by CVPR 2024 as a Highlight.
  • 2023.10: 🎉🎉 OCBEV was accepted by 3DV 2024.
  • 2022.09: 🎉🎉 Joined HKU as a Ph.D. student.
  • 2022.07: 🎉🎉 Received my bachelor's degree from HIT as one of its Top Ten Outstanding Students and an Outstanding Graduate.

📖 Education

  • 2022.09 - present, Ph.D. in Computer Science, The University of Hong Kong (HKU).
  • 2018.08 - 2022.07, Bachelor's in Information Engineering, Harbin Institute of Technology (HIT).

๐Ÿ“ Publications

CVPR 2024 Highlight

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Zhangyang Qi, Ye Fang, Zeyi Sun, Xiaoyang Wu, Tong Wu, Jiaqi Wang, Dahua Lin, Hengshuang Zhao

  • The first object-level 3D point cloud multimodal large language model, unifying both point cloud understanding and generation tasks.

[Project] [Paper] [Code]

3DV 2024

OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection

Zhangyang Qi, Jiaqi Wang, Xiaoyang Wu, Hengshuang Zhao

  • An object-centric BEV (Bird's Eye View) framework for 3D object detection in autonomous driving, achieving performance gains on the nuScenes dataset with only half the training data.

[Paper]

arXiv 2025

GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models

Zhangyang Qi, Zhixiong Zhang, Ye Fang, Jiaqi Wang, Hengshuang Zhao

  • The first work to apply video-based vision-language models to indoor 3D scene understanding.

[Project] [Paper] [Code]

arXiv 2025

Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images

Zhangyang Qi, Yunhan Yang, Mengchen Zhang, Long Xing, Xiaoyang Wu, Tong Wu, Dahua Lin, Xihui Liu, Jiaqi Wang, Hengshuang Zhao

  • A framework for customized 3D asset editing and generation that manipulates objects through dual-side (front and back) images.

[Project] [Paper] [Code]

๐ŸŒ Experiences

Shanghai AI Laboratory, Shanghai, China
2022.07 – Present

  • Research Intern, Supervisors: Jiaqi Wang, Tong Wu
  • Research on 3D and video language models, developing GPT4Point, GPT4Point++, and GPT4Scene.
  • Curated training data for the InternLM-XComposer series and the V3Det dataset.

Tencent PCG, Shenzhen, China
2021.12 – 2022.05

  • Research Intern
  • Built CLIP-based cross-modal alignment via contrastive learning for image-text matching.
  • Designed joint training paradigms that improve embedding alignment for multimodal retrieval.

🎖 Awards

  • Hong Kong PhD Fellowship Scheme (HKPFS), 2022.
  • HKU Presidential Scholarship (HKUPS), 2022.
  • Top Ten Students of Harbin Institute of Technology, 2021.
  • National Scholarship, 2020.

💻 Professional Services

  • Conference reviewer: CVPR'24, CVPR'25, ICCV'25.
  • Teaching assistant: DASC7606: Deep Learning (graduate course @ HKU), Spring 2023, Spring 2024, Fall 2024.