Jingxiang Sun (孙景翔)

Email: sunjingxiang_stark[at]126.com      Google Scholar      GitHub      Twitter      YouTube

I am a third-year Ph.D. student in the Department of Automation at Tsinghua University, advised by Prof. Yebin Liu. Before joining Tsinghua, I earned my M.S. from the Department of ECE at the University of Illinois at Urbana-Champaign. My research focuses on multimodal models, digital avatars, and 2D & 3D content generation.

Currently, I am a student researcher in the AI-Mediated Reality and Interaction Research Group at NVIDIA Research. Previously, I was an AGI student researcher at DeepSeek AI, where I led the development of the high-quality 3D generation project DreamCraft3D and contributed to the vision-language model DeepSeek-VL.

🔥 I'm on the job market! If you have any opportunities involving world models, multimodal models, 2D/3D generation, or virtual human modeling, please reach out via email.

News

2024-10: We introduce DreamCraft3D++, a new technique for high-quality 3D content generation!
2024-08: One paper is accepted by SIGGRAPH Asia 2024!
2024-03: One paper is accepted by SIGGRAPH 2024!
2024-02: Two papers are accepted by CVPR 2024!
2024-02: Started my internship at NVIDIA Research with Koki Nagano and Shalini De Mello.
2024-01: DreamCraft3D is accepted at ICLR 2024. See you in Vienna!
2023-09: HAvatar is accepted by ACM TOG 2023.
2023-03: StyleAvatar is accepted by ACM SIGGRAPH 2023.
2023-03: Next3D is selected as a CVPR Highlight paper (top 10% of accepted papers).

Technical Reports
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu*, Wen Liu*, Bo Zhang**, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan (* equal contribution, ** project lead)
[Hugging Face] [PDF] [Code]

We introduce DeepSeek-VL, an open-source vision-language (VL) model designed for real-world vision and language understanding applications.

Publications
DreamCraft3D++: Efficient Hierarchical 3D Generation with Multi-Plane Reconstruction Model
Jingxiang Sun, Cheng Peng, Ruizhi Shao, Yuan-Chen Guo, Xiaochen Zhao, Yangguang Li, Yanpei Cao, Bo Zhang, Yebin Liu
arXiv, 2024
[Project] [PDF] [BibTeX]

We present DreamCraft3D++, an extension of DreamCraft3D that enables efficient, high-quality generation of complex 3D assets in just 10 minutes.

Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer
Ruizhi Shao*, Youxin Pang*, Zerong Zheng, Jingxiang Sun, Yebin Liu
ACM Transactions on Graphics (SIGGRAPH Asia 2024)
[Project] [PDF] [BibTeX]

Given a reference image, SMPL sequences, and camera parameters, our method generates 360-degree free-viewpoint dynamic human videos.

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Jingxiang Sun, Bo Zhang, Ruizhi Shao, Lizhen Wang, Wen Liu, Zhenda Xie, Yebin Liu
2024 International Conference on Learning Representations, ICLR 2024
[Project] [PDF] [Code] [BibTeX]

We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects.

InvertAvatar: Incremental GAN Inversion for Generalized Head Avatars
Xiaochen Zhao*, Jingxiang Sun*, Lizhen Wang, Jinli Suo, Yebin Liu (* equal contribution)
ACM SIGGRAPH 2024
[Project] [PDF] [BibTeX]

We present the Incremental 3D GAN Inversion approach, which reconstructs photorealistic 3D facial avatars in under one second from single or multiple source images.

Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor
Ruizhi Shao, Jingxiang Sun, Cheng Peng, Zerong Zheng, Boyao Zhou, Hongwen Zhang, Yebin Liu
2024 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2024
[Project] [PDF] [Code] [BibTeX]

We propose Control4D, which enables high-fidelity and spatiotemporally consistent 4D portrait editing from only text instructions.

RAM-Avatar: Real-time Photo-Realistic Avatar from Monocular Videos with Full-body Control
Xiang Deng, Zerong Zheng, Yuxiang Zhang, Jingxiang Sun, Chao Xu, XiaoDong Yang, Lizhen Wang, Yebin Liu
2024 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2024
[Project] [PDF] [Code] [BibTeX]

We present RAM-Avatar, a real-time photorealistic human avatar method from monocular videos that supports high-fidelity rendering, full-body control (face and hands), and real-time animation.

VectorTalker: SVG Talking Face Generation with Progressive Vectorisation
Hao Hu, Xuan Wang, Jingxiang Sun, Yanbo Fan, Yu Guo, Caigui Jiang
arXiv, 2023
[Project] [PDF] [BibTeX]

We introduce VectorTalker, a novel method for creating high-fidelity, audio-driven talking heads using scalable vector graphics, which is effective across diverse image styles.

HAvatar: High-Fidelity Head Avatar via a Facial Model Conditioned Neural Radiance Field
Xiaochen Zhao, Lizhen Wang, Jingxiang Sun, Hongwen Zhang, Jinli Suo, Yebin Liu
ACM Transactions on Graphics (ACM TOG 2023)
[Project] [PDF] [Code] [BibTeX]

We introduce the Facial Model Conditioned Neural Radiance Field, a hybrid 3D representation that combines the flexibility of NeRF with a parametric template, leveraging synthetic renderings for conditioning.

StyleAvatar: Real-time Photorealistic Portrait Avatars from a Single Video
Lizhen Wang, Xiaochen Zhao, Jingxiang Sun, Yuxiang Zhang, Hongwen Zhang, Tao Yu, Yebin Liu
ACM SIGGRAPH 2023
[Project] [PDF] [Code] [BibTeX]

We propose StyleAvatar, a real-time photorealistic portrait avatar reconstruction approach using StyleGAN-based networks, yielding high-fidelity avatars with faithful expression control.

Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
Jingxiang Sun, Xuan Wang, Lizhen Wang, Xiaoyu Li, Yong Zhang, Hongwen Zhang, Yebin Liu
2023 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2023, Highlight
[Project] [PDF] [Code] [BibTeX]

We propose a novel 3D GAN framework for unsupervised learning of high-quality, 3D-consistent facial avatars from unstructured 2D images. Our approach introduces Generative Texture-Rasterized Tri-planes for accurate deformations and topological flexibility.

High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors
Yunpeng Bai, Yanbo Fan, Xuan Wang, Yong Zhang, Jingxiang Sun, Chun Yuan, Ying Shan
2023 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2023
[Project] [PDF] [Code] [BibTeX]

We propose an efficient method to build a personalized generative prior from a small set of facial images of a specific individual. This approach supports photo-realistic novel view synthesis and face reenactment using various input signals (images, 3DMM coefficients, or audio).

DiffuStereo: High Quality Human Reconstruction via Diffusion-based Stereo Using Sparse Cameras
Ruizhi Shao, Zerong Zheng, Hongwen Zhang, Jingxiang Sun, Yebin Liu
2022 European Conference on Computer Vision, ECCV 2022, Oral Presentation
[Project] [PDF] [Code] [BibTeX]

We present DiffuStereo, a novel system using only sparse cameras (8 in our setting) for high-quality 3D human reconstruction. Central to our method is a diffusion-based stereo module, which incorporates powerful generative diffusion models into iterative stereo matching.

IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis
Jingxiang Sun, Xuan Wang, Yichun Shi, Lizhen Wang, Jue Wang, Yebin Liu
ACM Transactions on Graphics (SIGGRAPH Asia 2022)
[Project] [PDF] [Code] [BibTeX]

We introduce a high-resolution, 3D-aware generative model that supports local control over facial shape and texture, with real-time interactive editing capabilities.

FENeRF: Face Editing in Neural Radiance Fields
Jingxiang Sun, Xuan Wang, Yong Zhang, Xiaoyu Li, Qi Zhang, Yebin Liu, Jue Wang
2022 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2022
[Project] [PDF] [Code] [BibTeX]

We propose FENeRF, a 3D-aware generator that produces view-consistent, locally-editable portraits. By employing two decoupled latent codes for facial semantics and texture, we achieve flexible and precise manipulation of geometry and appearance.

iMoCap: Motion Capture from Internet Videos
Junting Dong*, Qing Shuai*, Jingxiang Sun, Yuanqing Zhang, Hujun Bao, Xiaowei Zhou (* equal contribution)
International Journal of Computer Vision (IJCV 2022)
[PDF] [BibTeX]

We propose a novel optimization-based framework for multi-view motion capture from internet videos, recovering more precise and detailed poses than monocular pose estimation methods.

BusTime: Which is the Right Prediction Model for My Bus Arrival Time?
Dairui Liu, Jingxiang Sun, Shen Wang
2020 IEEE International Conference on Big Data Analytics, ICBDA 2020
[PDF] [BibTeX]

We present a general and practical evaluation framework for multiple widely used bus-arrival-time prediction models, including delay-based, k-NN, kernel regression, additive models, and LSTM-based neural networks.


This website template was adapted from Yu Deng.