Jingxiang Sun (孙景翔)

Email: sunjingxiang_stark[at]126.com · Google Scholar · GitHub · Twitter · YouTube

I am a second-year PhD student in the Department of Automation at Tsinghua University, which I joined in fall 2022, under the supervision of Prof. Yebin Liu. Prior to Tsinghua, I obtained my M.S. from the Department of ECE at the University of Illinois Urbana-Champaign. My research focuses on neural rendering, digital avatars, and 3D generation.

I am very fortunate to have spent summer 2021 at Tencent AI Lab with Xuan Wang and Jue Wang, and fall 2020 at the State Key Laboratory of CAD&CG at Zhejiang University with Xiaowei Zhou.

News

2024-10: We introduce DreamCraft3D++, a new technique for high-quality 3D content generation!
2024-08: One paper is accepted by SIGGRAPH Asia 2024!
2024-03: One paper is accepted by SIGGRAPH 2024!
2024-02: Two papers are accepted by CVPR 2024!
2024-02: Started my internship at NVIDIA Research with Koki Nagano and Shalini De Mello.
2024-01: DreamCraft3D is accepted at ICLR 2024. See you in Vienna!
2023-09: HAvatar is accepted by ACM TOG 2023.
2023-03: StyleAvatar is accepted by ACM SIGGRAPH 2023.
2023-03: Next3D is selected as a CVPR Highlight paper (top 10% of accepted papers, 2.5% of submissions).

Publications
DreamCraft3D++: Efficient Hierarchical 3D Generation with Multi-Plane Reconstruction Model
Jingxiang Sun, Cheng Peng, Ruizhi Shao, Yuan-Chen Guo, Xiaochen Zhao, Yangguang Li, Yanpei Cao, Bo Zhang, Yebin Liu
arXiv, 2024
[Project] [PDF] [BibTeX]

We present DreamCraft3D++, an extension of DreamCraft3D that enables efficient high-quality generation of complex 3D assets in 10 minutes.

Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer
Ruizhi Shao*, Youxin Pang*, Zerong Zheng, Jingxiang Sun, Yebin Liu
ACM Transactions on Graphics (SIGGRAPH Asia 2024)
[Project] [PDF] [BibTeX]

Given a reference image, SMPL sequences, and camera parameters, our method generates free-view dynamic human videos.

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Jingxiang Sun, Bo Zhang, Ruizhi Shao, Lizhen Wang, Wen Liu, Zhenda Xie, Yebin Liu
2024 International Conference on Learning Representations, ICLR 2024
[Project] [PDF] [Code] [BibTeX]

We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects.

InvertAvatar: Incremental GAN Inversion for Generalized Head Avatars
Xiaochen Zhao*, Jingxiang Sun*, Lizhen Wang, Jinli Suo, Yebin Liu (* equal contribution)
ACM SIGGRAPH 2024 (conditionally accepted)
[Project] [PDF] [BibTeX]

We present Incremental 3D GAN Inversion, which efficiently reconstructs photorealistic 3D facial avatars from single or multiple source images in under one second.

Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor
Ruizhi Shao, Jingxiang Sun, Cheng Peng, Zerong Zheng, Boyao Zhou, Hongwen Zhang, Yebin Liu
2024 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2024
[Project] [PDF] [Code] [BibTeX]

We propose Control4D, an approach to high-fidelity, spatiotemporally consistent 4D portrait editing using only text instructions.

RAM-Avatar: Real-time Photo-Realistic Avatar from Monocular Videos with Full-body Control
Xiang Deng, Zerong Zheng, Yuxiang Zhang, Jingxiang Sun, Chao Xu, XiaoDong Yang, Lizhen Wang, Yebin Liu
2024 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2024
[Project] [PDF] [Code] [BibTeX]

We present RAM-Avatar, a method for learning real-time photo-realistic human avatars from monocular videos; it achieves high-fidelity rendering with full-body control, including the face and hands, and supports real-time animation.

VectorTalker: SVG Talking Face Generation with Progressive Vectorisation
Hao Hu, Xuan Wang, Jingxiang Sun, Yanbo Fan, Yu Guo, Caigui Jiang
arXiv, 2023
[Project] [PDF] [BibTeX]

We present VectorTalker, a novel method for creating high-fidelity, audio-driven talking heads using scalable vector graphics, effective for various image styles.

HAvatar: High-fidelity Head Avatar via Facial Model Conditioned Neural Radiance Field
Xiaochen Zhao, Lizhen Wang, Jingxiang Sun, Hongwen Zhang, Jinli Suo, Yebin Liu
ACM Transactions on Graphics (ACM TOG 2023)
[Project] [PDF] [Code] [BibTeX]

We introduce the Facial Model Conditioned Neural Radiance Field, a hybrid 3D representation method that merges NeRF's expressiveness with parametric template data, enabling topological flexibility through synthetic-renderings-based conditioning.

StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video
Lizhen Wang, Xiaochen Zhao, Jingxiang Sun, Yuxiang Zhang, Hongwen Zhang, Tao Yu, Yebin Liu
ACM SIGGRAPH 2023
[Project] [PDF] [Code] [BibTeX]

We propose StyleAvatar, a real-time photo-realistic portrait avatar reconstruction method using StyleGAN-based networks, which can generate high-fidelity portrait avatars with faithful expression control.

Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
Jingxiang Sun, Xuan Wang, Lizhen Wang, Xiaoyu Li, Yong Zhang, Hongwen Zhang, Yebin Liu
2023 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2023, Highlight
[Project] [PDF] [Code] [BibTeX]

We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images. To achieve both deformation accuracy and topological flexibility, we present a 3D representation called Generative Texture-Rasterized Tri-planes.

High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors
Yunpeng Bai, Yanbo Fan, Xuan Wang, Yong Zhang, Jingxiang Sun, Chun Yuan, Ying Shan
2023 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2023
[Project] [PDF] [Code] [BibTeX]

We propose an efficient method to construct a personalized generative prior from a small set of facial images of a given individual. After learning, it allows photo-realistic rendering from novel views, and face reenactment can be realized by navigating the latent space. Our method is applicable to different driving signals, including RGB images, 3DMM coefficients, and audio.

DiffuStereo: High Quality Human Reconstruction via Diffusion-based Stereo Using Sparse Cameras
Ruizhi Shao, Zerong Zheng, Hongwen Zhang, Jingxiang Sun, Yebin Liu
2022 European Conference on Computer Vision, ECCV 2022, Oral Presentation
[Project] [PDF] [Code] [BibTeX]

We propose DiffuStereo, a novel system using only sparse cameras (8 in this work) for high-quality 3D human reconstruction. At its core is a novel diffusion-based stereo module, which introduces diffusion models, a powerful class of generative models, into the iterative stereo matching network.

IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis
Jingxiang Sun, Xuan Wang, Yichun Shi, Lizhen Wang, Jue Wang, Yebin Liu
ACM Transactions on Graphics (SIGGRAPH Asia 2022)
[Project] [PDF] [Code] [BibTeX]

We propose a high-resolution 3D-aware generative model that not only enables local control of the facial shape and texture, but also supports real-time, interactive editing.

FENeRF: Face Editing in Neural Radiance Fields
Jingxiang Sun, Xuan Wang, Yong Zhang, Xiaoyu Li, Qi Zhang, Yebin Liu, Jue Wang
2022 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2022
[Project] [PDF] [Code] [BibTeX]

We propose FENeRF, a 3D-aware generator that can produce view-consistent and locally editable portrait images. Our method uses two decoupled latent codes to generate the corresponding facial semantics and texture in a spatially aligned 3D volume with shared geometry. We also show that jointly learning semantics and texture helps to generate finer geometry.

iMoCap: Motion Capture from Internet Videos
Junting Dong*, Qing Shuai*, Jingxiang Sun, Yuanqing Zhang, Hujun Bao, Xiaowei Zhou (* equal contribution)
2022 International Journal of Computer Vision, IJCV 2022
[PDF] [BibTeX]

We propose a novel optimization-based framework and experimentally demonstrate that it recovers much more precise and detailed motion from multiple videos than monocular pose estimation methods.

BusTime: Which is the Right Prediction Model for My Bus Arrival Time?
Dairui Liu, Jingxiang Sun, Shen Wang
2020 IEEE International Conference on Big Data Analytics, ICBDA 2020
[PDF] [BibTeX]

We propose a general and practical evaluation framework for analysing various widely used prediction models (i.e., delay, k-nearest-neighbor, kernel regression, additive model, and recurrent neural network using long short-term memory) for bus arrival time.

Technical Reports
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu*, Wen Liu*, Bo Zhang**, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan (* equal contribution, ** project lead)
[Hugging Face] [PDF] [Code]

We introduce DeepSeek-VL, an open-source vision-language (VL) model designed for real-world vision and language understanding applications.


The website template was adapted from Yu Deng.