Meta 3D Gen | Research - AI at Meta

GRAPHICS

COMPUTER VISION

Meta 3D Gen

July 02, 2024

Abstract

We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously generated (or artist-created) 3D shapes using additional textual inputs provided by the user. 3DGen integrates key technical components, Meta 3D AssetGen and Meta 3D TextureGen, that we developed for text-to-3D and text-to-texture generation, respectively. By combining their strengths, 3DGen represents 3D objects simultaneously in three ways: in view space, in volumetric space, and in UV (or texture) space. The integration of these two techniques achieves a win rate of 68% with respect to the single-stage model. We compare 3DGen to numerous industry baselines, and show that it outperforms them in terms of prompt fidelity and visual quality for complex textual prompts, while being significantly faster.

AUTHORS

Written by

Raphael Bensadoun

Tom Monnier

Yanir Kleiman

Filippos Kokkinos

Yawar Siddiqui

Mahendra Kariya

Omri Harosh

Roman Shapovalov

Emilien Garreau

Animesh Karnewar

Ang Cao

Idan Azuri

Iurii Makarov

Eric-Tuan Le

Antoine Toisoul

David Novotny

Oran Gafni

Natalia Neverova

Andrea Vedaldi

Publisher

Arxiv only

Research Topics

Graphics

Computer Vision

Related Publications

September 05, 2024

CONVERSATIONAL AI

NLP

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Luke Zettlemoyer, Omer Levy, Xuezhe Ma

August 20, 2024

CONVERSATIONAL AI

NLP

Lumos: Empowering Multimodal LLMs with Scene Text Recognition

Ashish Shenoy, Yichao Lu, Srihari Jayakumar, Debojeet Chatterjee, Mohsen Moslehpour, Pierce Chuang, Abhay Harpale, Vikas Bhardwaj, Di Xu (SWE), Shicong Zhao, Ankit Ramchandani, Luna Dong, Anuj Kumar

August 15, 2024

INTEGRITY

COMPUTER VISION

Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds

Kamalika Chaudhuri, Chuan Guo, Laurens van der Maaten, Saeed Mahloujifar, Mark Tygert

July 29, 2024

COMPUTER VISION

SAM 2: Segment Anything in Images and Videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chay Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollar, Christoph Feichtenhofer