Semantic Enhanced Point-E

Enhancing 3D model generation with multimodal fusion of image and text embeddings.

Project Overview

In this project we introduce Semantic Enhanced Point-E (SEPE), an extension of the Point-E framework for generating 3D point clouds from text prompts. By fusing image and text embeddings through a dedicated fusion module early in the generation pipeline, SEPE improves the semantic fidelity and controllability of the generated 3D models. Our experiments show that this multimodal approach yields better alignment with user intent, such as modifying object properties through text descriptions.
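
To make the fusion idea concrete, below is a minimal sketch of an early-fusion module that combines per-sample image and text embeddings into a single conditioning vector before the point-cloud diffusion stage. The class name `EmbeddingFusion`, the 768-dimensional embedding sizes, and the concatenate-then-project strategy are illustrative assumptions, not the exact SEPE architecture.

```python
import torch
import torch.nn as nn


class EmbeddingFusion(nn.Module):
    """Fuse image and text embeddings into one conditioning vector.

    Illustrative sketch: layer sizes and the concat-then-MLP design
    are assumptions, not the verified SEPE implementation.
    """

    def __init__(self, image_dim: int = 768, text_dim: int = 768, fused_dim: int = 768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(image_dim + text_dim, fused_dim),
            nn.GELU(),
            nn.Linear(fused_dim, fused_dim),
        )

    def forward(self, image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # Concatenate the two modalities along the feature axis, then
        # project back to the conditioning dimension expected downstream.
        return self.proj(torch.cat([image_emb, text_emb], dim=-1))


# Usage: fuse a batch of (hypothetical) CLIP-style embeddings before
# passing the result as conditioning to the point-cloud diffusion model.
fusion = EmbeddingFusion()
image_emb = torch.randn(4, 768)      # placeholder image embeddings
text_emb = torch.randn(4, 768)       # placeholder text embeddings
cond = fusion(image_emb, text_emb)   # shape: (4, 768)
```

Fusing early, before the diffusion model consumes its conditioning signal, lets the text modify attributes the image alone would fix, which is what enables the property-editing behavior described above.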