Text-driven localized editing of 3D objects is particularly difficult: locally blending the original 3D object with new objects and style effects, without distorting the object's form, is not straightforward. To address this issue, we propose a novel NeRF-based model, Blending-NeRF, which consists of two NeRF networks: a pretrained NeRF and an editable NeRF. Additionally, we introduce new blending operations that allow Blending-NeRF to properly edit target regions that are localized by text. Using CLIP, a pretrained vision-language aligned model, we guide Blending-NeRF to add new objects with varying colors and densities, modify textures, and remove parts of the original object. Our extensive experiments demonstrate that Blending-NeRF produces naturally and locally edited 3D objects from various text prompts.
Method Overview
We propose Blending-NeRF, which consists of a pretrained NeRF $f_\theta$ for the original 3D model and an editable NeRF $g_\phi$ for object editing. The weight parameters $\theta$ are frozen, and $\phi$ is learnable. The edited scene is synthesized by blending the volumetric information of the two NeRFs. We use two kinds of natural language prompts: a source text $T_{\text{source}}$ and a target text $T_{\text{target}}$, describing the original and edited 3D model, respectively. Blending-NeRF performs text-driven editing using CLIP losses with both prompts. However, the CLIP losses alone are not sufficient for localized editing, as they do not specify the target region. Thus, during training, we localize the editing region in the original rendered scene using the source text. Simultaneously, the editable NeRF is trained to edit the target region under the guidance of a localized editing objective. For more details, please refer to our paper.
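To make the blending idea concrete, the sketch below shows one simple way the per-sample outputs of the two NeRFs could be combined before volume rendering. This is a hypothetical illustration, not the paper's exact blending operations (which are defined separately for adding density, removing density, and changing color): here densities are simply summed and colors are weighted by each network's density contribution.

```python
import torch

def blend_nerf_outputs(sigma_f: torch.Tensor, rgb_f: torch.Tensor,
                       sigma_g: torch.Tensor, rgb_g: torch.Tensor):
    """Blend per-sample outputs of the pretrained NeRF f and editable NeRF g.

    sigma_f, sigma_g: densities, shape [N]
    rgb_f, rgb_g:     colors,    shape [N, 3]

    Hypothetical scheme: total density is the sum, and the color is a
    density-weighted mix, so the editable NeRF only influences appearance
    where it emits density.
    """
    sigma = sigma_f + sigma_g
    # Fractional contribution of the editable NeRF at each sample.
    w = sigma_g / (sigma_f + sigma_g + 1e-8)
    rgb = (1.0 - w).unsqueeze(-1) * rgb_f + w.unsqueeze(-1) * rgb_g
    return sigma, rgb
```

With this formulation, wherever the editable NeRF emits zero density the rendered scene reduces to the original pretrained NeRF, which is the behavior a localized edit requires outside the target region.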
@InProceedings{song2023blending,
    author    = {Song, Hyeonseop and Choi, Seokhun and Do, Hoseok and Lee, Chul and Kim, Taehyeong},
    title     = {Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {14383-14393}
}