Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields

ICCV 2023
Hyeonseop Song1*, Seokhun Choi1*, Hoseok Do1, Chul Lee1, Taehyeong Kim2
1LG Electronics, 2Seoul National University, *Equal contribution

Blending-NeRF is a localized 3D editing method that enables users to make natural modifications to a source object via text prompts. It offers a wide range of modifications through three explicitly predefined editing operations: color change, density addition, and density removal.
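As a rough illustration of these three operations, the sketch below shows how per-point outputs of a frozen pretrained NeRF and an editable NeRF could be combined before volume rendering. The function name, tensor shapes, and the exact blending formula are illustrative assumptions, not the authors' released code.

# A minimal conceptual sketch (not the authors' implementation) of the three
# editing operations acting on per-sample-point densities and colors.
import torch

def blend_point_outputs(sigma_src, rgb_src, sigma_add, rgb_edit, remove_w, color_w):
    """Blend per-point outputs of the two networks (illustrative only).

    sigma_src, rgb_src : density / RGB from the frozen pretrained NeRF
    sigma_add          : extra density from the editable NeRF   (density addition)
    remove_w in [0, 1] : fraction of original density suppressed (density removal)
    rgb_edit, color_w  : edited RGB and its mixing weight        (color change)
    """
    sigma = sigma_src * (1.0 - remove_w) + sigma_add              # remove, then add density
    rgb = rgb_src * (1.0 - color_w) + rgb_edit * color_w          # interpolate colors
    return sigma, rgb

# Toy usage on a batch of 1024 sample points.
sigma_src, sigma_add = torch.rand(1024), torch.rand(1024)
rgb_src, rgb_edit = torch.rand(1024, 3), torch.rand(1024, 3)
remove_w, color_w = torch.rand(1024), torch.rand(1024)
sigma, rgb = blend_point_outputs(sigma_src, rgb_src, sigma_add, rgb_edit,
                                 remove_w, color_w.unsqueeze(-1))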

Abstract

Text-driven localized editing of 3D objects is particularly difficult as locally mixing the original 3D object with the intended new object and style effects without distorting the object's form is not a straightforward process. To address this issue, we propose a novel NeRF-based model, Blending-NeRF, which consists of two NeRF networks: pretrained NeRF and editable NeRF. Additionally, we introduce new blending operations that allow Blending-NeRF to properly edit target regions which are localized by text. By using a pretrained vision-language aligned model, CLIP, we guide Blending-NeRF to add new objects with varying colors and densities, modify textures, and remove parts of the original object. Our extensive experiments demonstrate that Blending-NeRF produces naturally and locally edited 3D objects from various text prompts.


Method Overview

Blending-NeRF architecture.

We propose Blending-NeRF, which consists of a pretrained NeRF $f_\theta$ that represents the original 3D model and an editable NeRF $g_\phi$ for object editing. The weight parameters $\theta$ are frozen, while $\phi$ is learnable. The edited scene is synthesized by blending the volumetric information of the two NeRFs. We use two kinds of natural language prompts: a source text $T_{\text{source}}$ and a target text $T_{\text{target}}$, which describe the original and the edited 3D model, respectively. Blending-NeRF performs text-driven editing using CLIP losses with both prompts. However, the CLIP losses alone are not sufficient for localized editing, as they do not specify the target region. Thus, during training, we specify the editing region in the original rendered scene using the source text. Simultaneously, the editable NeRF is trained to edit the target region under the guidance of a localized editing objective. For more details, please refer to our paper.
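As a rough sketch of the text-guidance part, the snippet below shows how a directional CLIP loss between the source and target prompts could be computed from rendered views of the original and blended scenes. It assumes the OpenAI CLIP package and CLIP-preprocessed renderings; this is one common formulation and only approximates the paper's full objective, which also includes localization and regularization terms.

# A hedged sketch of CLIP guidance with a source and a target prompt,
# assuming a differentiable renderer produces img_src / img_edit.
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

def directional_clip_loss(img_src, img_edit, text_src, text_tgt):
    """img_src / img_edit: rendered views (N, 3, 224, 224), CLIP-normalized."""
    with torch.no_grad():
        f_txt_src = clip_model.encode_text(clip.tokenize([text_src]).to(device))
        f_txt_tgt = clip_model.encode_text(clip.tokenize([text_tgt]).to(device))
    f_img_src = clip_model.encode_image(img_src)
    f_img_edit = clip_model.encode_image(img_edit)

    # Align the image-space change with the text-space change.
    d_img = F.normalize(f_img_edit - f_img_src, dim=-1)
    d_txt = F.normalize(f_txt_tgt - f_txt_src, dim=-1)
    return (1.0 - (d_img * d_txt).sum(dim=-1)).mean()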

Editing on Synthetic Objects

"bulldozer" (source)

"boat" (source)

"brown-jar" (source)

"hotdog" (source)

"green-chair" (source)

"mic" (source)



editing operation (density removal)

"porous bulldozer"

editing operation (color change + density removal)

"boat exploding with a lot of smoke and blue flame"

editing operation (density removal)

"translucent brown-jar"

editing operation (color change)

"oil pastel hotdog"

editing operation (density addition)

"iron throne"

editing operation (color change + density removal)

"mic amber"

"boat" (source, no editing)
"cyberpunk neon boat" (color change)
"crystal boat" (color change)
"jelly boat" (color change)
"galaxy big bang explosion on boat" (color change + density removal)
"boat attacked by octopus" (density addition + density removal)
"firewood on fire, trending on artstation" (density addition + density removal)
"fireworks on boat" (density addition + density removal)
"boat inside blackhole, a DSLR photo" (density addition + density removal)
"shipwreck" (color change + density removal)
"disappearing boat" (density removal)
"ghost ship" (color change + density addition + density removal)

Editing on Real-World Scenes

pinecone

"pinecone" (source, no editing)
"pineapple, trending on artstation" (color change)
"shining diamond pinecone, trending on artstation" (color change)
"snow on pinecone" (color change + density removal)
"burning pinecone" (density addition)
"yard" (density removal)

vasedeck

"flower" (source, no editing)
"swarovski blue crystal flower, trending on artstation" (color change)
"cyberpunk neon flower, highly detailed" (color change)
"burning flower, a DSLR photo" (color change + density removal)
"snow on flower" (color change + density removal)
"deck" (density removal)

BibTeX

@InProceedings{song2023blending,
  author    = {Song, Hyeonseop and Choi, Seokhun and Do, Hoseok and Lee, Chul and Kim, Taehyeong},
  title     = {Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2023},
  pages     = {14383-14393}
}