Text-driven localized editing of 3D objects is particularly difficult: locally blending the original 3D object with new objects and style effects, without distorting the object's form, is not straightforward. To address this issue, we propose a novel NeRF-based model, Blending-NeRF, which consists of two NeRF networks: a pretrained NeRF and an editable NeRF. Additionally, we introduce new blending operations that allow Blending-NeRF to properly edit target regions that are localized by text. Using CLIP, a pretrained vision-language aligned model, we guide Blending-NeRF to add new objects with varying colors and densities, modify textures, and remove parts of the original object. Our extensive experiments demonstrate that Blending-NeRF produces naturally and locally edited 3D objects from various text prompts.
Method Overview
We propose Blending-NeRF, which consists of a pretrained NeRF $f_\theta$ for the original 3D model and an editable NeRF $g_\phi$ for object editing. The weight parameters $\theta$ are frozen, and $\phi$ is learnable. The edited scene is synthesized by blending the volumetric information of the two NeRFs. We use two kinds of natural language prompts: a source text $T_{\text{source}}$ and a target text $T_{\text{target}}$, describing the original and edited 3D model, respectively. Blending-NeRF performs text-driven editing using CLIP losses with both prompts. However, the CLIP losses alone are not sufficient for localized editing, as they do not specify the target region. Thus, during training, we localize the editing region in the original rendered scene using the source text. Simultaneously, the editable NeRF is trained to edit the target region under the guidance of a localized editing objective. For more details, please refer to our paper.
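To make the blending idea concrete, the sketch below shows one simple way the per-sample outputs of the two NeRFs could be combined before volume rendering. This is a hypothetical illustration, not the paper's exact blending operations (which are defined separately for adding density, removing density, and changing color): here densities are simply summed and colors are weighted by each network's density contribution.

```python
import torch

def blend_nerf_outputs(sigma_f: torch.Tensor, rgb_f: torch.Tensor,
                       sigma_g: torch.Tensor, rgb_g: torch.Tensor):
    """Blend per-sample outputs of the pretrained NeRF f and editable NeRF g.

    sigma_f, sigma_g: densities, shape [N]
    rgb_f, rgb_g:     colors,    shape [N, 3]

    Hypothetical scheme: total density is the sum, and the color is a
    density-weighted mix, so the editable NeRF only influences appearance
    where it emits density.
    """
    sigma = sigma_f + sigma_g
    # Fractional contribution of the editable NeRF at each sample.
    w = sigma_g / (sigma_f + sigma_g + 1e-8)
    rgb = (1.0 - w).unsqueeze(-1) * rgb_f + w.unsqueeze(-1) * rgb_g
    return sigma, rgb
```

With this formulation, wherever the editable NeRF emits zero density the rendered scene reduces to the original pretrained NeRF, which is the behavior a localized edit requires outside the target region.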
@InProceedings{song2023blending,
    author    = {Song, Hyeonseop and Choi, Seokhun and Do, Hoseok and Lee, Chul and Kim, Taehyeong},
    title     = {Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {14383-14393}
}