A Surge in NeRF


News

My post on volume rendering is out! (link)
My post on NDC space is out! (link)

Overview

The neural radiance field (NeRF) has sparked a surge of follow-up research.

Beyond its original task, it has been extended to other fields such as generative modeling (GRAF, GIRAFFE, DreamField), relighting (?, NeRF-OSR), and scene editing (CCNeRF).

This post focuses on novel view synthesis (NVS) and reconstruction.

Background

What do we want?

Novel view synthesis (NVS) refers to the problem of rendering a scene from a novel viewpoint given a few input images. With downstream applications to modeling, animation, and mixed reality, NVS is fundamental to the computer vision (CV) and computer graphics (CG) fields.

The novel view synthesis problem

The problem is typically attacked in two stages: inverse rendering, which constructs a 3D representation from the input images, and rendering, which maps that high-dimensional representation to the pixel colors of a raster image.

Supervision in 3D

CV for inverse rendering A 3D model can be explicitly represented by a mesh, point cloud, voxel grid, or multi-plane images (MPI). Learning-based solutions to related problems, such as structure from motion (SfM) and multi-view stereo (MVS), can therefore be adapted for reconstruction. Nonetheless, those approaches often depend on direct supervision, and 3D ground truths are time-consuming to obtain. Explicit representations are also memory-demanding. Hence, learning-based CV schemes hardly scale to real-world scenes.

Supervision in 2D?

Is there a way to learn from 2D supervision alone? Is there a way to reduce the memory footprint of 3D representations? This is where NeRF steps in. Before proceeding to its success, there is a (significant) obstacle to overcome: the rendering process must be differentiable. Otherwise, gradients cannot propagate back to the geometric representation, and the network never converges.

Differentiable rendering

Differentiable rendering

Classical graphics pipelines leverage matrix operations on triangular (or polygonal) meshes to produce a raster image. This process is not differentiable, in that the gradient w.r.t. geometry is either hard to compute or unhelpful. 3D representations for differentiable rendering tend to be implicit. DVR, despite its misleading[1] title, differentiably renders a pixel given implicit surfaces. An alternative is volume rendering, where a ray $\boldsymbol{r} = \boldsymbol{o} + z\boldsymbol{d}$ is "cast" into a volumetric representation, and colors are "accumulated" by

$$\mathbf{C}(\boldsymbol{r}) = \mathbf{C}(z; \boldsymbol{o}, \boldsymbol{d}) = \int_{z_n}^{z_f} T(z)\, \sigma\left(\boldsymbol{r}(z)\right) \boldsymbol{c}\left(\boldsymbol{r}(z), \boldsymbol{d}\right) \, dz, \quad T(z) = \exp\left(-\int_{z_n}^{z} \sigma\left(\boldsymbol{r}(s)\right) \, ds\right)$$

given "volume density" $\sigma$ and color $\boldsymbol{c}$.
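In practice the integral is approximated by quadrature over discrete samples along the ray: $\alpha_i = 1 - \exp(-\sigma_i \delta_i)$ and $T_i = \prod_{j<i} (1 - \alpha_j)$, where $\delta_i$ is the spacing between samples. A minimal NumPy sketch of this standard compositing step (the density and color values below are constant stand-ins for a learned field, not part of any particular implementation):

```python
import numpy as np

def composite(sigma, color, delta):
    """Approximate C(r) = ∫ T(z) σ(r(z)) c(r(z), d) dz by quadrature:
    alpha_i = 1 - exp(-σ_i δ_i),  T_i = Π_{j<i} (1 - alpha_j)."""
    alpha = 1.0 - np.exp(-sigma * delta)                            # (N,)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))   # T_i, (N,)
    weights = trans * alpha                                         # (N,)
    return (weights[:, None] * color).sum(axis=0)                   # (3,) pixel color

# Toy field: N samples along one ray
N = 64
sigma = np.full(N, 0.1)                    # uniform density
color = np.tile([1.0, 0.5, 0.0], (N, 1))   # constant orange
delta = np.full(N, 0.05)                   # sample spacing dz
pixel = composite(sigma, color, delta)
```

With constant density and color, the result collapses analytically to $\boldsymbol{c}\,(1 - e^{-\sigma (z_f - z_n)})$, which is a handy sanity check for any implementation of this step.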

Ray tracing? Ray casting? Ray marching!

Volume rendering is an image-order approach. For every pixel, a ray is shot from the camera, passes through the pixel center, and is "cast" into the volumetric representation. Unlike ray tracing, it does not reflect off surfaces; rather, it marches through the entire volume. This is reminiscent of ray casting, widely applied in medical imaging. On the contrary, it does not intend to reveal the internal structure of "volume data"; what we want is the color of that pixel. Such an approach is referred to as ray marching.
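The per-pixel ray setup can be sketched as follows. This assumes one common camera convention (x right, y up, camera looking down $-z$, rays through pixel centers); other conventions differ only in signs:

```python
import numpy as np

def pixel_rays(H, W, focal, c2w):
    """One ray r(z) = o + z d per pixel, through each pixel center.
    c2w: 3x4 camera-to-world matrix (rotation | translation)."""
    i, j = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)  # pixel centers
    dirs = np.stack([(i - W / 2) / focal,
                     -(j - H / 2) / focal,
                     -np.ones_like(i)], axis=-1)   # (H, W, 3) in camera space
    d = dirs @ c2w[:3, :3].T                       # rotate to world space
    o = np.broadcast_to(c2w[:3, 3], d.shape)       # shared origin for all rays
    return o, d

o, d = pixel_rays(4, 4, focal=2.0, c2w=np.eye(4)[:3])
z = np.linspace(2.0, 6.0, 64)                      # march from z_n to z_f
points = o[..., None, :] + z[:, None] * d[..., None, :]  # (H, W, 64, 3) samples
```

Each row of `points` along the last-but-one axis is the sequence of sample positions the marcher feeds to the field before compositing.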

Info

personal view, may be controversial

not strictly "physical"

Analysis

The above integral demands the continuity of $\sigma$ and $\boldsymbol{c}$, making the volumetric representation essentially a scalar field.

Coordinate-based MLP as radiance field

MLP as radiance field
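NeRF represents the field as a coordinate-based MLP: a sinusoidal positional encoding $\gamma$ lifts the 3D position, and the network outputs $(\sigma, \boldsymbol{c})$. A forward-pass sketch in NumPy, with random (untrained) weights and the simplification of not conditioning color on the view direction; layer sizes here are illustrative, not the paper's:

```python
import numpy as np

def posenc(x, L=10):
    """γ(x) = (sin(2^0 π x), cos(2^0 π x), ..., sin(2^{L-1} π x), cos(2^{L-1} π x))."""
    freqs = 2.0 ** np.arange(L) * np.pi            # (L,)
    angles = x[..., None] * freqs                  # (..., 3, L)
    return np.concatenate([np.sin(angles), np.cos(angles)],
                          axis=-1).reshape(*x.shape[:-1], -1)

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(60, 256))         # 3 coords * 2 * L=10 -> 60 dims
W2 = rng.normal(scale=0.1, size=(256, 4))          # hidden -> (σ, r, g, b)

def field(x):
    h = np.maximum(posenc(x) @ W1, 0.0)            # ReLU hidden layer
    out = h @ W2
    sigma = np.maximum(out[..., 0], 0.0)           # density must be non-negative
    color = 1.0 / (1.0 + np.exp(-out[..., 1:]))    # sigmoid keeps color in [0, 1]
    return sigma, color

sigma, color = field(np.array([[0.1, 0.2, 0.3]]))
```

In training, these weights are the only free parameters: the photometric loss on rendered pixels backpropagates through the compositing quadrature into the MLP.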

Entirely implicit? A step back…

Introduction of feature grids for rapid convergence

Feature grids take several forms.
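The common thread of grid-based variants (NSVF, Plenoxels, Point-NeRF, Instant-NGP's hash grid, ...) is that most of the capacity moves into spatial features fetched by trilinear interpolation, leaving only a tiny (or no) MLP. A minimal dense-grid lookup, illustrative rather than any particular paper's layout:

```python
import numpy as np

def trilerp(grid, x):
    """Trilinearly interpolate a dense feature grid.
    grid: (R, R, R, F) voxel features; x: (..., 3) points in [0, 1]^3."""
    R = grid.shape[0]
    p = np.clip(x, 0.0, 1.0) * (R - 1)             # continuous voxel coordinates
    lo = np.floor(p).astype(int)
    lo = np.minimum(lo, R - 2)                     # keep lo+1 in bounds
    t = p - lo                                     # fractional offsets, (..., 3)
    out = 0.0
    for dx in (0, 1):                              # blend the 8 corner features
        for dy in (0, 1):
            for dz in (0, 1):
                w = (np.where(dx, t[..., 0], 1 - t[..., 0])
                     * np.where(dy, t[..., 1], 1 - t[..., 1])
                     * np.where(dz, t[..., 2], 1 - t[..., 2]))
                out = out + w[..., None] * grid[lo[..., 0] + dx,
                                                lo[..., 1] + dy,
                                                lo[..., 2] + dz]
    return out

grid = np.random.default_rng(0).normal(size=(8, 8, 8, 4))
feat = trilerp(grid, np.array([0.5, 0.5, 0.5]))    # (4,) interpolated feature
```

Because the interpolation weights are piecewise linear in the query point, gradients flow into the corner features directly, which is what makes these grids trainable without a deep network.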

"Men are still good."[2]

Pure-MLP radiance fields keep improving as well: mip-NeRF, mip-NeRF 360, Ref-NeRF.

Summary

References

CS184/284a by UC Berkeley
CS348n by Stanford University
DIVeR: Real-time and Accurate Neural Radiance Fields with Deterministic Integration for Volume Rendering
Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Neural Sparse Voxel Fields
PlenOctrees for Real-time Neural Radiance Fields
Plenoxels: Radiance Fields without Neural Networks
Point-NeRF: Point-based Neural Radiance Fields
Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields
Is NeRF (neural radiance field) supported by physical (optical) principles?

Errata

| Time | Modification |
| --- | --- |
| Sep 18 2022 | Pre-release |
| Oct 16 2022 | Initial release |

  1. ? ↩︎

  2. "Men are still good" is an ending line from the film Batman v Superman: Dawn of Justice, conveying Bruce Wayne's faith in mankind. It is cited here to imply that pure MLP representations of radiance fields are not (at all) inferior to "hybrid" representations, in terms of quality, of course. ↩︎
