Decoding VGGT Features in 3D with Sparse Convolution

Course Project — Computer Vision, NYU Shanghai (Prof. Saining Xie)

Sep. 2025 – Dec. 2025

Overview

This project explores decoding VGGT / AnySplat features for 3D Gaussian Splatting using sparse 3D convolutions, replacing the original MLP-based decoder with a ResNet-style architecture that better exploits voxelized spatial structure.

Contributions

Feed-forward Gaussian Splatting refinement pipeline built on top of the VGGT/AnySplat backbone, introducing a ResNet-style sparse 3D convolutional decoder for voxelized feature aggregation.
Kernel normalization for sparse 3D convolutions to stabilize feature aggregation under sparse voxel occupancy, addressing the inconsistent receptive field problem caused by varying numbers of active voxels.

Results

Improved 3D reconstruction quality on standard benchmarks:

Metric	Baseline	Ours
PSNR ↑	22.05	23.22
SSIM ↑	0.692	0.744
LPIPS ↓	0.327	0.277

Stack

PyTorch · SpConv · VGGT / AnySplat · Gaussian Splatting