Implicit Neural Representations (INRs) employ neural networks to approximate discrete data as continuous functions. In the context of video data, such models map pixel coordinates together with frame occurrence times (or indices) to RGB color values. Although INRs enable effective compression, they are ill-suited for editing. One potential solution is a 3D Gaussian Splatting (3DGS) based model, such as the Video Gaussian Representation (VGR), which encodes video as a multitude of 3D Gaussians and is applicable to numerous video processing operations, including editing. Nevertheless, in this case, the capacity for modification is constrained to a limited set of basic transformations. To address this issue, we introduce the Video Gaussian Splatting (VeGaS) model, which enables realistic modifications of video data. To construct VeGaS, we propose a novel family of Folded-Gaussian distributions designed to capture nonlinear dynamics in a video stream, and we model consecutive frames by 2D Gaussians obtained as the respective conditional distributions. Our experiments demonstrate that VeGaS outperforms state-of-the-art solutions in frame reconstruction tasks and allows realistic modifications of video data.
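The core idea of obtaining a per-frame 2D Gaussian as a conditional distribution of a spatio-temporal Gaussian can be illustrated with standard Gaussian conditioning. The sketch below is a simplification and an assumption on my part: it uses a plain 3D Gaussian over (x, y, t) rather than the Folded-Gaussian family the paper introduces for nonlinear dynamics, and the function name `condition_on_time` is hypothetical.

```python
import numpy as np

def condition_on_time(mu, Sigma, t):
    """Condition a 3D Gaussian over (x, y, t) on a fixed time t,
    yielding the 2D Gaussian rendered into that frame.

    Standard Gaussian conditioning formulas:
      mu_{xy|t}    = mu_xy + S_xt / S_tt * (t - mu_t)
      Sigma_{xy|t} = S_xy  - S_xt S_xt^T / S_tt
    """
    mu_xy, mu_t = mu[:2], mu[2]
    S_xy = Sigma[:2, :2]      # 2x2 spatial covariance block
    S_xt = Sigma[:2, 2:3]     # 2x1 space-time cross-covariance
    S_tt = Sigma[2, 2]        # scalar temporal variance

    mu_cond = mu_xy + (S_xt[:, 0] / S_tt) * (t - mu_t)
    Sigma_cond = S_xy - (S_xt @ S_xt.T) / S_tt
    return mu_cond, Sigma_cond

# A single spatio-temporal Gaussian: its conditional mean traces a
# (here linear) trajectory across frames as t varies.
mu = np.array([0.5, 0.5, 10.0])
Sigma = np.array([[1.0, 0.2, 0.3],
                  [0.2, 2.0, 0.1],
                  [0.3, 0.1, 4.0]])
mean_2d, cov_2d = condition_on_time(mu, Sigma, t=12.0)
```

Note that the conditional covariance does not depend on t, so with a plain Gaussian every frame sees the same 2D shape translated along a linear path; this is precisely the limitation that motivates a richer distribution family for nonlinear motion.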