- id: condor2024dsyg
title: 'Don''t Splat your Gaussians: Volumetric Ray-Traced Primitives for
Modeling and Rendering Scattering and Emissive Media'
authors: Jorge Condor, Sebastien Speierer, Lukas Bode, Aljaz Bozic, Simon Green, Piotr Didyk, Adrian Jarabo
year: '2024'
abstract: 'Banking on the popularity of rasterized 3D Gaussian Splatting methods,
we formalize the ray-tracing of volumes composed of kernel mixture models (Gaussian
or otherwise). Our physically-based, path-traced formulation allows us to render and
optimize both scattering and emissive volumes, as well as radiance fields, in an
extremely efficient and compact manner. We also introduce the Epanechnikov kernel
as an efficient alternative for the Gaussian kernel in radiance field rendering,
and showcase the advantages of a ray-traced framework, while maintaining real-time
performance.
'
project_page: https://arcanous98.github.io/projectPages/gaussianVolumes.html
paper: https://arcanous98.github.io/assets/data/papers/Gaussian_tracing_meta_TOG-compressed.pdf
code: https://github.com/facebookresearch/volumetric_primitives
video: null
tags:
- Physics
- Ray Tracing
- Relight
- Rendering
- Project
- Code
- 360 degree
- Antialiasing
- Perspective-correct
thumbnail: assets/thumbnails/condor2024dsyg.jpg
publication_date: '2024-05-24T10:42:05+00:00'
date_source: arxiv
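# The entry above introduces the Epanechnikov kernel as a cheaper alternative to the
# Gaussian kernel for volumetric primitives. A minimal Python sketch (illustrative only;
# the paper's exact normalization and parameterization are not reproduced here), writing
# both kernels as functions of the squared Mahalanobis distance:
#
#   import numpy as np
#
#   def sq_mahalanobis(x, mean, cov):
#       d = x - mean
#       return float(d @ np.linalg.solve(cov, d))
#
#   def gaussian_kernel(d2):
#       return np.exp(-0.5 * d2)          # smooth falloff, infinite support
#
#   def epanechnikov_kernel(d2):
#       return max(0.0, 1.0 - d2)         # compact support, no exp() per sample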
- id: lin2025diffsplat
title: 'DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat
Generation'
authors: Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, Yadong Mu
year: '2025'
abstract: 'Recent advancements in 3D content generation from text or a single image
struggle with limited high-quality 3D datasets and inconsistency from 2D multi-view
generation. We introduce DiffSplat, a novel 3D generative framework that natively
generates 3D Gaussian splats by taming large-scale text-to-image diffusion models.
It differs from previous 3D generative models by effectively utilizing web-scale
2D priors while maintaining 3D consistency in a unified model. To bootstrap the
training, a lightweight reconstruction model is proposed to instantly produce
multi-view Gaussian splat grids for scalable dataset curation. In conjunction
with the regular diffusion loss on these grids, a 3D rendering loss is introduced
to facilitate 3D coherence across arbitrary views. The compatibility with image
diffusion models enables seamless adaptions of numerous techniques for image generation
to the 3D realm. Extensive experiments reveal the superiority of DiffSplat in
text- and image-conditioned generation tasks and downstream applications. Thorough
ablation studies validate the efficacy of each critical design choice and provide
insights into the underlying mechanism.
'
project_page: https://chenguolin.github.io/projects/DiffSplat/
paper: https://arxiv.org/pdf/2501.16764.pdf
code: https://github.com/chenguolin/DiffSplat
video: null
tags:
- Diffusion
- Project
thumbnail: assets/thumbnails/lin2025diffsplat.jpg
publication_date: '2025-01-28T07:38:59+00:00'
date_source: arxiv
- id: armagan2025trickgs
title: 'Trick-GS: A Balanced Bag of Tricks for Efficient Gaussian Splatting'
authors: Anil Armagan, Albert Saà-Garriga, Bruno Manganelli, Mateusz Nowak, Mehmet
Kerim Yucel
year: '2025'
abstract: 'Gaussian splatting (GS) for 3D reconstruction has become quite popular
due to their fast training, inference speeds and high quality reconstruction.
However, GS-based reconstructions generally consist of millions of Gaussians,
which makes them hard to use on computationally constrained devices such as smartphones.
In this paper, we first propose a principled analysis of advances in efficient
GS methods. Then, we propose Trick-GS, which is a careful combination of several
strategies including (1) progressive training with resolution, noise and Gaussian
scales, (2) learning to prune and mask primitives and SH bands by their significance,
and (3) accelerated GS training framework. Trick-GS takes a large step towards
resource-constrained GS, where faster run-time, smaller and faster-convergence
of models is of paramount concern. Our results on three datasets show that Trick-GS
achieves up to 2x faster training, 40x smaller disk size and 2x faster rendering
speed compared to vanilla GS, while having comparable accuracy.
'
project_page: null
paper: https://arxiv.org/pdf/2501.14534.pdf
code: null
video: null
tags:
- Acceleration
thumbnail: assets/thumbnails/armagan2025trickgs.jpg
publication_date: '2025-01-24T14:40:40+00:00'
date_source: arxiv
- id: lee2025densesfm
title: 'Dense-SfM: Structure from Motion with Dense Consistent Matching'
authors: JongMin Lee, Sungjoo Yoo
year: '2025'
abstract: 'We present Dense-SfM, a novel Structure from Motion (SfM) framework designed
for dense and accurate 3D reconstruction from multi-view images. Sparse keypoint
matching, which traditional SfM methods often rely on, limits both accuracy and
point density, especially in texture-less areas. Dense-SfM addresses this limitation
by integrating dense matching with a Gaussian Splatting (GS) based track extension
which gives more consistent, longer feature tracks. To further improve reconstruction
accuracy, Dense-SfM is equipped with a multi-view kernelized matching module leveraging
transformer and Gaussian Process architectures, for robust track refinement across
multi-views. Evaluations on the ETH3D and Texture-Poor SfM datasets show that
Dense-SfM offers significant improvements in accuracy and density over state-of-the-art
methods.
'
project_page: null
paper: https://arxiv.org/pdf/2501.14277.pdf
code: null
video: null
tags:
- Point Cloud
- Poses
thumbnail: assets/thumbnails/lee2025densesfm.jpg
publication_date: '2025-01-24T06:45:12+00:00'
date_source: arxiv
- id: li2025micromacro
title: Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained
Images
authors: Yihui Li, Chengxin Lv, Hongyu Yang, Di Huang
year: '2025'
abstract: '3D reconstruction from unconstrained image collections presents substantial
challenges due to varying appearances and transient occlusions. In this paper,
we introduce Micro-macro Wavelet-based Gaussian Splatting (MW-GS), a novel approach
designed to enhance 3D reconstruction by disentangling scene representations into
global, refined, and intrinsic components. The proposed method features two key
innovations: Micro-macro Projection, which allows Gaussian points to capture details
from feature maps across multiple scales with enhanced diversity; and Wavelet-based
Sampling, which leverages frequency domain information to refine feature representations
and significantly improve the modeling of scene appearances. Additionally, we
incorporate a Hierarchical Residual Fusion Network to seamlessly integrate these
features. Extensive experiments demonstrate that MW-GS delivers state-of-the-art
rendering performance, surpassing existing methods.
'
project_page: null
paper: https://arxiv.org/pdf/2501.14231.pdf
code: null
video: null
tags:
- In the Wild
thumbnail: assets/thumbnails/li2025micromacro.jpg
publication_date: '2025-01-24T04:37:57+00:00'
date_source: arxiv
- id: yu2025hammer
title: 'HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting'
authors: Javier Yu, Timothy Chen, Mac Schwager
year: '2025'
abstract: '3D Gaussian Splatting offers expressive scene reconstruction, modeling
a broad range of visual, geometric, and semantic information. However, efficient
real-time map reconstruction with data streamed from multiple robots and devices
remains a challenge. To that end, we propose HAMMER, a server-based collaborative
Gaussian Splatting method that leverages widely available ROS communication infrastructure
to generate 3D, metric-semantic maps from asynchronous robot data-streams with
no prior knowledge of initial robot positions and varying on-device pose estimators.
HAMMER consists of (i) a frame alignment module that transforms local SLAM poses
and image data into a global frame and requires no prior relative pose knowledge,
and (ii) an online module for training semantic 3DGS maps from streaming data.
HAMMER handles mixed perception modes, adjusts automatically for variations in
image pre-processing among different devices, and distills CLIP semantic codes
into the 3D scene for open-vocabulary language queries. In our real-world experiments,
HAMMER creates higher-fidelity maps (2x) compared to competing baselines and is
useful for downstream tasks, such as semantic goal-conditioned navigation (e.g.,
"go to the couch"). Accompanying content available at hammer-project.github.io.
'
project_page: https://hammer-project.github.io/
paper: https://arxiv.org/pdf/2501.14147.pdf
code: null
video: null
tags:
- Project
- Robotics
- SLAM
thumbnail: assets/thumbnails/yu2025hammer.jpg
publication_date: '2025-01-24T00:21:10+00:00'
date_source: arxiv
- id: yang2025fast3r
title: 'Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass'
authors: Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang
Cao, Joyce Chai, Franziska Meier, Matt Feiszli
year: '2025'
abstract: 'Multi-view 3D reconstruction remains a core challenge in computer vision,
particularly in applications requiring accurate and scalable representations across
diverse perspectives. Current leading methods such as DUSt3R employ a fundamentally
pairwise approach, processing images in pairs and necessitating costly global
alignment procedures to reconstruct from multiple views. In this work, we propose
Fast 3D Reconstruction (Fast3R), a novel multi-view generalization to DUSt3R that
achieves efficient and scalable 3D reconstruction by processing many views in
parallel. Fast3R''s Transformer-based architecture forwards N images in a single
forward pass, bypassing the need for iterative alignment. Through extensive experiments
on camera pose estimation and 3D reconstruction, Fast3R demonstrates state-of-the-art
performance, with significant improvements in inference speed and reduced error
accumulation. These results establish Fast3R as a robust alternative for multi-view
applications, offering enhanced scalability without compromising reconstruction
accuracy.
'
project_page: https://fast3r-3d.github.io/
paper: https://arxiv.org/pdf/2501.13928.pdf
code: null
video: null
tags:
- 3ster-based
- Project
thumbnail: assets/thumbnails/yang2025fast3r.jpg
publication_date: '2025-01-23T18:59:55+00:00'
date_source: arxiv
- id: sario2025gode
title: 'GoDe: Gaussians on Demand for Progressive Level of Detail and Scalable Compression'
authors: Francesco Di Sario, Riccardo Renzulli, Marco Grangetto, Akihiro Sugimoto,
Enzo Tartaglione
year: '2025'
abstract: '3D Gaussian Splatting enhances real-time performance in novel view synthesis
by representing scenes with mixtures of Gaussians and utilizing differentiable
rasterization. However, it typically requires large storage capacity and high
VRAM, demanding the design of effective pruning and compression techniques. Existing
methods, while effective in some scenarios, struggle with scalability and fail
to adapt models based on critical factors such as computing capabilities or bandwidth,
requiring to re-train the model under different configurations. In this work,
we propose a novel, model-agnostic technique that organizes Gaussians into several
hierarchical layers, enabling progressive Level of Detail (LoD) strategy. This
method, combined with recent approach of compression of 3DGS, allows a single
model to instantly scale across several compression ratios, with minimal to none
impact to quality compared to a single non-scalable model and without requiring
re-training. We validate our approach on typical datasets and benchmarks, showcasing
low distortion and substantial gains in terms of scalability and adaptability.
'
project_page: null
paper: https://arxiv.org/pdf/2501.13558.pdf
code: null
video: null
tags:
- Compression
- LoD
thumbnail: assets/thumbnails/sario2025gode.jpg
publication_date: '2025-01-23T11:05:45+00:00'
date_source: arxiv
- id: lan20253dgs2
title: '3DGS$^2$: Near Second-order Converging 3D Gaussian Splatting'
authors: Lei Lan, Tianjia Shao, Zixuan Lu, Yu Zhang, Chenfanfu Jiang, Yin Yang
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has emerged as a mainstream solution for
novel view synthesis and 3D reconstruction. By explicitly encoding a 3D scene
using a collection of Gaussian kernels, 3DGS achieves high-quality rendering with
superior efficiency. As a learning-based approach, 3DGS training has been dealt
with the standard stochastic gradient descent (SGD) method, which offers at most
linear convergence. Consequently, training often requires tens of minutes, even
with GPU acceleration. This paper introduces a (near) second-order convergent
training algorithm for 3DGS, leveraging its unique properties. Our approach is
inspired by two key observations. First, the attributes of a Gaussian kernel contribute
independently to the image-space loss, which endorses isolated and local optimization
algorithms. We exploit this by splitting the optimization at the level of individual
kernel attributes, analytically constructing small-size Newton systems for each
parameter group, and efficiently solving these systems on GPU threads. This achieves
Newton-like convergence per training image without relying on the global Hessian.
Second, kernels exhibit sparse and structured coupling across input images. This
property allows us to effectively utilize spatial information to mitigate overshoot
during stochastic training. Our method converges an order faster than standard
GPU-based 3DGS training, requiring over $10\times$ fewer iterations while maintaining
or surpassing the quality of the SGD-based 3DGS reconstructions.
'
project_page: null
paper: https://arxiv.org/pdf/2501.13975.pdf
code: null
video: null
tags:
- Optimization
thumbnail: assets/thumbnails/lan20253dgs2.jpg
publication_date: '2025-01-22T22:28:11+00:00'
date_source: arxiv
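# The abstract above describes solving small per-attribute Newton systems instead of
# taking a global SGD step. A hypothetical sketch of one damped Newton update for a
# single parameter group (function name and damping term are assumptions, not the
# paper's implementation):
#
#   import numpy as np
#
#   def damped_newton_step(theta, grad, hess, damping=1e-3):
#       # Solve (H + lambda*I) delta = -g for one small attribute block
#       # (e.g., a 3x3 or 4x4 system per Gaussian) and apply the update.
#       H = hess + damping * np.eye(hess.shape[0])
#       delta = np.linalg.solve(H, -grad)
#       return theta + delta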
- id: shi2025sketch
title: 'Sketch and Patch: Efficient 3D Gaussian Representation for Man-Made Scenes'
authors: Yuang Shi, Simone Gasparini, Géraldine Morin, Chenggang Yang, Wei Tsang
Ooi
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has emerged as a promising representation
for photorealistic rendering of 3D scenes. However, its high storage requirements
pose significant challenges for practical applications. We observe that Gaussians
exhibit distinct roles and characteristics that are analogous to traditional artistic
techniques -- like how artists first sketch outlines before filling in broader
areas with color, some Gaussians capture high-frequency features like edges and
contours, while other Gaussians represent broader, smoother regions that are
analogous to broader brush strokes that add volume and depth to a painting. Based
on this observation, we propose a novel hybrid representation that categorizes
Gaussians into (i) Sketch Gaussians, which define scene boundaries, and (ii) Patch
Gaussians, which cover smooth regions. Sketch Gaussians are efficiently encoded
using parametric models, leveraging their geometric coherence, while Patch Gaussians
undergo optimized pruning, retraining, and vector quantization to maintain volumetric
consistency and storage efficiency. Our comprehensive evaluation across diverse
indoor and outdoor scenes demonstrates that this structure-aware approach achieves
up to 32.62% improvement in PSNR, 19.12% in SSIM, and 45.41% in LPIPS at equivalent
model sizes, and correspondingly, for an indoor scene, our model maintains the
visual quality with 2.3% of the original model size.
'
project_page: null
paper: https://arxiv.org/pdf/2501.13045.pdf
code: null
video: null
tags:
- Densification
thumbnail: assets/thumbnails/shi2025sketch.jpg
publication_date: '2025-01-22T17:52:45+00:00'
date_source: arxiv
- id: arunan2025darbsplatting
title: 'DARB-Splatting: Generalizing Splatting with Decaying Anisotropic Radial
Basis Functions'
authors: Vishagar Arunan, Saeedha Nazar, Hashiru Pramuditha, Vinasirajan Viruthshaan,
Sameera Ramasinghe, Simon Lucey, Ranga Rodrigo
year: '2025'
abstract: 'Splatting-based 3D reconstruction methods have gained popularity with
the advent of 3D Gaussian Splatting, efficiently synthesizing high-quality novel
views. These methods commonly resort to using exponential family functions, such
as the Gaussian function, as reconstruction kernels due to their anisotropic nature,
ease of projection, and differentiability in rasterization. However, the field
remains restricted to variations within the exponential family, leaving generalized
reconstruction kernels largely underexplored, partly due to the lack of easy integrability
in 3D to 2D projections. In this light, we show that a class of decaying anisotropic
radial basis functions (DARBFs), which are non-negative functions of the Mahalanobis
distance, supports splatting by approximating the Gaussian function''s closed-form
integration advantage. With this fresh perspective, we demonstrate up to 34% faster
convergence during training and a 15% reduction in memory consumption across various
DARB reconstruction kernels, while maintaining comparable PSNR, SSIM, and LPIPS
results. We will make the code available.
'
project_page: https://randomnerds.github.io/darbs.github.io/
paper: https://arxiv.org/pdf/2501.12369.pdf
code: null
video: null
tags:
- Project
- Rendering
thumbnail: assets/thumbnails/arunan2025darbsplatting.jpg
publication_date: '2025-01-21T18:49:06+00:00'
date_source: arxiv
- id: chen2025hac
title: 'HAC++: Towards 100X Compression of 3D Gaussian Splatting'
authors: Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, Jianfei Cai
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has emerged as a promising framework for
novel view synthesis, boasting rapid rendering speed with high fidelity. However,
the substantial Gaussians and their associated attributes necessitate effective
compression techniques. Nevertheless, the sparse and unorganized nature of the
point cloud of Gaussians (or anchors in our paper) presents challenges for compression.
To achieve a compact size, we propose HAC++, which leverages the relationships
between unorganized anchors and a structured hash grid, utilizing their mutual
information for context modeling. Additionally, HAC++ captures intra-anchor contextual
relationships to further enhance compression performance. To facilitate entropy
coding, we utilize Gaussian distributions to precisely estimate the probability
of each quantized attribute, where an adaptive quantization module is proposed
to enable high-precision quantization of these attributes for improved fidelity
restoration. Moreover, we incorporate an adaptive masking strategy to eliminate
invalid Gaussians and anchors. Overall, HAC++ achieves a remarkable size reduction
of over 100X compared to vanilla 3DGS when averaged on all datasets, while simultaneously
improving fidelity. It also delivers more than 20X size reduction compared to
Scaffold-GS. Our code is available at https://github.com/YihangChen-ee/HAC-plus.
'
project_page: https://yihangchen-ee.github.io/project_hac++/
paper: https://arxiv.org/pdf/2501.12255.pdf
code: https://github.com/YihangChen-ee/HAC-plus
video: null
tags:
- Code
- Compression
- Project
thumbnail: assets/thumbnails/chen2025hac.jpg
publication_date: '2025-01-21T16:23:05+00:00'
date_source: arxiv
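# HAC++ (above) estimates the probability of each quantized attribute with a Gaussian
# to drive entropy coding. A standard sketch of that probability-mass computation
# (bin width, names, and the -log2 bit estimate are assumptions for illustration):
#
#   from math import erf, log2, sqrt
#
#   def quantized_prob(x_hat, mu, sigma, step):
#       # Mass of N(mu, sigma^2) falling in the quantization bin centred at x_hat.
#       cdf = lambda v: 0.5 * (1.0 + erf((v - mu) / (sigma * sqrt(2.0))))
#       return cdf(x_hat + 0.5 * step) - cdf(x_hat - 0.5 * step)
#
#   # Estimated code length in bits for one attribute value:
#   #   bits = -log2(quantized_prob(x_hat, mu, sigma, step))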
- id: li2025cargs
title: 'Car-GS: Addressing Reflective and Transparent Surface Challenges in 3D Car
Reconstruction'
authors: Congcong Li, Jin Wang, Xiaomeng Wang, Xingchen Zhou, Wei Wu, Yuzhi Zhang,
Tongyi Cao
year: '2025'
abstract: '3D car modeling is crucial for applications in autonomous driving systems,
virtual and augmented reality, and gaming. However, due to the distinctive properties
of cars, such as highly reflective and transparent surface materials, existing
methods often struggle to achieve accurate 3D car reconstruction. To address these
limitations, we propose Car-GS, a novel approach designed to mitigate the effects
of specular highlights and the coupling of RGB and geometry in 3D geometric and
shading reconstruction (3DGS). Our method incorporates three key innovations:
First, we introduce view-dependent Gaussian primitives to effectively model surface
reflections. Second, we identify the limitations of using a shared opacity parameter
for both image rendering and geometric attributes when modeling transparent objects.
To overcome this, we assign a learnable geometry-specific opacity to each 2D Gaussian
primitive, dedicated solely to rendering depth and normals. Third, we observe
that reconstruction errors are most prominent when the camera view is nearly orthogonal
to glass surfaces. To address this issue, we develop a quality-aware supervision
module that adaptively leverages normal priors from a pre-trained large-scale
normal model. Experimental results demonstrate that Car-GS achieves precise reconstruction
of car surfaces and significantly outperforms prior methods. The project page
is available at https://lcc815.github.io/Car-GS.
'
project_page: null
paper: https://arxiv.org/pdf/2501.11020.pdf
code: https://lcc815.github.io/Car-GS/
video: null
tags:
- Code
- Meshing
- Rendering
thumbnail: assets/thumbnails/li2025cargs.jpg
publication_date: '2025-01-19T11:49:35+00:00'
date_source: arxiv
- id: zheng2025gstar
title: 'GSTAR: Gaussian Surface Tracking and Reconstruction'
authors: Chengwei Zheng, Lixin Xue, Juan Zarate, Jie Song
year: '2025'
abstract: '3D Gaussian Splatting techniques have enabled efficient photo-realistic
rendering of static scenes. Recent works have extended these approaches to support
surface reconstruction and tracking. However, tracking dynamic surfaces with 3D
Gaussians remains challenging due to complex topology changes, such as surfaces
appearing, disappearing, or splitting. To address these challenges, we propose
GSTAR, a novel method that achieves photo-realistic rendering, accurate surface
reconstruction, and reliable 3D tracking for general dynamic scenes with changing
topology. Given multi-view captures as input, GSTAR binds Gaussians to mesh faces
to represent dynamic objects. For surfaces with consistent topology, GSTAR maintains
the mesh topology and tracks the meshes using Gaussians. In regions where topology
changes, GSTAR adaptively unbinds Gaussians from the mesh, enabling accurate registration
and the generation of new surfaces based on these optimized Gaussians. Additionally,
we introduce a surface-based scene flow method that provides robust initialization
for tracking between frames. Experiments demonstrate that our method effectively
tracks and reconstructs dynamic surfaces, enabling a range of applications. Our
project page with the code release is available at https://eth-ait.github.io/GSTAR/.
'
project_page: https://chengwei-zheng.github.io/GSTAR/
paper: https://arxiv.org/pdf/2501.10283.pdf
code: null
video: https://www.youtube.com/watch?v=Fwby4PrjFeM
tags:
- Avatar
- Dynamic
- Meshing
- Project
- Video
thumbnail: assets/thumbnails/zheng2025gstar.jpg
publication_date: '2025-01-17T16:26:24+00:00'
date_source: arxiv
- id: ma2025cityloc
title: 'CityLoc: 6 DoF Localization of Text Descriptions in Large-Scale Scenes with
Gaussian Representation'
authors: Qi Ma, Runyi Yang, Bin Ren, Ender Konukoglu, Luc Van Gool, Danda Pani Paudel
year: '2025'
abstract: 'Localizing text descriptions in large-scale 3D scenes is inherently an
ambiguous task. This nonetheless arises while describing general concepts, e.g.
all traffic lights in a city. To facilitate reasoning based on such concepts,
text localization in the form of distribution is required. In this paper, we generate
the distribution of the camera poses conditioned upon the textual description.
To facilitate such generation, we propose a diffusion-based architecture that
conditionally diffuses the noisy 6DoF camera poses to their plausible locations.
The conditional signals are derived from the text descriptions, using the pre-trained
text encoders. The connection between text descriptions and pose distribution
is established through pretrained Vision-Language-Model, i.e. CLIP. Furthermore,
we demonstrate that the candidate poses for the distribution can be further refined
by rendering potential poses using 3D Gaussian splatting, guiding incorrectly
posed samples towards locations that better align with the textual description,
through visual reasoning. We demonstrate the effectiveness of our method by
comparing it with both standard retrieval methods and learning-based approaches.
Our proposed method consistently outperforms these baselines across all five large-scale
datasets. Our source code and dataset will be made publicly available.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08982.pdf
code: null
video: null
tags:
- Language Embedding
- Large-Scale
thumbnail: assets/thumbnails/ma2025cityloc.jpg
publication_date: '2025-01-15T17:59:32+00:00'
date_source: arxiv
- id: hong2025gslivo
title: 'GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry
with Gaussian Mapping'
authors: Sheng Hong, Chunran Zheng, Yishu Shen, Changze Li, Fu Zhang, Tong Qin,
Shaojie Shen
year: '2025'
abstract: 'In recent years, 3D Gaussian splatting (3D-GS) has emerged as a novel
scene representation approach. However, existing vision-only 3D-GS methods often
rely on hand-crafted heuristics for point-cloud densification and face challenges
in handling occlusions and high GPU memory and computation consumption. LiDAR-Inertial-Visual
(LIV) sensor configuration has demonstrated superior performance in localization
and dense mapping by leveraging complementary sensing characteristics: rich texture
information from cameras, precise geometric measurements from LiDAR, and high-frequency
motion data from IMU. Inspired by this, we propose a novel real-time Gaussian-based
simultaneous localization and mapping (SLAM) system. Our map system comprises
a global Gaussian map and a sliding window of Gaussians, along with an IESKF-based
odometry. The global Gaussian map consists of hash-indexed voxels organized in
a recursive octree, effectively covering sparse spatial volumes while adapting
to different levels of detail and scales. The Gaussian map is initialized through
multi-sensor fusion and optimized with photometric gradients. Our system incrementally
maintains a sliding window of Gaussians, significantly reducing GPU computation
and memory consumption by only optimizing the map within the sliding window. Moreover,
we implement a tightly coupled multi-sensor fusion odometry with an iterative
error state Kalman filter (IESKF), leveraging real-time updating and rendering
of the Gaussian map. Our system represents the first real-time Gaussian-based
SLAM framework deployable on resource-constrained embedded systems, demonstrated
on the NVIDIA Jetson Orin NX platform. The framework achieves real-time performance
while maintaining robust multi-sensor fusion capabilities. All implementation
algorithms, hardware designs, and CAD models will be publicly available.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08672.pdf
code: null
video: null
tags:
- Large-Scale
- Lidar
thumbnail: assets/thumbnails/hong2025gslivo.jpg
publication_date: '2025-01-15T09:04:56+00:00'
date_source: arxiv
- id: wu2025vingsmono
title: 'VINGS-Mono: Visual-Inertial Gaussian Splatting Monocular SLAM in Large Scenes'
authors: Ke Wu, Zicheng Zhang, Muer Tie, Ziqing Ai, Zhongxue Gan, Wenchao Ding
year: '2025'
abstract: 'VINGS-Mono is a monocular (inertial) Gaussian Splatting (GS) SLAM framework
designed for large scenes. The framework comprises four main components: VIO Front
End, 2D Gaussian Map, NVS Loop Closure, and Dynamic Eraser. In the VIO Front End,
RGB frames are processed through dense bundle adjustment and uncertainty estimation
to extract scene geometry and poses. Based on this output, the mapping module
incrementally constructs and maintains a 2D Gaussian map. Key components of the
2D Gaussian Map include a Sample-based Rasterizer, Score Manager, and Pose Refinement,
which collectively improve mapping speed and localization accuracy. This enables
the SLAM system to handle large-scale urban environments with up to 50 million
Gaussian ellipsoids. To ensure global consistency in large-scale scenes, we design
a Loop Closure module, which innovatively leverages the Novel View Synthesis (NVS)
capabilities of Gaussian Splatting for loop closure detection and correction of
the Gaussian map. Additionally, we propose a Dynamic Eraser to address the inevitable
presence of dynamic objects in real-world outdoor scenes. Extensive evaluations
in indoor and outdoor environments demonstrate that our approach achieves localization
performance on par with Visual-Inertial Odometry while surpassing recent GS/NeRF
SLAM methods. It also significantly outperforms all existing methods in terms
of mapping and rendering quality. Furthermore, we developed a mobile app and verified
that our framework can generate high-quality Gaussian maps in real time using
only a smartphone camera and a low-frequency IMU sensor. To the best of our knowledge,
VINGS-Mono is the first monocular Gaussian SLAM method capable of operating in
outdoor environments and supporting kilometer-scale large scenes.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08286.pdf
code: null
video: null
tags:
- Large-Scale
- Meshing
- SLAM
thumbnail: assets/thumbnails/wu2025vingsmono.jpg
publication_date: '2025-01-14T18:01:15+00:00'
date_source: arxiv
- id: rogge2025objectcentric
title: 'Object-Centric 2D Gaussian Splatting: Background Removal and Occlusion-Aware
Pruning for Compact Object Models'
authors: Marcel Rogge, Didier Stricker
year: '2025'
abstract: 'Current Gaussian Splatting approaches are effective for reconstructing
entire scenes but lack the option to target specific objects, making them computationally
expensive and unsuitable for object-specific applications. We propose a novel
approach that leverages object masks to enable targeted reconstruction, resulting
in object-centric models. Additionally, we introduce an occlusion-aware pruning
strategy to minimize the number of Gaussians without compromising quality. Our
method reconstructs compact object models, yielding object-centric Gaussian and
mesh representations that are up to 96\% smaller and up to 71\% faster to train
compared to the baseline while retaining competitive quality. These representations
are immediately usable for downstream applications such as appearance editing
and physics simulation without additional processing.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08174.pdf
code: null
video: null
tags:
- Compression
- Densification
- Editing
thumbnail: assets/thumbnails/rogge2025objectcentric.jpg
publication_date: '2025-01-14T14:56:31+00:00'
date_source: arxiv
- id: liu2025uncommon
title: UnCommon Objects in 3D
authors: Xingchen Liu, Piyush Tayal, Jianyuan Wang, Jesus Zarzar, Tom Monnier, Konstantinos
Tertikas, Jiali Duan, Antoine Toisoul, Jason Y. Zhang, Natalia Neverova, Andrea
Vedaldi, Roman Shapovalov, David Novotny
year: '2025'
abstract: 'We introduce Uncommon Objects in 3D (uCO3D), a new object-centric dataset
for 3D deep learning and 3D generative AI. uCO3D is the largest publicly-available
collection of high-resolution videos of objects with 3D annotations that ensures
full-360$^{\circ}$ coverage. uCO3D is significantly more diverse than MVImgNet
and CO3Dv2, covering more than 1,000 object categories. It is also of higher quality,
due to extensive quality checks of both the collected videos and the 3D annotations.
Similar to analogous datasets, uCO3D contains annotations for 3D camera poses,
depth maps and sparse point clouds. In addition, each object is equipped with
a caption and a 3D Gaussian Splat reconstruction. We train several large 3D models
on MVImgNet, CO3Dv2, and uCO3D and obtain superior results using the latter, showing
that uCO3D is better for learning applications.
'
project_page: https://uco3d.github.io/
paper: https://arxiv.org/pdf/2501.07574.pdf
code: https://github.com/facebookresearch/uco3d
video: null
tags:
- Code
- Project
thumbnail: assets/thumbnails/liu2025uncommon.jpg
publication_date: '2025-01-13T18:59:20+00:00'
date_source: arxiv
- id: stuart20253dgstopc
title: '3DGS-to-PC: Convert a 3D Gaussian Splatting Scene into a Dense Point Cloud
or Mesh'
authors: Lewis A G Stuart, Michael P Pound
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) excels at producing highly detailed 3D reconstructions,
but these scenes often require specialised renderers for effective visualisation.
In contrast, point clouds are a widely used 3D representation and are compatible
with most popular 3D processing software, yet converting 3DGS scenes into point
clouds is a complex challenge. In this work we introduce 3DGS-to-PC, a flexible
and highly customisable framework that is capable of transforming 3DGS scenes
into dense, high-accuracy point clouds. We sample points probabilistically from
each Gaussian as a 3D density function. We additionally threshold new points using
the Mahalanobis distance to the Gaussian centre, preventing extreme outliers.
The result is a point cloud that closely represents the shape encoded into the
3D Gaussian scene. Individual Gaussians use spherical harmonics to adapt colours
depending on view, and each point may contribute only subtle colour hints to the
resulting rendered scene. To avoid spurious or incorrect colours that do not fit
with the final point cloud, we recalculate Gaussian colours via a customised image
rendering approach, assigning each Gaussian the colour of the pixel to which it
contributes most across all views. 3DGS-to-PC also supports mesh generation through
Poisson Surface Reconstruction, applied to points sampled from predicted surface
Gaussians. This allows coloured meshes to be generated from 3DGS scenes without
the need for re-training. This package is highly customisable and capable of
simple integration into existing 3DGS pipelines. 3DGS-to-PC provides a powerful
tool for converting 3DGS data into point cloud and surface-based formats.
'
project_page: null
paper: https://arxiv.org/pdf/2501.07478.pdf
code: https://github.com/Lewis-Stuart-11/3DGS-to-PC
video: null
tags:
- Code
- Point Cloud
thumbnail: assets/thumbnails/stuart20253dgstopc.jpg
publication_date: '2025-01-13T16:52:28+00:00'
date_source: arxiv
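# 3DGS-to-PC (above) samples points from each Gaussian and rejects outliers by
# Mahalanobis distance. A minimal sketch of that sampling step (threshold value and
# function name are assumptions, not the released implementation):
#
#   import numpy as np
#
#   def sample_gaussian_points(mean, cov, n, max_mahalanobis=2.0):
#       pts = np.random.multivariate_normal(mean, cov, size=n)
#       d = pts - mean
#       d2 = np.einsum('ij,ij->i', d @ np.linalg.inv(cov), d)  # squared Mahalanobis
#       return pts[np.sqrt(d2) <= max_mahalanobis]             # drop extreme outliers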
- id: zhang2025evaluating
title: 'Evaluating Human Perception of Novel View Synthesis: Subjective Quality
Assessment of Gaussian Splatting and NeRF in Dynamic Scenes'
authors: Yuhang Zhang, Joshua Maraval, Zhengyu Zhang, Nicolas Ramin, Shishun Tian,
Lu Zhang
year: '2025'
abstract: 'Gaussian Splatting (GS) and Neural Radiance Fields (NeRF) are two groundbreaking
technologies that have revolutionized the field of Novel View Synthesis (NVS),
enabling immersive photorealistic rendering and user experiences by synthesizing
multiple viewpoints from a set of images of sparse views. The potential applications
of NVS, such as high-quality virtual and augmented reality, detailed 3D modeling,
and realistic medical organ imaging, underscore the importance of quality assessment
of NVS methods from the perspective of human perception. Although some previous
studies have explored subjective quality assessments for NVS technology, they
still face several challenges, especially in NVS methods selection, scenario coverage,
and evaluation methodology. To address these challenges, we conducted two subjective
experiments for the quality assessment of NVS technologies containing both GS-based
and NeRF-based methods, focusing on dynamic and real-world scenes. This study
covers 360{\deg}, front-facing, and single-viewpoint videos while providing a
richer and greater number of real scenes. Meanwhile, it''s the first time to explore
the impact of NVS methods in dynamic scenes with moving objects. The two types
of subjective experiments help to fully comprehend the influences of different
viewing paths from a human perception perspective and pave the way for future
development of full-reference and no-reference quality metrics. In addition, we
established a comprehensive benchmark of various state-of-the-art objective metrics
on the proposed database, highlighting that existing methods still struggle to
accurately capture subjective quality. The results give us some insights into
the limitations of existing NVS methods and may promote the development of new
NVS methods.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08072.pdf
code: null
video: null
tags:
- Dynamic
thumbnail: assets/thumbnails/zhang2025evaluating.jpg
publication_date: '2025-01-13T10:01:27+00:00'
date_source: arxiv
- id: peng2025rmavatar
title: 'RMAvatar: Photorealistic Human Avatar Reconstruction from Monocular Video
Based on Rectified Mesh-embedded Gaussians'
authors: Sen Peng, Weixing Xie, Zilong Wang, Xiaohu Guo, Zhonggui Chen, Baorong
Yang, Xiao Dong
year: '2025'
abstract: 'We introduce RMAvatar, a novel human avatar representation with Gaussian
splatting embedded on mesh to learn clothed avatar from a monocular video. We
utilize the explicit mesh geometry to represent motion and shape of a virtual
human and implicit appearance rendering with Gaussian Splatting. Our method consists
of two main modules: Gaussian initialization module and Gaussian rectification
module. We embed Gaussians into triangular faces and control their motion through
the mesh, which ensures low-frequency motion and surface deformation of the avatar.
Due to the limitations of the LBS formula, it is hard for the human skeleton to control complex
non-rigid transformations. We then design a pose-related Gaussian rectification
module to learn fine-detailed non-rigid deformations, further improving the realism
and expressiveness of the avatar. We conduct extensive experiments on public datasets,
RMAvatar shows state-of-the-art performance on both rendering quality and quantitative
evaluations. Please see our project page at https://rm-avatar.github.io.
'
project_page: https://rm-avatar.github.io/
paper: https://arxiv.org/pdf/2501.07104.pdf
code: https://github.com/RMAvatar/RMAvatar
video: null
tags:
- Avatar
- Code
- Dynamic
- Meshing
- Monocular
- Project
thumbnail: assets/thumbnails/peng2025rmavatar.jpg
publication_date: '2025-01-13T07:32:44+00:00'
date_source: arxiv
- id: zielonka2025synthetic
title: Synthetic Prior for Few-Shot Drivable Head Avatar Inversion
authors: Wojciech Zielonka, Stephan J. Garbin, Alexandros Lattas, George Kopanas,
Paulo Gotardo, Thabo Beeler, Justus Thies, Timo Bolkart
year: '2025'
abstract: 'We present SynShot, a novel method for the few-shot inversion of a drivable
head avatar based on a synthetic prior. We tackle two major challenges. First,
training a controllable 3D generative network requires a large number of diverse
sequences, for which pairs of images and high-quality tracked meshes are not always
available. Second, state-of-the-art monocular avatar models struggle to generalize
to new views and expressions, lacking a strong prior and often overfitting to
a specific viewpoint distribution. Inspired by machine learning models trained
solely on synthetic data, we propose a method that learns a prior model from a
large dataset of synthetic heads with diverse identities, expressions, and viewpoints.
With few input images, SynShot fine-tunes the pretrained synthetic prior to bridge
the domain gap, modeling a photorealistic head avatar that generalizes to novel
expressions and viewpoints. We model the head avatar using 3D Gaussian splatting
and a convolutional encoder-decoder that outputs Gaussian parameters in UV texture
space. To account for the different modeling complexities over parts of the head
(e.g., skin vs hair), we embed the prior with explicit control for upsampling
the number of per-part primitives. Compared to state-of-the-art monocular methods
that require thousands of real training images, SynShot significantly improves
novel view and expression synthesis.
'
project_page: https://zielon.github.io/synshot/
paper: https://arxiv.org/pdf/2501.06903.pdf
code: null
video: https://www.youtube.com/watch?v=4KQQatkaSgc
tags:
- Avatar
- Dynamic
- Project
- Sparse
- Video
thumbnail: assets/thumbnails/zielonka2025synthetic.jpg
publication_date: '2025-01-12T19:01:05+00:00'
date_source: arxiv
- id: chen2025generalized
title: Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution
authors: Du Chen, Liyi Chen, Zhengqiang Zhang, Lei Zhang
year: '2025'
abstract: 'Equipped with the continuous representation capability of Multi-Layer
Perceptron (MLP), Implicit Neural Representation (INR) has been successfully employed
for Arbitrary-scale Super-Resolution (ASR). However, the limited receptive field
of the linear layers in MLP restricts the representation capability of INR, while
it is computationally expensive to query the MLP numerous times to render each
pixel. Recently, Gaussian Splatting (GS) has shown its advantages over INR in
both visual quality and rendering speed in 3D tasks, which motivates us to explore
whether GS can be employed for the ASR task. However, directly applying GS to
ASR is exceptionally challenging because the original GS is an optimization-based
method through overfitting each single scene, while in ASR we aim to learn a single
model that can generalize to different images and scaling factors. We overcome
these challenges by developing two novel techniques. Firstly, to generalize GS
for ASR, we elaborately design an architecture to predict the corresponding image-conditioned
Gaussians of the input low-resolution image in a feed-forward manner. Secondly,
we implement an efficient differentiable 2D GPU/CUDA-based scale-aware rasterization
to render super-resolved images by sampling discrete RGB values from the predicted
contiguous Gaussians. Via end-to-end training, our optimized network, namely GSASR,
can perform ASR for any image and unseen scaling factors. Extensive experiments
validate the effectiveness of our proposed method. The project page can be found
at \url{https://mt-cly.github.io/GSASR.github.io/}.
'
project_page: https://mt-cly.github.io/GSASR.github.io/
paper: https://arxiv.org/pdf/2501.06838.pdf
code: null
video: null
tags:
- Project
- Super Resolution
thumbnail: assets/thumbnails/chen2025generalized.jpg
publication_date: '2025-01-12T15:14:58+00:00'
date_source: arxiv
- id: wang2025f3dgaus
title: 'F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent
Gaussian Splatting'
authors: Yuxin Wang, Qianyi Wu, Dan Xu
year: '2025'
abstract: 'This paper tackles the problem of generalizable 3D-aware generation from
monocular datasets, e.g., ImageNet. The key challenge of this task is learning
a robust 3D-aware representation without multi-view or dynamic data, while ensuring
consistent texture and geometry across different viewpoints. Although some baseline
methods are capable of 3D-aware generation, the quality of the generated images
still lags behind state-of-the-art 2D generation approaches, which excel in producing
high-quality, detailed images. To address this severe limitation, we propose a
novel feed-forward pipeline based on pixel-aligned Gaussian Splatting, coined
as F3D-Gaus, which can produce more realistic and reliable 3D renderings from
monocular inputs. In addition, we introduce a self-supervised cycle-consistent
constraint to enforce cross-view consistency in the learned 3D representation.
This training strategy naturally allows aggregation of multiple aligned Gaussian
primitives and significantly alleviates the interpolation limitations inherent
in single-view pixel-aligned Gaussian Splatting. Furthermore, we incorporate video
model priors to perform geometry-aware refinement, enhancing the generation of
fine details in wide-viewpoint scenarios and improving the model''s capability
to capture intricate 3D textures. Extensive experiments demonstrate that our approach
not only achieves high-quality, multi-view consistent 3D-aware generation from
monocular datasets, but also significantly improves training and inference efficiency.
'
project_page: https://arxiv.org/abs/2501.06714
paper: https://arxiv.org/pdf/2501.06714.pdf
code: https://github.com/W-Ted/F3D-Gaus
video: null
tags:
- Code
- Feed-Forward
- Monocular
- Project
thumbnail: assets/thumbnails/wang2025f3dgaus.jpg
publication_date: '2025-01-12T04:44:44+00:00'
date_source: arxiv
- id: asim2025met3r
title: 'MEt3R: Measuring Multi-View Consistency in Generated Images'
authors: Mohammad Asim, Christopher Wewer, Thomas Wimmer, Bernt Schiele, Jan Eric
Lenssen
year: '2025'
abstract: 'We introduce MEt3R, a metric for multi-view consistency in generated
images. Large-scale generative models for multi-view image generation are rapidly
advancing the field of 3D inference from sparse observations. However, due to
the nature of generative modeling, traditional reconstruction metrics are not
suitable to measure the quality of generated outputs and metrics that are independent
of the sampling procedure are desperately needed. In this work, we specifically
address the aspect of consistency between generated multi-view images, which can
be evaluated independently of the specific scene. Our approach uses DUSt3R to
obtain dense 3D reconstructions from image pairs in a feed-forward manner, which
are used to warp image contents from one view into the other. Then, feature maps
of these images are compared to obtain a similarity score that is invariant to
view-dependent effects. Using MEt3R, we evaluate the consistency of a large set
of previous methods for novel view and video generation, including our open, multi-view
latent diffusion model.
'
project_page: https://geometric-rl.mpi-inf.mpg.de/met3r/
paper: https://arxiv.org/pdf/2501.06336.pdf
code: https://github.com/mohammadasim98/MEt3R
video: https://geometric-rl.mpi-inf.mpg.de/met3r/static/videos/teaser.mp4
tags:
- 3ster-based
- Code
- Diffusion
- Project
- Video
thumbnail: assets/thumbnails/asim2025met3r.jpg
publication_date: '2025-01-10T20:43:33+00:00'
date_source: arxiv
- id: shin2025localityaware
title: Locality-aware Gaussian Compression for Fast and High-quality Rendering
authors: Seungjoo Shin, Jaesik Park, Sunghyun Cho
year: '2025'
abstract: 'We present LocoGS, a locality-aware 3D Gaussian Splatting (3DGS) framework
that exploits the spatial coherence of 3D Gaussians for compact modeling of volumetric
scenes. To this end, we first analyze the local coherence of 3D Gaussian attributes,
and propose a novel locality-aware 3D Gaussian representation that effectively
encodes locally-coherent Gaussian attributes using a neural field representation
with a minimal storage requirement. On top of the novel representation, LocoGS
is carefully designed with additional components such as dense initialization,
an adaptive spherical harmonics bandwidth scheme and different encoding schemes
for different Gaussian attributes to maximize compression performance. Experimental
results demonstrate that our approach outperforms the rendering quality of existing
compact Gaussian representations for representative real-world 3D datasets while
achieving from 54.6$\times$ to 96.6$\times$ compressed storage size and from 2.1$\times$
to 2.4$\times$ rendering speed than 3DGS. Even our approach also demonstrates
an averaged 2.4$\times$ higher rendering speed than the state-of-the-art compression
method with comparable compression performance.
'
project_page: null
paper: https://arxiv.org/pdf/2501.05757.pdf
code: null
video: null
tags:
- Compression
thumbnail: assets/thumbnails/shin2025localityaware.jpg
publication_date: '2025-01-10T07:19:41+00:00'
date_source: arxiv
- id: yan2025consistent
title: Consistent Flow Distillation for Text-to-3D Generation
authors: Runjie Yan, Yinbo Chen, Xiaolong Wang
year: '2025'
abstract: 'Score Distillation Sampling (SDS) has made significant strides in distilling
image-generative models for 3D generation. However, its maximum-likelihood-seeking
behavior often leads to degraded visual quality and diversity, limiting its effectiveness
in 3D applications. In this work, we propose Consistent Flow Distillation (CFD),
which addresses these limitations. We begin by leveraging the gradient of the
diffusion ODE or SDE sampling process to guide the 3D generation. From the gradient-based
sampling perspective, we find that the consistency of 2D image flows across different
viewpoints is important for high-quality 3D generation. To achieve this, we introduce
multi-view consistent Gaussian noise on the 3D object, which can be rendered from
various viewpoints to compute the flow gradient. Our experiments demonstrate that
CFD, through consistent flows, significantly outperforms previous methods in text-to-3D
generation.
'
project_page: https://runjie-yan.github.io/cfd/
paper: https://arxiv.org/pdf/2501.05445.pdf
code: https://github.com/runjie-yan/ConsistentFlowDistillation
video: null
tags:
- Code
- Diffusion
- Project
thumbnail: assets/thumbnails/yan2025consistent.jpg
publication_date: '2025-01-09T18:56:05+00:00'
date_source: arxiv
- id: meng2025zero1tog
title: 'Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation'
authors: Xuyi Meng, Chen Wang, Jiahui Lei, Kostas Daniilidis, Jiatao Gu, Lingjie
Liu
year: '2025'
abstract: 'Recent advances in 2D image generation have achieved remarkable quality, largely
driven by the capacity of diffusion models and the availability of large-scale
datasets. However, direct 3D generation is still constrained by the scarcity and
lower fidelity of 3D datasets. In this paper, we introduce Zero-1-to-G, a novel
approach that addresses this problem by enabling direct single-view generation
on Gaussian splats using pretrained 2D diffusion models. Our key insight is that
Gaussian splats, a 3D representation, can be decomposed into multi-view images
encoding different attributes. This reframes the challenging task of direct 3D
generation within a 2D diffusion framework, allowing us to leverage the rich priors
of pretrained 2D diffusion models. To incorporate 3D awareness, we introduce cross-view
and cross-attribute attention layers, which capture complex correlations and enforce
3D consistency across generated splats. This makes Zero-1-to-G the first direct
image-to-3D generative model to effectively utilize pretrained 2D diffusion priors,
enabling efficient training and improved generalization to unseen objects. Extensive
experiments on both synthetic and in-the-wild datasets demonstrate superior performance
in 3D object generation, offering a new approach to high-quality 3D generation.
'
project_page: https://mengxuyigit.github.io/projects/zero-1-to-G/
paper: https://arxiv.org/pdf/2501.05427.pdf
code: null
video: null
tags:
- Diffusion
- Project
thumbnail: assets/thumbnails/meng2025zero1tog.jpg
publication_date: '2025-01-09T18:37:35+00:00'
date_source: arxiv
- id: gerogiannis2025arc2avatar
title: 'Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID
Guidance'
authors: Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros
Potamias, Alexandros Lattas, Stefanos Zafeiriou
year: '2025'
abstract: 'Inspired by the effectiveness of 3D Gaussian Splatting (3DGS) in reconstructing
detailed 3D scenes within multi-view setups and the emergence of large 2D human
foundation models, we introduce Arc2Avatar, the first SDS-based method utilizing
a human face foundation model as guidance with just a single image as input. To
achieve that, we extend such a model for diverse-view human head generation by
fine-tuning on synthetic data and modifying its conditioning. Our avatars maintain
a dense correspondence with a human face mesh template, allowing blendshape-based
expression generation. This is achieved through a modified 3DGS approach, connectivity
regularizers, and a strategic initialization tailored for our task. Additionally,
we propose an optional efficient SDS-based correction step to refine the blendshape
expressions, enhancing realism and diversity. Experiments demonstrate that Arc2Avatar
achieves state-of-the-art realism and identity preservation, effectively addressing
color issues by allowing the use of very low guidance, enabled by our strong identity
prior and initialization strategy, without compromising detail.
'
project_page: null
paper: https://arxiv.org/pdf/2501.05379.pdf
code: null
video: null
tags:
- Avatar
- Diffusion