<!DOCTYPE html>
<html lang="en">
<head>
<title>SLAMP: Stochastic Latent Appearance and Motion Prediction</title>
<script src="https://www.google.com/jsapi" type="text/javascript"></script>
<script type="text/javascript">google.load("jquery", "1.3.2");</script>
<link href="https://fonts.googleapis.com/css2?family=Open+Sans&display=swap"
rel="stylesheet">
<link rel="stylesheet" type="text/css" href="./resources/style.css" media="screen"/>
<!-- Facebook automatically scrapes this. Go to https://developers.facebook.com/tools/debug/
if you update and want to force Facebook to re-scrape. -->
<meta property="og:image" content="Path to my teaser.jpg"/>
<meta property="og:title" content="SLAMP: Stochastic Latent Appearance and Motion Prediction" />
<meta property="og:description" content="Stochastic Video Prediction" />
<!-- Twitter automatically scrapes this. Go to https://cards-dev.twitter.com/validator?
if you update and want to force Twitter to re-scrape. -->
<meta property="twitter:card" content="summary_large_image" />
<meta property="twitter:title" content="SLAMP: Stochastic Latent Appearance and Motion Prediction" />
<meta property="twitter:description" content="Stochastic Video Prediction" />
<meta property="twitter:image" content="Path to my teaser.jpg" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<script async
src="https://www.googletagmanager.com/gtag/js?id=UA-97476543-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
dataLayer.push(arguments);
}
gtag('js', new Date());
gtag('config', 'UA-97476543-1');
</script>
</head>
<body>
<div class="container">
<div class="title">
SLAMP: Stochastic Latent Appearance and Motion Prediction
</div>
<div class="venue">
In ICCV 2021
</div>
<br><br>
<div class="author">
<a href="https://kaanakan.github.io" target="_blank">Adil Kaan Akan</a><sup>1</sup>
</div>
<div class="author">
<a href="https://web.cs.hacettepe.edu.tr/~erkut/" target="_blank">Erkut Erdem</a><sup>2</sup>
</div>
<div class="author">
<a href="https://aykuterdem.github.io/" target="_blank">Aykut Erdem</a><sup>1</sup>
</div>
<div class="author">
<a href="https://mysite.ku.edu.tr/fguney/" target="_blank">Fatma Guney</a><sup>1</sup>
</div>
<br><br>
<div class="affiliation"><sup>1 </sup><a href="https://ai.ku.edu.tr" target="_blank"> Koc University Is Bank AI Center</a></div>
<div class="affiliation"><sup>2 </sup><a href="https://vision.cs.hacettepe.edu.tr/" target="_blank">Hacettepe University Computer Vision Lab</a></div>
<br><br>
<div class="links"><a href="https://arxiv.org/abs/2108.02760" target="_blank">[Paper]</a></div>
<div class="links"><a href="https://github.com/kaanakan/slamp" target="_blank">[Code]</a></div>
<div class="links"><a href="https://www.youtube.com/watch?v=b49Sh5tIU5s" target="_blank">[Video]</a></div>
<div class="links"><a href="resources/poster.pdf" target="_blank">[Poster]</a></div>
<br><br>
<img style="width: 100%;" src="./resources/teaser_mnist.gif" alt="MNIST."/> <br><br>
<img style="width: 100%;" src="./resources/teaser_kth.gif" alt="KTH."/> <br><br>
<img style="width: 100%;" src="./resources/teaser_bair.gif" alt="BAIR."/> <br><br>
<img style="width: 100%;height: 25%" src="./resources/teaser_city.gif" alt="Cityscapes."/> <br><br>
<img style="width: 100%;height: 25%" src="./resources/teaser_kitti.gif" alt="KITTI."/>
<br>
<br>
<p style="width: 80%;text-align:center;">
Example predictions on the MNIST, KTH, BAIR, Cityscapes and KITTI datasets.
</p>
<br><br>
<hr>
<h1>Abstract</h1>
<p style="width: 80%;">
Motion is an important cue for video prediction and is often utilized by separating video content into static and dynamic components. Most previous work utilizing motion is deterministic, but there are stochastic methods that can model the inherent uncertainty of the future.
Existing stochastic models either do not reason about motion explicitly or make limiting assumptions about the static part. In this paper, we reason about appearance and motion in the video stochastically by predicting the future based on the motion history.
Explicit reasoning about motion, even without history, already reaches the performance of current stochastic models. The motion history further improves the results by allowing the model to predict consistent dynamics several frames into the future. Our model performs comparably to the state-of-the-art models on the generic video prediction datasets, but significantly outperforms them on two challenging real-world autonomous driving datasets with complex motion and dynamic backgrounds.
</p>
<br><br>
<hr>
<!-- <h1>Video</h1>
<div class="video-container">
<iframe src="https://www.youtube.com/embed/dQw4w9WgXcQ" frameBorder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen></iframe>
</div>
<br><br>
<hr> -->
<h1>Method Overview</h1>
<img style="width: 80%;" src="./resources/method.jpg"
alt="Method overview figure"/>
<br><br>
<p style="width: 80%;">
Instead of focusing on the pixel space only, we also focus on a more meaningful space by modelling the motion of the scene. Besides predicting the future scene in the pixel space, we model the motion history explicitly by learning to predict the optical flow needed to go from the current frame to the future frame. In the end, we have two predictions: one from the pixel prediction head and one from warping the current frame with the predicted optical flow. We combine them by predicting a mask that chooses between the two, so the final prediction keeps the best parts of both.
</p>
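<p style="width: 80%;">
The two-stream combination above can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's actual implementation: the nearest-neighbour warp, the array shapes, and the random arrays standing in for network outputs (pixel prediction, optical flow, blending mask) are all assumptions for the sake of the example.
</p>

```python
import numpy as np

def warp_frame(frame, flow):
    """Backward-warp a frame with a dense optical flow field.

    frame: (H, W, C) array, the current frame.
    flow:  (H, W, 2) array of per-pixel (dx, dy) offsets pointing from the
           future frame back into the current one. Nearest-neighbour
           sampling for brevity; a trainable model would use a
           differentiable (bilinear) warp.
    """
    H, W, _ = frame.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    return frame[src_y, src_x]

def fuse(pixel_pred, warped_pred, mask):
    """Blend the two candidate predictions with a per-pixel mask in [0, 1]."""
    return mask * pixel_pred + (1.0 - mask) * warped_pred

# Toy inputs standing in for the network's outputs.
rng = np.random.default_rng(0)
frame = rng.random((8, 8, 3))            # current frame
flow = rng.uniform(-1, 1, (8, 8, 2))     # predicted optical flow
pixel_pred = rng.random((8, 8, 3))       # pixel prediction head output
mask = rng.random((8, 8, 1))             # predicted blending mask

warped_pred = warp_frame(frame, flow)    # flow-based prediction
final = fuse(pixel_pred, warped_pred, mask)
print(final.shape)  # (8, 8, 3)
```

Because the mask is a convex combination weight, every pixel of the final prediction lies between the pixel-based and flow-based candidates.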
<br>
<a class="links" href="https://github.com/kaanakan/slamp" target="_blank">[Code]</a>
<br><br>
<hr>
<h1>Results</h1>
<img style="width: 100%;" src="./resources/result_mnist.gif"
alt="Results figure"/>
<br><br>
<img style="width: 100%;" src="./resources/result_kth.gif"
alt="Results figure"/>
<br><br>
<img style="width: 100%;" src="./resources/result_bair.gif"
alt="Results figure"/>
<br><br>
<img style="width: 100%;height: 25%" src="./resources/result_city.gif"
alt="Results figure"/>
<br><br>
<img style="width: 100%;height: 25%" src="./resources/result_kitti.gif"
alt="Results figure"/>
<br><br>
<hr>
<h1>Dynamic and Static Latent Variables</h1>
<img style="width: 80%;" src="./resources/tsne_results.jpeg"
alt="Results figure"/>
<br><br>
<p style="width: 80%;">
We provide a visualization of the stochastic latent variables of the dynamic component on KTH using t-SNE. Here, we show both the static and the dynamic components for comparison.
As can be seen from the figure, the static variables on the right are more scattered and do not form clusters according to semantic classes, unlike the dynamic variables on the left (and in the main paper). This shows that our model can learn video dynamics according to semantic classes by modelling the dynamic component separately.
</p>
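<p style="width: 80%;">
A minimal sketch of producing such a projection with scikit-learn's t-SNE. The latent arrays here are random stand-ins for the model's dynamic and static posteriors, and the sizes and t-SNE hyperparameters are assumptions for illustration only.
</p>

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Random stand-ins for per-frame latent vectors; the real ones would be
# sampled from the trained model's dynamic and static components.
dynamic_latents = rng.random((30, 16))
static_latents = rng.random((30, 16))

# Project each set of latents to 2-D; the 2-D points can then be
# scatter-plotted and coloured by semantic class.
dyn_2d = TSNE(n_components=2, perplexity=5, init="random",
              random_state=0).fit_transform(dynamic_latents)
sta_2d = TSNE(n_components=2, perplexity=5, init="random",
              random_state=0).fit_transform(static_latents)
print(dyn_2d.shape, sta_2d.shape)  # (30, 2) (30, 2)
```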
<br><br>
<hr>
<h1>Paper</h1>
<div class="paper-thumbnail">
<a href="https://arxiv.org/abs/2108.02760">
<img class="layered-paper-big" width="100%" src="./resources/paper.jpg" alt="Paper thumbnail"/>
</a>
</div>
<div class="paper-info" style="width: 70%;">
<h3>SLAMP: Stochastic Latent Appearance and Motion Prediction</h3>
<p>Adil Kaan Akan, Erkut Erdem, Aykut Erdem and Fatma Guney</p>
<p>In ICCV, 2021.</p>
<pre><code>@InProceedings{Akan2021ICCV,
author = {Akan, Adil Kaan and Erdem, Erkut and Erdem, Aykut and Guney, Fatma},
title = {SLAMP: Stochastic Latent Appearance and Motion Prediction},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021},
pages = {14728-14737}
}</code></pre>
</div>
<br><br>
<hr>
<h1>Acknowledgements</h1>
<p style="width: 80%;">
We would like to thank <a href="https://mlia.lip6.fr/franceschi/" target="_blank">Jean-Yves Franceschi</a> and <a href="https://github.com/edouardelasalles" target="_blank">Edouard Delasalles</a> for providing technical and numerical details of the baseline performances,
and <a href="http://www.denizyuret.com/" target="_blank">Deniz Yuret</a> and <a href="https://salihkaragoz.github.io/" target="_blank">Salih Karagoz</a> for helpful discussions and comments. Kaan Akan was supported by the KUIS AI Center fellowship, Fatma Güney by the TUBITAK 2232 International Fellowship for Outstanding Researchers Programme, Erkut Erdem in part by the GEBIP 2018 Award of the Turkish Academy of Sciences, and Aykut Erdem by the BAGEP 2021 Award of the Science Academy.
</p>
<br><br>
</div>
</body>
</html>