<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes" ?>
<?rfc compact="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc colonspace="yes" ?>
<?rfc rfcedstyle="no" ?>
<?rfc docmapping="yes" ?>
<?rfc tocdepth="4"?>
<rfc category="std" docName="draft-ietf-rtcweb-jsep-latest"
ipr="trust200902">
<front>
<title abbrev="JSEP">Javascript Session Establishment
Protocol</title>
<author fullname="Justin Uberti" initials="J."
surname="Uberti">
<organization>Google</organization>
<address>
<postal>
<street>747 6th St S</street>
<city>Kirkland</city>
<region>WA</region>
<code>98033</code>
<country>USA</country>
</postal>
<email>[email protected]</email>
</address>
</author>
<author fullname="Cullen Jennings" initials="C."
surname="Jennings">
<organization>Cisco</organization>
<address>
<postal>
<street>170 West Tasman Drive</street>
<city>San Jose</city>
<region>CA</region>
<code>95134</code>
<country>USA</country>
</postal>
<email>[email protected]</email>
</address>
</author>
<author fullname="Eric Rescorla" initials="E.K."
surname="Rescorla" role="editor">
<organization>Mozilla</organization>
<address>
<postal>
<street>331 Evelyn Ave</street>
<city>Mountain View</city>
<region>CA</region>
<code>94041</code>
<country>USA</country>
</postal>
<email>[email protected]</email>
</address>
</author>
<date day="25" month="March" year="2016" />
<area>RAI</area>
<abstract>
<t>This document describes the mechanisms for allowing a
Javascript application to control the signaling plane of a
multimedia session via the interface specified in the W3C
RTCPeerConnection API, and discusses how this relates to
existing signaling protocols.</t>
</abstract>
</front>
<middle>
<section title="Introduction" anchor="sec.introduction">
<t>This document describes how the W3C WEBRTC
RTCPeerConnection interface <xref
target="W3C.WD-webrtc-20140617"></xref> is used to control
the setup, management and teardown of a multimedia
session.</t>
<section title="General Design of JSEP"
anchor="sec.general-design-of-jsep">
<t>The thinking behind WebRTC call setup has been to
fully specify and control the media plane, but to leave
the signaling plane up to the application as much as
possible. The rationale is that different applications
may prefer to use different protocols, such as the
existing SIP or Jingle call signaling protocols, or
something custom to the particular application, perhaps
for a novel use case. In this approach, the key
information that needs to be exchanged is the multimedia
session description, which specifies the transport and
media configuration information necessary to establish
the media plane.</t>
<t>With these considerations in mind, this document
describes the Javascript Session Establishment Protocol
(JSEP) that allows for full control of the signaling
state machine from Javascript. JSEP removes the browser
almost entirely from the core signaling flow, which is
instead handled by the Javascript making use of two
interfaces: (1) passing in local and remote session
descriptions and (2) interacting with the ICE state
machine.</t>
<t>In this document, the use of JSEP is described as if
it always occurs between two browsers. Note, though, that
in many cases it will actually be between a browser and
some kind of server, such as a gateway or MCU. This
distinction is invisible to the browser; it just follows
the instructions it is given via the API.</t>
<t>JSEP's handling of session descriptions is simple and
straightforward. Whenever an offer/answer exchange is
needed, the initiating side creates an offer by calling
a createOffer() API. The application optionally modifies
that offer, and then uses it to set up its local config
via the setLocalDescription() API. The offer is then
sent off to the remote side over its preferred signaling
mechanism (e.g., WebSockets); upon receipt of that
offer, the remote party installs it using the
setRemoteDescription() API.</t>
<t>To complete the offer/answer exchange, the remote
party uses the createAnswer() API to generate an
appropriate answer, applies it using the
setLocalDescription() API, and sends the answer back to
the initiator over the signaling channel. When the
initiator gets that answer, it installs it using the
setRemoteDescription() API, and initial setup is
complete. This process can be repeated for additional
offer/answer exchanges.</t>
<t>Regarding ICE <xref target="RFC5245"></xref>, JSEP
decouples the ICE state machine from the overall
signaling state machine, as the ICE state machine must
remain in the browser, because only the browser has the
necessary knowledge of candidates and other transport
info. Performing this separation also provides
additional flexibility; in protocols that decouple
session descriptions from transport, such as Jingle, the
session description can be sent immediately and the
transport information can be sent when available. In
protocols that don't, such as SIP, the information can
be used in the aggregated form. Sending transport
information separately can allow for faster ICE and DTLS
startup, since ICE checks can start as soon as any
transport information is available rather than waiting
for all of it.</t>
<t>Through its abstraction of signaling, the JSEP
approach does require the application to be aware of the
signaling process. While the application does not need
to understand the contents of session descriptions to
set up a call, the application must call the right APIs
at the right times, convert the session descriptions and
ICE information into the defined messages of its chosen
signaling protocol, and perform the reverse conversion
on the messages it receives from the other side.</t>
<t>One way to mitigate this is to provide a Javascript
library that hides this complexity from the developer;
said library would implement a given signaling protocol
along with its state machine and serialization code,
presenting a higher level call-oriented interface to the
application developer. For example, libraries exist to
adapt the JSEP API into an API suitable for SIP or
XMPP. Thus, JSEP provides greater control for the
experienced developer without forcing any additional
complexity on the novice developer.</t>
</section>
<section title="Other Approaches Considered"
anchor="sec.other-approaches-consider">
<t>One approach that was considered instead of JSEP was
to include a lightweight signaling protocol. Instead of
providing session descriptions to the API, the API would
produce and consume messages from this protocol. While
providing a more high-level API, this put more control
of signaling within the browser, forcing the browser to
have to understand and handle concepts like signaling
glare. In addition, it prevented the application from
driving the state machine to a desired state, as is
needed in the page reload case.</t>
<t>A second approach that was considered but not chosen
was to decouple the management of the media control
objects from session descriptions, instead offering APIs
that would control each component directly. This was
rejected based on a feeling that requiring exposure of
this level of complexity to the application programmer
would not be beneficial; it would result in an API where
even a simple example would require a significant amount
of code to orchestrate all the needed interactions, as
well as creating a large API surface that needed to be
agreed upon and documented. In addition, these API
points could be called in any order, resulting in a more
complex set of interactions with the media subsystem
than the JSEP approach, which specifies how session
descriptions are to be evaluated and applied.</t>
<t>One variation on JSEP that was considered was to keep
the basic session description-oriented API, but to move
the mechanism for generating offers and answers out of
the browser. Instead of providing
createOffer/createAnswer methods within the browser,
this approach would instead expose a getCapabilities API
which would provide the application with the information
it needed in order to generate its own session
descriptions. This increases the amount of work that the
application needs to do; it needs to know how to
generate session descriptions from capabilities, and
especially how to generate the correct answer from an
arbitrary offer and the supported capabilities. While
this could certainly be addressed by using a library
like the one mentioned above, it basically forces the
use of said library even for a simple example.
Providing createOffer/createAnswer avoids this problem,
but still allows applications to generate their own
offers/answers (to a large extent) if they choose, using
the description generated by createOffer as an
indication of the browser's capabilities.</t>
</section>
</section>
<section title="Terminology" anchor="sec.terminology">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
and "OPTIONAL" in this document are to be interpreted as
described in <xref target="RFC2119"></xref>.</t>
</section>
<section title="Semantics and Syntax"
anchor="sec.semantics-and-syntax">
<section title="Signaling Model"
anchor="sec.signaling-model">
<t>JSEP does not specify a particular signaling model or
state machine, other than the generic need to exchange
session descriptions in the fashion described by <xref
target="RFC3264"></xref> (offer/answer) in order for both
sides of the session to know how to conduct the
session. JSEP provides mechanisms to create offers and
answers, as well as to apply them to a session.
However, the browser is totally decoupled from the
actual mechanism by which these offers and answers are
communicated to the remote side, including addressing,
retransmission, forking, and glare handling. These
issues are left entirely up to the application; the
application has complete control over which offers and
answers get handed to the browser, and when.</t>
<figure anchor="fig-sigModel"
title="JSEP Signaling Model">
<artwork>
<![CDATA[
+-----------+ +-----------+
| Web App |<--- App-Specific Signaling -->| Web App |
+-----------+ +-----------+
^ ^
| SDP | SDP
V V
+-----------+ +-----------+
| Browser |<----------- Media ------------>| Browser |
+-----------+ +-----------+
]]>
</artwork>
</figure>
</section>
<section title="Session Descriptions and State Machine"
anchor="sec.session-descriptions-and-">
<t>In order to establish the media plane, the user agent
needs specific parameters to indicate what to transmit
to the remote side, as well as how to handle the media
that is received. These parameters are determined by the
exchange of session descriptions in offers and answers,
and there are certain details to this process that must
be handled in the JSEP APIs.</t>
<t>Whether a session description applies to the local
side or the remote side affects the meaning of that
description. For example, the list of codecs sent to a
remote party indicates what the local side is willing to
receive, which, when intersected with the set of codecs
the remote side supports, specifies what the remote side
should send. However, not all parameters follow this
rule; for example, the DTLS-SRTP parameters <xref
target="RFC5763"></xref> sent to a remote party indicate
what certificate the local side will use in DTLS setup,
and thereby what the remote party should expect to
receive; the remote party will have to accept these
parameters, with no option to choose different
values.</t>
<t>In addition, various RFCs put different conditions on
the format of offers versus answers. For example, an
offer may propose an arbitrary number of media streams
(i.e. m= sections), but an answer must contain the exact
same number as the offer.</t>
<t>Lastly, while the exact media parameters are known
only after an offer and an answer have been
exchanged, it is possible for the offerer to receive
media after they have sent an offer and before they have
received an answer. To properly process incoming media
in this case, the offerer's media handler must be aware
of the details of the offer before the answer
arrives.</t>
<t>Therefore, in order to handle session descriptions
properly, the user agent needs:
<list style="numbers">
<t>To know if a session description pertains to the
local or remote side.</t>
<t>To know if a session description is an offer or
an answer.</t>
<t>To allow the offer to be specified independently
of the answer.</t>
</list>
JSEP addresses this by adding both setLocalDescription
and setRemoteDescription methods and having session
description objects contain a type field indicating the
type of session description being supplied. This
satisfies the requirements listed above for both the
offerer, who first calls setLocalDescription(sdp
[offer]) and then later setRemoteDescription(sdp
[answer]), as well as for the answerer, who first calls
setRemoteDescription(sdp [offer]) and then later
setLocalDescription(sdp [answer]).</t>
<t>JSEP also allows for an answer to be treated as
provisional by the application. Provisional answers
provide a way for an answerer to communicate initial
session parameters back to the offerer, in order to
allow the session to begin, while allowing a final
answer to be specified later. This concept of a final
answer is important to the offer/answer model; when such
an answer is received, any extra resources allocated by
the caller can be released, now that the exact session
configuration is known. These "resources" can include
things like extra ICE components, TURN candidates, or
video decoders. Provisional answers, on the other hand,
do not result in any such deallocation; as a result, multiple
dissimilar provisional answers can be received and
applied during call setup.</t>
<t>In <xref target="RFC3264"></xref>, the constraint at
the signaling level is that only one offer can be
outstanding for a given session, but at the media stack
level, a new offer can be generated at any point. For
example, when using SIP for signaling, if one offer is
sent, then cancelled using a SIP CANCEL, another offer
can be generated even though no answer was received for
the first offer. To support this, the JSEP media layer
can provide an offer via the createOffer() method
whenever the Javascript application needs one for the
signaling. The answerer can send back zero or more
provisional answers, and finally end the offer-answer
exchange by sending a final answer. The state machine
for this is as follows:</t>
<t>
<figure anchor="fig-state-machine"
title="JSEP State Machine">
<artwork>
<![CDATA[
setRemote(OFFER) setLocal(PRANSWER)
/-----\ /-----\
| | | |
v | v |
+---------------+ | +---------------+ |
| |----/ | |----/
| | setLocal(PRANSWER) | |
| Remote-Offer |------------------- >| Local-Pranswer|
| | | |
| | | |
+---------------+ +---------------+
^ | |
| | setLocal(ANSWER) |
setRemote(OFFER) | |
| V setLocal(ANSWER) |
+---------------+ |
| | |
| |<---------------------------+
| Stable |
| |<---------------------------+
| | |
+---------------+ setRemote(ANSWER) |
^ | |
| | setLocal(OFFER) |
setRemote(ANSWER) | |
| V |
+---------------+ +---------------+
| | | |
| | setRemote(PRANSWER) | |
| Local-Offer |------------------- >|Remote-Pranswer|
| | | |
| |----\ | |----\
+---------------+ | +---------------+ |
^ | ^ |
| | | |
\-----/ \-----/
setLocal(OFFER) setRemote(PRANSWER)
]]>
</artwork>
</figure>
</t>
<t>Aside from these state transitions there is no other
difference between the handling of provisional
("pranswer") and final ("answer") answers.</t>
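<t>The transitions in the state machine figure can be
written down as a lookup table. The following TypeScript
is a minimal sketch under that reading of the figure; the
names State, TRANSITIONS, and applyDescription are
invented for this illustration and are not part of the
RTCPeerConnection API.</t>
<figure>
<artwork>
<![CDATA[
```typescript
// Illustrative model of the JSEP signaling state machine.
// State names follow the figure; everything else is an assumption
// made for this sketch.
type State =
  | "Stable"
  | "Local-Offer"
  | "Remote-Offer"
  | "Local-Pranswer"
  | "Remote-Pranswer";
type Side = "local" | "remote";
type DescType = "offer" | "pranswer" | "answer";

// Allowed transitions, keyed by current state, then by "<side>:<type>".
const TRANSITIONS: Record<State, { [key: string]: State }> = {
  "Stable": {
    "local:offer": "Local-Offer",
    "remote:offer": "Remote-Offer",
  },
  "Local-Offer": {
    "local:offer": "Local-Offer",
    "remote:pranswer": "Remote-Pranswer",
    "remote:answer": "Stable",
  },
  "Remote-Pranswer": {
    "remote:pranswer": "Remote-Pranswer",
    "remote:answer": "Stable",
  },
  "Remote-Offer": {
    "remote:offer": "Remote-Offer",
    "local:pranswer": "Local-Pranswer",
    "local:answer": "Stable",
  },
  "Local-Pranswer": {
    "local:pranswer": "Local-Pranswer",
    "local:answer": "Stable",
  },
};

// Returns the next state, or throws if the figure shows no such
// transition out of the current state.
function applyDescription(state: State, side: Side,
                          type: DescType): State {
  const next = TRANSITIONS[state][side + ":" + type];
  if (next === undefined) {
    throw new Error("invalid transition from " + state +
                    ": set " + side + " " + type);
  }
  return next;
}
```
]]>
</artwork>
</figure>
<t>For example, an offerer that receives one provisional
answer and then a final answer moves from Stable to
Local-Offer, then to Remote-Pranswer, and back to
Stable.</t>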
</section>
<section title="Session Description Format"
anchor="sec.session-description-forma">
<t>In the WebRTC specification, session descriptions are
formatted as SDP messages. While this format is not
optimal for manipulation from Javascript, it is widely
accepted, and frequently updated with new features. Any
alternate encoding of session descriptions would have to
keep pace with the changes to SDP, at least until the
time that this new encoding eclipsed SDP in
popularity. As a result, JSEP currently uses SDP as the
internal representation for its session
descriptions.</t>
<t>However, to simplify Javascript processing, and
provide for future flexibility, the SDP syntax is
encapsulated within a SessionDescription object, which
can be constructed from SDP, and be serialized out to
SDP. If future specifications agree on a JSON format for
session descriptions, we could easily enable this object
to generate and consume that JSON.</t>
<t>Other methods may be added to SessionDescription in
the future to simplify handling of SessionDescriptions
from Javascript. In the meantime, Javascript libraries
can be used to perform these manipulations.</t>
<t>Note that most applications should be able to treat
the SessionDescriptions produced and consumed by these
various API calls as opaque blobs; that is, the
application will not need to read or change them.</t>
</section>
<section title="Session Description Control"
anchor="sec.session-description-ctrl">
<t>In order to give the application control over various
common session parameters, JSEP provides control
surfaces which tell the browser how to generate session
descriptions. This avoids the need for Javascript to
modify session descriptions in most cases.</t>
<t>Changes to these objects result in changes to the
session descriptions generated by subsequent
createOffer/Answer calls.</t>
<section title="RtpTransceivers"
anchor="sec.rtptransceivers">
<t>RtpTransceivers allow the application to control
the RTP media associated with one m= section. Each
RtpTransceiver has an RtpSender and an RtpReceiver,
which an application can use to control the sending
and receiving of RTP media. The application may also
modify the RtpTransceiver directly, for instance, by
stopping it.</t>
<t>RtpTransceivers generally have a 1:1 mapping with
m= sections, although there may be more
RtpTransceivers than m= sections when
RtpTransceivers are created but not yet associated
with a m= section, or if RtpTransceivers have been
stopped and disassociated from m= sections. An
RtpTransceiver is never associated with more than
one m= section, and once a session description is
applied, a m= section is always associated with
exactly one RtpTransceiver.</t>
<t>RtpTransceivers can be created explicitly by the
application or implicitly by calling
setRemoteDescription with an offer that adds new m=
sections.</t>
</section>
<section title="RtpSenders" anchor="sec.rtpsenders">
<t>RtpSenders allow the application to control how
RTP media is sent. In particular, the application
can control whether an RtpSender is active or not,
which affects the directionality attribute of the
associated m= section.</t>
</section>
<section title="RtpReceivers"
anchor="sec.rtpreceivers">
<t>RtpReceivers allow the application to control
how RTP media is received. In particular, the
application can control whether an RtpReceiver is
active or not, which affects the directionality
attribute of the associated m= section.</t>
</section>
</section>
<section title="ICE" anchor="sec.ice">
<section title="ICE Gathering Overview"
anchor="sec.ice-gather-overview">
<t>JSEP gathers ICE candidates as needed by the
application. Collection of ICE candidates is
referred to as a gathering phase, and this is
triggered either by the addition of a new or
recycled m= line to the local session description,
or new ICE credentials in the description,
indicating an ICE restart. Use of new ICE
credentials can be triggered explicitly by the
application, or implicitly by the browser in
response to changes in the ICE configuration.</t>
<t>When the ICE configuration changes in a way that
requires a new gathering phase, a
'needs-ice-restart' bit is set. When this bit is
set, calls to the createOffer API will generate new
ICE credentials. This bit is cleared by a call to
the setLocalDescription API with new ICE credentials
from either an offer or an answer, i.e., from either
a local- or remote-initiated ICE restart.</t>
<t>When a new gathering phase starts, the ICE Agent
will notify the application that gathering is
occurring through an event. Then, when each new ICE
candidate becomes available, the ICE Agent will
supply it to the application via an additional
event; these candidates will also automatically be
added to the current and/or pending local session
description. Finally, when all candidates have been
gathered, an event will be dispatched to signal that
the gathering process is complete.</t>
<t>Note that gathering phases only gather the
candidates needed by new/recycled/restarting m=
lines; other m= lines continue to use their existing
candidates. Also, when bundling is active,
candidates are only gathered (and exchanged) for the
m= lines referenced in BUNDLE-tags, as described in
<xref
target="I-D.ietf-mmusic-sdp-bundle-negotiation"
/>.</t>
</section>
<section title="ICE Candidate Trickling"
anchor="sec.ice-candidate-trickling">
<t>Candidate trickling is a technique through which
a caller may incrementally provide candidates to the
callee after the initial offer has been dispatched;
the semantics of "Trickle ICE" are defined in <xref
target="I-D.ietf-ice-trickle"></xref>. This process
allows the callee to begin acting upon the call and
setting up the ICE (and perhaps DTLS) connections
immediately, without having to wait for the caller
to gather all possible candidates. This results in
faster media setup in cases where gathering is not
performed prior to initiating the call.</t>
<t>JSEP supports optional candidate trickling by
providing APIs, as described above, that provide
control and feedback on the ICE candidate gathering
process. Applications that support candidate
trickling can send the initial offer immediately
and send individual candidates when they are
notified of a new candidate; applications that do
not support this feature can simply wait for the
indication that gathering is complete, and then
create and send their offer, with all the
candidates, at this time.</t>
<t>Upon receipt of trickled candidates, the
receiving application will supply them to its ICE
Agent. This triggers the ICE Agent to start using
the new remote candidates for connectivity
checks.</t>
<section title="ICE Candidate Format"
anchor="sec.ice-candidate-format">
<t>As with session descriptions, the syntax of
the IceCandidate object provides some
abstraction, but can be easily converted to and
from the SDP candidate lines.</t>
<t>The candidate lines are the only SDP
information that is contained within
IceCandidate, as they represent the only
information needed that is not present in the
initial offer (i.e., for trickle candidates).
This information is carried with the same
syntax as the "candidate-attribute" field
defined for ICE. For example:</t>
<figure>
<artwork>
<![CDATA[
candidate:1 1 UDP 1694498815 192.0.2.33 10000 typ host
]]>
</artwork>
</figure>
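<t>As an illustration of how an application might read
such a line, the following TypeScript sketch splits out
the mandatory leading fields of the candidate-attribute
syntax. The ParsedCandidate interface and the
parseCandidate function are hypothetical names for this
example; optional trailing fields such as raddr and rport
are ignored.</t>
<figure>
<artwork>
<![CDATA[
```typescript
// Hypothetical shape for the mandatory fields of a candidate line.
interface ParsedCandidate {
  foundation: string;
  componentId: number;
  transport: string;
  priority: number;
  address: string;
  port: number;
  type: string; // e.g. "host", "srflx", "prflx", or "relay"
}

// Parses "candidate:<foundation> <component> <transport> <priority>
// <address> <port> typ <type> ..." and ignores anything after <type>.
function parseCandidate(line: string): ParsedCandidate {
  const prefix = "candidate:";
  if (line.indexOf(prefix) !== 0) {
    throw new Error("not a candidate line: " + line);
  }
  // Split on runs of whitespace to tolerate extra spacing.
  const f = line.substring(prefix.length).trim().split(/\s+/);
  if (f.length < 8 || f[6] !== "typ") {
    throw new Error("malformed candidate line: " + line);
  }
  return {
    foundation: f[0],
    componentId: parseInt(f[1], 10),
    transport: f[2],
    priority: parseInt(f[3], 10),
    address: f[4],
    port: parseInt(f[5], 10),
    type: f[7],
  };
}
```
]]>
</artwork>
</figure>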
<t>The IceCandidate object also contains fields
to indicate which m= line it should be
associated with. The m= line can be identified
in one of two ways; either by a m= line index,
or a MID. The m= line index is a zero-based
index, with index N referring to the N+1th m=
line in the SDP sent by the entity which sent
the IceCandidate. The MID uses the "media stream
identification" attribute, as defined in <xref
target="RFC5888"></xref>, Section 4, to identify
the m= line. JSEP implementations creating an
ICE Candidate object MUST populate both of these
fields. Implementations receiving an ICE
Candidate object MUST use the MID if present, or
the m= line index, if not (as it could have come
from a non-JSEP endpoint).</t>
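<t>The receiving rule (use the MID if present, otherwise
the m= line index) can be sketched as follows. The field
names mid and mLineIndex, and the resolveMLine helper, are
assumptions made for this example; the mids array is
assumed to list, in order, the MID of each m= line in the
SDP sent by the candidate's sender.</t>
<figure>
<artwork>
<![CDATA[
```typescript
// Illustrative field names; the text only requires that a candidate
// carries a MID and a zero-based m= line index.
interface CandidateTarget {
  mid?: string;
  mLineIndex?: number;
}

// mids[i] is the MID of the i-th m= line (undefined if that line
// carries no MID). Returns the zero-based m= line index to use.
function resolveMLine(c: CandidateTarget,
                      mids: (string | undefined)[]): number {
  // The MID takes precedence whenever it is present.
  if (c.mid !== undefined) {
    const i = mids.indexOf(c.mid);
    if (i < 0) {
      throw new Error("unknown MID: " + c.mid);
    }
    return i;
  }
  // Fall back to the index, e.g. for a non-JSEP sender.
  if (c.mLineIndex !== undefined &&
      c.mLineIndex >= 0 && c.mLineIndex < mids.length) {
    return c.mLineIndex;
  }
  throw new Error("candidate does not identify a valid m= line");
}
```
]]>
</artwork>
</figure>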
</section>
</section>
<section title="ICE Candidate Policy"
anchor="sec.ice-candidate-policy">
<t>Typically, when gathering ICE candidates, the
browser will gather all possible forms of initial
candidates - host, server reflexive, and relay.
However, in certain cases, applications may want to
have more specific control over the gathering
process, due to privacy or related concerns. For
example, one may want to suppress the use of host
candidates, to avoid exposing information about the
local network, or go as far as only using relay
candidates, to leak as little location information
as possible (note that these choices come with
corresponding operational costs). To accomplish
this, the browser MUST allow the application to
restrict which ICE candidates are used in a
session. Note that this filtering is applied on top
of any restrictions the browser chooses to enforce
regarding which IP addresses are permitted for the
application, as discussed in <xref
target="I-D.ietf-rtcweb-ip-handling" />.</t>
<t>There may also be cases where the application
wants to change which types of candidates are used
while the session is active. A prime example is
where a callee may initially want to use only relay
candidates, to avoid leaking location information
to an arbitrary caller, but then change to use all
candidates (for lower operational cost) once the
user has indicated they want to take the call. For
this scenario, the browser MUST allow the candidate
policy to be changed in mid-session, subject to the
aforementioned interactions with local policy.</t>
<t>To administer the ICE candidate policy, the
browser will determine the current setting at the
start of each gathering phase. Then, during the
gathering phase, the browser MUST NOT expose
candidates disallowed by the current policy to the
application, use them as the source of connectivity
checks, or indirectly expose them via other fields,
such as the raddr/rport attributes for other ICE
candidates. Later, if a different policy is
specified by the application, the application can
apply it by kicking off a new gathering phase via
an ICE restart.</t>
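<t>A minimal sketch of such filtering is shown below. The
policy names ("all", "no-host", "relay-only") and the
GatheredCandidate shape are hypothetical and chosen only
for this example. As a conservative simplification, the
sketch drops the related address and port from every
candidate whenever any restriction is in force, rather
than working out which related addresses the policy would
still permit.</t>
<figure>
<artwork>
<![CDATA[
```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";
// Hypothetical policy names for this sketch.
type CandidatePolicy = "all" | "no-host" | "relay-only";

interface GatheredCandidate {
  type: CandidateType;
  address: string;
  port: number;
  relatedAddress?: string; // raddr
  relatedPort?: number;    // rport
}

function isAllowed(type: CandidateType,
                   policy: CandidatePolicy): boolean {
  if (policy === "relay-only") return type === "relay";
  if (policy === "no-host") return type !== "host";
  return true;
}

// Returns the candidates that may be exposed to the application.
function applyPolicy(cands: GatheredCandidate[],
                     policy: CandidatePolicy): GatheredCandidate[] {
  const out: GatheredCandidate[] = [];
  for (const c of cands) {
    if (!isAllowed(c.type, policy)) continue;
    if (policy === "all") {
      out.push(c);
    } else {
      // Omit raddr/rport so a disallowed address is not leaked
      // indirectly through an allowed candidate.
      out.push({ type: c.type, address: c.address, port: c.port });
    }
  }
  return out;
}
```
]]>
</artwork>
</figure>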
</section>
<section title="ICE Candidate Pool"
anchor="sec.ice-candidate-pool">
<t>JSEP applications typically inform the browser
to begin ICE gathering via the information supplied
to setLocalDescription, as this is where the app
specifies the number of media streams, and thereby
ICE components, for which to gather candidates.
However, to accelerate cases where the application
knows the number of ICE components to use ahead of
time, it may ask the browser to gather a pool of
potential ICE candidates to help ensure rapid media
setup.</t>
<t>When setLocalDescription is eventually called,
and the browser goes to gather the needed ICE
candidates, it SHOULD start by checking if any
candidates are available in the pool. If there are
candidates in the pool, they SHOULD be handed to
the application immediately via the ICE candidate
event. If the pool becomes depleted, either because
a larger-than-expected number of ICE components is
used, or because the pool has not had enough time
to gather candidates, the remaining candidates are
gathered as usual.</t>
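<t>The pool behavior above can be sketched as a simple
draw-then-gather step. This is only an illustration: the
drawFromPool helper and the PoolResult shape are
hypothetical, and each pool entry is assumed to stand for
the pre-gathered candidates of one ICE component.</t>
<figure>
<artwork>
<![CDATA[
```typescript
interface PoolResult<T> {
  fromPool: T[];         // can be surfaced immediately
  stillToGather: number; // components left to gather as usual
}

// Takes up to componentsNeeded entries from the front of the pool.
// The pool array is modified: taken entries are removed from it.
function drawFromPool<T>(pool: T[],
                         componentsNeeded: number): PoolResult<T> {
  const take = Math.min(pool.length, componentsNeeded);
  const fromPool = pool.splice(0, take);
  return {
    fromPool: fromPool,
    stillToGather: componentsNeeded - take,
  };
}
```
]]>
</artwork>
</figure>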
<t>One example of where this concept is useful is
an application that expects an incoming call at
some point in the future, and wants to minimize the
time it takes to establish connectivity, to avoid
clipping of initial media. By pre-gathering
candidates into the pool, it can exchange and start
sending connectivity checks from these candidates
almost immediately upon receipt of a call. Note
though that by holding on to these pre-gathered
candidates, which will be kept alive as long as
they may be needed, the application will consume
resources on the STUN/TURN servers it is using.</t>
</section>
</section>
<section anchor="sec.imageattr"
title="Video Size Negotiation">
<t>Video size negotiation is the process through which a
receiver can use the "a=imageattr" SDP attribute <xref
target="RFC6236" /> to indicate what video frame sizes
it is capable of receiving. A receiver may have hard
limits on what its video decoder can process, or it may
wish to constrain what it receives due to application
preferences, e.g. a specific size for the window in
which the video will be displayed.</t>
<section title="Creating an imageattr Attribute">
<t>In order to determine the limits on what video
resolution a receiver wants to receive, it will
intersect its decoder hard limits with any
mandatory constraints that have been applied to the
associated MediaStreamTrack. If the decoder limits
are unknown, e.g. when using a software decoder,
the mandatory constraints are used directly. For
the answerer, these mandatory constraints can be
applied to the remote MediaStreamTracks that are
created by a setRemoteDescription call, and will
affect the output of the ensuing createAnswer call.
Any constraints set after setLocalDescription is
used to set the answer will result in a new
offer-answer exchange. For the offerer, because it
does not know about any remote MediaStreamTracks
until it receives the answer, the offer can only
reflect decoder hard limits. If the offerer wishes
to set mandatory constraints on video resolution,
it must do so after receiving the answer, and the
result will be a new offer-answer exchange to
communicate them.</t>
<t>If there are no known decoder limits or
mandatory constraints, the "a=imageattr" attribute
SHOULD be omitted.</t>
<t>Otherwise, an "a=imageattr" attribute is created
with "recv" direction, and the resulting resolution
space formed by intersecting the decoder limits and
constraints is used to specify its minimum and
maximum x= and y= values. If the intersection is
the null set, i.e., there are no resolutions that
are permitted by both the decoder and the mandatory
constraints, this SHOULD be represented by x=0 and
y=0 values.</t>
<t>The rules here express a single set of
preferences, and therefore, the "a=imageattr" q=
value is not important. It SHOULD be set to
1.0.</t>
<t>The "a=imageattr" field is payload type
specific. When all video codecs supported have the
same capabilities, use of a single attribute, with
the wildcard payload type (*), is RECOMMENDED.
However, when the supported video codecs have
differing capabilities, specific "a=imageattr"
attributes MUST be inserted for each payload
type.</t>
<t>As an example, consider a system with an
HD-capable, multiformat video decoder, where the
application has constrained the received track to
at most 360p. In this case, the implementation
would generate this attribute:</t>
<t>a=imageattr:* recv
[x=[16:640],y=[16:360],q=1.0]</t>
<t>This declaration indicates that the receiver is
willing to receive any image resolution from 16x16
up to 640x360 pixels.</t>
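<t>The rules in this section can be sketched as a
small function. This is a non-normative illustration:
the SizeRange type and the buildImageAttr function are
hypothetical names, a null argument stands for
"unknown or not set", and the "[x=0,y=0,q=1.0]" form
used for the null set is an assumption about how the
x=0 and y=0 values would be serialized.</t>
<figure>
<artwork><![CDATA[
```typescript
// Hypothetical description of a range of frame sizes, in pixels.
interface SizeRange {
  minW: number;
  maxW: number;
  minH: number;
  maxH: number;
}

// Returns the attribute line, or null when it should be omitted.
function buildImageAttr(
  decoder: SizeRange | null,     // decoder hard limits, if known
  constraints: SizeRange | null, // mandatory constraints, if any
  payloadType: string = "*",
): string | null {
  // No known limits and no constraints: omit the attribute.
  if (decoder === null && constraints === null) {
    return null;
  }
  // If only one side is known, it is used directly.
  const a = decoder !== null ? decoder : (constraints as SizeRange);
  const b = constraints !== null ? constraints : a;

  // Intersect the two resolution spaces.
  const minW = Math.max(a.minW, b.minW);
  const maxW = Math.min(a.maxW, b.maxW);
  const minH = Math.max(a.minH, b.minH);
  const maxH = Math.min(a.maxH, b.maxH);

  const prefix = "a=imageattr:" + payloadType + " recv ";
  // Null set: no resolution satisfies both sides.
  if (minW > maxW || minH > maxH) {
    return prefix + "[x=0,y=0,q=1.0]";
  }
  // A single set of preferences is expressed, so q is fixed at 1.0.
  return (
    prefix +
    "[x=[" + minW + ":" + maxW + "]," +
    "y=[" + minH + ":" + maxH + "],q=1.0]"
  );
}
```
]]></artwork>
</figure>
<t>With assumed decoder limits of 16x16 to 1920x1080
and a constraint of at most 640x360, this sketch
produces the attribute shown in the example
above.</t>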
</section>
<section title="Interpreting an imageattr Attribute">
<t><xref target="RFC6236" /> defines "a=imageattr"
to be an advisory field. This means that it does not
absolutely constrain the video formats that the
sender can use, but gives an indication of the
preferred values.</t>
<t>This specification prescribes more specific
behavior. When a sender of a given
MediaStreamTrack, which is producing video of a
certain resolution, receives an "a=imageattr recv"
attribute, it MUST check to see if the original
resolution meets the size criteria specified in the
attribute, and adapt the resolution accordingly by
scaling (if appropriate). Note that when
considering a MediaStreamTrack that is producing
rotated video, the unrotated resolution MUST be
used. This is required regardless of whether the
receiver supports performing receive-side rotation
(e.g., through CVO), as it significantly simplifies
the matching logic.</t>
<t>For an "a=imageattr recv" attribute, only size
limits are considered. Any other values, e.g.
aspect ratio, MUST be ignored.</t>
<t>When communicating with a non-JSEP endpoint,
multiple relevant "a=imageattr recv" attributes may
be received. If this occurs, attributes other than
the one with the highest "q=" value MUST be
ignored.</t>
<t>If an "a=imageattr recv" attribute references a
different video codec than what has been selected
for the MediaStreamTrack, it MUST be ignored.</t>
<t>If the original resolution matches the size
limits in the attribute, the track MUST be
transmitted untouched.</t>
<t>If the original resolution exceeds the size
limits in the attribute, the sender SHOULD apply
downscaling to the output of the MediaStreamTrack
in order to satisfy the limits. Downscaling MUST
NOT change the track aspect ratio.</t>
<t>If the original resolution is smaller than the
minimum size in the attribute, upscaling is needed, but
this may not be appropriate in all cases. To
address this concern, the application can set an
upscaling policy for each sent track. For this
case, if upscaling is permitted by policy, the
sender SHOULD apply upscaling in order to provide
the desired resolution. Otherwise, the sender MUST
NOT apply upscaling. The sender SHOULD NOT upscale
in other cases, even if the policy permits it.
Upscaling MUST NOT change the track aspect
ratio.</t>
<t>If there is no appropriate and permitted scaling
mechanism that allows the received size limits to
be satisfied, the sender MUST NOT transmit the
track.</t>
<t>In the special case of receiving a maximum
resolution of [0, 0], as described above, the
sender MUST NOT transmit the track.</t>
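<t>The sender-side rules above can be sketched as a
single decision function. This is a non-normative
illustration: SizeLimits, Decision, and decideScaling
are hypothetical names, the width and height are
assumed to be the unrotated resolution, and rounding
the scaled size to whole pixels is a simplification
that can alter the aspect ratio very slightly.</t>
<figure>
<artwork><![CDATA[
```typescript
// Hypothetical size limits taken from the selected attribute.
interface SizeLimits {
  minW: number;
  maxW: number;
  minH: number;
  maxH: number;
}

type Decision =
  | { action: "untouched" }
  | { action: "scale"; width: number; height: number }
  | { action: "do-not-transmit" };

function decideScaling(
  width: number,
  height: number,
  limits: SizeLimits,
  upscaleAllowed: boolean, // per-track upscaling policy
): Decision {
  // Special case: a maximum resolution of [0, 0].
  if (limits.maxW === 0 && limits.maxH === 0) {
    return { action: "do-not-transmit" };
  }
  const fits =
    width >= limits.minW && width <= limits.maxW &&
    height >= limits.minH && height <= limits.maxH;
  if (fits) {
    return { action: "untouched" };
  }
  // A uniform factor s keeps the aspect ratio. It must satisfy
  // minW <= s*width <= maxW and minH <= s*height <= maxH.
  const lo = Math.max(limits.minW / width, limits.minH / height);
  const hi = Math.min(limits.maxW / width, limits.maxH / height);
  if (lo > hi) {
    // No aspect-preserving scale satisfies the limits.
    return { action: "do-not-transmit" };
  }
  if (lo > 1 && !upscaleAllowed) {
    // Upscaling is required but not permitted by policy.
    return { action: "do-not-transmit" };
  }
  // Downscale as little as possible, or upscale as little as
  // possible, to land inside the limits.
  const factor = hi < 1 ? hi : lo;
  return {
    action: "scale",
    width: Math.round(width * factor),
    height: Math.round(height * factor),
  };
}
```
]]></artwork>
</figure>
<t>For instance, under this sketch a 1280x720 track
checked against limits of 16x16 to 640x360 is scaled
to 640x360, while a 320x180 track checked against a
minimum of 640x360 is not transmitted unless the
upscaling policy permits it.</t>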
</section>
</section>
<section title="Interactions With Forking"
anchor="sec.interactions-with-forking">
<t>Some call signaling systems allow various types of
forking where an SDP Offer may be provided to more than
one device. For example, SIP <xref
target="RFC3261"></xref> defines both a "Parallel
Search" and "Sequential Search". Although these are
primarily signaling level issues that are outside the
scope of JSEP, they do have some impact on the
configuration of the media plane that is relevant. When
forking happens at the signaling layer, the JavaScript
application responsible for the signaling needs to
decide what media should be sent or received at any
point in time, as well as which remote endpoint it
should communicate with; JSEP is then used to make sure
the media engine handles RTP and media as required by
the application. The basic operations that the
application can have the media engine perform are:
<list style="symbols">
<t>Start exchanging media with a given remote peer,
but keep all the resources reserved in the
offer.</t>
<t>Start exchanging media with a given remote peer,
and free any resources in the offer that are not
being used.</t>
</list></t>
<section title="Sequential Forking"
anchor="sec.sequential-forking">
<t>Sequential forking involves a call being
dispatched to multiple remote callees, where each
callee can accept the call, but only one active
session ever exists at a time; no mixing of
received media is performed.</t>
<t>JSEP handles sequential forking well, allowing
the application to easily control the policy for
selecting the desired remote endpoint. When an
answer arrives from one of the callees, the
application can choose to apply it either as a
provisional answer, leaving open the possibility of
using a different answer in the future, or apply it
as a final answer, ending the setup flow.</t>
<t>In a "first-one-wins" situation, the first
answer will be applied as a final answer, and the
application will reject any subsequent answers. In
SIP parlance, this would be ACK + BYE.</t>
<t>In a "last-one-wins" situation, all answers
are applied as provisional answers, and any
previous call leg is terminated. At some
point, the application will end the setup process,
perhaps with a timer; at this point, the
application can reapply the pending remote
description as a final answer.</t>
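<t>The two policies can be sketched as a small state
holder that tells the application what to pass to
setRemoteDescription. This is a non-normative
illustration: SequentialForkingPolicy, onAnswer, and
finishSetup are hypothetical names, and the returned
objects only mirror the "answer" and "pranswer"
description types; the application is assumed to apply
them itself.</t>
<figure>
<artwork><![CDATA[
```typescript
type Policy = "first-one-wins" | "last-one-wins";

interface RemoteDescription {
  type: "answer" | "pranswer";
  sdp: string;
}

class SequentialForkingPolicy {
  private policy: Policy;
  private pending: string | null = null;
  private done = false;

  constructor(policy: Policy) {
    this.policy = policy;
  }

  // Called for each answer arriving from a callee. Returns the
  // description to apply, or null if the answer is rejected.
  onAnswer(sdp: string): RemoteDescription | null {
    if (this.done) {
      return null; // setup already ended; reject later answers
    }
    if (this.policy === "first-one-wins") {
      this.done = true;
      return { type: "answer", sdp: sdp };
    }
    // last-one-wins: keep the possibility of a later answer open.
    this.pending = sdp;
    return { type: "pranswer", sdp: sdp };
  }

  // Called when the application ends the setup process, e.g. on a
  // timer. Returns the pending description as a final answer.
  finishSetup(): RemoteDescription | null {
    if (this.done || this.pending === null) {
      return null;
    }
    this.done = true;
    return { type: "answer", sdp: this.pending };
  }
}
```
]]></artwork>
</figure>
<t>Terminating the previous call leg and rejecting
late answers at the signaling layer remain the
application's responsibility and are not shown.</t>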
</section>
<section title="Parallel Forking"
anchor="sec.parallel-forking">
<t>Parallel forking involves a call being dispatched
to multiple remote callees, where each callee can
accept the call, and multiple simultaneous active
signaling sessions can be established as a
result. If multiple callees send media at the same
time, the possibilities for handling this are
described in Section 3.1 of <xref
target="RFC3960"></xref>. Most SIP devices today
only support exchanging media with a single device
at a time, and do not try to mix multiple early
media audio sources, as that could result in a
confusing situation. For example, consider having a
European ringback tone mixed together with the North
American ringback tone; the resulting sound would
not be like either tone, and would confuse the
user. If the signaling application wishes to only
exchange media with one of the remote endpoints at a
time, then from a media engine point of view, this
is exactly like the sequential forking case.</t>
<t>In the parallel forking case where the JavaScript
application wishes to simultaneously exchange media
with multiple peers, the flow is slightly more
complex, but the JavaScript application can follow
the strategy that <xref target="RFC3960"></xref>
describes using UPDATE. The UPDATE approach allows
the signaling to set up a separate media flow for
each peer that it wishes to exchange media with. In
JSEP, the offer used in the UPDATE would be formed
by simply creating a new PeerConnection and making
sure that the same local media streams have been
added into this new PeerConnection. Then the new
PeerConnection object would produce an SDP offer that
could be used by the signaling to perform the UPDATE
strategy discussed in <xref
target="RFC3960"></xref>.</t>
<t>As a result of sharing the media streams, the
application will end up with N parallel
PeerConnection sessions, each with a local and
remote description and its own local and remote
addresses. The media flow from these sessions can
be managed by specifying SDP direction attributes
in the descriptions, or the application can choose
to play out the media from all sessions mixed
together. Of course, if the application wants to
only keep a single session, it can simply terminate
the sessions that it no longer needs.</t>
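<t>The bookkeeping for N parallel sessions can be
sketched as follows. This is a non-normative
illustration: ForkLeg, addForkLeg, and keepOnly are
hypothetical names, and each ForkLeg merely stands in
for one PeerConnection together with the local media
streams that were added to it.</t>
<figure>
<artwork><![CDATA[
```typescript
// Hypothetical stand-in for one PeerConnection session.
interface ForkLeg {
  remotePeer: string;     // identifies the callee
  localStreams: string[]; // ids of the shared local media streams
}

// Add a session for a new peer, reusing the same local streams.
function addForkLeg(
  legs: ForkLeg[],
  remotePeer: string,
  localStreams: string[],
): ForkLeg[] {
  const leg: ForkLeg = {
    remotePeer: remotePeer,
    localStreams: localStreams.slice(),
  };
  return legs.concat([leg]);
}

// Keep a single session and drop the ones no longer needed.
function keepOnly(legs: ForkLeg[], remotePeer: string): ForkLeg[] {
  return legs.filter((leg) => leg.remotePeer === remotePeer);
}
```
]]></artwork>
</figure>
<t>In a real application, dropping a leg would also
involve closing the corresponding PeerConnection,
which is not modeled here.</t>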
</section>
</section>
</section>
<section title="Interface" anchor="sec.interface">
<t>This section details the basic operations that must be