Generate turn type dynamics #1
Changes from 46 commits
Original file line number | Diff line number | Diff line change | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -1,11 +1,12 @@ | ||||||||||||
import warnings | ||||||||||||
from typing import Optional | ||||||||||||
from .utterance import Utterance | ||||||||||||
from .write.writer import Writer | ||||||||||||
|
||||||||||||
|
||||||||||||
class Conversation(Writer): | ||||||||||||
def __init__( | ||||||||||||
self, utterances: list["Utterance"], metadata: dict = None # noqa: F821 | ||||||||||||
self, utterances: list["Utterance"], metadata: Optional[dict] = None, suppress_warnings: bool = False # noqa: F821 | ||||||||||||
) -> None: | ||||||||||||
"""Representation of a transcribed conversation | ||||||||||||
|
||||||||||||
|
@@ -26,7 +27,7 @@ def __init__( | |||||||||||
if not isinstance(utterance, Utterance): | ||||||||||||
raise TypeError(errormsg) | ||||||||||||
# The list can be empty. This would be weird and the user needs to be warned. | ||||||||||||
if not self._utterances: | ||||||||||||
if not self._utterances and not suppress_warnings: | ||||||||||||
warnings.warn( | ||||||||||||
"This conversation appears to be empty: no Utterances are read.") | ||||||||||||
|
||||||||||||
|
@@ -68,3 +69,169 @@ def asdict(self): | |||||||||||
dict: dictionary containing Conversation metadata and Utterances | ||||||||||||
""" | ||||||||||||
return self._metadata | {"Utterances": [u.asdict() for u in self._utterances]} | ||||||||||||
|
||||||||||||
def _subconversation_by_index(self, | ||||||||||||
Review comment: A function defined but never used. |
||||||||||||
index: int, | ||||||||||||
before: int = 0, | ||||||||||||
after: Optional[int] = None) -> "Conversation": | ||||||||||||
"""Select utterances to provide context as a sub-conversation | ||||||||||||
|
||||||||||||
Args: | ||||||||||||
index (int): The index of the utterance for which to provide context | ||||||||||||
before (int, optional): The number of utterances prior to indicated utterance. Defaults to 0. | ||||||||||||
after (int, optional): The number of utterances after the indicated utterance. Defaults to None, | ||||||||||||
which then assumes the same value as `before`. | ||||||||||||
|
||||||||||||
Raises: | ||||||||||||
IndexError: Index provided must be within range of utterances | ||||||||||||
|
||||||||||||
Returns: | ||||||||||||
Conversation: Conversation object without metadata, containing a reduced set of utterances | ||||||||||||
""" | ||||||||||||
if index < 0 or index >= len(self._utterances): | ||||||||||||
raise IndexError("Index out of range") | ||||||||||||
|
||||||||||||
if after is None: | ||||||||||||
after = before | ||||||||||||
if index - before < 0: | ||||||||||||
before = index | ||||||||||||
if index + after + 1 > len(self._utterances): | ||||||||||||
after = len(self._utterances) - index - 1 | ||||||||||||
returned_utterances = self._utterances[index-before:index+after+1] | ||||||||||||
|
||||||||||||
return Conversation(utterances=returned_utterances, suppress_warnings=True) | ||||||||||||
|
||||||||||||
def _subconversation_by_time(self, | ||||||||||||
index: int, | ||||||||||||
before: int = 0, | ||||||||||||
after: Optional[int] = None, | ||||||||||||
exclude_utterance_overlap: bool = False) -> "Conversation": | ||||||||||||
"""Select utterances to provide context as a sub-conversation | ||||||||||||
|
||||||||||||
Args: | ||||||||||||
index (int): The index of the utterance for which to provide context | ||||||||||||
before (int, optional): The time in ms preceding the utterance's begin. Defaults to 0. | ||||||||||||
after (int, optional): The time in ms following the utterance's end. Defaults to None, | ||||||||||||
which then assumes the same value as `before`. | ||||||||||||
exclude_utterance_overlap (bool, optional): If True, the duration of the | ||||||||||||
utterance itself is not used to identify overlapping utterances, and only | ||||||||||||
the window before or after the utterance is used. Defaults to False. | ||||||||||||
|
||||||||||||
Returns: | ||||||||||||
Conversation: Conversation object without metadata, containing a reduced set of utterances | ||||||||||||
""" | ||||||||||||
if index < 0 or index >= len(self._utterances): | ||||||||||||
raise IndexError("Index out of range") | ||||||||||||
|
||||||||||||
if after is None: | ||||||||||||
after = before | ||||||||||||
try: | ||||||||||||
begin = self._utterances[index].time[0] - before | ||||||||||||
end = self._utterances[index].time[1] + after | ||||||||||||
if exclude_utterance_overlap and before == 0: # only overlap with window following utterance | ||||||||||||
begin = self._utterances[index].time[1] | ||||||||||||
elif exclude_utterance_overlap and after == 0: # only overlap with window preceding utterance | ||||||||||||
end = self._utterances[index].time[0] | ||||||||||||
returned_utterances = [ | ||||||||||||
u for u in self._utterances if self.overlap(begin, end, u.time) or u == self._utterances[index]] | ||||||||||||
except (TypeError, IndexError): | ||||||||||||
|
||||||||||||
returned_utterances = [] | ||||||||||||
return Conversation(utterances=returned_utterances, suppress_warnings=True) | ||||||||||||
|
||||||||||||
def count_participants(self, except_none: bool = False) -> int: | ||||||||||||
"""Count the number of participants in a conversation | ||||||||||||
|
||||||||||||
Importantly: if one of the utterances has no participant, it is counted | ||||||||||||
as a separate participant (None). If you want to exclude these, set | ||||||||||||
`except_none` to True. | ||||||||||||
|
||||||||||||
Args: | ||||||||||||
except_none (bool, optional): if `True`, utterances without a participant are not counted. Defaults to `False`. | ||||||||||||
|
||||||||||||
Returns: | ||||||||||||
int: number of participants | ||||||||||||
""" | ||||||||||||
participants = [u.participant for u in self.utterances] | ||||||||||||
if except_none: | ||||||||||||
participants = [p for p in participants if p is not None] | ||||||||||||
Review comment on lines +161 to +163: To simplify:
Suggested change
|
||||||||||||
return len(set(participants)) | ||||||||||||
|
||||||||||||
def _update(self, field: str, values: list, **kwargs): | ||||||||||||
""" | ||||||||||||
Update all utterances in the conversation with calculated values | ||||||||||||
|
||||||||||||
This function also stores relevant arguments in the Conversation metadata. | ||||||||||||
|
||||||||||||
Args: | ||||||||||||
field (str): field of the Utterance to update | ||||||||||||
values (list): list of values to update each utterance with | ||||||||||||
kwargs (dict): information about the calculation to store in the Conversation metadata | ||||||||||||
""" | ||||||||||||
if len(values) != len(self.utterances): | ||||||||||||
Review comment: A minor thing, but I would suggest using the protected variable instead:
Suggested change
|
||||||||||||
raise ValueError( | ||||||||||||
"The number of values must match the number of utterances") | ||||||||||||
metadata = {field: kwargs} | ||||||||||||
try: | ||||||||||||
self._metadata["Calculations"].update(metadata) | ||||||||||||
except KeyError: | ||||||||||||
self._metadata = {"Calculations": metadata} | ||||||||||||
Review comment on lines +181 to +184: To simplify:
Suggested change
|
||||||||||||
for index, utterance in enumerate(self.utterances): | ||||||||||||
Suggested change
|
||||||||||||
setattr(utterance, field, values[index]) | ||||||||||||
|
||||||||||||
def calculate_FTO(self, window: int = 10000, planning_buffer: int = 200, n_participants: int = 2): | ||||||||||||
"""Calculate Floor Transfer Offset (FTO) per utterance | ||||||||||||
|
||||||||||||
FTO is defined as the difference between the time that a turn starts and the | ||||||||||||
end of the most relevant prior turn by the other participant, which is not | ||||||||||||
necessarily the prior utterance. | ||||||||||||
|
||||||||||||
An utterance does not receive an FTO if there are preceding utterances | ||||||||||||
within the window that do not have timing information, or if it lacks | ||||||||||||
timing information itself. | ||||||||||||
|
||||||||||||
To be a relevant prior turn, the following conditions must be met, respective to utterance U: | ||||||||||||
- the utterance must be by another speaker than U | ||||||||||||
- the utterance by the other speaker must be the most recent utterance by that speaker | ||||||||||||
- the utterance must have started more than `planning_buffer` ms before the start of utterance U | ||||||||||||
- the utterance must be partly or entirely within the context window (`window` ms prior | ||||||||||||
to the start of utterance U) | ||||||||||||
- within the context window, there must be a maximum of `n_participants` speakers. | ||||||||||||
|
||||||||||||
Args: | ||||||||||||
window (int, optional): the time in ms prior to utterance in which a | ||||||||||||
relevant preceding utterance can be found. Defaults to 10000. | ||||||||||||
planning_buffer (int, optional): minimum speaking time in ms to allow for a response. | ||||||||||||
Defaults to 200. | ||||||||||||
n_participants (int, optional): maximum number of participants overlapping with | ||||||||||||
the utterance and preceding window. Defaults to 2. | ||||||||||||
""" | ||||||||||||
values = [] | ||||||||||||
for index, utterance in enumerate(self.utterances): | ||||||||||||
Suggested change
|
||||||||||||
sub = self._subconversation_by_time( | ||||||||||||
index=index, | ||||||||||||
before=window, | ||||||||||||
after=0, | ||||||||||||
exclude_utterance_overlap=True) | ||||||||||||
if not 2 <= sub.count_participants() <= n_participants: | ||||||||||||
values.append(None) | ||||||||||||
continue | ||||||||||||
potentials = [ | ||||||||||||
u for u in sub.utterances if utterance.relevant_for_fto(u, planning_buffer)] | ||||||||||||
try: | ||||||||||||
relevant = potentials[-1] | ||||||||||||
values.append(relevant.until(utterance)) | ||||||||||||
except IndexError: | ||||||||||||
values.append(None) | ||||||||||||
|
||||||||||||
self._update("FTO", values, | ||||||||||||
window=window, | ||||||||||||
planning_buffer=planning_buffer, | ||||||||||||
n_participants=n_participants) | ||||||||||||
|
||||||||||||
@staticmethod | ||||||||||||
def overlap(begin: int, end: int, time: list): | ||||||||||||
# there is overlap if: | ||||||||||||
# time[0] falls between begin and end | ||||||||||||
# time[1] falls between begin and end | ||||||||||||
# time[0] is before begin and time[1] is after end | ||||||||||||
if time is None: | ||||||||||||
return False | ||||||||||||
if begin <= time[0] <= end or begin <= time[1] <= end: | ||||||||||||
return True | ||||||||||||
return time[0] <= begin and time[1] >= end | ||||||||||||
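The three cases listed in the comments of `overlap` can be collapsed into a single comparison, provided both intervals are well-formed (`begin <= end` and `time[0] <= time[1]`). The following is a minimal sketch of that equivalence, not code from this PR; `overlap_original` and `overlap_simplified` are hypothetical names used only for the comparison.

```python
from itertools import product
from typing import Optional, Sequence


def overlap_original(begin: int, end: int, time: Optional[Sequence[int]]) -> bool:
    """Three-case check, mirroring the logic of `overlap` in the diff."""
    if time is None:
        return False
    if begin <= time[0] <= end or begin <= time[1] <= end:
        return True
    return time[0] <= begin and time[1] >= end


def overlap_simplified(begin: int, end: int, time: Optional[Sequence[int]]) -> bool:
    """Closed intervals overlap when each one starts no later than the other ends.

    Assumes begin <= end and time[0] <= time[1].
    """
    if time is None:
        return False
    return time[0] <= end and time[1] >= begin


# Exhaustively compare both versions on small, well-formed intervals.
points = range(6)
for begin, end, start, stop in product(points, repeat=4):
    if begin <= end and start <= stop:
        expected = overlap_original(begin, end, [start, stop])
        assert overlap_simplified(begin, end, [start, stop]) == expected
```

Note that both versions treat intervals that merely touch at an endpoint as overlapping, since all comparisons are inclusive.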
|
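In `_update`, the `except KeyError` branch assigns a new dict to `self._metadata`, which would discard any keys already stored there. A minimal sketch of an alternative using `dict.setdefault` is shown below. It is one possible simplification and not necessarily the change suggested in the review; `record_calculation` and the `"source"` key are hypothetical and exist only for this illustration.

```python
def record_calculation(metadata: dict, field: str, **kwargs) -> dict:
    """Store calculation arguments under "Calculations" without dropping other keys."""
    # setdefault returns the existing "Calculations" dict, or inserts an empty one first.
    metadata.setdefault("Calculations", {})[field] = kwargs
    return metadata


# Hypothetical pre-existing metadata that should survive the update.
metadata = {"source": "example"}
record_calculation(metadata, "FTO", window=10000, planning_buffer=200, n_participants=2)
```

After the call, `metadata` still contains the `"source"` key alongside the new `"Calculations"` entry.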
A minor one: from a semantic and OOP perspective, I'm having a hard time imagining how a conversation can inherit from a writer. Is there any special reason that the base class is called `Writer`?
Thanks for this question, and I hope my answer makes sense; this is all quite new to me too! `Writer` is not a proper parent class, but a class that is used to collect writer functionality that can be used by different objects. Is this weird architecture? Please let me know 😅 CC @carschno
I fully understand the conceptual doubts, as `Writer` really does not make sense as a parent class for a `Conversation`. However, the `Writer` here takes the role of a MixIn in Python terminology (similar to an Interface in other languages). Python does not make a technical distinction between a class serving as a parent class in the strict sense and a class serving as a MixIn.
The distinction is typically visible in the order of inheritance, where the parent class comes first, followed by one or more MixIns (e.g. `Conversation(AbstractConversation, Writer)`). In this case, however, `Conversation` does not have a parent class in the common sense, but only a MixIn.
Specifically, the `Writer` class enables other classes to inherit and/or override common serialization methods -- for writing, in this case.
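To illustrate the inheritance order described in this comment, here is a minimal sketch. `AbstractConversation` is the hypothetical parent class from the example in the comment; the `to_json` method and the simplified `asdict` are assumptions made for illustration and do not reflect the actual API of `Writer`.

```python
import json


class Writer:
    """Hypothetical mixin: provides serialization, relies on the host class for `asdict()`."""

    def to_json(self) -> str:
        # `asdict` is not defined here; the class that mixes Writer in must supply it.
        return json.dumps(self.asdict())


class AbstractConversation:
    """Hypothetical parent class in the strict sense."""

    def __init__(self, utterances: list) -> None:
        self.utterances = utterances


class Conversation(AbstractConversation, Writer):
    """Parent class listed first, mixin second."""

    def asdict(self) -> dict:
        return {"Utterances": self.utterances}


conversation = Conversation(["hello", "hi"])
# The method resolution order shows the parent class ahead of the mixin.
mro_names = [cls.__name__ for cls in Conversation.__mro__]
serialized = conversation.to_json()
```

Python treats both base classes the same way; only the position in the base list and the convention that the mixin holds no state of its own mark `Writer` as a mixin here.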
Nice explanation @bvreede and @carschno! I didn't really check the other modules/classes defined outside this PR, so I mostly missed the context there. But since you called it a MixIn, I'd expect to see either other similar classes provided as optional features to be inherited, or classes other than `Conversation` that will also use this particular feature (and maybe some other features). Otherwise, making `Writer` a MixIn may not make a lot of sense (at least to me). But I also understand that this can be future work.
Thanks for the explanation @carschno! I didn't know about MixIns yet, but having a separate class that is used to provide some default methods here is logical to me. We indeed have another class that inherits from `Writer`: `Corpus`. There is some common functionality that we want to keep DRY, as both `Conversation` and `Corpus` are objects that will be saved as e.g. JSON (#21) and CSV (#47).