Generate turn type dynamics #1
Conversation
Issue #8 is relevant here too: FTO (Floor Transfer Offset) is computed as part of the turn dynamics. Since there's already some windowed computation done (which needs checking, by the way, as that code is not our production code), the number of participants in the prior 10s window should be known, which also means we know whether it's dyadic or triadic.
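As a minimal sketch of that last point, assuming the window is a pandas DataFrame with a participant column as in the snippet below (the helper name and return labels are mine, not production code):

def classify_window(window):
    # count distinct speakers whose turns begin in the 10s lookback window
    n_participants = window['participant'].nunique()
    if n_participants == 2:
        return 'dyadic'
    if n_participants == 3:
        return 'triadic'
    return f'{n_participants} participants'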
scikit-talk/turndynamics.py
import numpy as np
import pandas as pd

# Create a 10s 'window' for each utterance.
# The window looks back 10s from the begin of the current utterance (lookback).
# Only turns that begin within this lookback are included in the window,
# so if the prior turn began more than 10s before the current utterance,
# the prior turn is not included.
def _createwindow(begin, participant):
    lookback = 10000  # milliseconds
    lookfwd = 0
    mask = (df_transitions['begin'] >= (begin - lookback)) & \
           (df_transitions['begin'] <= (begin + lookfwd))
    window = df_transitions.loc[mask].copy()
    # identify who produced each utterance in the window
    window['turnby'] = np.where(window['participant'] == participant, 'self', 'other')
    # duration of the stretch of time covered by the window
    stretch = window['end'].max() - window['begin'].min()
    # sum of all turn durations
    talk_all = window['duration'].sum()
    # amount of talk produced by this participant relative to the total amount
    # of talk in the window (guard against division by zero, which pandas
    # would otherwise turn into NaN/inf rather than raising ZeroDivisionError)
    talk_rel = window.loc[window['turnby'] == 'self', 'duration'].sum() / talk_all if talk_all else pd.NA
    # loading of the channel (1 = no empty space, >1 = overlap, <1 = silences)
    load = talk_all / stretch if stretch else pd.NA
    # total number of turns in this time window
    turns_all = len(window.index)
    # number of turns by this participant relative to all turns in the window
    turns_rel = len(window[window['turnby'] == 'self'].index) / turns_all if turns_all else pd.NA
    # number of distinct participants active in the window
    participants = window['participant'].nunique()
    # collect all computed measures
    measures = [talk_all, talk_rel, load, turns_all, turns_rel, participants]
    return measures
This is a first (untested) python port of my R 'windowed transitions' code. As it needs to run on every single row in the db (creating a 10s window and computing these summary measures for it) it's likely to be quite a drag on performance — optimization and modularization needed.
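To make the per-row pattern concrete, here is a rough sketch of how the function above could be applied across the whole dataframe with pandas (untested, and certainly not the optimized version asked for here):

# apply the window computation to every utterance and attach the results
measure_names = ['talk_all', 'talk_rel', 'load', 'turns_all', 'turns_rel', 'participants']
measures = df_transitions.apply(
    lambda row: pd.Series(_createwindow(row['begin'], row['participant']), index=measure_names),
    axis=1,
)
df_transitions = pd.concat([df_transitions, measures], axis=1)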
(also just testing the review mechanics; don't mind me if I'm doing this wrong)
Here's some code that is based on trawling the diverse corpora. First a bit I use to set
Then some code I use to classify types of conduct other than talk (this should really not be hardcoded, ideally)
Then some code I use to generate
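Purely to illustrate the kind of hardcoded conduct classification mentioned above (the labels and patterns here are hypothetical, not the actual corpus conventions):

import re

# hypothetical mapping from annotation patterns to conduct types; ideally this
# would come from configuration rather than being hardcoded
CONDUCT_PATTERNS = {
    'laughter': re.compile(r'laugh', re.IGNORECASE),
    'gesture': re.compile(r'point|nod|wave', re.IGNORECASE),
}

def classify_conduct(utterance_text):
    # return the first matching non-talk conduct type, or 'talk' as default
    for label, pattern in CONDUCT_PATTERNS.items():
        if pattern.search(utterance_text):
            return label
    return 'talk'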
Also this reminds me that we have two more
Debatable whether this is truly an utterance feature, but we do use it for a lot of things, e.g. to pull up the most frequent utterance formats in a language, or to identify 'streaks' of similarly frequent utterances.
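A minimal sketch of that frequency-based use, assuming a dataframe of utterances with an utterance column (column names are illustrative):

# rank utterance formats by how frequent they are in the corpus
freq = df_utterances['utterance'].value_counts()
df_utterances['format_rank'] = df_utterances['utterance'].map(freq.rank(ascending=False, method='dense'))

# consecutive utterances sharing a rank can then be grouped into 'streaks'
df_utterances['streak_id'] = (df_utterances['format_rank'] != df_utterances['format_rank'].shift()).cumsum()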
Hi @bvreede, I have left mostly minor questions and remarks.
Let me know if you want to further discuss any of the comments.
@@ -1,11 +1,12 @@
import warnings
from typing import Optional
from .utterance import Utterance
from .write.writer import Writer


class Conversation(Writer):
A minor one: from a semantics and OOP perspective, I'm having a bit of a hard time imagining what/how a conversation can inherit from a writer. Is there any special reason that the base class is called Writer?
Thanks for this question, and I hope my answer makes sense; this is all quite new to me too! Writer is not a proper parent class, but a class that is used to collect writer functionality that can be used by different objects. Is this weird architecture? Please let me know 😅 CC @carschno
I fully understand the conceptual doubts, as Writer really does not make sense as a parent class for a Conversation. However, the Writer here takes the role of a MixIn in Python terminology (similar to an Interface in other languages). Python does not make a technical distinction between a class serving as a parent class in the strict sense and a class serving as a MixIn.
The distinction is typically visible in the order of inheritance, where the parent class comes first, followed by one or multiple MixIns (e.g. Conversation(AbstractConversation, Writer)). In this case, however, Conversation does not have a parent class in the common sense, but only a MixIn.
Specifically, the Writer class enables other classes to inherit and/or override common serialization methods, for writing in this case.
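For illustration, a minimal sketch of that arrangement (the method names are invented for the example, not the actual scikit-talk API):

import json


class Writer:
    """MixIn collecting serialization behaviour shared by several classes."""

    def asdict(self):
        # each class that mixes Writer in provides its own mapping
        raise NotImplementedError

    def write_json(self, path):
        # common writing functionality inherited by every mixing class
        with open(path, 'w') as f:
            json.dump(self.asdict(), f)


class Conversation(Writer):
    def __init__(self, utterances):
        self.utterances = utterances

    def asdict(self):
        return {'utterances': self.utterances}


class Corpus(Writer):
    def __init__(self, conversations):
        self.conversations = conversations

    def asdict(self):
        return {'conversations': [c.asdict() for c in self.conversations]}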
Nice explanation @bvreede and @carschno! I didn't really check the other modules/classes defined outside this PR, so mostly I missed the context there. But since you called it a MixIn, I'm expecting either to see some other similar classes provided as optional features to be inherited, or classes other than Conversation that will also use this particular feature (and maybe some other features). Otherwise, making Writer a MixIn may not make a lot of sense (at least to me). But I also understand that this can be future work.
Thanks for the explanation @carschno! I didn't know about MixIns yet, but having a separate class that is used to provide some default methods here is logical to me. We indeed have another class that inherits from Writer: Corpus. There is some common functionality that we want to keep DRY, as both Conversation and Corpus are objects that will e.g. be saved as json (#21) and csv (#47).
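Using the sketch classes from the comment above, the shared behaviour would then look something like this (again with invented method names):

conv = Conversation(utterances=['hello', 'hi there'])
corpus = Corpus(conversations=[conv])

# both objects pick up the same serialization methods from the Writer mixin
conv.write_json('conversation.json')
corpus.write_json('corpus.json')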
Co-authored-by: Ji Qi <[email protected]>
@bvreede some belated responses to this.
So my thinking at the time was that folks might want to choose between the most conservative measure and a slightly looser one. And for ElPaCo we went with the looser one:
Review comments were used to provide clarifications.
Adding utterance properties (a rough sketch of these follows below):
- nwords (number of words)
- nchar (number of characters)
- FTO (Floor Transfer Offset)
- Overlap properties: moved to Overlap calculations #46

Closes #5
Closes #8
Closes #42
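A rough sketch of what the added utterance properties measure (the function and argument names here are illustrative, not the final scikit-talk API):

def utterance_properties(text, begin_ms, prior_end_ms=None):
    nwords = len(text.split())   # number of words in the utterance
    nchar = len(text)            # number of characters in the utterance
    # Floor Transfer Offset: time between the end of the prior turn and the
    # begin of this one; negative values indicate overlap
    fto = begin_ms - prior_end_ms if prior_end_ms is not None else None
    return {'nwords': nwords, 'nchar': nchar, 'FTO': fto}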