
Generate turn type dynamics #1

Merged: bvreede merged 55 commits into main from first-functions on Dec 1, 2023

Conversation

@bvreede (Contributor) commented Mar 31, 2023

Adding utterance properties:

  • Number of words in an utterance (nwords)
  • Number of characters in an utterance (nchar)
  • List of words in the utterance
  • FTO (Floor Transfer Offset), composed of:
    • a check that the conversation is dyadic within the window (boolean; must be True for an FTO to be returned)
    • the FTO calculation itself
  • Overlap properties moved to Overlap calculations #46

Closes #5
Closes #8
Closes #42
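
To make the list above concrete, here is a purely illustrative Python sketch of an utterance object carrying these properties; the class and attribute names are hypothetical and not the actual sktalk API (FTO is left as a placeholder since it depends on the surrounding turns).

    from dataclasses import dataclass, field
    from typing import Optional


    @dataclass
    class UtteranceSketch:
        """Hypothetical utterance record; not the sktalk Utterance class."""
        utterance: str
        begin: int  # onset in ms
        end: int    # offset in ms
        fto: Optional[int] = None  # Floor Transfer Offset, only set in dyadic windows
        words: list = field(init=False)
        nwords: int = field(init=False)
        nchar: int = field(init=False)

        def __post_init__(self):
            # derive word list, word count, and character count from the text
            self.words = self.utterance.split()
            self.nwords = len(self.words)
            self.nchar = len(self.utterance)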

@bvreede bvreede changed the title code from notebook Generate turn type dynamics Mar 31, 2023
@mdingemanse

Issue #8 is relevant here too — FTO (Floor Transfer Offset) is computed as part of the turn dynamics.

Since there's already some windowed computation done (which needs checking, by the way, as that code is not our production code), the number of participants in the prior 10s window should be known, which also means we know whether it's dyadic or triadic.

Comment on lines 63 to 101
# Create a 'window' for each utterance.
# The window looks back 10s from the begin of the current utterance (lookback).
# Only turns that begin within this lookback are included in the window:
# if the prior turn began more than 10s before the current utterance,
# it is not included in the window.
def _createwindow(begin, participant):
    lookback = 10000
    lookfwd = 0
    filter = (df_transitions['begin'] >= (begin - lookback)) & (df_transitions['begin'] <= (begin + lookfwd))
    window = df_transitions.loc[filter]
    # identify who produced each utterance in the window
    window['turnby'] = np.where(window['participant'] == participant, 'self', 'other')
    # calculate the stretch of time covered by the window
    stretch = window['end'].max() - window['begin'].min()
    # calculate sum of all turn durations
    talk_all = window['duration'].sum()
    # calculate amount of talk produced by the participant relative
    # to the total amount of talk in the window
    try:
        talk_rel = window.loc[window['turnby'] == 'self']['duration'].sum() / talk_all
    except ZeroDivisionError:
        talk_rel = pd.NA
    # calculate loading of the channel
    # (1 = no empty space, >1 = overlap, <1 = silences)
    load = talk_all / stretch
    # calculate total number of turns in this time window
    turns_all = len(window.index)
    # calculate number of turns by this participant relative to turns by others
    try:
        turns_rel = len(window[window['turnby'] == 'self'].index) / turns_all
    except ZeroDivisionError:
        turns_rel = pd.NA

    participants = window['participant'].nunique()
    # collect all computed measures
    measures = [talk_all, talk_rel, load, turns_all, turns_rel, participants]
    return measures


This is a first (untested) Python port of my R 'windowed transitions' code. As it needs to run on every single row in the database (creating a 10s window and computing these summary measures for it), it's likely to be quite a drag on performance; optimization and modularization are needed.
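
For illustration only, a hedged sketch of how such a per-row window computation might be driven and collected; it assumes the df_transitions frame and the _createwindow function from the snippet above and is not part of the code under review.

    import pandas as pd

    MEASURE_NAMES = ["talk_all", "talk_rel", "load", "turns_all", "turns_rel", "participants"]

    def add_window_measures(df_transitions):
        # Compute the window measures for every utterance (row); this is the
        # per-row pattern flagged above as a likely performance bottleneck.
        rows = [
            _createwindow(row.begin, row.participant)
            for row in df_transitions.itertuples()
        ]
        measures = pd.DataFrame(rows, columns=MEASURE_NAMES, index=df_transitions.index)
        return pd.concat([df_transitions, measures], axis=1)

Sorting by begin and slicing with searchsorted, or a merge_asof-style join, would likely be cheaper than filtering the full frame once per row.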


(also just testing the review mechanics — don't mind me if I'm doing this wrong)


sonarqubecloud bot commented Apr 7, 2023

SonarCloud Quality Gate failed.

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 37 (rating A)
  • Coverage: 0.0%
  • Duplication: 0.0%

@mdingemanse

Here's some code that is based on trawling the diverse corpora.

First a bit I use to set utterance to NA:

    na_strings <- c("[unk_utterance]","[unk_noise]","[distortion]","[background]",
                    "[background] M","[static]","untranscribed",
                    "[noise]","[inintel]","[distorted]","tlyam kanəw")

Then some code I use to classify types of conduct other than talk (this should really not be hardcoded, ideally)

    # add talk / other conduct classification
    d <- d %>%
      mutate(nature = case_when(
        utterance == "[laugh]" ~ "laugh",
        utterance == "[breath]" ~ "breath",
        utterance %in% c("[cough]","[sneeze]","[nod]","[blow]","[sigh]",
                         "[yawn]","[sniff]","[clearsthroat]","[lipsmack]",
                         "[inhales]","[groan]") ~ utterance,
        is.na(utterance) ~ as.character(NA),
        TRUE ~ "talk"
      ))
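
For the Python side of this PR, a rough pandas counterpart of the classification above might look like this (a sketch only; the utterance and nature column names simply mirror the R code, not necessarily the sktalk schema):

    import pandas as pd

    NON_TALK = ["[cough]", "[sneeze]", "[nod]", "[blow]", "[sigh]",
                "[yawn]", "[sniff]", "[clearsthroat]", "[lipsmack]",
                "[inhales]", "[groan]"]

    def classify_nature(d: pd.DataFrame) -> pd.DataFrame:
        """Label each utterance as laugh, breath, a specific non-talk token, talk, or NA."""
        def nature(u):
            if pd.isna(u):
                return pd.NA
            if u == "[laugh]":
                return "laugh"
            if u == "[breath]":
                return "breath"
            if u in NON_TALK:
                return u
            return "talk"

        d["nature"] = d["utterance"].map(nature)
        return d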

Then some code I use to generate nwords and nchar:

    # create stripped and squished version of utterance and add length measures (note that
    # "\\W+" doesn't play well with unicode, so we count spaces)
    d <- d %>%
      mutate(utterance_stripped = gsub("\\[+[^()]*\\]+?", "", trimws(utterance))) %>%
      mutate(utterance_stripped = gsub("[\\(\\)]+", "", utterance_stripped)) %>%
      mutate(utterance_stripped = ifelse(utterance_stripped == "",NA,str_squish(utterance_stripped))) %>%
      mutate(
           nwords = str_count(utterance_stripped, " ") + 1, # crudely count spaces
           nchar = str_count(utterance_stripped)            # NOTE that nchar = bad for ideographs
           ) 
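
And a hedged Python counterpart of the stripping and length measures, again only as a sketch (the regular expressions approximate the R ones and have not been tested against the corpora):

    import pandas as pd

    def add_length_measures(d: pd.DataFrame) -> pd.DataFrame:
        """Strip bracketed annotations and parentheses, squish whitespace, then count words and characters."""
        stripped = (
            d["utterance"]
            .str.replace(r"\[[^\[\]]*\]", "", regex=True)  # drop [annotations]
            .str.replace(r"[()]+", "", regex=True)         # drop parentheses
            .str.replace(r"\s+", " ", regex=True)          # squish whitespace
            .str.strip()
            .replace("", pd.NA)
        )
        d["utterance_stripped"] = stripped
        d["nwords"] = stripped.str.count(" ") + 1  # crudely count spaces, as in the R code
        d["nchar"] = stripped.str.len()            # same caveat for ideographic scripts
        return d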

@mdingemanse

mdingemanse commented Nov 3, 2023

Also, this reminds me that we have two more utterance-level columns that are really quite useful, but that relate to the corpus (and therefore often the language) as a whole:

  • rank : the rank of this utterance in the frequency distribution of utterances in this language
  • n : the number of tokens of this utterance type in this language

E.g. below you can see that the n of î (third row) is 53: there are 53 utterances like this in the language. Its rank is 1: it is the most frequently attested turn format in the language.

Debatable whether this is truly an utterance feature, but we do use it for a lot of things, e.g. to pull up the most frequent utterance formats in a language, identify 'streaks' of similarly frequent utterances, etc.

[image: table of utterance types with their n and rank values]
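
For what it's worth, a rough pandas sketch of how n and rank per utterance type could be derived (illustrative only; the language and utterance column names are assumptions):

    import pandas as pd

    def add_rank_and_n(d: pd.DataFrame) -> pd.DataFrame:
        """Per language: n = token count of this utterance type, rank = 1 for the most frequent type."""
        d["n"] = d.groupby(["language", "utterance"])["utterance"].transform("size")
        d["rank"] = d.groupby("language")["n"].rank(method="dense", ascending=False)
        return d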

@bvreede bvreede mentioned this pull request Nov 21, 2023
@bvreede bvreede marked this pull request as ready for review November 28, 2023 10:07
@jiqicn (Contributor) left a comment


Hi @bvreede, I have left mostly minor questions and remarks.

Let me know if you want to further discuss any of the comments.

Resolved review threads on sktalk/corpus/conversation.py, sktalk/corpus/parsing/cha.py, and tests/corpus/test_conversation.py.
@@ -1,11 +1,12 @@
import warnings
from typing import Optional
from .utterance import Utterance
from .write.writer import Writer


class Conversation(Writer):
@jiqicn (Contributor) commented:

A minor one: from a semantics and OOP perspective, I'm having a bit of a hard time imagining what/how a conversation can inherit from a writer. Is there any special reason that the base class is called Writer?

@bvreede (Contributor, Author) replied:

Thanks for this question, and I hope my answer makes sense; this is all quite new to me too!
Writer is not a proper parent class, but a class that is used to collect writer functionality that can be used by different objects. Is this weird architecture? Please let me know 😅 CC @carschno

@carschno (Collaborator) replied:

I fully understand the conceptual doubts, as Writer really does not make sense as a parent class for a Conversation. However, the Writer here takes the role of a MixIn in Python terminology (similar to an Interface in other languages). Python does not make a technical distinction between a class serving as a parent class in the strict sense, and a class serving as a MixIn.
The distinction is typically visible in the order of inheritance, where the parent class comes first, followed by one or multiple MixIns (e.g. Conversation(AbstractConversation, Writer)). In this case, however, Conversation does not have a parent class in the common sense, but only a MixIn.

Specifically, the Writer class enables other classes to inherit and/or override common serialization methods -- for writing, in this case.

@jiqicn (Contributor) replied:

Nice explanation @bvreede and @carschno! I didn't really check the other modules/classes defined outside this PR, so I was mostly missing the context there. But since you called it a MixIn, I'm expecting to either see some other similar classes provided as optional features to be inherited, or classes other than Conversation that will also use this particular feature (and maybe some other features). Otherwise, making Writer a MixIn may not make a lot of sense (at least to me). But I also understand that this can be future work.

@bvreede (Contributor, Author) replied:

Thanks for the explanation @carschno! I didn't know about MixIns yet, but having a separate class that provides some default methods here is logical to me. We do indeed have another class that inherits from Writer: Corpus. There is some common functionality that we want to keep DRY, as both Conversation and Corpus are objects that will, for example, be saved as JSON (#21) and CSV (#47).
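
To illustrate the pattern in miniature (hypothetical names and methods, not the actual sktalk Writer interface):

    import json


    class WriterMixin:
        """Collects shared serialization behaviour; classes using it must provide asdict()."""

        def write_json(self, path):
            with open(path, "w", encoding="utf-8") as f:
                json.dump(self.asdict(), f)


    class Conversation(WriterMixin):
        """Simplified stand-in; only shows the MixIn relationship."""

        def __init__(self, utterances):
            self.utterances = utterances

        def asdict(self):
            return {"utterances": self.utterances}


    class Corpus(WriterMixin):
        """A second user of the same MixIn, keeping the writing logic DRY."""

        def __init__(self, conversations):
            self.conversations = conversations

        def asdict(self):
            return {"conversations": [c.asdict() for c in self.conversations]}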

@mdingemanse

@bvreede some belated responses to this

  • you have access to that Basecamp thread now
  • I agree that your Scenario 1 is perfectly sensible, and that Scenarios 2 and 3 are increasingly questionable. However, there is nothing we can do about that without making another arbitrary determination of when an utterance stops being "short". This is why I introduced prior_by (link to Basecamp comment). Quoting from my Basecamp thread:

this allows us to decide whether FTO should only be done when prior is by "other" (most conservative), versus perhaps also when prior row in db is by "self during other" (also reasonable)

So my thinking at the time was that folks might want to choose between the most conservative measure versus a slightly looser one. And for ElPaCo we went with the looser one:

next we deal with self during other. The difference is hard to see but now turns whose prior is a self-during-other are also getting an FTO, timed relative to the nearest prior turn by other (instead of being treated as a self-transition)
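
Purely for illustration, a hedged sketch of what such a policy switch might look like in code (field names like prior_by and the exact arithmetic are assumptions, not the sktalk or ElPaCo implementation):

    def compute_fto(turn, nearest_prior_by_other, prior_by, allow_self_during_other=True):
        """Floor Transfer Offset: onset of this turn minus offset of the nearest prior turn by other.

        Conservative policy: only compute FTO when the immediately prior row is by 'other'.
        Looser policy (allow_self_during_other=True): also compute it when the prior row is a
        'self during other' turn, still timed against the nearest prior turn by other.
        """
        eligible = prior_by == "other" or (allow_self_during_other and prior_by == "self_during_other")
        if not eligible or nearest_prior_by_other is None:
            return None
        return turn["begin"] - nearest_prior_by_other["end"]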


sonarqubecloud bot commented Dec 1, 2023

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 1 (rating A)
  • Coverage: 93.3%
  • Duplication: 0.0%

@bvreede bvreede requested a review from mdingemanse December 1, 2023 14:03
@bvreede bvreede dismissed mdingemanse’s stale review December 1, 2023 14:04

review comments were used to provide clarifications

@bvreede bvreede merged commit ccd4bc4 into main Dec 1, 2023
9 checks passed
@bvreede bvreede deleted the first-functions branch December 1, 2023 14:06