Generate turn type dynamics (#1)

* code from notebook
* add turndynamics code from notebook
* instructions for disabling the bloody githook
* move code from test to notebook
* remove notebook from repository
* start adding utterance functions
* add calculated fields to dataclass
* add python 3.6 to show that it breaks
* add python 3.8 to show that it breaks
* add python 3.6 to show that it breaks
* add python 3.8 to show that it breaks
* remove earlier python versions again
* add fields to initial data class
* implement until method calculating time differences between utterances
* implement subconversation and until next method at conversation object
* autopep8
* elaborate subconversation
* move time processing to utterance
* object oriented and small fixes
* refactor post init
* add subconversation functionality
* subconversation can select based on index or time
* make linter happy
* also pack arguments for second subconversation test
* fix linting issues
* add dyadic property
* apply conversation wide calculations dyadic and time to nxt
* subconversation is internal
* count number of participants
* calculate FTO
* remove old code
* address linter comments
* address linter issues and update fto calculation
* fix noqa
* fix noqa
* calculations update in metadata corrected
* allow warning suppression on empty conversations inside subconversation
* refer to hidden _utterances instead of property
* allow participant counting to exclude None
* add test for FTO calculation
* ensure participant count does not include future utterances
* split subconversation into two functions
* fix linter issue
* update example notebook
* Update sktalk/corpus/conversation.py
  Co-authored-by: Ji Qi <[email protected]>
* Update sktalk/corpus/conversation.py
  Co-authored-by: Ji Qi <[email protected]>
* add comments re: error
* Update sktalk/corpus/conversation.py
  Co-authored-by: Ji Qi <[email protected]>
* rewrite FTO calculation
* rename overlap function to make it available
* update FTO calculation to account for partial overlap
* refactor overlap functions
* Update sktalk/corpus/parsing/cha.py
  Co-authored-by: Ji Qi <[email protected]>

---------

Co-authored-by: Ji Qi <[email protected]>
elpaco-escience · Dec 1, 2023 · ccd4bc4
1 parent 9399445
commit ccd4bc4
Showing 11 changed files with 745 additions and 101 deletions.
diff --git a/.githooks/pre-commit b/.githooks/pre-commit
@@ -2,6 +2,8 @@
 
 ### To enable this githook, run:
 ### git config --local core.hooksPath .githooks
+### to disable:
+### git config --unset core.hooksPath
 
 echo "Script $0 triggered ..."
 

diff --git a/docs/notebooks/example.ipynb b/docs/notebooks/example.ipynb
@@ -63,7 +63,7 @@
     {
      "data": {
       "text/plain": [
-       "<sktalk.corpus.conversation.Conversation at 0x10ea2bd60>"
+       "<sktalk.corpus.conversation.Conversation at 0x116bc4af0>"
       ]
      },
      "execution_count": 2,
@@ -92,16 +92,16 @@
     {
      "data": {
       "text/plain": [
-       "[Utterance(utterance='0', participant='S', time=(0, 1500), begin='00:00:00.000', end='00:00:01.500', metadata=None),\n",
-       " Utterance(utterance=\"mm I'm glad I saw you⇗\", participant='S', time=(1500, 2775), begin='00:00:01.500', end='00:00:02.775', metadata=None),\n",
-       " Utterance(utterance=\"I thought I'd lost you (0.3)\", participant='S', time=(2775, 3773), begin='00:00:02.775', end='00:00:03.773', metadata=None),\n",
-       " Utterance(utterance=\"⌈no I've been here for a whi:le⌉,\", participant='H', time=(4052, 5515), begin='00:00:04.052', end='00:00:05.515', metadata=None),\n",
-       " Utterance(utterance='⌊xxx⌋ (0.3)', participant='S', time=(4052, 5817), begin='00:00:04.052', end='00:00:05.817', metadata=None),\n",
-       " Utterance(utterance=\"⌊hm:: (.) if ʔI couldn't boʔrrow, (1.3) the second (0.2) book of readings fo:r\", participant='S', time=(6140, 9487), begin='00:00:06.140', end='00:00:09.487', metadata=None),\n",
-       " Utterance(utterance='commu:nicating acro-', participant='H', time=(12888, 14050), begin='00:00:12.888', end='00:00:14.050', metadata=None),\n",
-       " Utterance(utterance='no: for family gender and sexuality', participant='H', time=(14050, 17014), begin='00:00:14.050', end='00:00:17.014', metadata=None),\n",
-       " Utterance(utterance=\"+≋ ah: that's the second on is itʔ\", participant='S', time=(17014, 18611), begin='00:00:17.014', end='00:00:18.611', metadata=None),\n",
-       " Utterance(utterance=\"+≋ I think it's s⌈ame family gender⌉ has a second book\", participant='H', time=(18611, 21090), begin='00:00:18.611', end='00:00:21.090', metadata=None)]"
+       "[Utterance(utterance='0', participant='S', time=[0, 1500], begin='00:00:00.000', end='00:00:01.500', metadata=None, utterance_clean='S x150_1500x15', utterance_list=['S', 'x150_1500x15'], n_words=2, n_characters=13, time_to_next=None, dyadic=None, FTO=None),\n",
+       " Utterance(utterance=\"mm I'm glad I saw you⇗\", participant='S', time=[1500, 2775], begin='00:00:01.500', end='00:00:02.775', metadata=None, utterance_clean='S mm Im glad I saw you x151500_2775x15', utterance_list=['S', 'mm', 'Im', 'glad', 'I', 'saw', 'you', 'x151500_2775x15'], n_words=8, n_characters=31, time_to_next=None, dyadic=None, FTO=None),\n",
+       " Utterance(utterance=\"I thought I'd lost you (0.3)\", participant='S', time=[2775, 3773], begin='00:00:02.775', end='00:00:03.773', metadata=None, utterance_clean='S I thought Id lost you x152775_3773x15 x153773_4052x15', utterance_list=['S', 'I', 'thought', 'Id', 'lost', 'you', 'x152775_3773x15', 'x153773_4052x15'], n_words=8, n_characters=48, time_to_next=None, dyadic=None, FTO=None),\n",
+       " Utterance(utterance=\"⌈no I've been here for a whi:le⌉,\", participant='H', time=[4052, 5515], begin='00:00:04.052', end='00:00:05.515', metadata=None, utterance_clean='H no Ive been here for a while x154052_5515x15', utterance_list=['H', 'no', 'Ive', 'been', 'here', 'for', 'a', 'while', 'x154052_5515x15'], n_words=9, n_characters=38, time_to_next=None, dyadic=None, FTO=None),\n",
+       " Utterance(utterance='⌊xxx⌋ (0.3)', participant='S', time=[4052, 5817], begin='00:00:04.052', end='00:00:05.817', metadata=None, utterance_clean='S xxx x154052_5817x15 x155817_6140x15', utterance_list=['S', 'xxx', 'x154052_5817x15', 'x155817_6140x15'], n_words=4, n_characters=34, time_to_next=None, dyadic=None, FTO=None),\n",
+       " Utterance(utterance=\"⌊hm:: (.) if ʔI couldn't boʔrrow, (1.3) the second (0.2) book of readings fo:r\", participant='S', time=[6140, 9487], begin='00:00:06.140', end='00:00:09.487', metadata=None, utterance_clean='S hm  if ʔI couldnt boʔrrow x156140_9487x15 the second book of readings for x159487_12888x15', utterance_list=['S', 'hm', 'if', 'ʔI', 'couldnt', 'boʔrrow', 'x156140_9487x15', 'the', 'second', 'book', 'of', 'readings', 'for', 'x159487_12888x15'], n_words=14, n_characters=78, time_to_next=None, dyadic=None, FTO=None),\n",
+       " Utterance(utterance='commu:nicating acro-', participant='H', time=[12888, 14050], begin='00:00:12.888', end='00:00:14.050', metadata=None, utterance_clean='H communicating acro x1512888_14050x15', utterance_list=['H', 'communicating', 'acro', 'x1512888_14050x15'], n_words=4, n_characters=35, time_to_next=None, dyadic=None, FTO=None),\n",
+       " Utterance(utterance='no: for family gender and sexuality', participant='H', time=[14050, 17014], begin='00:00:14.050', end='00:00:17.014', metadata=None, utterance_clean='H no for family gender and sexuality x1514050_17014x15', utterance_list=['H', 'no', 'for', 'family', 'gender', 'and', 'sexuality', 'x1514050_17014x15'], n_words=8, n_characters=47, time_to_next=None, dyadic=None, FTO=None),\n",
+       " Utterance(utterance=\"+≋ ah: that's the second on is itʔ\", participant='S', time=[17014, 18611], begin='00:00:17.014', end='00:00:18.611', metadata=None, utterance_clean='S  ah thats the second on is itʔ x1517014_18611x15', utterance_list=['S', 'ah', 'thats', 'the', 'second', 'on', 'is', 'itʔ', 'x1517014_18611x15'], n_words=9, n_characters=41, time_to_next=None, dyadic=None, FTO=None),\n",
+       " Utterance(utterance=\"+≋ I think it's s⌈ame family gender⌉ has a second book\", participant='H', time=[18611, 21090], begin='00:00:18.611', end='00:00:21.090', metadata=None, utterance_clean='H  I think its same family gender has a second book x1518611_21090x15', utterance_list=['H', 'I', 'think', 'its', 'same', 'family', 'gender', 'has', 'a', 'second', 'book', 'x1518611_21090x15'], n_words=12, n_characters=57, time_to_next=None, dyadic=None, FTO=None)]"
       ]
      },
      "execution_count": 3,
@@ -225,7 +225,7 @@
     {
      "data": {
       "text/plain": [
-       "[<sktalk.corpus.conversation.Conversation at 0x10ea2bd60>]"
+       "[<sktalk.corpus.conversation.Conversation at 0x116bc4af0>]"
       ]
      },
      "execution_count": 7,
@@ -256,6 +256,92 @@
    "source": [
     "GCSAusE.write_json(path = \"CGSAusE.json\")\n"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Analyzing turn-taking dynamics\n",
+    "\n",
+    "When creating a `Conversation` object, a number of calculations and transformations are performed on the `Utterance` objects within.\n",
+    "For example, the number of words in each utterance is calculated, and stored under `Utterance.n_words`.\n",
+    "You can see this for a specific utterance as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "2"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "cha01.utterances[0].n_words"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "More sophisticated calculations can be performed, but do not happen automatically.\n",
+    "An example of this is the calculation of the Floor Transfer Offset (FTO) per utterance.\n",
+    "FTO is defined as the start time of a turn minus the end time of the most relevant prior turn by another participant.\n",
+    "If there is overlap between these turns, the FTO is negative.\n",
+    "If there is a pause between these utterances, the FTO is positive.\n",
+    "\n",
+    "We can calculate the FTOs of the utterances in a conversation:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[0, 1500] S - FTO: None\n",
+      "[1500, 2775] S - FTO: None\n",
+      "[2775, 3773] S - FTO: None\n",
+      "[4052, 5515] H - FTO: 279\n",
+      "[4052, 5817] S - FTO: None\n",
+      "[6140, 9487] S - FTO: 625\n",
+      "[12888, 14050] H - FTO: 3401\n",
+      "[14050, 17014] H - FTO: 4563\n",
+      "[17014, 18611] S - FTO: 0\n",
+      "[18611, 21090] H - FTO: 0\n"
+     ]
+    }
+   ],
+   "source": [
+    "cha01.calculate_FTO()\n",
+    "\n",
+    "for utterance in cha01.utterances[:10]:\n",
+    "    print(f'{utterance.time} {utterance.participant} - FTO: {utterance.FTO}')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The relevant prior turn for the FTO calculation of an utterance U is the utterance that meets all of the following criteria:\n",
+    "\n",
+    "- the relevant utterance must be by another participant\n",
+    "- the relevant utterance must be the most recent utterance by that participant\n",
+    "- the relevant utterance must have started more than a specified number of ms before the start of U. This time defaults to 200 ms, but can be changed with the `planning_buffer` argument.\n",
+    "- the relevant utterance must be partly or entirely within the context window. The context window is defined as 10 s (or 10000 ms) prior to the utterance U. The size of this window can be changed with the `window` argument.\n",
+    "- within the context window, there must be at most 2 speakers; this maximum can be changed to 3 with the `n_participants` argument."
+   ]
   }
  ],
  "metadata": {
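Editorial note on the notebook text added in this hunk: the FTO selection criteria it lists can be sketched in a few lines of standalone Python. This is a minimal illustration under stated assumptions, not the code in `sktalk/corpus/conversation.py`. `Turn` and `fto_sketch` are hypothetical names. The sketch assumes that turns are sorted by start time, that the current speaker counts toward the speaker limit, and that "most recent" means the latest start among the qualifying turns by other participants. The keyword arguments reuse the names `window`, `planning_buffer` and `n_participants` from the notebook text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """Hypothetical stand-in for an utterance; times are in ms."""
    participant: str
    begin: int
    end: int


def fto_sketch(turns: List[Turn], index: int, window: int = 10000,
               planning_buffer: int = 200,
               n_participants: int = 2) -> Optional[int]:
    """Return the FTO of turns[index] in ms, or None if no prior turn qualifies."""
    current = turns[index]
    window_start = current.begin - window
    # Prior turns that lie partly or entirely inside the context window.
    context = [t for t in turns[:index] if t.end > window_start]
    # Assumption: the current speaker counts toward the speaker limit.
    speakers = {t.participant for t in context} | {current.participant}
    if len(speakers) > n_participants:
        return None
    # Turns by other participants that started more than planning_buffer ms
    # before the current turn.
    candidates = [t for t in context
                  if t.participant != current.participant
                  and current.begin - t.begin > planning_buffer]
    if not candidates:
        return None
    # Assumption: "most recent" means the latest start time.
    relevant = max(candidates, key=lambda t: t.begin)
    # Positive for a gap, negative for overlap.
    return current.begin - relevant.end


# Participants and times of the ten utterances in the notebook output.
turns = [
    Turn("S", 0, 1500), Turn("S", 1500, 2775), Turn("S", 2775, 3773),
    Turn("H", 4052, 5515), Turn("S", 4052, 5817), Turn("S", 6140, 9487),
    Turn("H", 12888, 14050), Turn("H", 14050, 17014),
    Turn("S", 17014, 18611), Turn("H", 18611, 21090),
]
ftos = [fto_sketch(turns, i) for i in range(len(turns))]
print(ftos)  # [None, None, None, 279, None, 625, 3401, 4563, 0, 0]

# A partially overlapping pair gives a negative FTO.
overlap = [Turn("A", 0, 1000), Turn("B", 800, 1500)]
print(fto_sketch(overlap, 1))  # -200
```

On these ten utterances the sketch gives the same values as the notebook output. The fifth utterance gets `None` because the only prior turn by H started at the same moment, which is inside the 200 ms planning buffer.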