Add `n` and `rank` to corpus-level turn dynamics #68

mdingemanse · 2024-09-04T07:57:00Z

Back when the turn dynamics PR was merged and closed, some issues fell by the wayside. One is this:

we have two more utterance-level columns that are really quite useful, but that relate to the corpus (and therefore often the language) as a whole:
- `rank`: the rank of this utterance in the frequency distribution of utterances in this language
- `n`: the number of tokens of this utterance type in this language
E.g. below you can see that the n of î (third row) is 53: there are 53 utterances like this in the language. Its rank is 1: it is the most frequently attested turn format in the language.

Debatable whether this is truly an utterance feature, but we do use it for a lot of things, e.g. to pull up the most frequent utterance formats in a language, identify 'streaks' of similarly frequent utterances, etc.

Right now, neither scikit-talk nor talkr provides this functionality, meaning that most of the basic examples in our publications cannot yet be replicated with production code as it currently stands. I would argue these features are basic enough to be included somewhere. Open to discussion about the best place to do it. As the issue notes, these are really best computed at corpus level (we are usually not interested in the relative frequency of utterances in a single source).
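
To make the request concrete, here is a minimal sketch of how `n` and `rank` could be computed per language across all sources. It is plain Python and does not use the scikit-talk or talkr API. The function name `add_n_and_rank`, the row keys `language`, `source` and `utterance`, and the toy data are all hypothetical. The tie handling is also an assumption: utterance types with the same `n` share a rank (dense ranking), which may not match what was used in the publications.

```python
from collections import Counter


def add_n_and_rank(rows, text_key="utterance", group_key="language"):
    """Return copies of `rows` with `n` and `rank` added, computed per group.

    Hypothetical helper for illustration; key names are assumptions.
    """
    # n: number of tokens of each utterance type within each language,
    # pooled over all sources in that language.
    counts = Counter((row[group_key], row[text_key]) for row in rows)

    # rank: position of that count among the distinct counts in the language,
    # sorted from most to least frequent. Ties share a rank (assumption).
    ranks = {}
    for group in {g for g, _ in counts}:
        distinct = sorted(
            {c for (g, _), c in counts.items() if g == group}, reverse=True
        )
        ranks[group] = {c: i for i, c in enumerate(distinct, start=1)}

    annotated = []
    for row in rows:
        n = counts[(row[group_key], row[text_key])]
        annotated.append({**row, "n": n, "rank": ranks[row[group_key]][n]})
    return annotated


# Toy data, invented for illustration only.
rows = [
    {"language": "A", "source": "s1", "utterance": "mm"},
    {"language": "A", "source": "s1", "utterance": "yeah"},
    {"language": "A", "source": "s2", "utterance": "mm"},
    {"language": "A", "source": "s2", "utterance": "mm"},
    {"language": "B", "source": "s3", "utterance": "yeah"},
]
annotated = add_n_and_rank(rows)
```

In this toy data, "mm" occurs three times in language A across two sources, so every "mm" row gets `n` = 3 and `rank` = 1. Counting within a single source would instead give 1 and 2, which is why the grouping is on language rather than source.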
