Currently there are some issues related to converting between formats.
One problem is that converting between formats is always lossy. Even between CoNLL-U and CG3, quite a bit is lost. For example, only CoNLL-U supports enhanced dependencies and a distinction between XPOSTAG and UPOSTAG, and CG3 and CoNLL-U handle subtokens differently (and, I think, store different information about them).
So suppose the user edits the corpus in a different format, and we try to preserve the information that is not native to that format in an underlying format. If they then change the number or position of tokens, or change something that the non-visible information depends on, that information could easily get lost, or at least we could lose track of which token it belongs to.
We have a few options for how to deal with this:
We could just leave it as is, where data loss always happens.
We could make it harder to switch formats, or at least to switch formats and then edit in the new format. Perhaps make other formats view-only by default, and display a modal when the user tries to start editing in a format other than the one the corpus is "stored in" (or was originally in), along the lines of "You will lose data. Only proceed if you're okay with that!"
We could keep track of the data that is at risk more carefully, instead of just replacing the stored corpus with the new format, so that it is only really lost if the user does something that disrupts a particular token or our ability to keep track of its associated data. This would require implementing a better "format-neutral" way of storing data than what is already in notatrix.
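To make the third option a bit more concrete, here is a rough sketch of what I have in mind. It assumes every token gets a stable id that does not depend on its position, and that we know which fields each format can show. None of these names (`StoredToken`, `NATIVE_FIELDS`, `visibleIn`, `mergeEdits`) exist in notatrix, and the field lists are only illustrative guesses, not a real description of either format.

```typescript
// Hypothetical sketch: none of these names come from notatrix.
type Format = "conllu" | "cg3";

// Which fields each format can display natively. These lists are assumptions
// for illustration only; the real sets would need to be worked out per format.
const NATIVE_FIELDS: Record<Format, ReadonlySet<string>> = {
  conllu: new Set(["form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]),
  cg3: new Set(["form", "lemma", "xpos", "head", "deprel"]),
};

interface StoredToken {
  id: string; // stable id, independent of the token's position
  fields: Record<string, string>;
}

// A token coming back from the editor. New tokens have no id yet.
interface EditedToken {
  id?: string;
  fields: Record<string, string>;
}

// The subset of a token's fields that the given format can show.
function visibleIn(token: StoredToken, format: Format): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(token.fields)) {
    if (NATIVE_FIELDS[format].has(key)) out[key] = value;
  }
  return out;
}

// Merge an edited view back into the stored corpus. Hidden fields survive for
// every token whose id is still present; tokens that disappeared are reported
// in lostIds so the UI can warn about them.
function mergeEdits(
  stored: StoredToken[],
  edited: EditedToken[],
  format: Format
): { tokens: StoredToken[]; lostIds: string[] } {
  const byId = new Map<string, StoredToken>();
  for (const token of stored) byId.set(token.id, token);

  const kept = new Set<string>();
  let counter = 0;
  const tokens: StoredToken[] = edited.map((e) => {
    const prev = e.id !== undefined ? byId.get(e.id) : undefined;
    if (prev === undefined) {
      // Unknown or missing id: treat as a new token with no hidden data.
      while (byId.has(`new-${counter}`)) counter++;
      return { id: `new-${counter++}`, fields: { ...e.fields } };
    }
    kept.add(prev.id);
    const hidden: Record<string, string> = {};
    for (const [key, value] of Object.entries(prev.fields)) {
      if (!NATIVE_FIELDS[format].has(key)) hidden[key] = value;
    }
    // Edited values win over anything carried along from storage.
    return { id: prev.id, fields: { ...hidden, ...e.fields } };
  });

  const lostIds = stored.filter((t) => !kept.has(t.id)).map((t) => t.id);
  return { tokens, lostIds };
}
```

The idea would be to call `visibleIn` when rendering a token in the other format and `mergeEdits` when the user saves. A non-empty `lostIds` is the point where a warning like the one in the second option could be shown. This does not address splitting or merging tokens, or hidden fields such as enhanced dependencies that point at a token that was removed.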
What is preferred? Other ideas?