Expected behaviour
The ELAN file exported by pympi should be openable by the Annotated Corpus Toolkit (ACT), or should be formatted like an original ELAN file.
Actual behaviour
The exported ELAN file cannot be processed directly by ACT, and it is not formatted like a file saved by ELAN itself.
System information
python version: 3.10
os: Linux Mint 21.3
are you up to date with the latest master?: yes 1.70.2
Additional context
I work with both pympi and Oliver Ehmer's Annotated Corpus Toolkit for R (ACT), which are two great pieces of code for linguists working with ELAN.
I noticed that the ELAN files exported with pympi (with or without the "pretty" parameter) could not be processed directly by ACT (see below).
However, they can be processed once the file has been opened and then saved in ELAN.
So I diffed pympi's fresh export against the version re-saved by ELAN and found two specific issues when importing the pympi file into ACT:
1. The file is not loaded at all: apparently this error is due to the EAF version stated in the xsi:noNamespaceSchemaLocation attribute (3.0 is loaded, 2.8 is not).
2. If issue 1 is corrected (2.8 > 3.0), the file is loaded but ACT does not find the time values. However, it works if the space character before the TIME_SLOT closing tag is removed.
Workaround found
If I bulk replace the version number (2.8 > 3.0) and bulk remove the space character before every closing XML tag, then the file is successfully processed by ACT.
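For anyone who wants to script this, here is a minimal sketch of the bulk replacement in Python. It rests on two assumptions that should be checked against your own files: that the version appears in the schema file name as `EAFv2.8.xsd`, and that the extra space is the one right before `/>` in self-closing tags. The function names `patch_eaf_text` and `patch_eaf_file` are made up for this example and are not part of pympi.

```python
from pathlib import Path


def patch_eaf_text(text: str) -> str:
    """Apply the two textual replacements described in the workaround."""
    # Assumption: the schema version is encoded in the XSD file name.
    text = text.replace("EAFv2.8.xsd", "EAFv3.0.xsd")
    # Assumption: the problematic space is the one right before "/>".
    text = text.replace(" />", "/>")
    return text


def patch_eaf_file(source: str, target: str) -> None:
    """Read an exported .eaf file, patch it, and write the result to target."""
    text = Path(source).read_text(encoding="utf-8")
    Path(target).write_text(patch_eaf_text(text), encoding="utf-8")


# Small in-memory demonstration (not a complete EAF document).
sample = (
    '<ANNOTATION_DOCUMENT xsi:noNamespaceSchemaLocation='
    '"http://www.mpi.nl/tools/elan/EAFv2.8.xsd">\n'
    '<TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0" />\n'
    '</ANNOTATION_DOCUMENT>'
)
patched = patch_eaf_text(sample)
print(patched)
```

This only edits the text of the file. It does not check that the content is actually valid against the 3.0 schema, so it is a stopgap for ACT rather than a real conversion.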
Since the original ELAN files are not formatted this way, I thought it was more of a "pympi" issue than an "ACT" issue.
So maybe some slight export modifications are welcome in pympi ?
Thank you for your work,
Lucien
Extra spacing before closing XML tags signals a fragile XML parser from ACT's side. However, I'm not opposed to generating stricter XML without this spacing as it doesn't change the semantics of the file.
Increasing the version can be done, but we have to make sure that the generated file really is 3.0 compliant. Since the major version is increased, I assume there are some backwards incompatible changes between 2.x and 3.x.
The specification can be found here: https://www.mpi.nl/tools/elan/EAF_Annotation_Format_3.0_and_ELAN.pdf
In a previous issue we found that it probably is compatible though (#29).
So in short, yes please, I would be happy to accept merge requests for this.
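To illustrate the point about spacing not changing the semantics, here is a small sketch using Python's standard `xml.etree.ElementTree`. The two fragments are made-up examples of a TIME_SLOT element, not copied from a real export, and the helper `parsed` is only for this comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: identical except for the space before "/>".
with_space = '<TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0" />'
without_space = '<TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>'


def parsed(fragment: str):
    """Return the tag name and attributes a conforming parser sees."""
    element = ET.fromstring(fragment)
    return element.tag, dict(element.attrib)


# A conforming XML parser yields the same element for both spellings.
same = parsed(with_space) == parsed(without_space)
print(same)
```

A parser that treats these two differently is most likely matching on the raw text rather than parsing the XML.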
Totally agree. When I have time I will look into the differences between 2.8 and 3.0 before trying to propose something (still a beginner in Python, but learning by doing). I may also suggest that ACT handle the space character case, since it is valid XML syntax.
In the meantime, I hope some people find the workaround useful if they are blocked.