Is this valid FoLiA? #104

Open
kosloot opened this issue Apr 4, 2022 · 3 comments
@kosloot
Collaborator

kosloot commented Apr 4, 2022

Is this FoLiA valid?
Both folialint and foliavalidator reject it, though on different grounds.

<?xml version="1.0" encoding="UTF-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="WR-P-E-J-0000000001" version="2.5" generator="libfolia-v0.4">
  <metadata>
    <annotations>
      <division-annotation annotator="ko" set="div"/>
      <sentence-annotation annotator="ko" set="sent"/>
      <text-annotation annotator="ko" set="aset"/>
      <text-annotation annotator="iemand" set="bset"/>
    </annotations>
  </metadata>
  <text xml:id="WR-P-E-J-0000000001.text">
    <div xml:id="WR-P-E-J-0000000001.div0.1" class="test">
      <t set="aset" class="a">Dit is test. zin 2.</t>
      <t set="bset" class="a">Dit is test. zin 3.</t>
      <s xml:id="WR-P-E-J-0000000001.head.1.s.1">
	<t set="aset" class="a">Dit is test.</t>
	<t set="bset" class="a">Dit is test.</t>
      </s>
      <s xml:id="WR-P-E-J-0000000001.head.1.s.2">
	<t set="aset" class="a">zin 2.</t>
	<t set="bset" class="a">zin 3.</t>
      </s>
    </div>
  </text>
</FoLiA>

folialint says:

failed: inconsistent text: conflicting text (class=a) from node: t() with value
'Dit is test. zin 3.'
 with parent: div(WR-P-E-J-0000000001.div0.1) which already has text in that class and value: 
'Dit is test. zin 2.'

foliavalidator says:

VALIDATION ERROR on full parse by library (stage 2/3), in folia-bug.xml
ParseError: FoLiA exception in handling of <s> @ line 15 (in parent <div> @ parent line 12) : [NameError] name 'cls' is not defined

When I replace the class of the 'bset' text by "b" in all three cases, the problem disappears.

@kosloot
Collaborator Author

kosloot commented Apr 5, 2022

The main problem is that libfolia (and presumably FoLiApy as well) has no provision for handling text with the same text class from different sets. As far as I know, the text-checking code NEVER takes sets into account.

I have no objection to requiring all text in a document to stem from one set only, but I think a better solution would be to amend the code so that it takes the set names seriously.
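
To illustrate the difference, here is a minimal Python sketch of a text-consistency check over the direct `<t>` children of one element. It is not the actual libfolia or FoLiApy code: the function name `find_text_conflicts` and its `use_set` flag are hypothetical, and it only uses the standard-library `xml.etree.ElementTree`. With `use_set=False` the check keys on the class alone, which mimics the behaviour described above; with `use_set=True` it keys on the (set, class) pair.

```python
import xml.etree.ElementTree as ET

NS = "{http://ilk.uvt.nl/folia}"  # FoLiA namespace, as used in the document above

# The two conflicting <t> elements from the <div> in the example document.
FRAGMENT = """
<div xmlns="http://ilk.uvt.nl/folia" class="test">
  <t set="aset" class="a">Dit is test. zin 2.</t>
  <t set="bset" class="a">Dit is test. zin 3.</t>
</div>
"""

def find_text_conflicts(element, use_set):
    """Report direct <t> children that share a key but differ in text.

    Hypothetical helper: the key is the class alone, or (set, class)
    when use_set is True.
    """
    seen = {}
    conflicts = []
    for t in element.findall(NS + "t"):
        cls = t.get("class", "current")  # "current" is FoLiA's default text class
        key = (t.get("set"), cls) if use_set else cls
        if key in seen and seen[key] != t.text:
            conflicts.append((key, seen[key], t.text))
        else:
            seen.setdefault(key, t.text)
    return conflicts

div = ET.fromstring(FRAGMENT)
print(find_text_conflicts(div, use_set=False))  # one conflict for class "a"
print(find_text_conflicts(div, use_set=True))   # no conflicts
```

Keying on the class alone reports the same kind of conflict folialint complains about, while keying on the (set, class) pair treats the two texts as independent.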

@kosloot kosloot added the bug label Apr 6, 2022
@kosloot
Collaborator Author

kosloot commented Apr 6, 2022

So we have some serious questions here:

  1. May there be more than one <text-annotation> in a document?
    I suppose YES (with different set names, of course).
  2. Given more than one <text-annotation>, may the same class names occur in different sets?
    I suppose YES too.

But this has major ramifications for the current code, which really has NO support for more than one text-annotation.
E.g. functions like hastext() only look at the class name, and ALL checks for text consistency will fail when two sets are in play.
This is true for libfolia, at least. But I have NO indication that FoLiApy behaves any better.

kosloot added a commit to LanguageMachines/libfolia that referenced this issue Apr 7, 2022
@kosloot
Collaborator Author

kosloot commented Apr 7, 2022

For now, libfolia has some code added to prevent these problems: only one <text-annotation> may be declared.
Relaxing this limitation remains a wish.
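
For readers who want to pre-check their own documents against this restriction, a rough Python sketch follows. It is an assumption about what such a check could look like, not the code from the libfolia commit (libfolia itself is C++); the helper names `text_annotation_sets` and `check_single_text_annotation` are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = "{http://ilk.uvt.nl/folia}"  # FoLiA namespace

# Trimmed-down version of the declarations in the example document.
DOC = """<FoLiA xmlns="http://ilk.uvt.nl/folia" version="2.5">
  <metadata>
    <annotations>
      <text-annotation annotator="ko" set="aset"/>
      <text-annotation annotator="iemand" set="bset"/>
    </annotations>
  </metadata>
</FoLiA>"""

def text_annotation_sets(root):
    """Return the set names of all <text-annotation> declarations."""
    return [decl.get("set") for decl in root.iter(NS + "text-annotation")]

def check_single_text_annotation(root):
    """Raise ValueError when more than one <text-annotation> is declared."""
    sets = text_annotation_sets(root)
    if len(sets) > 1:
        raise ValueError(f"multiple text-annotation declarations: {sets}")
    return sets

root = ET.fromstring(DOC)
try:
    check_single_text_annotation(root)
except ValueError as err:
    print("rejected:", err)
```

The example document declares both "aset" and "bset", so this sketch rejects it.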
