output sentences to file #918
Replies: 16 comments 7 replies
-
Write the newline with a separate command:
output_file.write("\n")
…On Fri, Jan 7, 2022 at 2:42 AM SteveBrodie ***@***.***> wrote:
It's so close, but when I write the output to file it compresses it into a
chunk. I tried adding output_file.write(sentence.text sep='\n') but it
throws an error: TypeError: write() takes no keyword arguments. New to
Python and new to Stanford Core today so please bear with me - it's amazing.
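A minimal sketch of the suggestion above, assuming each sentence object exposes a `.text` attribute as in the quoted snippet. `write()` accepts a single string and no keyword arguments, so the newline is appended to the text (or written in a separate call). The stand-in objects replace a real parsed document so the sketch runs without models; `write_sentences` and the output path are hypothetical names.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def write_sentences(sentences, path):
    """Write each sentence's text on its own line."""
    with open(path, "w", encoding="utf-8") as output_file:
        for sentence in sentences:
            # write() takes exactly one string, so add the newline explicitly.
            output_file.write(sentence.text + "\n")


# Stand-ins for doc.sentences (assumption: real sentences expose .text).
sentences = [
    SimpleNamespace(text="The waiter arrived."),
    SimpleNamespace(text="He smiled."),
]
sentences_path = Path(tempfile.mkdtemp()) / "sentences.txt"
write_sentences(sentences, sentences_path)
```

With a real pipeline, `doc.sentences` would be passed in place of the stand-in list.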
-
A bit more clarification on what you need would be helpful.
-
Happy New Year. Found some time to get back to this. I'm running NER on a file and trying to write the results to a file as before. The NER results print, but they are not saved to the file. I keep getting this error:
AttributeError: 'str' object has no attribute 'ent'
nlp = stanza.Pipeline(lang='en', processors='tokenize,ner')
with open("output/names","w") as waiter_names:
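The error message says that `.ent` was looked up on a plain string. A hedged sketch of one way to write NER results to a file, assuming the entities are read from the processed document's `ents` list and that each entity has `text` and `type` attributes, as in Stanza's NER output. Stand-in objects are used so the sketch runs without models; `write_entities` and the file name are hypothetical.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def write_entities(doc, path):
    """Write one 'text<TAB>type' line per entity found in doc.ents."""
    with open(path, "w", encoding="utf-8") as out:
        for ent in doc.ents:
            out.write(f"{ent.text}\t{ent.type}\n")


# Stand-in for the document returned by nlp(text) (assumption, not real output).
doc = SimpleNamespace(
    ents=[
        SimpleNamespace(text="Anna", type="PERSON"),
        SimpleNamespace(text="Paris", type="GPE"),
    ]
)
entities_path = Path(tempfile.mkdtemp()) / "names.txt"
write_entities(doc, entities_path)
```

The loop iterates over the processed document, not over the raw input string, which is the distinction the error points at.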
-
Suggestion: use three backticks (backwards apostrophes) to format code. It will look like this:
--->
-
Thanks again for your help. I see now: I was complicating the output. waiter_names is a strange name, but it is evocative in the context of the source file. It is a selection of fiction that I grep'd for "waiter", and now I am creating a list of the characters' names from the results to see which characters were involved. Hope that makes some kind of sense. Is there a way to filter the entities before outputting them? For example, I only need the PERSON entities for this task. Thanks.
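One possible way to filter, sketched under the assumption that each entity carries its label in a `type` attribute (as Stanza entity spans do) and that the English model labels people as `PERSON`. The helper name `person_names` is hypothetical, and the stand-in document only imitates the shape of real output.

```python
from types import SimpleNamespace


def person_names(doc):
    """Return the unique PERSON entity texts in sorted order."""
    return sorted({ent.text for ent in doc.ents if ent.type == "PERSON"})


# Stand-in for a processed document (assumption, not real NER output).
doc = SimpleNamespace(
    ents=[
        SimpleNamespace(text="Boris", type="PERSON"),
        SimpleNamespace(text="London", type="GPE"),
        SimpleNamespace(text="Anna", type="PERSON"),
        SimpleNamespace(text="Boris", type="PERSON"),
    ]
)
names = person_names(doc)
```

Collecting into a set first removes repeated mentions, which replaces a later `sort | uniq` step.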
-
Awesome - I have all of that running now. I now have a bash script in which I'm trying to embed the Stanza Python script. The bash script creates an array of filenames (books), loops over them with simple Linux commands (grep | sort | uniq), and writes the results to separate files, using the filename variable in the naming. Here is the script so far. Lines 20-22 are where I'd need to feed the bash variable in as a filename and then pipe the output to grep somehow. I'm not even sure if this is doable, so I thought I'd ask first before trying to figure it out. I have no idea how to run the loop in Python with those variables in that way, sorry.
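One way this could work, as a sketch only: keep the loop in bash and have the Python script take the filename as a command-line argument and print to stdout, so its output can be piped like any other command. The script name `names.py` and the `extract_names` helper are hypothetical; the helper is a placeholder for the Stanza NER step and simply returns capitalized words so the sketch runs without models.

```python
import sys
from pathlib import Path


def extract_names(text):
    """Placeholder for the NER step (hypothetical): returns capitalized words."""
    words = (word.strip(".,;!?\"'") for word in text.split() if word[:1].isupper())
    return sorted(set(words))


def main(argv):
    # Each argument after the script name is a book filename from the shell loop.
    for filename in argv[1:]:
        text = Path(filename).read_text(encoding="utf-8")
        for name in extract_names(text):
            # One line per name on stdout, ready for grep, sort, or uniq.
            print(f"{Path(filename).name}\t{name}")


if __name__ == "__main__":
    main(sys.argv)
```

From bash, a call such as `python names.py "$book" | grep ...` inside the existing loop would then pass the shell variable in and pipe the result onward.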
-
Ahh, good point. Thanks!
-
This is certainly something we could do ourselves, but there are a couple of
questions:
For #text, what should happen to sentences with newlines? Remove the newlines?
For #sent_id, is there some pattern to follow, or just count 1, 2, 3, ...?
The easiest thing would be for you to put the exact comments you want on
the sentences before calling output.
…On Fri, Mar 18, 2022 at 3:58 AM Luigi Talamo ***@***.***> wrote:
Hello, this is probably related to the discussion.
I am using the CoNLL.write_doc2conll() method from the dev branch to write
documents out in CoNLL format, but I see that no sent_id and text (i.e.,
comments) are written. I tried printing sentence.comments from a
parsed doc, but it is empty. Is there an option to pass to the nlp
pipeline to fill in the comments?
Thank you !
Luigi
-
I think the output produced by the UDPipe parser may be the right pattern to follow. While reading the text line by line, I tried using the line's index (from enumerate) as sent_id and the line itself as text before parsing, but the result is inconsistent, because a line is often split into more than one sentence. So the comments definitely have to come from the NLP pipeline.
-
I assume the conll tests will throw some error, but the changelist I just posted should do it once those tests are fixed.
-
So the requested feature is already in the dev branch?
-
I haven't figured out a great way to have it index the sentence ID. Obviously it could just count 1, 2, 3, etc., but most of the datasets have some kind of prefix or format on the ID.
-
OK, thank you. I have tested it, and it now prints the text in the comments. Yes, UD treebanks use, for instance, the pattern "treebank acronym-1, 2, 3, ...", e.g., # sent_id = isst_tanl-2 for the UD Italian ISST treebank. Documents are also introduced by a newdoc attribute, #newdoc = tanl, which can correspond to the filename or to other attributes.
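The ID pattern described here can be sketched in plain Python, independent of any parser. This assumes a caller-supplied prefix (such as `isst_tanl`), a running counter starting at 1, and newlines inside the text collapsed to spaces. The function name `conllu_comments` is hypothetical, and the document line is written in the `# newdoc id = ...` form that UD files typically use.

```python
def conllu_comments(sentence_texts, prefix, newdoc_id=None):
    """Build the comment lines for each sentence, in order."""
    all_comments = []
    for index, text in enumerate(sentence_texts, start=1):
        comments = []
        if index == 1 and newdoc_id is not None:
            # The document marker goes on the first sentence only.
            comments.append(f"# newdoc id = {newdoc_id}")
        comments.append(f"# sent_id = {prefix}-{index}")
        # Collapse newlines and repeated whitespace so the text stays on one line.
        comments.append(f"# text = {' '.join(text.split())}")
        all_comments.append(comments)
    return all_comments


comments = conllu_comments(["Ciao\nmondo.", "Come va?"], "isst_tanl", newdoc_id="tanl")
```

Because the texts come from the sentences after segmentation, the numbering stays consistent even when one input line yields several sentences.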
-
I would like to be able to print the output of the following command on separate lines. At the moment I can only get the output as a single array. Is there a way to output this as a list of sentences? I would also like to be able to number each sentence. With the current output I would have to split the sentences in AWK or SED, which seems counter-intuitive.
print([sentence.text for sentence in doc.sentences])
output_example.txt
Thank you.
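A small sketch of one way to get numbered sentences on separate lines, assuming `doc.sentences` is a list of objects with a `.text` attribute as in the command above. The stand-in document and the `numbered_lines` helper are hypothetical.

```python
from types import SimpleNamespace


def numbered_lines(sentences):
    """Return one 'number<TAB>text' string per sentence, counting from 1."""
    return [
        f"{number}\t{sentence.text}"
        for number, sentence in enumerate(sentences, start=1)
    ]


# Stand-in for a parsed document (assumption: real sentences expose .text).
doc = SimpleNamespace(
    sentences=[
        SimpleNamespace(text="The waiter arrived."),
        SimpleNamespace(text="He smiled."),
    ]
)
lines = numbered_lines(doc.sentences)
# Joining with "\n" prints each sentence on its own line instead of one list.
print("\n".join(lines))
```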