-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
15 Adding CoreferenceProcessor #41
base: master
Are you sure you want to change the base?
Conversation
…_coreference_processor
Please review the latest commits. |
Merge conflict for |
@@ -83,12 +83,6 @@ jobs: | |||
run: | | |||
pip install --use-feature=2020-resolver --progress-bar off .[test] | |||
|
|||
- name: Install Forte-wrappers-spacy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't remove the installation directly from the workflow, let's actually use the workflow matrix set to run different dependencies as different runs.
Here is an example: https://github.com/asyml/forte-wrappers/blob/main/.github/workflows/main.yml#L84
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"use the workflow matrix set to run different dependencies as different runs."
Do you mean in the future we will have different versions of forte-wrapper, and you want to define a matrix to test them all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- No, not different versions of
forte-wrapper
. I show you theforte-wrapper
repository workflow as an example of how to set up a workflow matrix. - In this repo, ForteHealth, we can set up a similar matrix, since not all processors in this repo can be installed and run at the same time, they can be tested individually. Note that you could use the
extra_install
method insetup.py
to specify different sets of dependencies.
You may wonder how we make them work together in the future, that's another issue (we have solutions such as RemoteProcessor or Docker images). Right now we can focus on making them each testable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have a question: @Piyush13y seems to want to set forte.spacy as an extra_install. If so, forte.spacy and neurocoref will be in the same extra_install list, example:
setuptools.setup(
....
install_requires=[
"forte~=0.2.0",
],
extras_require={
.....
"coreference": [
"forte.spacy", # TODO: version
"cython>=0.25",
"neuralcoref @ git+https://[email protected]/huggingface/[email protected]#egg=neuralcoref",
],
},
include_package_data=True,
python_requires=">=3.6",
)
I think pip won't make sure that forte.spacy is install ahead of neuralcoref.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Piyush13y seems to want to set forte.spacy as an extra_install.
Emmm... It's a little bit confusing because now forte.spacy is not even in the setup.py in master branch: https://github.com/asyml/ForteHealth/blob/master/setup.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Installing a spacy version compatible with neuralcoref causes the failure of SpacyProcessor
. Since SpacyProcessor
is needed in unit test, this seems to be a dead end before we can manage to run the modules separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now we come back to my initial suggestion, use the matrix
set up to run the module separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad news. A spacy version compatible with neuralcoref is not compatible with my RTX 3060.
I checked the issues in neuralcoref regarding the spacy version problem, almost everyone is suggesting building neuralcoref from source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest release of neuralcoref is in 2019. Maybe it is worth considering changing a better package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, actually for now we are simply looking for a coreference module that works. Maybe we can directly add this function to AllenNLPProcessor, or other packages you find easy
# It is annoying that if we install neuralcoref and spacy at the same | ||
# time, neuralcoref will throw "Cython failed" during building. | ||
# Therefore, we must install neuralcoref after spacy is installed. | ||
# git+https://[email protected]/huggingface/[email protected]#egg=neuralcoref |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can deal with this through setup.py
using extra requires.
# properly. | ||
# Therefore, we must install neuralcoref after cython and spacy | ||
# are installed. | ||
p = subprocess.call( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use more standard way in setup.py
This PR fixes #15.
Description of changes
Added
CoreferenceProcessor
and its unit test. Updatedsetup.py
,requirements.txt
, and ci workflow.CoreferenceProcessor
: a wrapper forNeuralCoref
. It translates the original output toCoreferenceGroup
andMedicalEntityMention
and puts them into data pack.setup.py
andrequirements.txt
: addedcython
, andpytest
, which are dependencies ofneuralcoref
.fortex.spacy
. Sinceneuralcoref
needs to be built from source. See the reason in NeuralCoref Installation Issue.Possible influences of this PR.
cython
andfortex.spacy
.Test Conducted
Tested on new unit test.
NeuralCoref Installation Issue
Installing
neuralcoref
from PyPI causes segmentation fault, since the binary file is built withspacy
2.1, while thespacy
used byforte.spacy
is 2.3. We need to build from source to use customizedspacy
version.Building from source needs
spacy
andcython
being already installed. Puttingneuralcoref
together withspacy
andcython
inrequirements.txt
andsetup.py
can cause "Cython failed" exception, since building binary files comes ahead of the installation ofcython
andspacy
. Therefore we need to delay the installation ofneuralcoref
.But I don't know how to do this in an elegant way.
requirements.txt
seems not to support this kind of two stage installation.setup.py
seems to support, but I don't know how to. So I added a new stage in CI after the installation ofrequirements.txt
andfortex.spacy
, treatingneuralcoref
as an optional dependency likefortex.spacy
.