Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modified get_stop_words(), preventing being changed from outside. #31

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yyanhan
Copy link

@yyanhan yyanhan commented Mar 12, 2024

Dear Alir3z4,

I used this repo for the work at my previous company, and I found one issue with the function get_stop_words():

if we obtain the list in variable and modifiy the list variable, like:

en_stop_words = get_stop_words('en')
en_stop_words.append('harrypotter')

then the return list from get_stop_words() will also be changed:

'harrypotter' in get_stop_words('en')   # True

This will raise a mistake when we call the function get_stop_words('en') many times recursively, like:

en_stop_words_again = get_stop_words('en')
'harrypotter' in en_stop_words_again    # True

To solve this issue, of course the user can use copy.deepcopy(get_stop_words('en')), however this may not be noticed by the user.

Thus I added a copy in the function get_stop_words('en'), namely:

replacing:
    return stop_words

by: 
     return stop_words[:]

and as a result:

en_stop_words = get_stop_words('en')
en_stop_words.append('harrypotter')
en_stop_words_again = get_stop_words('en')

'harrypotter' in en_stop_words              # True
'harrypotter' in get_stop_words('en')     # False
'harrypotter' in en_stop_words_again    # False

And I have tested the performance before and after, see:

I hope this PR can make it better!

Best
Han

@@ -1,3 +1,4 @@
from copy import deepcopy
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

Suggested change
from copy import deepcopy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants