

Handle Unicode emoji variants #8

Open
colditzjb opened this issue Sep 29, 2017 · 3 comments

@colditzjb
Member

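Unicode variation selectors of the \ufe0* type (e.g., U+FE0F) are not being recoded by the parser (decode.py). We may be able to ignore them, since they are context-dependent (they only request text-style or emoji-style rendering of the preceding character) and add little or no utility for classification purposes.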

See this link:
https://stackoverflow.com/questions/38100329/some-emojis-e-g-have-two-unicode-u-u2601-and-u-u2601-ufe0f-what-does
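If we do decide to ignore them, one option is to strip variation selectors before looking a character up, so that U+2601 and U+2601 U+FE0F collapse to the same key. The sketch below assumes that is the desired behavior. The function name strip_variation_selectors is hypothetical and is not taken from decode.py.

```python
import re

# Variation selectors VS1-VS16 occupy U+FE00..U+FE0F. U+FE0E (text style)
# and U+FE0F (emoji style) are the ones that typically appear after emoji.
VARIATION_SELECTORS = re.compile("[\ufe00-\ufe0f]")


def strip_variation_selectors(text: str) -> str:
    """Remove variation selectors so that presentation variants share one key."""
    return VARIATION_SELECTORS.sub("", text)


cloud_plain = "\u2601"
cloud_emoji = "\u2601\ufe0f"
print(cloud_plain == cloud_emoji)  # False
print(strip_variation_selectors(cloud_plain) == strip_variation_selectors(cloud_emoji))  # True
```

One caveat: some multi-codepoint sequences (for example keycaps such as "1" + U+FE0F + U+20E3) include a variation selector, so stripping it changes the sequence. Any lookup table keyed on full sequences would need to be normalized the same way.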

@colditzjb colditzjb self-assigned this Sep 29, 2017
@colditzjb
Member Author

Check out emojitracker's list of known emoji: https://github.com/mroth/emoji_data.rb/blob/master/vendor/emoji-data/emoji.json

@colditzjb
Member Author

After some group discussion, a few Unicode variants may be valuable for continued research (e.g., Fitzpatrick skin-tone modifiers, when present). This remains an open topic of discussion.
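If skin tone turns out to be useful, it could be kept as a separate feature rather than folded into the emoji code. The sketch below assumes that approach: it splits off the Fitzpatrick modifiers (U+1F3FB through U+1F3FF) and returns the base emoji plus a tone label. The function name and label strings are hypothetical.

```python
import re
from typing import Optional, Tuple

# Fitzpatrick skin-tone modifiers: U+1F3FB (type 1-2) through U+1F3FF (type 6).
FITZPATRICK = re.compile("[\U0001F3FB-\U0001F3FF]")

# Hypothetical labels for each modifier.
FITZPATRICK_LABELS = {
    "\U0001F3FB": "type-1-2",
    "\U0001F3FC": "type-3",
    "\U0001F3FD": "type-4",
    "\U0001F3FE": "type-5",
    "\U0001F3FF": "type-6",
}


def split_skin_tone(emoji: str) -> Tuple[str, Optional[str]]:
    """Return (emoji with modifiers removed, label of the first modifier or None)."""
    match = FITZPATRICK.search(emoji)
    tone = FITZPATRICK_LABELS[match.group()] if match else None
    return FITZPATRICK.sub("", emoji), tone


split_skin_tone("\U0001F44D\U0001F3FD")  # returns ("\U0001F44D", "type-4")
split_skin_tone("\U0001F44D")            # returns ("\U0001F44D", None)
```

Keeping the base emoji and the tone separate means the existing emoji codes stay unchanged, and tone can be analyzed (or dropped) independently.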

@colditzjb
Member Author

@sanyabt - I think this should just be a simple update to the emojilist.csv file. Should we ask one of our RAs to do this? If so, is there a list of important emoji or symbols that we're not currently capturing? (Don't worry about foreign-language Unicode characters, though.)
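To find symbols we are not currently capturing, one rough approach is to count "Other Symbol" (category So) characters in a sample of tweets that are missing from the known list. This is a sketch under assumptions: the format of emojilist.csv is not shown here, so the known emoji are assumed to be loaded separately into a set of strings, and the function name uncaptured_symbols is hypothetical. Letters from other scripts are not category So, so foreign-language characters are skipped.

```python
import unicodedata
from collections import Counter
from typing import Iterable, Set


def uncaptured_symbols(texts: Iterable[str], known: Set[str]) -> Counter:
    """Count 'Other Symbol' (So) characters that are not in the known set.

    Variation selectors (U+FE00..U+FE0F) are category Mn and Fitzpatrick
    modifiers are category Sk, so neither is counted. Multi-codepoint
    sequences (ZWJ sequences, flags, keycaps) are counted one code point
    at a time, which is a simplification.
    """
    counts: Counter = Counter()
    for text in texts:
        for ch in text:
            if unicodedata.category(ch) == "So" and ch not in known:
                counts[ch] += 1
    return counts


# Hypothetical example: the known list covers only CLOUD (U+2601).
known = {"\u2601"}
tweets = ["\u2601\ufe0f nice \u2600", "\u2600\u2600"]
missing = uncaptured_symbols(tweets, known)
for ch, n in missing.most_common():
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "?"), n)
```

Sorting by frequency gives the RA a short, prioritized list of candidates to add to the CSV.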
