A simple Python script that extracts the unique emojis on a webpage and saves them to a JSON file. It is meant for use on https://www.unicode.org/emoji/charts/full-emoji-list.html to produce a list of emojis that can be used as a regex pattern.
Flag emojis are handled separately: every two-letter permutation of the letter emojis (regional indicator symbols) is added to the extracted emojis to complete the list. Keep this in mind if you use the code on a different webpage.
Another way of getting a list of emojis is emoji.UNICODE_EMOJI from the emoji library. However, it raised an error when I used it in a regex and captured a number of characters I did not consider emojis, so it was not suitable for my purposes. That is how this mini 'project' was conceived.
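The flag step described above can be sketched as follows. This is a minimal illustration, not the code in emoji_scraper.py: the names `REGIONAL_INDICATORS` and `flag_candidates` are hypothetical, and it assumes the "letter emojis" are the 26 regional indicator symbols U+1F1E6 to U+1F1FF.

```python
from itertools import permutations

# Regional indicator symbols U+1F1E6..U+1F1FF correspond to the letters A..Z.
REGIONAL_INDICATORS = [chr(cp) for cp in range(0x1F1E6, 0x1F1FF + 1)]


def flag_candidates():
    """Return every ordered pair of two distinct regional indicators."""
    return [a + b for a, b in permutations(REGIONAL_INDICATORS, 2)]


flags = flag_candidates()
print(len(flags))  # 26 * 25 = 650 candidate sequences
```

Most of these 650 sequences are not real flags, which is the trade-off of generating pairs instead of reading flags from the page.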
- Python 3
This project was designed to use only the Python standard library.
Python has to be downloaded and installed.
Edit emoji_scraper.py using any text editor. Paste the contents of the webpage into the "emoji_webpage" variable in the file before running it from the command line or any IDE.
emojis.json is the result. It contains a list of emojis. You may just download the file if you are not concerned with new emoji updates.
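For readers curious what the extract-and-save step might look like, here is a rough sketch. It is a guess, not the logic of emoji_scraper.py: it assumes that keeping every non-ASCII, non-whitespace character is an acceptable filter, and the function names and sample text are hypothetical.

```python
import json


def extract_unique_symbols(page_text):
    """Collect unique non-ASCII, non-whitespace characters in first-seen order."""
    seen = {}
    for ch in page_text:
        if ord(ch) > 127 and not ch.isspace():
            seen.setdefault(ch, None)
    return list(seen)


def save_as_json(symbols, path):
    """Write the list as UTF-8 JSON without escaping the characters."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(symbols, f, ensure_ascii=False)


# Hypothetical stand-in for the text pasted into "emoji_webpage".
emoji_webpage = "Grinning face 😀, rocket 🚀, again 😀 and a café"
symbols = extract_unique_symbols(emoji_webpage)
print(json.dumps(symbols, ensure_ascii=False))
```

Calling `save_as_json(symbols, "emojis.json")` would then write the file. A filter this simple also keeps accented letters such as é, so a real run would need a stricter test if those are unwanted.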
How do I use a list of emojis?
This code was written in order to get a regex pattern that can match all emojis. To get such a pattern, use:
import json

with open('emojis.json', encoding='utf-8') as f:
    emojis = json.load(f)
pattern = '|'.join(emojis)
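A slightly more defensive variant is sketched below. It assumes two things that the snippet above does not handle: the list might contain regex metacharacters (for example `*`), and longer sequences such as flags should be tried before shorter ones. The helper name `build_emoji_pattern` and the sample list are hypothetical.

```python
import re


def build_emoji_pattern(emojis):
    """Compile an alternation that escapes each entry and tries longer ones first."""
    ordered = sorted(set(emojis), key=len, reverse=True)
    return re.compile("|".join(re.escape(e) for e in ordered))


# Small stand-in list; in practice this would be the contents of emojis.json.
sample = ["😀", "🚀", "\U0001F1FA\U0001F1F8"]  # the last entry is the US flag
pattern = build_emoji_pattern(sample)
found = pattern.findall("Launch 🚀 from the \U0001F1FA\U0001F1F8 😀")
print(found)
```

Sorting by length matters because regex alternation takes the first branch that matches, so a shorter entry listed earlier could otherwise consume part of a longer sequence.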
I get UnicodeEncodeError: 'charmap' codec can't encode character...
Python is not using UTF-8 as its output encoding on your computer. Add the following code at the top of the script you're running, before anything is printed, to fix the error:
import sys
import io
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding='utf-8')
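On Python 3.7 or newer, the same effect can usually be had without replacing the stream objects, because text streams there expose a `reconfigure()` method. The sketch below assumes that version; `force_utf8` is a hypothetical helper name.

```python
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure(); report success."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False  # e.g. a replaced stream such as io.StringIO, or None
    reconfigure(encoding="utf-8")
    return True


force_utf8(sys.stdout)
force_utf8(sys.stderr)
```

Setting the environment variable PYTHONIOENCODING to utf-8 before starting Python is another option that requires no code change.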
🚖🤴🌐🔈📭🕹🆖🕌🌴🌽🦲🐛🧑👻🥲🕞🎣💼🧸🚒💢🦻🍹🪵ç🕗🏄🛠👠🎀🍌🎪🕐🤥🐨🏭🍜🔛📲🪃🧫🛵😰🧱📐🖇🤵🍺😭📱📛🕸🍥🤚🕴🦖🧞😹🦘📹😈🏙🩲👈⌚🧢🦠☕💷🕎🥵😪👙👥🗑↩📣🏞🥛🦪🚠🦎🕯💓➕🎨🍾🔞🦧📥💏⏹🎾🆘🎰🏔💥✈📄🦑🥪🥂🥥📍🤏🍣🥿📡👽☠🩺😑🎇🐗🔪🎞🙄😶🎛🐣🌟🍫🏂🈚🚵🀄🔟🔳🐪✝🍭🚥☺🆒🚳🐏🛰🐦🟧🍿🌼🪳🌱🏥😒🛌🟢🐔🈺👟🪆í❓🛶🎢🥱🎹🈲🦿🛍🍇💙🪞🥧🤛🫒🪜😝🍃🎄♨🎓🧲👭🌅🚓🤌😇🙈🙂🍞🛢➖🍆🦄🚗🥩↕🏅📁🉐🤡🤞🐳✍🆕⬜🔙🧽🍛➰🌲🎊🐑🚐🛷👛🛃🥶😆🖱👺🆎🌌⏯🧕🛳🏚🍑🪅❕🪨⌛🥈🙊🌧💁🦁🏜🗽🚷🕙📎🚪🎽😱♓👨🐘😤🕤🎳🐙📢😁🗝⭕🥳🐮😼🎠🥷◀🛗🧒🔆🚬👎🏮♾🌄📪🍂📴🦴🔜🔥🈯🍄🤎🦝🥢✊📶🔎🕳🌖🉑🌹🍗💶💅🚄😖🐁👧👂🏠🤤🔍‼🍮💛🧳🎡🎈🕊🎲🟩🚂😏📺🥭🪗😜🎩🌬🚞⏸▪✴🏨🧪🏩🔦🈳🥌😀🗞💃🖕🛤🚉🐲🧠🏡🏪🕚📼🍒⏭🦂💋🐺😞👢🏗🧬🤜🥟🪖🐫…🛄🔅💜🦜➗💲⬅💧🎯🚸◻🤼🌦👒🍘🐞♍💱🙉🐶🚯🏯🟥👡🖨🐀🈷🎮🦅🕥❎🐋👴💒⬆⚪🔊🎒⏏🖲⏫🙅🕍📮🧍🆗№😧⏳📟👱🕠🏐❌🍧🧈🥝🧶🍚🦟☹🤽®👸📞🏉👼⊛😲💔🫐👉🌊🆙🟫📊🚑🔰🏝🧏🎏😯🥻☁🧚🏬🆓🔓👿🪐🚌🪂🖼👫🐍🧟🧰😮🧭🌕☀🦱🍎💬🐆🥞⚠🗒🤿🥊🦽ñ🛋👩🈶🍽🙁🅱♉🏷🦈🧎🌸🤐🦗🧨🍋♈🦀🌝📧🌘🧊👕🍔🔵📗🚏🥽🖥👗🦷🤓🎭👑🧦❄👁🌆🪢🕕🧁🪘🦺💭🚎🌗🦉📉🤑🏫🎫⛴🏤🛼😸😍🌃👹⚒📙☣🪁🍏🐉🛑☯🦋🦍📏🕜🔏🤰é🤗👪🐱🤷🤾⏲💺♐⛎🟣👅🍴🐓🥅🙋⌨🎵✋⛩🫀↗🐠🎷😐👘♎👰🔑🍢😻📘🛅🍝💘🏟🐜🦛😔🥮🧴📋☸⛸🛣🪠💠🤙🦞☢📳💈🥠🐽🌮🥯🐤♒🎼▶📸🧝⚽👯🕓🌑ℹ♦⛄👇🪥☘🛻⬛🌞🦇☝🛐🚝🧄🈁🏺⏪🤔🌛🌈😽🐬🐕🛂👤🧮🌨🥄🦃🌔😠🚔📃😳🐾⛺💩🧯🆚🥓🏹🕖⁉🪝🥣🖤🤪🔩😚🚊🍐🖐🪧↖🗜🍉📓🛎😟🍈📈🏕📒🏖👖💿👋👀😷🎃🟠🕛🚶🥏😣⚰🌯💴🥜🔯🚤🎐🧅♊🌇◾🐢🚿😵🎅🍰🪰🦊👆🛸📯🌠❗🚘🔝😬🔌🎎🥑🤖🗡🦮💎✅🫂🚧🪄🚩🤳🦒👍🏆🥡🦥🥋🎤🦰📽😙🍠⛵🧔🤝🐒🧂🧥🪚📠💂🥫🦨😅🟪🟦🙌™🎺📤👚🗿✡🎟🧩☦🐐🕋📻🍓💹🖌⏰🍙🌡〽😂💳🛺🥴⃣🕣🕶👷🏍⛓🍬🔣🪓💵💽🛏💟⚗💦🏦😗🏃🤢🛫🦢🔮🎍☑🌿😡🐃🎬🔀💫😫⤵🅾🎸🙏🦾📌↙🫑🥐🙎🧖🛹⛏🤘🙆🧆🚫📷💰⛪🔚🪒🏳💞🥖🌪📫😓🤺⛅🦸📖🔫❇💌🧇🧓🍅🔼⚔🏇☂👳🚟🪣🛡🛴♌🤹🦳🥒🤶🚍🍪🥤🥸🗯💐🍵📚😨㊗😊🌰⚫🔘🔨🦆🧋🖊🧼🏑🚈😎🔒🧿🏛♂🔹🥁🔧📰🈂🦭🗄📿💇🐵🦦🎖🎗🟨🌩👾😾🅿🗻🔸💤🖍🚜🦬🏵🧉📨▫🛒🤦🦓📑❤♿⏬🧤🗳😃💚⚛🈹⤴Å🏌🗺🚱🪔🌭🌉🎚💑🌓🚮🧾👐🍨🤍🕝🥎🐸💍🚋🔱🌍🤩➡🍟🚲↔💄🥕🎴😘🦙🚦🥔🚰✌🗾🧙👮🦡🕉👓🥰🦶📝🔬🦤😛🧣⚜🈵⏺🆑🌀🧃🗂🦔⬇🪟🥦🕺🍶😢🕢©🕷⚖🌻🎦🥀💣🍀🈸🍊🫁🐈🌥🚾🎿🤧⏱⛲🐅🕰🧧🐝🥃🪴🪦🔇👶🏰🚹🍷🐴🏊🤬🌁🐻🌋🐡🦹🌫🙀🪑🪡🏒🍤📆✨🎱✉🎌⛳🏣🐖📬🤭🛥🪛😴🌂🐩♥🤟🐂♠🥗🤯🧗🚣🟡⚱🏀😋💯👊📔🔁🍻❔🛀🎶🩸🔺🚴🚙🕒🍍🤕🦌👌🌚🦵🚨🦕🍯🥚🔷⚙🛁🥉🎑🔃🥺💪🚡🛕🕘⛔🌳🔲🤒🌒🧛🪤🏸💡🧘🍖🦐🥍🍦🪀🐭📦☮🤨🧹🎉👞🏧🌶🃏🌏↪🐟⭐♟🧀🔻🥙🍱🐹🥼🅰🎻🎁🏁🔐🐇🌎◼🎙💆🚼⚕🦼👃⛹💗️🏋💻👜🚀💮💸📵🫕ô🏢♣🖋🥾🩹🔔🔄📅🗨🔴🪶🍡🍩🌜📩🤲🤠⛷◽🍳💊🗣👬🚢🍸☔🔽🕧✂🐯㊙🛩☃🔠🕡🧐⛈🧺🟤🤮⛰🚭➿🔢🏈🗓📂⏩🚻⛽💖🦣🐰🚃👣🧵🦯🌵✖🕵✏🔋🛖🐼👲🪕🐷🥬🚆💕🤸🥇🎧🧷🚽Ⓜ⏮🎆👄🙍🧜😿🏏🌤🌷⚡🦫⚧🐧📕🚅🎥🧡🪱😉🪲🦏🏴🤫🤣🧻🈴🚕❣🍕🔤🩱☎⚓🏘🔡🍲🕔💀☄🐚😥♋🔶🗼⚾📇🔭🩳🔖🚁↘🎋🔗⛑🚛🕦〰🦩🔕🤱🐎🗃👔🙇👏🌺🍁🐄☪🎂🛬🕟🩴🏎✔🫔🥨🏓💉👝🚇🐌🐿🪙🩰🚚✒⛱🚺😺😌📀😩💝🫓👵🫖♀🔂♑💨ã🍼😄🔉🦚🐊🕑👦🐥♻💾♏📜🥘🙃🖖✳😕🌙🌾😦🆔🇶🇨🇶🇩🇶🇦🇶🇹🇶🇺🇶🇪🇶🇬🇶🇰🇶🇽🇶🇭🇶🇲🇶🇷🇶🇴🇶🇿🇶🇳🇶🇸🇶🇮🇶🇻🇶🇧🇶🇼🇶🇱🇶🇯🇶🇫🇶🇵🇶🇾🇨🇶🇨🇩🇨🇦🇨🇹🇨🇺🇨🇪🇨🇬🇨🇰🇨🇽🇨🇭🇨🇲🇨🇷🇨🇴🇨🇿🇨🇳🇨🇸🇨🇮🇨🇻🇨🇧🇨🇼🇨🇱🇨🇯🇨🇫🇨🇵🇨🇾🇩🇶🇩🇨🇩🇦🇩🇹🇩🇺🇩🇪🇩🇬🇩🇰🇩🇽🇩🇭🇩🇲🇩🇷🇩🇴🇩🇿🇩🇳🇩🇸🇩🇮🇩🇻🇩🇧🇩🇼🇩🇱🇩🇯🇩🇫🇩🇵🇩🇾🇦🇶🇦🇨🇦🇩🇦🇹🇦🇺🇦🇪🇦🇬🇦🇰🇦🇽🇦🇭🇦🇲🇦🇷🇦🇴🇦🇿🇦🇳🇦🇸🇦🇮🇦🇻🇦🇧🇦🇼🇦🇱🇦🇯🇦🇫🇦🇵🇦🇾🇹🇶🇹🇨🇹🇩🇹🇦🇹🇺🇹🇪🇹🇬🇹🇰🇹🇽🇹🇭🇹🇲🇹🇷🇹🇴🇹🇿🇹🇳🇹🇸🇹🇮🇹🇻🇹🇧🇹🇼🇹🇱🇹🇯🇹🇫🇹🇵🇹🇾🇺🇶🇺🇨🇺🇩🇺🇦🇺🇹🇺🇪🇺🇬🇺🇰🇺🇽🇺🇭🇺🇲🇺🇷🇺🇴🇺🇿🇺🇳🇺🇸🇺🇮🇺🇻🇺🇧🇺🇼🇺🇱🇺🇯🇺🇫🇺🇵🇺🇾🇪🇶🇪🇨🇪🇩🇪🇦🇪🇹🇪🇺🇪🇬🇪🇰🇪🇽🇪🇭🇪🇲🇪🇷🇪🇴🇪🇿🇪🇳🇪🇸🇪🇮🇪🇻🇪🇧🇪🇼🇪🇱🇪🇯🇪🇫🇪🇵🇪🇾🇬🇶🇬🇨🇬🇩🇬🇦🇬🇹🇬🇺🇬🇪🇬🇰🇬🇽🇬🇭🇬🇲🇬🇷🇬🇴🇬🇿🇬🇳🇬🇸🇬🇮🇬🇻🇬🇧🇬🇼🇬🇱🇬🇯🇬🇫🇬🇵🇬🇾🇰🇶🇰🇨🇰🇩🇰🇦🇰🇹🇰🇺🇰🇪🇰🇬🇰🇽🇰🇭🇰🇲🇰🇷🇰🇴🇰🇿🇰🇳🇰🇸🇰🇮🇰🇻🇰🇧🇰🇼🇰🇱🇰🇯🇰🇫🇰🇵🇰🇾🇽🇶🇽🇨🇽🇩🇽🇦🇽🇹🇽🇺🇽🇪🇽🇬🇽🇰🇽🇭🇽🇲🇽🇷🇽🇴🇽🇿🇽🇳🇽🇸🇽🇮🇽🇻🇽🇧🇽🇼🇽🇱🇽🇯🇽🇫🇽🇵🇽🇾🇭🇶🇭🇨🇭🇩🇭🇦🇭🇹🇭🇺🇭🇪🇭🇬🇭🇰🇭🇽🇭🇲🇭🇷🇭🇴🇭🇿🇭🇳🇭🇸🇭🇮🇭🇻🇭🇧🇭🇼🇭🇱🇭🇯🇭🇫🇭🇵🇭🇾🇲🇶🇲🇨🇲🇩🇲🇦🇲🇹🇲🇺🇲🇪🇲🇬🇲🇰🇲🇽🇲🇭🇲🇷🇲🇴🇲🇿🇲🇳🇲🇸🇲🇮🇲🇻🇲🇧🇲🇼🇲🇱🇲🇯🇲🇫🇲🇵🇲🇾🇷🇶🇷🇨🇷🇩🇷🇦🇷🇹🇷🇺🇷🇪🇷🇬🇷🇰🇷🇽🇷🇭🇷🇲🇷🇴🇷🇿🇷🇳🇷🇸🇷🇮🇷🇻🇷🇧🇷🇼🇷🇱🇷🇯🇷🇫🇷🇵🇷🇾🇴🇶🇴🇨🇴🇩🇴🇦🇴🇹🇴🇺🇴🇪
🇴🇬🇴🇰🇴🇽🇴🇭🇴🇲🇴🇷🇴🇿🇴🇳🇴🇸🇴🇮🇴🇻🇴🇧🇴🇼🇴🇱🇴🇯🇴🇫🇴🇵🇴🇾🇿🇶🇿🇨🇿🇩🇿🇦🇿🇹🇿🇺🇿🇪🇿🇬🇿🇰🇿🇽🇿🇭🇿🇲🇿🇷🇿🇴🇿🇳🇿🇸🇿🇮🇿🇻🇿🇧🇿🇼🇿🇱🇿🇯🇿🇫🇿🇵🇿🇾🇳🇶🇳🇨🇳🇩🇳🇦🇳🇹🇳🇺🇳🇪🇳🇬🇳🇰🇳🇽🇳🇭🇳🇲🇳🇷🇳🇴🇳🇿🇳🇸🇳🇮🇳🇻🇳🇧🇳🇼🇳🇱🇳🇯🇳🇫🇳🇵🇳🇾🇸🇶🇸🇨🇸🇩🇸🇦🇸🇹🇸🇺🇸🇪🇸🇬🇸🇰🇸🇽🇸🇭🇸🇲🇸🇷🇸🇴🇸🇿🇸🇳🇸🇮🇸🇻🇸🇧🇸🇼🇸🇱🇸🇯🇸🇫🇸🇵🇸🇾🇮🇶🇮🇨🇮🇩🇮🇦🇮🇹🇮🇺🇮🇪🇮🇬🇮🇰🇮🇽🇮🇭🇮🇲🇮🇷🇮🇴🇮🇿🇮🇳🇮🇸🇮🇻🇮🇧🇮🇼🇮🇱🇮🇯🇮🇫🇮🇵🇮🇾🇻🇶🇻🇨🇻🇩🇻🇦🇻🇹🇻🇺🇻🇪🇻🇬🇻🇰🇻🇽🇻🇭🇻🇲🇻🇷🇻🇴🇻🇿🇻🇳🇻🇸🇻🇮🇻🇧🇻🇼🇻🇱🇻🇯🇻🇫🇻🇵🇻🇾🇧🇶🇧🇨🇧🇩🇧🇦🇧🇹🇧🇺🇧🇪🇧🇬🇧🇰🇧🇽🇧🇭🇧🇲🇧🇷🇧🇴🇧🇿🇧🇳🇧🇸🇧🇮🇧🇻🇧🇼🇧🇱🇧🇯🇧🇫🇧🇵🇧🇾🇼🇶🇼🇨🇼🇩🇼🇦🇼🇹🇼🇺🇼🇪🇼🇬🇼🇰🇼🇽🇼🇭🇼🇲🇼🇷🇼🇴🇼🇿🇼🇳🇼🇸🇼🇮🇼🇻🇼🇧🇼🇱🇼🇯🇼🇫🇼🇵🇼🇾🇱🇶🇱🇨🇱🇩🇱🇦🇱🇹🇱🇺🇱🇪🇱🇬🇱🇰🇱🇽🇱🇭🇱🇲🇱🇷🇱🇴🇱🇿🇱🇳🇱🇸🇱🇮🇱🇻🇱🇧🇱🇼🇱🇯🇱🇫🇱🇵🇱🇾🇯🇶🇯🇨🇯🇩🇯🇦🇯🇹🇯🇺🇯🇪🇯🇬🇯🇰🇯🇽🇯🇭🇯🇲🇯🇷🇯🇴🇯🇿🇯🇳🇯🇸🇯🇮🇯🇻🇯🇧🇯🇼🇯🇱🇯🇫🇯🇵🇯🇾🇫🇶🇫🇨🇫🇩🇫🇦🇫🇹🇫🇺🇫🇪🇫🇬🇫🇰🇫🇽🇫🇭🇫🇲🇫🇷🇫🇴🇫🇿🇫🇳🇫🇸🇫🇮🇫🇻🇫🇧🇫🇼🇫🇱🇫🇯🇫🇵🇫🇾🇵🇶🇵🇨🇵🇩🇵🇦🇵🇹🇵🇺🇵🇪🇵🇬🇵🇰🇵🇽🇵🇭🇵🇲🇵🇷🇵🇴🇵🇿🇵🇳🇵🇸🇵🇮🇵🇻🇵🇧🇵🇼🇵🇱🇵🇯🇵🇫🇵🇾🇾🇶🇾🇨🇾🇩🇾🇦🇾🇹🇾🇺🇾🇪🇾🇬🇾🇰🇾🇽🇾🇭🇾🇲🇾🇷🇾🇴🇾🇿🇾🇳🇾🇸🇾🇮🇾🇻🇾🇧🇾🇼🇾🇱🇾🇯🇾🇫🇾🇵
This is the only version until I have free time and decide to add more features:
- Adding code that automatically scrapes the website instead of needing manual copy and paste.
- Creating an emoji dictionary.
- The approach to accounting for flag emojis results in non-existent letter permutations being added to the emoji list.
- Does not account for skin tone variations of emojis.
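One possible workaround for the skin tone limitation is to let the regex accept an optional skin tone modifier after any emoji, instead of listing every variation. This is only a sketch under that assumption: `with_optional_skin_tone` is a hypothetical helper, it also accepts a modifier after emojis that do not officially take one, and it does not handle sequences joined with zero-width joiners.

```python
import re

# Emoji modifiers (Fitzpatrick skin tones) occupy U+1F3FB..U+1F3FF.
SKIN_TONE = "[\U0001F3FB-\U0001F3FF]"


def with_optional_skin_tone(emojis):
    """Compile a pattern matching each emoji plus an optional tone modifier."""
    ordered = sorted(set(emojis), key=len, reverse=True)
    alternation = "|".join(re.escape(e) for e in ordered)
    return re.compile("(?:" + alternation + ")" + SKIN_TONE + "?")


pattern = with_optional_skin_tone(["👍", "👋"])
# The first thumbs-up below carries a medium skin tone modifier (U+1F3FD).
matches = pattern.findall("ok \U0001F44D\U0001F3FD and \U0001F44B")
print(matches)
```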
MIT License - see LICENSE.md.