In this project, I worked with a small corpus consisting of simple sentences. I tokenized the words using n-grams from the NLTK library and performed word-level and character-level one-hot encoding. Additionally, I utilized the Keras Tokenizer to tokenize the sentences and implemented word embedding using the Embedding layer. For sentiment analysis, I used the IMDB dataset, which is a popular dataset for binary sentiment classification. To enhance the model's performance, you leveraged the GloVe 6B with 100D pre-trained word embeddings, which provided valuable semantic information to the words in the corpus. Overall, My project focused on text data preprocessing, tokenization, and leveraging pre-trained word embeddings for sentiment analysis. It demonstrated how these techniques can be applied to text data to build effective machine learning models for classification tasks like sentiment analysis.
-
Notifications
You must be signed in to change notification settings - Fork 0
In this project, I worked with a small corpus consisting of simple sentences. I tokenized the words using n-grams from the NLTK library and performed word-level and character-level one-hot encoding. Additionally, I utilized the Keras Tokenizer to tokenize the sentences and implemented word embedding using the Embedding layer. For sentiment analysis
MUHAMMADAKMAL137/IMDB-Dataset-Classification-using-Pre-trained-Word-Embedding-with-GloVec-6B
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
In this project, I worked with a small corpus consisting of simple sentences. I tokenized the words using n-grams from the NLTK library and performed word-level and character-level one-hot encoding. Additionally, I utilized the Keras Tokenizer to tokenize the sentences and implemented word embedding using the Embedding layer. For sentiment analysis
Topics
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published