deep-rap

The dream: generating rap lyrics

How to read:

We have written Jupyter notebooks for every step that we took; they show our incremental approach to the problem. The notebooks can be read like chapters, and at the end of each chapter we describe our learnings and what we should improve on.

Generally, the next notebook tries to fix the problems shown in the previous one and to apply those learnings.

Table of Contents

  1. Generating Letter by Letter
  2. Introducing the Tools
  3. Generating Word by Word
  4. Applying Multiple Layers
  5. Embedding the Words
  6. Cleaning the Rap
  7. Working on Clean Rap
  8. Building Embeddings
  9. Glovely Embedded
  10. Glove Applied
  11. Create Phonetic Dictionary
  12. Glove and Phonetics

Content

00 - Generating Letter by Letter

To start our project, we predict letter by letter. Given a sequence of text, we try to infer the next letter, e.g.

  1. "Hello my name is "   => "A"
  2. "ello my name is A"   => "l"
  3. "llo my name is Al"   => "e"
  4. "lo my name is Ale"   => "x"

We can repeat these steps to generate an arbitrary amount of text. Of course, we trained the model beforehand on rap lyrics by 2Pac.

Accuracy: 49.46 %

Seed:
as real as it seems

Sampling:
as real as it seems willed to dene
cares and beave
you kep soun
the doudn't wen the care done

We are doing multiclass classification: we try to classify the next letter given its predecessors. In the notebook, we used a single LSTM cell followed by a linear output layer.
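
A minimal sketch of this kind of setup, written here with tf.keras (the notebook's actual implementation and hyperparameters differ, and the toy text below is just a stand-in for the lyrics):

```python
import numpy as np
import tensorflow as tf

text = "hello my name is alex hello my name is alex "   # stand-in for the 2Pac lyrics
chars = sorted(set(text))
char2idx = {c: i for i, c in enumerate(chars)}
seq_len = 10

# build (input sequence, next character) training pairs
X, y = [], []
for i in range(len(text) - seq_len):
    X.append([char2idx[c] for c in text[i:i + seq_len]])
    y.append(char2idx[text[i + seq_len]])
X = tf.one_hot(np.array(X), depth=len(chars))   # one-hot encode every character
y = np.array(y)

# a single LSTM cell followed by a linear (softmax) output layer
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(seq_len, len(chars))),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(len(chars), activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=5, verbose=0)

# generate: predict the next character, append it, slide the window, repeat
seed = "hello my n"
for _ in range(20):
    x = tf.one_hot([[char2idx[c] for c in seed[-seq_len:]]], depth=len(chars))
    seed += chars[int(np.argmax(model.predict(x, verbose=0)))]
print(seed)
```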


01 - Introducing the Tools

Here we describe how we outsource some of the functions to separate modules in order to reuse them in future notebooks. We introduce a train_model function that can be used to train any Trainable object.

If you want to understand more about how we set up our toolchain and outsourced the functions, please check out the notebook. It might be a bit hard to understand at first though.

It is also crucial to understand the sample function, as that describes how we can sample from a Trainable object.
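
Conceptually, the interface looks roughly like this (a hypothetical sketch with made-up signatures, not the code from the tools modules):

```python
class Trainable:
    """Anything that can be trained batch by batch and sampled from."""

    def step(self, batch):
        """Run one training step and return the loss (to be overridden)."""
        raise NotImplementedError

    def predict(self, token_ids):
        """Return a probability distribution over the next token (to be overridden)."""
        raise NotImplementedError


def train_model(model, batches, epochs=10):
    """Train any Trainable for a number of epochs over the given batches."""
    for epoch in range(epochs):
        losses = [model.step(batch) for batch in batches]
        print(f"epoch {epoch}: mean loss {sum(losses) / len(losses):.4f}")


def sample(model, seed_ids, length=50):
    """Repeatedly ask the Trainable for the next token and append it."""
    token_ids = list(seed_ids)
    for _ in range(length):
        probs = model.predict(token_ids)
        token_ids.append(int(probs.argmax()))
    return token_ids
```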


02 - Generating Word by Word

In chapter 00, we were generating letter by letter. This is a complex task, as it leaves a lot of room for errors: a single spelling mistake can influence the whole rest of the sentence. We therefore want to generate word by word instead.

Basically, the approach is the same as in 00, but this time we have a different Dictionary: instead of the Alphabet, we use the Vocabulary. (Please check out the module tools.processing to understand more about this.)
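
An illustrative sketch of the switch from an alphabet to a vocabulary (the real version lives in tools.processing; the names below are made up):

```python
text = "i go down the street while i go down the street"
words = text.split()

vocab = sorted(set(words))                         # the "Vocabulary" dictionary
word2idx = {w: i for i, w in enumerate(vocab)}
idx2word = {i: w for w, i in word2idx.items()}

encoded = [word2idx[w] for w in words]             # same pipeline as in 00, just on words
print(encoded[:5], [idx2word[i] for i in encoded[:5]])
```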


03 - Applying Multiple Layers

The maximum accuracy from chapter 02 was xxx. By stacking LSTM cells, we increase the complexity of our model, so that it can solve more complex problems.

While this approach seems to yield relatively high accuracy, the samples are still not very useful.

Seed:
killin people left and right
use a gun cool homie
that is right

Sampling:
killin people left and right
use a gun cool homie
that is right so i bust it alone is know think i got up y'all mine
out this thing a clip for can on your got up
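
For reference, stacking LSTM cells can look roughly like this (a tf.keras sketch; layer sizes are illustrative and the notebook may use a different API):

```python
import tensorflow as tf

vocab_size, seq_len = 5000, 10
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(seq_len, vocab_size)),
    # the first layer returns its full output sequence so the next LSTM can consume it
    tf.keras.layers.LSTM(256, return_sequences=True),
    tf.keras.layers.LSTM(256),                                 # second stacked LSTM layer
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```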


04 - Embedding the Words

We now use an embedding lookup for our data. Instead of feeding in 1-hot-encoded words, we feed in the word indices and perform a lookup into an embedding matrix that is learned along with the rest of the model.

Architecture:

sequence of words  =>  sequence of indices  =>  sequence of vectors in embedding space  =>  Single Layer RNN  =>  1-hot output vector (distribution over the next word)

Seed:
while i go down the street
while i go down the street
you was lookin' at me
is this even good or is it just bad
is this even good or is it mad

Sampling:
while i go down the street
while i go down the street
you was lookin' at me
is this even good or is it just bad
is this even good or is it mad
your brother and your trifeass wife wants to do me
on a mountain and still couldn't top me
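
A sketch of the indices-to-embedding-to-RNN pipeline (tf.keras version with illustrative sizes; the notebook's own lookup may be implemented differently):

```python
import tensorflow as tf

vocab_size, embed_dim, seq_len = 5000, 128, 10
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(seq_len,)),                  # integer word indices
    tf.keras.layers.Embedding(vocab_size, embed_dim),         # learned embedding matrix
    tf.keras.layers.LSTM(256),                                # single layer RNN
    tf.keras.layers.Dense(vocab_size, activation="softmax"),  # distribution over the next word
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```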


04.1 - Bonus Book - Trying a different text

Here we tried setting a very small number of time steps for our RNN. Our goal is to feed in a short sentence and generate the next rap line from it. Setting the time steps to 6 allows us to infer the next word from the last 6 words.

This turns out to be pretty bad despite high accuracy.

We also tried out a different rapper here (Rakim).


05 - Cleaning the Rap

To be honest, this was the worst part of the whole project. It involved:

  1. building a new dictionary
  2. expanding contractions
  3. removing extra letters like tooooools => tools
  4. numbers to words: 1 => one
  5. correcting spelling mistakes, e.g. somthin => something
  6. rebuilding the dictionary with newly found words

There are several notebooks on this, in order of creation:

  1. 05-text-cleanup
  2. 05-text-cleanup-rakim
  3. 05-text-cleanup-kidcudi

I recommend checking out 3. (I think 1. is a bit messy tbh)

Seriously, this was the worst part
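
For the curious, rough stand-ins for a few of the cleanup steps above (regex-based illustrations, not the actual tools.spell_correction code):

```python
import re

CONTRACTIONS = {"can't": "cannot", "won't": "will not", "i'm": "i am"}
NUMBERS = {"1": "one", "2": "two", "3": "three"}

def expand_contractions(line):
    for short, long in CONTRACTIONS.items():
        line = line.replace(short, long)
    return line

def collapse_repeats(word):
    # "tooooools" -> "tools": collapse runs of 3+ identical letters down to 2
    return re.sub(r"(.)\1{2,}", r"\1\1", word)

def numbers_to_words(line):
    return " ".join(NUMBERS.get(tok, tok) for tok in line.split())

print(expand_contractions("i'm down"))        # -> "i am down"
print(collapse_repeats("tooooools"))          # -> "tools"
print(numbers_to_words("1 love"))             # -> "one love"
```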


06 - Working on Clean Rap

Here we use an Embedded Single Layer RNN from chapter 04 on our freshly cleaned data. The result is pretty good!

(Note that we replaced \n with ; and our words are only one space apart. In the following we replaced ; with \n for readability)

Seed:
when i was thirteen
i had my first love
there was nobody that could put hands on my baby
and nobody came between us that could ever come above

Sampling:
when i was thirteen
i had my first love
there was nobody that could put hands on my baby
and nobody came between us that could ever come above
but now i am guilty of falling for his girlfriend
i know it is wrong but it is not a cop damn
i am trapped in the hell
one if you will find out my homies
it is the nigga that you the nigga


07 - Building Embeddings

We train our own embedding on a collection of rap texts. For this we use the word2vec model to produce our very own embedding.

This didn't work out that well though, since we did not have enough data.
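
A sketch of how such a training run can look with gensim (assuming gensim >= 4; the file name and hyperparameters are illustrative):

```python
from gensim.models import Word2Vec

# each lyric line becomes one training "sentence"
with open("combined_rap.txt", encoding="utf-8") as f:
    lines = [line.split() for line in f]

model = Word2Vec(
    sentences=lines,
    vector_size=300,   # dimensionality of the learned embedding
    window=5,          # context window around each word
    min_count=2,       # drop words that appear only once
    sg=1,              # skip-gram variant
)
print(model.wv.most_similar("street", topn=5))
```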


08 - Glovely Embedded

Contains some minor corrections of the cleaned-up text (the repeated-character problem). Using the GloVe embedding, we can translate almost all of our words into a feature space of 300 dimensions.

A portion of the unknowns can be corrected using our tools.spell_correction module. This module was created after chapter 05.

The remaining words that are still not recognized by GloVe are mapped to the mean of the embedding vectors.
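
One way this mapping can be sketched (the file name is illustrative, and taking the mean over the loaded GloVe vectors is one reading of "mean of the embedding vectors"):

```python
import numpy as np

# load the pre-trained 300-dimensional vectors (file name depends on the download)
glove = {}
with open("glove.840B.300d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        word, vec = " ".join(parts[:-300]), np.asarray(parts[-300:], dtype=np.float32)
        glove[word] = vec

mean_vec = np.mean(list(glove.values()), axis=0)   # fallback for unknown words

vocab = ["street", "homie", "trifeass"]            # our cleaned rap vocabulary
embedding = np.stack([glove.get(w, mean_vec) for w in vocab])
print(embedding.shape)                             # (len(vocab), 300)
```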


09 - Glove Applied

We are using the GloVe embedding, which is trained on 840 billion tokens with a dimension of 300. In order to use it properly, we mapped the words of our three combined rap lyric collections to GloVe vectors. Also, we combined all our findings and the code we had written into several libraries and classes, so that we can build and train our model efficiently.

Remark: a bonus book with less data is available at 09.1-(small)glove-applied.


10 - Create Phonetic Dictionary

We are using the CMU Pronouncing Dictionary, which is provided with the nltk toolkit, together with the LOGIOS lexicon tool (http://www.speech.cs.cmu.edu/tools/lextool.html). The CMU dictionary maps each word to its phonetic transcription in the ARPAbet alphabet. For words that the dictionary does not know, the LOGIOS tool uses a generative process to derive the needed phonetic information.

Finally, we create a text file which is essentially the combined rap text, but encoded phonetically.
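
Looking up ARPAbet phonemes with nltk looks like this (the LOGIOS fallback is a separate web tool and is not shown):

```python
import nltk
nltk.download("cmudict", quiet=True)
from nltk.corpus import cmudict

arpabet = cmudict.dict()                 # word -> list of ARPAbet pronunciations
for word in ["money", "rhyme", "homie"]:
    print(word, arpabet.get(word, "not in the dictionary -> fall back to LOGIOS"))
```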


10.1 - Phonetics2Vec

We want to apply the word2vec method to the phonetics in order to get an embedding that aids our text generation process. The reason is that in rap lyrics a predicted word depends not only on the context but especially on previous rhyming words.

We train the phone2vec model with the same algorithm we used for our first word2vec model. In the last step, the cosine distance is used to determine which phonetic encodings are similar to a given one.
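
A toy sketch of the phone2vec idea (assuming gensim; here whole-word phonetic encodings are treated as tokens, which is only one possible reading):

```python
from gensim.models import Word2Vec

# toy "sentences": each token is a whole word's phonetic encoding
phonetic_lines = [
    ["M-AH1-N-IY0", "HH-AH1-N-IY0"],     # money, honey
    ["S-T-R-IY1-T", "HH-IY1-T"],         # street, heat
]
phone2vec = Word2Vec(phonetic_lines, vector_size=50, window=2, min_count=1)

# cosine similarity in the learned space finds "phonetically close" encodings
print(phone2vec.wv.most_similar("M-AH1-N-IY0", topn=2))
```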


11 - Work In Progress: Glove and Phonetics applied

We want to use the additional phonetic information of a word within a sentence to get more accurate rhyme words. As described in chapter 10.1, a word in rap lyrics depends on the context plus its rhyming predecessors. We want to use the extra information from the phonemes to penalize non-rhyming words.
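
Purely as an illustration of the idea (not the approach in the unfinished notebook), such a penalty could re-weight the model's next-word probabilities by phonetic similarity to the word we want to rhyme with; the names and the scoring formula below are made up:

```python
import numpy as np

def rerank(word_probs, candidates, phone_vecs, rhyme_target, alpha=0.5):
    """Mix the language-model probability with cosine similarity in phone2vec space."""
    target = phone_vecs[rhyme_target]
    scores = []
    for word, p in zip(candidates, word_probs):
        v = phone_vecs[word]
        cos = float(v @ target / (np.linalg.norm(v) * np.linalg.norm(target)))
        scores.append((1 - alpha) * p + alpha * cos)   # alpha steers how hard non-rhymes are punished
    return candidates[int(np.argmax(scores))]
```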

Unfortunately, we were not able to finish this last step, because of errors we were unable to resolve.


Inspired by

Eric Malmi et al., DopeLearning: A Computational Approach to Rap Lyrics Generation
