How to Generate Music using a LSTM Neural Network in Keras #1

Closed · dev-onejun opened this issue Feb 4, 2023 · 11 comments · Fixed by #2
Labels: bug (Something isn't working) · enhancement (New feature or request) · note (Organizing the concept) · wontfix (This will not be worked on)

Comments


This article is listed in codecrafters-io/build-your-own-x.

@dev-onejun dev-onejun added the enhancement New feature or request label Feb 4, 2023
@dev-onejun dev-onejun self-assigned this Feb 4, 2023
@dev-onejun dev-onejun moved this from Todo to In Progress in Jan~Mar, 2023. Self-scheduled Feb 10, 2023

dev-onejun commented Feb 10, 2023

Summary

Repository


music21 is used to handle the music data, with the piano taken as the reference instrument:

  • Pitch: the musical pitch (it apparently carries the octave information along with the frequency)
  • Octave: the octave
  • Offset: the position and length of a note in the score (can be understood as the beat)
    • In this tutorial, every offset is treated as 0.5 and otherwise ignored..

Consulting the official documentation would be more accurate.

  • For reference, the Note object also has a Duration object, which represents the length of a note (or rest).

    • It can be initialized with whole, half, quarter, ..., but also with a float, where 1 means a quarter note.
      • e.g. duration.Duration(1.5) => a dotted quarter note
  • Pitch: the pitch. e.g. G3 is the lowest pitch the violin can sound

  • Chord: a chord, i.e. a list of pitches

  • Note: a note.

    • In other words, it has to hold all of the information such as the octave and the length (beat). (A small music21 sketch of these objects follows below.)
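
Not part of the tutorial, but a minimal music21 sketch of the objects listed above (Note, Pitch, Duration, Chord), just to make the terms concrete:

from music21 import note, chord, duration

n = note.Note('G3')                     # a Note carries a Pitch, which includes the octave
print(n.pitch, n.octave)                # G3 3
n.duration = duration.Duration(1.5)     # 1.0 == quarter note, so 1.5 is a dotted quarter

c = chord.Chord(['C4', 'E4', 'G4'])     # a Chord is built from a list of pitches
print(c.normalOrder)                    # the pitch classes of the chord, e.g. [0, 4, 7]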


dev-onejun commented Feb 10, 2023

Preparing the Data

import glob

from music21 import converter, instrument, note, chord

notes = []

for file in glob.glob("midi_songs/*.mid"): # read the files matched by the * (wildcard) pattern via glob()
    midi = converter.parse(file)
    parts = instrument.partitionByInstrument(midi) # 1.

    notes_to_parse = None
    if parts: # file has instrument parts
        notes_to_parse = parts.parts[0].recurse()
    else: # file has notes in a flat structure
        notes_to_parse = midi.flat.notes

    for element in notes_to_parse:
        if isinstance(element, note.Note):
            notes.append(str(element.pitch))
        elif isinstance(element, chord.Chord):
            notes.append('.'.join(str(n) for n in element.normalOrder))
  1. instrument.partitionByInstrument()
  • I wonder whether index 0 of every partition is always the piano?
  • Also, if the file has a flat structure, is the instrument always the piano?
    • Running parts.show('text') answered my question. It seems the author either selected only pieces played on the piano, or pieces whose partition at index 0 is the piano, or preprocessed the data beforehand. (A small inspection sketch follows below.)
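
A small inspection sketch (my own, not from the tutorial; the file name is hypothetical) to check which instrument each partition actually holds before trusting parts.parts[0]:

from music21 import converter, instrument

midi = converter.parse("midi_songs/example.mid")  # hypothetical file
parts = instrument.partitionByInstrument(midi)

if parts:  # file has instrument parts
    for part in parts.parts:
        # partName is normally the instrument name, e.g. 'Piano'
        print(part.partName, len(list(part.recurse().notes)))
else:      # file has notes in a flat structure
    print("flat structure:", len(list(midi.flat.notes)))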

@dev-onejun dev-onejun added the note Organizing the concept label Feb 10, 2023

dev-onejun commented Feb 19, 2023

Preprocessing the data

import numpy
from keras.utils import np_utils  # in newer Keras: from tensorflow.keras.utils import to_categorical

sequence_length = 100

# get all pitch names
pitchnames = sorted(set(item for item in notes)) # remove duplicates, then convert to a sorted list -> 0.

# create a dictionary to map pitches to integers -> 1.
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))

# create input sequences and the corresponding outputs
network_input = []
network_output = []
for i in range(0, len(notes) - sequence_length, 1):
    sequence_in = notes[i: i + sequence_length] # e.g. store notes 0..99 (100 notes)
    sequence_out = notes[i + sequence_length] # e.g. store note 100

    network_input.append([note_to_int[char] for char in sequence_in]) # convert the stored notes from string to int
    network_output.append(note_to_int[sequence_out]) # same as above

# reshape the input into a format compatible with LSTM layers -> 2.
n_patterns = len(network_input)
network_input = numpy.reshape(network_input, (n_patterns, sequence_length, 1))

# normalize input
n_vocab = len(note_to_int)
network_input = network_input / float(n_vocab)
network_output = np_utils.to_categorical(network_output) # 3.
  0. However, printing pitchnames shows not only pitches such as A1 but also number-based entries such as 0.1.5.
    They look like chords; I should look a bit more into how music21 encodes them and how the dict() handles that.
  • Judging from the decoding part further below, chords were encoded as numbers when the data was collected. (A small sketch of this encoding comes at the end of this comment.)
  1. From the article:

First, we will create a mapping function to map from string-based categorical data to integer-based numerical data. This is done because neural networks perform much better with integer-based numerical data than with string-based categorical data.

Through enumerate(), the mapping simply follows the sorted order (index) from above. In other words, the integer values themselves do not seem to carry any real meaning.

  2. The network_* variables had their values appended in 1-D list form, and numpy.reshape seems to have been used to convert them into numpy form.
    The reshaped form has the same effect as the slicing done in the for loop above. (It makes me wonder whether it could have been shaped inside the loop instead..)
    Still, since it is rearranged into a 3-D shape rather than 2-D, this also looks fine.

  3. keras.utils.np_utils.to_categorical() official docs
    It marks which label each sample belongs to, like the one-hot targets of a multiclass classification:

>>> np_utils.to_categorical([0,1])
array([[1., 0.],
       [0., 1.]], dtype=float32)

Sub-conclusion

Data preprocessed in this form is called One-Hot Encoded Data.
But is it okay that, when preparing the data above, all of the songs were concatenated into one sequence? Listening to the example piece at the end of the article, the mood changed several times even within a single piece, so this concatenation seems to have had an effect.
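
A minimal sketch (my own, not from the tutorial) of why number-based entries such as 0.1.5 show up in pitchnames: in the data-preparation step above, a chord is stored as its normalOrder pitch classes joined with dots.

from music21 import chord

c = chord.Chord(['C4', 'E4', 'G4'])
print(c.normalOrder)                             # [0, 4, 7]
print('.'.join(str(n) for n in c.normalOrder))   # '0.4.7' -- this string ends up in `notes`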

dev-onejun commented:

Building a model

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation

model = Sequential()

# Input Layer
model.add(LSTM(
    256,
    input_shape=(network_input.shape[1], network_input.shape[2]), # taken from the reshape above: (sequence_length, 1)
    return_sequences=True
))
model.add(Dropout(0.3))

# Hidden Layers
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))

model.add(LSTM(256))
model.add(Dense(256))
model.add(Dropout(0.3))

# Output Layer
model.add(Dense(n_vocab))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
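
Not in the article, but a quick sanity check I would add: since network_output is one-hot with n_vocab columns, the final Dense layer must also have n_vocab units, which model.summary() makes easy to confirm.

model.summary()                                  # last layers: Dense -> (None, n_vocab) -> softmax
assert model.output_shape == (None, n_vocab)
assert network_output.shape[1] == n_vocab        # one-hot targets match the output layer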


dev-onejun commented Feb 19, 2023

Fitting the model

from keras.callbacks import ModelCheckpoint

filepath = "weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"

# 1.
checkpoint = ModelCheckpoint(
    filepath,
    monitor='loss',
    verbose=0,
    save_best_only=True,
    mode='min'
)
callbacks_list = [checkpoint]

model.fit(network_input, network_output, epochs=200, batch_size=64, callbacks=callbacks_list)
  1. Using ModelCheckpoint like this, the weights with the best (smallest) loss across the 200 epochs can be saved.


dev-onejun commented Feb 19, 2023

Generating music with the model

# make random seed
start = numpy.random.randint(0, len(network_input)-1)
# NOTE: for the loop below to work, pattern needs to be a plain list of integer-encoded
# notes (if I read the original article correctly, its generation script keeps an
# un-normalized list version of network_input for this); a normalized numpy row would be
# divided by n_vocab a second time below and has no append().
pattern = network_input[start] # one sequence of sequence_length (100) notes


int_to_note = dict((number, note) for number, note in enumerate(pitchnames))

# generate 500 notes
prediction_output = []
for note_index in range(500):
    # 0.
    prediction_input = numpy.reshape(pattern, (1, len(pattern), 1))
    prediction_input = prediction_input / float(n_vocab)
    prediction = model.predict(prediction_input, verbose=0)

    # 1.
    index = numpy.argmax(prediction)
    result = int_to_note[index]
    prediction_output.append(result)

    # 2.
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
  0. the LSTM model requires sequential input of shape (1, sequence_length, 1) to predict the next note

  1. numpy.argmax() is used to pick the pitch with the highest probability

  2. the first note is discarded and the predicted note is appended, so that the variable pattern keeps exactly 100 (sequence_length) notes (a sketch of this sliding window, with pattern kept as a plain integer list, follows below).
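
A minimal sketch (my reading, not the article's exact code) of the sliding-window loop with the seed kept as a plain Python list of integers, so that the append/slice step works and the input is scaled by n_vocab exactly once; int_sequences is a hypothetical name for the un-normalized integer sequences built before the reshape in the preprocessing step.

import numpy

start = numpy.random.randint(0, len(int_sequences) - 1)
pattern = list(int_sequences[start])          # plain list of sequence_length integers

prediction_output = []
for _ in range(500):
    prediction_input = numpy.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
    prediction = model.predict(prediction_input, verbose=0)

    index = int(numpy.argmax(prediction))
    prediction_output.append(int_to_note[index])

    pattern.append(index)                     # slide the window forward by one note
    pattern = pattern[1:]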


dev-onejun commented Feb 19, 2023

Decode to music

offset = 0
music = []

# create note and chord objects based on the values generated by the model
for pitches in prediction_output:
    # pitches is a chord
    if ('.' in pitches) or pitches.isdigit():
        notes_in_chord = pitches.split('.')

        notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            notes.append(new_note)

        new_chord = chord.Chord(notes)
        new_chord.offset = offset
        music.append(new_chord)
    # pitches is a single note
    else:
        new_note = note.Note(pitches)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        music.append(new_note)

    offset += 0.5

Convert to midi file

from music21 import stream

midi_stream = stream.Stream(music)
midi_stream.write('midi', fp='test_output.mid')

dev-onejun commented:

Author's suggestions

  1. support various note (and rest) lengths by adding more classes and a deeper LSTM network.
  2. make the start and end of a piece distinct (separate the individual pieces).
  3. handle unknown notes.
  4. add more instruments.

My thought

  1. what about setting the sequence length to one bar of the score instead of a fixed 100 notes?

dev-onejun added a commit that referenced this issue Feb 19, 2023
@dev-onejun dev-onejun linked a pull request Feb 19, 2023 that will close this issue
dev-onejun added a commit that referenced this issue Feb 19, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done-Issue in Jan~Mar, 2023. Self-scheduled Feb 19, 2023
dev-onejun commented:

The generated midi file just plays the same chords continuously.
In my opinion, something in the iteration in generate_music(), which builds the variable prediction_output, needs to be fixed somewhere.
But, for now, I won't fix it ...

@dev-onejun dev-onejun added bug Something isn't working wontfix This will not be worked on labels Feb 19, 2023

dev-onejun commented Mar 3, 2023

  • maybe it is related to model.load_weights() (see the sketch below)
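
A minimal sketch (an assumption on my side, not a confirmed fix): if the generation script rebuilds the model but never loads the checkpointed weights, it predicts with freshly initialized weights, which could produce the repetitive output. The checkpoint file name below is hypothetical.

# after rebuilding the same architecture as in "Building a model"
model.load_weights("weights-improvement-XX-X.XXXX-bigger.hdf5")  # hypothetical checkpoint name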


skrinsky commented Jan 9, 2024

Did you ever fix the problem with prediction_output? I have not been able to solve it.
