Updated Automatic Speech Recognition using CTC example for Keras v3 #1768
base: master
Thanks for the PR!
examples/audio/ctc_asr.py
Outdated
@@ -244,16 +249,74 @@ def encode_single_sample(wav_file, label):
"""


# Reference: https://github.com/keras-team/keras/blob/ec67b760ba25e1ccc392d288f7d8c6e9e153eea2/keras/legacy/backend.py#L674-L711
def ctc_label_dense_to_sparse(labels, label_lengths):
Rather than rewriting this code, you can just use the built-in Keras 3 loss function keras.losses.CTC. I expect it will also enable the code example to run with all backends.
Thanks for the feedback 👍
After removing the legacy code we still have some references to tf in the example, and I'm not sure this can be made backend-agnostic.
Please let me know if I should substitute the remaining tf references.
LGTM, thank you! You can add the generated files.
examples/audio/ctc_asr.py
Outdated
@@ -320,7 +307,7 @@ def build_model(input_dim, output_dim, rnn_layers=5, rnn_units=128):
     # Optimizer
     opt = keras.optimizers.Adam(learning_rate=1e-4)
     # Compile the model and return
-    model.compile(optimizer=opt, loss=CTCLoss)
+    model.compile(optimizer=opt, loss=keras.losses.ctc)
Prefer using CTC() (ends up running the same thing but it's more idiomatic).
    input_length = tf.cast(input_length, tf.int32)

    if greedy:
        (decoded, log_prob) = tf.nn.ctc_greedy_decoder(
So I guess we're going to have to use TF for this and for ctc_beam_search_decoder, unless we implement them as new backend ops.
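For context, greedy CTC decoding is simple enough to sketch without any backend: take the most likely class at each time step, merge consecutive repeats, then drop blanks. The pure-Python sketch below is illustrative only. It is not the TensorFlow implementation or an existing Keras backend op, the name `greedy_ctc_decode` is hypothetical, and it assumes the blank label is the last class index by default.

```python
from typing import List, Sequence


def greedy_ctc_decode(
    scores: Sequence[Sequence[float]], blank_index: int = -1
) -> List[int]:
    """Greedy (best-path) CTC decoding for a single utterance.

    `scores` holds one row per time step and one column per class
    (logits or probabilities; only the per-row argmax matters).
    `blank_index=-1` means "the last class", an assumption of this sketch.
    """
    if not scores:
        return []
    num_classes = len(scores[0])
    blank = blank_index % num_classes  # normalise a negative index

    decoded: List[int] = []
    previous = None
    for row in scores:
        # Most likely class at this time step.
        best = max(range(num_classes), key=row.__getitem__)
        # Merge consecutive repeats, then drop blanks.
        if best != previous and best != blank:
            decoded.append(best)
        previous = best
    return decoded


# Three classes: 0, 1 and the blank (index 2).
frames = [
    [0.9, 0.05, 0.05],  # 0
    [0.8, 0.1, 0.1],    # 0 (repeat, merged)
    [0.1, 0.1, 0.8],    # blank
    [0.7, 0.2, 0.1],    # 0 (kept: a blank separates it from the first 0)
    [0.1, 0.8, 0.1],    # 1
]
print(greedy_ctc_decode(frames))  # [0, 0, 1]
```

Beam search decoding keeps several candidate prefixes per time step instead of a single argmax, so it does not reduce to a loop this short.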
Again, thanks for the feedback 👍
I created an issue to address this.
Please let me know if I should change the description or add/remove details.
Thanks!
This PR is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.
This PR was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.
Updates the "Automatic Speech Recognition using CTC" example to support Keras v3.