Because our model is memory-intensive (a transformer is O(n^2) in sequence length), we cap all training data at a maximum sequence length and batch_size. (filter here)
Currently, if memory serves me right, out of ~4000 videos in the dicta_sign dataset, the model trains on only ~2500 because of the 100-frame cap. (With more frames we get an out-of-memory error, which may be related to more than just the transformer.)
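A minimal sketch of the length filter described above. The function and field names (`filter_by_length`, `sample["pose"]`) are hypothetical placeholders, not the repo's actual API; only the 100-frame cap comes from the issue:

```python
# Hypothetical sketch of filtering training samples by pose-sequence length.
# MAX_FRAMES = 100 is the cap mentioned in the issue; raising it causes OOM.
MAX_FRAMES = 100

def filter_by_length(samples, max_frames=MAX_FRAMES):
    """Keep only samples whose pose sequence fits under the frame cap."""
    return [s for s in samples if len(s["pose"]) <= max_frames]

# Toy samples with pose sequences of varying length (placeholder data).
samples = [{"pose": [0] * n} for n in (50, 99, 100, 101, 250)]
kept = filter_by_length(samples)
print(len(kept))  # 3 of the 5 toy samples survive the cap
```

In practice this kind of filter runs once over the dataset before batching, so the dropped ~1500 videos never reach the transformer.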
In my opinion, the ideal backbone for the pose encoding is an S4 model, while the text (usually much shorter) could still use a transformer.
We should experiment with ways to increase the model's input size.