Source text doesn't equal text after encoding and decoding #9

mikevolgo · 2024-04-24T10:56:16Z

Hi,

I've found a case where the text after encoding and decoding is not equal to the source text. Example:

text = 'ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà'
print(codec.decode(codec.encode(text)) == text)
False
print(codec.decode(codec.encode(text)))
ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà@

We can see that after encoding and decoding, an extra "@" symbol is added.

sacha-senchuk · 2024-04-24T11:09:15Z

Hi,

Thanks for the report.

Do you already have an idea why this might happen?

sacha-senchuk · 2024-05-10T16:29:03Z

It seems like you have been using the GSM encoding.

There is a caveat that requires padding in certain situations:

https://github.com/qotto/smspdudecoder/blob/master/smspdudecoder/codecs.py#L87

In your case, you should consider using the following code:

from smspdudecoder.codecs import GSM

text = 'ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà'

assert GSM.decode(GSM.encode(text, with_padding=True), strip_padding=True) == text
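To illustrate where the extra "@" comes from, here is a minimal standalone sketch of 7-bit septet packing. It is not the library's implementation: `pack_septets` and `unpack_septets` are hypothetical helpers written for this example, and they work on raw septet values rather than on text. The text above has 63 characters, so it packs into 56 octets with 7 spare bits at the end. Those 7 zero bits are indistinguishable from septet 0x00, which is "@" in the GSM default alphabet. Padding with a carriage return (0x0D) in that situation removes the ambiguity.

```python
def pack_septets(septets):
    """Pack 7-bit values into octets, least significant bits first."""
    bits, nbits, out = 0, 0, bytearray()
    for s in septets:
        bits |= (s & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(bits & 0xFF)
            bits >>= 8
            nbits -= 8
    if nbits:
        # Remaining bits go into a final octet; the unused high bits stay zero.
        out.append(bits & 0xFF)
    return bytes(out)


def unpack_septets(octets):
    """Unpack octets into every complete 7-bit value they contain."""
    bits, nbits, out = 0, 0, []
    for o in octets:
        bits |= o << nbits
        nbits += 8
        while nbits >= 7:
            out.append(bits & 0x7F)
            bits >>= 7
            nbits -= 7
    return out


CR = 0x0D  # carriage return in the GSM default alphabet

# 63 non-zero septets stand in for the 63 characters of the example text.
septets = [0x41] * 63

# Without padding: 63 septets fill 56 octets with 7 spare zero bits,
# so the decoder sees a 64th septet equal to 0x00 ("@").
plain = unpack_septets(pack_septets(septets))
print(len(plain), plain[-1])

# With CR padding: the spare septet is an explicit CR, which the receiver
# can strip when the septet count is a multiple of 8 and the last one is CR.
padded = unpack_septets(pack_septets(septets + [CR]))
if len(padded) % 8 == 0 and padded[-1] == CR:
    padded = padded[:-1]
print(padded == septets)
```

The ambiguity only appears when the number of septets is one less than a multiple of 8; other lengths leave fewer than 7 spare bits, so no extra septet is decoded.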

I probably need to create a new version of the package where padding is enabled by default, to be in line with the GSM specifications:

sacha-senchuk · 2025-01-21T10:51:36Z

Hello, a quick update here.

This will be taken care of in the upcoming v3 of the library.
