-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCTETS support for database connection #12
Comments
To bypass encoding entirely, Firebird supports NONE as a connection charset (rather than OCTETS). Specifying OCTETS at the connection level and then skipping it when passing to the engine (which actually means NONE charset is used) seems weird to me. |
When passing 'NONE' as a connection charset, fdb uses as a python codec the one returned by getpreferredencoding() function (see charset_map definition). This depends on current system configuration and is never 'latin-1' (which maps all bytes 0-255 to the same characters, identity encoding). This in turn produces decoding errors when reading fields, which have string encoded in forms, which getpreferredencoding() can't decode. |
If this is possible to do (and I don't know if it is), I would strongly recommend against using the name I would rather recommend using a separate property to enable this behaviour. For example, in Jaybird, there are two properties controlling character sets, one using Firebird names and one using Java character set names. When only one is specified, either will configure the connection character set, but specifying both properties then uses one to specify the Firebird connection encoding, and the other will define the Java character set to read/write bytes from/to Firebird. |
This issue is specific to fdb python interface. There is no standard manner of OCTETS use in fdb currently as using it in |
The correct method (patch) should use the None/"NONE" connection charset in FDB with its redefinition in ibase.py/charset_map from 'getpreferredencoding()' to 'latin-1', instead hijacking OCTETS. |
I totally agree. However, such a change would be backwards incompatible and could break existing programs, so I searched for alternatives. |
Well, I was not aware of 'latin-1' meaning at the time I was creating FDB, so using getpreferredencoding() was the best from bad solutions I was able to came with. So |
@savingorup This problem is wider than FDB, because NONE has a special meaning in Firebird, and OCTETS cannot be specified as a connection character set in general. Allowing FDB to overload this, makes the behaviour more surprising to people familiar with Firebird on other platforms. @pcisar Redefining NONE to always map to latin-1 would break expectations of Firebird users that using connection character set none will use the default encoding of a system. Doesn't Python have an option to convert invalid byte combinations to either a Making this change will also break things for users unknowingly relying on the current encoding behaviour, receiving garbage output (or sending garbage as input to the database). Consider carefully if you really want to make this change. I'd sooner recommend defining a special character set name that maps to NONE + latin-1. |
I agree with Mark here. Changing existing behaviour is bad for users. I
proposed 'OCTETS' as it was already there, but maybe defining a new
character set is better. However, this new character set will not be
supported anywhere except in fdb and will again be a bit surprising to
people familiar with Firebird on other platforms
Maybe even better, I propose new parameter python_encoding for
fdb.connect() function, such as:
db = fdb.connect(user='SYSDBA', password='secret',
dsn='127.0.0.1:/tmp/test.fdb',
charset='NONE', python_encoding='latin-1')
Another complication here is that actual encoding/decoding functions are
called only in case of python3, not when using python2. This should be made
clear to users, maybe using a bit different parameter name:
db = fdb.connect(user='SYSDBA', password='secret',
dsn='127.0.0.1:/tmp/test.fdb',
charset='NONE', python3_encoding='latin-1')
As I understand, something like this is also implemented in Jaybird.
Such a change is not trivial anymore, but I can work on it. What do you
think?
…On Fri, Jan 24, 2020 at 1:40 PM Mark Rotteveel ***@***.***> wrote:
@savingorup <https://github.com/savingorup> This problem is wider than
FDB, because NONE has a special meaning in Firebird, and OCTETS cannot be
specified as a connection character set in general. Allowing FDB to
overload this, makes the behaviour more surprising to people familiar with
Firebird on other platforms.
@pcisar <https://github.com/pcisar> Redefining NONE to always map to
latin-1 would break expectations of Firebird users that using connection
character set none will use the default encoding of a system. Doesn't
Python have an option to convert invalid byte combinations to either a ?
or the unicode replacement character (�, U+FFFD)?
Making this change will also break things for users unknowingly relying on
the current encoding behaviour, receiving garbage output (or sending
garbage as input to the database). Consider carefully if you really want to
make this change. I'd sooner recommend defining a special character set
name that maps to NONE + latin-1.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#12?email_source=notifications&email_token=AGCMHSRJMJQLRPSTW2VRZ2DQ7LOTNA5CNFSM4KKMY3M2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJ2VKXI#issuecomment-578114909>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AGCMHSUX3HDTYXFJHES3DQLQ7LOTNANCNFSM4KKMY3MQ>
.
|
@mrotteveel I will change the mapping for 'NONE' charset, not for None value (when connection charset is not specified at all). This should not break any application that does not use connection charset, and will fix the expected behavior of NONE for applications that explicitly define 'NONE'. @savingorup Automatic conversion is not dependent on Python version, but actual type passed. It's applied for P3.str or P2.unicode, and do not for P3.bytes or P2.str. |
When connecting with fdb + python3 to legacy databases with mixed/non-proper encodings, you receive a lot of encoding/decoding errors.
If there is no option to actually correct this databases, Firebird offers OCTETS encoding to bypass encodings entierly.
However, calling fdb.connect(..., charset='OCTETS') produces multiple errors. In fact, this never worked as intended.
The first fix is to change in ibase.py None->'latin-1'
charset_map = {
'OCTETS' : 'latin-1', # Allow to pass through unchanged.
'latin-1' is identity encoding in python (as 'octets' is in Firebird).
The second fix is in fbcore.py in build_dpb():
...
if charset and charset!='OCTETS': # added ... and charset!='OCTETS'
dpb.add_string_parameter(isc_dpb_lc_ctype, charset.upper())
...
Firebird does not allow to pass OCTETS ad LC_TYPE parameter, so we skip it.
With this two changes, I can pass charset='OCTETS' and do selects/inserts, including BLOBS.
If you are interested to merge, I can provide patch against latest version. I am not used to GitHub, so I will probably need a little help with it.
The text was updated successfully, but these errors were encountered: