-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loss of precision when serializing Decimal String (DS) and Integer String (IS) #398
Comments
There's been discussion of this in the past. See: #53, #97, #99 Issues haven't seemed to have come up in practice, but if you find that these are real use cases where lost data has an impact then it would make sense to implement a lossless round trip strategy. A workable solution would be to save the string representation in a local spot like It would be interesting to see if pydicom or other tools handle this. |
I've done some testing with My understanding is it does this by maintaining each data element value in its original form as a I am interested in implementing some form of lossless round trip parsing in dcm-js. I am working on an anonymizer and ran into this issue in real files where precision from decimal strings was being lost (specifically numbers like Do you have any guidance on implementing this in such a way that would be accepted as a contribution? My thinking was to add read/write flags that would opt-out to any formatting applied past the initial serialization from the data view. I could add an BTW its a smaller issue but one other manifestation of this that I have seen is around trimming whitespace. It appears many VR's run either a |
One more use case I want to highlight that is similar but might warrant a separate issue/discussion. I followed the recent change to implement There is a similar "error" here where reading/writing a DICOM file will change an unknown/unimplemented VR into There are few VR's that I don't believe have been implemented yet ( So I am thinking about ways to preserve the |
I like the idea of being lossless wherever possible, and supporting all valid VRs. I don't think we'd need to track the modified state of any elements, just keep the Regarding whitespace I guess the same could be done. I think trailing spaces are only added when the value needs to be an even length, but again in the spirit of losslessness we could try to keep track of when the spaces are stripped and write out the original if trimming it results in the same string as the exposed value. In both these cases I'd try to avoid changing the current API behavior. Regarding |
The problem in my mind would be that due to the applied formatting when converting to So a file with value:
Would be read as:
|
What I meant was that if you run |
I see what you mean. The formatting that is applied during each value representation read would need to be extracted so it could be re-applied onto |
Exactly - ideally there would be a parsing function for each VR that would be called in both places (reading and before writing). |
When reading a DICOM file, data elements with VR
IS
orDS
are currently serialized by reading as ASCII Strings, applying custom formatting, and then converting toNumber
.Examples:
1.0000
drops padded zeroes and yields1
+1.234
drops the+
sign9007199254740993
is read as9007199254740992
due to floating point precision456
drops whitespace to456
My read from the spec is that these are all considered valid values:
This leads to fundamental changes in the data element value by simply reading and then writing a DICOM file.
Example
When reading in a file containing the data elements:
A read & write of it using dcm-js with no other modifications yields:
Assuming that these are in fact valid values according to the spec, why does the library try to apply custom formatting to the value like dropping whitespace and the
+
sign, or bother converting to aNumber
which is subject to Floating Point rounding issues? Why not leave as a String and let clients decide when/how to parse as a number?The text was updated successfully, but these errors were encountered: