How to read an EBCDIC file with multiple columns #694
@yruslan can you please guide me on this?
Hi, this is not directly supported, unfortunately. The only workaround I see right now is to have a REDEFINES group for each use case, which is messy:
01 CUSTOMER-RECORD.
05 SEGMENT-INDICATORS.
10 CUSTOMER-DETAILS-PRESENT PIC X(1).
10 ACCOUNT-INFORMATION-PRESENT PIC X(1).
10 TRANSACTION-HISTORY-PRESENT PIC X(1).
05 DETAILS100.
10 CUSTOMER-DETAILS.
15 CUSTOMER-ID PIC X(10).
05 DETAILS010 REDEFINES DETAILS100.
10 ACCOUNT-INFORMATION .
15 ACCOUNT-NUMBER PIC X(10).
05 DETAILS001 REDEFINES DETAILS100.
10 TRANSACTION-HISTORY.
15 TRANSACTION-ID PIC X(10).
05 DETAILS110 REDEFINES DETAILS100.
10 CUSTOMER-DETAILS.
15 CUSTOMER-ID PIC X(10).
10 ACCOUNT-INFORMATION .
15 ACCOUNT-NUMBER PIC X(10).
05 DETAILS011 REDEFINES DETAILS100.
10 ACCOUNT-INFORMATION .
15 ACCOUNT-NUMBER PIC X(10).
10 TRANSACTION-HISTORY.
15 TRANSACTION-ID PIC X(10).
05 DETAILS101 REDEFINES DETAILS100.
10 CUSTOMER-DETAILS.
15 CUSTOMER-ID PIC X(10).
10 TRANSACTION-HISTORY.
15 TRANSACTION-ID PIC X(10).
05 DETAILS111 REDEFINES DETAILS100.
10 CUSTOMER-DETAILS.
15 CUSTOMER-ID PIC X(10).
15 CUSTOMER-NAME PIC X(30).
10 ACCOUNT-INFORMATION .
15 ACCOUNT-NUMBER PIC X(10).
10 TRANSACTION-HISTORY.
15 TRANSACTION-ID PIC X(10).
Then you can resolve fields based on the indicators.
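For illustration, a rough PySpark sketch of resolving a field based on the indicators after the read. The paths and the generated column names are assumptions (Cobrix normally turns hyphens into underscores and nests group fields as structs), not a tested recipe for this exact layout:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical paths; the copybook is the REDEFINES version above.
df = (spark.read.format("cobol")
      .option("copybook", "/path/to/customer_record.cpy")
      .load("/path/to/data"))

# Pick CUSTOMER_ID only when the corresponding indicator says the
# CUSTOMER-DETAILS segment is present; otherwise leave it null.
resolved = df.withColumn(
    "CUSTOMER_ID_RESOLVED",
    F.when(F.col("SEGMENT_INDICATORS.CUSTOMER_DETAILS_PRESENT") == "1",
           F.col("DETAILS100.CUSTOMER_DETAILS.CUSTOMER_ID")))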
Hi @yruslan. Cobrix is not able to find the next record for the DETAILS100 permutation. I guess it won't find the next record if any of the segments is missing and the record length changes. Can you suggest a fix?
Tried it, but getting an error. Code:
Yes, I see that there is an additional complication: the record size varies, and it depends on the indicator fields. Currently, Cobrix supports record length mapping only if the segment field is a single field. Since your indicator fields are close together, you can combine them as a workaround. Note that the indicators now redefine a single combined SEGMENT-ID field:
01 CUSTOMER-RECORD.
05 SEGMENT-ID PIC X(3).
05 SEGMENT-INDICATORS REDEFINES SEGMENT-ID.
10 CUSTOMER-DETAILS-PRESENT PIC X(1).
10 ACCOUNT-INFORMATION-PRESENT PIC X(1).
10 TRANSACTION-HISTORY-PRESENT PIC X(1).
05 DETAILS100.
10 CUSTOMER-DETAILS.
15 CUSTOMER-ID PIC X(10).
05 DETAILS010 REDEFINES DETAILS100.
10 ACCOUNT-INFORMATION .
15 ACCOUNT-NUMBER PIC X(10).
05 DETAILS001 REDEFINES DETAILS100.
10 TRANSACTION-HISTORY.
15 TRANSACTION-ID PIC X(10).
05 DETAILS110 REDEFINES DETAILS100.
10 CUSTOMER-DETAILS.
15 CUSTOMER-ID PIC X(10).
10 ACCOUNT-INFORMATION .
15 ACCOUNT-NUMBER PIC X(10).
05 DETAILS011 REDEFINES DETAILS100.
10 ACCOUNT-INFORMATION .
15 ACCOUNT-NUMBER PIC X(10).
10 TRANSACTION-HISTORY.
15 TRANSACTION-ID PIC X(10).
05 DETAILS101 REDEFINES DETAILS100.
10 CUSTOMER-DETAILS.
15 CUSTOMER-ID PIC X(10).
10 TRANSACTION-HISTORY.
15 TRANSACTION-ID PIC X(10).
05 DETAILS111 REDEFINES DETAILS100.
10 CUSTOMER-DETAILS.
15 CUSTOMER-ID PIC X(10).
15 CUSTOMER-NAME PIC X(30).
10 ACCOUNT-INFORMATION .
15 ACCOUNT-NUMBER PIC X(10).
10 TRANSACTION-HISTORY.
15 TRANSACTION-ID PIC X(10).
Then you can use the segment-id-to-size mapping. But you need to get the size info for each combination of the indicators:
.option("record_format", "F")
.option("record_length_field", "SEGMENT_ID")
.option("record_length_map", """{"001":50,"010":30,"100":20,"011":80,"110":50,"101":70,"111":100}""")
.option("segment_field", "SEGMENT-ID")
.option("redefine-segment-id-map:0", "DETAILS001 => 001")
.option("redefine-segment-id-map:1", "DETAILS010 => 010")
.option("redefine-segment-id-map:2", "DETAILS100 => 100")
.option("redefine-segment-id-map:3", "DETAILS011 => 011")
.option("redefine-segment-id-map:4", "DETAILS110 => 110")
.option("redefine-segment-id-map:5", "DETAILS101 => 101")
.option("redefine-segment-id-map:6", "DETAILS111 => 111") |
@yruslan
The input data is binary, like 100,001,111,101,100.
Which version of Cobrix are you using? You can add .option("pedantic", "true") to ensure all passed options are recognized.
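For example, a minimal sketch assuming an existing SparkSession named spark and placeholder paths:

df = (spark.read.format("cobol")
      .option("copybook", "/path/to/customer_record.cpy")  # assumed path
      .option("pedantic", "true")   # fail fast if any option is not recognized
      .load("/path/to/data"))       # assumed path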
@yruslan This is the copybook:
Read code:
Getting error: IllegalArgumentException: Redundant or unrecognized option(s) to 'spark-cobol': record_length_map.
Version: za.co.absa.cobrix:spark-cobol_2.12:2.6.9
@yruslan I also tried to read the above file using OCCURS, but was getting a syntax error.
The record_length_map option is not recognized by the version you are using. This confirms that you need to update to a newer version of Cobrix.
@yruslan I have updated the version but am getting:
@yruslan Hi, I have updated the version to za.co.absa.cobrix:spark-cobol_2.12:2.7.3.
I had also tried OCCURS, but that is giving me a syntax error:
The error message is due to the padding of the copybook with spaces. Please fix the padding by making sure the first 6 characters of each line are part of a comment, or use these options to specify the padding for your copybook: https://github.com/AbsaOSS/cobrix?tab=readme-ov-file#copybook-parsing-options
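If editing the copybook itself is inconvenient, the parsing options from the link above can be set on the reader instead. A sketch, assuming an existing SparkSession spark and placeholder paths; the option names and defaults should be double-checked against that README section:

df = (spark.read.format("cobol")
      .option("copybook", "/path/to/customer_record.cpy")  # assumed path
      .option("truncate_comments", "true")   # treat sequence/comment areas as comments
      .option("comments_lbound", "6")        # columns 1-6 are a comment area
      .option("comments_ubound", "72")       # columns beyond 72 are a comment area
      .load("/path/to/data"))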
@yruslan Regarding the earlier error after updating the version:
Please post the full stack trace - it is hard to tell what is causing the error. Also, make sure the segment-value-to-record-length map that you pass to record_length_map is valid JSON:
{
"100": 20,
"101": 70,
"110": 50,
"111": 100,
"001": 50,
"010": 30,
"011": 80
}
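One way to avoid malformed JSON is to build the map as a Python dict and serialize it before passing it to the option. A sketch, assuming an existing SparkSession spark, a placeholder copybook path, and the lengths discussed in this thread:

import json

record_length_map = {
    "100": 20, "101": 70, "110": 50, "111": 100,
    "001": 50, "010": 30, "011": 80,
}

reader = (spark.read.format("cobol")
          .option("copybook", "/path/to/customer_record.cpy")      # assumed path
          .option("record_format", "F")
          .option("record_length_field", "SEGMENT_ID")
          .option("record_length_map", json.dumps(record_length_map)))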
@yruslan Stack trace:
Driver stacktrace:
Looks like some of your options might be incorrect. Use .option("pedantic", "true") to reveal the incorrect option. Also, please share all the options you are passing to the spark-cobol reader.
@yruslan
The options look good, with the exception of the JSON you are passing to record_length_map. I can help you if you send the layout position table that is printed in the log.
UPDATE: You can also try … instead of …
This is also a possible solution. It is more elegant than what I proposed. I think if you just fix the padding of the copybook, it might work (e.g., add 4 spaces to each of the lines).
@yruslan: I have fixed the padding, but it is still giving a syntax error.
I think you fixed only the padding at the beginning of each line, but not at the end. Your copybook should look like this (including spaces):
01 CUSTOMER-RECORD.
10 SEGMENT-INDICATORS.
15 CUSTOMER-DETAILS-PRESENT PIC 9(1).
15 ACCOUNT-INFORMATION-PRESENT PIC 9(1).
15 TRANSACTION-HISTORY-PRESENT PIC 9(1).
01 CUSTOMER-DETAILS-TAB.
10 CUST-TAB OCCURS 1 TO 2 TIMES
DEPENDING ON CUSTOMER-DETAILS-PRESENT.
15 CUSTOMER-ID PIC X(10).
15 CUSTOMER-NAME PIC X(30).
15 CUSTOMER-ADDRESS PIC X(50).
15 CUSTOMER-PHONE-NUMBER PIC X(15).
01 ACCOUNT-INFORMATION-TAB.
10 ACCT-INFO-TAB OCCURS 1 TO 2 TIMES
DEPENDING ON ACCOUNT-INFORMATION-PRESENT.
15 ACCOUNT-NUMBER PIC X(10).
15 ACCOUNT-TYPE PIC X(2).
15 ACCOUNT-BALANCE PIC X(12).
01 TRANSACTION-HISTORY-TAB.
10 TRANS-TAB OCCURS 1 TO 2 TIMES
DEPENDING ON TRANSACTION-HISTORY-PRESENT.
15 TRANSACTION-ID PIC X(10).
15 TRANSACTION-DATE PIC X(8).
15 TRANSACTION-AMOUNT PIC X(12).
15 TRANSACTION-TYPE PIC X(2).
@yruslan I guess a variable field size at the same location, driven by a segment condition, is not supported.
Yes, this is not directly supported. However, I decided to check if I can make it work with variable-sized OCCURS, and I was able to do it: https://github.com/AbsaOSS/cobrix/pull/696/files
Issues with your current copybook (besides padding):
I think if you fix these issues, the settings will work for you as well.
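As an illustration only, a PySpark read for the OCCURS DEPENDING ON layout might look roughly like this. The paths are placeholders, and the exact option set that was verified to work is in the linked PR; variable_size_occurs here is an assumption about which setting matters for this layout:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (spark.read.format("cobol")
      .option("copybook", "/path/to/customer_record_occurs.cpy")  # the fixed OCCURS copybook above
      .option("variable_size_occurs", "true")  # size arrays by the DEPENDING ON field instead of padding to the maximum
      .option("pedantic", "true")
      .load("/path/to/data"))

df.printSchema()
df.show(truncate=False)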
Thanks mate, it worked! Thanks for your time.
Awesome! Glad it finally worked for you.
Background [Optional]
I need to read a .dat file which can have multiple variable-length columns.
01 CUSTOMER-RECORD.
05 SEGMENT-INDICATORS.
10 CUSTOMER-DETAILS-PRESENT PIC X(1).
10 ACCOUNT-INFORMATION-PRESENT PIC X(1).
10 TRANSACTION-HISTORY-PRESENT PIC X(1).
05 CUSTOMER-DETAILS.
10 CUSTOMER-ID PIC X(10).
10 CUSTOMER-NAME PIC X(30).
05 ACCOUNT-INFORMATION .
10 ACCOUNT-NUMBER PIC X(10).
05 TRANSACTION-HISTORY.
10 TRANSACTION-ID PIC X(10).
Question
Based on SEGMENT-INDICATORS we need to read the file:
i.e., if CUSTOMER-DETAILS-PRESENT is 1, the record will have CUSTOMER-DETAILS;
if ACCOUNT-INFORMATION-PRESENT is 1, the record will have ACCOUNT-INFORMATION, and so on.
I am not able to read such a file in PySpark using Cobrix.
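A minimal sketch of the kind of PySpark read call in question (the paths and the package version are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# spark-cobol needs to be on the classpath, e.g.
#   spark-submit --packages za.co.absa.cobrix:spark-cobol_2.12:<version> ...
df = (spark.read.format("cobol")
      .option("copybook", "/path/to/customer_record.cpy")   # the copybook above
      .load("/path/to/customer_data.dat"))                  # the EBCDIC .dat file

df.show(truncate=False)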