There are several issues with this response, but the main one, and the one causing trouble, is that the line endings are bare LF rather than CRLF. This causes warcat verify to crash with the following traceback:
```
Traceback (most recent call last):
  File "/usr/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File ".../lib/python3.4/site-packages/warcat/__main__.py", line 154, in <module>
    main()
  File ".../lib/python3.4/site-packages/warcat/__main__.py", line 70, in main
    command_info[1](args)
  File ".../lib/python3.4/site-packages/warcat/__main__.py", line 136, in verify_command
    tool.process()
  File ".../lib/python3.4/site-packages/warcat/tool.py", line 95, in process
    check_block_length=self.check_block_length)
  File ".../lib/python3.4/site-packages/warcat/model/warc.py", line 75, in read_record
    check_block_length=check_block_length)
  File ".../lib/python3.4/site-packages/warcat/model/record.py", line 68, in load
    content_type)
  File ".../lib/python3.4/site-packages/warcat/model/block.py", line 21, in load
    field_cls=HTTPHeader)
  File ".../lib/python3.4/site-packages/warcat/model/block.py", line 92, in load
    fields = field_cls.parse(file_obj.read(field_length).decode())
  File ".../lib/python3.4/site-packages/warcat/model/field.py", line 215, in parse
    http_headers.status, s = s.split(newline, 1)
ValueError: need more than 1 value to unpack
```
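To make the failure mode concrete, here is a small standalone sketch (not warcat's code) that reproduces it. It assumes, based on the failing line `s.split(newline, 1)`, that warcat's `newline` is CRLF; the header text is made up for illustration and is not the actual assoc-amazon.com response. Splitting an LF-only header block on CRLF returns a one-element list, so unpacking it into two names raises ValueError (Python 3.5+ words it as "not enough values to unpack (expected 2, got 1)"). HTTP/1.1 (RFC 7230, section 3.5) permits a recipient to accept a bare LF as a line terminator, so a lenient split on `\r?\n` is one possible way a parser could cope; the helper names below are hypothetical.

```python
import re
from typing import Tuple

# Assumption: warcat's `newline` is CRLF, as the failing line
# `http_headers.status, s = s.split(newline, 1)` suggests.
NEWLINE = "\r\n"

# Hypothetical LF-only header block (illustrative, not the real response).
lf_headers = "HTTP/1.1 200 OK\nContent-Type: image/gif\nContent-Length: 43\n\n"


def strict_status_split(s: str) -> Tuple[str, str]:
    """Mimics the failing line: raises ValueError if s contains no CRLF."""
    status, rest = s.split(NEWLINE, 1)
    return status, rest


def lenient_status_split(s: str) -> Tuple[str, str]:
    """Split off the status line, accepting either CRLF or a bare LF."""
    parts = re.split(r"\r?\n", s, maxsplit=1)
    rest = parts[1] if len(parts) == 2 else ""
    return parts[0], rest


try:
    strict_status_split(lf_headers)
except ValueError as exc:
    print("strict:", exc)  # the same kind of ValueError as in the traceback
print("lenient:", lenient_status_split(lf_headers)[0])
```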
I'm getting a similar, but slightly different, error:
```
$ python3 -m warcat extract 195.242.99.71-8181-2016-03-23-3324e7c6-00000.warc.gz --output-dir output2 --progress --keep-going
47900 | = |
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/username/.local/lib/python3.8/site-packages/warcat/__main__.py", line 154, in <module>
    main()
  File "/home/username/.local/lib/python3.8/site-packages/warcat/__main__.py", line 70, in main
    command_info[1](args)
  File "/home/username/.local/lib/python3.8/site-packages/warcat/__main__.py", line 131, in extract_command
    tool.process()
  File "/home/username/.local/lib/python3.8/site-packages/warcat/tool.py", line 93, in process
    record, has_more = model.WARC.read_record(f,
  File "/home/username/.local/lib/python3.8/site-packages/warcat/model/warc.py", line 74, in read_record
    record = Record.load(file_object, preserve_block=preserve_block,
  File "/home/username/.local/lib/python3.8/site-packages/warcat/model/record.py", line 67, in load
    record.content_block = ContentBlock.load(file_obj, block_length,
  File "/home/username/.local/lib/python3.8/site-packages/warcat/model/block.py", line 20, in load
    return BlockWithPayload.load(file_obj, length,
  File "/home/username/.local/lib/python3.8/site-packages/warcat/model/block.py", line 94, in load
    fields = field_cls.parse(field_str)
  File "/home/username/.local/lib/python3.8/site-packages/warcat/model/field.py", line 215, in parse
    http_headers.status, s = s.split(newline, 1)
ValueError: not enough values to unpack (expected 2, got 1)
```
Are there any workarounds for this, or a way to remove the bad data from the file so the rest can be extracted? I tried the --keep-going option, but it doesn't seem to help.
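One possible workaround, sketched here with only the standard library and not part of warcat, is to copy every record into a new .warc.gz except HTTP response records whose status line is not terminated by CRLF, then run warcat on the copy. This assumes the file is well-formed at the WARC level (each record's Content-Length is accurate) and only checks the status line, since that is the split that fails in both tracebacks; the function names are hypothetical. Dropped records are lost from the copy, so keep the original file.

```python
import gzip
from typing import BinaryIO, Dict, Iterator, Tuple

BLANK_LINES = (b"\r\n", b"\n")


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[bytes, Dict[bytes, bytes], bytes]]:
    """Yield (raw_header, fields, block) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of input
        if line in BLANK_LINES:
            continue  # the CRLF CRLF that terminates the previous record
        header_lines = [line]
        while True:
            line = stream.readline()
            if line in BLANK_LINES or not line:
                break  # blank line ends the WARC header
            header_lines.append(line)
        fields = {}
        for raw in header_lines[1:]:  # first line is the version, e.g. WARC/1.0
            name, _, value = raw.partition(b":")
            fields[name.strip().lower()] = value.strip()
        # Assumes Content-Length is correct; the block is read as raw bytes.
        block = stream.read(int(fields.get(b"content-length", b"0")))
        yield b"".join(header_lines), fields, block


def status_line_ends_with_crlf(block: bytes) -> bool:
    """True if the HTTP status line is terminated by CRLF (what warcat expects)."""
    end = block.find(b"\n")
    return end > 0 and block[end - 1:end] == b"\r"


def copy_without_lf_responses(src: BinaryIO, dst: BinaryIO) -> int:
    """Copy records from src to dst as .warc.gz, skipping LF-only HTTP responses.

    Returns the number of records skipped.
    """
    skipped = 0
    for raw_header, fields, block in iter_warc_records(src):
        is_http_response = (
            fields.get(b"warc-type") == b"response"
            and fields.get(b"content-type", b"").startswith(b"application/http")
        )
        if is_http_response and not status_line_ends_with_crlf(block):
            skipped += 1
            continue
        # One gzip member per record, the usual layout for .warc.gz files.
        # Closing the GzipFile finishes the member but leaves dst open.
        with gzip.GzipFile(fileobj=dst, mode="wb") as member:
            member.write(raw_header + b"\r\n" + block + b"\r\n\r\n")
    return skipped


# Example usage (file names are placeholders):
#   with gzip.open("input.warc.gz", "rb") as src, open("cleaned.warc.gz", "wb") as dst:
#       print(copy_without_lf_responses(src, dst), "record(s) skipped")
```

The sketch drops the offending records rather than rewriting their headers to CRLF, because rewriting would change the block length and invalidate any WARC-Block-Digest on those records.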
I have a WARC that contains an HTTP response with malformed headers. Specifically, it is the response from http://www.assoc-amazon.com/s/link-enhancer?tag=discount039-20&o=1, and this is the data returned:
More precisely, in Python repr notation: