Bug fixes & improve duplicate checks #342

normalizedwater546 · 2024-09-22T01:57:47Z

Using --meta --cbz on a gallery whose .cbz file already exists results in the error below.
- This was caused by the download step being skipped while the rest of the process (e.g. metadata generation) still ran.

Traceback (most recent call last):
  File "\\?\A:\nhentai.venv\Scripts\nhentai-script.py", line 33, in <module>
    sys.exit(load_entry_point('nhentai==0.5.7', 'console_scripts', 'nhentai')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "A:\nhentai.venv\Lib\site-packages\nhentai-0.5.7-py3.12.egg\nhentai\command.py", line 94, in main
    generate_metadata_file(options.output_dir, table, doujinshi)
  File "A:\nhentai.venv\Lib\site-packages\nhentai-0.5.7-py3.12.egg\nhentai\utils.py", line 313, in generate_metadata_file
    f = open(os.path.join(doujinshi_dir, 'info.txt'), 'w', encoding='utf-8')
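A minimal sketch of the guard, assuming start_download reports a skipped download by returning False and main() checks that value before writing metadata. The names start_download, write_metadata and process are simplified, hypothetical stand-ins, not nhentai's actual signatures. In this sketch the gallery folder only exists once fetch() has run.

```python
import os


def start_download(doujinshi_dir, cbz_path, fetch):
    # Hypothetical simplified downloader. Returns False when the download is
    # skipped (the .cbz already exists) so the caller can stop as well.
    if os.path.exists(cbz_path):
        return False
    fetch(doujinshi_dir)
    return True


def write_metadata(doujinshi_dir):
    # Mirrors the failing call in the traceback: info.txt is written inside
    # the gallery folder, which this sketch only creates via fetch().
    path = os.path.join(doujinshi_dir, 'info.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('metadata\n')


def process(output_dir, name, fetch):
    doujinshi_dir = os.path.join(output_dir, name)
    cbz_path = os.path.join(output_dir, f'{name}.cbz')
    if not start_download(doujinshi_dir, cbz_path, fetch):
        # Download skipped: skip metadata and archive generation too.
        return False
    write_metadata(doujinshi_dir)
    return True
```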

Fixes PDF generation reading non-image files (info.txt, ComicInfo.xml), which made img2pdf fail.
- This assumes nhentai only serves png, jpg, jpeg, and gif images.

Traceback (most recent call last):
  File "A:\nhentai.venv\Lib\site-packages\img2pdf.py", line 1817, in read_images
    imgdata = Image.open(im)
              ^^^^^^^^^^^^^^
  File "A:\nhentai.venv\Lib\site-packages\PIL\Image.py", line 3498, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x0000026F22120720>


Traceback (most recent call last):
  File "\\?\A:\nhentai.venv\Scripts\nhentai-script.py", line 33, in <module>
    sys.exit(load_entry_point('nhentai==0.5.7', 'console_scripts', 'nhentai')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "A:\nhentai.venv\Lib\site-packages\nhentai-0.5.7-py3.12.egg\nhentai\command.py", line 114, in main
    generate_pdf(options.output_dir, doujinshi, options.rm_origin_dir, options.move_to_folder)
  File "A:\nhentai.venv\Lib\site-packages\nhentai-0.5.7-py3.12.egg\nhentai\utils.py", line 243, in generate_pdf
    pdf_f.write(img2pdf.convert(full_path_list, rotation=img2pdf.Rotation.ifvalid))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "A:\nhentai.venv\Lib\site-packages\img2pdf.py", line 2733, in convert
    ) in read_images(
         ^^^^^^^^^^^^
  File "A:\nhentai.venv\Lib\site-packages\img2pdf.py", line 1829, in read_images
    raise ImageOpenError(
img2pdf.ImageOpenError: cannot read input image (not jpeg2000). PIL: error reading image: cannot identify image file <_io.BytesIO object at 0x0000026F22120720>
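A minimal sketch of the filtering step, assuming (as stated above) that the only page formats are png, jpg, jpeg and gif. IMAGE_EXTENSIONS, is_image_file and collect_image_paths are hypothetical names. The returned list is the kind of path list that would be passed to img2pdf.convert, with info.txt and ComicInfo.xml left out.

```python
import os

# Assumption taken from the PR text: nhentai only serves these image types.
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif')


def is_image_file(filename):
    # Case-insensitive extension check; str.endswith accepts a tuple.
    return filename.lower().endswith(IMAGE_EXTENSIONS)


def collect_image_paths(doujinshi_dir):
    # Full paths of image files only, sorted by filename; metadata files
    # such as info.txt and ComicInfo.xml are skipped.
    return [
        os.path.join(doujinshi_dir, name)
        for name in sorted(os.listdir(doujinshi_dir))
        if is_image_file(name)
    ]
```

Sorting by filename only yields page order if the files are named so that lexical order matches page order.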

If a .cbz file has already been generated, later runs are skipped as duplicates even when different flags are given (e.g. --pdf instead of --cbz).
- This was caused by a hard-coded .cbz file extension check.
Moved the duplicated file-exists checks into a helper method for more consistent behavior.
- The logger message is emitted by the calling function rather than the helper, to preserve the logger context.
Removed the warning emitted when the folder already exists.
- Nothing is wrong with this; proceed silently.
Simplified the path construction used when generating the output filename (in two places).
- os.path.join(doujinshi_dir, '..') is effectively equivalent to output_dir.
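A rough sketch of the helper pattern described above, under assumed names (output_exists, generate and the nhentai.sketch logger are hypothetical). The check uses the requested file_type instead of a hard-coded .cbz, builds the path from output_dir directly, and leaves logging to the caller so the log record's funcName and lineno point at the caller rather than the helper.

```python
import logging
import os

# Hypothetical logger name, for illustration only.
logger = logging.getLogger('nhentai.sketch')


def output_exists(output_dir, name, file_type):
    # Check for the artifact of the *requested* type, not a hard-coded
    # '.cbz'. The path is built from output_dir directly instead of
    # os.path.join(doujinshi_dir, '..').
    return os.path.isfile(os.path.join(output_dir, f'{name}.{file_type}'))


def generate(output_dir, name, file_type):
    if output_exists(output_dir, name, file_type):
        # Logged here rather than in the helper, so the record's
        # funcName/lineno refer to this function.
        logger.info('%s.%s already exists, skipping', name, file_type)
        return False
    # ... actual cbz/pdf generation would go here ...
    return True
```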

Not sure what the expected behavior should be here. I assumed this should continue with the rest of the process.

Nothing is wrong with the folder already existing -- silently ignore and move on. Might still have other files inside that haven't been downloaded yet.
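A minimal sketch of why a pre-existing folder is harmless, assuming a resume-style downloader (download_missing and fetch are hypothetical names). The folder is created only if absent, and pages already on disk are skipped.

```python
import os


def download_missing(folder, filenames, fetch):
    # An existing folder is fine: create it only if absent, no warning.
    os.makedirs(folder, exist_ok=True)
    fetched = []
    for name in filenames:
        path = os.path.join(folder, name)
        if os.path.exists(path):
            continue  # kept from an earlier, possibly interrupted run
        fetch(path)
        fetched.append(name)
    return fetched
```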

If you wanted to generate both .cbz and .pdf, the .pdf will be skipped if .cbz was generated first.

RicterZ · 2024-09-22T03:42:16Z

nhentai/downloader.py

@@ -138,7 +139,8 @@ def start_download(self, queue, folder='', regenerate_cbz=False):
            logger.warning(f'Path "{folder}" already exist.')

        if os.getenv('DEBUG', None) == 'NODOWNLOAD':
-            return
+            # Assuming we want to continue with rest of process?


In some cases we don't need to download files; use export DEBUG=NODOWNLOAD to skip the rest of the process. In other cases, use export DEBUG=1 to enable debug mode and continue the download process.

This should be fine then. Returning True will proceed with everything else in main, as it did previously.
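A minimal sketch of the NODOWNLOAD branch under the behavior agreed on here, assuming a simplified start_download(urls, fetch) rather than the real method signature. Downloading is skipped but True is returned, so main() continues as before. DEBUG=1 simply falls through to the normal download path; the debug logging it enables is not modeled.

```python
import os


def start_download(urls, fetch):
    """Hypothetical, simplified downloader. The return value tells main()
    whether to continue with post-processing (metadata, cbz/pdf)."""
    if os.getenv('DEBUG', None) == 'NODOWNLOAD':
        # Skip fetching entirely; returning True keeps the previous behavior
        # where main() carries on with everything else.
        return True
    for url in urls:
        fetch(url)
    return True
```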

RicterZ · 2024-09-22T04:38:18Z

Some problems still exist:

- If both --pdf and --cbz are specified, only the cbz file is generated.
- If a cbz file already exists, the --pdf option is ignored.
- The start_download method is coupled with too many options, such as regenerate_cbz and file_type; it may need refactoring.
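One possible direction for the first two points, sketched with hypothetical names (generate_requested, generators): check and generate each requested format independently, and keep format handling out of start_download entirely, which would also address the third point.

```python
import os


def generate_requested(output_dir, name, formats, generators):
    """Hypothetical: handle each requested format on its own, so an existing
    .cbz no longer suppresses --pdf (and vice versa)."""
    generated = []
    for fmt in formats:
        target = os.path.join(output_dir, f'{name}.{fmt}')
        if os.path.exists(target):
            continue  # this format is done; others may still be pending
        generators[fmt](target)
        generated.append(fmt)
    return generated
```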

improve #342

normalizedwater546 added 6 commits September 22, 2024 00:43

fix process continuing despite cbz download request skipped

12364e9

refactor: de-dupe doujinshi_obj parsers

4bfe104

fix: remove warning for folder already exists in downloader

497eb6f

Nothing is wrong with the folder already existing -- silently ignore and move on. Might still have other files inside that haven't been downloaded yet.

fix: add file_type check to downloader

5a29eaf

If you wanted to generate both .cbz and .pdf, the .pdf will be skipped if .cbz was generated first.

fix: check if metadata file is downloaded before skipping

a05a308

fix: non-image files in pdf conversion causing crash

7fa9193

normalizedwater546 changed the title ~~Improve duplicate checks~~ Bug fixes & improve duplicate checks Sep 22, 2024

RicterZ reviewed Sep 22, 2024


RicterZ approved these changes Sep 22, 2024


RicterZ added a commit that referenced this pull request Sep 22, 2024

generate html viewer automatically after download #342

16bac45

RicterZ merged commit 7fa9193 into RicterZ:master Sep 22, 2024

RicterZ added a commit that referenced this pull request Sep 22, 2024

improve #342

cbf9448

RicterZ added a commit that referenced this pull request Sep 22, 2024

Merge pull request #343 from RicterZ/pull-342

a8a48c6

improve #342
