
Bug fixes & improve duplicate checks #342

Merged: 6 commits merged on Sep 22, 2024


normalizedwater546 (Contributor) commented Sep 22, 2024

Traceback (most recent call last):
  File "\\?\A:\nhentai.venv\Scripts\nhentai-script.py", line 33, in <module>
    sys.exit(load_entry_point('nhentai==0.5.7', 'console_scripts', 'nhentai')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "A:\nhentai.venv\Lib\site-packages\nhentai-0.5.7-py3.12.egg\nhentai\command.py", line 94, in main
    generate_metadata_file(options.output_dir, table, doujinshi)
  File "A:\nhentai.venv\Lib\site-packages\nhentai-0.5.7-py3.12.egg\nhentai\utils.py", line 313, in generate_metadata_file
    f = open(os.path.join(doujinshi_dir, 'info.txt'), 'w', encoding='utf-8')
  • Fixes PDF generation reading non-image files (info.txt, ComicInfo.xml).
    • This assumes the only image types served by nhentai are png, jpg, jpeg, and gif.
Traceback (most recent call last):
  File "A:\nhentai.venv\Lib\site-packages\img2pdf.py", line 1817, in read_images
    imgdata = Image.open(im)
              ^^^^^^^^^^^^^^
  File "A:\nhentai.venv\Lib\site-packages\PIL\Image.py", line 3498, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x0000026F22120720>


Traceback (most recent call last):
  File "\\?\A:\nhentai.venv\Scripts\nhentai-script.py", line 33, in <module>
    sys.exit(load_entry_point('nhentai==0.5.7', 'console_scripts', 'nhentai')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "A:\nhentai.venv\Lib\site-packages\nhentai-0.5.7-py3.12.egg\nhentai\command.py", line 114, in main
    generate_pdf(options.output_dir, doujinshi, options.rm_origin_dir, options.move_to_folder)
  File "A:\nhentai.venv\Lib\site-packages\nhentai-0.5.7-py3.12.egg\nhentai\utils.py", line 243, in generate_pdf
    pdf_f.write(img2pdf.convert(full_path_list, rotation=img2pdf.Rotation.ifvalid))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "A:\nhentai.venv\Lib\site-packages\img2pdf.py", line 2733, in convert
    ) in read_images(
         ^^^^^^^^^^^^
  File "A:\nhentai.venv\Lib\site-packages\img2pdf.py", line 1829, in read_images
    raise ImageOpenError(
img2pdf.ImageOpenError: cannot read input image (not jpeg2000). PIL: error reading image: cannot identify image file <_io.BytesIO object at 0x0000026F22120720>
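A minimal sketch of the extension filter behind the PDF-generation fix follows. It assumes, as the PR does, that nhentai only serves .png, .jpg, .jpeg and .gif pages. The helper list_image_files and the constant IMAGE_EXTENSIONS are hypothetical names, not nhentai's actual code. The idea is that only this filtered list would be handed to img2pdf.convert, so info.txt and ComicInfo.xml never reach PIL.

```python
import os

# Assumed to be the only page formats nhentai serves (per the PR description).
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif')


def list_image_files(doujinshi_dir):
    """Hypothetical helper: full paths of image files in doujinshi_dir.

    Non-image files such as info.txt or ComicInfo.xml are skipped, so
    img2pdf never tries to open them. Results are sorted by filename,
    which only matches page order if page names are zero-padded (assumption).
    """
    names = sorted(
        name for name in os.listdir(doujinshi_dir)
        if name.lower().endswith(IMAGE_EXTENSIONS)  # case-insensitive match
    )
    return [os.path.join(doujinshi_dir, name) for name in names]
```

The result would then replace the unfiltered path list in a call such as img2pdf.convert(list_image_files(doujinshi_dir), rotation=img2pdf.Rotation.ifvalid).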
  • When a .cbz file has already been generated, duplicate downloads will be ignored even when the flags differ (e.g. --pdf instead of --cbz).
  • Deduplicated the file-exists checks into a helper method for more consistent behavior.
    • The logger message is emitted in the parent function rather than in the helper, to preserve logger context.
  • Removed the warning when the folder already exists.
    • Nothing is wrong with this. Proceed silently.
  • Simplified the path call used when generating the filename path at 1 and 2.
    • os.path.join(doujinshi_dir, '..') was essentially equivalent to output_dir.
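os.path.join(doujinshi_dir, '..') is only *essentially* equivalent to output_dir. The joined string keeps the literal '..' and matches output_dir only after normalization. The small sketch below illustrates this with hypothetical directory names.

```python
import os

output_dir = 'downloads'                            # hypothetical output directory
doujinshi_dir = os.path.join(output_dir, 'title')   # hypothetical per-doujinshi folder

# Old style: reach the parent folder by appending '..'.
via_dotdot = os.path.join(doujinshi_dir, '..')

# The raw string still contains 'title' and '..' ...
print(via_dotdot)  # downloads/title/.. on POSIX, downloads\title\.. on Windows
# ... but it names the same directory once normalized lexically (ignoring symlinks).
print(os.path.normpath(via_dotdot) == os.path.normpath(output_dir))  # True

# So sibling outputs such as the .pdf can be built from output_dir directly.
pdf_path = os.path.join(output_dir, 'title.pdf')
```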

Not sure what the expected behavior should be here; I assumed it should continue with the rest of the process.

Nothing is wrong with the folder already existing; silently ignore it and move on. It might still contain files that haven't been downloaded yet.
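The sketch below treats an existing folder as normal by using os.makedirs(..., exist_ok=True). Creating the folder before writing info.txt is also one plausible guard against the generate_metadata_file traceback earlier in this PR. That traceback is cut off before its exception line, however, so a missing folder as its cause is an assumption. The helper write_info_txt is hypothetical.

```python
import os


def write_info_txt(doujinshi_dir, text):
    """Hypothetical helper: write info.txt, creating the folder if needed.

    os.makedirs(..., exist_ok=True) is a silent no-op when the folder
    already exists, so an existing folder is neither an error nor a warning.
    """
    os.makedirs(doujinshi_dir, exist_ok=True)
    path = os.path.join(doujinshi_dir, 'info.txt')
    # Creating the folder first means open(..., 'w') cannot fail just
    # because the parent directory is missing.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    return path
```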
If you want to generate both a .cbz and a .pdf, the .pdf will be skipped if the .cbz is generated first.
normalizedwater546 changed the title from "Improve duplicate checks" to "Bug fixes & improve duplicate checks" on Sep 22, 2024
@@ -138,7 +139,8 @@ def start_download(self, queue, folder='', regenerate_cbz=False):
logger.warning(f'Path "{folder}" already exist.')

if os.getenv('DEBUG', None) == 'NODOWNLOAD':
    return
    # Assuming we want to continue with rest of process?
RicterZ (Owner):

In some cases we don't need to download files; use export DEBUG=NODOWNLOAD to skip the rest of the process. In other cases, use export DEBUG=1 to enable debug mode and continue the download process.
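The sketch below shows the DEBUG switch described above in simplified form. DEBUG=NODOWNLOAD skips fetching but still returns True, so the caller proceeds with the rest of main, as the contributor's reply describes. Any other non-empty value, such as DEBUG=1, only turns on verbose logging. This is a hypothetical stand-in, not nhentai's start_download; the fetch callable and the logger name are assumptions for illustration.

```python
import logging
import os

logger = logging.getLogger('nhentai-sketch')  # hypothetical logger name


def start_download(queue, fetch):
    """Hypothetical, simplified stand-in for start_download.

    Returns True when the caller should continue with post-processing
    (metadata, .cbz/.pdf generation).
    """
    mode = os.getenv('DEBUG', None)
    if mode == 'NODOWNLOAD':
        # Skip fetching files but report success, so the caller still
        # processes whatever is already on disk.
        return True
    if mode:
        # Any other non-empty value (e.g. DEBUG=1) just enables debug logging.
        logger.setLevel(logging.DEBUG)
    for url in queue:
        logger.debug('downloading %s', url)
        fetch(url)  # placeholder for the real download
    return True
```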

normalizedwater546 (Contributor, Author):

This should be fine then. Returning True will proceed with everything else in main, as it did previously.

RicterZ merged commit 7fa9193 into RicterZ:master on Sep 22, 2024
RicterZ (Owner) commented Sep 22, 2024

Some problems still exist:

  1. If both --pdf and --cbz are specified, only the .cbz file is generated.
  2. If a .cbz file already exists, the --pdf option is ignored.
  3. The start_download method is coupled to too many options (such as regenerate_cbz and file_type); it may need refactoring.
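One possible direction for problems 1 and 2 is to check each requested format on its own, instead of skipping everything once any output exists. The sketch below assumes this approach. formats_to_generate, EXTENSIONS and the regenerate flag are hypothetical names, not nhentai's code. Following the PR, logging about skipped formats is left to the caller.

```python
import os

# Hypothetical mapping from requested output type to file extension.
EXTENSIONS = {'cbz': '.cbz', 'pdf': '.pdf'}


def formats_to_generate(output_dir, filename, requested, regenerate=False):
    """Return the requested formats whose output file is still missing.

    Each format is checked independently, so an existing .cbz no longer
    suppresses a requested .pdf, and --pdf --cbz can yield both files.
    """
    pending = []
    for fmt in requested:
        path = os.path.join(output_dir, filename + EXTENSIONS[fmt])
        if regenerate or not os.path.exists(path):
            pending.append(fmt)
    return pending
```

Making this decision outside start_download could also trim the options it carries (problem 3), though that is a design suggestion rather than what the project did.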

RicterZ added two commits that referenced this pull request on Sep 22, 2024
2 participants