Commit
Showing 21 changed files with 4,604 additions and 43 deletions.
@@ -1,7 +1,7 @@
[workspace]

resolver = "2"
members = [
"gulagcleaner",
"gulagcleaner_rs",
"gulagcleaner_python",
"gulagcleaner_wasm"
]
This file was deleted.
Large diffs are not rendered by default.
@@ -0,0 +1,71 @@
+
+# Gulag Cleaner
+
+
+[](https://twitter.com/gulagcleaner)
+[](https://www.instagram.com/gulagcleaner/)
+[](https://ko-fi.com/L3L86VEX9)
+
+
+Gulag Cleaner is a tool designed to remove advertisements from PDFs, making it easier to read and navigate documents without being disrupted by unwanted ads.
+
+This tool does not just crop the ads out of the PDF. Instead, it extracts the original, ad-free file by manipulating the internal structure of the PDF, so the original quality is preserved.
+
+In addition to removing advertisements, Gulag Cleaner can also extract metadata from the file, such as the author, subject, university, and more.
+
+# Web Version
+
+This tool can be used without installation directly from [our website](https://gulagcleaner.com) (in Spanish).
+
+[](https://gulagcleaner.com)
+
+# Installation
+
+To install Gulag Cleaner, [download](https://www.python.org/downloads/) and [install](https://wiki.python.org/moin/BeginnersGuide/Download) Python, then run the following command in your terminal:
+```
+pip install gulagcleaner
+```
+
+# Usage
+
+Gulag Cleaner can be used both through a Command Line Interface (CLI) and from your own code.
+
+## Command Line Interface
+
+To use Gulag Cleaner through the CLI, run the following command, replacing `<filename>` with the name of one or more PDF files or folders containing PDFs:
+
+```
+gulagcleaner [-r] [-s] [-h] [-v] <filename>...
+```
+
+## Options
+
+Gulag Cleaner provides several options:
+
+> * `-r`: Replace the original file with the cleaned version.
+> * `-s`: Do not show metadata about cleaned files.
+> * `-h`: Display the help message, providing information on how to use Gulag Cleaner.
+> * `-v`: Display the current version of Gulag Cleaner.
+## Code
+
+To use Gulag Cleaner in your code, you can use the following code snippet:
+
+```python
+from gulagcleaner.extract import clean_pdf
+
+return_msg = clean_pdf("file.pdf")
+```
+
+# License
+Gulag Cleaner is distributed under the GPL-3 license, which means it's open-source and free to use.
+
+# Contributing
+We're always looking for ways to improve Gulag Cleaner, and we welcome contributions from the community. If you have ideas for improvements or bug fixes, please feel free to submit a pull request.
+
+## TODO
+If you want to help, these are the top priorities right now:
+
+* Revamp the argument parsing. We should use a parsing library to allow for short "-v" and long "--version" arguments. Ideally, it should also support parameters for each argument.
+
+* Add the "Naive" cleaning method. This method is just a fallback that crops the ads by zooming in and moving the MediaBox. This is not ideal, but there will always be edge cases not covered by the other methods, and doing this is better than returning an error.
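The first TODO item above (short and long arguments, with room for per-argument parameters) could be approached with Python's standard `argparse` module. The following is only a minimal sketch: the short flags come from the README, while the long names `--replace` and `--silent`, the `VERSION` placeholder, and the `build_parser` helper are assumptions made for illustration, not part of this commit.

```python
import argparse

# Placeholder value for illustration; the real version would come from the package.
VERSION = "0.0.0"


def build_parser():
    """Build a parser that accepts the README's short flags plus assumed long forms."""
    parser = argparse.ArgumentParser(
        prog="gulagcleaner",
        description="Remove ads from PDF files.",
    )
    # "--replace" and "--silent" are hypothetical long names for "-r" and "-s".
    parser.add_argument("-r", "--replace", action="store_true",
                        help="replace the original file with the cleaned version")
    parser.add_argument("-s", "--silent", action="store_true",
                        help="do not show metadata about cleaned files")
    # argparse adds "-h/--help" on its own; "version" is a built-in action.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s " + VERSION)
    # One or more PDF files or folders, as in the README usage line.
    parser.add_argument("files", nargs="+", metavar="<filename>",
                        help="PDF files or folders containing PDFs")
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["-r", "--silent", "a.pdf", "docs"])
```

Because each option is declared with `add_argument`, an option that takes a value can be added later by dropping `action="store_true"` and giving it a `type` or `default`.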
@@ -8,11 +8,11 @@ version = "0.0.1"
 description = "Ad removal tool for PDFs."
 authors = [
 {name = "YM162", email = "[email protected]"}]
-readme = "../README.md"
+readme = "README.md"
 dependencies = [
 "pikepdf>=5.1.2","pdfminer.six>=20220524"
 ]
-license = {file = "../LICENSE"}
+license = {file = "LICENSE"}
 classifiers = ["Programming Language :: Python :: 3",
 "License :: OSI Approved :: MIT License",
 "Operating System :: OS Independent"]
Binary file modified: gulagcleaner_python/python/gulagcleaner/__pycache__/clean.cpython-38.pyc (+1021 Bytes, 380%)
Binary file modified: gulagcleaner_python/python/gulagcleaner/__pycache__/command_line.cpython-38.pyc (+5 Bytes, 100%)
Binary file modified: gulagcleaner_python/python/gulagcleaner/__pycache__/decrypt.cpython-38.pyc (+0 Bytes, 100%)
@@ -1,14 +1,30 @@
 from ._lib import clean_pdf # export public parts of the binary extension
 
-#Here there should only be a function clean_pdf(pdf_path, output_path, force_naive)
-#that calls the rust function and then saves the pdf in the given output_path.
-#It should return a dictionary with the following keys:
-# Returns:
-# return_msg (dict): A dictionary with the following keys:
-# success (bool): Indicates whether the de-embedding process was successful.
-# return_path (str): The path to the de-embedded file if successful.
-# error (str): An error description if the process was unsuccessful.
-# """
-
-def clean_pdf(pdf_path, output_path, force_naive):
-    return clean_pdf(10,6)
+def clean_pdf_path(pdf_path, output_path, force_naive):
+    """
+    Cleans the ads from the PDF file in a given path and saves it in another path.
+    Args:
+        pdf_path (str): The path to the pdf file.
+        output_path (str): The path to save the cleaned pdf file.
+        force_naive (bool): Whether to force the naive cleaning method.
+    Returns:
+        return_msg (dict): A dictionary with the following keys:
+            success (bool): Indicates whether the de-embedding process was successful.
+            return_path (str): The path to the cleaned file if successful.
+            method (str): The method used to clean the file.
+            error (str): An error description if the process was unsuccessful.
+    """
+    try:
+        with open(pdf_path, "rb") as f:
+            pdf = f.read()
+        cleaned_pdf = clean_pdf(pdf, force_naive)
+        with open(output_path, "wb") as f:
+            method = cleaned_pdf[len(cleaned_pdf)-1]
+            cleaned_pdf = cleaned_pdf[0:len(cleaned_pdf)-1]
+            f.write(bytes(cleaned_pdf))
+        return {"success": True,
+                "return_path": output_path,
+                "method": method,
+                "error": ""}
+    except Exception as e:
+        return {"success": False, "return_path": "","method":"", "error": str(e)}
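The wrapper above relies on a convention in which the binding returns the cleaned bytes with one extra trailing element identifying the cleaning method. A rough, self-contained sketch of that convention follows. It assumes the binding returns a sequence of integers, which the `bytes(cleaned_pdf)` call suggests but the diff does not confirm. `fake_clean_pdf`, its method ids, `split_result`, and the `cleaner` parameter are hypothetical stand-ins so the example runs without the Rust extension.

```python
import os
import tempfile


def fake_clean_pdf(data, force_naive):
    """Hypothetical stand-in for the Rust binding: echoes the input and appends a method id."""
    method_id = 2 if force_naive else 0  # made-up ids, for illustration only
    return list(data) + [method_id]


def split_result(result):
    """Separate the PDF payload from the trailing method marker."""
    if not result:
        raise ValueError("empty result from cleaner")
    return bytes(result[:-1]), result[-1]


def clean_pdf_path(pdf_path, output_path, force_naive=False, cleaner=fake_clean_pdf):
    """Same flow as the wrapper in the diff, with the cleaner injected for testing."""
    try:
        with open(pdf_path, "rb") as f:
            pdf = f.read()
        payload, method = split_result(cleaner(pdf, force_naive))
        with open(output_path, "wb") as f:
            f.write(payload)
        return {"success": True, "return_path": output_path,
                "method": method, "error": ""}
    except Exception as e:
        return {"success": False, "return_path": "", "method": "", "error": str(e)}


# Exercise both the success path and the error path in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.pdf")
    dst = os.path.join(tmp, "out.pdf")
    with open(src, "wb") as f:
        f.write(b"%PDF-1.4 demo")
    ok = clean_pdf_path(src, dst, force_naive=True)
    with open(dst, "rb") as f:
        written = f.read()
    missing = clean_pdf_path(os.path.join(tmp, "nope.pdf"), dst)
```

Note that under this assumption `method` comes back as an integer, whereas the docstring in the diff describes it as a `str`; a caller would need to map it to a name if a readable label is wanted.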
@@ -1,5 +1,5 @@
 [package]
-name = "gulagcleaner"
+name = "gulagcleaner_rs"
 version = "0.10.0"
 edition = "2021"
 authors = ["YM162 <[email protected]>"]
@@ -13,8 +13,7 @@ keywords = ["wuolah", "pdf", "ads", "advertisments", "cleaner", "gulagcleaner"]
 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
 
 [lib]
-name = "gulagcleaner"
-crate-type = ["lib"]
+name = "gulagcleaner_rs"
 
 [dependencies]
 flate2 = "1.0.27"