Merge pull request #18 from carlosiborra/main
Refactor PDF Cleaning Tests for Improved Modularity and Error Handling + Add Rust Distribution README
Showing 3 changed files with 156 additions and 35 deletions.
`@@ -0,0 +1,54 @@`

# Gulag Cleaner Rust Distribution

## Setting Up Rust

To build and test the Rust components of Gulag Cleaner, make sure Rust is correctly installed on your system. Follow the installation guide on the [official Rust website](https://www.rust-lang.org/tools/install) for detailed instructions. The installer sets up `rustup`, the Rust toolchain manager, along with the Rust compiler (`rustc`) and Cargo, which the test commands below use.

## Running Rust Tests

Gulag Cleaner uses Rust for certain operations because of its performance and memory-safety benefits. A small test suite checks that these components work as expected.

To run the tests:

1. Open a terminal.
2. Navigate to the root directory of Gulag Cleaner.
3. Run all tests with:

```bash
cargo test
```

To see more detailed output, including anything your tests print, use:

```bash
cargo test --package gulagcleaner_rs --lib -- tests --nocapture
```

This restricts the run to the library (`--lib`) tests of the `gulagcleaner_rs` package. Arguments after `--` go to the test binary. `tests` is a name filter that keeps only tests whose path contains `tests`. `--nocapture` shows output that the test harness would otherwise capture, such as the timing line each test prints.

Note: at the moment, these tests only read, clean, and write two example PDFs, one from Wuolah and one from Studocu.
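The timing line that `--nocapture` reveals is produced by the tests themselves with `std::time::Instant`. The following is a minimal sketch of how you could reuse the same measurement around other operations. It assumes a small generic wrapper around any closure. The name `timed` is hypothetical and not part of the Gulag Cleaner codebase.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of Gulag Cleaner): runs `f`, prints the
/// elapsed time under `label` in the same format the tests use, and returns
/// the closure's result together with the measured duration.
fn timed<T, F: FnOnce() -> T>(label: &str, f: F) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    let elapsed = start.elapsed();
    // The test harness captures this line unless `--nocapture` is passed.
    println!("Test for `{}` completed in {:?}", label, elapsed);
    (value, elapsed)
}
```

For example, `timed(path, || read_and_clean_pdf(path))` would time one cleaning run and still hand back its `Result`.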

## Rust Development Guidelines

To contribute to the Rust portion of Gulag Cleaner, please follow these guidelines:

- **Code Clarity**: Write clear, readable code with meaningful variable names and concise functions.
- **Comments and Documentation**: Add comments explaining complex logic or important decisions. When adding new features or making significant changes, update the `README.md` with relevant examples and instructions.
- **Performance**: Optimize for efficiency. Rust is known for its performance, so make sure your contributions maintain or improve the current speed and memory usage.
- **Testing**: Write tests for new features or bug fixes where possible. Existing tests should pass without modification unless a change intentionally updates the tested behavior.

## TODO for Rust

If you're looking to contribute, these areas need attention:

- **Writing Tests**: Test coverage is limited. Writing additional unit and integration tests for the Rust code is a priority.
- **Documentation**: A detailed README.md is still needed, including setup instructions, usage examples, and a description of the available functions.
- **Code Optimization**: There is always room for performance improvements. Profiling and optimizing the existing Rust code can significantly improve overall tool performance.
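One low-cost addition toward the **Writing Tests** item is a test for the error path, which the existing tests never exercise. The following is a minimal sketch. It assumes a hypothetical `read_input` helper that mirrors the `map_err` pattern of `read_and_clean_pdf` but does not call the cleaner, so the example compiles on its own.

```rust
use std::fs;

/// Hypothetical helper mirroring the error handling of `read_and_clean_pdf`:
/// on failure, the error message names the offending path.
fn read_input(in_path: &str) -> Result<Vec<u8>, String> {
    fs::read(in_path).map_err(|e| format!("Failed to read `{}`: {}", in_path, e))
}

#[test]
fn missing_input_reports_path() {
    // Assumes no file exists at this path.
    let missing = "example_docs/does-not-exist.pdf";
    let err = read_input(missing).unwrap_err();
    assert!(err.starts_with(&format!("Failed to read `{}`", missing)));
}
```

Asserting on the path in the message checks that failures stay diagnosable, which is the point of replacing `unwrap()` with descriptive errors.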

## Contributing

Contributions to the Rust codebase of Gulag Cleaner are highly encouraged. Whether you're fixing bugs, optimizing performance, or adding new features, your input is valued. Follow the project's contribution guidelines and submit pull requests with your changes.

## License

Gulag Cleaner is distributed under the GPL-3 license, which means it's open-source and free to use.
```diff
@@ -1,36 +1,98 @@
 use crate::clean::clean_pdf;
 use std::fs;
+use std::time::Instant;
 
 const OUT_PATH: &str = "example_docs/out";
 
-/// Creates out folder if missing so tests won't fail
-fn create_out_folder() {
-    fs::create_dir_all(OUT_PATH).unwrap();
+/// Represents configuration for running a test, including the paths for input and output files.
+struct TestConfig {
+    input_path: &'static str,
+    output_filename: &'static str,
 }
 
-#[test]
-fn test_wuolah() {
-    create_out_folder();
+/// Ensures the output directory exists, creating it if necessary.
+/// This function is invoked before running tests to ensure a location
+/// is available for storing cleaned PDFs.
+fn create_output_directory() {
+    fs::create_dir_all(OUT_PATH).expect("Failed to create output directory");
+}
 
-    //Load some pdf bytes and clean it
-    let data = std::fs::read("example_docs/wuolah-free-example.pdf").expect(
-        "Missing Wuolah test PDF, please store one in path `example_docs/wuolah-free-example.pdf",
-    );
-    let (clean_pdf, _) = clean_pdf(data, false);
+/// Reads a PDF from the specified path, cleans it, and returns the cleaned PDF data.
+///
+/// # Arguments
+///
+/// * `in_path` - A string slice that holds the path to the input PDF file.
+///
+/// # Returns
+///
+/// A `Result` which is `Ok` with a `Vec<u8>` containing the cleaned PDF data,
+/// or an `Err` with a string describing the error.
+fn read_and_clean_pdf(in_path: &str) -> Result<Vec<u8>, String> {
+    let data =
+        std::fs::read(in_path).map_err(|e| format!("Failed to read `{}`: {}", in_path, e))?;
+    let (clean_file, _) = clean_pdf(data, false);
+    Ok(clean_file)
+}
 
-    //Stores the clean pdf in the out directory
-    std::fs::write(format!("{}/wuolah_clean.pdf", OUT_PATH), clean_pdf).unwrap();
+/// Writes the cleaned PDF data to a file in the output directory.
+///
+/// # Arguments
+///
+/// * `out_path` - The path where the cleaned PDF will be stored.
+/// * `clean_file` - A vector of bytes representing the cleaned PDF data.
+///
+/// # Returns
+///
+/// A `Result` which is `Ok` if the file was successfully written, or an `Err`
+/// with a string describing the error.
+fn store_pdf(out_path: &str, clean_file: Vec<u8>) -> Result<(), String> {
+    std::fs::write(out_path, clean_file)
+        .map_err(|e| format!("Failed to write `{}`: {}", out_path, e))
 }
-#[test]
 
-fn test_studocu() {
-    create_out_folder();
+/// Executes a cleaning test using the provided `TestConfig`.
+///
+/// This function orchestrates the test process: creating the output directory,
+/// cleaning the PDF specified in the `TestConfig`, and storing the cleaned PDF
+/// in the output directory. It also measures and prints the duration of the test.
+///
+/// # Arguments
+///
+/// * `test_config` - A reference to the `TestConfig` containing the test parameters.
+fn run_test_for_config(test_config: &TestConfig) {
+    create_output_directory();
 
+    let start = Instant::now();
+
+    let clean_file = read_and_clean_pdf(test_config.input_path).expect("Failed to clean PDF");
+    store_pdf(
+        &format!("{}/{}", OUT_PATH, test_config.output_filename),
+        clean_file,
+    )
+    .expect("Failed to store PDF");
+
+    let duration = start.elapsed();
+
-    //Stores the clean pdf in the out directory
-    let data = std::fs::read("example_docs/studocu-example.pdf").expect(
-        "Missing Studocu test PDF, please store one in path `example_docs/studocu-example.pdf",
+    println!(
+        "Test for `{}` completed in {:?}",
+        test_config.input_path, duration
     );
-    let (clean_pdf, _) = clean_pdf(data, false);
-    //Print the length of the pdf
-    std::fs::write(format!("{}/studocu_clean.pdf", OUT_PATH), clean_pdf).unwrap();
 }
+
+// Define tests for specific PDF files, utilizing the TestConfig structure.
+
+#[test]
+fn test_wuolah_pdf() {
+    run_test_for_config(&TestConfig {
+        input_path: "example_docs/wuolah-free-example.pdf",
+        output_filename: "wuolah_clean.pdf",
+    });
+}
+
+#[test]
+fn test_studocu_pdf() {
+    run_test_for_config(&TestConfig {
+        input_path: "example_docs/studocu-example.pdf",
+        output_filename: "studocu_clean.pdf",
+    });
+}
```
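With these helpers in place, each new example PDF still needs its own `#[test]` function. The following is a sketch of a table-driven alternative. It assumes you would rather have one test that reports every failing case instead of panicking on the first one. `TestCase`, `run_all`, and `fake_clean` are hypothetical names. `fake_clean` is an identity stand-in for the crate's `clean_pdf`, so the example compiles on its own.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::time::Instant;

/// Like `TestConfig`, but with a lifetime parameter so paths can be built at run time.
struct TestCase<'a> {
    input_path: &'a str,
    output_filename: &'a str,
}

/// Hypothetical stand-in for `clean_pdf`: returns the input bytes unchanged.
fn fake_clean(data: Vec<u8>) -> Vec<u8> {
    data
}

/// Cleans every case into `out_dir` and returns one error message per failed
/// case instead of panicking, so a single run reports all broken inputs.
fn run_all(cases: &[TestCase<'_>], out_dir: &Path) -> Vec<String> {
    if let Err(e) = fs::create_dir_all(out_dir) {
        return vec![format!("Failed to create `{}`: {}", out_dir.display(), e)];
    }
    let mut failures = Vec::new();
    for case in cases {
        let start = Instant::now();
        let out_path: PathBuf = out_dir.join(case.output_filename);
        // Read, clean, then write, converting each I/O error into a message that names the file.
        let result = fs::read(case.input_path)
            .map_err(|e| format!("Failed to read `{}`: {}", case.input_path, e))
            .map(fake_clean)
            .and_then(|clean| {
                fs::write(&out_path, clean)
                    .map_err(|e| format!("Failed to write `{}`: {}", out_path.display(), e))
            });
        match result {
            Ok(()) => println!(
                "Test for `{}` completed in {:?}",
                case.input_path,
                start.elapsed()
            ),
            Err(e) => failures.push(e),
        }
    }
    failures
}
```

In a real test, you would call `run_all` with the two example PDFs and `Path::new(OUT_PATH)`, then `assert!(failures.is_empty(), "{:#?}", failures)`. Adding a new example PDF then takes one more table entry instead of a new function.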