Improve error handling for Pagefind HTML parsing
bglw committed Sep 12, 2022
1 parent b148b68 commit b0ba453
Showing 3 changed files with 53 additions and 1 deletion.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,8 @@

## Unreleased

* Pagefind now gracefully skips pages that fail HTML parsing, and provides more context when these errors are hit.

## v0.8.0 (August 23, 2022)

### Important Changes
38 changes: 38 additions & 0 deletions pagefind/features/errors.feature
@@ -0,0 +1,38 @@
Feature: Graceful Pagefind Errors
    Background:
        Given I have the environment variables:
            | PAGEFIND_SOURCE | public |
        Given I have a "public/index.html" file with the body:
            """
            <p data-url>Nothing</p>
            """

    Scenario: Pagefind gracefully skips pages with parsing ambiguities
        Given I have a "public/cat/index.html" file with the body:
            """
            <h1>hello world</h1>
            """
        Given I have a "public/dog/index.html" file with the body:
            """
            <h1>hello world</h1>
            <select><xmp><script>"use strict";</script></select>
            """
        When I run my program
        Then I should see "Running Pagefind" in stdout
        Then I should see "Failed to parse file public/dog/index.html" in stdout
        Then I should see the file "public/_pagefind/pagefind.js"
        When I serve the "public" directory
        When I load "/"
        When I evaluate:
            """
            async function() {
                let pagefind = await import("/_pagefind/pagefind.js");
                let search = await pagefind.search("world");
                let results = await Promise.all(search.results.map(r => r.data()));
                document.querySelector('[data-url]').innerText = results.map(r => r.url).sort().join(', ');
            }
            """
        Then There should be no logs
        Then The selector "[data-url]" should contain "/cat/"
14 changes: 13 additions & 1 deletion pagefind/src/fossick/mod.rs
@@ -59,7 +59,11 @@ impl Fossicker {
                 break;
             }
             if let Err(error) = rewriter.write(&buf[..read]) {
-                panic!("HTML parse encountered an error: {:#?}", error);
+                println!(
+                    "Failed to parse file {} — skipping this file. Error:\n{error}",
+                    self.file_path.to_str().unwrap_or("[unknown file]")
+                );
+                return Ok(());
             }
         }

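The hunk above swaps a `panic!` for a log-and-skip: a write error from the HTML rewriter now prints the file path and leaves the page unparsed instead of aborting the whole run. The sketch below reproduces that control flow with only the standard library. It is an illustration under assumptions, not Pagefind's code: `ChunkParser`, `ParseError`, `Page`, and `read_bytes` are hypothetical names, and the stand-in parser simply rejects `<select><xmp>` to loosely imitate the ambiguity exercised by the feature test.

```rust
use std::fmt;
use std::path::PathBuf;

/// Hypothetical error type standing in for the HTML rewriter's error.
#[derive(Debug)]
struct ParseError(&'static str);

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Hypothetical streaming parser. It buffers every chunk it receives and
/// fails once the buffered bytes contain `<select><xmp>`, so the check
/// still fires when that sequence straddles two chunks.
#[derive(Default)]
struct ChunkParser {
    buf: Vec<u8>,
}

impl ChunkParser {
    fn write(&mut self, chunk: &[u8]) -> Result<(), ParseError> {
        const AMBIGUOUS: &[u8] = b"<select><xmp>";
        self.buf.extend_from_slice(chunk);
        if self.buf.windows(AMBIGUOUS.len()).any(|w| w == AMBIGUOUS) {
            return Err(ParseError("ambiguous <xmp> inside <select>"));
        }
        Ok(())
    }
}

/// Hypothetical page record: `data` stays `None` when parsing fails.
struct Page {
    file_path: PathBuf,
    data: Option<String>,
}

impl Page {
    fn new(path: &str) -> Self {
        Page { file_path: PathBuf::from(path), data: None }
    }

    /// Feeds `bytes` to the parser in small chunks. On a parse error it logs
    /// the file path and returns `Ok(())` without setting `data`, so the
    /// caller skips the page instead of the process panicking.
    /// (The `io::Error` slot is kept for real read failures; none occur here.)
    fn read_bytes(&mut self, bytes: &[u8]) -> Result<(), std::io::Error> {
        const CHUNK_SIZE: usize = 8; // arbitrary; small to exercise chunking
        let mut parser = ChunkParser::default();
        for chunk in bytes.chunks(CHUNK_SIZE) {
            if let Err(error) = parser.write(chunk) {
                println!(
                    "Failed to parse file {} - skipping this file. Error:\n{error}",
                    self.file_path.to_str().unwrap_or("[unknown file]")
                );
                return Ok(());
            }
        }
        self.data = Some(String::from_utf8_lossy(&parser.buf).into_owned());
        Ok(())
    }
}
```

With these stand-ins, feeding the `dog` page body from the feature test to `read_bytes` prints the skip message and returns `Ok(())` with `data` still `None`, while the `cat` page is stored normally.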
@@ -75,6 +79,10 @@ impl Fossicker {

     fn parse_digest(&mut self) -> (String, HashMap<String, Vec<u32>>) {
         let mut map: HashMap<String, Vec<u32>> = HashMap::new();
+        // TODO: push this error handling up a level and return an Err from parse_digest
+        if self.data.as_ref().is_none() {
+            return ("".into(), map); // empty page result, will be dropped from search
+        }
         let data = self.data.as_ref().unwrap();
         let stemmer = get_stemmer(&data.language);

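The TODO in `parse_digest` proposes moving the missing-data check up a level and returning an `Err`. One possible shape for that is sketched below, under stated assumptions: `PageSketch`, `PageData`, and `MissingData` are hypothetical types, and words are lowercased and indexed by position without the stemmer the real code uses.

```rust
use std::collections::HashMap;

/// Hypothetical error for a page whose HTML was never parsed.
#[derive(Debug, PartialEq)]
struct MissingData;

/// Hypothetical stand-in for the parsed page data.
struct PageData {
    content: String,
}

/// Hypothetical, reduced stand-in for the struct that owns `parse_digest`.
struct PageSketch {
    data: Option<PageData>,
}

impl PageSketch {
    /// Returns `Err(MissingData)` instead of an empty digest when the page
    /// failed to parse. Otherwise maps each lowercased word to the list of
    /// positions where it occurs (no stemming in this sketch).
    fn parse_digest(&self) -> Result<(String, HashMap<String, Vec<u32>>), MissingData> {
        let data = self.data.as_ref().ok_or(MissingData)?;
        let mut map: HashMap<String, Vec<u32>> = HashMap::new();
        for (position, word) in data.content.split_whitespace().enumerate() {
            map.entry(word.to_lowercase())
                .or_default()
                .push(position as u32);
        }
        Ok((data.content.clone(), map))
    }
}
```

A caller could then propagate the failure with `?` rather than receiving an empty digest and re-checking `self.data` afterwards.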
@@ -144,6 +152,10 @@ impl Fossicker {

         let (content, word_data) = self.parse_digest();

+        if self.data.is_none() {
+            return Err(());
+        }
+
         let data = self.data.unwrap();
         let url = build_url(&self.file_path, options);

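With the final hunk, a page whose `data` is `None` yields `Err(())`, and the comment in `parse_digest` says such empty page results will be dropped from search. The sketch below shows one way a caller might do that dropping. It assumes hypothetical `FossickedPage`, `fossick`, and `index_pages` items rather than Pagefind's actual types: failed pages are discarded with `filter_map` and `Result::ok`, so the remaining index matches the feature test's expectation that only `/cat/` is found.

```rust
/// Hypothetical result of processing one page.
#[derive(Debug, Clone, PartialEq)]
struct FossickedPage {
    url: String,
    content: String,
}

/// Hypothetical per-page step mirroring the hunk above: missing data means
/// the page failed to parse, so it yields `Err(())`.
fn fossick(url: &str, data: Option<&str>) -> Result<FossickedPage, ()> {
    let content = data.ok_or(())?;
    Ok(FossickedPage { url: url.to_string(), content: content.to_string() })
}

/// Keeps only pages that were processed successfully; failed pages are
/// dropped from the index instead of aborting the run.
fn index_pages(pages: &[(&str, Option<&str>)]) -> Vec<FossickedPage> {
    pages
        .iter()
        .filter_map(|(url, data)| fossick(url, *data).ok())
        .collect()
}
```

Returning `Err(())` carries no detail, which is acceptable here only because the path and error were already printed when parsing failed.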