Add a fetcher that uses a real Chrome browser to download the html #237

johanoskarsson · 2024-03-05T17:02:53Z

Adds a new Fetcher that uses a real Chrome browser to fetch the HTML. This solved a problem where I was unable to fetch a page that was partially generated by JavaScript using any of the existing fetchers. (I assume the page requires a modern, real browser, for a reason I did not investigate further.)

This change uses the cdt-java-client library to launch and communicate with a Chrome browser: https://github.com/kklisura/chrome-devtools-java-client
However, because of a breaking change in Chrome that has not yet been fixed in this library, I am using a fork with that one patch applied: io.fluidsonic.mirror:cdt-java-client:4.0.0-fluidsonic-1. Hopefully the fix gets merged back into the main library.
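
For anyone who wants to try the fork, the dependency could be declared roughly as follows. This is a minimal sketch, not taken from the PR itself: it assumes a Gradle build using the Kotlin DSL and assumes the fork can be resolved from Maven Central. Only the coordinates come from the description above.

```kotlin
// build.gradle.kts (assumed build setup; adapt for Maven or the Groovy DSL)
repositories {
    // Assumption: the fork is published to Maven Central.
    mavenCentral()
}

dependencies {
    // Fork of cdt-java-client with the single Chrome compatibility patch applied.
    // Coordinates are copied from the PR description.
    implementation("io.fluidsonic.mirror:cdt-java-client:4.0.0-fluidsonic-1")
}
```

Once the patch lands upstream, the coordinates would presumably be switched back to the main library's artifact.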

WIP warning: I figured I would publish this PR in its current state in case it helps anyone else. However, it does not yet fulfill all the expectations of a fetcher: it returns only the body, not the correct HTTP status and other response details. There is a Network class that can probably be used to extract those.

codecov · 2024-03-05T17:09:47Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.56%. Comparing base (382f21b) to head (475065d).

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #237   +/-   ##
=======================================
  Coverage   89.56%   89.56%           
=======================================
  Files          38       38           
  Lines         986      986           
  Branches       69       69           
=======================================
  Hits          883      883           
  Misses         81       81           
  Partials       22       22


Add a fetcher that uses a real Chrome browser to download the html.

475065d
