Add a fetcher that uses a real Chrome browser to download the html #237
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Adds a new Fetcher that uses a real Chrome browser to fetch the html. This solved a problem where I was unable to fetch a page that was partially generated by javascript using any of the existing fetchers. (I assume the page required a modern real browser for some reason I did not investigate further).
This change uses the cdt-java-client library found here to launch and communicate with a Chrome browser: https://github.com/kklisura/chrome-devtools-java-client
However due to a breaking change in Chrome that has not been fixed in this library I am using a fork with that one patch applied:
io.fluidsonic.mirror:cdt-java-client:4.0.0-fluidsonic-1
. Hopefully the change gets merged back into the main library.WIP warning: I figured I would publish this PR in its current state in case it helps anyone else. It does however not fullfil all the expectations of a fetcher. It does not return the correct http status etc, just the body. There is a Network class that can probably be used to extract those.