Update README.md
Qianlitp authored Jul 27, 2021
1 parent 2b942e9 commit 4a0948c
Showing 1 changed file with 45 additions and 35 deletions.
80 changes: 45 additions & 35 deletions README.md
@@ -85,37 +85,6 @@ When the output mode is set to `json`, the returned result, after JSON deseriali
* `sub_domain_list`: List of subdomains found.

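As a minimal sketch of consuming the `json` output, the snippet below deserializes a result string and reads `sub_domain_list`. The sample string and the helper name `extract_subdomains` are hypothetical; only the `sub_domain_list` field name comes from the README.

```python
import json

# Hypothetical sample of the JSON result; only the `sub_domain_list`
# field name is taken from the README, the values are made up.
raw_output = (
    '{"sub_domain_list": '
    '["www.example.com", "api.example.com", "www.example.com"]}'
)


def extract_subdomains(raw):
    """Deserialize the result and return the unique subdomains, sorted."""
    result = json.loads(raw)
    # Fall back to an empty list when the field is missing or null.
    return sorted(set(result.get("sub_domain_list") or []))


print(extract_subdomains(raw_output))
```

Deduplicating on the consumer side is a design choice of this sketch, not something the README states crawlergo requires.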


## Parameters

* **`--chromium-path Path, -c Path`** The path to the chrome executable. (**Required**)
* **`--custom-headers Headers`** Customize the HTTP header. Please pass in the data after JSON serialization, this is globally defined and will be used for all requests. (**Default: null**)
* **`--post-data PostData, -d PostData`** POST data. (**Default: null**)
* **`--max-crawled-count Number, -m Number`** The maximum number of tasks for crawlers to avoid long crawling time due to pseudo-static. (**Default: 200**)
* **`--filter-mode Mode, -f Mode`** Filtering mode, `simple`: only static resources and duplicate requests are filtered. `smart`: with the ability to filter pseudo-static. `strict`: stricter pseudo-static filtering rules. (**Default: smart**)
* **`--output-mode value, -o value`** Result output mode, `console`: print the glorified results directly to the screen. `json`: print the json serialized string of all results. `none`: don't print the output. (**Default: console**)
* **`--output-json filepath`** Write the result to the specified file after JSON serializing it. (**Default: null**)
* **`--incognito-context, -i`** Browser start incognito mode. (**Default: true**)
* **`--max-tab-count Number, -t Number`** The maximum number of tabs the crawler can open at the same time. (**Default: 8**)
* **`--fuzz-path`** Use the built-in dictionary for path fuzzing. (**Default: false**)
* **`--fuzz-path-dict`** Customize the Fuzz path by passing in a dictionary file path, e.g. /home/user/fuzz_dir.txt, each line of the file represents a path to be fuzzed. (**Default: null**)
* **`--robots-path`** Resolve the path from the /robots.txt file. (**Default: false**)
* **`--request-proxy proxyAddress`** **socks5** proxy address, all network requests from crawlergo and chrome browser are sent through the proxy. (**Default: null**)
* **`--tab-run-timeout Timeout`** Maximum runtime for a single tab page. (**Default: 20s**)
* **`--wait-dom-content-loaded-timeout Timeout`** The maximum timeout to wait for the page to finish loading. (**Default: 5s**)
* **`--event-trigger-interval Interval`** The interval when the event is triggered automatically, generally used in the case of slow target network and DOM update conflicts that lead to URL miss capture. (**Default: 100ms**)
* **`--event-trigger-mode Value`** DOM event auto-triggered mode, with `async` and `sync`, for URL miss-catching caused by DOM update conflicts. (**Default: async**)
* **`--before-exit-delay`** Delay exit to close chrome at the end of a single tab task. Used to wait for partial DOM updates and XHR requests to be captured. (**Default: 1s**)
* **`--ignore-url-keywords, -iuk`** URL keyword that you don't want to visit, generally used to exclude logout links when customizing cookies. Usage: `-iuk logout -iuk exit`. (**default: "logout", "quit", "exit"**)
* **`--form-values, -fv`** Customize the value of the form fill, set by text type. Support definition types: default, mail, code, phone, username, password, qq, id_card, url, date and number. Text types are identified by the four attribute value keywords `id`, `name`, `class`, `type` of the input box label. For example, define the mailbox input box to be automatically filled with A and the password input box to be automatically filled with B, `-fv mail=A -fv password=B`.Where default represents the fill value when the text type is not recognized, as "Cralwergo". (**Default: Cralwergo**)
* **`--form-keyword-values, -fkv`** Customize the value of the form fill, set by keyword fuzzy match. The keyword matches the four attribute values of `id`, `name`, `class`, `type` of the input box label. For example, fuzzy match the pass keyword to fill 123456 and the user keyword to fill admin, `-fkv user=admin -fkv pass=123456`. (**Default: Cralwergo**)
* **`--push-to-proxy`** The listener address of the crawler result to be received, usually the listener address of the passive scanner. (**Default: null**)
* **`--push-pool-max`** The maximum number of concurrency when sending crawler results to the listening address. (**Default: 10**)
* **`--log-level`** Logging levels, debug, info, warn, error and fatal. (**Default: info**)
* **`--no-headless`** Turn off chrome headless mode to visualize the crawling process. (**Default: false**)



## Examples

crawlergo returns the full request and URL, which can be used in a variety of ways:
@@ -134,6 +103,15 @@ crawlergo returns the full request and URL, which can be used in a variety of wa

* Regularly clean up zombie processes generated by crawlergo [(example)](https://github.com/0Kee-Team/crawlergo/blob/master/examples/zombie_clean.py), contributed by @ring04h


## Bypass headless detect
crawlergo can bypass headless mode detection by default.

https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html

![](./imgs/bypass.png)


## TroubleShooting

* 'Fetch.enable' wasn't found
@@ -165,12 +143,44 @@ crawlergo returns the full request and URL, which can be used in a variety of wa

![](./imgs/chrome_path.png)

## Bypass headless detect
crawlergo can bypass headless mode detection by default.

https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html

![](./imgs/bypass.png)
## Parameters
### Required parameters
* `--chromium-path Path, -c Path` The path to the chrome executable. (**Required**)
### Basic parameters
* `--custom-headers Headers` Custom HTTP headers, passed as a JSON-serialized string. They are defined globally and sent with every request. (Default: null)
* `--post-data PostData, -d PostData` POST data. (Default: null)
* `--max-crawled-count Number, -m Number` The maximum number of crawl tasks, which prevents overly long crawls caused by pseudo-static pages. (Default: 200)
* `--filter-mode Mode, -f Mode` Filtering mode. `simple`: only static resources and duplicate requests are filtered. `smart`: also filters pseudo-static requests. `strict`: applies stricter pseudo-static filtering rules. (Default: smart)
* `--output-mode value, -o value` Result output mode. `console`: print the formatted results directly to the screen. `json`: print all results as a JSON-serialized string. `none`: print nothing. (Default: console)
* `--output-json filepath` Write the JSON-serialized result to the specified file. (Default: null)
* `--request-proxy proxyAddress` socks5 proxy address. All network requests from crawlergo and the chrome browser are sent through the proxy. (Default: null)

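Because `--custom-headers` takes JSON-serialized data, it can be easier to build the value programmatically than to quote it by hand. The following is a sketch only, and the header values are hypothetical:

```python
import json

# Hypothetical header values, for illustration only.
headers = {
    "User-Agent": "Mozilla/5.0",
    "Cookie": "session=abc123",
}

# --custom-headers expects the headers as one JSON-serialized string.
custom_headers_arg = json.dumps(headers)

# Keeping arguments in a list (rather than one shell string) avoids
# quoting problems with the braces and double quotes in the JSON.
args = ["--custom-headers", custom_headers_arg]
print(args)
```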
### Expand input URL
* `--fuzz-path` Use the built-in dictionary for path fuzzing. (Default: false)
* `--fuzz-path-dict` Use a custom dictionary for path fuzzing by passing in a dictionary file path, e.g. /home/user/fuzz_dir.txt. Each line of the file represents a path to be fuzzed. (Default: null)
* `--robots-path` Parse paths from the /robots.txt file. (Default: false)

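Since the dictionary passed to `--fuzz-path-dict` holds one path per line, it can be generated from a list. This is a minimal sketch: the helper `write_fuzz_dict` and the sample entries are hypothetical, and the exact path format crawlergo expects (for example, with or without a leading slash) is not specified here.

```python
import tempfile
from pathlib import Path


def write_fuzz_dict(paths, dest):
    """Write one path per line, skipping blanks and duplicates, and
    return the matching command-line arguments."""
    unique = []
    for path in paths:
        path = path.strip()
        if path and path not in unique:
            unique.append(path)
    dest = Path(dest)
    dest.write_text("\n".join(unique) + "\n", encoding="utf-8")
    return ["--fuzz-path-dict", str(dest)]


# Hypothetical entries; adjust them to match your target.
with tempfile.TemporaryDirectory() as tmp:
    dict_file = Path(tmp) / "fuzz_dir.txt"
    args = write_fuzz_dict(["admin", "backup", "", "admin"], dict_file)
    written = dict_file.read_text(encoding="utf-8")

print(args[0], repr(written))
```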
### Form auto-fill
* `--ignore-url-keywords, -iuk` URL keywords that you don't want to visit, generally used to exclude logout links when customizing cookies. Usage: `-iuk logout -iuk exit`. (Default: "logout", "quit", "exit")
* `--form-values, -fv` Customize the form fill values by text type. Supported types: default, mail, code, phone, username, password, qq, id_card, url, date and number. Text types are identified by keywords in the four attributes `id`, `name`, `class`, `type` of the input element. For example, to fill mail input boxes with A and password input boxes with B: `-fv mail=A -fv password=B`. `default` is the fill value used when the text type is not recognized, which is "Cralwergo" unless overridden. (Default: Cralwergo)
* `--form-keyword-values, -fkv` Customize the form fill values by keyword fuzzy match. The keyword is matched against the four attributes `id`, `name`, `class`, `type` of the input element. For example, to fill 123456 where the keyword pass matches and admin where the keyword user matches: `-fkv user=admin -fkv pass=123456`. (Default: Cralwergo)

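The `-iuk`, `-fv` and `-fkv` flags are repeated once per value, as in the usage strings above. A small sketch of expanding Python collections into that form follows; the helper names `repeated_flag` and `form_args` are hypothetical, and the values are the ones used in the README examples.

```python
def repeated_flag(flag, values):
    """Expand values into [flag, v1, flag, v2, ...]."""
    args = []
    for value in values:
        args += [flag, value]
    return args


def form_args(form_values, keyword_values, ignore_keywords):
    """Build the -fv, -fkv and -iuk arguments from plain collections."""
    args = []
    args += repeated_flag("-fv", [f"{k}={v}" for k, v in form_values.items()])
    args += repeated_flag("-fkv", [f"{k}={v}" for k, v in keyword_values.items()])
    args += repeated_flag("-iuk", ignore_keywords)
    return args


args = form_args(
    {"mail": "A", "password": "B"},
    {"user": "admin", "pass": "123456"},
    ["logout", "exit"],
)
print(" ".join(args))
```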
### Advanced settings for the crawling process
* `--incognito-context, -i` Start the browser in incognito mode. (Default: true)
* `--max-tab-count Number, -t Number` The maximum number of tabs the crawler can open at the same time. (Default: 8)
* `--tab-run-timeout Timeout` The maximum runtime of a single tab. (Default: 20s)
* `--wait-dom-content-loaded-timeout Timeout` The maximum time to wait for the page to finish loading. (Default: 5s)
* `--event-trigger-interval Interval` The interval between automatically triggered events, generally adjusted when a slow target network or DOM update conflicts cause URLs to be missed. (Default: 100ms)
* `--event-trigger-mode Value` Mode for automatically triggering DOM events, `async` or `sync`. Used when DOM update conflicts cause URLs to be missed. (Default: async)
* `--before-exit-delay` Delay before closing chrome at the end of a single tab task, used to wait for remaining DOM updates and XHR requests to be captured. (Default: 1s)

### Other
* `--push-to-proxy` The listener address that receives the crawler results, usually the listener address of a passive scanner. (Default: null)
* `--push-pool-max` The maximum concurrency when sending crawler results to the listener address. (Default: 10)
* `--log-level` Logging level: debug, info, warn, error or fatal. (Default: info)
* `--no-headless` Turn off chrome headless mode to visualize the crawling process. (Default: false)

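Putting several of the parameters above together, a command line could be assembled roughly as follows. This is a sketch under assumptions: the binary name `crawlergo`, the chromium path, and the target URL are placeholders, and passing the target URL as the final positional argument is assumed rather than stated in this section.

```python
import subprocess


def build_command(binary, chromium_path, target_url, max_tabs=8, output_mode="json"):
    """Assemble an argument list from flags documented above."""
    return [
        binary,
        "--chromium-path", chromium_path,   # required
        "--max-tab-count", str(max_tabs),   # documented default: 8
        "--output-mode", output_mode,       # console, json or none
        target_url,                         # assumed to be positional and last
    ]


def run(cmd, timeout=300):
    """Run the crawler and capture its output; not invoked in this sketch."""
    return subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=False
    )


cmd = build_command("crawlergo", "/usr/bin/chromium", "http://example.com")
print(cmd)
```

Calling `run(cmd)` requires a real crawlergo binary and chrome executable at the given paths.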

## Follow me