Merge pull request #29 from ScrapingAnt/feature/issue28-handle-timeouts
feature/issue28-handle-timeouts: done
megabotan authored Oct 23, 2022
2 parents 05f3948 + b4df603 commit e4060e0
Showing 6 changed files with 74 additions and 41 deletions.
57 changes: 29 additions & 28 deletions README.md
@@ -52,8 +52,8 @@ All public classes, methods and their parameters can be inspected in this API re

 Main class of this library.

-| Param | Type |
-| --- | --- |
+| Param | Type                |
+|-------|---------------------|
 | token | <code>string</code> |

 * * *
@@ -62,17 +62,17 @@ Main class of this library.

 https://docs.scrapingant.com/request-response-format#available-parameters

-| Param | Type | Default |
-| --- | --- | --- |
-| url | <code>string</code> | |
-| cookies | <code>List[Cookie]</code> | None |
-| headers | <code>List[Dict[str, str]]</code> | None |
-| js_snippet | <code>string</code> | None |
-| proxy_type | <code>ProxyType</code> | datacenter |
-| proxy_country | <code>str</code> | None |
-| return_text | <code>boolean</code> | False |
-| wait_for_selector | <code>str</code> | None |
-| browser | <code>boolean</code> | True |
+| Param             | Type                              | Default    |
+|-------------------|-----------------------------------|------------|
+| url               | <code>string</code>               |            |
+| cookies           | <code>List[Cookie]</code>         | None       |
+| headers           | <code>List[Dict[str, str]]</code> | None       |
+| js_snippet        | <code>string</code>               | None       |
+| proxy_type        | <code>ProxyType</code>            | datacenter |
+| proxy_country     | <code>str</code>                  | None       |
+| return_text       | <code>boolean</code>              | False      |
+| wait_for_selector | <code>str</code>                  | None       |
+| browser           | <code>boolean</code>              | True       |

 **IMPORTANT NOTE:** <code>js_snippet</code> will be encoded to Base64 automatically by the ScrapingAnt client library.

@@ -82,9 +82,9 @@ https://docs.scrapingant.com/request-response-format#available-parameters

 Class defining cookie. Currently it supports only name and value

-| Param | Type |
-| --- | --- |
-| name | <code>string</code> |
+| Param | Type                |
+|-------|---------------------|
+| name  | <code>string</code> |
 | value | <code>string</code> |

 * * *
@@ -93,23 +93,24 @@ Class defining cookie. Currently it supports only name and value

 Class defining response from API.

-| Param | Type |
-| --- | --- |
-| content | <code>string</code> |
-| cookies | <code>List[Cookie]</code> |
-| status_code | <code>int</code> |
+| Param       | Type                      |
+|-------------|---------------------------|
+| content     | <code>string</code>       |
+| cookies     | <code>List[Cookie]</code> |
+| status_code | <code>int</code>          |

 ## Exceptions

 `ScrapingantClientException` is base Exception class, used for all errors.

-| Exception | Reason |
-| --- | --- |
-| ScrapingantInvalidTokenException | The API token is wrong or you have exceeded the API calls request limit
-| ScrapingantInvalidInputException | Invalid value provided. Please, look into error message for more info |
-| ScrapingantInternalException | Something went wrong with the server side code. Try again later or contact ScrapingAnt support |
-| ScrapingantSiteNotReachableException | The requested URL is not reachable. Please, check it locally |
-| ScrapingantDetectedException | The anti-bot detection system has detected the request. Please, retry or change the request settings. |
+| Exception                            | Reason                                                                                                                       |
+|--------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
+| ScrapingantInvalidTokenException     | The API token is wrong or you have exceeded the API calls request limit                                                      |
+| ScrapingantInvalidInputException     | Invalid value provided. Please, look into error message for more info                                                        |
+| ScrapingantInternalException         | Something went wrong with the server side code. Try again later or contact ScrapingAnt support                               |
+| ScrapingantSiteNotReachableException | The requested URL is not reachable. Please, check it locally                                                                 |
+| ScrapingantDetectedException         | The anti-bot detection system has detected the request. Please, retry or change the request settings.                        |
+| ScrapingantTimeoutException          | Got timeout while communicating with Scrapingant servers. Check your network connection. Please try later or contact support |

 * * *

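The README now lists `ScrapingantTimeoutException` with the advice to try again later. One way calling code might act on that is a small retry loop with backoff. The sketch below is illustrative only: `call_with_retries`, `FakeTimeout`, and `flaky` are hypothetical names that are not part of this commit or the library, and they stand in for the real exception and a real `client.general_request(...)` call so that the example runs without network access.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


class FakeTimeout(Exception):
    """Hypothetical stand-in for ScrapingantTimeoutException."""


def call_with_retries(
    func: Callable[[], T],
    retry_on: Type[Exception],
    attempts: int = 3,
    base_delay: float = 0.0,
) -> T:
    """Call func, retrying on retry_on with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last timeout
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}


def flaky() -> str:
    """Hypothetical request that times out twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeTimeout()
    return "ok"


result = call_with_retries(flaky, retry_on=FakeTimeout)
```

In real use, `retry_on` would presumably be `ScrapingantTimeoutException` and `base_delay` a non-zero number of seconds; both choices are assumptions left to the caller.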
4 changes: 3 additions & 1 deletion scrapingant_client/__init__.py
@@ -1,4 +1,4 @@
-__version__ = "1.0.0"
+__version__ = "1.0.1"

 from scrapingant_client.client import ScrapingAntClient
 from scrapingant_client.cookie import Cookie
@@ -9,6 +9,7 @@
     ScrapingantInternalException,
     ScrapingantSiteNotReachableException,
     ScrapingantDetectedException,
+    ScrapingantTimeoutException,
 )
 from scrapingant_client.proxy_type import ProxyType
 from scrapingant_client.response import Response
@@ -23,5 +24,6 @@
     'ScrapingantInternalException',
     'ScrapingantSiteNotReachableException',
     'ScrapingantDetectedException',
+    'ScrapingantTimeoutException',
     'Response',
 ]
34 changes: 22 additions & 12 deletions scrapingant_client/client.py
@@ -5,14 +5,15 @@
 import requests

 import scrapingant_client
-from scrapingant_client.constants import SCRAPINGANT_API_BASE_URL
+from scrapingant_client.constants import SCRAPINGANT_API_BASE_URL, TIMEOUT_SECONDS
 from scrapingant_client.cookie import Cookie, cookies_list_to_string, cookies_list_from_string
 from scrapingant_client.errors import (
     ScrapingantInvalidTokenException,
     ScrapingantInvalidInputException,
     ScrapingantInternalException,
     ScrapingantSiteNotReachableException,
     ScrapingantDetectedException,
+    ScrapingantTimeoutException,
 )
 from scrapingant_client.headers import convert_headers
 from scrapingant_client.proxy_type import ProxyType
@@ -100,11 +101,15 @@ def general_request(
             wait_for_selector=wait_for_selector,
             browser=browser,
         )
-        response = self.requests_session.post(
-            SCRAPINGANT_API_BASE_URL + '/general',
-            json=request_data,
-            headers=convert_headers(headers),
-        )
+        try:
+            response = self.requests_session.post(
+                SCRAPINGANT_API_BASE_URL + '/general',
+                json=request_data,
+                headers=convert_headers(headers),
+                timeout=TIMEOUT_SECONDS
+            )
+        except requests.exceptions.Timeout:
+            raise ScrapingantTimeoutException()
         response_status_code = response.status_code
         response_data = response.json()
         parsed_response: Response = self._parse_response(response_status_code, response_data, url)
@@ -138,13 +143,18 @@ async def general_request_async(
                 headers={
                     'x-api-key': self.token,
                     'User-Agent': self.user_agent,
-                }
+                },
+                timeout=TIMEOUT_SECONDS,
         ) as client:
-            response = await client.post(
-                SCRAPINGANT_API_BASE_URL + '/general',
-                json=request_data,
-                headers=convert_headers(headers),
-            )
+            try:
+                response = await client.post(
+                    SCRAPINGANT_API_BASE_URL + '/general',
+                    json=request_data,
+                    headers=convert_headers(headers),
+                )
+            except httpx.TimeoutException:
+                raise ScrapingantTimeoutException()
+
         response_status_code = response.status_code
         response_data = response.json()
         parsed_response: Response = self._parse_response(response_status_code, response_data, url)
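The async hunk above relies on the timeout configured on the `httpx` client and converts `httpx.TimeoutException` into the library's own exception. The same idea can be sketched with the standard library alone, swapping in `asyncio.wait_for` for the HTTP client's timeout. This is not the commit's code: `RequestTimeout`, `slow_post`, and `request_with_timeout` are hypothetical names used only to make the pattern runnable without third-party packages.

```python
import asyncio


class RequestTimeout(Exception):
    """Hypothetical stand-in for ScrapingantTimeoutException."""


async def slow_post(delay: float) -> str:
    """Stand-in for an awaitable HTTP call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return "response"


async def request_with_timeout(delay: float, timeout: float) -> str:
    """Await the call, translating a stdlib timeout into RequestTimeout."""
    try:
        return await asyncio.wait_for(slow_post(delay), timeout=timeout)
    except asyncio.TimeoutError as exc:
        raise RequestTimeout() from exc


async def main() -> tuple:
    # A fast call finishes well inside its timeout.
    fast = await request_with_timeout(delay=0.0, timeout=1.0)
    # A slow call exceeds its timeout and surfaces as RequestTimeout.
    try:
        await request_with_timeout(delay=1.0, timeout=0.01)
        timed_out = False
    except RequestTimeout:
        timed_out = True
    return fast, timed_out


fast, timed_out = asyncio.run(main())
```

Callers then only need to know about one exception type, regardless of which transport raised the original timeout.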
1 change: 1 addition & 0 deletions scrapingant_client/constants.py
@@ -1 +1,2 @@
 SCRAPINGANT_API_BASE_URL = 'https://api.scrapingant.com/v1'
+TIMEOUT_SECONDS = 120
7 changes: 7 additions & 0 deletions scrapingant_client/errors.py
@@ -30,3 +30,10 @@ class ScrapingantInternalException(ScrapingantClientException):
     def __init__(self):
         message = 'Something went wrong with the server side. Please try later or contact support'
         super().__init__(message)
+
+
+class ScrapingantTimeoutException(ScrapingantClientException):
+    def __init__(self):
+        message = 'Got timeout while communicating with Scrapingant servers.' \
+                  ' Check your network connection. Please try later or contact support'
+        super().__init__(message)
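Because the new exception subclasses `ScrapingantClientException`, existing handlers that catch the base class should also catch timeouts. The following sketch shows that hierarchy together with the translate-and-re-raise pattern used in `client.py`. `ClientError`, `ClientTimeout`, and `low_level_post` are hypothetical stand-ins rather than the library's classes, and the explicit `from exc` chaining is an optional variation that the commit itself does not use.

```python
class ClientError(Exception):
    """Hypothetical base class, playing the role of ScrapingantClientException."""


class ClientTimeout(ClientError):
    """Hypothetical stand-in for ScrapingantTimeoutException."""

    def __init__(self) -> None:
        super().__init__("Got timeout while communicating with the server")


def low_level_post() -> None:
    """Stand-in for the HTTP call; always times out."""
    raise TimeoutError("read timed out")


def general_request() -> None:
    try:
        low_level_post()
    except TimeoutError as exc:
        # Translate the transport error; `from exc` keeps the original as __cause__.
        raise ClientTimeout() from exc


try:
    general_request()
except ClientError as err:  # catching the base class also catches the timeout
    caught = err
```

Keeping the original error as `__cause__` can help with debugging, since the traceback then shows which transport-level timeout triggered the library exception.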
12 changes: 12 additions & 0 deletions tests/test_exceptions.py
@@ -1,4 +1,5 @@
 import pytest
+import requests
 import responses

 from scrapingant_client import (
@@ -8,6 +9,7 @@
     ScrapingantInternalException,
     ScrapingantSiteNotReachableException,
     ScrapingantDetectedException,
+    ScrapingantTimeoutException,
 )
 from scrapingant_client.constants import SCRAPINGANT_API_BASE_URL

Expand Down Expand Up @@ -58,3 +60,13 @@ def test_detected():
with pytest.raises(ScrapingantDetectedException) as e:
client.general_request('example.com')
assert 'The anti-bot detection system has detected the request' in str(e)


@responses.activate
def test_timeout():
responses.add(responses.POST, SCRAPINGANT_API_BASE_URL + '/general',
body=requests.exceptions.ReadTimeout())
client = ScrapingAntClient(token='some_token')
with pytest.raises(ScrapingantTimeoutException) as e:
client.general_request('example.com')
assert 'Got timeout' in str(e)
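The test above simulates a timeout by registering an exception instance as the mocked response body. A similar check can be sketched with only `unittest.mock`, by giving a mocked session a `side_effect`. This is an assumption-laden illustration and not part of the commit: `TransportTimeout`, `ClientTimeout`, `TinyClient`, and `run_timeout_check` are hypothetical names standing in for `requests.exceptions.Timeout`, `ScrapingantTimeoutException`, and the real client.

```python
from unittest import mock


class TransportTimeout(Exception):
    """Hypothetical stand-in for requests.exceptions.Timeout."""


class ClientTimeout(Exception):
    """Hypothetical stand-in for ScrapingantTimeoutException."""


class TinyClient:
    """Hypothetical client that wraps a session object with a post() method."""

    def __init__(self, session) -> None:
        self.session = session

    def general_request(self, url: str):
        try:
            return self.session.post(url, timeout=120)
        except TransportTimeout as exc:
            raise ClientTimeout("Got timeout") from exc


def run_timeout_check() -> str:
    session = mock.Mock()
    # side_effect makes every post() call raise instead of returning.
    session.post.side_effect = TransportTimeout()
    client = TinyClient(session)
    try:
        client.general_request("example.com")
    except ClientTimeout as err:
        # Confirm the timeout value was actually passed to the transport.
        session.post.assert_called_once_with("example.com", timeout=120)
        return str(err)
    return "no timeout raised"


outcome = run_timeout_check()
```

Asserting on the `timeout=` argument as well as the raised exception guards against a regression where the timeout is silently dropped from the request.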
