Refactoring + fix ProxyDocker provider: 0.10.0
nbulaj committed Mar 7, 2019
1 parent 6867bd1 commit fd579b2
Showing 13 changed files with 167 additions and 54 deletions.
3 changes: 0 additions & 3 deletions .travis.yml
@@ -23,7 +23,6 @@ rvm:
- 2.6
- ruby-head
- jruby-9.2.1
- truffleruby

matrix:
allow_failures:
@@ -33,5 +32,3 @@ matrix:
exclude:
- rvm: 2.0
gemfile: gemfiles/nokogiri.gemfile # Nokogiri doesn't support Ruby 2.0
- rvm: truffleruby
gemfile: gemfiles/nokogiri.gemfile # Truffle doesn't support Nokogiri
12 changes: 7 additions & 5 deletions README.md
@@ -54,7 +54,7 @@ you can implement your own adapter if it your use-case. Take a look at the [Conf
If using bundler, first add 'proxy_fetcher' to your Gemfile:

```ruby
gem 'proxy_fetcher', '~> 0.7'
gem 'proxy_fetcher', '~> 0.10'
```

or if you want to use the latest version (from `master` branch), then:
@@ -72,7 +72,7 @@ bundle install
Otherwise simply install the gem:

```sh
gem install proxy_fetcher -v '0.7'
gem install proxy_fetcher -v '0.10'
```

## Example of usage
@@ -141,8 +141,8 @@ manager.refresh_list! # or manager.fetch!
# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
```

If you need to filter the proxy list, for example, by country or response time and the selected provider supports filtering with GET params,
then you can just pass your filters like a simple Ruby hash to the Manager instance:
If you need to filter the proxy list, for example, by country or response time and **the selected provider supports filtering**
with GET params, then you can just pass your filters like a simple Ruby hash to the Manager instance:

```ruby
ProxyFetcher.config.providers = :proxy_docker
@@ -153,7 +153,9 @@ manager.proxies
# => [...]
```

If you are using multiple providers, then you can split your filters by proxy provider names:
**[IMPORTANT]**: All the providers have their own filtering params! So you can't just use something like `country` to
filter all the proxies by country. If you are using multiple providers, then you can split your filters by proxy
provider names:

```ruby
ProxyFetcher.config.providers = [:proxy_docker, :xroxy]
```
1 change: 1 addition & 0 deletions lib/proxy_fetcher.rb
@@ -80,6 +80,7 @@ def configure
#
def logger
return @logger if defined?(@logger)

@logger = config.logger || NullLogger.new
end
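The `logger` accessor above caches its result with `defined?(@logger)` and falls back to a `NullLogger` when no logger is configured. A minimal sketch of that memoized null-object pattern follows; `SketchFetcher`, `SketchNullLogger` and the `Config` struct are hypothetical stand-ins, since the gem's own `NullLogger` and configuration classes are not part of this diff.

```ruby
require 'logger'
require 'stringio'

# Hypothetical null-object logger: accepts the usual Logger calls and discards them.
class SketchNullLogger
  %i[debug info warn error fatal unknown].each do |level|
    define_method(level) { |*_args, &_block| nil }
  end
end

# Hypothetical module mirroring the memoized #logger accessor from the diff.
module SketchFetcher
  Config = Struct.new(:logger)

  def self.config
    @config ||= Config.new(nil)
  end

  def self.logger
    # defined? checks whether @logger was ever assigned, so the fallback is
    # computed once and cached even if the cached value were falsy.
    return @logger if defined?(@logger)

    @logger = config.logger || SketchNullLogger.new
  end
end
```

Because of the memoization, a logger assigned to the configuration after the first `logger` call is not picked up unless something resets `@logger`; this diff does not show whether `configure` does that.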

56 changes: 48 additions & 8 deletions lib/proxy_fetcher/providers/base.rb
@@ -7,7 +7,25 @@ class Base
# Loads proxy provider page content, extract proxy list from it
# and convert every entry to proxy object.
def fetch_proxies!(filters = {})
load_proxy_list(filters).map { |html_node| to_proxy(html_node) }
raw_proxies = load_proxy_list(filters)
proxies = raw_proxies.map { |html_node| build_proxy(html_node) }.compact
proxies.reject { |proxy| proxy.addr.nil? }
end

def provider_url
raise NotImplementedError, "#{__method__} must be implemented in a descendant class!"
end

def provider_method
:get
end

def provider_params
{}
end

def provider_headers
{}
end
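This commit turns provider settings into overridable hook methods (`provider_url`, `provider_method`, `provider_params`, `provider_headers`) with defaults in `Base`, a template-method layout. The sketch below reproduces that layout with hypothetical `SketchProviderBase` and `SketchPostProvider` classes; `request_spec` is an illustrative helper, not a method from the gem.

```ruby
# Hypothetical base class mirroring the provider hooks added to Base.
class SketchProviderBase
  # Subclasses must supply the endpoint; __method__ names the missing hook.
  def provider_url
    raise NotImplementedError, "#{__method__} must be implemented in a descendant class!"
  end

  # Defaults: plain GET with no extra params or headers.
  def provider_method
    :get
  end

  def provider_params
    {}
  end

  def provider_headers
    {}
  end

  # Hypothetical helper collecting what a request needs from the hooks above.
  def request_spec
    { url: provider_url, method: provider_method,
      params: provider_params, headers: provider_headers }
  end
end

# Hypothetical provider that overrides only what differs from the defaults.
class SketchPostProvider < SketchProviderBase
  def provider_url
    'https://example.com/api/proxylist/'
  end

  def provider_method
    :post
  end
end
```

`NotImplementedError` descends from `ScriptError`, not `StandardError`, so a provider that forgets to define `provider_url` is not caught by a bare `rescue` or `rescue StandardError`, and the omission surfaces loudly.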

# Just syntactic sugar to make it easier to call #fetch_proxies! method.
@@ -17,7 +35,27 @@ def self.fetch_proxies!(*args)

protected

# Loads HTML document with Nokogiri by the URL combined with custom filters
# Loads raw provider HTML with proxies.
#
# @return [String]
# HTML body
#
def load_html(url, filters = {})
raise ArgumentError, 'filters must be a Hash' if filters && !filters.is_a?(Hash)

uri = URI.parse(url)
# TODO: query for post request?
uri.query = URI.encode_www_form(provider_params.merge(filters)) if filters && filters.any?

ProxyFetcher.config.http_client.fetch(
uri.to_s,
method: provider_method,
headers: provider_headers,
params: provider_params
)
end
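`load_html` centralises request building: it validates the filters, merges them over `provider_params`, encodes them into the query string and hands method, headers and params to the configured HTTP client. The sketch mirrors that logic in a hypothetical `SketchProvider`, with a hypothetical `RecordingClient` standing in for `ProxyFetcher.config.http_client` so the request can be inspected without network access; the client's `fetch` keywords are inferred from the single call site in this diff.

```ruby
require 'uri'

# Hypothetical client that records requests instead of performing network I/O.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def fetch(url, method: :get, headers: {}, params: {})
    @calls << { url: url, method: method, headers: headers, params: params }
    '<html></html>'
  end
end

# Hypothetical provider reproducing the load_html logic from the diff.
class SketchProvider
  def initialize(client)
    @client = client
  end

  def provider_method
    :get
  end

  def provider_params
    {}
  end

  def provider_headers
    {}
  end

  def load_html(url, filters = {})
    raise ArgumentError, 'filters must be a Hash' if filters && !filters.is_a?(Hash)

    uri = URI.parse(url)
    # Filters, merged over provider_params, reach the query only when non-empty.
    uri.query = URI.encode_www_form(provider_params.merge(filters)) if filters && filters.any?

    @client.fetch(uri.to_s, method: provider_method, headers: provider_headers, params: provider_params)
  end
end
```

As in the diff, `provider_params` only reach the query string when non-empty filters are passed, and they are always forwarded separately as `params:` (the `TODO` about POST queries suggests this was still open).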

# Loads provider HTML and parses it with internal document object.
#
# @param url [String]
# URL to fetch
@@ -29,15 +67,17 @@ def self.fetch_proxies!(*args)
# ProxyFetcher document object
#
def load_document(url, filters = {})
raise ArgumentError, 'filters must be a Hash' if filters && !filters.is_a?(Hash)

uri = URI.parse(url)
uri.query = URI.encode_www_form(filters) if filters && filters.any?

html = ProxyFetcher.config.http_client.fetch(uri.to_s)
html = load_html(url, filters)
ProxyFetcher::Document.parse(html)
end

def build_proxy(*args)
to_proxy(*args)
rescue StandardError => error
ProxyFetcher.logger.warn("Failed to build Proxy object due to error: #{error.message}")
nil
end
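Together, the new `fetch_proxies!` and `build_proxy` make a failure to convert one entry non-fatal: the error is logged, the entry becomes `nil`, and `compact` plus the `addr.nil?` check drop it. A self-contained sketch of that flow, where `SketchProxy`, `SketchResilientProvider` and the `ip`/`port` entry format are hypothetical:

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for ProxyFetcher::Proxy.
SketchProxy = Struct.new(:addr, :port)

# Hypothetical provider with the same fetch/build shape as Base in the diff.
class SketchResilientProvider
  def initialize(raw_entries, logger)
    @raw_entries = raw_entries
    @logger = logger
  end

  # Build every entry, drop failed builds, then drop proxies without an address.
  def fetch_proxies!(_filters = {})
    proxies = @raw_entries.map { |entry| build_proxy(entry) }.compact
    proxies.reject { |proxy| proxy.addr.nil? }
  end

  private

  # Any StandardError raised for one entry is logged and turned into nil.
  def build_proxy(entry)
    to_proxy(entry)
  rescue StandardError => error
    @logger.warn("Failed to build Proxy object due to error: #{error.message}")
    nil
  end

  # Raises KeyError for a missing 'ip' and ArgumentError for a non-numeric port.
  def to_proxy(entry)
    SketchProxy.new(entry.fetch('ip'), Integer(entry.fetch('port')))
  end
end
```

The two filters catch different cases: `compact` removes entries whose conversion raised, while `reject` removes proxies that were built but ended up without an address.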

# Fetches HTML content by sending HTTP request to the provider URL and
# parses the document (built as abstract <code>ProxyFetcher::Document</code>)
# to return all the proxy entries (HTML nodes).
8 changes: 5 additions & 3 deletions lib/proxy_fetcher/providers/free_proxy_list.rb
@@ -5,11 +5,13 @@ module Providers
# FreeProxyList provider class.
class FreeProxyList < Base
# Provider URL to fetch proxy list
PROVIDER_URL = 'https://free-proxy-list.net/'.freeze
def provider_url
'https://free-proxy-list.net/'
end

# [NOTE] Doesn't support filtering
def load_proxy_list(*)
doc = load_document(PROVIDER_URL, {})
def load_proxy_list(_filters = {})
doc = load_document(provider_url, {})
doc.xpath('//table[@id="proxylisttable"]/tbody/tr')
end
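Every provider in this commit replaces its `PROVIDER_URL` constant with a `provider_url` method, which lets shared code in `Base` ask any subclass for its endpoint. The commit does not state its motivation; one plausible reason is Ruby's constant lookup rule, illustrated here with hypothetical `ConstantBase` and `ConstantChild` classes:

```ruby
# Hypothetical classes contrasting constant lookup with method dispatch.
class ConstantBase
  PROVIDER_URL = 'https://base.example/'

  # A bare constant resolves lexically (ConstantBase) and then through
  # ConstantBase's ancestors, never through the receiver's subclass.
  def url_from_constant
    PROVIDER_URL
  end

  def provider_url
    'https://base.example/'
  end

  # A method call dispatches on the receiver, so a subclass override wins.
  def url_from_method
    provider_url
  end
end

class ConstantChild < ConstantBase
  PROVIDER_URL = 'https://child.example/'

  def provider_url
    'https://child.example/'
  end
end
```

`self.class::PROVIDER_URL` would also reach the subclass constant, but a method additionally allows computed values and matches the other `provider_*` hooks.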

8 changes: 5 additions & 3 deletions lib/proxy_fetcher/providers/free_proxy_list_ssl.rb
@@ -5,7 +5,9 @@ module Providers
# FreeProxyListSSL provider class.
class FreeProxyListSSL < Base
# Provider URL to fetch proxy list
PROVIDER_URL = 'https://www.sslproxies.org/'.freeze
def provider_url
'https://www.sslproxies.org/'
end

# Fetches HTML content by sending HTTP request to the provider URL and
# parses the document (built as abstract <code>ProxyFetcher::Document</code>)
@@ -15,8 +17,8 @@ class FreeProxyListSSL < Base
# Collection of extracted HTML nodes with full proxy info
#
# [NOTE] Doesn't support filtering
def load_proxy_list(*)
doc = load_document(PROVIDER_URL, {})
def load_proxy_list(_filters = {})
doc = load_document(provider_url, {})
doc.xpath('//table[@id="proxylisttable"]/tbody/tr')
end

6 changes: 4 additions & 2 deletions lib/proxy_fetcher/providers/gather_proxy.rb
@@ -7,7 +7,9 @@ module Providers
# GatherProxy provider class.
class GatherProxy < Base
# Provider URL to fetch proxy list
PROVIDER_URL = 'http://www.gatherproxy.com/'.freeze
def provider_url
'http://www.gatherproxy.com/'
end

# Fetches HTML content by sending HTTP request to the provider URL and
# parses the document (built as abstract <code>ProxyFetcher::Document</code>)
@@ -17,7 +19,7 @@ class GatherProxy < Base
# Collection of extracted HTML nodes with full proxy info
#
def load_proxy_list(*)
doc = load_document(PROVIDER_URL)
doc = load_document(provider_url)
doc.xpath('//div[@class="proxy-list"]/table/script')
end

8 changes: 5 additions & 3 deletions lib/proxy_fetcher/providers/http_tunnel.rb
@@ -5,7 +5,9 @@ module Providers
# HTTPTunnel provider class.
class HTTPTunnel < Base
# Provider URL to fetch proxy list
PROVIDER_URL = 'http://www.httptunnel.ge/ProxyListForFree.aspx'.freeze
def provider_url
'http://www.httptunnel.ge/ProxyListForFree.aspx'
end

# Fetches HTML content by sending HTTP request to the provider URL and
# parses the document (built as abstract <code>ProxyFetcher::Document</code>)
@@ -14,8 +16,8 @@ class HTTPTunnel < Base
# @return [Array<ProxyFetcher::Document::Node>]
# Collection of extracted HTML nodes with full proxy info
#
def load_proxy_list(*)
doc = load_document(PROVIDER_URL)
def load_proxy_list(_filters = {})
doc = load_document(provider_url)
doc.xpath('//table[contains(@id, "GridView")]/tr[(count(td)>2)]')
end

66 changes: 53 additions & 13 deletions lib/proxy_fetcher/providers/proxy_docker.rb
@@ -1,11 +1,39 @@
# frozen_string_literal: true

require 'json'

module ProxyFetcher
module Providers
# ProxyDocker provider class.
class ProxyDocker < Base
# Provider URL to fetch proxy list
PROVIDER_URL = 'https://www.proxydocker.com/en/proxylist/'.freeze
def provider_url
'https://www.proxydocker.com/en/api/proxylist/'
end

def provider_method
:post
end

def provider_params
{
token: 'GmZyl0OJmmgrWakdzO7AFf6AWfkdledR6xmKvGmwmJg',
country: 'all',
city: 'all',
state: 'all',
port: 'all',
type: 'all',
anonymity: 'all',
need: 'all',
page: '1'
}
end

def provider_headers
{
cookie: 'PHPSESSID=7f59558ee58b1e4352c4ab4c2f1a3c11'
}
end
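ProxyDocker now queries a JSON endpoint with a POST request, form parameters and a cookie header, all declared through the provider hooks. How `ProxyFetcher.config.http_client` turns those hooks into a request is not shown in this diff, so the sketch below assumes plain `Net::HTTP` from the standard library and only builds the request without sending it; `build_proxy_docker_style_request` is hypothetical, and the token and cookie are placeholders, not the values hard-coded in the diff.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a form-encoded POST shaped
# like the one ProxyDocker's provider_method/params/headers describe.
def build_proxy_docker_style_request(url, params, headers)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri)
  # Encodes params as an application/x-www-form-urlencoded body.
  request.set_form_data(params)
  # Header names are case-insensitive in Net::HTTP, so :cookie sets "Cookie".
  headers.each { |name, value| request[name.to_s] = value }
  request
end

req = build_proxy_docker_style_request(
  'https://www.proxydocker.com/en/api/proxylist/',
  { token: 'PLACEHOLDER', country: 'all', page: '1' },
  { cookie: 'PHPSESSID=PLACEHOLDER' }
)
```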

# Fetches HTML content by sending HTTP request to the provider URL and
# parses the document (built as abstract <code>ProxyFetcher::Document</code>)
@@ -16,30 +44,42 @@ class ProxyDocker < Base
#
# [NOTE] Doesn't support direct filters
def load_proxy_list(*)
doc = load_document(PROVIDER_URL, {})
doc.xpath('//table[contains(@class, "table")]/tbody/tr[(count(td)>2)]')
json = JSON.parse(load_html(provider_url, {}))
json.fetch('proxies', [])
rescue JSON::ParserError
[]
end
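The new `load_proxy_list` parses the API response as JSON and returns an empty list when the body is not valid JSON. A sketch of that parsing step as a standalone, hypothetical `extract_proxy_entries` function; the `is_a?(Hash)` guard is an addition not present in the diff:

```ruby
require 'json'

# Hypothetical standalone version of ProxyDocker's parsing step.
def extract_proxy_entries(body)
  data = JSON.parse(body)
  # Added guard: only a JSON object can carry a "proxies" key.
  return [] unless data.is_a?(Hash)

  # A missing "proxies" key falls back to an empty list.
  data.fetch('proxies', [])
rescue JSON::ParserError
  # Non-JSON bodies (e.g. an HTML error page) yield no entries.
  []
end
```

Without the guard, a top-level JSON array would reach `Array#fetch('proxies', [])` and raise a `TypeError`, which the `JSON::ParserError` rescue does not cover.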

# Converts HTML node (entry of N tags) to <code>ProxyFetcher::Proxy</code>
# Converts JSON node to <code>ProxyFetcher::Proxy</code>
# object.
#
# @param html_node [Object]
# HTML node from the <code>ProxyFetcher::Document</code> DOM model.
# @param node [Hash]
# JSON entry from the API response
#
# @return [ProxyFetcher::Proxy]
# Proxy object
#
def to_proxy(html_node)
def to_proxy(node)
ProxyFetcher::Proxy.new.tap do |proxy|
uri = URI("//#{html_node.content_at('td[1]')}")
proxy.addr = uri.host
proxy.port = uri.port
proxy.addr = node['ip']
proxy.port = node['port']

proxy.type = html_node.content_at('td[2]')
proxy.anonymity = html_node.content_at('td[3]')
proxy.country = html_node.content_at('td[5]')
proxy.type = types_mapping.fetch(node['type'], ProxyFetcher::Proxy::HTTP)
proxy.anonymity = "Lvl#{node['anonymity']}"
proxy.country = node['country']
end
end

def types_mapping
{
'16' => ProxyFetcher::Proxy::HTTP,
'26' => ProxyFetcher::Proxy::HTTPS,
'3' => ProxyFetcher::Proxy::SOCKS4,
'4' => ProxyFetcher::Proxy::SOCKS5,
'56' => ProxyFetcher::Proxy::HTTP, # CON25
'6' => ProxyFetcher::Proxy::HTTP # CON80
}
end
end

ProxyFetcher::Configuration.register_provider(:proxy_docker, ProxyDocker)
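`to_proxy` now reads fields straight from each JSON entry and maps ProxyDocker's numeric type codes through `types_mapping`, falling back to HTTP for unknown codes. The sketch keeps the diff's codes but uses plain strings and a hypothetical `SketchDockerProxy` struct instead of `ProxyFetcher::Proxy`; the `to_s` call is an added assumption covering an API that might send integer codes.

```ruby
# Hypothetical stand-in for ProxyFetcher::Proxy.
SketchDockerProxy = Struct.new(:addr, :port, :type, :anonymity, :country)

# Type codes copied from the diff's types_mapping, with plain strings as values.
SKETCH_TYPE_CODES = {
  '16' => 'HTTP',
  '26' => 'HTTPS',
  '3' => 'SOCKS4',
  '4' => 'SOCKS5',
  '56' => 'HTTP', # CON25
  '6' => 'HTTP'   # CON80
}.freeze

# Hypothetical equivalent of ProxyDocker#to_proxy for one JSON entry.
def docker_node_to_proxy(node)
  SketchDockerProxy.new(
    node['ip'],
    node['port'],
    # to_s is an assumption: it lets integer codes match the string keys.
    SKETCH_TYPE_CODES.fetch(node['type'].to_s, 'HTTP'),
    "Lvl#{node['anonymity']}",
    node['country']
  )
end
```

In the diff's version, an integer `type` such as `3` would miss the string keys and be reported as HTTP without any warning; whether the API sends strings or integers cannot be told from this page.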
6 changes: 4 additions & 2 deletions lib/proxy_fetcher/providers/proxy_list.rb
@@ -7,7 +7,9 @@ module Providers
# ProxyList provider class.
class ProxyList < Base
# Provider URL to fetch proxy list
PROVIDER_URL = 'https://proxy-list.org/english/index.php'.freeze
def provider_url
'https://proxy-list.org/english/index.php'
end

# Fetches HTML content by sending HTTP request to the provider URL and
# parses the document (built as abstract <code>ProxyFetcher::Document</code>)
@@ -17,7 +19,7 @@ class ProxyList < Base
# Collection of extracted HTML nodes with full proxy info
#
def load_proxy_list(filters = {})
doc = load_document(PROVIDER_URL, filters)
doc = load_document(provider_url, filters)
doc.css('.table-wrap .table ul')
end

6 changes: 4 additions & 2 deletions lib/proxy_fetcher/providers/xroxy.rb
@@ -5,7 +5,9 @@ module Providers
# XRoxy provider class.
class XRoxy < Base
# Provider URL to fetch proxy list
PROVIDER_URL = 'https://www.xroxy.com/free-proxy-lists/'.freeze
def provider_url
'https://www.xroxy.com/free-proxy-lists/'
end

# Fetches HTML content by sending HTTP request to the provider URL and
# parses the document (built as abstract <code>ProxyFetcher::Document</code>)
@@ -15,7 +17,7 @@ class XRoxy < Base
# Collection of extracted HTML nodes with full proxy info
#
def load_proxy_list(filters = { type: 'All_http' })
doc = load_document(PROVIDER_URL, filters)
doc = load_document(provider_url, filters)
doc.xpath('//div/table/tbody/tr')
end
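XRoxy keeps `{ type: 'All_http' }` as the default for `load_proxy_list`. A Ruby default argument only applies when the caller omits the argument, and `Base#fetch_proxies!` as shown in this diff always forwards its own `filters` (which default to `{}`), so on that path the XRoxy default appears never to be used. A minimal illustration with hypothetical `SketchXRoxy` and `SketchCaller` classes:

```ruby
# Hypothetical provider with the same default argument as XRoxy#load_proxy_list.
class SketchXRoxy
  def load_proxy_list(filters = { type: 'All_http' })
    filters
  end
end

# Hypothetical caller with the same call shape as Base#fetch_proxies! in the diff:
# the argument is always passed, so the callee's default never applies.
class SketchCaller
  def initialize(provider)
    @provider = provider
  end

  def fetch_proxies!(filters = {})
    @provider.load_proxy_list(filters)
  end
end
```

If the default is meant to apply, one option would be to merge it inside the method, for example `filters = { type: 'All_http' }.merge(filters)`; that is a suggestion, not something this commit does.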
