new xpath queries and login support #9

Open
wants to merge 1 commit into
base: master
9 changes: 5 additions & 4 deletions FinalScraper.py
@@ -10,10 +10,11 @@
print('Data will be saved in the following directory:', os.getcwd())


-scrape_oddsportal_historical(sport = 'soccer', country = 'france', league = 'ligue-1', start_season = '2010-2011', nseasons = 5, current_season = 'yes', max_page = 25)
-scrape_oddsportal_current_season(sport = 'soccer', country = 'finland', league = 'veikkausliiga', season = '2020', max_page = 25)
-scrape_oddsportal_specific_season(sport = 'soccer', country = 'finland', league = 'veikkausliiga', season = '2019', max_page = 25)
-scrape_oddsportal_next_games(sport = 'tennis', country = 'germany', league = 'exhibition-bett1-aces-berlin-women', season = '2020')
+scrape_oddsportal_historical(sport = 'soccer', country = 'france', league = 'ligue-1', start_season = '2020-2021', nseasons = 1, current_season = 'no', max_page = 1)
+
+# scrape_oddsportal_current_season(sport = 'soccer', country = 'finland', league = 'veikkausliiga', season = '2020', max_page = 25)
+# scrape_oddsportal_specific_season(sport = 'soccer', country = 'finland', league = 'veikkausliiga', season = '2019', max_page = 25)
+# scrape_oddsportal_next_games(sport = 'tennis', country = 'germany', league = 'exhibition-bett1-aces-berlin-women', season = '2020')



78 changes: 44 additions & 34 deletions README.md
@@ -1,68 +1,79 @@
# scrapeOP
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
:new::new::new: A python package for scraping oddsportal.com :new::new::new: <br/>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---

---

## :new::new::new: A python package for scraping oddsportal.com :new::new::new: <br/>

:information_source: Oddsportal.com [1] is a tremendous website containing both historical and future betting odds for a wide range of sports and bookmakers. <br />
This repository contains : <br />
1. A set of functions to scrape whatever league you wish :
1. Clone the repository or download it
2. Set your chrome driver location at line 20 in *functions.py*. NB : path is written with double slashes
2. Open *FinalScraper.py* and use one of the functions to scrape, that is as simple as this!
This repository contains : <br />

1. A set of functions to scrape whatever league you wish :
1. Clone the repository or download it
2. Set your chrome driver location at line 20 in _functions.py_. NB : path is written with double slashes
3. Set your session cookie at line 239 (you can find it in Chrome > Inspect > Application > Cookies > "oddsportalcom_session")
4. Open _FinalScraper.py_ and use one of the functions to scrape, that is as simple as this!
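
Steps 2 and 3 above could look roughly like the following in Python. This is a minimal sketch, not the code in _functions.py_: the helper names `build_session_cookie` and `open_logged_in_browser`, the `.oddsportal.com` cookie domain, and the Selenium 4 `Service` style of passing the driver path are all assumptions. Only the cookie name "oddsportalcom_session" comes from the README.

```python
SESSION_COOKIE_NAME = "oddsportalcom_session"  # cookie name given in the README


def build_session_cookie(value, domain=".oddsportal.com"):
    """Return a cookie dict in the shape Selenium's add_cookie() accepts.

    The default domain is an assumption; adjust it if the site differs.
    """
    if not value:
        raise ValueError("session cookie value is empty")
    return {
        "name": SESSION_COOKIE_NAME,
        "value": value,
        "domain": domain,
        "path": "/",
    }


def open_logged_in_browser(chromedriver_path, cookie_value):
    """Hypothetical helper: start Chrome and attach the session cookie."""
    # Imported lazily so build_session_cookie works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    driver = webdriver.Chrome(service=Service(chromedriver_path))
    # A cookie can only be added for the domain that is currently loaded.
    driver.get("https://www.oddsportal.com")
    driver.add_cookie(build_session_cookie(cookie_value))
    driver.refresh()  # reload so the page is served with the session attached
    return driver
```

Keeping the cookie value outside the source file (for example in an environment variable) avoids committing a personal session token by accident.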

:information_source: Functionalities :

- Multiple sports supported : soccer, basketball, esports, darts, tennis, baseball, rugby, american football, hockey [list to be expanded soon!]
- Multiple functionalities : collect historical odds, current season only, upcoming games, specific season only
- Collects all available bookmakers odds for each game
- Collects the final result
- Automatically sorts the data by date
- 14/11/2020 : You can now scrape the opening odds instead of the closing odds. To do this, just change line 26 in *functions.py* to 'OPENING'<br />
- Automatically sorts the data by date
- 14/11/2020 : You can now scrape the opening odds instead of the closing odds. To do this, just change line 26 in _functions.py_ to 'OPENING'<br />

:information_source: Currently supported : <br />

Sport | Historical data (multiple seasons) | Current Season only | Specific Season | Next Games
------------ | ------------- | ------------ | ------------- | -------------
American Football | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark:
Baseball | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark:
Basketball | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark:
Darts | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark:
eSports | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark:
Handball | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark:
Hockey | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_multiplication_x:
Rugby | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark:
Soccer | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark:
Tennis | :heavy_multiplication_x: | :heavy_check_mark: | :heavy_multiplication_x: | :heavy_check_mark:
Volleyball | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark:
| Sport | Historical data (multiple seasons) | Current Season only | Specific Season | Next Games |
| ----------------- | ---------------------------------- | ------------------- | ------------------------ | ------------------------ |
| American Football | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Baseball | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Basketball | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Darts | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| eSports | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Handball | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Hockey | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_multiplication_x: |
| Rugby | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Soccer | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Tennis | :heavy_multiplication_x: | :heavy_check_mark: | :heavy_multiplication_x: | :heavy_check_mark: |
| Volleyball | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

:information_source: The main functions which you can use are the following : <br />

```python
1.scrape_oddsportal_historical(sport = 'soccer', country = 'france', league = 'ligue-1', start_season = '2010-2011', nseasons = 5, current_season = 'yes', max_page = 25)
2.scrape_oddsportal_current_season(sport = 'soccer', country = 'finland', league = 'veikkausliiga', season = '2020', max_page = 25)
3.scrape_oddsportal_specific_season(sport = 'soccer', country = 'finland', league = 'veikkausliiga', season = '2019', max_page = 25)
4.scrape_oddsportal_next_games(sport = 'tennis', country = 'germany', league = 'exhibition-bett1-aces-berlin-women', season = '2020')
4.scrape_oddsportal_next_games(sport = 'tennis', country = 'germany', league = 'exhibition-bett1-aces-berlin-women', season = '2020')
```
:information_source: *e.g.* if I want to collect the data for the **soccer** league **MLS (USA)** for the **2013 season (1 season)** and for the **current season as well**, and I only want the **first two pages** from oddsportal, I just need to call : <br/>

:information_source: _e.g._ if I want to collect the data for the **soccer** league **MLS (USA)** for the **2013 season (1 season)** and for the **current season as well**, and I only want the **first two pages** from oddsportal, I just need to call : <br/>
![Scraper_final code](Screenshots/ScreenshotCode.PNG)
..then console when running code : <br/>
![Scraper_final code running](Screenshots/ScreenshotRunning.PNG)
..and finally the scraped data, saved in .csv format : <br/>
![Table Overview](Screenshots/ScreenshotTable.PNG) <br/>
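
Since the call itself is only visible in the screenshot, here is a text sketch of what the MLS example might look like. The parameter names come from the function list in this README; the `'usa'` and `'mls'` slugs, the `'2013'` season label, and the assumption that _functions.py_ exposes `scrape_oddsportal_historical` for import are guesses and should be checked against the oddsportal URL and the repository.

```python
# Keyword arguments for the MLS example described in the README.
# The country/league slugs and the season label are assumptions.
mls_kwargs = dict(
    sport="soccer",
    country="usa",
    league="mls",
    start_season="2013",   # first season to scrape
    nseasons=1,            # one historical season
    current_season="yes",  # also scrape the current season
    max_page=2,            # only the first two result pages
)

try:
    # Assumes the script runs from the repository root, next to functions.py.
    from functions import scrape_oddsportal_historical
except ImportError:
    scrape_oddsportal_historical = None

if scrape_oddsportal_historical is not None:
    scrape_oddsportal_historical(**mls_kwargs)
```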

Extra-documentation can be found for the functions in the *functions.py* script. :100::100: <br/><br/>
Extra-documentation can be found for the functions in the _functions.py_ script. :100::100: <br/><br/>

:information_source: How to create a sports-betting algorithm from historical data : </br>

- Medium article : https://medium.com/analytics-vidhya/how-covid-19-prevented-me-from-being-a-millionnaire-in-2020-5b2144e8bdef </br>
- Full paper : https://seb943.github.io/Data/Paper_Exploiting_bookmakers_biases.pdf </br> </br>

:information_source: Please report any bug/issue in the *issues* section or directly at [email protected]. Any feedback is really appreciated :speech_balloon: :+1:. </br>
:information_source: Please report any bug/issue in the _issues_ section or directly at [email protected]. Any feedback is really appreciated :speech_balloon: :+1:. </br>

---

---

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
You can also have a look at the *functions.py* source code in order to understand the mechanics and eventually adapt the code to your own purpose. In the *functions.py* script, I distinguished 4 types of sports, according to the sport-related format of outcome (either 1X2, 12, and various types of score : tennis-alike, football-alike, baseball-alike, hockey-like (the format is different for hockey on oddsportal website) ). <br />
---

---

You can also have a look at the _functions.py_ source code in order to understand the mechanics and eventually adapt the code to your own purpose. In the _functions.py_ script, I distinguished 4 types of sports, according to the sport-related format of outcome (either 1X2, 12, and various types of score : tennis-alike, football-alike, baseball-alike, hockey-like (the format is different for hockey on oddsportal website) ). <br />

With the emergence of sports analytics and machine learning, it has become possible for anyone to create data-based betting strategies, taking into consideration both market-related figures (odds values, variations, differentials between bookmakers) and sport-related performance metrics of any team. The minimal data required for this task is the historical results and betting odds (closing odds are usually preferred), which you can then use to create Machine Learning and Deep Learning models to infer probabilities of victory and to analyze whether any given team is being undervalued or overvalued by any given bookmaker. The oddsportal website is one of the largest publicly open odds databases; however, its format and architecture are not very pleasant to deal with, so one needs a bit of time to build tools to collect the data from the website. This package offers a comprehensive interface (a sort of unofficial API) to collect odds and save the data in a comprehensive csv format. <br />
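
To make the "undervalued or overvalued" idea concrete, here is a minimal sketch of one common approach: convert decimal odds to implied probabilities, remove the bookmaker margin by proportional normalization, and compare against a model probability. The function names and the sample odds are illustrative assumptions, not part of this package, and proportional normalization is only one of several ways to remove the margin.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for one game into margin-free probabilities.

    Returns (probabilities, margin), where margin is the bookmaker's
    overround: the amount by which the raw inverse odds exceed 1.
    """
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must be greater than 1")
    raw = [1.0 / o for o in decimal_odds]    # raw implied probabilities
    total = sum(raw)                         # > 1 when the book has a margin
    probabilities = [p / total for p in raw] # proportional normalization
    return probabilities, total - 1.0


def expected_value(model_probability, decimal_odds):
    """Expected profit per unit staked, given your own probability estimate."""
    return model_probability * decimal_odds - 1.0


# Illustrative 1X2 odds (home, draw, away); not real data.
odds = [2.10, 3.40, 3.60]
probs, margin = implied_probabilities(odds)
print([round(p, 3) for p in probs], round(margin, 3))

# A positive value suggests the outcome is undervalued by the bookmaker
# relative to the model; a negative value suggests the opposite.
print(round(expected_value(0.52, odds[0]), 3))
```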

@@ -72,4 +83,3 @@ NB : This package is purposed for educational use only, not for any commercial

BTC : 3PkoHLXmXsL8kBrFu7GQ8kpmzPBmNK6m8B <br/>
ETH : 0xFdbB5aF291cB7e711D62c1E4a8B58d0EbD423F9C
