Releases: adbar/htmldate
htmldate-0.8.0
- dateparser and regex modules fully integrated
- patterns added for coverage
- smarter HTML doc loading
htmldate-0.7.3
- dependencies updated and reduced: switch from requests to bare urllib3, make chardet standard and cchardet optional
- fixes: downloads, OverflowError in extraction
htmldate-0.7.2
- compatibility with Python 3.9
- better speed and accuracy
htmldate-0.7.1
- technical release: package requirements and docs wording
htmldate-0.7.0
- code base and performance improved
- minimum date available as option
- support for Turkish patterns and CMS idiosyncrasies (thanks @evolutionoftheuniverse)
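To illustrate what a minimum-date option can mean in practice, here is a minimal standard-library sketch. It is not htmldate's code: the function name latest_valid_date and its parameters are hypothetical, and picking the most recent candidate is an assumption made for the example.

```python
from datetime import date
from typing import Iterable, Optional


def latest_valid_date(
    candidates: Iterable[str],
    min_date: date,
    max_date: Optional[date] = None,
) -> Optional[str]:
    """Return the most recent ISO date within [min_date, max_date].

    Hypothetical helper for illustration only; not part of htmldate.
    """
    # Assumption: without an explicit upper bound, dates in the future are rejected.
    upper = max_date or date.today()
    valid = []
    for text in candidates:
        try:
            parsed = date.fromisoformat(text)
        except ValueError:
            # Skip strings that are not valid YYYY-MM-DD dates.
            continue
        if min_date <= parsed <= upper:
            valid.append(parsed)
    return max(valid).isoformat() if valid else None


found = latest_valid_date(
    ["1970-01-01", "2016-12-23", "2999-01-01", "not a date"],
    min_date=date(1995, 1, 1),
    max_date=date(2020, 1, 1),
)
print(found)  # 2016-12-23
```

A lower bound of this kind discards implausible candidates, such as epoch placeholders, before a result is chosen.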
htmldate-0.6.3
- more efficient code
- additional evaluation data
htmldate-0.6.2
v0.6.2 roundup + version bump
htmldate-0.6.1
htmldate finds original and updated publication dates of any web page. All the steps needed from web page download to HTML parsing, scraping and text analysis are included.
In a nutshell, with Python:
>>> from htmldate import find_date
>>> find_date('http://blog.python.org/2016/12/python-360-is-now-available.html')
'2016-12-23'
>>> find_date('https://netzpolitik.org/2016/die-cider-connection-abmahnungen-gegen-nutzer-von-creative-commons-bildern/', original_date=True)
'2016-06-23'
On the command-line:
$ htmldate -u http://blog.python.org/2016/12/python-360-is-now-available.html
'2016-12-23'
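The parsing and scraping steps mentioned above can be pictured with a small standard-library sketch. It is not how htmldate is implemented: the names DATE_KEYS, MetaDateParser and sketch_find_date are hypothetical, and the sketch only reads an ISO date from a few common meta tags, whereas the real package covers far more cases.

```python
import re
from datetime import date
from html.parser import HTMLParser
from typing import Optional

# Assumption: a few meta keys that commonly carry a publication date.
DATE_KEYS = {"article:published_time", "og:published_time", "date", "dc.date"}

# Matches the YYYY-MM-DD part of an ISO 8601 timestamp.
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


class MetaDateParser(HTMLParser):
    """Collect the first plausible date found in <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Optional[date] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.found is not None:
            return
        attributes = dict(attrs)
        key = (attributes.get("property") or attributes.get("name") or "").lower()
        match = ISO_DATE.search(attributes.get("content") or "")
        if key in DATE_KEYS and match:
            try:
                self.found = date(*map(int, match.groups()))
            except ValueError:
                # Ignore impossible dates such as month 13.
                pass


def sketch_find_date(html: str) -> Optional[str]:
    """Return a YYYY-MM-DD string from meta tags, or None if nothing is found."""
    parser = MetaDateParser()
    parser.feed(html)
    return parser.found.isoformat() if parser.found else None


sample = (
    '<html><head><meta property="article:published_time" '
    'content="2016-12-23T10:00:00Z"/></head><body></body></html>'
)
print(sketch_find_date(sample))  # 2016-12-23
```

The sketch stops at metadata. Downloading the page and analysing the visible text, which the description above also lists, are left out.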
Releases used in production and meant to be archived on Zenodo for reproducibility and citability.
For more information see htmldate.readthedocs.io
First stable release for Zenodo
First release used in production and meant to be archived on Zenodo for reproducibility and citability.