Release Version 0.9.3: Article parsing improvements and a huge jump in multi-language support (over 40 languages added) · AndyTheFactory/newspaper4k

Massive improvements in multi-language capabilities. Added over 40 new languages and completely reworked the language module, so adding new languages is now much easier. Additionally, added support for Google News as a source. You can now search and parse news based on keywords, topic, location or website.
Integrated cloudscraper as an optional dependency. If installed, newspaper4k will use cloudscraper as a layer over requests. Cloudscraper tries to bypass Cloudflare protection.
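The optional-dependency behaviour described above can be pictured with a minimal sketch. This is not newspaper4k's actual code: `load_optional` and `make_session` are hypothetical helper names used only for illustration. The sketch relies on `cloudscraper.create_scraper()`, which returns a session object compatible with `requests.Session`.

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def make_session():
    """Build an HTTP session, preferring cloudscraper when it is available.

    Hypothetical helper: illustrates the fallback pattern only.
    """
    cloudscraper = load_optional("cloudscraper")
    if cloudscraper is not None:
        # create_scraper() returns a requests.Session subclass
        return cloudscraper.create_scraper()
    # Fall back to plain requests when cloudscraper is not installed
    import requests
    return requests.Session()
```

Because both branches return a session with the same interface, calling code such as `make_session().get(url)` does not need to know which library is in use.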
We now use two evaluation datasets: the one from scrapinghub and one we created from the top 200 most popular websites. This will help us keep track of future improvements and give a clear view of the impact of each change.

We see a steady improvement from version 0.9.0 up to 0.9.3. The evaluation results are available in the documentation. The evaluation dataset is also available in the following repository: Article Extraction Dataset

You can now install languages that need special packages as optional dependencies.
Google News is fully integrated in the scraping process.
You can now pickle sources and articles, which makes it easier to save and resume a scraping session.
Bumped the minimum supported Python version to Python 3.8.
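As a rough illustration of what pickling support makes possible, the sketch below saves an object to disk and restores it with the standard `pickle` module. The helpers `save_state` and `load_state` are hypothetical names, not part of newspaper4k, and a plain dictionary stands in for a source or article object so that the example runs without network access.

```python
import pickle
import tempfile
from pathlib import Path


def save_state(obj, path):
    """Serialize any picklable object (e.g. a source or article) to a file."""
    Path(path).write_bytes(pickle.dumps(obj))


def load_state(path):
    """Restore an object previously written by save_state."""
    return pickle.loads(Path(path).read_bytes())


# Stand-in for a scraped article; a real Article object would be
# passed to save_state in the same way.
scraped = {"url": "https://example.com/story", "title": "Example"}

with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "article.pkl"
    save_state(scraped, state_file)
    restored = load_state(state_file)
```

Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.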
