Version with working crawlers as of 2022-10-27 #3

Open
wants to merge 39 commits into
base: master

Conversation

cool9203

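Change list:

  1. Apple Daily (蘋果新聞) is no longer updated, so it is now marked as a site with no updates and is not crawled.
  2. Rewrote every spider on a shared template, and extracted parse date, TODAY, YESTERDAY, and the news date check into utils. The spiders no longer mix the time and datetime libs; only datetime is used.
  3. All news sites can be crawled normally (except Apple Daily).
  4. Added crawling of each article's description and key_word.
  5. Added a test program, which can also be called to run all spiders.
  6. Dependency versions: scrapy==2.7.0 and Twisted==22.8.0 run correctly.
  7. Rewrote the README, adding usage examples and a list of the available spiders with their corresponding features.
  8. Deleted liberty_realtimenews_spider.py.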
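
For reviewers, the shared date helpers described in item 2 could look roughly like the sketch below. This is only an illustration, not the code in this PR: `TODAY` and `YESTERDAY` are named in the change list, but the function names `parse_date` and `is_recent`, their signatures, and the default timestamp format are assumptions.

```python
from datetime import date, datetime, timedelta

# Module-level constants, computed once when utils is imported.
TODAY = date.today()
YESTERDAY = TODAY - timedelta(days=1)


def parse_date(text, fmt="%Y-%m-%d %H:%M"):
    """Parse a timestamp string into a date using only the datetime lib.

    The default format is an assumption; each site may need its own.
    """
    return datetime.strptime(text.strip(), fmt).date()


def is_recent(news_date, today=None):
    """Return True if news_date is today or yesterday.

    `today` can be passed explicitly so the check is testable.
    """
    today = today or date.today()
    return news_date in (today, today - timedelta(days=1))
```

Keeping the check in one place means each spider only has to supply its own format string, for example `is_recent(parse_date(raw, "%Y/%m/%d %H:%M"))`.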

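The rewrite is extensive, and the list above should cover everything. As of 2022-10-27, I tested all spiders and they crawl normally.
However, setn (三立) can probably only be crawled once every 24 hours: sending too many requests gets the crawler blocked by its server, after which crawling fails. The cool-down is about 24 hours.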
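
One possible mitigation for the setn block, not part of this PR and not verified against their server, is to slow that spider down with Scrapy's built-in throttling settings. The setting keys below are standard Scrapy settings; the dict name `SETN_POLITE_SETTINGS` and all values are assumptions chosen for illustration.

```python
# Hypothetical per-spider settings; the values are illustrative guesses,
# not measured limits of the setn server.
SETN_POLITE_SETTINGS = {
    "DOWNLOAD_DELAY": 5,                     # seconds between requests
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,     # one request at a time
    "AUTOTHROTTLE_ENABLED": True,            # adapt delay to server latency
    "AUTOTHROTTLE_START_DELAY": 5,
    "AUTOTHROTTLE_MAX_DELAY": 60,
    "AUTOTHROTTLE_TARGET_CONCURRENCY": 1.0,
}
```

Assigning this dict to the `custom_settings` class attribute of the setn spider would apply it to that spider only. Whether it is enough to avoid the 24-hour block is untested.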
