Add README for Baijiahao
copie committed Mar 4, 2017
1 parent a41a8a3 commit c511c72
Showing 2 changed files with 17 additions and 0 deletions.
6 changes: 6 additions & 0 deletions 6.爬虫项目源码/11.百家号/baijiahao_cw/README.md
@@ -0,0 +1,6 @@
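# Baijiahao Scraper
## Known issues
* Crawling is very slow because it relies on PhantomJS
* Code style problems: some lines are too long and do not comply with PEP 8
* Running get_url.py for two days collected 500,000 URLs
* Extracting IDs from those URLs, however, ran on the server for a week and yielded only 40,000 usable IDs (Baijiahao authors)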
11 changes: 11 additions & 0 deletions 6.爬虫项目源码/11.百家号/baijiahao_cw/qucong.py
@@ -0,0 +1,11 @@
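# Deduplicate the app IDs collected in appid.txt.
unique_ids = set()
with open('appid.txt', 'r') as idfile:
    for appid in idfile:
        appid = appid.strip()  # drop the trailing newline so the output is not double-spaced
        if appid:  # skip blank lines
            unique_ids.add(appid)
print(len(unique_ids))
with open('sortid.txt', 'w') as idfile:  # the with block closes the file; no close() needed
    for appid in unique_ids:
        idfile.write(appid + '\n')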
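
The deduplication step in qucong.py could also be written as a small reusable function, which makes it easier to check on sample data before running it against the full ID file. The following is only a sketch under stated assumptions: the names `dedupe_ids` and `dedupe_file` are hypothetical and not part of this commit, the files are assumed to be UTF-8 text with one ID per line, and first-seen order is kept as a design choice (the original script writes the set in arbitrary order).

```python
def dedupe_ids(lines):
    """Return unique, non-blank IDs in first-seen order (hypothetical helper)."""
    seen = set()
    unique = []
    for line in lines:
        appid = line.strip()  # remove the newline and surrounding whitespace
        if appid and appid not in seen:
            seen.add(appid)
            unique.append(appid)
    return unique


def dedupe_file(src, dst):
    """Read IDs from src, write the unique ones to dst, and return the count."""
    # Assumption: both files are UTF-8 text with one ID per line.
    with open(src, 'r', encoding='utf-8') as infile:
        unique = dedupe_ids(infile)
    with open(dst, 'w', encoding='utf-8') as outfile:
        outfile.writelines(appid + '\n' for appid in unique)
    return len(unique)
```

With the file names used in this commit, the call would be `dedupe_file('appid.txt', 'sortid.txt')`. Keeping the IDs in a list alongside the set costs some extra memory, which should be negligible for the roughly 40,000 IDs mentioned in the README.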
