Commit 5ec0fcd (1 parent: 9ec2716)

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Showing 8 changed files with 321 additions and 284 deletions.

@@ -1,41 +1,71 @@

# Npm Rank

Starting from the names of the top 1,000 npm packages by download count as of August 16, 2019, this project fetches the current download ranking of those packages and shows the top 30 by default. The frontend repository is [npmrank-v](https://github.com/XueMeijing/npmrank-v).

# Data sources

- The package names come from the top-1000 list that [anvaka](https://gist.github.com/anvaka/8e8fa57c7ee1350e3491) scraped in 2019. That list is now three to four years old, but older packages tend to have more dependents and therefore more downloads, so the effect on the top of the ranking should be small.
- Download counts are fetched from the [npm](https://github.com/npm/registry/blob/master/docs/download-counts.md) download-counts API.
- npm updates its data shortly after 00:00 UTC each day, so this project fetches new data at 03:00 UTC. Because these times are in UTC (Coordinated Universal Time), the results can look somewhat unintuitive from other time zones.
- Total downloads cover 2015 to the present.
- GitHub stars are scraped from the GitHub web pages, because the GitHub API limits unauthenticated requests to 60 per hour per IP address ([see the docs](https://docs.github.com/en/rest/overview/resources-in-the-rest-api?apiVersion=2022-11-28#rate-limiting)).

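Because the refresh is pinned to 03:00 UTC, its local time depends on the reader's time zone. In UTC+8, for example, it falls at 11:00. The following is a minimal standard-library sketch for computing the next run. `next_run_utc` is a hypothetical helper, not code from this repository, and it does not show how the real script is scheduled.

```python
from datetime import datetime, timedelta, timezone


def next_run_utc(now: datetime, hour: int = 3) -> datetime:
    """Return the next instant at `hour`:00 UTC strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC before comparing.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    candidate = now_utc.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now_utc:
        # Today's slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    # 02:30 in UTC+8 is 18:30 UTC on the previous day.
    utc8 = timezone(timedelta(hours=8))
    now = datetime(2023, 5, 1, 2, 30, tzinfo=utc8)
    run = next_run_utc(now)
    print(run.isoformat())                      # 2023-05-01T03:00:00+00:00
    print(run - now.astimezone(timezone.utc))   # 8:30:00
```

Passing `hour=4` gives the 04:00 UTC slot mentioned in the to-do list.
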
If you notice a heavily downloaded package that is not in `source.md`, feel free to open an issue.

# Quick start

1. Install the dependencies
   ```
   pip3 install -r requirements.txt
   ```
2. Generate or update the download data. With a normal network connection this takes roughly 1 to 2 hours. The command runs the script in the background and writes its unbuffered output to `nohup.out`.
   ```
   nohup python3 -u generate_download_data.py > nohup.out 2>&1 &
   ```
3. View the log
   ```
   tail -30f nohup.out
   ```

# Directory structure

```
.
├── LICENSE
├── README.md
├── database.db               # SQLite package download data
├── db.py                     # database helpers
├── generate_download_data.py # generates and updates the data
├── requirements.txt
├── server.py                 # serves data to the frontend
└── source.md                 # 2019 package ranking
```
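
# To do

- Use proxies to send multiple requests concurrently
- Add a scheduled task that updates the data every day at 04:00 UTC
- Only the top 30 entries are shown for now. Showing more would be nice, but the chart does not display larger datasets well.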
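
The following is a sketch of the concurrency half of the first to-do item. It uses `concurrent.futures.ThreadPoolExecutor` from the standard library. The fetch function is passed in so the example runs offline. `fetch_all`, the `max_workers=4` default and the fake fetcher are all hypothetical. The proxy part of the item is not shown, and a real fetcher would still need to respect npm's and GitHub's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def fetch_all(packages: Iterable[str],
              fetch: Callable[[str], int],
              max_workers: int = 4) -> Dict[str, object]:
    """Run `fetch(package)` concurrently and collect results by package name.

    A failure for one package is stored as its exception instead of
    aborting the whole batch, so that package can be retried later.
    """
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, pkg): pkg for pkg in packages}
        for future in as_completed(futures):
            pkg = futures[future]
            try:
                results[pkg] = future.result()
            except Exception as exc:  # record the error and keep going
                results[pkg] = exc
    return results


if __name__ == "__main__":
    # Stand-in for a real HTTP call to the npm downloads API.
    fake_counts = {"express": 100, "glob": 200}

    def fake_fetch(pkg: str) -> int:
        return fake_counts[pkg]  # raises KeyError for unknown packages

    out = fetch_all(["express", "glob", "no-such-pkg"], fake_fetch)
    print(out["express"], out["glob"], type(out["no-such-pkg"]).__name__)
    # 100 200 KeyError
```
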

# Notes

- Do not run a proxy on your machine while updating the data with `generate_download_data`; otherwise requests fail with an SSL 443 error.
- The same URL, e.g. https://www.npmjs.com/package/glob, returns either JSON or HTML depending on the request headers, but which header controls this has not been confirmed yet.
- A single query is limited to at most 18 months of data, and the earliest date returned is January 10, 2015.
- The download count for a package over a given range can come back as 0, for unknown reasons. For example,
  ```
  https://api.npmjs.org/downloads/point/2020-01-01:2021-01-01/express
  ```
  returns
  ```
  {
    "downloads": 0,
    "start": "2020-01-01",
    "end": "2021-01-01",
    "package": "express"
  }
  ```
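
A single query covers at most 18 months, and the data starts on 2015-01-10. A 2015-to-present total therefore has to be assembled from several range queries.

The following is a minimal sketch under these assumptions:

- Windows are built from calendar months and do not overlap, so each one starts the day after the previous one ends.
- The API accepts a window of exactly 18 months.

`add_months`, `month_windows`, `range_url` and `sum_point_responses` are hypothetical helpers. The URL follows the `point/{start}:{end}/{package}` pattern from the example above.

```python
import calendar
import json
from datetime import date, timedelta
from typing import List, Tuple

# Earliest date the API returns data for (per the note above).
NPM_EARLIEST = date(2015, 1, 10)


def add_months(d: date, months: int) -> date:
    """Shift `d` by whole months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def month_windows(start: date, end: date, months: int = 18) -> List[Tuple[date, date]]:
    """Split [start, end] (inclusive) into back-to-back windows of at most `months` months."""
    windows = []
    cur = start
    while cur <= end:
        # A window ends the day before the same calendar day `months` later.
        win_end = min(add_months(cur, months) - timedelta(days=1), end)
        windows.append((cur, win_end))
        cur = win_end + timedelta(days=1)
    return windows


def range_url(package: str, start: date, end: date) -> str:
    return (f"https://api.npmjs.org/downloads/point/"
            f"{start.isoformat()}:{end.isoformat()}/{package}")


def sum_point_responses(bodies: List[str]) -> Tuple[int, List[str]]:
    """Sum `downloads` over point-endpoint JSON bodies.

    Also returns the start:end ranges that reported 0, because such a
    range would otherwise silently lower the total.
    """
    total, zero_ranges = 0, []
    for body in bodies:
        data = json.loads(body)
        total += data["downloads"]
        if data["downloads"] == 0:
            zero_ranges.append(f"{data['start']}:{data['end']}")
    return total, zero_ranges


if __name__ == "__main__":
    for s, e in month_windows(NPM_EARLIEST, date(2018, 1, 9)):
        print(range_url("express", s, e))
    # https://api.npmjs.org/downloads/point/2015-01-10:2016-07-09/express
    # https://api.npmjs.org/downloads/point/2016-07-10:2018-01-09/express
```

If the API turns out to reject full 18-month windows, passing a smaller `months` value keeps the same logic.
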

# Acknowledgements

Thanks to [sunkxs](https://github.com/sunkxs) for guidance on the database.

Binary file not shown.

@@ -0,0 +1,67 @@

    import sqlite3


    # Convert a query result row (tuple) into a dict keyed by column name
    def dict_factory(cursor, row):
        d = {}
        for idx, col in enumerate(cursor.description):
            d[col[0]] = row[idx]
        return d


    DATABASE = 'database.db'


    def init_db():
        # Create both tables if they do not exist yet
        db = sqlite3.connect(DATABASE, check_same_thread=False)
        cursor = db.cursor()
        create_table_pkgbase_query = '''
        CREATE TABLE IF NOT EXISTS pkgbase(
            id           TEXT PRIMARY KEY NOT NULL,
            npm_url      TEXT NOT NULL,
            github_url   TEXT,
            homepage_url TEXT,
            version      TEXT,
            license      TEXT,
            github_star  TEXT,
            size         TEXT,
            created      TEXT,
            updated      TEXT);
        '''
        create_table_pkgdownload_query = '''
        CREATE TABLE IF NOT EXISTS pkgdownload(
            id        TEXT NOT NULL,
            dltype    TEXT NOT NULL,
            downloads INTEGER,
            timepoint TEXT);
        '''
        cursor.execute(create_table_pkgbase_query)
        cursor.execute(create_table_pkgdownload_query)
        cursor.close()
        db.close()
        print('数据库初始化成功')  # "Database initialized successfully"


    class SQLDB(object):
        def __init__(self):
            # isolation_level=None puts the connection in autocommit mode
            self.db = sqlite3.connect(DATABASE, isolation_level=None)
            self.db.row_factory = dict_factory

        def get(self, query, args=(), one=False):
            # Return all matching rows as dicts, or only the first (or None) if one=True
            cur = self.db.cursor()
            cur.execute(query, args)
            rv = cur.fetchall()
            cur.close()
            return (rv[0] if rv else None) if one else rv

        def update(self, SQL, args=()):
            # Commit on success; on error roll back any open transaction and re-raise
            cur = self.db.cursor()
            try:
                cur.execute(SQL, args)
                self.db.commit()
            except sqlite3.Error:
                self.db.rollback()
                raise
            finally:
                cur.close()

        def __del__(self):
            self.db.commit()
            self.db.close()


    __all__ = [
        'init_db',
        'SQLDB',
    ]
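
The following is a self-contained sketch of how the `pkgdownload` table might be queried for a top-N ranking. It runs against an in-memory database, so it does not touch `database.db`. It mirrors the schema and the `dict_factory` row factory above.

The `dltype` value `'total'`, the sample numbers and the `top_packages` helper are hypothetical. The code above does not show which values `generate_download_data.py` actually writes. The standard library's `sqlite3.Row` is a built-in alternative to a hand-written dict factory.

```python
import sqlite3


def dict_factory(cursor, row):
    # Same idea as db.py: map each column name to its value.
    return {col[0]: row[idx] for idx, col in enumerate(cursor.description)}


def top_packages(conn, dltype, limit=30):
    """Return up to `limit` rows of the given download type, largest first."""
    cur = conn.execute(
        "SELECT id, downloads FROM pkgdownload WHERE dltype = ? "
        "ORDER BY downloads DESC LIMIT ?",
        (dltype, limit),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.row_factory = dict_factory
    conn.execute(
        "CREATE TABLE pkgdownload(id TEXT NOT NULL, dltype TEXT NOT NULL, "
        "downloads INTEGER, timepoint TEXT)"
    )
    rows = [("express", "total", 300, None), ("glob", "total", 500, None),
            ("lodash", "total", 400, None)]
    # Parameterized inserts; `with conn` commits on success and rolls back on error.
    with conn:
        conn.executemany("INSERT INTO pkgdownload VALUES (?, ?, ?, ?)", rows)
    print(top_packages(conn, "total", limit=2))
    # [{'id': 'glob', 'downloads': 500}, {'id': 'lodash', 'downloads': 400}]
```

The default `limit=30` matches the top-30 view described at the start of the README.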