I was able to plot the dependents data for the Python projects, but the counts for the projects with the most dependents are wrong.
The rest of the data looks good.
Interesting ideas, thanks for all your legwork exploring and listing all these resources, @Ly0n ! This could definitely enrich the analysis and help identify 'meaningful' projects (those that are widely used) and show how many of them are (un)supported (although I'm guessing there's a strong positive correlation between the two). One thing to consider: even if most projects are written in Python/R, are they actual libraries/packages? Sometimes webapps can be (and should be!) packaged, but that's not a common practice, at least in the R community. This should be a quick check, though 👍
It would be very interesting to see how many projects are released as packages. In other domains like robotics it is very common to create packages with similar interfaces to improve modularity. A low level of modularity would suggest that there is very little collaboration between open source projects in this area. I have the impression that this is the case, and it would be interesting to prove it with figures. NLP could also help a lot here.
Hey @KKulma,
you asked this question in the title within your comments of the ost_analysis.ipynb notebook! I think it's worth starting a separate issue on this. The most important missing piece is data about the usage of open source projects. In combination with the analysis we already did, this could help us identify projects that are used a lot but do not have enough support. We could get this data in two ways:
Dependents and Dependencies
For some programming languages GitHub has integrated dependency trees into the platform. For Python and JavaScript this data is already available. Here is an example:
https://github.com/pvlib/pvlib-python
The data mining script extracts this data from the website, because GitHub has not integrated it into the API. If you are interested in the data, we could still plot it. We also have it for the dependencies. We could create a plot like "The most used Python dependencies among the listed GitHub projects".
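A plot like that boils down to a frequency count over the mined dependency lists. Here is a minimal sketch, assuming the mined data is available as a mapping from project name to its list of dependencies (the structure and the toy projects below are made up for illustration):

```python
from collections import Counter

def top_dependencies(project_deps, n=5):
    """Count how often each dependency appears across the mined projects
    and return the n most common ones."""
    counts = Counter()
    for deps in project_deps.values():
        counts.update(set(deps))  # count each dependency at most once per project
    return counts.most_common(n)

# Toy data standing in for the mined dependency lists (hypothetical).
mined = {
    "pvlib-python": ["numpy", "pandas", "scipy"],
    "windpowerlib": ["numpy", "pandas"],
    "oemof": ["pandas", "networkx"],
}

print(top_dependencies(mined, n=2))  # → [('pandas', 3), ('numpy', 2)]
```

The resulting ranking could be fed straight into a horizontal bar chart for the plot.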
Download Statistics
Since most projects are in R and Python, we could use the package index platforms to get this data:
This package could get us the data for R:
https://github.com/GuangchuangYu/dlstats
These packages could help us get the download numbers for Python:
https://github.com/hugovk/pypistats
https://github.com/asadmoosvi/pypi-search
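For PyPI, pypistats also exposes a small JSON API that we could query directly instead of going through the CLI. A minimal sketch, assuming the `/api/packages/{package}/recent` endpoint documented on pypistats.org (the package name is just an example):

```python
import json
import urllib.request

API = "https://pypistats.org/api/packages/{package}/recent"

def recent_downloads_url(package):
    """Build the pypistats 'recent downloads' endpoint URL for a package."""
    return API.format(package=package.lower())

def fetch_recent(package):
    """Fetch recent download counts as a dict; requires network access."""
    with urllib.request.urlopen(recent_downloads_url(package)) as resp:
        return json.load(resp)["data"]

print(recent_downloads_url("pvlib"))  # → https://pypistats.org/api/packages/pvlib/recent
```

Calling `fetch_recent("pvlib")` would then return the recent download counts for that package, which we could collect for every listed project.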
The problem: how do we find out the package name when only the repo URL is given as input?
One project exists that gives you that:
https://github.com/librariesio/bibliothecary
Maybe we could also regex for `pip install` in the README. That could be a simple workaround that works for most projects.
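The regex workaround could look like this minimal sketch (the README snippet is a made-up example; the pattern is a first guess and would need tuning against real READMEs):

```python
import re

# Matches "pip install <name>" or "pip3 install <name>",
# optionally with -U/--upgrade in between.
PIP_INSTALL = re.compile(r"pip3?\s+install\s+(?:-U\s+|--upgrade\s+)?([A-Za-z0-9._-]+)")

def package_from_readme(readme_text):
    """Return the first package name mentioned after 'pip install', or None."""
    match = PIP_INSTALL.search(readme_text)
    return match.group(1) if match else None

readme = """
## Installation
Install the latest release with:
    pip install pvlib
"""
print(package_from_readme(readme))  # → pvlib
```

It would miss projects that only document `conda install` or editable installs, but as a fallback after bibliothecary it could cover most cases.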
What is your opinion?