OSS Data Analytics

Overview

The China Open Source Annual Report draws on in-depth, comprehensive data insights and is divided into eight parts. Part 1, Overall Macro Insights, gives an overview of the global and Chinese open-source ecosystem through an analysis of basic events, active repositories, active users, open-source licenses, and programming languages. Part 2, OpenRank Rankings, lists open-source projects, enterprises, foundations, developers, and collaboration bots worldwide and in China, providing comprehensive and systematic OpenRank indicator information for the industry. Parts 3 and 4, Enterprise Insights and Foundation Insights, illustrate the evolution of global and Chinese enterprises and foundations in open source through evolution charts and trend analyses. Part 5, Technology Insights, studies the evolution of the Top 10 projects in each technology area, showing the direction and trends of frontier technology. Part 6, Open Source Project Insights, looks at the diversity and innovation directions of different project types, areas, and topics. Part 7, Open Source Developer Insights, analyzes developer types, working hours, geographical distribution, and bot usage to show the diversity and characteristics of the developer community. Part 8, Case Studies, offers a series of interesting case analyses that give readers a glimpse of China's booming open-source ecosystem. Overall, this data chapter offers a panorama of China's open-source ecosystem in 2023 through rich data insights and analyses.

Introduction to indicators

OpenRank

The OpenRank indicator is a collaboration-network metric developed by the X-lab Open Laboratory. It is built on the collaboration network between open-source developers and projects, so it reflects both the level of community participation in a project and the project's place in the wider open-source ecosystem, and it can be used to identify and rank entities such as projects, people, and organizations. OpenRank is now widely accepted by industry and academia, and has been adopted in the China Institute for Standardization (ISI) series of Open Source Governance Standards, the ICT White Paper on Open Source Governance, the OpenAtom Open Source Foundation Global Open Source Screen, and the Business Open Source Office Governance Toolkit.

For a definition of this indicator, refer to:

[1] [Shengyu Zhao et al.: OpenRank Leaderboard: Motivating Open Source Collaborations Through Social Network Evaluation in Alibaba. ICSE, 2024](https://www.researchgate.net/publication/3766686121_OpenRank_Leaderboard_Motivating_Open_Source_Collections_Through_Social_Network_Evaluation_in_Alibaba)

[2] [Frank Zhao: How to evaluate an open source project (iii) - value stream, 2021](https://blog.frankzhao.cn/how_to_measure_open_source_3)

[3] Institute for Standardization of the Ministry of Industry and Information: "Information Technology Open Source Governance Part 3: Community Governance and Operationalisation" [T/CESA 1270.3-2023]; "Information Technology Open Source Governance Part 5: Evaluation Model for Open Source Contributors" [T/CESA 1270.5-2023], 2023
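
To make the idea of a collaboration-network indicator more concrete, the sketch below runs a plain PageRank-style power iteration over a tiny developer-project graph. It is not the OpenRank algorithm itself (the references above give the actual definition); the node names, edge weights, damping factor, and iteration count are all illustrative assumptions.

```python
# Simplified PageRank-style scoring on a toy developer-project network.
# NOT the actual OpenRank algorithm: names, weights, and parameters are hypothetical.

def network_scores(edges, damping=0.85, iterations=50):
    """edges: {(node_a, node_b): weight} for an undirected collaboration graph."""
    # Build a symmetric weighted adjacency map.
    neighbors = {}
    for (a, b), w in edges.items():
        neighbors.setdefault(a, {})[b] = w
        neighbors.setdefault(b, {})[a] = w

    nodes = sorted(neighbors)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            # Each neighbor passes on its score in proportion to edge weight.
            incoming = sum(
                score[m] * w / sum(neighbors[m].values())
                for m, w in neighbors[n].items()
            )
            new_score[n] = (1 - damping) / len(nodes) + damping * incoming
        score = new_score
    return score


edges = {
    ("dev:alice", "repo:core"): 5.0,
    ("dev:bob", "repo:core"): 3.0,
    ("dev:bob", "repo:docs"): 1.0,
}
scores = network_scores(edges)
for node, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {value:.3f}")
```

In this toy graph the heavily connected repository ends up with a higher score than the lightly connected one, which is the kind of network effect a collaboration-based indicator is meant to capture.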

Activity

Activity is a statistical indicator, proposed by X-lab, of how active a project or developer is. A developer's activity is a weighted sum of the developer's behaviors, such as opening issues, submitting PRs, and reviewing code. A project's activity is obtained by aggregating the activity of all developers in the project.

For a definition of this indicator, refer to:

[1] Xiaoya Xia et al.: Exploring Activity and Contributors on GitHub: Who, What, When, and Where. APSEC, 2023

[2] Frank Zhao: How to evaluate an open source project (i) - activity, 2021
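
As a rough illustration of the description above, the sketch below computes a developer's activity as a weighted sum of behavior counts and a project's activity as the sum over its developers. The behavior names and weights are hypothetical placeholders, not the published values; the references above give the actual definition.

```python
# Hypothetical weights per behavior type (illustrative only, not the official values).
WEIGHTS = {
    "issue_comment": 1.0,
    "open_issue": 2.0,
    "open_pr": 3.0,
    "review_comment": 4.0,
}


def developer_activity(counts):
    """Weighted sum of one developer's behavior counts."""
    return sum(WEIGHTS[kind] * n for kind, n in counts.items())


def project_activity(per_developer_counts):
    """Sum of the activity of all developers in the project."""
    return sum(developer_activity(c) for c in per_developer_counts.values())


# Hypothetical developers and counts.
project = {
    "alice": {"open_pr": 2, "review_comment": 3},
    "bob": {"open_issue": 1, "issue_comment": 4},
}
print(developer_activity(project["alice"]))  # 2*3.0 + 3*4.0 = 18.0
print(project_activity(project))             # 18.0 + (2.0 + 4.0) = 24.0
```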

1. Overall Macro Insight

1.1 Basic Events

Basic events are the data foundation for the analysis in this chapter. They are the event logs generated by developer activity on global open-source collaboration platforms such as GitHub and Gitee. A statistical analysis of these basic events gives a macro view of how the global open-source ecosystem is developing. This year's report covers the collaboration platforms GitHub, Gitee, and GitLink.

1.1.1 Trends in events across GitHub

First, the total number of event logs across GitHub is shown in the graph below.

1-1

Figure 1.1 Trends in GitHub annual events

The overall activity of global open source and the number of active repositories have increased significantly in recent years, reflecting the pace of global open-source development. In 2023, GitHub event logs reached 1.4 billion, an increase of about 10.32% over 2022. After the high growth of 2018-2020, the annual growth in events on the GitHub platform has gradually slowed, to about 10% in 2023. Given the platform's overall volume, however, even a 10% growth rate underlines the dynamic and critical role of open-source technology in the global digital transformation.
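
The year-over-year arithmetic behind these figures can be sketched as follows. Only the 1.4 billion total and the 10.32% growth rate come from the text; the 2022 value is backed out from them and is therefore an approximation, not a reported number.

```python
def implied_previous(current, pct):
    """Back out last year's value from this year's value and the growth rate."""
    return current / (1 + pct / 100)


def growth_pct(previous, current):
    """Year-over-year growth as a percentage."""
    return (current - previous) / previous * 100


events_2023 = 1.4e9  # figure quoted in the text
events_2022 = implied_previous(events_2023, 10.32)  # derived, not reported
print(f"implied 2022 events: {events_2022 / 1e9:.2f} billion")
print(f"growth check: {growth_pct(events_2022, events_2023):.2f}%")
```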

1.1.2 Comparison of overall events trends in GitHub and Gitee

Because of the sheer volume of events on the GitHub platform, the subsequent analysis uses the top 30,000 active repositories on each platform as its benchmark. For ease of comparison, we selected for statistical analysis the eight GitHub event types most relevant to open-source participation that also exist on Gitee: CommitCommentEvent, ForkEvent, IssueCommentEvent, IssuesEvent, PullRequestEvent, PullRequestReviewCommentEvent, PushEvent, and WatchEvent.
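
A minimal sketch of this kind of filtering is shown below, assuming event records shaped like GitHub event logs (a `type` field and a `repo.name` field). The sample records are hypothetical, and ranking repositories by raw event count is an assumption made here for illustration; the report does not spell out how its top 30,000 repositories were chosen.

```python
from collections import Counter

# The eight event types compared across platforms (GitHub's event type names).
SELECTED_TYPES = {
    "CommitCommentEvent", "ForkEvent", "IssueCommentEvent", "IssuesEvent",
    "PullRequestEvent", "PullRequestReviewCommentEvent", "PushEvent", "WatchEvent",
}


def count_selected_events(events):
    """Count events per type, ignoring types outside the selected set."""
    return Counter(e["type"] for e in events if e["type"] in SELECTED_TYPES)


def top_repositories(events, n):
    """Rank repositories by their number of selected events (an assumed criterion)."""
    per_repo = Counter(
        e["repo"]["name"] for e in events if e["type"] in SELECTED_TYPES
    )
    return [name for name, _ in per_repo.most_common(n)]


# Hypothetical sample records.
sample = [
    {"type": "PushEvent", "repo": {"name": "org/a"}},
    {"type": "PullRequestEvent", "repo": {"name": "org/a"}},
    {"type": "WatchEvent", "repo": {"name": "org/b"}},
    {"type": "GollumEvent", "repo": {"name": "org/b"}},  # not in the selected set
]
print(count_selected_events(sample))
print(top_repositories(sample, 1))
```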

1-2

Figure 1.2 GitHub and Gitee Active Repository Events

The Gitee platform shows a more pronounced growth trend. Since 2021, the number of events in its top 30,000 active repositories has even surpassed GitHub's, highlighting the surge of active open-source projects in China. Domestic developers' active participation in and contribution to open-source communities have injected new dynamism into technological innovation and knowledge sharing.

However, it must be emphasized that data on the top 30,000 active projects alone does not fully reveal the reality of the global GitHub platform, as the long-tail effect is still evident globally. Subsequent analyses will reflect this more clearly, especially the broad and diverse nature of the GitHub platform as the world's leading open-source community. In the future, with the evolution of technology and the spread of open-source culture, the Chinese open-source community can be expected to continue to flourish globally.

A further breakdown of the basic events by type is shown in the figure below.

1-3

Figure 1.3 GitHub vs. Gitee Active Repository Event Types

The analysis results show the following:

  • On the GitHub platform, the most frequent event type is the Push event, with Pull Request events and Issue Comment events ranking second and third, respectively. The share of each event type has remained relatively stable, reflecting a stabilizing ecosystem in GitHub's open-source community.
  • On the Gitee platform, event data grew significantly in 2020, initially dominated by Watch events. After 2020, Pull Request and review events grew rapidly, becoming the largest event type in 2022 and continuing to grow steadily in 2023. These structural changes reflect a significant shift in the role of domestic developers from observers to contributors, which is consistent with observations worldwide.

1.1.3 GitLink Events Analysis

For the GitLink platform, we have also selected the top 30,000 active repositories as the benchmark. Given the limitations of the data, only six event types were selected for analysis: CommitCommentEvent, ForkEvent, IssueCommentEvent, IssuesEvent, PullRequestEvent, and WatchEvent.

1-17

Figure 1.4 Data analysis of events on the GitLink platform

While the number of active repository events on GitLink still lags behind platforms like GitHub and Gitee, it exhibits a notable upward trend. On the GitLink platform, Issues events and CommitComment events constitute the vast majority of active repository events.

1.2 Active Repository

1.2.1 Trends in the total number of active repositories on GitHub

The following figure shows the trend in the total number of active repositories on GitHub.

1-4

Figure 1.5 Trends in the number of GitHub annual active repositories

According to overall data for 2023, the total number of active repositories worldwide reached 87.92 million, a 4.06% increase over the previous year. This aligns with the overall trend in events, whose growth rate has declined year by year since the high growth of 2018 to 2020. The slowdown could stem from the COVID-19 pandemic and global economic developments.

Because of the gap between the number of GitHub and Gitee repositories, the following analysis is also based on the top 30,000 active repositories on each platform.

1.2.2 Comparison of the overall activity of GitHub and Gitee

The graph below shows the statistical analysis of GitHub and Gitee's overall activity in the repositories.

1-5

Figure 1.6 GitHub vs. Gitee active repository activity

Looking at the activity data of the top 30,000 active repositories from each platform, the overall activity on the Gitee platform grew rapidly from 2019 onwards. By 2022, it surpassed GitHub and maintained this high-growth trend, revealing the enormous vitality of open-source development in China during this period.

1-6

Figure 1.7 GitHub compared to Gitee active repository activity

Furthermore, the detailed analysis of the composition of the activity reveals the following:

On the GitHub platform, the activity stemming from "Create PR" events makes up nearly half of the total activity, while "Merge PR" events contribute approximately one-fourth. Reviewing PRs contributes around 10% of the activity, while issue creation and issue comments contribute nearly equal shares of about 7% each.

On the Gitee platform, the highest activity contribution comes from reviewing PRs, constituting two-thirds of the total activity. Similarly to GitHub, "Merge PR" events follow closely behind in activity contribution, with a proportion comparable to that on the GitHub platform. A surprising finding is that while "Create PR" events contribute the highest proportion of activity on GitHub, they contribute the least on the Gitee platform, accounting for only 2% of the total activity events.

1.2.3 Comparison of overall active repository OpenRank trends on GitHub and Gitee

The graph below shows the statistical analysis of the OpenRank trends of active repositories on GitHub and Gitee.

1-7

Figure 1.8 GitHub vs. Gitee Active Repository OpenRank

Although the activity of the top 30,000 repositories on Gitee briefly surpassed that of GitHub in 2022, the influence gap measured by OpenRank remains significant (approximately 5:2). Not only is the gap considerable but there also seems to be no indication of it narrowing in terms of trends. This is particularly noteworthy and underscores a key area of focus for future open-source development in China.

1.3 Active users

1.3.1 Trends in the total number of active users on GitHub

The following figure presents a statistical analysis of the overall active user count on GitHub.

1-8

Figure 1.9 Trends in GitHub annual active users

In 2023, the total number of active developers on GitHub reached 21.93 million, an increase of 8.88% over the previous year. As with active repositories, after nearly five years of high growth, the growth rate began to decline in 2020. Although GitHub officially announced at the beginning of 2023 that its platform had surpassed 100 million users, growth in active users has slowed, which correlates to some extent with changes in the global situation and the rise of platforms like Gitee.

1.3.2 Active user geographical distribution and ranking

This annual report is able to include a detailed geo-location analysis of GitHub developers thanks to a contribution from an award-winning entry in the OpenSODA competition (the OpenDigger Open Source Software Ecosystem Data Analysis and Mining Platform).

The following analysis is based on approximately 2 million developers who have correctly filled in their geographical location information out of the 10 million active developers on GitHub in 2023. Considering the total registered users on GitHub to be 100 million, the sampling ratio is approximately 2%.

1. Geographical distribution of global developers

First, the geographical distribution of developers worldwide is shown in the following chart.

1-9

Figure 1.10 Global geographical distribution of developers

Table 1.1 Global Developer Distribution by Country/Region (Top 15)

| Ranking | Country/Region | Total Number | Percentage | Annually Active | Active Rate |
|---|---|---|---|---|---|
| 1 | United States | 408983 | 21.09% | 236899 | 57.92% |
| 2 | India | 177669 | 9.16% | 107066 | 60.26% |
| 3 | China | 171039 | 8.82% | 126238 | 73.81% |
| 4 | Brazil | 114855 | 5.92% | 83932 | 73.08% |
| 5 | Germany | 88767 | 4.58% | 64836 | 73.04% |
| 6 | United Kingdom | 83245 | 4.29% | 55175 | 66.28% |
| 7 | Canada | 65241 | 3.36% | 42238 | 64.74% |
| 8 | France | 57480 | 2.96% | 40341 | 70.18% |
| 9 | Russia | 47213 | 2.43% | 31534 | 66.79% |
| 10 | Australia | 31638 | 1.63% | 20512 | 64.83% |
| 11 | Poland | 31469 | 1.62% | 21792 | 69.25% |
| 12 | Japan | 30873 | 1.59% | 21942 | 71.07% |
| 13 | Netherlands | 30617 | 1.58% | 21685 | 70.83% |
| 14 | Spain | 28928 | 1.49% | 19509 | 67.44% |
| 15 | South Korea | 28325 | 1.46% | 21811 | 77.00% |

Overall, developers from various countries are continuously increasing:

  • The United States ranks first due to its early involvement in the open-source domain and its advantage in technology talent.
  • Based on the calculated total number of developers from the United States in the table (409,000), the actual number of developers from the United States on GitHub is estimated to be around 21.01 million, with a deviation of approximately 4% from the official data released by GitHub (22 million).
  • India, China, and Brazil, with their large population bases, rank second, third, and fourth in terms of the number of developers. However, based on the activity rate (annual active users/total users), China has the highest rate among the top four.
  • Developers from European countries also constitute a significant force in the open-source community, collectively ranking second in volume.
  • According to the official data released by GitHub and Gitee (both around 12 million), the total number of global open-source developers from China is likely to exceed 20 million, roughly equivalent to the number from the United States in quantity alone.
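
The activity rate used in the comparison above (annual active users / total users) can be recomputed directly from Table 1.1. The sketch below does so for the top four countries; the figures are copied from the table, and the function and variable names are illustrative.

```python
# (total developers in sample, annually active developers), from Table 1.1.
TABLE_1_1 = {
    "United States": (408983, 236899),
    "India": (177669, 107066),
    "China": (171039, 126238),
    "Brazil": (114855, 83932),
}


def active_rate(total, active):
    """Active rate = annually active users / total users, as a percentage."""
    return active / total * 100


rates = {c: round(active_rate(t, a), 2) for c, (t, a) in TABLE_1_1.items()}
for country, rate in rates.items():
    print(f"{country}: {rate:.2f}%")
print("highest among the top four:", max(rates, key=rates.get))
```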

2. Geographical distribution of Chinese developers

Further analysis shows the geographical distribution of Chinese developers, as shown in the graph below. The data source is the almost 150,000 developers located in China who correctly filled in their provincial information.

1-10

Figure 1.11 Geographical distribution of Chinese developers

According to GitHub data for Q3 2023, the total number of Chinese developers is approximately 11.9 million; the actual number of developers in each province can be estimated in proportion to this total.

Table 1.2 Distribution of Chinese Developers (Top 15)

| Ranking | Province/Region | Total Number | National Percentage | Estimated Actual Total (×10,000) |
|---|---|---|---|---|
| 1 | Beijing | 32982 | 22.04% | 262.25 |
| 2 | Shanghai | 24581 | 16.43% | 195.45 |
| 3 | Guangdong | 21684 | 14.49% | 172.41 |
| 4 | Zhejiang | 14256 | 9.53% | 113.35 |
| 5 | Taiwan | 12173 | 8.13% | 96.79 |
| 6 | Jiangsu | 7335 | 4.90% | 58.32 |
| 7 | Sichuan | 7012 | 4.69% | 55.75 |
| 8 | Hong Kong | 4678 | 3.13% | 37.19 |
| 9 | Hubei | 4415 | 2.95% | 35.10 |
| 10 | Shaanxi | 2815 | 1.88% | 22.38 |
| 11 | Fujian | 2405 | 1.61% | 19.12 |
| 12 | Shandong | 2035 | 1.36% | 16.18 |
| 13 | Hunan | 1858 | 1.24% | 14.77 |
| 14 | Chongqing | 1833 | 1.22% | 14.57 |
| 15 | Anhui | 1487 | 0.99% | 11.82 |

The rankings and data in the table above reveal the correlation between the number of Chinese open-source developers and the level of regional economic development:

  • The number of open-source developers in each of Beijing, Shanghai, Guangdong, and Zhejiang has passed the one-million mark, with Beijing leading;
  • Taiwan and Hong Kong rank fifth and eighth respectively, highlighting the importance of the Hong Kong and Taiwan regions;
  • The number of open-source developers in the Yangtze River Delta region (Jiangsu, Zhejiang, Shanghai, and Anhui) has reached almost 3.8 million;
  • The central and western regions, such as Sichuan, Hubei, and Shaanxi, have also performed well, particularly Sichuan, which has attracted a large number of developers with its livable environment and fast-growing software industry.

1.4 Open source licenses

1.4.1 Number of repositories using open-source licenses

The graph below shows the number of active GitHub repositories using each open-source license.

1-11

Figure 1.12 Number of repositories using open source licenses

The analysis shows that the most widely used open-source licenses currently include the MIT License, the Apache License 2.0, the GNU General Public License v3.0, and the BSD 3-Clause License. Of these, the MIT License ranks first, at 60%. The MIT License is named after the Massachusetts Institute of Technology. Its simplicity and flexibility have made it the choice of many developers, and it imposes very few legal restrictions, which encourages developers to use and distribute software freely.

1.4.2 Trends in Open-Source Licensing Types

Statistical analysis has been conducted on the trends of open-source license types, as shown in the following figures.

1-12

Figure 1.13 Trends in the Number of Open Source License Types

Overall, the number of open-source license types has continuously increased since 2017. Introducing licenses such as the Eclipse Public License 2.0, the European Union Public License 1.2, and others contributed to the growth observed between 2017 and 2018. Subsequently, the growth rate of open-source license types slowed down. Between 2021 and 2022, a new batch of open-source licenses, such as the Mulan Series Licenses and the CERN License v2, began to emerge. Following this, the development trend stabilized, and currently, the mainstream license types on GitHub have remained steady at 46 types for two years.

1.4.3 Trends in the Number of Repositories Using Open Source Licenses

According to GitHub's log data, in 2023, nearly 7.7 million active repositories used various open-source licenses, accounting for 8.76% of all active repositories. We present the MIT License's data separately due to its significant influence.
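
As a quick consistency check, the 8.76% share follows from the two totals quoted in this report: about 7.7 million licensed active repositories against the 87.92 million active repositories given in Section 1.2.1. The sketch below only repeats that arithmetic; the function name is illustrative.

```python
def share_pct(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return part / whole * 100


licensed_repos = 7.7e6   # active repositories with a license (from the text)
active_repos = 87.92e6   # all active repositories in 2023 (Section 1.2.1)
license_share = share_pct(licensed_repos, active_repos)
print(f"{license_share:.2f}%")
```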

1. Trends in the Number of Repositories Using the MIT License

Statistical analysis of the trends in the number of repositories using the MIT License is shown in the following figure.

1-13

Figure 1.14: Trends in the Number of Repositories Using the MIT License

Observations:

  • The MIT License is currently the most popular open-source license, with 1.58 million active repositories in 2023.
  • The trends in the number of repositories using the MIT License are similar to those of the total repository count, with significant growth observed. However, the growth rate slowed down in 2022 and 2023, which correlates with the overall slowdown in project growth.

2. Trends in the Number of Repositories Using Other Top Five Open Source Licenses

The following figure shows a statistical analysis of the trends in the number of repositories using other top-five open-source licenses.

1-14

Figure 1.15: Trends in the Number of Repositories Using Other Licenses

Observations:

  • The number of open-source licenses is growing, with MIT, Apache, and GNU licenses remaining the top choices.
  • Differences between niche and popular open-source licenses still exist.
  • Since 2022, the usage of GNU General Public License (GPL) versions 2 and 3 has been declining overall, while GNU Affero General Public License version 3 has been increasing yearly.

1.4.4 Trends in the Number of Repositories Using the Mulan Series Licenses

The following figure shows a statistical analysis of the trends in the number of repositories using the Mulan Series Licenses.

1-15

Figure 1.16 Accumulative Trends in the Number of Repositories Using the Mulan Series Licenses

The Mulan Series Licenses (including the Mulan Permissive Software License and the Mulan Public License, among others) are drafted, revised, and released by Peking University, with the support of the National Standardization Technical Committee on Cloud Computing and the China Open Source Cloud Alliance. As the first open-source license from China to be approved by the Open Source Initiative (OSI), the Mulan Permissive Software License (Mulan PSL) holds significant influence.

Observations indicate a growth in repositories utilizing the Mulan licenses starting September 2022. By December 2023, there were 220 such active repositories, showcasing the increasing influence of Mulan open-source licenses.

1.5 Programming Languages

1.5.1 Top Programming Languages Used by Developers in 2023

The popularity of programming languages is of great interest to developers. The analysis below presents the most popular programming languages among developers in 2023, as shown in the following table.

Table 1.3: Top 20 Programming Languages Used by Developers

| Rank | Programming Language | Number of Developers Using | Number of Repositories Using |
|---|---|---|---|
| 1 | JavaScript | 765,589 | 1,806,477 |
| 2 | Python | 629,423 | 653,025 |
| 3 | HTML | 564,121 | 676,364 |
| 4 | TypeScript | 462,729 | 886,453 |
| 5 | Java | 368,795 | 463,660 |
| 6 | CSS | 190,480 | 239,187 |
| 7 | C++ | 177,905 | 135,330 |
| 8 | C# | 158,159 | 180,537 |
| 9 | Go | 143,433 | 165,367 |
| 10 | PHP | 128,186 | 272,980 |
| 11 | Jupyter Notebook | 122,475 | 102,708 |
| 12 | Shell | 122,456 | 108,209 |
| 13 | C | 107,918 | 80,159 |
| 14 | Rust | 69,370 | 72,778 |
| 15 | Ruby | 66,857 | 374,835 |
| 16 | Kotlin | 64,307 | 62,709 |
| 17 | Vue | 56,099 | 170,639 |
| 18 | SCSS | 50,526 | 44,672 |
| 19 | Dart | 46,143 | 43,006 |
| 20 | Swift | 33,839 | 35,978 |

From the table above:

  • The top five programming languages most used by developers are JavaScript, Python, HTML, TypeScript, and Java, which represent the leading programming languages developers use. Starting from the sixth-ranked CSS, the number of users decreased by nearly half compared to Java, the fifth-ranked language.
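
The ranking and the drop between fifth and sixth place can be checked against Table 1.3. The sketch below uses the first six rows of the table; the variable names are illustrative.

```python
# (developers, repositories) for the first six rows of Table 1.3.
LANGUAGES = {
    "JavaScript": (765589, 1806477),
    "Python": (629423, 653025),
    "HTML": (564121, 676364),
    "TypeScript": (462729, 886453),
    "Java": (368795, 463660),
    "CSS": (190480, 239187),
}

# Rank languages by number of developers, highest first.
ranked = sorted(LANGUAGES, key=lambda lang: LANGUAGES[lang][0], reverse=True)

# Relative drop in developers from Java (5th) to CSS (6th).
java_devs = LANGUAGES["Java"][0]
css_devs = LANGUAGES["CSS"][0]
drop_pct = (java_devs - css_devs) / java_devs * 100

print(ranked[:5])
print(f"drop from Java to CSS: {drop_pct:.1f}%")
```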

1.5.2 Trends in Programming Language Usage from 2019 to 2023

Statistical analysis of developers' programming language usage trends from 2019 to 2023 is depicted in the following figure.

1-16

Figure 1.17: Trends in Programming Language Usage from 2019 to 2023

Observations from the figure:

  • JavaScript, Python, HTML, TypeScript, and Java are the leading programming languages developers use.
  • Python and TypeScript have shown rapid growth compared to the other three primary languages and have maintained a consistently rapid growth trend over the past five years.
  • TypeScript, in particular, has experienced rapid growth in the number of users over the past five years. In 2021, it significantly surpassed other programming languages, becoming one of the main programming languages developers use. Perhaps by 2024, the number of developers using it will be comparable to the number of developers using HTML, which is ranked third.

2. OpenRank Rankings

Rankings are a popular form of presenting analysis results.

The 2023 China Open Source Annual Report separates the rankings into a dedicated section for centralized display. This is partly to better showcase the development trends of various entities (repositories/projects, countries/regions, enterprises, foundations, developers, etc.) in the open-source ecosystem; another important reason is the maturation of the OpenRank indicator and the completeness of the global data.

With the addition of global data from both GitHub and Gitee this year, we are able to take a global perspective with China's open source as the starting point, allowing the world to see the joint efforts and contributions of Chinese enterprises, foundations, developers, and other entities in developing the global open-source ecosystem, which is not available in other reports on the market.

2.1 Global Open Source Repository OpenRank Rankings

2-1

Figure 2.1 Global Open Source Project OpenRank Rankings (Top 20)

2.2 China Open Source Project OpenRank Rankings

2-2

Figure 2.2 China Open Source Project OpenRank Rankings (Top 20)

Chinese open-source projects are based on data from the OpenDigger project tags, and a single project may include multiple organizations or repositories on GitHub or Gitee platforms.

2.3 Global Enterprise OpenRank Rankings

2-3

Figure 2.3 Global Enterprise OpenRank Rankings (Top 20)

Enterprise rankings are based on data from OpenDigger project tags: an enterprise's OpenRank is the sum of the OpenRank of all open-source projects it initiated, including projects donated to foundations.
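
The aggregation described here can be sketched as a simple group-and-sum. The project records, field names, and OpenRank values below are hypothetical; only the rule (sum over all projects an enterprise initiated, donated ones included) comes from the text.

```python
from collections import defaultdict


def enterprise_openrank(projects):
    """Sum project OpenRank per initiating enterprise, donated projects included."""
    totals = defaultdict(float)
    for p in projects:
        # A project counts toward its initiator even if it now sits in a foundation.
        totals[p["initiator"]] += p["openrank"]
    return dict(totals)


# Hypothetical projects and OpenRank values, for illustration only.
projects = [
    {"name": "proj-a", "initiator": "ExampleCorp", "openrank": 120.0, "foundation": None},
    {"name": "proj-b", "initiator": "ExampleCorp", "openrank": 80.0, "foundation": "Some Foundation"},
    {"name": "proj-c", "initiator": "OtherCorp", "openrank": 50.0, "foundation": None},
]
totals = enterprise_openrank(projects)
ranking = sorted(totals, key=totals.get, reverse=True)
print(totals)
print(ranking)
```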

2.4 China Enterprise OpenRank Rankings

2-4

Figure 2.4 China Enterprise OpenRank Rankings (Top 20)

2.5 Global Foundation OpenRank Rankings

2-5

Figure 2.5 Global Foundation OpenRank Rankings (Top 10)

2.6 Country and Region OpenRank Rankings

2-6

Figure 2.6 Country and Region OpenRank Rankings (Top 20)

Country and region data is based on location information filled in by GitHub developers, with a sample size of the top 10 million OpenRank users globally.

2.7 Global Developer OpenRank Rankings

2-7

Figure 2.7 Global Developer OpenRank Rankings (Top 30)

2.8 China Developer OpenRank Rankings

2-8

Figure 2.8 China Developer OpenRank Rankings (Top 30)

Chinese developer accounts are based on OpenDigger tag data.

3. Enterprise Insights

Enterprises are the core force driving the development of the global open-source ecosystem. They are initiators, as well as developers and maintainers, at the forefront of the development and commercial exploration of open-source projects.

3.1 Evolution of Global Enterprise OpenRank Over the Past 10 Years

3-1

3-1

Figure 3.1 Changes in Global Enterprise OpenRank Rankings

Observations on the global impact of enterprise open source are as follows:

  • Microsoft began investing in open source over a decade ago (in 2008) and reached the pinnacle of global open-source influence in 2016, a position it has held unchallenged to this day.
  • Since being officially sanctioned by the United States in 2019, Huawei has made open source a strategic priority. It has been rising rapidly ever since and surpassed Google and Amazon this year.
  • Alibaba was the leader in domestic open source until 2021 and has maintained its sixth position globally.
  • Ant Group's performance in the past three years has been remarkable, and it officially entered the global top ten in 2023.
  • Baidu, the fourth largest player in domestic open source, has fallen to 12th globally due to rapid changes in the domestic open-source landscape.
  • According to the OpenLeaderboard, Chinese enterprises in the global top 30 also include ByteDance (18), PingCAP (19), Fit2Cloud (24), Deepin (25), Tencent (26), and Espressif (27).

3.2 Evolution of China Enterprise OpenRank Over the Past 10 Years

3-2

Figure 3.2 Changes in China Enterprise OpenRank Rankings

This chart effectively demonstrates the open-source strategies of domestic companies and their changing trends:

  • Huawei began to make efforts in 2019 and, in just two years, achieved first place in China and second place globally.
  • As traditional domestic leaders in open source, Alibaba and Ant have shown stable performance.
  • Baidu has slipped to fourth place due to competition from the first three.
  • ByteDance has made visible and rapid progress in recent years.
  • Espressif (Espressif Systems) is a relatively low-profile leader in semiconductor open source in China.
  • Fit2Cloud is another low-key but pragmatic open-source enterprise, with several of its open-source products highly favored by developers.
  • Tencent, PingCAP, JD, and TAOS have shown a slight downward trend in the past two years, indicating that competition in the post-pandemic era will intensify.

3.3 Proportion of China Enterprises' OpenRank on GitHub/Gitee Platforms

3-3 3-4
Figure 3.3 Proportion of China Enterprises' OpenRank among Global Enterprises (Left) and Comparison of OpenRank between Chinese and American Enterprises at the Project Level (Right)

The left chart shows the increasing influence of Chinese enterprises in the global open-source ecosystem, while the right chart reflects the ebb and flow between China and the United States in the post-trade-war era, especially after the pandemic. The influence of Chinese open source has risen significantly, as has the influence of companies like Huawei. However, the gap between Chinese and American enterprises in overall open-source influence is still significant (about a threefold difference). Still, this momentum is very promising for the future.

4. Foundations Insights

This section examines the development of the open-source ecosystem from the perspective of foundations. Foundations are non-profit organizations that play a crucial role in organizing, developing, and innovating open-source projects and communities. They provide comprehensive support in technology, operations, and law to incubate open-source software and guide the building and operation of open-source communities. Foundations act as incubators and accelerators and are essential organizers of the open-source ecosystem. This year, we have included a separate section on insights from open-source foundations, where we can see the global impact of China's open-source foundations.

4.1 Global Foundation OpenRank trend analysis

Figure 4.1 Global Foundation OpenRank Overall Trend

The following trends can be seen:

  • The Apache Foundation, ranked #1, has evolved at a mature and steady pace, and today it remains the first choice for many companies developing global projects;
  • The OpenAtom Open Source Foundation was founded just over three years ago; its projects have developed rapidly, and their total influence has surpassed that of the Linux Foundation's sub-foundations, ranking second only to the Apache Foundation;
  • LF AI & Data ranks third, outpacing the cloud-native CNCF thanks to advancements in AI;
  • The development of the other (sub)foundations has generally been relatively stable.

4.2 Global Foundation project OpenRank trend analysis

Figure 4.2 Global Foundation Project OpenRank Trends

In terms of open-source projects under global foundations:

  • Kubernetes continues to rank first, but its influence declines every year, giving way to projects in emerging areas;
  • Doris, an open-source real-time data warehouse initiated by Baidu under the Apache Foundation, has grown rapidly in recent years and ranks second;
  • OpenHarmony, a project of the OpenAtom Open Source Foundation, and its various sub-repositories follow closely behind. If combined, they would rank #1.

4.3 Analysis of OpenRank Trends of Chinese Projects under Foundations

Figure 4.3 OpenRank Trends of Chinese Projects under Foundations

Chinese projects under various foundations are examined separately:

  • Doris and OpenHarmony are developing most noticeably;
  • The Milvus Vector Database has experienced rapid growth due to demand in the AIGC domain;
  • Projects like Flink and ShardingSphere are relatively stable.

4.4 Analysis of Trends in OpenRank Projects under the Open Atom Foundation

Figure 4.4 Trends in OpenRank Projects under the Open Atom Foundation

This year marks the first time we can observe the development of projects under the OpenAtom Foundation:

  • The top three are OpenHarmony, openEuler, and Anolis, reflecting the dominant position of operating systems, especially OpenHarmony, which is developing the fastest;
  • Other listed projects are developing steadily, and we look forward to their progress in the new year.

5. Technological insights

The technology field is rapidly evolving, especially in various subfields. Operating systems are being developed for new architectures, cloud-native technologies are driving digital transformation, databases are becoming the infrastructure for data innovation, big data is facilitating intelligent decision-making, artificial intelligence is accelerating automation in various industries, and front-end technologies are focusing on interaction and aesthetics. These areas are at the forefront of technology, attracting innovators and investors and creating a booming trend. In this section, we provide insights into these six areas in terms of two metrics: influence (OpenRank) and activity.

5.1 Overall development trend of six major technology areas in the past five years

5-1

Figure 5.1 Trends in OpenRank by subfield over the last 5 years

5-2

Figure 5.2 Trends in activity by subfield over the past five years

Cloud-native computing and artificial intelligence (AI) have gained popularity in the past five years, reflected in their increased number of repositories. Databases remain critical, while the influence of front-end development is shrinking. Operating systems have a smaller number of repositories but hold great value.

5.2 5-Year Trends in OpenRank and Activity for the Top 10 Projects in Each Technology Area

5.2.1 Cloud Native

5-3

Figure 5.3 Trends in the Cloud-Native Top 10 OpenRank Projects over the Last Five Years

5-4

Figure 5.4 Cloud-Native Top 10 Active Project Trends in the Last Five Years

Both indicators of Kubernetes have decreased significantly, while Grafana has emerged as the most influential project. The llvm-project has shown remarkable growth and has been the most active project for the past three years. LLVM is a compiler framework comprising a collection of modular and reusable compiler and toolchain technologies. Its rapid growth in popularity among developers is a testament to its effectiveness.

5.2.2 Artificial intelligence

5-5

Figure 5.5 Trends in the AI Top 10 OpenRank Projects over the Last Five Years

5-6

Figure 5.6 Artificial Intelligence Top 10 Active Project Trends in the Last Five Years

TensorFlow has been declining and is now out of the top 5, while PyTorch keeps growing and widening the gap. LangChain, an open-source project started by Harrison Chase, has climbed to second place in both indicators since its launch in October 2022 and is now one of the most popular frameworks for LLM development.

5.2.3 Big Data

5-7

Figure 5.7 Trends in the Big Data Top 10 OpenRank Projects in the Last Five Years

5-8

Figure 5.8 Big Data Top 10 Active Projects Trends in the Last 5 Years

Kibana and Grafana are the top two big data solutions, with a consistent upward trend. Grafana is predicted to surpass Kibana and become the top-ranked solution in the future.

Kibana is an open-source tool for data visualization and exploration, tightly integrated with Elasticsearch.

Grafana is an open-source tool for monitoring and reporting. It can visualize data from various sources, including Prometheus, InfluxDB, and Graphite, among others. Grafana's data processing and visualization features enable the creation of different charts and dashboards.

5.2.4 Database

5-9

Figure 5.9 Trends in the Database Top 10 OpenRank Projects over the Last Five Years

5-10

Figure 5.10 Database Top 10 Active Project Trends in the Last Five Years

Doris is the fastest-growing database, with activity metrics nearing the top spot, while Elasticsearch is dropping back in popularity. It is predicted that Doris will surpass ClickHouse in the future.

ClickHouse is an open source analytical database with an MPP architecture, originally developed by Yandex. It is designed to analyze large amounts of data and is claimed to be 100-1000x faster than traditional databases. Its key feature is a high-performance vectorized execution engine, and it is also known for rich functionality and reliability.

Apache Doris is an open source MPP analytical database contributed by Baidu. Its distributed architecture is simple, and it is easy to operate and maintain.

5.2.5 Frontend

5-11

Figure 5.11 Trends in the Frontend Top 10 OpenRank Projects over the Last Five Years

5-12

Figure 5.12 Frontend Top 10 Active Project Trends in the Last Five Years

Although Flutter is declining in both indicators year over year, it still has a clear advantage over Next.js, which started to gain momentum in 2023 and is rising significantly. The projects ranked 3rd to 10th are highly competitive, with little gap between them.

Flutter is a framework developed and supported by Google. Front-end and full-stack developers use Flutter to build the user interface of applications for multiple platforms with a single code base.

Next.js is an open source framework created by Vercel. It is built on Node.js and Babel (a JavaScript transpiler) and is designed for use with React, a framework for single-page applications. In addition, Next.js provides many useful features, such as preview mode, fast compilation during development, and static export.

5.2.6 Operating system

5-13

Figure 5.13 Trends in the Operating System Top 10 OpenRank Projects over the Last Five Years

5-14

Figure 5.14 Operating System Top 10 Active Project Trends in the Last Five Years

As you can see, several repositories under the OpenHarmony project are in the top 10 list. This insight also incorporates data from the Gitee platform, so the strengths of domestic operating systems can be seen more intuitively (OpenHarmony consists of several repositories, and this insight analyzes it at the repository level). SerenityOS has fallen back a bit since 2021 and now trails OpenHarmony; openEuler also performs well.

5.3 OpenRank Top 10 list for each field in 2023

Below are the OpenRank rankings for projects in each field for 2023.

5.3.1 Cloud Native

Table 5.1 Top Projects in Cloud Native

| Rank | Project Name | OpenRank |
| --- | --- | --- |
| 1 | grafana/grafana | 7134.37 |
| 2 | llvm/llvm-project | 7049.62 |
| 3 | kubernetes/kubernetes | 5374.14 |
| 4 | ClickHouse/ClickHouse | 4941.99 |
| 5 | cilium/cilium | 3215.42 |
| 6 | ceph/ceph | 3172.49 |
| 7 | keycloak/keycloak | 3095.56 |
| 8 | gravitational/teleport | 3082.18 |
| 9 | envoyproxy/envoy | 2929.08 |
| 10 | backstage/backstage | 2903.39 |

5.3.2 Artificial Intelligence

Table 5.2 Top Projects in Artificial Intelligence

| Rank | Project Name | OpenRank |
| --- | --- | --- |
| 1 | pytorch/pytorch | 10182.45 |
| 2 | langchain-ai/langchain | 6080.25 |
| 3 | PaddlePaddle/Paddle | 5408.62 |
| 4 | huggingface/transformers | 4422.84 |
| 5 | AUTOMATIC1111/stable-diffusion-webui | 3881.6 |
| 6 | openvinotoolkit/openvino | 3857.31 |
| 7 | microsoft/onnxruntime | 3006.75 |
| 8 | tensorflow/tensorflow | 2723.26 |
| 9 | Significant-Gravitas/AutoGPT | 2664.85 |
| 10 | ggerganov/llama.cpp | 2339.8 |

5.3.3 Big Data

Table 5.3 Top Projects in Big Data

| Rank | Project Name | OpenRank |
| --- | --- | --- |
| 1 | elastic/kibana | 7601.04 |
| 2 | grafana/grafana | 7134.37 |
| 3 | ClickHouse/ClickHouse | 4941.99 |
| 4 | airbytehq/airbyte | 4658.86 |
| 5 | apache/doris | 4307.26 |
| 6 | elastic/elasticsearch | 3729.39 |
| 7 | apache/airflow | 3642.9 |
| 8 | StarRocks/starrocks | 3194.56 |
| 9 | trinodb/trino | 2703.4 |
| 10 | apache/spark | 2654.02 |

5.3.4 Database

Table 5.4 Top Projects in Database

| Rank | Project Name | OpenRank |
| --- | --- | --- |
| 1 | ClickHouse/ClickHouse | 4941.99 |
| 2 | apache/doris | 4307.26 |
| 3 | elastic/elasticsearch | 3729.39 |
| 4 | cockroachdb/cockroach | 3443.7 |
| 5 | StarRocks/starrocks | 3194.56 |
| 6 | trinodb/trino | 2703.4 |
| 7 | apache/spark | 2654.02 |
| 8 | pingcap/tidb | 2200.38 |
| 9 | milvus-io/milvus | 2001.11 |
| 10 | yugabyte/yugabyte-db | 1940.75 |

5.3.5 Frontend

Table 5.5 Top Projects in Frontend

| Rank | Project Name | OpenRank |
| --- | --- | --- |
| 1 | flutter/flutter | 9361.81 |
| 2 | vercel/next.js | 6638.65 |
| 3 | appsmithorg/appsmith | 3474.07 |
| 4 | nuxt/nuxt | 3387.23 |
| 5 | facebook/react-native | 3260.55 |
| 6 | ant-design/ant-design | 3053.25 |
| 7 | nodejs/node | 2736.37 |
| 8 | angular/angular | 2273.82 |
| 9 | electron/electron | 1773.31 |
| 10 | denoland/deno | 1654.01 |

5.3.6 Operating system

Table 5.6 Top Projects in Operating System

| Rank | Project Name | OpenRank |
| --- | --- | --- |
| 1 | openharmony/docs | 3277.69 |
| 2 | openharmony/arkui_ace_engine | 2818.09 |
| 3 | SerenityOS/serenity | 2257.68 |
| 4 | openharmony/graphic_graphic_2d | 1239.6 |
| 5 | openeuler/docs | 1206.9 |
| 6 | openharmony/xts_acts | 1186.06 |
| 7 | openharmony/arkcompiler_ets_runtime | 961.99 |
| 8 | openharmony/interface_sdk-js | 910.91 |
| 9 | reactos/reactos | 745.23 |
| 10 | armbian/build | 679.1 |

6. Insights on open source projects

In 2023, large AI models like GPT-4 and CLIP emerged, leading to competition among global enterprises to invest in research and development for cutting-edge technologies like language understanding and image generation. The industry saw rapid evolution, marking the beginning of a new era in the broad application of AI. The database field experienced a trend of innovation with various technologies like distributed databases, time-series databases, and graph databases emerging to cater to different application scenarios. Cloud-native databases became popular, offering flexible scaling and high availability. This section provides data insights on project types by statistically analyzing project topics. In-depth insights are also provided into the two core areas of database and AI.

6.1 Type of project

This subsection selects the top 10,000 active GitHub repositories for statistical analysis.

6.1.1 Ratios for different project types

6-1

Figure 6.1 Ratios for different project types
  • Libraries and Frameworks (components and frameworks) form the largest category, at 31.36%. Developers enjoy using these open-source collaborative innovations, and they are the most popular type to contribute to;
  • Application Software is second only to Libraries and Frameworks, at 24.34%, thanks to its utility: it enables all users (not just developers) to use open source software in a variety of industries and domains;
  • Non-Software content holds a significant share of 23.17%. This shows the growing trend of open source as a collaborative development model that extends to the entire content domain, including documentation, education, art, hardware, and other non-programming-related areas;
  • Software Tools make up 18.9%. Developers value this category because it lets them focus on building software applications and products;
  • System Software comprises fundamental software and accounts for only 2.3% of the total despite its immense value and complexity.

6.1.2 Percentage of OpenRank by Project Type

6-2


Figure 6.2 Percentage of OpenRank by Project Type

Let's take this a step further and look at these categories through the lens of OpenRank influence:

  • The most significant change is that content resource type (Non-Software) projects have relatively low impact, although they have high activity;
  • System Software, on the other hand, has a small percentage of activity but a relatively large percentage of influence, and a similar phenomenon can be observed with Software Tools projects;
  • The component framework type and the application software type have not changed much, and both are among the more prevalent types.

6.1.3 OpenRank Trends by Project Type in the Last 5 Years

6-3


Figure 6.3 OpenRank Trends by Project Type in the Last 5 Years

As you can see from the five-year OpenRank evolution chart above, the influence of the System Software category is increasing year by year, while the influence of the Non-Software category is decreasing.

6.2 Project Topic Analysis

This section also analyzes the top 10,000 active GitHub repositories and obtains insights from the Topic tags under the repositories.

6.2.1 Top Topic

6-4

Figure 6.4 Top 10 Topics by number of appearances

The top 10 topics cover a diverse range of areas, demonstrating the broad interest of the open-source community. JavaScript, Hacktoberfest, and Python are some of the most popular topics, representing hotspots for cutting-edge technologies, active community activities, and versatile programming languages. These topics highlight the interest in front-end development, open-source contributions, and interdisciplinary programming.

6.2.2 Overall OpenRank Trends for Repositories of Popular Topics

6-5

Figure 6.5 OpenRank trends for repositories with top 10 Topic occurrences (2019 - 2023)

  • Hacktoberfest is an annual event that takes place in October. It aims to promote the open-source community and is organized by DigitalOcean in collaboration with GitHub. The goal of the event is to encourage more people to participate in open-source projects and contribute to the community. OpenRank is used to measure people's enthusiasm for open-source projects, community involvement, and contributions. Developers play an active role in the campaign by submitting Pull Requests to open-source projects, thus helping to increase the reputation and influence of the repository.
  • JavaScript and Python: these technologies have maintained relatively stable trends over the past few years, with no significant growth or decline.

6.3 Project analysis in databases

This section uses information on open-source databases disclosed in the Database of Databases and the DB-Engines Ranking. The field is divided into 18 subcategories based on the storage structure and usage of databases: Relational, Key-value, Document, Search Engine, Wide Column, Time Series, Graph, Vector, Object Oriented, Hierarchical, RDF, Array, Event, Spatial, Native XML, Multivalue, Content, and Network. For each database, we identify the corresponding open-source project and collect and analyze its collaboration log data on GitHub, which gives detailed insights into the field.

6.3.1 2023 OpenRank and Activity Lists by Subdomain in the Database Domain

1. OpenRank Rankings for Database Subdomains

Table 6.1 OpenRank Rankings for Database Subdomains

| Ranking | Subfield Name | OpenRank |
| --- | --- | --- |
| 1 | Relational | 58092.36 |
| 2 | Key-value | 21834.08 |
| 3 | Document | 17264.93 |
| 4 | Search Engine | 8093.77 |
| 5 | Wide Column | 7896.43 |
| 6 | Time Series | 7813.54 |
| 7 | Graph | 5196.52 |
| 8 | Vector | 4965.41 |
| 9 | Object Oriented | 3104.07 |
| 10 | Hierarchical | 1355.4 |
| 11 | RDF | 592.68 |
| 12 | Array | 383.95 |
| 13 | Event | 256.59 |
| 14 | Spatial | 224.05 |
| 15 | Native XML | 209.51 |
| 16 | Multivalue | 15.89 |
| 17 | Content | 3.43 |

2. Activity Rankings for Database Subdomains

Table 6.2 Activity Rankings for Database Subdomains

| Ranking | Subfield Name | Activity |
| --- | --- | --- |
| 1 | Relational | 161025.44 |
| 2 | Key-value | 62501.64 |
| 3 | Document | 49400.11 |
| 4 | Search Engine | 23799.87 |
| 5 | Time Series | 22077.57 |
| 6 | Wide Column | 21292.17 |
| 7 | Vector | 16395.88 |
| 8 | Graph | 14947.43 |
| 9 | Object Oriented | 8418.14 |
| 10 | Hierarchical | 3406.55 |
| 11 | RDF | 1701.67 |
| 12 | Array | 1280.14 |
| 13 | Native XML | 737.94 |
| 14 | Spatial | 680.79 |
| 15 | Event | 654.42 |
| 16 | Content | 33.94 |
| 17 | Multivalue | 12.68 |

The OpenRank and activity rankings for 2023 for each sub-domain of the database domain show that:

  • Relational, key-value, and document databases are the top three subdomains, accounting for over 70% of the database domain;
  • Relational's two indicators exceeded those of the second through fifth-place finishers combined and accounted for more than 40 percent of the database field, making it a mega-subcategory.
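
These claims can be checked directly against the figures in Table 6.1. The snippet below is a minimal sketch that copies the OpenRank values from the table; the helper name `share` is introduced only for illustration.

```python
# OpenRank values copied from Table 6.1 (database subdomains, 2023).
openrank = {
    "Relational": 58092.36, "Key-value": 21834.08, "Document": 17264.93,
    "Search Engine": 8093.77, "Wide Column": 7896.43, "Time Series": 7813.54,
    "Graph": 5196.52, "Vector": 4965.41, "Object Oriented": 3104.07,
    "Hierarchical": 1355.4, "RDF": 592.68, "Array": 383.95, "Event": 256.59,
    "Spatial": 224.05, "Native XML": 209.51, "Multivalue": 15.89, "Content": 3.43,
}


def share(values, names):
    """Fraction of the overall total contributed by the named subdomains."""
    total = sum(values.values())
    return sum(values[name] for name in names) / total


top3_share = share(openrank, ["Relational", "Key-value", "Document"])
relational_share = share(openrank, ["Relational"])

# Sum of the subdomains ranked second through fifth.
ranked = sorted(openrank.values(), reverse=True)
second_to_fifth = sum(ranked[1:5])

print(f"Top three share:  {top3_share:.1%}")
print(f"Relational share: {relational_share:.1%}")
print(f"Relational exceeds ranks 2-5 combined: {openrank['Relational'] > second_to_fifth}")
```

The same check can be repeated for activity by substituting the values from Table 6.2.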

6.3.2 Trends over the last five years in projects under the various subfields of the database area

6-6

Figure 6.6 Trends in OpenRank by Subdomain in Database Domain (2019 - 2023)

6-7

Figure 6.7 Trends in Activity by Subdomain in Database Domain (2019 - 2023)

The OpenRank and activity trends of projects in each subdomain of the database domain over the past five years show that:

  • Over the past five years, Relational, Key-value, and Document have consistently ranked in the top three in both indicators;
  • Search Engine, Wide Column, Time Series, Graph, Vector, and Object Oriented ranked fourth through ninth, with both indicators trending upward;
  • Search Engine and Vector subcategories have shown a fast growth rate. Search Engines have jumped two positions to become the fourth largest subcategory. Vector is still competing with the Graph subcategory and has the potential to improve its OpenRank. The influence created by the large model has not yet subsided, and it is predicted that Vector will overtake Graph by 2024.

6.3.3 Open source quadrant map of projects under each sub-domain of the database domain

There are three metrics involved in the Open Source Quadrant diagram: Activity, OpenRank, and CommunityVolume. CommunityVolume uses the same formula as the Attention metric in open-digger, i.e., a weighted sum of the number of stars and the number of forks of the target project in a given period of time: sum(1*star + 2*fork).
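
As a minimal sketch of this weighting, assuming the star and fork events for the chosen period have already been collected (the event labels `"star"` and `"fork"` and the function name are illustrative and are not open-digger's API):

```python
# Weights taken from the formula above: 1 per star, 2 per fork.
EVENT_WEIGHTS = {"star": 1, "fork": 2}


def community_volume(events):
    """Weighted sum of star and fork events for one project in one period.

    `events` is an iterable of event labels; labels other than
    "star" and "fork" are ignored.
    """
    return sum(EVENT_WEIGHTS.get(event, 0) for event in events)


# Hypothetical period with three stars and two forks: 3 * 1 + 2 * 2.
example_volume = community_volume(["star", "fork", "star", "star", "fork"])
print(example_volume)
```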

Quadrant plotting methods:

  1. Select the Top 10 projects by activity for each database subcategory;
  2. Draw a log-log scatterplot with log2(OpenRank) on the horizontal axis and log2(CommunityVolume) on the vertical axis. With base 2, each coordinate denotes the number of half-lives required for the spatial influence (OpenRank) or the temporal influence (CommunityVolume) to decay to 1;
  3. The vertical line corresponding to the mean value of the horizontal coordinates of all points on the graph is used as the vertical axis, and the horizontal line corresponding to the mean value of the vertical coordinates of all points on the graph is used as the horizontal axis to divide into four quadrants.
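
The steps above can be sketched as follows. This is an illustration only: the function name, the input layout, and the quadrant numbering (1 = both coordinates at or above the mean, then counterclockwise) are assumptions, and the Top 10 selection from step 1 is taken as already done.

```python
import math


def assign_quadrants(projects):
    """Place projects into quadrants of the log2-log2 plane.

    `projects` is a list of (name, openrank, community_volume) tuples with
    strictly positive metric values. The dividing lines are the means of
    the log2-transformed coordinates, as described in step 3.
    """
    points = {
        name: (math.log2(openrank), math.log2(volume))
        for name, openrank, volume in projects
    }
    mean_x = sum(x for x, _ in points.values()) / len(points)
    mean_y = sum(y for _, y in points.values()) / len(points)

    quadrants = {}
    for name, (x, y) in points.items():
        if x >= mean_x and y >= mean_y:
            quadrants[name] = 1  # high OpenRank, high CommunityVolume
        elif x < mean_x and y >= mean_y:
            quadrants[name] = 2  # low OpenRank, high CommunityVolume
        elif x < mean_x and y < mean_y:
            quadrants[name] = 3  # low OpenRank, low CommunityVolume
        else:
            quadrants[name] = 4  # high OpenRank, low CommunityVolume
    return quadrants


# Hypothetical projects, not taken from the report's data.
sample = [("a", 1024, 4096), ("b", 2, 4), ("c", 1024, 4), ("d", 2, 4096)]
print(assign_quadrants(sample))
```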

There are a total of 18 subcategory labels in the database domain, and the top 9 categories that account for more than 1% of activity in 2023 were selected for statistical analysis to map the open source quadrant as follows:

Figure 6.8 Relational Database OpenRank-CommunityVolume log-log Open Source Quadrant Map

Figure 6.9 Key-Value Database OpenRank-CommunityVolume log-log Open Source Quadrant Map

Figure 6.10 Document Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart

Figure 6.11 Search Engine OpenRank-CommunityVolume log-log Open Source Quadrant Chart

Figure 6.12 Time Series Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart

Figure 6.13 Wide Column Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart

Figure 6.14 Vector Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart

Figure 6.15 Graph Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart

Figure 6.16 Object-Oriented Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart

Figure 6.17 Top 9 Subcategory Databases by Activity OpenRank-CommunityVolume log-log Open Source Quadrant Chart

The search engine category is highly polarized: projects like Elasticsearch have high OpenRank and CommunityVolume, while projects like Sphinx and Xapian have very low OpenRank and CommunityVolume.

Looking at the first quadrant, relational, document, search engine, and vector are the database types with both strong OpenRank influence and high CommunityVolume, while object-oriented is relatively weak on both.

Figure 6.17 shows how the top 9 database subcategories by activity are distributed across the quadrants. Two of them stand out: search engine and vector. Relative to their OpenRank, these two subcategories have a higher CommunityVolume, which indicates that they attract more attention from the community and have a louder community voice. They are also expected to develop faster than the other subcategories.

6.4 Project Analysis of Generative AI Area

This section will examine the open-source projects related to generative AI, using the Generative AI Open Source (GenOS) Index as a reference point. We will classify these projects into four subcategories: tools, models, applications, and infrastructure. The detailed insights are outlined below:

6.4.1 Growth trends in subfields of generative AI over the past five years

6-8

Figure 6.18 OpenRank Trends in Generative AI by Subdomain, 2019 - 2023

6-9

Figure 6.19 Activity Trends in Generative AI by Subdomain, 2019 - 2023
  • Categorization analysis of activity and influence across models, tools, apps, and infrastructure reveals consistent trends;
  • AIGC open source projects in the models category are more influential and active than those in the tools and applications categories;
  • The models category has grown rapidly since 2022 and surpassed infrastructure in 2023. Innovative AIGC application development had a significant breakthrough in 2023, leading to concurrent growth in the applications category.

6.4.2 Trends in OpenRank and Activity Top 10 for Projects in the Generative AI Domain

6-10

Figure 6.20 5-Year Trend of OpenRank Top 10 Projects in Generative AI

6-11

Figure 6.21 5-Year Trend of the Top 10 Active Projects in Generative AI
  • langchain is ranked #1 in terms of influence and activity and is highly regarded by developers;
  • transformers was the reigning champion in the AIGC field for the past few years, and its position went unchallenged until 2023. This project has significantly impacted both the academic and open-source communities, showcasing its groundbreaking capabilities;
  • stable-diffusion-webui is an AIGC tool that has gained a lot of attention from developers. It has surpassed transformers in terms of activity and is likely to surpass it in terms of influence by 2024;
  • Several AIGC projects that were open-sourced in 2023 have already gained significant influence and activity, placing them on the Top 10 list. This highlights the rapid pace of change in the field of AIGC.

6.4.3 Top 10 List of OpenRank and Activity of Projects in Generative AI in 2023

1. List of OpenRank Top 10 Projects in Generative AI

Table 6.3 OpenRank Rankings in Generative AI
| Ranking | Project Name | OpenRank |
| --- | --- | --- |
| 1 | langchain-ai/langchain | 6080.25 |
| 2 | huggingface/transformers | 4422.84 |
| 3 | AUTOMATIC1111/stable-diffusion-webui | 3881.6 |
| 4 | Significant-Gravitas/AutoGPT | 2664.85 |
| 5 | ggerganov/llama.cpp | 2339.8 |
| 6 | oobabooga/text-generation-webui | 2242.5 |
| 7 | milvus-io/milvus | 2001.11 |
| 8 | run-llama/llama_index | 1913.01 |
| 9 | facebookincubator/velox | 1589.53 |
| 10 | invoke-ai/InvokeAI | 1571.45 |

2. List of Top 10 Active Projects in Generative AI

Table 6.4 Activity Rankings in Generative AI
| Ranking | Project Name | Activity |
| --- | --- | --- |
| 1 | langchain-ai/langchain | 22563.04 |
| 2 | AUTOMATIC1111/stable-diffusion-webui | 13933.03 |
| 3 | huggingface/transformers | 13618.11 |
| 4 | Significant-Gravitas/AutoGPT | 10961.81 |
| 5 | oobabooga/text-generation-webui | 8597.33 |
| 6 | ggerganov/llama.cpp | 8108.62 |
| 7 | run-llama/llama_index | 7532.47 |
| 8 | milvus-io/milvus | 6488.35 |
| 9 | facebookincubator/velox | 4923.05 |
| 10 | Chatchat-space/Langchain-Chatchat | 4477.63 |

7. Developer Insights

Developers are vital to open-source innovation. They create and supply open-source projects and contribute significantly to them. The total number of developers and their collaboration mechanism impact the amount of contribution. In this section, we will analyze data on individual developers at national and regional levels.

7.1 Geographical distribution of developers

This analysis, like the one in Section 1.3, is based on 10 million active GitHub developers. Out of the 100 million registered users on GitHub, only 2 million developers have provided accurate geolocation information, which makes up a 2% sample.

1. GitHub Active Developers Distribution Map

The number of active developers on GitHub was first visualized on a map, as shown below.

7-1.png

Figure 7.1 2023 GitHub Active Developers Distribution Map

GitHub developers are concentrated in areas with large populations and fast internet development, such as coastal regions of China, Europe, the United States, India, and the southeast coast of Brazil. They are sparsely distributed in other areas with small populations or less developed internet.

2. GitHub Active Developers by Country / Region

7-2.png

Figure 7.2 GitHub Active Developers by Country / Region

Table 7.1 2023 Ranking of Countries/Regions by Number of Active Developers

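| Ranking | Country / Region | Number of active developers |
| --- | --- | --- |
| 1 | United States | 236899 |
| 2 | China | 113893 |
| 3 | India | 107066 |
| 4 | Brazil | 83932 |
| 5 | Germany | 64836 |
| 6 | United Kingdom | 55175 |
| 7 | Canada | 42238 |
| 8 | France | 40341 |
| 9 | Russia | 31534 |
| 10 | Japan | 21942 |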

The United States has the largest number of developers, followed by China, India and Brazil, while other countries with a certain population and economic level, such as Canada and some European countries, also have a large number of developers on GitHub.

3. Distribution of Active Developers on GitHub in China

The graph below visualizes the distribution of the number of active developers on GitHub on a map.

7-4.png

Figure 7.3 2023 Distribution of Active Developers in China

Table 7.2 2023 Regional Ranking of Active Developers in China

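| Ranking | Region | Number of active developers |
| --- | --- | --- |
| 1 | Beijing | 24151 |
| 2 | Shanghai | 18215 |
| 3 | Guangdong | 16153 |
| 4 | Zhejiang | 10927 |
| 5 | Taiwan | 8823 |
| 6 | Jiangsu | 5437 |
| 7 | Sichuan | 5311 |
| 8 | Hong Kong | 3344 |
| 9 | Hubei | 3273 |
| 10 | Shaanxi | 1993 |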

Beijing has the most active GitHub users in China, followed by Shanghai, Guangdong, and Zhejiang. Most of China's active GitHub users are in the eastern coastal regions, while some central provinces such as Shaanxi, Hunan, and Hubei also have many active users. Notably, Sichuan has the most active GitHub users outside of the coastal regions.

4. GitHub China Developer Influence Distribution after OpenRank Weighting

Aggregating the OpenRank values of the developers in each region yields the influence distribution map and regional ranking of Chinese developers shown below.

7-3.png

Figure 7.4 OpenRank influence distribution of Chinese developers

Table 7.3 OpenRank Influence Ranking in China

| Ranking | Region | OpenRank |
| --- | --- | --- |
| 1 | Beijing | 506624.08 |
| 2 | Shanghai | 435804.42 |
| 3 | Guangdong | 306014.24 |
| 4 | Zhejiang | 274284.92 |
| 5 | Taiwan | 216991.49 |
| 6 | Sichuan | 96881.79 |
| 7 | Jiangsu | 83321.13 |
| 8 | Hong Kong | 83238.46 |
| 9 | Hubei | 51370.74 |
| 10 | Fujian | 33482.25 |

As you can see from the rankings, the OpenRank regional rankings are highly consistent with the regional rankings for the number of active developers:

  • There are significant regional differences in the influence of Chinese developers. Developers from Beijing and Shanghai make up the first tier, while developers from Guangdong, Zhejiang, and Taiwan fall into the second tier. Both tiers are clearly more influential than the regions ranked lower;
  • The overall number of active people in Sichuan is smaller than in Jiangsu, but the overall influence is greater, and the same phenomenon occurs in Fujian and Shaanxi.
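
The regional aggregation behind the OpenRank-weighted ranking amounts to a group-by sum. The sketch below is illustrative only: the input layout (one `(region, openrank)` pair per developer), the function name, and the sample values are assumptions rather than the report's actual pipeline or data.

```python
from collections import defaultdict


def openrank_by_region(developers):
    """Sum developer OpenRank per region and rank regions by the total.

    `developers` is an iterable of (region, openrank) pairs, one per developer.
    Returns a list of (region, total_openrank) sorted from highest to lowest.
    """
    totals = defaultdict(float)
    for region, openrank in developers:
        totals[region] += openrank
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical developers: a region with fewer developers can still rank
# higher when its developers have larger OpenRank values.
sample_developers = [
    ("Region A", 1.0), ("Region A", 1.5), ("Region A", 0.5),
    ("Region B", 2.5), ("Region B", 2.0),
]
print(openrank_by_region(sample_developers))
```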

7.2 Developer Working Hours Analysis

This section analyzes the working hours of GitHub and Gitee developers. Times are given in UTC by default, which is 8 hours behind the East Eighth Time Zone (UTC+8, Beijing Standard Time). The data is scaled to the [1, 10] range using the min-max method, and larger dots in the time zone graphs represent higher values.
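
A minimal sketch of these two conventions, min-max scaling into [1, 10] and shifting a UTC hour to Beijing time, is given below. The function names and the handling of a constant input are assumptions made for illustration.

```python
def scale_to_range(values, low=1.0, high=10.0):
    """Min-max scale a list of numbers into the interval [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Assumption: a constant series maps to the lower bound.
        return [low for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


def utc_hour_to_beijing(hour_utc):
    """Convert an hour of the day in UTC to the same instant in UTC+8."""
    return (hour_utc + 8) % 24


# Hypothetical hourly event counts.
print(scale_to_range([0, 50, 100]))
print(utc_hour_to_beijing(20))
```

Under this conversion, the 6:00 to 21:00 UTC band discussed below corresponds to 14:00 to 5:00 (next day) Beijing time.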

7.2.1 Distribution of working hours of global developers

Distribution of working hours of GitHub-wide developers

According to statistics on developers' working hours across GitHub, most developer activity falls between 6:00 and 21:00. Activity is especially concentrated at 12:00, likely due to scheduled tasks. Weekends (Saturdays and Sundays) are relatively inactive.

7-5.png

Figure 7.5 GitHub-wide developer working hours in 2023

Distribution of working hours of Gitee-wide developers

7-6.png

Figure 7.6 Gitee-wide developer working hours in 2023

The Gitee data clearly aligns more with the East Eighth Time Zone's work time routine.

Global developer working hours distribution, excluding bots

7-7.png

Figure 7.7 2023 Global Developers' Working Hours, Excluding Robots

After removing the bot data, developer activity still falls mainly in the 6:00 - 21:00 interval and is more evenly distributed within it.

7.2.2 Distribution of working hours on the project

Below is a comparison of the working hours distribution of the top four Chinese OpenRank repositories and the top four global OpenRank GitHub repositories in 2023.

Distribution of working hours on the top four OpenRank projects in the global GitHub repository

  1. NixOS/nixpkgs

7-8.png

Figure 7.8 NixOS/nixpkgs Working Hours in 2023
  2. home-assistant/core

7-9.png

Figure 7.9 home-assistant/core Working Hours in 2023
  3. microsoft/vscode

7-10.png

Figure 7.10 microsoft/vscode Working Hours in 2023
  4. MicrosoftDocs/azure-docs

7-11.png

Figure 7.11 MicrosoftDocs/azure-docs Working Hours in 2023

Distribution of working hours of the top 4 OpenRank repositories in China

  1. OpenHarmony

7-12.png

Figure 7.12 OpenHarmony Working Hours in 2023
  2. openEuler

7-13.png

Figure 7.13 openEuler Working Hours in 2023
  3. PaddlePaddle

7-14.png

Figure 7.14 PaddlePaddle Working Hours in 2023
  4. MindSpore

7-15.png

Figure 7.15 MindSpore Working Hours in 2023

7.3 Developer Role Analysis

This section categorizes GitHub users into four roles: Explorer, Participant, Contributor, and Committer, based on events they trigger in open-source repositories. The four roles are defined in the table below.

Table 7.5 Four Roles of Developer
| Role | Definition | Meaning |
| --- | --- | --- |
| Explorer | Users who star a project | Indicates that the user has some interest in the project |
| Participant | Users who have opened an Issue or made a Comment on a project | Indicates that the user has participated in the project |
| Contributor | Users who have submitted Pull Requests (PRs) to a project | Indicates that the user has contributed to the project's code base |
| Committer | Users who participate in PR review or merge | Indicates that the user has contributed deeply to the project |
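
The role definitions above can be read as a mapping from event types to roles. The sketch below illustrates that reading; the event labels (`"star"`, `"issue"`, `"comment"`, `"pull_request"`, `"review"`, `"merge"`) and the function name are simplified assumptions and do not correspond to a specific API.

```python
# Assumed mapping from simplified event labels to the four roles.
ROLE_BY_EVENT = {
    "star": "Explorer",
    "issue": "Participant",
    "comment": "Participant",
    "pull_request": "Contributor",
    "review": "Committer",
    "merge": "Committer",
}


def roles_by_user(events):
    """Collect the set of roles each user holds in one repository.

    `events` is an iterable of (user, event_label) pairs. A user may hold
    several roles at once; unknown event labels are ignored.
    """
    roles = {}
    for user, event in events:
        role = ROLE_BY_EVENT.get(event)
        if role is not None:
            roles.setdefault(user, set()).add(role)
    return roles


# Hypothetical event log for a single repository.
sample_events = [
    ("alice", "star"), ("alice", "issue"),
    ("bob", "pull_request"), ("bob", "review"),
    ("carol", "star"),
]
print(roles_by_user(sample_events))
```

Counting the users in each role set would give per-role totals of the kind reported per repository in this section.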

The figure below shows the four cascaded and structured roles. Using the defined role structure, we evaluate the top 10 projects in the OpenRank rankings of GitHub-wide projects from three perspectives: number of roles, time change, and developer role evolution. This is based on the project ranking list in Part II.

7-16.png

Figure 7.16 Developer Roles and Relationships

7.3.1 Distribution of roles

Table 7.6 Distribution of the number of developer roles for the top 10 projects in the OpenRank rankings
| Repository name | Explorer | Participant | Contributor | Committer |
| --- | --- | --- | --- | --- |
| NixOS/nixpkgs | 6244 | 3381 | 3074 | 2638 |
| home-assistant/core | 17777 | 9116 | 1230 | 905 |
| microsoft/vscode | 20113 | 16027 | 525 | 339 |
| MicrosoftDocs/azure-docs | 8939 | 2282 | 1591 | 610 |
| pytorch/pytorch | 13237 | 6391 | 1230 | 685 |
| godotengine/godot | 23426 | 7203 | 1020 | 569 |
| flutter/flutter | 14056 | 11101 | 637 | 334 |
| odoo/odoo | 5078 | 1841 | 930 | 570 |
| digitalinnovationone/dio-lab-open-source | 3619 | 907 | 504 | 40 |
| microsoft/winget-pkgs | 1852 | 1395 | 1384 | 286 |

7-17.png

Figure 7.17 Developer Role Distribution Map

The results show:

  • Based on the number of explorers, the three most popular projects are godotengine/godot, microsoft/vscode, and home-assistant/core, suggesting they have received widespread attention and support;
  • microsoft/vscode is the project with the largest gap between the number of participants and contributors, while microsoft/winget-pkgs has the smallest gap between the two;
  • NixOS/nixpkgs has the highest number of committers at 2,638 compared to other projects. In contrast, the digitalinnovationone/dio-lab-open-source project has the lowest number of committers.
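
The participant-contributor gap mentioned above can be recomputed from the role counts. The snippet is a small sketch that copies the Participant and Contributor columns from Table 7.6; the variable names are illustrative.

```python
# (participants, contributors) per repository, copied from Table 7.6.
role_counts = {
    "NixOS/nixpkgs": (3381, 3074),
    "home-assistant/core": (9116, 1230),
    "microsoft/vscode": (16027, 525),
    "MicrosoftDocs/azure-docs": (2282, 1591),
    "pytorch/pytorch": (6391, 1230),
    "godotengine/godot": (7203, 1020),
    "flutter/flutter": (11101, 637),
    "odoo/odoo": (1841, 930),
    "digitalinnovationone/dio-lab-open-source": (907, 504),
    "microsoft/winget-pkgs": (1395, 1384),
}

# Gap = participants minus contributors for each repository.
gaps = {repo: p - c for repo, (p, c) in role_counts.items()}

largest_gap_repo = max(gaps, key=gaps.get)
smallest_gap_repo = min(gaps, key=gaps.get)

print(largest_gap_repo, gaps[largest_gap_repo])
print(smallest_gap_repo, gaps[smallest_gap_repo])
```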

7.3.2 New additions to roles in 2023

A user counts as a new addition to role X (e.g., Contributor or Committer) if the user did not hold role X before 2023 and takes on that role in 2023.

For example, if A submits a PR to Project B in 2021 (but never participates in the code review process) and then reviews a PR in Project B in 2023, A counts as a new committer of Project B.
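
This rule can be sketched as a set difference between users who hold a role in 2023 and users who already held it earlier. The record layout `(user, role, year)` and the function name are assumptions for illustration.

```python
def new_role_holders(role_events, role, year=2023):
    """Users who hold `role` in `year` but never held it in an earlier year.

    `role_events` is an iterable of (user, role, year) records, one for each
    year in which a user triggered an event that grants the role.
    """
    before = {u for u, r, y in role_events if r == role and y < year}
    during = {u for u, r, y in role_events if r == role and y == year}
    return during - before


# The example from the text: A opened a PR in 2021 and first reviewed one in 2023.
records = [
    ("A", "Contributor", 2021),
    ("A", "Contributor", 2023),
    ("A", "Committer", 2023),
]
print(new_role_holders(records, "Committer"))    # A is a new committer
print(new_role_holders(records, "Contributor"))  # A was already a contributor
```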

The details of the roles added are shown in the graph below and the table below.

7-18.png

Figure 7.18 Map of new roles in the open source community in 2023
Table 7.7 Distribution of the number of new developer roles for the top 10 projects in the OpenRank rankings
| Repository name | New Committer | New Contributor | New Participant | New Explorer |
| --- | --- | --- | --- | --- |
| NixOS/nixpkgs | 1226 | 1622 | 1591 | 3027 |
| home-assistant/core | 538 | 808 | 4640 | 8998 |
| microsoft/vscode | 263 | 394 | 10216 | 15746 |
| MicrosoftDocs/azure-docs | 352 | 1420 | 3913 | 1579 |
| pytorch/pytorch | 391 | 802 | 2083 | 13016 |
| godotengine/godot | 386 | 708 | 2834 | 22996 |
| flutter/flutter | 184 | 455 | 3954 | 13579 |
| odoo/odoo | 244 | 453 | 472 | 4991 |
| digitalinnovationone/dio-lab-open-source | 40 | 3611 | 732 | 504 |
| microsoft/winget-pkgs | 231 | 957 | 485 | 1373 |

The results showed:

  • The repository godotengine/godot received the highest number of new stars, 22,996, with half added in September 2023 as game developers sought open-source alternatives in response to Unity's new charging strategy. Meanwhile, digitalinnovationone/dio-lab-open-source and microsoft/winget-pkgs received the fewest new stars, 504 and 1,373, respectively;
  • The repository with the highest number of new participants was microsoft/vscode with 10,216; digitalinnovationone/dio-lab-open-source had the fewest new Issues with 732;
  • The repository with the highest number of new contributors was NixOS/nixpkgs with 1,622;
  • The repository with the highest number of new committers was also NixOS/nixpkgs with 1,226.

7.3.3 Perspectives on Developer Evolution

Developer evolution is measured as the number of developers in an open-source community who move from one role to another. This report counts only developers who moved from one role to a deeper one. For example, a user who was only a participant before 2023 changes from participant to contributor in 2023 when they make their first PR.
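One way to count such conversions is sketched below. It assumes a hypothetical mapping from `(user, role)` to the first year the user held that role, and it assumes that a conversion requires the shallower role to have been held before the target year; neither detail is confirmed by the report.

```python
# Roles ordered from shallowest to deepest involvement.
ROLE_ORDER = ["explorer", "participant", "contributor", "committer"]

def count_role_conversions(first_year, target_year=2023):
    """Count users who moved one step deeper in target_year.

    `first_year` maps (user, role) -> first year the user held that role
    (a hypothetical structure chosen for this sketch).
    """
    counts = {}
    for lower, higher in zip(ROLE_ORDER, ROLE_ORDER[1:]):
        label = f"{lower} -> {higher}"
        counts[label] = sum(
            1
            for (user, role), year in first_year.items()
            if role == higher
            and year == target_year
            # Assumption: the shallower role must predate the target year.
            and first_year.get((user, lower), target_year) < target_year
        )
    return counts

# Illustrative data only.
first_year = {
    ("u1", "explorer"): 2021, ("u1", "participant"): 2023,
    ("u2", "participant"): 2022, ("u2", "contributor"): 2023,
    ("u3", "explorer"): 2023, ("u3", "participant"): 2023,
}
conversions = count_role_conversions(first_year)
```

Under this assumption, u3 is not counted because both roles were first acquired in 2023. The same rule yields zero conversions for any repository created in the target year.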

7-19.png

Figure 7.19 Developer Role Evolution Diagram
Table 7.8 Distribution of the number of role conversions for the top 10 OpenRank projects
Repository name Contributor -> Committer Participant -> Contributor Explorer -> Participant
NixOS/nixpkgs 254 122 168
home-assistant/core 70 113 134
microsoft/vscode 16 70 287
MicrosoftDocs/azure-docs 129 169 21
pytorch/pytorch 60 53 187
godotengine/godot 63 131 330
flutter/flutter 31 91 419
odoo/odoo 55 19 32
digitalinnovationone/dio-lab-open-source 0 0 0
microsoft/winget-pkgs 49 11 18

The results showed:

  • Across communities, we can observe the typical funnel model of an evolutionary path from explorers to participants to contributors and committers. In godotengine/godot, for example, 330 explorers evolved into participants, 131 participants became contributors, and 63 contributors evolved into committers. This trend was also observed in other communities and is consistent with the general evolution of community members from initial exploration to deeper involvement.
  • In some communities, such as NixOS/nixpkgs, we observed many contributors evolving into committers. In this community, 254 contributors successfully evolved into committers, which may represent a relatively high demand for code review. This may encourage more contributors to become deeply involved in maintenance, which may help improve the quality and stability of the community's code.
  • In some communities, such as flutter/flutter and godotengine/godot, we observed a relatively high number of successful conversions of explorers into participants. In flutter/flutter, 419 explorers evolved into participants, while in godotengine/godot, 330 explorers turned into participants.
  • The digitalinnovationone/dio-lab-open-source project shows no role conversions because it was created in 2023, so no user held a role in it before 2023.

7.4 Robot account analysis

Robotic (bot) automation is a significant contributor to open-source collaboration platforms. This section analyzes nearly 600 million repository events across 7.7 million open-source repositories and over 1,200 bot accounts for 2023.

7.4.1 Analysis of active data of robots

7-21 7-20
Figure 7.20 Trend in number of robot events (left) & percentage of robot events in 2023 (right)

Analyzing bot activity data from 2015 to 2023 yields the following observations:

Since 2019, the number of bot events has increased significantly, rising from 4,217,635 to 304,257,084. This surge in bot account activity on GitHub can be attributed to the widespread adoption and advancement of GitHub's automation, continuous integration, and continuous deployment (CI/CD) tools between 2019 and 2021.

Despite the small number of bot accounts, each bot serves multiple repositories, demonstrating efficiency and broad reach.
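To put the growth in bot events in perspective, the two figures quoted above can be turned into a growth factor and a compound annual growth rate. The event counts come from the text; treating them as the 2019 and 2023 values (a four-year span) is an assumption, as are the helper names.

```python
def growth_factor(start, end):
    """Return how many times larger `end` is than `start`."""
    return end / start

def compound_annual_growth_rate(start, end, years):
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1

events_start = 4_217_635    # bot events at the start of the surge (from the text)
events_end = 304_257_084    # bot events at the end of the period (from the text)
years = 2023 - 2019         # assumption: the two figures are for 2019 and 2023

factor = growth_factor(events_start, events_end)
cagr = compound_annual_growth_rate(events_start, events_end, years)
print(f"growth factor: {factor:.1f}x, CAGR: {cagr:.0%}")
```

The growth factor is roughly 72, independent of the assumed span; only the annualized rate depends on the choice of years.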

7.4.2 Analysis of event types for robots

7-22.png

Figure 7.21 Difference in number and annual growth rate (%) of GitHub event counts (2022 vs 2023)

This graph shows the change in the number of GitHub events by type and their growth rate between 2022 and 2023. By comparing the data from these two years, we can gain insight into the trend of bot account usage in the development process:

  • Dominance of Code Push: PushEvent dominates bot account activity, with a significant rise in volume especially in 2023, suggesting that bot accounts play an important role in code maintenance and updates;
  • Changes in project creation activity: CreateEvent is very active in 2022, but declines in 2023, which may indicate a decline in bot account activity in creating new projects;
  • Importance of code review and collaboration: PullRequestEvent and IssueCommentEvent numbers were higher in both years, showing the active participation of bot accounts in code reviews and issue discussions;
  • Changes in activity types: DeleteEvent decreases in 2023 compared to 2022, while ReleaseEvent increases, reflecting the different focus of bot accounts in project lifecycle management;
  • Increase in comment-related events: CommitCommentEvent and PullRequestReviewCommentEvent increased in 2023, indicating that bot accounts are becoming more active in the code review process with discussions and feedback;
  • Specific uses of bot accounts: less common event types such as GollumEvent, MemberEvent, PublicEvent, and WatchEvent are relatively low in number, suggesting that bot accounts are primarily used for specific automation tasks and are less involved in social interactions.
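The difference and annual growth rate per event type, as plotted in the figure, can be computed as sketched below. The function name and the sample counts are placeholders for illustration; they are not figures from the report.

```python
def year_over_year(counts_prev, counts_curr):
    """Return {event_type: (absolute difference, growth rate in %)}.

    The growth rate is None when the previous-year count is zero.
    """
    result = {}
    for event_type in sorted(set(counts_prev) | set(counts_curr)):
        prev = counts_prev.get(event_type, 0)
        curr = counts_curr.get(event_type, 0)
        diff = curr - prev
        rate = None if prev == 0 else diff * 100 / prev
        result[event_type] = (diff, rate)
    return result

# Placeholder counts for illustration only; NOT figures from the report.
counts_2022 = {"PushEvent": 1000, "CreateEvent": 400, "ReleaseEvent": 50}
counts_2023 = {"PushEvent": 1500, "CreateEvent": 300, "ReleaseEvent": 60}
changes = year_over_year(counts_2022, counts_2023)
```

Reporting the absolute difference alongside the percentage keeps high-volume types such as PushEvent from being overshadowed by large relative swings in rare event types.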

7.4.3 Distribution of working hours for robot accounts

Similar to the developer working hours distribution, we also analyzed the data on the working hours of bot accounts.

7-23.png

Figure 7.22 Distribution of robot account working hours
  • Bot account activity is concentrated mainly between 12 a.m. and 1 a.m. and between 12 p.m. and 1 p.m.;
  • Given the time zones of developers worldwide, it can be surmised that most automated processes are more active in the early morning and around midday;
  • Bot activity times depend little on whether a day is a workday: most automated collaborative tasks are scheduled, and fewer are triggered in response to a contributor's event.
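A working-hours distribution of this kind can be built by bucketing event timestamps by hour and by weekday versus weekend. The sketch below assumes ISO 8601 timestamps with an explicit offset and buckets them in UTC; the time zone used for the report's figure is not stated, so UTC is an assumption, and the sample timestamps are placeholders.

```python
from collections import Counter
from datetime import datetime, timezone

def working_hour_buckets(timestamps):
    """Count events per (is_weekend, hour) bucket, with hours taken in UTC.

    `timestamps` are ISO 8601 strings with an explicit UTC offset.
    """
    buckets = Counter()
    for ts in timestamps:
        moment = datetime.fromisoformat(ts).astimezone(timezone.utc)
        is_weekend = moment.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
        buckets[(is_weekend, moment.hour)] += 1
    return buckets

# Placeholder timestamps for illustration only; not data from the report.
sample = [
    "2023-01-02T00:15:00+00:00",  # Monday, hour 0
    "2023-01-02T00:45:00+00:00",  # Monday, hour 0
    "2023-01-02T12:30:00+00:00",  # Monday, hour 12
    "2023-01-07T00:05:00+00:00",  # Saturday, hour 0
]
buckets = working_hour_buckets(sample)
```

Comparing the weekday and weekend buckets hour by hour is one way to check the observation that scheduled tasks, rather than responses to contributor activity, drive most bot events.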

7.4.4 Top collaborative bots on GitHub by number of events

7-24.png

Figure 7.23 Top collaborative bots on GitHub by number of events in 2023

8. Case Studies

8.1 openEuler Community Case Study

In 2023, the OpenDigger community integrated Gitee data for the first time, allowing Gitee projects to participate in OpenRank calculations. In the same year, the openEuler community surpassed PaddlePaddle with an OpenRank value of 16,728, making it the second-largest open source community in China after OpenHarmony.

In 2023, the openEuler community attracted 3,941 developers to collaborate on Issues or PRs, with 1,934 contributors successfully contributing and merging at least one PR to the openEuler community's repository.

It's worth noting that the openEuler community started a document bug hunt in early 2023. They also integrated an interactive page contribution mechanism with Gitee on the community's official document website. This feature enables developers to correct any errors they find while reading the documents directly on the official website. With just a single click, they can launch Gitee lightweight pull requests (PRs), without having to jump to the Gitee platform or perform Git operations.

The effect of this mechanism on the data is striking. In 2023, the openeuler/docs repository merged 7,764 PRs, 74% of which were submitted directly through the official web page. The launch of this mechanism also significantly increased the average number of active contributors per month (from 30 to 80) and the average number of PRs merged per month (from 116 to 722).
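The figures in this paragraph can be converted into absolute and relative terms with a little arithmetic. All inputs come from the text; because the 74% share is rounded, the PR count derived from it is approximate, and the variable names are illustrative.

```python
merged_prs = 7_764   # PRs merged into openeuler/docs in 2023 (from the text)
web_share = 0.74     # share submitted through the official web page (from the text)

# Approximate number of PRs that came through the web page.
web_prs = round(merged_prs * web_share)

# Monthly averages before and after the mechanism launched (from the text).
contributors_before, contributors_after = 30, 80
prs_before, prs_after = 116, 722

contributor_growth = contributors_after / contributors_before
pr_growth = prs_after / prs_before
print(f"~{web_prs} web PRs; contributors x{contributor_growth:.1f}, merged PRs x{pr_growth:.1f}")
```

Merged PRs per month grew faster than active contributors per month, which suggests that each active contributor also merged more PRs on average after the launch.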

One noteworthy project is openeuler/mugen, a highly active testing framework project within the openEuler community. In 2023, 138 developers participated in discussions and contributed to the project, 95 of whom successfully merged a PR. The project has the third-highest OpenRank within the openEuler community, after the openeuler/docs documentation repository and the openeuler/kernel kernel repository. This testing framework enables developers to quickly write and run test cases to verify the correctness and validity of their contributions, significantly reducing the cost of subsequent contributions.

To summarize, the openEuler community has achieved a high OpenRank value thanks to its effective contribution mechanism and testing framework. The community has designed an interactive system that allows for easy documentation contribution with minimal costs. Moreover, contributors can quickly verify the accuracy of their code through a reliable testing framework. These developer experience optimizations are excellent examples for other open-source communities to follow and implement.

8.2 List of top repositories contributed by Chinese developers

We analyzed how Chinese developers contributed to the top 30 repositories in the OpenRank ranking list for 2023 using data from almost 10 million GitHub developer accounts, including nearly 200,000 from China:

8-1.png

Figure 8.1 Top 30 Contributed Repositories by Chinese Developers on GitHub

Most of these projects also appear in the main OpenRank list; the more interesting ones include:

  • NixOS/nixpkgs: also a top international project, this is a package management tool for a new operating system. Most of its updates are package information updates, which also indicates that the operating system's own ecosystem is thriving.

  • intel-analytics/BigDL: a runtime repository for running LLMs on Intel XPUs, created in 2017. It had become nearly inactive by the end of 2021 but, surprisingly, made a comeback with the rise of LLMs in 2022 and now maintains around 50 active people per month.

8-2

Figure 8.2 BigDL OpenRank Trend Chart

Screenshot above from HyperCRX

  • siyuan-note/siyuan: Siyuan Notes, a privacy-first domestic open source knowledge management tool, supports bidirectional block-level references and maintains an active community of about one hundred people per month. It is commercialized through a very affordable subscription.

  • baidu/amis: an open-source low-code page generation framework developed by Baidu. Low-code projects have gained immense popularity in recent years, for example Alibaba's open-source LowcodeEngine and DevEco Studio from the Harmony ecosystem. These projects make it much more convenient for developers to build applications rapidly.

  • cocos/cocos-engine: a leading domestic game engine. With the rise of the metaverse concept, Godot and other game engines have become important top open source projects worldwide, and the domestic game engine cocos/cocos-engine has also performed excellently in China.

  • MaaAssistantArknights/MaaAssistantArknights: a fascinating project that uses a script assistant to automate daily quests in the game Arknights. The automation can run through a mobile phone emulator. The project is community-maintained, open source, free, and supports all desktop platforms. It has received over 10,000 stars and has more than 300 active contributors every month.

8-3.png

Figure 8.3 MaaAssistantArknights Project Screenshot