Stars
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
Open-source Java implementation of the Raft consensus protocol.
The world's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
Guice (pronounced 'juice') is a lightweight dependency injection framework for Java 11 and above, brought to you by Google.
A unified scheduler for online and offline tasks.
Build/validate Hadoop RCs. Moved into Apache Hadoop itself.
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
A toolkit to run Ray applications on Kubernetes
Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.
A programming framework for agentic AI 🤖. PyPI: autogen-agentchat; Discord: https://aka.ms/autogen-discord; Office Hour: https://aka.ms/autogen-officehour
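《Hello 算法》 (Hello Algo): a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. The Simplified and Traditional Chinese editions are updated in sync; English version ongoing.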
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
Bigtop is an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components.
Build highly concurrent, distributed, and resilient message-driven applications using Java/Scala
The official home of the Presto distributed SQL query engine for big data
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.