[2020 Theme Proposal] Be the Git(Hub) You Wish to See in the World #43

Closed
agentofuser opened this issue Nov 7, 2019 · 6 comments

Comments

@agentofuser

Note, this is part of the 2020 Theme Proposals Process - feel free to create additional/alternate proposals, or discuss this one in the comments!

Theme description

Git is the original merkledagster. It ships with most OSes, integrates deeply with the widest variety of tools, processes, and languages (including package managers!), has a huge installed base of skilled users and educational material, and runs the gamut from non-technical user-facing content management systems (like NetlifyCMS) to enterprise-grade petadata pipelines. Its straightforward hashlinked architecture is the content-addressed lingua franca of data and software, making it the ideal nexus between the legacy and the decentralized worlds.
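
To make the "hashlinked" part concrete, here is a minimal Python sketch of how git derives the ID of a blob: a SHA-1 over a short header plus the file contents. The function name `git_blob_id` is illustrative only, and the sketch assumes the default SHA-1 object format rather than the newer SHA-256 one.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID git assigns to a blob with these bytes (SHA-1 repos)."""
    # Git hashes "blob <size>\0" followed by the raw contents.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Prints the ID of the empty blob.
    print(git_blob_id(b""))
```

Identical contents always map to the same ID, so any peer, cache, or gateway can serve an object and the receiver can verify it independently.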

This proposal consists of two goals:

  1. "ipgit": Wrap the git command line interface in a fully retrocompatible way, adding ongoing, transparent, bidirectional IPFS and p2p superpowers
  2. "iphub": Make adversarial interoperability with GitHub a top priority, starting with a read-only (Gatsby-based?) static website generator version of its repository browsing, issues, and pull request pages, then moving to ongoing, bidirectional bridges with writeable APIs like @MichaelMure's excellent git-bug.

Core needs & gaps

Please describe in more detail what needs or gaps in our current state this theme addresses, and how it will create value for the IPFS ecosystem.

GitHub is likely the biggest single point of failure in the open source world right now. The need to decentralize it has been stated and persuasively argued at length, and I think by now it's a given.

In particular, IPFS' 2019 goal of decentralizing package managers tried to tackle this one level higher, at the distribution of software assets. That seems harder to do, as there is a multiplicity of package managers with diverging needs and behaviors.

Attacking the problem at its root source of truth, the code versioning itself, seems easier due to git's ubiquity and its importance in the reproducibility chain. Also, most package managers support installing packages from git URLs (some exclusively so), so by "ipfs-ifying" git itself, many integrations come for free without changing anything in the layers above.

Why focus this year

Please provide rhetoric for why this theme deserves focus in 2020 in particular.

Pinning services and gateways have matured enough in 2019 that the time is now ripe for something like this to exist. People and teams can use automated CI pipelines to generate static web versions of their GitHub views and pin them to Pinata, Infura, Fission, and Temporal to improve uptime and resilience. They can pin their git repo itself and use dnslink for cloning, while mirroring to GitHub and GitLab. This was a lot harder to do a year ago. Soon people will also be able to use browsers like Brave to seamlessly browse those "iphub" pages in a p2p way, taking this even further.
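
As a rough illustration of the dnslink part, the sketch below parses a DNSLink TXT value and builds a URL that a plain `git clone` could use through an HTTP gateway. The domain, gateway host, and function names are hypothetical, and it assumes the pinned repository is published in a form git's HTTP transport can read.

```python
from typing import Tuple


def parse_dnslink(txt_value: str) -> Tuple[str, str]:
    """Split a DNSLink TXT value like 'dnslink=/ipfs/<cid>' into (namespace, target)."""
    prefix = "dnslink=/"
    if not txt_value.startswith(prefix):
        raise ValueError("not a DNSLink record")
    namespace, _, target = txt_value[len(prefix):].partition("/")
    if namespace not in ("ipfs", "ipns") or not target:
        raise ValueError("unsupported or empty DNSLink target")
    return namespace, target


def gateway_clone_url(gateway: str, domain: str) -> str:
    """Build a gateway URL that resolves the domain's DNSLink record on each request."""
    return f"https://{gateway}/ipns/{domain}/"


if __name__ == "__main__":
    # Hypothetical record and hosts, for illustration only.
    print(parse_dnslink("dnslink=/ipfs/QmExampleCid"))
    print(gateway_clone_url("gateway.example.org", "repo.example.com"))
```

Because the record is re-resolved on every request, republishing the repository only requires updating the TXT value; the clone URL itself stays stable.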

Decentralizing GitHub is something that the whole developer community can rally behind, regardless of language or platform, and it provides the perfect analogy for teaching them, in practice, how a generalized merkledag like IPFS works.

Milestones & rough roadmap

Please list relevant development milestones and the high-level timeline for these efforts.

TBD (to be discussed.)

Desired / expected impact

How will we measure success? What will working on this problem statement unlock for future years?

  • number of ipgit installs from package managers like brew
  • sum of stars of github repos that have an "iphub" badge
  • sum of stars of github repos that have a CI file running the "iphub" publisher
  • some way of identifying git-specific traffic on the DHT?
@momack2
Contributor

momack2 commented Nov 9, 2019

This is really really awesome @agentofuser! Seems to build well on the work from this year and agree this is a really important problem area. A point @dirkmc and I were discussing with the radicle folks is the importance of packfiles for making git performant. Dirk, any thoughts on this proposal?

I know @mib-kd743naq has thought a bit about ipfs performance/compression too and may have thoughts on this idea. @andrew as well from the package managers side.

@agentofuser
Author

agentofuser commented Nov 10, 2019

@momack2 thanks! Packfiles are a bit thorny because they make the protocol stateful and require both peers to run something beyond pure file serving. They also make it harder to do concurrent downloads, where different hashed blobs can be fetched from multiple peers. While I think packfiles are important in the long run, to validate the idea the usability/devx is probably a higher priority than efficiency at this point.

A good async design can allow for queueing clones and fetches and let them happen in the background, with optional callbacks/notifications when done. In this scenario, git's dumb http protocol, which serves the repository as plain static files (one file per loose object), fits like a glove, and it is super easy to set up with ipfs-deploy, as in this proof-of-concept I did: https://twitter.com/agentofuser/status/1178995150879678464. It also enables taking advantage of caching, gateways, and CDNs like Cloudflare's.
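
The fit with static hosting comes from git's on-disk layout: a loose object lives at `objects/<first two hex digits>/<remaining 38>` and is a zlib-compressed `<type> <size>\0<payload>` record, so a file server or gateway can serve it without any git-specific logic. A minimal sketch, with hypothetical helper names and a placeholder base URL (it assumes SHA-1 object IDs and ignores packed objects):

```python
import hashlib
import zlib
from typing import Tuple


def encode_loose_object(obj_type: str, payload: bytes) -> Tuple[str, bytes]:
    """Return (object ID, file bytes) for a loose object as git stores it on disk."""
    raw = f"{obj_type} {len(payload)}".encode("ascii") + b"\0" + payload
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def loose_object_url(base_url: str, oid: str) -> str:
    """Map an object ID to the static path a dumb HTTP client would request."""
    return f"{base_url.rstrip('/')}/objects/{oid[:2]}/{oid[2:]}"


def decode_loose_object(oid: str, data: bytes) -> Tuple[str, bytes]:
    """Decompress a fetched object and check that it really hashes to its ID."""
    raw = zlib.decompress(data)
    if hashlib.sha1(raw).hexdigest() != oid:
        raise ValueError("object does not match its ID")
    header, _, payload = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(payload):
        raise ValueError("size in header does not match payload")
    return obj_type, payload


if __name__ == "__main__":
    oid, data = encode_loose_object("blob", b"hello\n")
    # Placeholder host: any static server, gateway, or CDN could sit here.
    print(loose_object_url("https://example.org/repo.git", oid))
    print(decode_loose_object(oid, data))
```

Since every response can be verified against the ID that was requested, objects can safely come from untrusted mirrors or from several peers at once.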

Another point is that, with the iphub static web UI, there would already be a need to generate and serve one static url / html page per hash anyway like this: ipfs-shipyard/ipfs-deploy@17e52e8

@MichaelMure

Packfiles could have a smart import into ipfs to properly segment them into different chunks instead of importing the whole pack as a single file. See https://www.youtube.com/watch?v=NIu2t21QA7Y for more details.
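
One way to picture such an import is to walk the packfile and cut it at object boundaries, so each entry can become its own chunk. The sketch below is an assumption about what "smart" segmentation could look like, not the approach from the linked talk. `split_pack` is a hypothetical helper that only locates entry boundaries: it does not resolve deltas or verify the trailing checksum.

```python
import struct
import zlib
from typing import List, Tuple

OFS_DELTA, REF_DELTA = 6, 7


def split_pack(pack: bytes) -> List[Tuple[int, int, int]]:
    """Return (start, end, type) byte spans for every entry in a packfile."""
    if pack[:4] != b"PACK":
        raise ValueError("missing PACK signature")
    version, count = struct.unpack(">II", pack[4:12])
    if version not in (2, 3):
        raise ValueError("unsupported pack version")
    pos, spans = 12, []
    for _ in range(count):
        start = pos
        byte = pack[pos]
        pos += 1
        obj_type = (byte >> 4) & 0x7
        while byte & 0x80:  # skip the rest of the variable-length size
            byte = pack[pos]
            pos += 1
        if obj_type == OFS_DELTA:  # variable-length offset to the base object
            byte = pack[pos]
            pos += 1
            while byte & 0x80:
                byte = pack[pos]
                pos += 1
        elif obj_type == REF_DELTA:  # 20-byte ID of the base object
            pos += 20
        # Inflate to find where this entry's zlib stream ends.
        inflater = zlib.decompressobj()
        inflater.decompress(pack[pos:])
        if not inflater.eof:
            raise ValueError("truncated pack entry")
        pos = len(pack) - len(inflater.unused_data)
        spans.append((start, pos, obj_type))
    return spans
```

Each span could then be added as a separate chunk. Re-slicing the buffer for every entry is quadratic in the worst case, so a real importer would stream the file instead.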

There are also things like https://github.com/ipfs-shipyard/git-remote-ipld

@dirkmc

dirkmc commented Nov 11, 2019

A decentralized source code repository is a great use case for the reasons mentioned above. I would add that developers are a good target market because:

  • Developers are more open to being early adopters of a new technology
  • If the system used for development is decentralized, it places that idea at the center of the developer's daily activity, no matter what kind of software they are working on

The IGiS project follows the approach of requesting one file at a time, with some pre-fetching to smooth out the UI. It covers most of the basic use cases on the read side.

There are a few tricky issues to deal with:

  1. Users
    The IDM project is an effort to implement a decentralized identity system
  2. Supporting UI data (eg for creating PRs). There is a demo video showing a proof-of-concept for write functionality with OrbitDB / peer-base.
  3. Making git / ipfs fast
    Packfiles are one aspect of this, although as pointed out above, it may be something we can work towards step by step, including changing how we add files to IPFS and building a layer in between IPFS and git.

@agentofuser
Author

@dirkmc good points. For UI data (2), one option is to follow ideas from gitdb projects like git-bug and use git itself (perhaps with something like isomorphic-git). That way other tools in the git ecosystem can make use of it more easily.

At first, though, something as simple as the static website with read-only views would add a lot of value. An intermediate step would be to add write functionality by using the GitHub API directly, making "iphub" as complete a (serverless) GitHub frontend as possible, and then slowly adding a "man-in-the-middle" decentralized data layer one endpoint at a time, while keeping ongoing bidirectional interop.

@github-actions

github-actions bot commented Oct 6, 2023

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Oct 6, 2023
github-actions bot closed this as not planned Oct 11, 2023
4 participants