-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add wsgi_autodetect.py to autodetect repositories with low complexity #267
base: master
Are you sure you want to change the base?
Conversation
I like the idea, adding a few code comments. |
klaus/contrib/auto_klaus.py
Outdated
""" | ||
Alternative take on the "automatically discovered repositories" concept | ||
that requires no threads, polling or inotify. Instead the filesystem is | ||
consulted whenever a repository name is looked up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it's important to mention here that each time the repo list is viewed by someone, the following code is run:
- 1x
listdir()
in the root - Nx
os.path.exists
in each of the folders
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a blurb to that effect to the wsgi_autodetecting.py
docstring.
klaus/contrib/auto_klaus.py
Outdated
|
||
Example usage: | ||
|
||
from klaus.contrib.auto_klaus import Klaus, SlashDynamicRepos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can find a more descriptive name than auto_klaus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this was just the first name I came up with as I was throwing this together for my own server, heh. I'm open to suggestions. The two hardest problems in computer science: naming things, cache invalidation and off-by-one errors. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed it to wsgi_autodetecting.py
. Still a bit generic but hopefully more descriptive.
klaus/contrib/auto_klaus.py
Outdated
|
||
application = Klaus('/srv/git', "My git repositories", False) | ||
|
||
application.wsgi_app = httpauth.AlwaysFailingAuthMiddleware( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most people using klaus won't be able to understand how to actually run a klaus server with this code. We should have something as simple as this: https://github.com/jonashaag/klaus/wiki/Autoreloader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll look into that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a wsgi_autodetect.py
in the style of wsgi_autoreload.py
.
klaus/contrib/auto_klaus.py
Outdated
self._repos = {} | ||
|
||
def __getitem__(self, name): | ||
if not name or name[0] == '.' or '/' in name or '\0' in name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also check for os.pardir
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's included in name[0] == '.'
for most common operating systems but we could include os.pardir
os.curdir
, os.sep
and os.altsep
explicitly, yes. Better to be safe than sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added all of these to the validation expression.
klaus/contrib/auto_klaus.py
Outdated
if not name or name[0] == '.' or '/' in name or '\0' in name: | ||
raise KeyError(name) | ||
|
||
repos = self._repos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That alias is confusing, please don't do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed (and one other instance of the same pattern as well).
Not a fan of the repeated subexpressions it causes, though.
klaus/contrib/auto_klaus.py
Outdated
root = self._root | ||
return ( | ||
repo for repo in os.listdir(root) | ||
if os.path.exists(root / repo / 'git-daemon-export-ok') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we save a few exists()
checks here by checking self._repos
for a positive match first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I had at first, in fact. But then the admin cannot dynamically retract access from a repository. I suppose we could make that optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right but if we keep the safety check + .pop()
in __getitem__
that's an acceptable compromise IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow. The “safety check + .pop()
” is what we have now. The alternative is to remove the safety check and just rely on the presence of self._repos[repo]
. I don't see the compromise you're trying to get at. Edit: ahh, you meant safety for new repos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the detect_removal
option as a compromise between performance and correctness.
klaus/contrib/auto_klaus.py
Outdated
def __len__(self): | ||
return len(self._base) | ||
|
||
class Klaus(klaus.Klaus): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's wise to create a subclass of klaus.Klaus
because it will potentially break anytime we make changes to klaus.Klaus
. Can you think of another way of solving this that doesn't require using a "private" API? (I'm happy with doing changes to klaus.Klaus
if it's necessary.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we could change these lines in klaus/__init__.py
to always use the SlashDynamicRepos
proxy (perhaps under a different name):
dulwich_backend = dulwich.server.DictBackend(
{
"/" + namespaced_name: repo
for namespaced_name, repo in app.valid_repos.items()
}
)
Then add an optional named parameter so we can pass a constructor that determines how to handle the first argument to Klaus()
, dependency-injection style. load_repos()
would be moved to a default implementation for that.
One thing I didn't get though, is why invalid_repos
exists, it seems like it could be an information leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented: Klaus
now has a repo_container_factory
(apologies for the Java-esque naming) that receives the repo_paths
parameter as its argument and returns a repository container. Such a container implements the python dict
interface and additionally has an invalid
property to store any invalid repositories (exposed as invalid_repos
on Klaus
).
klaus/contrib/auto_klaus.py
Outdated
|
||
repos = self._repos | ||
path = self._root / name | ||
if not os.path.exists(path / 'git-daemon-export-ok'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about non-bare repositories?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we could make the path to check configurable, so you can configure it to be .git/git-daemon-export-ok
or even something completely different if you don't care about gitweb conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking for git-daemon-export-ok is quite specific to being compatible with gitweb, and a change from the behaviour that currently exists in klaus. For example, most of my repositories that are visible in klaus don't have a git-daemon-export-ok file and wouldn't work with this script. Perhaps it would make sense to call that out more explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's kind of annoying that we have multiple ways to deal with autodiscovery/autoreloading. It long overdue someone contributes a well thought out solution into Klaus proper.
I wonder if we should make this "flag file" approach part of the existing autoreloading code? To me it seems the only user-visible difference between this implementation and the existing one the git-daemon-export-ok
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The path is now configurable and there's also a way to tell it to simply consider all directories as valid.
c30a04a
to
c968f61
Compare
This is a great idea. I'm going to close my 5 year old inotify PR that I was one day going to get round to in favour of this one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested this on my server and it's performing well modulo the one issue described above. I like the git-daemon-export-ok
idea too as I'd otherwise create a public
directory and a bunch of symlinks which I've never entirely conveniced by.
klaus/contrib/auto_klaus.py
Outdated
|
||
repos = self._repos | ||
path = self._root / name | ||
if not os.path.exists(path / 'git-daemon-export-ok'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a bug here WRT .git
suffix handling
If I have a repo named repo.git
with repo.git/git-daemon-export-ok
, then it's properly listed on the index page, but the link generated resolves to /repo
which then 404s because ./repo/git-daemon-export-ok
doesn't exist. klaus.contrib.autoreload
exhibits the same suffix removal, but correctly resolves the shortened name.
I'm not sure what the best approach to resolve this is, but I think this mapping might need a minor rethink
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to add functionality to deal with this situation. The KLAUS_DIRECTORY_SUFFIXES
environment variable can now contain any number of potential directory suffixes (no suffix and .git.
by default).
- Add wsgi_autodetect(ing).py with usage like the existing wsgi_autoreload(ing).py scripts. - Can handle directories with and without .git suffix. - Factor out the repository container functionality from the Klaus object into its own class hierarchy (RepoContainer). - Certain aspects of the automatic detection are configurable (specifically, the path that determines whether a subdirectory is a valid repo, whether it should detect removed repos and what are acceptable suffixes).
for more information, see https://pre-commit.ci
Alternative take on the "automatically discovered repositories" concept that requires no threads, polling or inotify. Instead the filesystem is consulted whenever a repository name is looked up.
Since
os.path.exists()
andos.listdir()
are fairly quick filesystem operations, performance should be good for small to medium sites.FancyRepo()
objects are cached.Repositories are identified (by default) by the existence of a
<reponame>/git-daemon-export-ok
file (for compatibility with gitweb).Usage is similar to
wsgi_autoreload.py
:uwsgi -w klaus.contrib.wsgi_autodetect \ --env KLAUS_SITE_NAME="Klaus Demo" \ --env KLAUS_REPOS_ROOT=/path/to/repos \ ...
To configure certain aspects you can add these flags:
Setting
KLAUS_DETECT_REMOVALS
to 0 will improve repository listing performance at the cost of no longer being able to detect if a repository was removed.You can set
KLAUS_EXPORT_OK_PATH
to.
if you do not care about gitweb conventions.