Skip to content

Commit

Permalink
stats-gatherer: add --hostname/--location/--port
Browse files Browse the repository at this point in the history
Updates docs, tests, explains how to update an old gatherer.
  • Loading branch information
warner committed May 5, 2016
1 parent d1d9884 commit 93bb3e9
Show file tree
Hide file tree
Showing 6 changed files with 158 additions and 42 deletions.
40 changes: 23 additions & 17 deletions docs/stats.rst
Original file line number Diff line number Diff line change
Expand Up @@ -279,11 +279,12 @@ boxes, as long as the stats-gatherer has a reachable IP address.)

The stats-gatherer is created in the same fashion as regular tahoe client
nodes and introducer nodes. Choose a base directory for the gatherer to live
in (but do not create the directory). Then run:
in (but do not create the directory). Choose the hostname that should be
advertised in the gatherer's FURL. Then run:

::

tahoe create-stats-gatherer $BASEDIR
tahoe create-stats-gatherer --hostname=HOSTNAME $BASEDIR

and start it with "tahoe start $BASEDIR". Once running, the gatherer will
write a FURL into $BASEDIR/stats_gatherer.furl .
Expand All @@ -295,19 +296,23 @@ under a key named "stats_gatherer.furl", like so:
::

[client]
stats_gatherer.furl = pb://qbo4ktl667zmtiuou6lwbjryli2brv6t@192.168.0.8:49997/wxycb4kaexzskubjnauxeoptympyf45y
stats_gatherer.furl = pb://qbo4ktl667zmtiuou6lwbjryli2brv6t@HOSTNAME:PORTNUM/wxycb4kaexzskubjnauxeoptympyf45y

or simply copy the stats_gatherer.furl file into the node's base directory
(next to the tahoe.cfg file): it will be interpreted in the same way.

The first time it is started, the gatherer will listen on a random unused TCP
port, so it should not conflict with anything else that you have running on
that host at that time. On subsequent runs, it will re-use the same port (to
keep its FURL consistent). To explicitly control which port it uses, write
the desired portnumber into a file named "portnum" (i.e. $BASEDIR/portnum),
and the next time the gatherer is started, it will start listening on the
given port. The portnum file is actually a "strports specification string",
as described in :doc:`configuration`.
When the gatherer is created, it will allocate a random unused TCP port, so
it should not conflict with anything else that you have running on that host
at that time. To explicitly control which port it uses, run the creation
command with ``--location=`` and ``--port=`` instead of ``--hostname=``. If
you use a hostname of ``example.org`` and a port number of ``1234``, then
run::

tahoe create-stats-gatherer --location=tcp:example.org:1234 --port=tcp:1234

``--location=`` is a Foolscap FURL hints string (so it can be a
comma-separated list of connection hints), and ``--port=`` is a Twisted
"server endpoint specification string", as described in :doc:`configuration`.

Once running, the stats gatherer will create a standard JSON file in
``$BASEDIR/stats.json``. Once a minute, the gatherer will pull stats
Expand All @@ -322,16 +327,17 @@ Other tools can be built to examine these stats and render them into
something useful. For example, a tool could sum the
"storage_server.disk_avail' values from all servers to compute a
total-disk-available number for the entire grid (however, the "disk watcher"
daemon, in misc/operations_helpers/spacetime/, is better suited for this specific task).
daemon, in misc/operations_helpers/spacetime/, is better suited for this
specific task).

Using Munin To Graph Stats Values
=================================

The misc/munin/ directory contains various plugins to graph stats for Tahoe
nodes. They are intended for use with the Munin_ system-management tool, which
typically polls target systems every 5 minutes and produces a web page with
graphs of various things over multiple time scales (last hour, last month,
last year).
The misc/operations_helpers/munin/ directory contains various plugins to
graph stats for Tahoe nodes. They are intended for use with the Munin_
system-management tool, which typically polls target systems every 5 minutes
and produces a web page with graphs of various things over multiple time
scales (last hour, last month, last year).

Most of the plugins are designed to pull stats from a single Tahoe node, and
are configured with the e.g. http://localhost:3456/statistics?t=json URL. The
Expand Down
58 changes: 56 additions & 2 deletions src/allmydata/scripts/stats_gatherer.py
Original file line number Diff line number Diff line change
@@ -1,14 +1,59 @@

import os, sys

from twisted.python import usage
from allmydata.scripts.common import NoDefaultBasedirOptions
from allmydata.scripts.create_node import write_tac
from allmydata.util.assertutil import precondition
from allmydata.util.encodingutil import listdir_unicode, quote_output
from allmydata.util import fileutil, iputil


class CreateStatsGathererOptions(NoDefaultBasedirOptions):
subcommand_name = "create-stats-gatherer"
optParameters = [
("hostname", None, None, "Hostname of this machine, used to build location"),
("location", None, None, "FURL connection hints, e.g. 'tcp:HOSTNAME:PORT'"),
("port", None, None, "listening endpoint, e.g. 'tcp:PORT'"),
]
def postOptions(self):
if self["hostname"] and (not self["location"]) and (not self["port"]):
pass
elif (not self["hostname"]) and self["location"] and self["port"]:
pass
else:
raise usage.UsageError("You must provide --hostname, or --location and --port.")

description = """
Create a "stats-gatherer" service, which is a standalone process that
collects and stores runtime statistics from many server nodes. This is a
tool for operations personnel to keep track of free disk space, server
load, and protocol activity, across a fleet of Tahoe storage servers.
The "stats-gatherer" listens on a TCP port and publishes a Foolscap FURL
by writing it into a file named "stats_gatherer.furl". You must copy this
FURL into the servers' tahoe.cfg, as the [client] stats_gatherer.furl=
entry. Those servers will then establish a connection to the
stats-gatherer and publish their statistics on a periodic basis. The
gatherer writes a summary JSON file out to disk after each update.
The stats-gatherer listens on a configurable port, and writes a
configurable hostname+port pair into the FURL that it publishes. There
are two configuration modes you can use.
* In the first, you provide --hostname=, and the service chooses its own
TCP port number. If the host is named "example.org" and you provide
--hostname=example.org, the node will pick a port number (e.g. 12345)
and use location="tcp:example.org:12345" and port="tcp:12345".
* In the second, you provide both --location= and --port=, and the
service will refrain from doing any allocation of its own. --location=
must be a Foolscap "FURL connection hint sequence", which is a
comma-separated list of "tcp:HOSTNAME:PORTNUM" strings. --port= must be
a Twisted server endpoint specification, which is generally
"tcp:PORTNUM". So, if your host is named "example.org" and you want to
use port 6789, you should provide --location=tcp:example.org:6789 and
--port=tcp:6789. You are responsible for making sure --location= and
--port= match each other.
"""


def create_stats_gatherer(config, out=sys.stdout, err=sys.stderr):
Expand All @@ -26,6 +71,15 @@ def create_stats_gatherer(config, out=sys.stdout, err=sys.stderr):
else:
os.mkdir(basedir)
write_tac(basedir, "stats-gatherer")
if config["hostname"]:
portnum = iputil.allocate_tcp_port()
location = "tcp:%s:%d" % (config["hostname"], portnum)
port = "tcp:%d" % portnum
else:
location = config["location"]
port = config["port"]
fileutil.write(os.path.join(basedir, "location"), location+"\n")
fileutil.write(os.path.join(basedir, "port"), port+"\n")
return 0

subCommands = [
Expand Down
29 changes: 12 additions & 17 deletions src/allmydata/stats.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
from zope.interface import implements
from foolscap.api import eventually, DeadReferenceError, Referenceable, Tub

from allmydata.util import log, fileutil
from allmydata.util import log
from allmydata.util.encodingutil import quote_local_unicode_path
from allmydata.interfaces import RIStatsProvider, RIStatsGatherer, IStatsProducer

Expand Down Expand Up @@ -294,24 +294,19 @@ def __init__(self, basedir=".", verbose=False):
self.stats_gatherer = JSONStatsGatherer(self.basedir, verbose)
self.stats_gatherer.setServiceParent(self)

portnumfile = os.path.join(self.basedir, "portnum")
try:
portnum = open(portnumfile, "r").read()
with open(os.path.join(self.basedir, "location")) as f:
location = f.read().strip()
except EnvironmentError:
portnum = None
self.listener = self.tub.listenOn(portnum or "tcp:0")
d = self.tub.setLocationAutomatically()
if portnum is None:
d.addCallback(self.save_portnum)
d.addCallback(self.tub_ready)
d.addErrback(log.err)

def save_portnum(self, junk):
portnum = self.listener.getPortnum()
portnumfile = os.path.join(self.basedir, 'portnum')
fileutil.write(portnumfile, '%d\n' % (portnum,))

def tub_ready(self, ignored):
raise ValueError("Unable to find 'location' in BASEDIR, please rebuild your stats-gatherer")
try:
with open(os.path.join(self.basedir, "port")) as f:
port = f.read().strip()
except EnvironmentError:
raise ValueError("Unable to find 'port' in BASEDIR, please rebuild your stats-gatherer")

self.tub.listenOn(port)
self.tub.setLocation(location)
ff = os.path.join(self.basedir, self.furl_file)
self.gatherer_furl = self.tub.registerReference(self.stats_gatherer,
furlFile=ff)
5 changes: 5 additions & 0 deletions src/allmydata/test/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -494,6 +494,11 @@ def _get_introducer_web(self):
def _set_up_stats_gatherer(self, res):
statsdir = self.getdir("stats_gatherer")
fileutil.make_dirs(statsdir)
portnum = iputil.allocate_tcp_port()
location = "tcp:127.0.0.1:%d" % portnum
fileutil.write(os.path.join(statsdir, "location"), location)
port = "tcp:%d:interface=127.0.0.1" % portnum
fileutil.write(os.path.join(statsdir, "port"), port)
self.stats_gatherer_svc = StatsGathererService(statsdir)
self.stats_gatherer = self.stats_gatherer_svc.stats_gatherer
self.add_service(self.stats_gatherer_svc)
Expand Down
50 changes: 44 additions & 6 deletions src/allmydata/test/test_runner.py
Original file line number Diff line number Diff line change
Expand Up @@ -184,14 +184,14 @@ def run_tahoe(self, argv):
rc = runner.runner(argv, stdout=out, stderr=err)
return rc, out.getvalue(), err.getvalue()

def do_create(self, kind):
def do_create(self, kind, *args):
basedir = self.workdir("test_" + kind)
command = "create-" + kind
is_client = kind in ("node", "client")
tac = is_client and "tahoe-client.tac" or ("tahoe-" + kind + ".tac")

n1 = os.path.join(basedir, command + "-n1")
argv = ["--quiet", command, "--basedir", n1]
argv = ["--quiet", command, "--basedir", n1] + list(args)
rc, out, err = self.run_tahoe(argv)
self.failUnlessEqual(err, "")
self.failUnlessEqual(out, "")
Expand Down Expand Up @@ -226,7 +226,7 @@ def do_create(self, kind):

# test that the non --basedir form works too
n2 = os.path.join(basedir, command + "-n2")
argv = ["--quiet", command, n2]
argv = ["--quiet", command] + list(args) + [n2]
rc, out, err = self.run_tahoe(argv)
self.failUnlessEqual(err, "")
self.failUnlessEqual(out, "")
Expand All @@ -236,7 +236,7 @@ def do_create(self, kind):

# test the --node-directory form
n3 = os.path.join(basedir, command + "-n3")
argv = ["--quiet", "--node-directory", n3, command]
argv = ["--quiet", "--node-directory", n3, command] + list(args)
rc, out, err = self.run_tahoe(argv)
self.failUnlessEqual(err, "")
self.failUnlessEqual(out, "")
Expand All @@ -247,7 +247,7 @@ def do_create(self, kind):
if kind in ("client", "node", "introducer"):
# test that the output (without --quiet) includes the base directory
n4 = os.path.join(basedir, command + "-n4")
argv = [command, n4]
argv = [command] + list(args) + [n4]
rc, out, err = self.run_tahoe(argv)
self.failUnlessEqual(err, "")
self.failUnlessIn(" created in ", out)
Expand Down Expand Up @@ -282,7 +282,7 @@ def test_introducer(self):
self.do_create("introducer")

def test_stats_gatherer(self):
self.do_create("stats-gatherer")
self.do_create("stats-gatherer", "--hostname=127.0.0.1")

def test_subcommands(self):
# no arguments should trigger a command listing, via UsageError
Expand All @@ -291,6 +291,44 @@ def test_subcommands(self):
[],
run_by_human=False)

def test_stats_gatherer_good_args(self):
rc = runner.runner(["create-stats-gatherer", "--hostname=foo",
self.mktemp()])
self.assertEqual(rc, 0)
rc = runner.runner(["create-stats-gatherer", "--location=tcp:foo:1234",
"--port=tcp:1234", self.mktemp()])
self.assertEqual(rc, 0)

def test_stats_gatherer_bad_args(self):
# missing hostname/location/port
argv = "create-stats-gatherer D"
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
run_by_human=False)

# missing port
argv = "create-stats-gatherer --location=foo D"
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
run_by_human=False)

# missing location
argv = "create-stats-gatherer --port=foo D"
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
run_by_human=False)

# can't provide both
argv = "create-stats-gatherer --hostname=foo --port=foo D"
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
run_by_human=False)

# can't provide both
argv = "create-stats-gatherer --hostname=foo --location=foo D"
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
run_by_human=False)

# can't provide all three
argv = "create-stats-gatherer --hostname=foo --location=foo --port=foo D"
self.assertRaises(usage.UsageError, runner.runner, argv.split(),
run_by_human=False)

class RunNode(common_util.SignalMixin, unittest.TestCase, pollmixin.PollMixin,
RunBinTahoeMixin):
Expand Down
18 changes: 18 additions & 0 deletions topfiles/2773.docs
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
The "stats-gatherer", an operation-helper service used to collect runtime
statistics from a fleet of Tahoe storage servers, must now be assigned a
hostname, or location+port pair, at creation time. The "tahoe
create-stats-gatherer" command now requires either "--hostname=", or both
"--location=" and "--port".

Previously, "tahoe create-stats-gatherer NODEDIR" would attempt to guess its
location by running something like /sbin/ifconfig to collect local IP
addresses. While this works if the host has a public IP address (or at least
lives in the same LAN as the storage servers it monitors), most sysadmins
would prefer the FURL be created with a real hostname.

To keep your old stats-gatherers working, with their original FURL, you must
determine a suitable --location and --port, and write their values into
NODEDIR/location and NODEDIR/port, respectively. Or you could simply rebuild
it by re-running "tahoe create-stats-gatherer" with the new arguments.

See docs/stats.rst for details.

0 comments on commit 93bb3e9

Please sign in to comment.