This is a Docker image, intended to run in a Kubernetes cluster, that ingests Postgres logs from CloudSQL and sends them to Honeycomb.
The basic infrastructure uses Cloud Pub/Sub to carry log messages from CloudSQL to this program, which forwards them to Honeycomb.
That is:
CloudSQL -> Stackdriver Logs -> Cloud Pub/Sub -> cloudsqltail (this) -> honeytail -> honeycomb
The default command of the provided Docker image runs a pipeline of `cloudsqltail` and `honeytail`, reading from a specified Pub/Sub Subscription and writing the resulting events into Honeycomb.
`cloudsqltail` is a small Go program that reads log messages from a Pub/Sub Subscription, attempts to order them correctly, and writes them to STDOUT in a format consumable by `honeytail`. The two details to note here are the reordering and the reformatting.
Reordering: messages in a Subscription are not guaranteed to be delivered in any order, which is problematic because Postgres query logs can span multiple lines and honeytail expects a strict ordering of log lines to be able to make sense of the queries. `cloudsqltail` makes a best-effort attempt to order the messages it pulls from the Subscription by buffering them in memory for some amount of time, sorting them by their nanosecond-resolution timestamp, and then outputting the sorted list. This is not perfect, but it is generally good enough that the noise from mangled or missing queries becomes statistically insignificant. See the `-flush-interval` flag of `cloudsqltail` for fine-tuning the time between buffer sorts/flushes.
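For example, a longer interval tolerates more out-of-order delivery at the cost of extra memory and delay. A minimal sketch of an invocation: `-flush-interval` is the flag documented above, but the project and subscription flag names (and the Go-style duration value) are assumptions, so check `cloudsqltail -h` for the exact names in your build.

```sh
# Sketch only: -flush-interval is documented above; -project and -subscription
# are assumed flag names -- confirm with `cloudsqltail -h`.
cloudsqltail \
  -project my-gcp-project \
  -subscription postgres-logs-sub \
  -flush-interval 30s
```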
Reformatting: `honeytail` requires a timestamp for each logged query for accurate event times, which is normally accomplished by modifying the Postgres `log_line_prefix` configuration to add a timestamp. However, CloudSQL does not allow configuration of this parameter. Instead, `cloudsqltail` will prepend the timestamp associated with the Pub/Sub message to the appropriate Postgres log lines, just as if it had been added by Postgres.
You'll want to turn on query logging. Assuming no custom database flags are set on your CloudSQL instance, you can run:

`gcloud sql instances patch <instance_name> --database-flags log_min_duration_statement=0`

Caveat 1: this requires restarting your CloudSQL instance, which gcloud will do for you.

To see which flags are already set, run `gcloud sql instances describe <instance_name>` and look for the top-level key `databaseFlags`. (You'll want to include any existing flags in your `gcloud sql instances patch` command.)
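A sketch of what preserving existing flags might look like, assuming a hypothetical instance named `my-instance` that already had `log_checkpoints=on` set:

```sh
# See which database flags are already set on the instance.
gcloud sql instances describe my-instance

# The patch must list every flag you want to keep, plus the new one;
# log_checkpoints=on here stands in for a pre-existing flag.
gcloud sql instances patch my-instance \
  --database-flags log_min_duration_statement=0,log_checkpoints=on
```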
Ingestion is done via Cloud Pub/Sub.
To use this, set up a Stackdriver sink that routes your Postgres logs to a Pub/Sub topic. You will also need to create a subscription to that topic. Note the subscription name you provide, as it is needed to configure `cloudsqltail`.
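A rough sketch of that setup with `gcloud`; all names are placeholders, and the log filter in particular is an assumption, so verify the exact resource type and `logName` for your instance in the Logs Explorer before creating the sink.

```sh
# Create the topic and a subscription for cloudsqltail to pull from.
gcloud pubsub topics create postgres-logs
gcloud pubsub subscriptions create postgres-logs-sub --topic postgres-logs

# Route CloudSQL Postgres logs into the topic. The filter is an assumption --
# check the Logs Explorer for the exact logName of your instance's postgres log.
gcloud logging sinks create postgres-to-pubsub \
  pubsub.googleapis.com/projects/my-gcp-project/topics/postgres-logs \
  --log-filter='resource.type="cloudsql_database" AND logName:"postgres.log"'

# The sink's writer identity (shown by `gcloud logging sinks describe`) must be
# granted permission to publish to the topic.
```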
docker build -t postgres-honeytail .
Then push to your preferred docker registry.
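For example, to push to Google Container Registry (the registry path is a placeholder):

```sh
# Tag and push the image; gcr.io/my-gcp-project is a placeholder registry path.
docker tag postgres-honeytail gcr.io/my-gcp-project/postgres-honeytail:latest
docker push gcr.io/my-gcp-project/postgres-honeytail:latest
```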
By default, the container will run a wrapper script, `run.sh`, which takes the output of `cloudsqltail` and sends it along to Honeycomb via `honeytail`.
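Conceptually, the pipeline looks something like the sketch below, using the environment variables described next. This is an illustration, not the literal contents of `run.sh`: `cloudsqltail`'s flag names are assumptions, and `honeytail` is shown with its standard `--parser`/`--writekey`/`--dataset`/`--file` options.

```sh
# Illustration only -- not the literal contents of run.sh.
# cloudsqltail flag names are assumptions; honeytail reads the piped
# output from STDIN via --file=-.
cloudsqltail -project "$PROJECT_ID" -subscription "$SUBSCRIPTION_NAME" |
  honeytail \
    --parser=postgresql \
    --writekey="$HONEYCOMB_WRITEKEY" \
    --dataset="$DATASET" \
    --file=-
```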
The container is configured with the following environment variables:

- `GOOGLE_APPLICATION_CREDENTIALS_JSON`: used to authenticate to Google Cloud; the service account must have read access to the Pub/Sub subscription.
- `PROJECT_ID`: the Google Cloud project the Pub/Sub subscription is in.
- `SUBSCRIPTION_NAME`: the Pub/Sub subscription name.
- `HONEYCOMB_WRITEKEY`: the Honeycomb write key (not required if `DEBUG` is set; see below).
- `DEBUG`: runs `honeytail` with the flags `--debug` (setting the log level to DEBUG) and `--debug_stdout` (writing events to stdout instead of sending them to Honeycomb). The latter flag also means that `HONEYCOMB_WRITEKEY` is not verified, so it can be left unset.
- `DATASET`: the Honeycomb dataset to write to; defaults to `postgres`.
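Putting it together, a sketch of running the pushed image; the values and image name are placeholders, and `DEBUG=1` assumes any non-empty value enables debug mode.

```sh
# Sketch: run the image with the environment variables described above.
# All values and the image name are placeholders.
docker run --rm \
  -e GOOGLE_APPLICATION_CREDENTIALS_JSON="$(cat service-account.json)" \
  -e PROJECT_ID=my-gcp-project \
  -e SUBSCRIPTION_NAME=postgres-logs-sub \
  -e HONEYCOMB_WRITEKEY=YOUR_WRITEKEY \
  -e DATASET=postgres \
  postgres-honeytail

# Add -e DEBUG=1 (assuming any non-empty value works) to log at DEBUG level and
# write events to stdout instead of sending them to Honeycomb.
```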
The `cloudsqltail` command uses Go modules to track its dependencies, so you'll need Go 1.11 or higher, with module mode enabled via `GO111MODULE=on`. With Go >= 1.12, `make` should work out of the box with no additional configuration necessary.
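For example:

```sh
# Go 1.11 requires module mode to be enabled explicitly.
GO111MODULE=on go build ./...

# With Go 1.12 or newer, make should work out of the box (per the note above).
make
```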
`cloudsqltail` needs Google Cloud access credentials (to fetch messages from the Pub/Sub Subscription). To provide these locally, run docker with
`-v $HOME/.config/gcloud:/root/.config/gcloud`
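For instance, a local-development sketch (the image name, project, and subscription are placeholders, and `DEBUG=1` assumes any non-empty value enables debug mode):

```sh
# Local development sketch: mount your local gcloud config into the container
# rather than passing a service-account JSON.
docker run --rm \
  -v "$HOME/.config/gcloud:/root/.config/gcloud" \
  -e PROJECT_ID=my-gcp-project \
  -e SUBSCRIPTION_NAME=postgres-logs-sub \
  -e DEBUG=1 \
  postgres-honeytail
```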