This repository implements a martingale approach to detect changes in Telemetry histograms. Some background on using martingales for testing exchangeability online can be found here.
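For orientation, the core idea can be sketched in a few lines. The snippet below is a minimal illustration of a power martingale, one common construction in the exchangeability-testing literature; it is not necessarily the exact variant this repository implements. The function names, the `epsilon` parameter and the example p-values are assumptions made for illustration. Each observation is turned into a p-value: while the data stay exchangeable the p-values look uniform and the martingale stays small, whereas a run of small p-values makes it grow until it crosses a threshold.

```python
def power_martingale(p_values, epsilon=0.92):
    """Yield the running product M_n = prod(epsilon * p_i ** (epsilon - 1)).

    `epsilon` must lie in (0, 1); 0.92 is an illustrative choice,
    not a value taken from this tool.
    """
    value = 1.0
    for p in p_values:
        p = max(p, 1e-12)  # guard against a zero p-value
        value *= epsilon * p ** (epsilon - 1.0)
        yield value


def first_detection(p_values, threshold=20.0, epsilon=0.92):
    """Return the index of the first p-value at which the martingale
    exceeds `threshold`, or None if it never does."""
    for index, value in enumerate(power_martingale(p_values, epsilon)):
        if value > threshold:
            return index
    return None


# Hypothetical p-values: unremarkable at first, then consistently small.
example = [0.5] * 3 + [0.01] * 30
print(first_detection(example))
```

Under exchangeability, a nonnegative martingale that starts at 1 exceeds a threshold of 20 with probability at most 1/20 (Ville's inequality), which is one way to interpret a detection threshold of 20.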
There are two ways to execute the tool. The first is running it from an Anaconda terminal. This is the easiest way, as the Anaconda environment comes with all the required dependencies pre-installed.
python detector\main.py --from 20160920 --to 20161019 --datadir histograms --enable-plots
If running from a different environment, the dependencies can be installed with pip (the project is packaged with setuptools)
by running the following command from the project root directory:
pip install .
The script can then be executed as shown above.
The following is the list of supported command line options for the current version:
--fromdate, the beginning of the date range to consider for analysis, in YYYYMMDD format.
--todate, the end of the date range to consider for analysis, in YYYYMMDD format.
--datadir, the directory that contains the histogram data.
--outdir, the directory that will contain the detection data. Defaults to detections.
--strangeness, the strangeness measure to use for detection. Supported values are cluster, hellinger, bhattacharyya and cosine (the default).
--enable-plots, enable plotting the discovered anomalies to the output directory.
--threshold, the threshold to use for the detection. Defaults to 20.
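To give a feel for what a strangeness measure computes, the sketch below implements the textbook cosine, Hellinger and Bhattacharyya distances between two histograms given as lists of bucket counts. These are the standard definitions, written here under the assumption that the tool's options of the same names are based on them; the actual implementation may normalize or compare histograms differently. The function names are hypothetical, and the cluster measure is not covered.

```python
import math


def _normalize(hist):
    """Scale bucket counts so that they sum to 1."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must have a positive total count")
    return [count / total for count in hist]


def cosine_distance(a, b):
    """1 - cosine similarity of the two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("histograms must not be all zeros")
    return 1.0 - dot / norm


def bhattacharyya_coefficient(a, b):
    """Overlap between the normalized histograms, in [0, 1]."""
    p, q = _normalize(a), _normalize(b)
    return sum(math.sqrt(x * y) for x, y in zip(p, q))


def hellinger_distance(a, b):
    """sqrt(1 - BC); 0 for identical shapes, 1 for disjoint ones."""
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(a, b)))


def bhattacharyya_distance(a, b):
    """-ln(BC); infinite when the histograms do not overlap."""
    bc = bhattacharyya_coefficient(a, b)
    return math.inf if bc == 0 else -math.log(bc)
```

All three measures return a value near 0 when two histograms have the same shape and grow as the shapes diverge, which is the property a strangeness score needs.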
Fetching Telemetry histogram data can be done by using the export script from cerberus, the system Mozilla currently uses to perform anomaly detection on time-series of histogram data.
In order to do that, we start by fetching the histogram/measurement definition from the Firefox repository using wget:
wget https://raw.githubusercontent.com/mozilla/gecko-dev/master/toolkit/components/telemetry/Histograms.json -O Histograms.json
Then we execute the script that fetches the time-series of histograms for the 3 most recent Firefox builds. Please note that the script must be run from the same directory as the Histograms.json file.
nodejs export.js
This will take a while to run and will output the histogram data as JSON files under the histograms directory.
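As a quick sanity check of the export, the JSON files can be loaded from Python. The helper below is a hypothetical sketch that assumes only that the directory contains `.json` files; it makes no assumption about their internal layout, which is defined by the export script.

```python
import json
from pathlib import Path


def load_histogram_files(datadir="histograms"):
    """Return a dict mapping each JSON file name (without extension)
    to its parsed content. Non-JSON files are ignored."""
    data = {}
    for path in sorted(Path(datadir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data[path.stem] = json.load(handle)
    return data
```

Calling `load_histogram_files("histograms")` and printing `len(...)` of the result shows how many files were exported.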
A sample archive for the histogram data aggregated in the period from the 20th of September 2016 to the 19th of October 2016 can be downloaded from here. This is the same data we have used to test the system.