updated readme, almost done. Need to figure out galaxy integration (again)
DrPsychick committed Sep 7, 2018
1 parent cfd9b68 commit 6a120c0
Showing 7 changed files with 92 additions and 49 deletions.
10 changes: 5 additions & 5 deletions .travis.yml
@@ -1,5 +1,5 @@
---
services:
- docker
language: python
python: "2.7"
@@ -20,7 +20,7 @@ before_install:
install:
# Install ansible and jq.
- "pip install ansible jq"

# Add ansible.cfg to pick up roles path.
- printf "[defaults]\nroles_path = ../" > ansible.cfg

@@ -35,16 +35,16 @@ script:

# Run the role/playbook again, checking to make sure it's idempotent.
- >
ansible-playbook -i tests/inventory tests/test.yml --connection=local --sudo
| grep -q 'changed=0.*failed=0'
&& (echo 'Idempotence test: pass' && exit 0)
|| (echo 'Idempotence test: fail' && exit 1);
# TEST = migration
- >
if [ "$TEST" = "migration" ]; then
if [ "$TEST" = "migration" ]; then
curl -s http://localhost:8086/query?db=test_agg --data-urlencode "q=SELECT * FROM test_agg.rp_7d.test";
result=$(curl -s http://localhost:8086/query?db=test_agg --data-urlencode "q=SELECT MEAN(value) FROM test_agg.rp_7d.test" | jq .results[0].series[0].values[0][1]);
[ "$result" -ne 35 ] && (echo "Aggregation test failed: '$result' != 35"; exit 1) || (echo "Aggregation test: pass"; exit 0);
fi
99 changes: 74 additions & 25 deletions README.md
@@ -3,45 +3,94 @@
Configure InfluxDB for downsampling
===================================

Motivation:
InfluxDB uses a default retention policy that keeps data **forever** in 7-day shards - in RAW format (data points every 10 or 30 seconds, depending on your input configuration).
Of course this is a good default, but once you have old data and want to introduce downsampling without losing data, it's a **lot** of manual work to set up all the queries etc.

So ... I have done this for you!
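To give you an idea of that manual work: for every database you would otherwise hand-write retention policies and continuous queries like the following. This is a minimal InfluxQL sketch with made-up database, policy and measurement names, but it is roughly the kind of statements the role issues for you:

```
-- keep 14 days of raw data in the default retention policy
CREATE DATABASE "telegraf" WITH DURATION 14d NAME "rp_14d"

-- keep 5-minute aggregates for a year in a second retention policy
CREATE RETENTION POLICY "rp_1y" ON "telegraf" DURATION 52w REPLICATION 1

-- continuously downsample new raw data into the coarser policy
CREATE CONTINUOUS QUERY "cq_cpu_5m" ON "telegraf" BEGIN
  SELECT mean("usage_user") AS "usage_user"
  INTO "telegraf"."rp_1y"."cpu"
  FROM "telegraf"."rp_14d"."cpu"
  GROUP BY time(5m), *
END
```

Now imagine writing that for every measurement and every aggregation level - and backfilling the old data on top.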

Two usage scenarios:
* You already have an InfluxDB running and it's getting BIG, so you want to introduce downsampling on-the-fly to make things faster and cheaper
* You intend to use InfluxDB and want to set it up with downsampling in mind (so it does not grow big over time in the first place)

Honestly, the two use cases are not much different. The biggest difference is the time it takes to run through the playbook when you enable backfilling. Of course, if you work on existing data, don't forget to have a proper backup!

Preparation
-----------
As preparation you don't need much, except knowing exactly how you want to downsample your data, as you need to set up your configuration first.

Setup
-----

The easiest setup is to create a role in your own repository and add this:
* Decide on the name of the setup, let's call the role "" and the setup "frank"
* *hint:* you can have any number of setups configured in this role. You just always have to load **your** role first (defining the setup) and then **DrPsychick.ansible-influx-downsampling** for each setup.

`tasks/main.yml`
```
---
- name: "Include definition from influxdb_{{vars_name}}.yml"
  include_vars: influxdb_{{vars_name}}.yml
  when: vars_name is defined
```

`vars/influxdb_frank.yml`
Take one from the [examples/](examples/) directory as a base for your own.
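For orientation, a setup file roughly follows this shape. This is a hypothetical minimal sketch; key names like `retention_policy`, `backfill` and `measurements` are taken from `defaults/main.yml` and the tasks - check the examples for the full, authoritative structure:

```
---
ansible_influx_databases:
  telegraf_14d:
    name: "telegraf_14d"
    retention_policy: { amount: 14, unit: "d" }
    source: { name: "telegraf" }
    backfill: { step: 1 }
    measurements: { cpu }
```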

Now in your playbook, include both roles:
```
- name: InfluxDB
  hosts: localhost
  roles:
    - { role: , vars_name: "frank" }
    - { role: DrPsychick.ansible-influx-downsampling }
```
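If you maintain several setups, repeat the pair of roles per setup, e.g. ("sandra" being a second made-up setup name, with the same placeholder for your own role name as above):

```
- name: InfluxDB
  hosts: localhost
  roles:
    - { role: , vars_name: "frank" }
    - { role: DrPsychick.ansible-influx-downsampling }
    - { role: , vars_name: "sandra" }
    - { role: DrPsychick.ansible-influx-downsampling }
```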


Attention
=========
If you enable **backfill**:
* Check the size of your data first. Depending on the number of series in a measurement, you need to configure the time range for backfilling. A good default is "1d".
* Timeouts: your InfluxDB as well as the calls in this playbook may time out! Or you may hit other limits in influxdb.conf.

My settings for backfilling 9 GB of data over 5 aggregation levels on a Docker container with 3 GB of RAM (no CPU limit for backfilling):
* `ansible_influx_databases`, 5 levels: 14d@1m, 30d@5m, 90d@15m, 1y@1h, 3y@3h
* `ansible_influx_timeout`: 600 (10 minutes)
* influxdb.conf: `query-timeout="600s", max-select-point=200000000, max-select-series=1000000, log-queries-after="10s"`
* duration:

My full setup can be found in [examples/full-5level-backfill-compact/](examples/full-5level-backfill-compact/)
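For reference, the query limits quoted above belong in the `[coordinator]` section of `influxdb.conf` (InfluxDB 1.x; these are my values, tune them to your data volume):

```
[coordinator]
  query-timeout = "600s"
  log-queries-after = "10s"
  max-select-point = 200000000
  max-select-series = 1000000
```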

History
=======

Version 0.3:

* [ ] full readme -> docs
* [ ] multiple examples -> docs/example
* [ ] more tests:
  * [ ] run backfill without CQ and switch RP on existing data (compact/evict old data)
  * [ ] run backfill without CQ during operation (configurable timing of input) and switch RP
* [ ] howto switch retention policy (cleanup after all is setup)
  * [ ] Case: copy from "autogen", no CQ, drop source after backfill + set default RP -> see test
* [ ] shift RPs by "spread" seconds: 60+/-5sec EVERY 5m+-1s,2s,3s,... + step in seconds
* [ ] add RP shard duration option

Version 0.2:

* [ ] Update description + basic readme
* [ ] Check variables upfront (define clear dependencies) and print useful error messages before acting
* [x] fix: continuous_query is required even if empty (bad usability)
* [ ] more tests:
  * [x] test parallel tests
  * [x] prepare seeding (generator or file?)
  * [x] run downsampling + backfill on existing DB (needs seed)
  * [ ] run backfill with step X (on RP with 7d)
* [x] set RP default yes/no
* [x] improve/extend dict structure (BC break!)
* [x] update continuous queries (drop+create)
* [x] stats (total data points written per DB / average downsampling ratio)
* [x] support selective group by in backfill and continuous query

Version 0.1:

4 changes: 2 additions & 2 deletions defaults/main.yml
@@ -17,11 +17,11 @@ ansible_influx_databases:
measurements: { cpu }

# Predefined set of queries for standard telegraf inputs
# You can selectively overwrite these with the variable "my_ansible_influx_queries"
# Use the same structure as below.

# Attention!
# columns have to be named explicitly, otherwise influxdb will prepend the aggregation
# method name (e.g. mean(usage_user) -> mean_usage_user)
# see https://github.com/influxdata/influxdb/issues/7332
ansible_influx_queries:
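To illustrate the naming gotcha from the comment above with a hypothetical query: without an explicit alias, the target field of `mean(usage_user)` becomes `mean_usage_user`, so every column is aliased:

```
SELECT mean("usage_user") AS "usage_user" INTO "rp_1y"."cpu" FROM "cpu" GROUP BY time(5m)
```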
2 changes: 0 additions & 2 deletions examples/basic.yml
@@ -14,5 +14,3 @@ ansible_influx_databases:
# cq_resample: only useful when doing cq
# backfill: only needed when doing cq
# measurements: only needed when doing cq


1 change: 0 additions & 1 deletion tasks/influxdb_database.yml
@@ -120,4 +120,3 @@
- name: '{{db_prefix}} Average series downsampling'
debug: msg="Average series downsampling = {{ (mm_downsampling_totals|map('float')|sum(start=0) / mm_downsampling_totals|length) |round(2) }} %"
when: ifx_backfill and mm_downsampling_totals|length > 0

22 changes: 11 additions & 11 deletions tasks/influxdb_measurement.yml
@@ -34,9 +34,9 @@
when: ifx_db.source is defined

- name: '{{mm_prefix}} Get SOURCE measurement count(*)'
uri:
url: "{{ansible_influx_url}}/query?db={{ifx_db.source.name}}"
method: POST
body: "q=SELECT COUNT(*) FROM {{source_mm}} WHERE time >= now() - {{ifx_db.retention_policy.amount+ifx_db.retention_policy.unit}}"
return_content: yes
register: ansible_influx_mm_count
Expand All @@ -50,7 +50,7 @@

- name: '{{mm_prefix}} Count on SOURCE'
debug: msg="Max count on SOURCE {{measurement}} = {{mm_count_source}}"
when: mm_count_source|int > 0

- name: '{{mm_prefix}} Create list of fields'
set_fact:
@@ -67,7 +67,7 @@
# now() - rp.amount+rp.unit - x*bf.step
body: >
q={{cq_select}} INTO {{target_mm}} FROM {{source_mm}}
WHERE time >= now() - {{seq}}{{ifx_db.retention_policy.unit}}
AND time < now() - {{seq|int - ifx_db.backfill.step|default(1)|int}}{{ifx_db.retention_policy.unit}}
{{bf_where}} GROUP BY time({{cq_interval}}),{{cq_groupby|join(',')}}
return_content: yes
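To make the window arithmetic above concrete (hypothetical names and values): with `retention_policy: { amount: 14, unit: d }` and `backfill.step: 1`, the iteration with `seq=14` renders roughly this query, backfilling exactly one day:

```
SELECT mean("usage_user") AS "usage_user" INTO "telegraf_14d"."rp_14d"."cpu" FROM "telegraf"."autogen"."cpu"
WHERE time >= now() - 14d AND time < now() - 13d
GROUP BY time(1m), *
```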
@@ -84,9 +84,9 @@

- name: '{{mm_prefix}} Print result from backfill'
debug: var=ansible_influx_mm_backfill
when: (ansible_influx_mm_backfill is succeeded
and ansible_influx_mm_backfill is not changed
and (ansible_influx_mm_backfill.results|map(attribute='skipped')|flatten|default([])|unique != [ true ]))
or ansible_influx_mm_backfill is failed

- name: '{{mm_prefix}} Sum up written data points'
@@ -99,9 +99,9 @@
when: mm_backfill and ansible_influx_mm_backfill is changed

- name: '{{mm_prefix}} Drop continuous query {{ifx_cq_name}}'
uri:
url: "{{ansible_influx_url}}/query"
method: POST
body: 'q=DROP CONTINUOUS QUERY "{{ifx_cq_name}}" ON "{{ifx_db.name}}"'
when: ifx_db.source is defined and ifx_cq_name in ifx_cqs

3 changes: 0 additions & 3 deletions tasks/main.yml
@@ -24,12 +24,9 @@
#- debug: var=ansible_influx_cqs
- set_fact:
ifx_cqs: "{{ansible_influx_cqs.json.results[0].series |rejectattr('values', 'callable') |map(attribute='values') |flatten |select('match', '^(?!CREATE).*') |list if ansible_influx_cqs.json.results[0].series is defined else []}}"

#- debug: var=ifx_cqs

- name: Setup databases
include_tasks: influxdb_database.yml database={{db_item}}
with_items: "{{ansible_influx_databases|sort}}"
loop_control: { loop_var: db_item }

