Addressing multiple issues with timing out with `check-ssl-qualys.rb`: #36

majormoses · 2018-03-17T06:47:38Z

added an overall timeout parameter to short circuit slow api
when you call the api beyond the initial attempt it needs to wait until the report is done. Previously it would sleep for a specified arbitrary value and would often timeout. Now we look at the api response and if it has an ETA and the ETA is larger than the value of --between-checks we use it. We check if its greater than the specified as in many cases this has been found to be unreliable when seeing small numbers but seems legit when the api is slower to complete the report. In some cases we see a 0 eta which one might assume is complete or near complete but often saw several 0's in a row leading me to put in that stipulation on using it.
adding documentation about how long it takes and that you probably need to configure your sensu check to have a longer timeout or sensu will kill the process.
moved to v3 of the api (backwards compatible) I saw little to no difference in speed but the docs indicate that v2 is deprecated

testing artifact: https://gist.github.com/majormoses/b549f92fc02894a15a842b29f528da6f

Signed-off-by: Ben Abrams [email protected]

Pull Request Checklist

Fixes #35, #33

General

Update Changelog following the conventions laid out here
Update README with any necessary configuration snippets
Binstubs are created if needed
RuboCop passes
Existing tests pass

Purpose

Make this check actually work with defaults consistently

Known Compatibility Issues

Essentially the check may run longer than in previous versions. There are now multiple nobs to fine tune what you want but it now works every time I have tried it.
Knobs:

--timeout: this is an overall timeout circuit breaker and defaults to 5 minutes
--number-checks: this indicates how many requests will be attempted before giving up, left default at 24
--time-between-checks: previously this was a constant amount of time in between polling for report being done. Now this acts more as a fallback value. The api response includes an eta key and value that indicates when the api thinks you should check back. It seemed very temperamental and you still need to have a value to wait between the initial request and polling for completion so we decided to use the same value and treat this as a fallback. In cases where the eta is less than this we use this value instead

majormoses · 2018-03-17T06:48:49Z

The failure is unrelated to the change and is currently failing on master as well.

majormoses · 2018-03-17T06:51:48Z

I found this in the tests: https://github.com/sensu-plugins/sensu-plugins-ssl/blob/1.5.0/test/check-ssl-hsts-preloadable_spec.rb#L2-L5 I am gonna comment it out and raise an issue to address it.

majormoses · 2018-03-17T06:55:38Z

I have commented out the flaky test and raised an issue to address.

jhoblitt · 2018-03-17T14:58:54Z

bin/check-ssl-qualys.rb

+                          'off'
+                        else
+                          'on'
+                        end


I'm not sure if passing startNew=off is needed. The v3 api doc is vague but it seems to suggests the parameter be omitted completely for some reason.

startNew - if set to "on" then cached assessment results are ignored and a new assessment is started. However, if there's already an assessment in progress, its status is delivered instead. This parameter should be used only once to initiate a new assessment; further invocations should omit it to avoid causing an assessment loop.

https://github.com/ssllabs/ssllabs-scan/blob/master/ssllabs-api-docs-v3.md#invoke-assessment-and-check-progress

This seems a bit odd -- perhaps we should ask for clarification?

That feels pretty straightforward the first request should be with it off and the second one should be on. This assumes that you do not want cached or in-progress results. This keeps the old behavior and I am not sure the value of checking for a new report on a somewhat regular basis if you are gonna take cached results. Maybe we can expose that as a param so the user can decide whether they want it be sure that it has just been initiated. The reason I think that the current implementation makes sense is that say a report was in progress and you make a change to fix it you will not get the new results.

I suppose we could have an option to control if all requests should ask for cached.

jhoblitt · 2018-03-17T15:05:01Z

bin/check-ssl-qualys.rb

@@ -107,11 +129,37 @@ def ssl_check(from_cache)

  def ssl_recheck
    1.upto(config[:num_checks]) do |step|
-      json = ssl_check(step != 1)
+      p "step: #{step}" if config[:debug]


If these debug statements are intended to be long term, wouldn't puts be better?

well p shows the objects by doing an inspect while puts does a .to_s which makes it harder to spot issues like improper types. For debug purposes it is much better for normal user interaction puts is generally preferable.

jhoblitt · 2018-03-17T15:06:53Z

bin/check-ssl-qualys.rb

+      p "step: #{step}" if config[:debug]
+      start_time = Time.now
+      p "start_time: #{start_time}" if config[:debug]
+      json = if step == 1


Couldn't this be shorted to ssl_check(step != 1)?

It could and is closer to what was originally there. I write my code for 3 AM troubleshooting as that is when I care the most about being able to grok it. It's more so a matter of preference.

jhoblitt · 2018-03-17T15:11:57Z

bin/check-ssl-qualys.rb

      return json if json['status'] == 'READY'
-      sleep(config[:between_checks])
+      if json['endpoints'] && json['endpoints'].is_a?(Array)


Ruby 2.2 is supposed to be EOL'd in 2 weeks. It sure would be nice to be able to use the safe nav operator here...

I feel you, it's on my list of things to run through all the plugins after it is EOL and remove support for ruby < 2.3

jhoblitt · 2018-03-17T15:14:37Z

bin/check-ssl-qualys.rb

+        # insanely long time periods. The highest I have seen the eta go was
+        # around 250 seconds but put it in just in case as the api has very
+        # erratic response times.
+        if json['endpoints'].first.is_a?(Hash) && json['endpoints'].first.key?('eta') && json['endpoints'].first['eta'] > config[:between_checks]


This expression is a bit long and it requires considering the precedence of && and >. I think it would be better if it was at least newline escaped across several lines.

I can do that, how'd you like to see it? Does something like this work for you?

if json['endpoints'].first.is_a?(Hash) \ && json['endpoints'].first.key?('eta') \ && json['endpoints'].first['eta'] > config[:between_checks] p "eta: #{json['endpoints'].first['eta']}" if config[:debug] sleep(json['endpoints'].first['eta']) else p "sleeping with default: #{config[:between_checks]}" if config[:debug] sleep(config[:between_checks]) end

As I did not hear back I left it as is, I welcome you to open a PR to clean it up.

eheydrick · 2018-03-27T00:21:14Z

README.md

@@ -46,6 +46,14 @@ or an online CRL:

 Critical and Warning thresholds are specified in minutes.

+### `bin/check-ssl-qualysis.rb`
+
+Checks the ssllabs qualysis api for grade of your server, this check can be quite long so it should not be scheduled with a low interval and will probably need to adjust the check `timeout` options per the [check attributes spec](https://docs.sensu.io/sensu-core/1.2/reference/checks/#check-attributes) based on my tests you should expect this to take around 3 minutes.


s/qualysis/qyalys/

- added an overall timeout parameter to short circuit slow api - when you call the api beyond the initial attempt it needs to wait until the report is done. Previously it would sleep for a specified arbitrary value and would often timeout. Now we look at the api response and if it has an ETA and the ETA is larger than the value of `--between-checks` we use it. We check if its greater than the specified as in many cases this has been found to be unreliable when seeing small numbers but seems legit when the api is slower to complete the report. In some cases we see a 0 eta which one might assume is complete or near complete but often saw several 0's in a row leading me to put in that stipulation on using it. - adding documentation about how long it takes and that you probably need to configure your sensu check to have a longer timeout or sensu will kill the process. - moved to v3 of the api (backwards compatible) I saw little to no difference in speed but the docs indicate that v2 is deprecated Signed-off-by: Ben Abrams <[email protected]>

majormoses · 2018-03-27T18:41:03Z

released: https://rubygems.org/gems/sensu-plugins-ssl/versions/2.0.0

majormoses added Type: Bug Status: Needs Review Impact: Breaking Change labels Mar 17, 2018

majormoses self-assigned this Mar 17, 2018

majormoses requested a review from a team March 17, 2018 06:47

majormoses force-pushed the fix/35 branch from 7b13dda to cd31678 Compare March 17, 2018 06:58

jhoblitt reviewed Mar 17, 2018

View reviewed changes

majormoses force-pushed the fix/35 branch from cd31678 to 9544e31 Compare March 20, 2018 02:12

eheydrick reviewed Mar 27, 2018

View reviewed changes

majormoses force-pushed the fix/35 branch from 9544e31 to b420a4f Compare March 27, 2018 03:08

majormoses merged commit 4276db0 into master Mar 27, 2018

majormoses deleted the fix/35 branch March 27, 2018 18:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Addressing multiple issues with timing out with `check-ssl-qualys.rb`: #36

Addressing multiple issues with timing out with `check-ssl-qualys.rb`: #36

majormoses commented Mar 17, 2018 •

edited

Loading

majormoses commented Mar 17, 2018

majormoses commented Mar 17, 2018

majormoses commented Mar 17, 2018

jhoblitt Mar 17, 2018

majormoses Mar 17, 2018

majormoses Mar 22, 2018

jhoblitt Mar 17, 2018

majormoses Mar 17, 2018

jhoblitt Mar 17, 2018 •

edited

Loading

majormoses Mar 17, 2018

jhoblitt Mar 17, 2018

majormoses Mar 17, 2018

jhoblitt Mar 17, 2018

majormoses Mar 20, 2018 •

edited

Loading

majormoses Mar 27, 2018

eheydrick Mar 27, 2018

majormoses Mar 27, 2018

majormoses commented Mar 27, 2018

Addressing multiple issues with timing out with check-ssl-qualys.rb: #36

Addressing multiple issues with timing out with check-ssl-qualys.rb: #36

Conversation

majormoses commented Mar 17, 2018 • edited Loading

Pull Request Checklist

General

Purpose

Known Compatibility Issues

majormoses commented Mar 17, 2018

majormoses commented Mar 17, 2018

majormoses commented Mar 17, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jhoblitt Mar 17, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

majormoses Mar 20, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

majormoses commented Mar 27, 2018

Addressing multiple issues with timing out with `check-ssl-qualys.rb`: #36

Addressing multiple issues with timing out with `check-ssl-qualys.rb`: #36

majormoses commented Mar 17, 2018 •

edited

Loading

jhoblitt Mar 17, 2018 •

edited

Loading

majormoses Mar 20, 2018 •

edited

Loading