-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Addressing multiple issues with timing out with check-ssl-qualys.rb
:
#36
Conversation
The failure is unrelated to the change and is currently failing on master as well. |
I found this in the tests: https://github.com/sensu-plugins/sensu-plugins-ssl/blob/1.5.0/test/check-ssl-hsts-preloadable_spec.rb#L2-L5 I am gonna comment it out and raise an issue to address it. |
I have commented out the flaky test and raised an issue to address. |
'off' | ||
else | ||
'on' | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if passing startNew=off
is needed. The v3 api doc is vague but it seems to suggests the parameter be omitted completely for some reason.
startNew - if set to "on" then cached assessment results are ignored and a new assessment is started. However, if there's already an assessment in progress, its status is delivered instead. This parameter should be used only once to initiate a new assessment; further invocations should omit it to avoid causing an assessment loop.
This seems a bit odd -- perhaps we should ask for clarification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That feels pretty straightforward the first request should be with it off and the second one should be on. This assumes that you do not want cached or in-progress results. This keeps the old behavior and I am not sure the value of checking for a new report on a somewhat regular basis if you are gonna take cached results. Maybe we can expose that as a param so the user can decide whether they want it be sure that it has just been initiated. The reason I think that the current implementation makes sense is that say a report was in progress and you make a change to fix it you will not get the new results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we could have an option to control if all requests should ask for cached.
@@ -107,11 +129,37 @@ def ssl_check(from_cache) | |||
|
|||
def ssl_recheck | |||
1.upto(config[:num_checks]) do |step| | |||
json = ssl_check(step != 1) | |||
p "step: #{step}" if config[:debug] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these debug statements are intended to be long term, wouldn't puts
be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well p
shows the objects by doing an inspect while puts
does a .to_s
which makes it harder to spot issues like improper types. For debug purposes it is much better for normal user interaction puts
is generally preferable.
p "step: #{step}" if config[:debug] | ||
start_time = Time.now | ||
p "start_time: #{start_time}" if config[:debug] | ||
json = if step == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this be shorted to ssl_check(step != 1)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could and is closer to what was originally there. I write my code for 3 AM troubleshooting as that is when I care the most about being able to grok it. It's more so a matter of preference.
return json if json['status'] == 'READY' | ||
sleep(config[:between_checks]) | ||
if json['endpoints'] && json['endpoints'].is_a?(Array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ruby 2.2 is supposed to be EOL'd in 2 weeks. It sure would be nice to be able to use the safe nav operator here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel you, it's on my list of things to run through all the plugins after it is EOL and remove support for ruby < 2.3
# insanely long time periods. The highest I have seen the eta go was | ||
# around 250 seconds but put it in just in case as the api has very | ||
# erratic response times. | ||
if json['endpoints'].first.is_a?(Hash) && json['endpoints'].first.key?('eta') && json['endpoints'].first['eta'] > config[:between_checks] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This expression is a bit long and it requires considering the precedence of &&
and >
. I think it would be better if it was at least newline escaped across several lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that, how'd you like to see it? Does something like this work for you?
if json['endpoints'].first.is_a?(Hash) \
&& json['endpoints'].first.key?('eta') \
&& json['endpoints'].first['eta'] > config[:between_checks]
p "eta: #{json['endpoints'].first['eta']}" if config[:debug]
sleep(json['endpoints'].first['eta'])
else
p "sleeping with default: #{config[:between_checks]}" if config[:debug]
sleep(config[:between_checks])
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I did not hear back I left it as is, I welcome you to open a PR to clean it up.
README.md
Outdated
@@ -46,6 +46,14 @@ or an online CRL: | |||
|
|||
Critical and Warning thresholds are specified in minutes. | |||
|
|||
### `bin/check-ssl-qualysis.rb` | |||
|
|||
Checks the ssllabs qualysis api for grade of your server, this check can be quite long so it should not be scheduled with a low interval and will probably need to adjust the check `timeout` options per the [check attributes spec](https://docs.sensu.io/sensu-core/1.2/reference/checks/#check-attributes) based on my tests you should expect this to take around 3 minutes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/qualysis/qyalys/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
- added an overall timeout parameter to short circuit slow api - when you call the api beyond the initial attempt it needs to wait until the report is done. Previously it would sleep for a specified arbitrary value and would often timeout. Now we look at the api response and if it has an ETA and the ETA is larger than the value of `--between-checks` we use it. We check if its greater than the specified as in many cases this has been found to be unreliable when seeing small numbers but seems legit when the api is slower to complete the report. In some cases we see a 0 eta which one might assume is complete or near complete but often saw several 0's in a row leading me to put in that stipulation on using it. - adding documentation about how long it takes and that you probably need to configure your sensu check to have a longer timeout or sensu will kill the process. - moved to v3 of the api (backwards compatible) I saw little to no difference in speed but the docs indicate that v2 is deprecated Signed-off-by: Ben Abrams <[email protected]>
--between-checks
we use it. We check if its greater than the specified as in many cases this has been found to be unreliable when seeing small numbers but seems legit when the api is slower to complete the report. In some cases we see a 0 eta which one might assume is complete or near complete but often saw several 0's in a row leading me to put in that stipulation on using it.testing artifact: https://gist.github.com/majormoses/b549f92fc02894a15a842b29f528da6f
Signed-off-by: Ben Abrams [email protected]
Pull Request Checklist
Fixes #35, #33
General
Update Changelog following the conventions laid out here
Update README with any necessary configuration snippets
Binstubs are created if needed
RuboCop passes
Existing tests pass
Purpose
Make this check actually work with defaults consistently
Known Compatibility Issues
Essentially the check may run longer than in previous versions. There are now multiple nobs to fine tune what you want but it now works every time I have tried it.
Knobs:
--timeout
: this is an overall timeout circuit breaker and defaults to 5 minutes--number-checks
: this indicates how many requests will be attempted before giving up, left default at 24--time-between-checks
: previously this was a constant amount of time in between polling for report being done. Now this acts more as a fallback value. The api response includes aneta
key and value that indicates when the api thinks you should check back. It seemed very temperamental and you still need to have a value to wait between the initial request and polling for completion so we decided to use the same value and treat this as a fallback. In cases where theeta
is less than this we use this value instead