Discussion:
How to manage / delete acknowledgements? (icinga core and classic ui 1.11)
David Young
2014-04-17 10:44:26 UTC
Hi all,

I'm running into a problem whereby users can accidentally acknowledge a
critical alarm for an unlimited period of time, which is a potential
risk to our platform. I'd like to restrict how long a service can be
acknowledged for, based on hostgroup / hostname.

I've mocked up a similar system using an eventhandler on particular
services, which simply deletes any acknowledgements for the service if
it changes state, using the status.cmd file. However, I'm not sure
whether there's a more elegant way to tackle this.
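
A minimal sketch of what such an event handler might submit, assuming it uses the REMOVE_SVC_ACKNOWLEDGEMENT external command. The function names are hypothetical, and the command file path is site-specific (whatever `command_file` points to in icinga.cfg):

```python
import time


def remove_svc_ack_command(host, service, now=None):
    """Format a REMOVE_SVC_ACKNOWLEDGEMENT external command line."""
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] REMOVE_SVC_ACKNOWLEDGEMENT;{host};{service}\n"


def submit(command_line, command_file):
    """Write one command line to the external command file (a named pipe)."""
    # Hypothetical helper: the caller supplies the site-specific path.
    with open(command_file, "w") as pipe:
        pipe.write(command_line)
```

The event handler definition would pass $HOSTNAME$ and $SERVICEDESC$ as arguments, and the script would call `submit(remove_svc_ack_command(host, service), command_file)`.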

Any ideas?
Thanks,
D
Michael Friedrich
2014-05-21 21:02:58 UTC
Post by David Young
Hi all,
I'm running into a problem whereby users can accidentally acknowledge
a critical alarm for an unlimited period of time, which is a potential
risk to our platform. I'd like to restrict how long a service can be
acknowledged for, based on hostgroup / hostname.
I've mocked up a similar system using an eventhandler on particular
services, which simply deletes any acknowledgements for the service if
it changes state, using the status.cmd file. However, I'm not sure
whether there's a more elegant way to tackle this.
I'm not sure whether this is just a GUI problem, i.e. calculating a
value based on some thresholds (for example custom variables set on
the host) and ignoring the value a user enters.
If you require acknowledgements to be deleted on *every* state
change, you should set the sticky option to '0' instead of '2'.
Otherwise they will only get cleared once a service recovers from a
non-OK state.
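
To illustrate where the sticky option sits, here is a small sketch that formats the ACKNOWLEDGE_SVC_PROBLEM external command. The helper name, constants, and example values are hypothetical; only the field order follows the documented command format (host, service, sticky, notify, persistent, author, comment):

```python
import time

STICKY = 2      # acknowledgement stays until the service returns to OK
NON_STICKY = 0  # acknowledgement is removed on the next state change


def acknowledge_svc_command(host, service, author, comment,
                            sticky=NON_STICKY, notify=True,
                            persistent=False, now=None):
    """Format an ACKNOWLEDGE_SVC_PROBLEM external command line."""
    timestamp = int(time.time() if now is None else now)
    fields = [
        "ACKNOWLEDGE_SVC_PROBLEM",
        host,
        service,
        str(sticky),
        str(int(notify)),      # 1 = send an acknowledgement notification
        str(int(persistent)),  # 1 = keep the comment across restarts
        author,
        comment,
    ]
    return f"[{timestamp}] " + ";".join(fields) + "\n"
```

With the default `sticky=NON_STICKY`, the resulting acknowledgement would be dropped on any state change rather than only on recovery.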

Never seen such a use case before tbh, so no good ideas from my side here.

kind regards,
Michael
--
DI (FH) Michael Friedrich

michael.friedrich-***@public.gmane.org || icinga open source monitoring
https://twitter.com/dnsmichi || lead core developer
dnsmichi-***@public.gmane.org || https://www.icinga.org/team
irc.freenode.net/icinga || dnsmichi
Carl R. Friend
2014-05-22 10:44:43 UTC
Post by Michael Friedrich
Post by David Young
I'm running into a problem whereby users can accidentally acknowledge a
critical alarm for an unlimited period of time, which is a potential risk
to our platform. I'd like to restrict how long a service can be
acknowledged for, based on hostgroup / hostname.
Never seen such a use case before tbh, so no good ideas from my side here.
Take a look at the "set_expire_ack_by_default" and
"default_expiring_acknowledgement_duration" configuration variables
in cgi.cfg (and hopefully in the doco).
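
As a rough illustration, the relevant cgi.cfg lines might look like the fragment below. The values are assumptions, not recommendations: the duration is assumed to be in seconds (86400 being one day), so check the comments in the sample cgi.cfg shipped with your Icinga version before relying on it.

```
# cgi.cfg (Icinga Classic UI)
# Pre-select "expire acknowledgement" in the acknowledge form.
set_expire_ack_by_default=1
# Default lifetime of an expiring acknowledgement (assumed: seconds).
default_expiring_acknowledgement_duration=86400
```

These settings only change the defaults offered in the web form; a user can still deliberately choose a non-expiring acknowledgement.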

I made the case for those based on the deeply dysfunctional
organisation I used to work for (and which I have mercifully escaped),
where stuff would get ack'ed with the best of intentions, then the
priorities changed and it never got worked on (incompetent
management). At other times problems would get ack'ed just to shut
them up and mask them, because the monitoring system was used as a
cudgel to beat up the techs (pernicious management).

Expiring ACKs by default made those scenarios less likely as
it requires a deliberate action on the part of the person acknowledging
a problem to make it non-expiring. It helped a lot in my sad case;
perhaps it'll help here, too.

Cheers!

+------------------------------------------------+---------------------+
| Carl Richard Friend (UNIX Sysadmin) | West Boylston |
| Minicomputer Collector / Enthusiast | Massachusetts, USA |
| mailto:crfriend-***@public.gmane.org +---------------------+
| http://users.rcn.com/crfriend/museum | ICBM: 42:22N 71:47W |
+------------------------------------------------+---------------------+