Welcome to It-Slav.Net blog
Peter Andersson
peter@it-slav.net

I've already got a female to worry about. Her name is the Enterprise.
-- Kirk, "The Corbomite Maneuver", stardate 1514.0

Background

I use MythTV quite frequently and noticed that it is instable when using sasc-ng as a decoder to decrypt encrypted DVB-T channels. So approximatly every third day the MythTVbackend server stops and need to be started again. I have wriiten an earlier article about howto monitor MythTV with Nagios or op5 Monitor so I get noticed that it has stopped. But I need to manually start it again. This article describe howto make Nagios or op5 Monitor to start a stopped MythTVbackend. It can be used for starting almost any service.

I have used the examples provided by Ethan at Nagios official documentation describing eventhandlers.

Normally it is not recommended to let a tool like Nagios or op5 Monitor start a service that has stopped, because it is probably a reason why the service has stopped and the correct procedure is to fix the root cause of the problem, not the symptom.

The MythTV backend runs on one machine called lala (after a character in Teletubbies) which is not the same as the Nagios or op5 Monitor server. I use nrpe to run the start script i.e.

 /etc/init.d/mythtv-backend start

There is several options here but I already setup the nrpe agent and it is simple to make Nagios or op5 Monitor to use nrpe to run a script.

 

 

Implementation

I used the script I found at Nagios documentation about eventhandlers as a base and modiied it slightly.

 

At my op5 Monitor machine

/opt/plugins/custom/restart-mythtv-lala.sh
#!/bin/sh
#
# Event handler script for restarting the mythTVbackend server on lala
#
# Note: This script will only restart the mythtvbackend if the service is
#       retried 2 times (in a "soft" state) or if the service somehow
#       manages to fall into a "hard" error state.
#

# What state is the mythbackend service in?
case "$1" in
OK)
	# The service just came back up, so don't do anything...
	;;
WARNING)
	# We don't really care about warning states, since the service is probably still running...
	;;
UNKNOWN)
	# We don't know what might be causing an unknown error, so don't do anything...
	;;
CRITICAL)
	# Aha!  The HTTP service appears to have a problem - perhaps we should restart the server...

	# Is this a "soft" or a "hard" state?
	case "$2" in

	# We're in a "soft" state, meaning that Nagios is in the middle of retrying the
	# check before it turns into a "hard" state and contacts get notified...
	SOFT)

		# What check attempt are we on?  We don't want to restart the web server on the first
		# check, because it may just be a fluke!
		case "$3" in

		# Wait until the check has been tried 3 times before restarting the web server.
		# If the check fails on the 4th time (after we restart the web server), the state
		# type will turn to "hard" and contacts will be notified of the problem.
		# Hopefully this will restart the web server successfully, so the 4th check will
		# result in a "soft" recovery.  If that happens no one gets notified because we
		# fixed the problem!
		2)
			echo "`date` Restarting mythtv service (2rd soft critical state)..." >> /tmp/mythtvstart
			# Call the init script to restart the mythbackend server
			#/etc/rc.d/init.d/httpd restart
			#date >> /tmp/mythtvstart
			/opt/plugins/check_nrpe -H lala -c start_mythtvbackend
			;;
			esac
		;;

	# The mythtvbackend service somehow managed to turn into a hard error without getting fixed.
	# It should have been restarted by the code above, but for some reason it didn't.
	# Let's give it one last try, shall we?
	# Note: Contacts have already been notified of a problem with the service at this
	# point (unless you disabled notifications for this service)
	HARD)
		echo "`date` Restarting mythtv service (hard state)..." >> /tmp/mythtvstart
		# Call the init script to restart the HTTPD server
		#/etc/rc.d/init.d/httpd restart
		#date >> /tmp/mythtvstart
		/opt/plugins/check_nrpe -H lala -c start_mythtvbackend
		;;
	esac
	;;
esac
exit 0

 

 

/opt/monitor/misccomands.cfg
# command 'restart-mythtv-lala'
define command{
    command_name                   restart-mythtv-lala
    command_line                   /opt/plugins/custom/start-mythtv-lala.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
    }

 

/opt/monitor/etc/services.cfg

# service 'Mythbackend'
define service{
    use                            default-service
    host_name                      lala
    service_description            Mythbackend
    check_command                  check_tcp!6543
    servicegroups                  MythTV,it-slav
    event_handler                  restart-mythtv-lala!$SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
    contact_groups                 it-slav_sms,it-slav_jabber,it_slav_mail
    }

At my mythbackend machine lala

/etc/nrpe.d/mycommands.cfg
command[start_mythtvbackend]=/usr/bin/sudo /etc/init.d/mythtv-backend start

/etc/sudoers
nobody ALL= (root) NOPASSWD:/etc/init.d/mythtv-backend start
Notice that my nrpe agent run as user nobody
 
 
 
 

Test

I stopped the mythtvbackend by running:

peter@lala:/etc/nrpe.d$ date
Mon Jun 15 20:40:55 CEST 2009
peter@lala:/etc/nrpe.d$ sudo /etc/init.d/mythtv-backend stop
 * Stopping MythTV server: mythbackend

And run

[root@op5 ~]# tail -f /tmp/mythtvstart
Mon Jun 15 20:47:09 CEST 2009 Restarting mythtv service (2rd soft critical state)...

YES it works!

 

Links:

  • op5 Monitor a Nagios based supported enterprise Monitoring software.
  • MythTV a free OpenSource Digital Video Recorder
  • Nagios Open Source Monitoring

 

Share

2 Responses to “Using Nagios or op5 Monitor eventhandler to start a service that has stopped”

  1. tepisi Says:

    My event_handler does not even start! What is cause?

  2. peter Says:

    How do you know that?

Leave a Reply


+ 5 = six





Book reviews
FreePBX 2.5
Powerful Telephony Solutions






Asterisk 1.6
Build a feature rich telephony system with Asterisk






Learning NAGIOS 3.0





Cacti 0.8 Network Monitoring,
Monitor your network with ease!