Index of /safte-monitor

Name                                Last modified      Size

CHANGELOG                           15-Feb-2005 00:28  1.3K
README.html                         15-Feb-2005 00:13  6.6K
archive/                            14-Oct-2004 05:43  -
debian/                             15-Feb-2005 00:59  -
safte-monitor-0.0.6-1.i386.rpm      14-Oct-2004 05:50  36K
safte-monitor-0.0.6-1.src.rpm       14-Oct-2004 05:50  61K
safte-monitor-0.0.6.tar.gz          14-Oct-2004 05:49  58K
safte-monitor-1.0.0rc1-1.i386.rpm   14-Feb-2005 23:44  39K
safte-monitor-1.0.0rc1-1.src.rpm    14-Feb-2005 23:43  61K
safte-monitor-1.0.0rc1.tar.gz       14-Feb-2005 23:46  56K
safte-monitor_0.0.6-1_i386.deb      14-Oct-2004 05:49  37K
web-interface-example.html          14-Oct-2004 08:37  16K

safte-monitor - Linux SAF-TE SCSI enclosure monitor

Latest release: safte-monitor-1.0.0rc1.tar.gz

safte-monitor reads disk enclosure status information from SAF-TE capable enclosures (SCSI Accessible Fault Tolerant Enclosures). SAF-TE is common on many intelligent SCSI disk enclosures and some rackmount servers with SCSI hotswap drive bays. safte-monitor can monitor multiple SAF-TE devices and will automatically probe and detect them.

The information retrieved includes power supply, fan, temperature, audible alarm, drive fault, array critical / failed / rebuilding state and door lock status. safte-monitor logs changes in the status of these enclosure elements to syslog and can optionally execute an alert helper program with details of the component failure. The helper could, for example, send a pager message. Temperature alert limits can also be set.

safte-monitor is specifically useful if you have equipment deployed in remote locations, or left unattended over weekends when no one may hear an audible alarm.

usage:	./safte-monitor [-h] [-p] [-n] [-a] [-T] [-N] [-t <max_temp>] [-A <alert_prog>]

-h     show this help message
-p     print - print device scan information then exit
-T     log temperature changes
-N     alert for non critical state changes
-A <f> program to run for alerts
-t <n> max temperature (default 35.0 celcius)
-n     numeric sg device names eg. /dev/sg0 (default)
-a     alpha sg device names eg. /dev/sga

By default, temperatures and temperature limits are in Celsius. This can be changed to Fahrenheit by removing -DUSE_CELCIUS from the Makefile and rebuilding.
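
If you build without -DUSE_CELCIUS, the limit given to -t presumably has to be expressed in Fahrenheit as well (this follows from the sentence above but has not been checked against the source). A minimal sketch of the conversion, using a hypothetical shell function c_to_f that is not part of safte-monitor:

```shell
#!/bin/sh
# c_to_f: hypothetical helper, not shipped with safte-monitor.
# Converts a Celsius value to Fahrenheit with one decimal place.
c_to_f() {
    awk -v c="$1" 'BEGIN { printf "%.1f\n", c * 9 / 5 + 32 }'
}
```

For example, `c_to_f 35.0` gives the Fahrenheit equivalent of the default 35.0 degree limit.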

Example alert helper program:

#!/bin/sh
# Mail the alert details, passed as positional arguments, to root.
echo "ALERT device=$1 message=$2 system=$3 partno=$4 code=$5" | \
	mail -s 'safte alert' root
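
As a variation, a helper can write the alert to syslog instead of sending mail. The sketch below assumes the same five positional arguments as the example above, and assumes a logger(1) that accepts -t and -p (as the common Linux and BSD versions do). The function name format_alert and the tag safte-alert are made up for illustration.

```shell
#!/bin/sh
# format_alert: hypothetical helper that builds the one-line alert text
# from the five arguments shown in the example helper above.
format_alert() {
    printf 'ALERT device=%s message=%s system=%s partno=%s code=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Only log when called with the full argument list.
if [ "$#" -ge 5 ]; then
    # Tag and priority are arbitrary choices for this sketch.
    format_alert "$@" | logger -t safte-alert -p daemon.alert
fi
```

Keeping the formatting in its own function makes it easy to send the same line to mail, a pager gateway or a log file.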

Example output from a scan

# ./safte-monitor -p
SAF-TE Device ESG-SHV SCA HSBP M10 (0:0:6:0)
no. of fans           = 0
no. of power supplies = 1
no. of device slots   = 4
door lock installed   = 0
no. of temp sensors   = 1
audible alarm         = 0
no. of thermostats    = 0
power supply 0 is okay and on
device slot 0 disk present,active,unconfigured
device slot 1 insert ready
device slot 2 insert ready
device slot 3 insert ready
temp sensor 0 is 25.0 celcius and okay
overall temperature is okay

SAF-TE Device CNSi G8324 (2:0:0:50)
no. of fans           = 4
no. of power supplies = 2
no. of device slots   = 6
door lock installed   = 1
no. of temp sensors   = 3
audible alarm         = 1
no. of thermostats    = 0
power supply 0 is okay and on
power supply 1 is okay and on
fan 0 is operational
fan 1 is operational
fan 2 is operational
fan 3 is operational
device slot 0 disk not present,no error
device slot 1 disk not present,no error
device slot 2 disk not present,no error
device slot 3 disk present,no error
device slot 4 disk present,no error
device slot 5 disk present,no error
door lock is locked
speaker is off
temp sensor 0 is 25.6 celcius and okay
temp sensor 1 is 25.6 celcius and okay
temp sensor 2 is 21.7 celcius and okay
overall temperature is okay
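
The scan output is line oriented, so it can be summarised with standard tools. The sketch below assumes the output format shown above and pulls out one line per temperature sensor; scan_temps is a hypothetical name, not something safte-monitor provides.

```shell
#!/bin/sh
# scan_temps: hypothetical filter for "safte-monitor -p" output on stdin.
# Prints "<address> <sensor number> <reading>" for each temperature sensor.
scan_temps() {
    awk '
        # Remember the parenthesised address of the current enclosure.
        /^SAF-TE Device / { dev = $NF }
        # Lines look like: temp sensor 0 is 25.0 celcius and okay
        $1 == "temp" && $2 == "sensor" { print dev, $3, $5 }
    '
}
```

Usage: `./safte-monitor -p | scan_temps`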

Example syslog messages

a temperature change (logged only when the -T option is given)

SAF-TE Device CNSi G8324 (2:0:0:52): temp sensor 2 changed from '23.9 degrees' to '22.8 degrees'

a power supply failing

SAF-TE Device CNSi G8324 (2:0:0:51): power supply 1 changed from 'okay and on' to 'malfunctioning and commanded on'
SAF-TE Device CNSi G8324 (2:0:0:51): ALERT power supply 1 malfunctioning and commanded on

the power supply back up again

SAF-TE Device CNSi G8324 (2:0:0:51): power supply 1 changed from 'malfunctioning and commanded on' to 'okay and on'

a drive in an array fails

SAF-TE Device CNSi G8324 (2:0:0:51): device slot 4 changed from 'disk present,no error' to 'disk present,critical array'
SAF-TE Device CNSi G8324 (2:0:0:51): ALERT device slot 4 is disk present,critical array
SAF-TE Device CNSi G8324 (2:0:0:51): device slot 5 changed from 'disk present,no error' to 'disk present,faulty'
SAF-TE Device CNSi G8324 (2:0:0:51): ALERT device slot 5 is disk present,faulty

after a rebuild has been started

SAF-TE Device CNSi G8324 (2:0:0:51): device slot 4 changed from 'disk present,critical array' to 'disk present,rebuilding,critical array'
SAF-TE Device CNSi G8324 (2:0:0:51): ALERT device slot 4 is disk present,rebuilding,critical array
SAF-TE Device CNSi G8324 (2:0:0:51): device slot 5 changed from 'disk present,faulty' to 'disk present,rebuilding,critical array'
SAF-TE Device CNSi G8324 (2:0:0:51): ALERT device slot 5 is disk present,rebuilding,critical array

and now the array is okay again

SAF-TE Device CNSi G8324 (2:0:0:51): device slot 4 changed from 'disk present,rebuilding,critical array' to 'disk present,no error'
SAF-TE Device CNSi G8324 (2:0:0:51): device slot 5 changed from 'disk present,rebuilding,critical array' to 'disk present,no error'
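
To pick only the ALERT lines out of a log, a small sed filter is enough. This sketch assumes the message format shown in the examples above (any syslog timestamp or host prefix is ignored); extract_alerts is a hypothetical name.

```shell
#!/bin/sh
# extract_alerts: hypothetical filter for syslog lines on stdin.
# Prints "<address> <alert text>" for each ALERT message and drops
# everything else, including the plain "changed from" lines.
extract_alerts() {
    sed -n 's/^.*SAF-TE Device .* (\([0-9:]*\)): ALERT \(.*\)$/\1 \2/p'
}
```

Usage: `extract_alerts < /var/log/messages` (the log file path depends on your syslog configuration).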

Bugs

Please send bug reports to michael@metaparadigm.com

Anonymous CVS

# export CVSROOT=:pserver:anoncvs@cvs.metaparadigm.com:/cvsroot
# cvs login
Logging in to :pserver:anoncvs@cvs.metaparadigm.com:2401/cvsroot
CVS password: <enter 'anoncvs'>
# cvs co safte-monitor

To do

The SAF-TE spec can be found here.

Copyright Metaparadigm Pte. Ltd. 2001-2005. Michael Clark

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.