Servers Alive Case Study: Automation
The Holy Grail of most systems and network administrators is the complete automation of every mundane and routine task. Often, though, they keep performing those tasks manually because they lack the time to find and learn the tools that could automate them, or to write those tools themselves. This can be especially frustrating at four in the morning when a service has stopped for the third time that week.
The company in our case study is primarily interested in the monitoring features of Servers Alive. However, after learning about some of its automation features, the company realizes that the software can also handle certain routine tasks that currently result in after-hours calls. Within the last year, the company has streamlined and automated its backup processes to the point where it no longer needs a dedicated backup operator on the overnight shift, so no one is on site when an alarm is triggered after business hours.
In addition to a wide range of monitoring capabilities, Servers Alive can attempt to recover from a problem that might otherwise require human intervention. For example, a check can be set to reboot a Windows server or to restart Windows services when a certain condition is met. On the next cycle, Servers Alive continues with its normal check, which provides a safety net to confirm that the server has come up correctly and that all processes and services have started. This is particularly useful for services that may hang and need to be restarted after a connectivity interruption (such as a server being rebooted).
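The recover-then-recheck behavior described above can be modeled as a small state machine. This is a minimal sketch, not Servers Alive's implementation: `run_check_with_recovery`, its status strings, and the `pending_recovery` flag are hypothetical names for illustration. The restart helper assumes a Windows host where the built-in `net stop` and `net start` commands are available and the caller has administrative rights.

```python
import subprocess
from typing import Callable, Tuple


def restart_windows_service(name: str) -> bool:
    """Stop and start a Windows service with the built-in `net` command.

    Assumes a Windows host and administrative rights; returns True if
    `net start` reports success.
    """
    # `net stop` fails if the service is already stopped; that is fine here.
    subprocess.run(["net", "stop", name], check=False)
    result = subprocess.run(["net", "start", name], check=False)
    return result.returncode == 0


def run_check_with_recovery(
    check: Callable[[], bool],
    recover: Callable[[], None],
    pending_recovery: bool,
) -> Tuple[str, bool]:
    """Run one cycle of a single check (hypothetical model).

    Returns (status, pending_recovery for the next cycle), where status is
    "up", "recovered", "recovery_attempted" or "alarm".
    """
    if check():
        # Target is healthy; if we acted last cycle, the action worked.
        return ("recovered" if pending_recovery else "up"), False
    if not pending_recovery:
        # First failure: try the corrective action and re-verify next cycle.
        recover()
        return "recovery_attempted", True
    # Still down after a recovery attempt: the safety net raises an alarm
    # and does not repeat the action, to avoid a reboot/restart loop.
    return "alarm", True
```

Wired to a real service, `recover` could be `lambda: restart_windows_service("Spooler")`, with the returned flag stored between cycles. Keeping the flag set after an alarm stops the sketch from restarting a service over and over while a problem persists; a real deployment might prefer a retry limit instead.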
Servers Alive can also perform a "second knock" during its check cycle. If this option is enabled and a server does not respond or appears to be down, Servers Alive retries the check at the end of its cycle. For slow-responding servers, this option can help reduce false alarms.
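The second-knock idea fits in a few lines. This sketch assumes each check is a zero-argument callable that returns `True` when the target is up; the function name is hypothetical, and the code models only the retry ordering, not how Servers Alive schedules checks internally.

```python
from typing import Callable, Dict, List


def run_cycle_with_second_knock(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each check once, retry the failures after all other checks,
    and return the names of checks that failed both attempts."""
    # First pass: every check gets one attempt, in order.
    failed_first = [name for name, check in checks.items() if not check()]
    # Second knock: only the failures are retried, at the end of the cycle,
    # which gives slow responders extra time before an alarm is raised.
    return [name for name in failed_first if not checks[name]()]
```

A check that times out on the first pass but answers on the retry never appears in the returned list, so it raises no alarm.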
Servers Alive also supports external checks written in Visual Basic. An external errorlevel check returns an errorlevel (exit code), which Servers Alive uses to determine whether an alarm should be triggered. External COM checks are more sophisticated: once added to Servers Alive, they provide their own configuration options. Both types extend the range of what Servers Alive can monitor automatically.
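To illustrate the errorlevel idea, the hypothetical script below tests whether a TCP port accepts connections and reports the result only through its exit code. It is written in Python rather than Visual Basic, and the mapping (0 for up, 1 for down, 2 for a usage error) is an assumption for this sketch; match it to the errorlevel values configured for the check in Servers Alive.

```python
import socket
import sys
from typing import List

# Hypothetical exit-code convention for this sketch; align it with the
# errorlevel values the Servers Alive check is configured to expect.
EXIT_UP = 0
EXIT_DOWN = 1
EXIT_USAGE = 2


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def main(argv: List[str]) -> int:
    if len(argv) != 2 or not argv[1].isdigit():
        print("usage: port_check.py HOST PORT", file=sys.stderr)
        return EXIT_USAGE
    host, port = argv[0], int(argv[1])
    return EXIT_UP if port_is_open(host, port) else EXIT_DOWN


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Invoked as `python port_check.py mailserver 25` (a hypothetical host name), the script exits with 0 if the port answers within five seconds and with 1 otherwise.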
With some of the Servers Alive add-ons, the software can also check for the existence of certain files, matching them by name, size, or date and time stamp. This is particularly helpful when you need to verify that a file exists, for instance a report that is uploaded to an FTP server. Some add-ons can also perform simple tasks of their own. For example, the FileFirstLine Check can move a file matching a specified pattern to a different location.
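A rough equivalent of matching files by name pattern, size, and time stamp, and then moving the matches elsewhere, is sketched below. The function names and parameters are hypothetical and do not reflect how the Servers Alive add-ons or the FileFirstLine Check are implemented; `max_age_seconds` is compared against each file's last-modified time.

```python
import fnmatch
import shutil
import time
from pathlib import Path
from typing import Iterable, List, Optional, Union


def find_matching_files(
    directory: Union[str, Path],
    pattern: str,
    min_size: int = 0,
    max_age_seconds: Optional[float] = None,
    now: Optional[float] = None,
) -> List[Path]:
    """Return files in `directory` whose name matches `pattern` (e.g. "*.csv"),
    whose size is at least `min_size` bytes and, if `max_age_seconds` is given,
    whose last-modified time is no older than that."""
    now = time.time() if now is None else now
    matches = []
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_file() or not fnmatch.fnmatch(entry.name, pattern):
            continue
        info = entry.stat()
        if info.st_size < min_size:
            continue
        if max_age_seconds is not None and now - info.st_mtime > max_age_seconds:
            continue
        matches.append(entry)
    return matches


def move_files(files: Iterable[Path], destination: Union[str, Path]) -> List[Path]:
    """Move each file into `destination`, creating the directory if needed."""
    target_dir = Path(destination)
    target_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.move(str(f), str(target_dir / f.name))) for f in files]
```

An empty result from `find_matching_files` is the condition that would raise an alarm, for example when the expected report has not arrived on the FTP server.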
The company plans to use this functionality to check for several backup files that staff currently must verify manually every morning. It also plans to set up a check to verify that a log file is being updated. In the past, updates to that log file have failed, and up to a day's data was lost. With a solution like Servers Alive in place, the system administrator would have been notified the first time the update failed.
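The two checks the company plans, a log file that must keep being updated and a set of backup files that must exist, could be combined into one external errorlevel check along these lines. This is a sketch under assumptions: the script name, argument order, and exit codes (0 when everything is in order, 1 when the log is stale or a backup is missing, 2 for a usage error) are all hypothetical.

```python
import sys
import time
from pathlib import Path
from typing import Iterable, List, Optional, Union

PathLike = Union[str, Path]


def log_is_fresh(path: PathLike, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """True if `path` exists and was modified within the last `max_age_seconds`."""
    now = time.time() if now is None else now
    p = Path(path)
    return p.is_file() and now - p.stat().st_mtime <= max_age_seconds


def missing_files(paths: Iterable[PathLike]) -> List[str]:
    """Return the paths (as strings) that do not exist as regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]


def main(argv: List[str]) -> int:
    """Usage: freshness_check.py LOGFILE MAX_AGE_MINUTES [BACKUP_FILE ...]"""
    if len(argv) < 2 or not argv[1].isdigit():
        print(main.__doc__, file=sys.stderr)
        return 2
    log_file, max_age_minutes, backups = argv[0], int(argv[1]), argv[2:]
    problems = []
    if not log_is_fresh(log_file, max_age_minutes * 60):
        problems.append(f"log not updated in the last {max_age_minutes} min: {log_file}")
    problems.extend(f"backup missing: {b}" for b in missing_files(backups))
    for problem in problems:
        print(problem, file=sys.stderr)
    # Hypothetical convention: 0 = all good, 1 = raise an alarm.
    return 1 if problems else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

For example, `python freshness_check.py D:\logs\app.log 60 D:\backup\nightly.bak` (hypothetical paths) would return 1 if the log had not been written in the last hour or the backup file were missing. Run on every cycle, a check like this reports a stalled log within one check interval instead of the next morning.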
In environments where on-call and after-hours time is billable, especially when support is contracted, automatic problem resolution can produce significant cost savings over the long term. In addition, because Servers Alive can reboot systems or restart services, it may eliminate some after-hours on-site visits. For instance, if the VPN gateway server locks up and system administrators cannot connect remotely, Servers Alive may be able to detect the failure and correct it during its next check cycle.
Servers Alive is not just a potent network monitoring solution; it can also perform simple automated tasks to restore a service or recover from a problem. This frees trained IT personnel to focus on more demanding projects rather than routine corrective actions.
Stephen A. De La Marche
Data Center Operations Manager
AtlantiCare Health System, Atlantic City Medical Center & InfoShare