What do you want to check with a service like WatchMouse?
As I explained in my earlier column, you could be using a monitoring
service in a number of roles. Common to all these roles is the fact that
you are keeping alive some services for the benefit of your customers,
suppliers, employees or partners. These users are, in the end, all that
counts.
What are the objects that you should be checking? Obviously, the least
you want to do is check the service that is most visible to these users.
This could be the webserver, or a POP or FTP server for example. You
would start by setting up a rule to check the server and a URL. The frequency with which you can monitor (that is: the elapsed
time between checks) is typically limited by the type of subscription
that you have. Only in specific cases would you check less often than your
subscription allows.
Note that there is a difference between a CONNECT rule on port 80 and an HTTP rule.
The first just connects to the port that the webserver is supposed to
use. The HTTP rule also checks whether the webserver can produce a valid HTTP
response and whether the document can be found. You probably want the latter check.
Similar reasoning applies to POP and FTP checks. Setting up two different rules on the same host lets you distinguish, for example, between a broken webserver and a host that is down. If you want even more content-oriented
checks, have a look at the so-called PLUG-IN rules.
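The difference between the two kinds of rule can be sketched in a few lines of Python. This is only an illustration using the standard library, not how WatchMouse implements its rules; the function names and the wording of the diagnosis strings are made up for this example.

```python
import http.client
import socket


def check_connect(host, port=80, timeout=5.0):
    """CONNECT-style check: can a TCP connection to the port be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_http(host, path="/", port=80, timeout=5.0):
    """HTTP-style check: valid HTTP response and the document is found (status 200)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def diagnose(host, path="/", port=80, timeout=5.0):
    """Combine both checks to tell an unreachable host from a broken webserver."""
    if not check_connect(host, port, timeout):
        return "no connection: host down or nothing listening on the port"
    if not check_http(host, path, port, timeout):
        return "port open, but no valid HTTP response or document not found"
    return "ok"
```

Calling `diagnose("www.example.com", "/index.html")` with your own host and document would return one of the three strings. Running both checks is what makes it possible to separate the two failure cases.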
Additionally, you can set up checks to make sure that your
users are actually using the services that you intend them to. The whole
Internet is very dependent on a correctly functioning domain name system
(DNS). If it does not work properly, your users may be directed to
a different site than you intended. This could be a configuration error, but
it could also be a defamation hack. In either case, you want to know.
First of all you want to check whether the root servers of the Internet
accurately find the DNS server that is serving you. This can be checked with a
DNSNS rule. What you are checking with this rule is whether the registrar's databases are correct. Second, you want to check whether that DNS server and its
slaves are serving up the proper IP address for the server. For this
you can use the DNSA rule, and it will warn you if the DNS server is not
working or serves up the wrong address. (Note that the hosting party can
change that address at its discretion, as part of a renumbering
operation for example.)
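The idea behind an address check of the DNSA kind can be sketched as follows. This is a simplified illustration, not the actual rule: it asks the local system resolver rather than querying your authoritative DNS server directly, and the function names, the host name and the addresses in the usage note are hypothetical.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def check_a_record(hostname, expected, resolve=resolve_ipv4):
    """Return (ok, message): ok is True only if every address found is expected."""
    try:
        found = set(resolve(hostname))
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"lookup failed: {exc}"
    if not found:
        return False, "no address returned"
    unexpected = found - set(expected)
    if unexpected:
        return False, f"unexpected address(es): {sorted(unexpected)}"
    return True, "ok"
```

For example, `check_a_record("www.example.com", {"192.0.2.10"})` fails either when the lookup does not work or when an address outside the expected set comes back. After a planned renumbering you would update the expected set.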
Who should you notify of rule failures? Again, different roles have
different information requirements. You want to notify the person who
can fix things as soon as possible. Mail or SMS them directly; you do
not want to be in the loop. You might set up an escalation chain, which
fires off after a certain number of errors. Note: make sure that
you send the message on a channel that is not affected by the outage: if
your e-mail system does not work, delivering a message to that effect
should not depend on that e-mail system.
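An escalation chain of this kind can be sketched as a small counter of consecutive failures. The class name, the thresholds and the contact strings below are assumptions made for the illustration; a real monitoring service will have its own configuration for this.

```python
from dataclasses import dataclass


@dataclass
class EscalationChain:
    # List of (consecutive failures needed, contact) pairs, e.g.
    # [(1, "sms:engineer"), (3, "sms:manager")]. Contacts are hypothetical.
    levels: list
    failures: int = 0

    def record(self, check_ok):
        """Record one check result and return the contacts to notify now."""
        if check_ok:
            self.failures = 0  # a successful check resets the chain
            return []
        self.failures += 1
        # Each level is notified once, at the moment its threshold is reached.
        return [contact for threshold, contact in self.levels
                if threshold == self.failures]
```

With the levels shown in the comment, the engineer is notified on the first failed check and the manager only if the third check in a row also fails. Using SMS for the contacts here reflects the point above: the alert should travel over a channel that the outage itself cannot take down.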
The people in charge of overseeing somebody else's service levels should
only get escalation messages, if at all. Instead, they should get the
weekly or monthly service reports.
Peter van Eijk is a management consultant specialised in the management of network infrastructures. He can be reached via his contact page.