What do you want to check with a service such as WatchMouse? (2005-01-31)
As I explained in my previous column, you can use a monitoring service in a number of roles. Common to all these roles is the fact that
you are keeping some services alive for the benefit of your customers,
suppliers, employees or partners. These users are, in the end, all that
counts.
What are the objects that you should be checking? Obviously, the least
you want to do is check the service that is most visible to these users.
This could be the webserver, or a POP or FTP server for example. You
would start by setting up a rule to check the server and a URL. The frequency with which you can monitor (that is: the elapsed
time between checks) is typically limited by the type of subscription
that you have. Only in specific cases would you not check as often as your
subscription allows.
Note that there is a difference between a CONNECT rule on port 80 and an HTTP rule.
The first just connects to the port that the webserver is supposed to
use. The HTTP rule also checks whether the webserver can produce a valid HTTP
response, and whether the document can be found. You probably want the latter check.
Similar reasoning applies to POP and FTP checks. Setting up two different rules on the same host lets you distinguish, for example, between a broken webserver and a host that is down. If you want even more content-oriented
checks, have a look at the so-called PLUG-IN rules.
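The difference between the two kinds of rule can be sketched in a few lines of Python. This is only an illustration using the standard library, not how any monitoring service implements its rules; the function names and the wording of the diagnoses are assumptions made for the example.

```python
import http.client
import socket


def connect_check(host, port=80, timeout=5.0):
    """CONNECT-style check: can a TCP connection to the port be opened at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host, path="/", port=80, timeout=5.0):
    """HTTP-style check: is there a valid HTTP response, and is the document found?"""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        # Treat 2xx and 3xx as "document found"; 4xx and 5xx count as failures.
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def diagnose(connect_ok, http_ok):
    """Combine the results of the two rules into a rough diagnosis."""
    if http_ok:
        return "ok"
    if connect_ok:
        # The port accepts connections, but no usable HTTP answer came back.
        return "webserver broken"
    return "host down or port closed"
```

Running both checks against the same host and passing the results to diagnose() gives the kind of distinction described above: a port that still accepts connections points at the webserver software, while a failed connection points at the host or the network.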
Additionally, you can set up checks to make sure that your
users are actually using the services that you intend them to. The whole
Internet depends heavily on the domain name system (DNS) functioning correctly. If it does not work properly, your users may be directed to
a different site than you intended. This could be a configuration error, but
it could also be a defamation hack. In either case, you want to know.
First of all you want to check whether the root servers of the Internet
accurately find the DNS server that is serving you. This can be checked with a
DNSNS rule. What you are checking with this rule is whether the registrar's databases are correct. Second, you want to check whether that DNS server (and its
slaves) is serving up the proper IP address for the server. For this
you can use the DNSA rule, and it will warn you if the DNS server is not
working or serves up the wrong address. (Note that the hosting party can
change that address at its discretion, as part of a renumbering
operation for example.)
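The idea behind an address check can be sketched as follows. This is a simplified illustration, not the DNSA rule itself: it asks the local resolver through Python's standard library rather than querying the authoritative name server directly, and the function names, host name and addresses are assumptions made for the example.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the local resolver reports for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def check_a_record(hostname, expected, resolver=resolve_ipv4):
    """Compare the resolved addresses with the expected ones; return (ok, message)."""
    try:
        found = set(resolver(hostname))
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"lookup failed for {hostname}: {exc}"
    if not found:
        return False, f"{hostname} has no address"
    unexpected = found - set(expected)
    if unexpected:
        return False, f"{hostname} resolves to unexpected address(es): {sorted(unexpected)}"
    return True, f"{hostname} resolves as expected"
```

A call such as check_a_record("www.example.com", ["192.0.2.10"]) would then warn both when the lookup fails and when a different address is served. After a renumbering by the hosting party, the expected list has to be updated, otherwise the check will keep reporting a mismatch.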
Who should you notify of rule failures? Again, different roles have
different information requirements. You want to notify the person who
can fix things as soon as possible. Mail or SMS/text them directly; you do
not want to be in the loop. You might set up an escalation chain, which
fires off after a certain number of errors. Note: make sure that
you send the message on a channel that is not affected by the outage: if
your e-mail system does not work, delivering a message to that effect
should not depend on that e-mail system.
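An escalation chain of this kind can be sketched as a counter of consecutive failures with a contact attached to each threshold. The class below is a minimal illustration; the class name, the thresholds and the contact labels are all hypothetical, and actually sending the mail or text message is left out.

```python
class EscalationChain:
    """Decide who to notify, based on the number of consecutive failed checks."""

    def __init__(self, levels):
        # levels: (failure_threshold, contact) pairs,
        # for example [(1, "sysadmin by SMS"), (3, "manager by e-mail")].
        self.levels = sorted(levels)
        self.consecutive_failures = 0

    def record(self, check_ok):
        """Record one check result and return the contacts to notify right now."""
        if check_ok:
            # A successful check resets the chain.
            self.consecutive_failures = 0
            return []
        self.consecutive_failures += 1
        # Each contact is notified once, when its threshold is reached exactly.
        return [
            contact
            for threshold, contact in self.levels
            if threshold == self.consecutive_failures
        ]
```

With levels [(1, "sysadmin by SMS"), (3, "manager by e-mail")], the person who can fix the problem hears about the first failure, and the manager is only brought in if the problem persists for three checks in a row.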
The people in charge of overseeing somebody else's service levels should
only get escalation messages, if at all. Rather, they should get the
weekly or monthly service reports.
Peter van Eijk is a management consultant specializing in the management of network infrastructures. He can be reached via his contact page.
Website performance is the key to customer satisfaction (2007-06-27)
How often have you typed in the Google URL and received a page that will not load? I am willing to bet that this is a rare occurrence. Despite its heavy traffic, Google is a textbook example of a web site that has almost perfect performance and therefore serves a great number of satisfied customers. The market share of the search engine is a resounding confirmation of this. You are served quickly, so you come back sooner. Research conducted by JupiterResearch has revealed that visitors to a site have only 4 seconds of patience. If the site has not loaded by that time, they leave. Error messages also prompt potential customers to go to the competition.
Why do organisations still devote so little attention to the effective availability of their site? Performance is the key to satisfied customers. For many companies, their web site is the face of the organisation. Consumers and also business users of the Internet use the wealth of information on the web to compare purchasing options. It is of immeasurable importance that they are also actually able to find what they are looking for. If this is not possible at one company, competitors are straining at the leash to offer their services through a correctly functioning site.
Coming back to our praise for Google, we see that the search engine has made significant investments in the availability of its web site. The page is served by several machines at various sites. If one crashes, there are enough back-up servers to take over the traffic and guarantee optimum performance. In addition, the search engine invests a great deal of time and money in the right hardware and people. Although the site has a difficult task – searching through an index of billions of documents – it is almost always available and loads fast.
The actual site is unspectacular in construction. This applies to the majority of sites with a high level of availability. Simple sites such as the news site NU.nl are almost always easy to access. Nevertheless, it is not only the layout of the site that determines how the web page performs. Too many photos, long symbols and frills make web sites slower to respond. An inefficiently programmed ‘back end’ also contributes to longer loading times, as does frequent consultation of background databases.
Where it often goes wrong is when different people are working on a site, thereby disturbing the links between the various elements. The different parts of the site will work correctly, but the site as a whole will fail to perform. This means long waiting times for people who want to use the services of a company.
Service providers at the upper end of the market are becoming increasingly aware of this. The contracts that they use frequently include a service level agreement (SLA) for the part for which they are responsible. Nevertheless, they regularly make mistakes because the promised performance is not subsequently verified (by an independent party). Although it is now essentially part of the contract, there is insufficient actual verification. Ideally, web site performance should become a permanent component of a contract. In addition, clear internal agreements must be made on who has final responsibility for the efficient loading and availability of a site.
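Verifying a promised service level comes down to simple arithmetic on the results of independent checks. The sketch below assumes that each check is recorded as a plain success or failure and that the SLA states its target as a percentage; both function names and the figures in the example are illustrative assumptions, not terms from any actual contract.

```python
def availability(results):
    """Return the percentage of successful checks.

    results is a sequence of booleans, one per monitoring check.
    """
    if not results:
        raise ValueError("no check results to evaluate")
    return 100.0 * sum(1 for ok in results if ok) / len(results)


def meets_sla(results, target_percent):
    """Return True when the measured availability reaches the agreed target."""
    return availability(results) >= target_percent
```

For example, 3 failed checks out of 1,000 give an availability of 99.7 percent, which would satisfy a 99.5 percent target but not a 99.9 percent one. The more frequent the checks, the more precisely the figure reflects what visitors actually experienced.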
Regular testing is also essential for good availability. This will prevent a great many errors, keeping the site up and running at crucial times. The storm that blew over the Netherlands at the end of January was a good opportunity to see which sites were prepared for extreme loads and which were not. The site of the Dutch weather institute, KNMI, was almost unreachable, while some logical thought could have protected them from this eventuality. If you know that a major storm is heading towards the country you can be sure that people will search for information on the weather and roads on the Internet. Sites such as those of KLM and Schiphol were also unreachable, while the specially created site Crisis.nl, which had been kept as simple as possible, was able to serve a large number of people.
Including ‘stress tests’ in an SLA or conducting them regularly in-house is therefore to be recommended. Companies can easily take control by ensuring that their service provider executes this type of test or by putting their own site under pressure. This is the best method of checking whether your web site can handle a sudden increase in visitor numbers. It is also good to know whether the servers on which your site is running actually ensure that your page is always available and loads correctly. For companies, it is crucial to see when they are off air. This can save them a large amount of money every year and will also reduce the number of irritated visitors to the site. This is how you keep customers satisfied and keep the company running.
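What putting your own site under pressure can look like is sketched below: a number of requests are fired concurrently, and the failures and the slowest response are recorded. This is a minimal illustration, not a description of any commercial stress test service; the function names, the default numbers and the injected fetch callable are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(fetch):
    """Run fetch() once and return (succeeded, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        fetch()
        ok = True
    except Exception:
        # Any error (timeout, refused connection, HTTP error) counts as a failure.
        ok = False
    return ok, time.perf_counter() - start


def stress_test(fetch, total_requests=100, concurrency=10):
    """Call fetch() total_requests times, with up to `concurrency` calls in parallel."""
    if total_requests < 1:
        raise ValueError("total_requests must be at least 1")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_call(fetch), range(total_requests)))
    return {
        "requests": total_requests,
        "failures": sum(1 for ok, _ in results if not ok),
        "slowest_seconds": max(elapsed for _, elapsed in results),
    }
```

The fetch argument can be any function that loads one page and raises an exception on failure, for instance one that calls urllib.request.urlopen with a timeout on a URL of your own site. Only run such a test against servers you are responsible for, and preferably in agreement with your service provider.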
Mark Pors
Chief Technology Officer at WatchMouse
WatchMouse provides site performance monitoring and stress test services
Cisco Releases Security Advisory to Address Multiple Vulnerabilities in Unified CallManager and Presence Server (2007-03-30)
Cisco Systems has released Security Advisory cisco-sa-20070328-voip to address multiple vulnerabilities in the Cisco Unified CallManager (CUCM) and Cisco Unified Presence Server (CUPS). The advisory indicates that the following attack vectors could be used against a vulnerable system:
- It may be possible to crash a CallManager system, resulting in a denial of service, by sending a series of specially crafted packets to the Skinny Call Control Protocol (SCCP) service port.
- It may be possible to cause various CUCM / CUPS services to crash, resulting in a denial of service, by sending a large number of ICMP Echo Requests (Ping) to a CUCM or CUPS system.
- It may be possible to cause various CUCM / CUPS services to fail, resulting in a denial of service, by sending a specific UDP packet to the IPSec Manager Service on UDP port 8500.
There are no workarounds for these vulnerabilities; however, Cisco has released free software to address the flaws described in this report.
More information, including links to the fixes, can be found in Cisco Security Advisory cisco-sa-20070328-voip - Multiple Cisco Unified CallManager and Presence Server Denial of Service Vulnerabilities.