Charter:
Shift the focus of infrastructure monitoring from monitoring individual
devices to monitoring service delivery, replacing or augmenting current
monitoring strategies where applicable to fill gaps.
Current monitoring strategies:
- UNIX servers
Watch Dog – day-to-day monitoring of servers and some services
ORCA – long-term performance monitoring
- Windows servers
SiteScope – both day-to-day and medium-term monitoring (memory, CPU, ping,
some services)
- Network
OpenView – device monitoring
Gaps:
1. We monitor devices, not services. Many of our services run on
multiple machines, e.g., a web front end with a back-end database. We
currently rely on operators' knowledge to correlate a nonfunctioning
device with problems in service delivery.
2. We currently have no good way to relate an issue on a network
device to a potential service delivery problem.
3. Notification and escalation are inconsistent. Which servers need
24x7 monitoring? Who gets notified, and when? Bob Steffen is working on a
procedural solution; OpenView could provide the technology that makes
implementing such policies easier.
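Gaps 1 and 2 both come down to knowing which servers and network devices each service depends on. The following is a minimal sketch, assuming a hand-maintained catalog that maps each service to its dependencies; the service names, device names, and functions are all hypothetical, and nothing here reflects the actual interfaces of Watch Dog, ORCA, SiteScope, or OpenView.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """A service and the devices (servers and network gear) it depends on."""
    name: str
    depends_on: frozenset[str]


def impacted_services(services, down_devices):
    """Map each impacted service name to the sorted list of its failed dependencies."""
    down = set(down_devices)
    impacts = {}
    for svc in services:
        failed = svc.depends_on & down  # dependencies of this service that are down
        if failed:
            impacts[svc.name] = sorted(failed)
    return impacts


def services_using(services, device):
    """Reverse lookup: which services depend on a given device (e.g. a switch)?"""
    return sorted(svc.name for svc in services if device in svc.depends_on)


# Hypothetical catalog; all names are illustrative only.
CATALOG = [
    Service("web-portal", frozenset({"web01", "db01", "core-switch-1"})),
    Service("email", frozenset({"mail01", "core-switch-1"})),
    Service("payroll", frozenset({"app02", "db01"})),
]

if __name__ == "__main__":
    # Assumed input: the set of devices a device monitor currently reports as down.
    print(impacted_services(CATALOG, {"db01"}))
    print(services_using(CATALOG, "core-switch-1"))
```

With `db01` down, both `web-portal` and `payroll` are reported as impacted, which is the correlation operators currently make from memory; `services_using` answers the network question in gap 2. Keeping the dependency map as data rather than in operators' heads is the design point; feeding `down_devices` from the existing device monitors would be a separate integration step not shown here.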