This is a repost of an excellent list of the operational areas that need to be well managed. The theme: manage with automation, templates, and monitoring, not with manual fiddling until it works. See below for the link to the full post.
Structured system management is a concept that covers the fundamentals of building, securing, deploying, monitoring, logging, alerting, and documenting networks, servers and applications. Structured system management implies that you have those fundamentals in place, you execute them consistently, and you know all cases where you are inconsistent. The opposite of structured system management is what I call ad hoc system management, where every system has its own plan, undocumented and inconsistent, and you don’t know how inconsistent they are, because you’ve never looked.
You know you have structured system management when:
- You manage with scripts, not mouse clicks.
- You manage consistently across platforms.
- You deploy servers from images, not install DVDs.
- You can re-install a server and its applications from your documentation.
- You install and configure to the least bit principle (WebCite) for all your devices, servers, operating systems and applications.
- You have full remote management of all your servers and devices.
- You build router, switch and firewall configurations from templates, not from scratch.
- You have version control and auditing on critical system and application configuration files.
- You automatically monitor, strip chart and alert on at least memory, network and disk I/O, and CPU.
- You automatically monitor, strip chart and alert on application response time and availability.
- You have a structure and process for your documentation.
- You have change management and change auditing sufficient to determine who/what/when/why on any change to any system or application critical file.
- You have a patch strategy.
- You have neat cabling and racks.
- You determine the root cause of failures, outages and performance slowdowns.
- You have centralized logging, alerting, log rotation and basic log analysis tools.
- You have installed, configured and are actually using your vendor-provided platform management software.
- You keep it simple.
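
To make two of the items above concrete (building device configurations from templates, and knowing where you are inconsistent), here is a minimal Python sketch. It is only an illustration, not anything from the original post: the template text, the field names, and the functions `render_port` and `find_drift` are all hypothetical, and the configuration syntax is a generic stand-in rather than any particular vendor's.

```python
import difflib
from string import Template

# Hypothetical template for one switch access port. The syntax is
# illustrative only and is not tied to a specific vendor.
PORT_TEMPLATE = Template(
    "interface $interface\n"
    " description $description\n"
    " switchport access vlan $vlan\n"
)


def render_port(interface: str, description: str, vlan: int) -> str:
    """Render one port stanza from the template.

    Template.substitute() raises KeyError if a placeholder has no value,
    so an incomplete definition fails loudly instead of producing a
    half-filled configuration.
    """
    return PORT_TEMPLATE.substitute(
        interface=interface, description=description, vlan=vlan
    )


def find_drift(expected: str, actual: str) -> list[str]:
    """Return unified-diff lines between the templated and deployed config.

    An empty list means the deployed configuration matches the template.
    """
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="template",
            tofile="deployed",
            lineterm="",
        )
    )


if __name__ == "__main__":
    expected = render_port("Gi0/1", "uplink to core", 10)
    # Stand-in for a configuration pulled from a real device.
    deployed = expected.replace("vlan 10", "vlan 20")
    for line in find_drift(expected, deployed):
        print(line)
```

The point of the sketch is the workflow rather than the tooling: the template is the single source of truth, every device stanza is generated from it, and a diff against what is actually deployed tells you exactly where you are inconsistent.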