Last In – First Out: Ad-Hoc Verses Structured System Management

This is a repost of an excellent list of the operational areas that need to be well managed. The theme: manage with automation, templates, and monitoring, not with manual fiddling until things work. See below for the link to the full post.

Structured system management is a concept that covers the fundamentals of building, securing, deploying, monitoring, logging, alerting, and documenting networks, servers and applications. Structured system management implies that you have those fundamentals in place, you execute them consistently, and you know all cases where you are inconsistent. The converse of structured system management is what I call ad hoc system management, where every system has its own plan, undocumented and inconsistent, and you don’t know how inconsistent they are, because you’ve never looked.

You know you have structured system management when:

  • You manage with scripts, not mouse clicks.
  • You manage consistently across platforms.
  • You deploy servers from images, not install DVDs.
  • You can re-install a server and its applications from your documentation.
  • You install and configure to the least bit principle (WebCite) for all your devices, servers, operating systems and applications.
  • You have full remote management of all your servers and devices.
  • You build router, switch and firewall configurations from templates, not from scratch.
  • You have version control and auditing on critical system and application configuration files.
  • You automatically monitor, strip chart and alert on at least memory, network and disk I/O, and CPU.
  • You automatically monitor, strip chart and alert on application response time and availability.
  • You have a structure and process for your documentation.
  • You have change management and change auditing sufficient to determine who/what/when/why for any change to any critical system or application file.
  • You have a patch strategy.
  • You have neat cabling and racks.
  • You determine the root cause of failures, outages and performance slowdowns.
  • You have centralized logging, alerting, log rotation and basic log analysis tools.
  • You have installed, configured and are actually using your vendor-provided platform management software.
  • You keep it simple.
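The item about building router, switch and firewall configurations from templates is easy to picture with a small example. The sketch below uses Python's standard `string.Template` to render one shared template with per-device values. It assumes hypothetical values throughout: the template text, hostnames, VLAN and IP addresses are invented. The command syntax is only loosely IOS-like and is not a verified configuration for any vendor.

```python
from string import Template

# Hypothetical access-switch template. The command syntax is illustrative
# (loosely IOS-like) and is not guaranteed to be valid for any vendor.
SWITCH_TEMPLATE = Template(
    "hostname $hostname\n"
    "interface Vlan$mgmt_vlan\n"
    " ip address $mgmt_ip $mgmt_mask\n"
    "ntp server $ntp_server\n"
    "logging host $syslog_server\n"
)


def render_config(device: dict) -> str:
    """Render one device's config; raises KeyError if a variable is missing."""
    # substitute() (not safe_substitute()) fails loudly on a missing value,
    # so an incomplete device record never yields a half-filled config.
    return SWITCH_TEMPLATE.substitute(device)


if __name__ == "__main__":
    # Hypothetical site-wide defaults shared by every switch.
    site_defaults = {
        "ntp_server": "10.0.0.10",
        "syslog_server": "10.0.0.20",
        "mgmt_mask": "255.255.255.0",
    }
    # Hypothetical per-device values; only these differ between switches.
    devices = [
        {"hostname": "sw-floor1", "mgmt_vlan": 100, "mgmt_ip": "10.1.100.11"},
        {"hostname": "sw-floor2", "mgmt_vlan": 100, "mgmt_ip": "10.1.100.12"},
    ]
    for dev in devices:
        # Per-device values override the site defaults on key collisions.
        print(render_config({**site_defaults, **dev}))
```

Calling `substitute()` instead of `safe_substitute()` is a deliberate choice. A device record with a missing value fails loudly rather than producing a half-filled config. The rendered files are also natural candidates for the version control item in the same list.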

For details see the blog: Ad Hoc Verses Structured System Management (WebCite)
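The monitoring and alerting items in the list above can be sketched in the same spirit. This hypothetical Python example compares utilization percentages against warning and critical thresholds. Only disk usage is actually sampled, via the standard `shutil.disk_usage`. The CPU and memory figures are assumed to come from whatever agent or vendor platform tool you already run. The threshold values are placeholders, and strip charting and alert delivery are left to a real monitoring system.

```python
import shutil
from dataclasses import dataclass


@dataclass
class Threshold:
    warn: float  # percent utilization that triggers a warning
    crit: float  # percent utilization that triggers a critical alert


# Hypothetical thresholds; sensible values depend on the workload.
THRESHOLDS = {
    "cpu_pct": Threshold(warn=80.0, crit=95.0),
    "mem_pct": Threshold(warn=85.0, crit=95.0),
    "disk_pct": Threshold(warn=80.0, crit=90.0),
}


def disk_used_pct(path: str = "/") -> float:
    """Percent of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def evaluate(metrics: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return (metric, level, value) tuples for every breached threshold."""
    alerts = []
    for name, value in sorted(metrics.items()):
        t = thresholds.get(name)
        if t is None:
            continue  # no threshold defined for this metric, so no alert
        if value >= t.crit:
            alerts.append((name, "CRITICAL", value))
        elif value >= t.warn:
            alerts.append((name, "WARNING", value))
    return alerts


if __name__ == "__main__":
    # Only disk is measured here; the CPU and memory values are made-up samples.
    sample = {"cpu_pct": 42.0, "mem_pct": 63.0, "disk_pct": disk_used_pct("/")}
    for name, level, value in evaluate(sample):
        print(f"{level}: {name} at {value:.1f}%")
```

For example, `evaluate({"cpu_pct": 97.0, "mem_pct": 50.0, "disk_pct": 83.5})` returns `[("cpu_pct", "CRITICAL", 97.0), ("disk_pct", "WARNING", 83.5)]`. Keeping the thresholds in one table, rather than scattered across hosts, is what makes the alerting consistent from server to server.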


No, you are not doing DevOps (…and nor am I)

Good rant and insights.

A subset of #2: keep the operators and sysadmins, but make them subordinate to developers, who will define new production architectures that can be changed as fast as their continuous development environments. They get to play around in production, ignoring many years of experience in managing large-scale infrastructures. Oh, they will eventually get things stable and working well, with a custom mix of tools and patches. However, the next batch of developers will wonder “what is going on?” and will start rewriting things until they understand it. Of course, the business will not mind unstable websites, downtime, and lots of engineers working to continuously reinvent things.

A sysadmin's logbook

A word of caution

This post is addressed to all those people who think they know what DevOps means, but they don’t. If you don’t recognize yourself in this post, then maybe it’s not for you. If you recognize yourself, then beware: you’re going to be insulted. Read at your own risk and don’t bother asking for apologies.
