Monthly Archives: September 2014

Why do we need configuration management?

Why use Configuration Management?

Gartner says that an average of 80 percent of mission-critical application service downtime is directly caused by people or process failures. The other 20 percent is caused by technology failure, environmental failure or a disaster. Most of us in IT Services are very much aware of high profile downtime incidents in the last year caused to some degree by changes to our production configuration.  In yesterday’s Miami Matters announcement of our new AVP of Enterprise Operations, stabilizing our core operational services was called a “critical need” and his primary charge.

The success of our new strategic imperative, Stabilize The Core, will depend first and foremost on our ability to improve our Service Asset and Configuration Management (SACM) decisions and processes. In yesterday’s communication, we defined What a Configuration Item (CI) is. This communication addresses Why we need to get better at identifying, controlling and maintaining our CIs throughout their lifecycle.

You Shouldn’t Change What You Don’t Know

Unintended consequences such as service outages and lost data can often be traced to something the requester or implementer of a change didn’t know about our production configuration, or to configuration information that was incorrect or out of date. Better SACM is about consistently recording what we know about our configuration in a Configuration Management Database (CMDB), maintained through a Configuration Management System (CMS), so that we can support more mature CM processes.

Better, Faster, Cheaper

Changes can be performed more efficiently and effectively by maintaining an accurate, complete and accessible understanding of all of the CIs involved in or impacted by changes we make and the dependencies between them. If we are diligent about maintaining the integrity of the knowledge of our CIs and their relationships in a CMDB using a CMS, it won’t require a major research effort to feel good about making a change.
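As a rough illustration of why recorded dependencies shorten that research effort, here is a minimal sketch of an impact lookup. It assumes the CMDB can export a simple mapping from each CI to the CIs that depend on it. The CI names and the `impacted_cis` helper are hypothetical and do not reflect our actual CMS.

```python
from collections import deque

# Hypothetical CMDB extract: each CI maps to the CIs that depend on it.
# Every name below is invented for illustration.
DEPENDENTS = {
    "db-server-01": ["student-db"],
    "student-db": ["registration-app", "billing-app"],
    "registration-app": ["registration-service"],
    "billing-app": [],
    "registration-service": [],
}


def impacted_cis(changed_ci, dependents):
    """Return every CI that directly or indirectly depends on changed_ci."""
    seen = set()
    queue = deque([changed_ci])
    while queue:
        ci = queue.popleft()
        for dependent in dependents.get(ci, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


if __name__ == "__main__":
    # List everything that could be affected by a change to the database CI.
    print(impacted_cis("student-db", DEPENDENTS))
```

The lookup is only as good as the relationships recorded in the CMDB, which is why keeping them accurate matters.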

What Happened?

SACM is not a panacea and won’t result in 100% uptime. In the event that a change to our configuration fails or a major incident occurs, we need the ability to isolate the cause and return our configuration to a functional state or resolve the incident as quickly as possible. Stability can be more of a perception to those who depend on it than a reality to those charged with providing it. Better SACM can dramatically enhance both the perception and reality of the stability of our core services through more efficient resolution of failed changes and major incidents. This can be accomplished through more accurate, complete and accessible information about the past, present and future state of our services and CIs.

Who did that?

The ultimate self-inflicted major incident or failed change to a service or CI is one caused by someone or something acting without proper authorization. Better SACM means ensuring that the right CIs are changed by the right people.

There are many more reasons for improving our SACM. We have outlined the primary motivation behind our SACM maturity efforts and hope you have a better understanding of why SACM is important to all of us. Feel free to contact any member of the SACM process improvement team with questions or for more information.

What is a CI?

We use many two-letter abbreviations and three-letter abbreviations in IT Services (I just used one there).

CI is a two-letter abbreviation that you’ll be hearing much more about in the weeks leading up to the roll-out of Service Excellence in Team Dynamix Release 2. You might already be familiar with using the two-letter abbreviation of CI as shorthand for continual improvement or continuous integration.

The glossary on the Service Excellence site defines a CI as a configuration item:

Configuration items are anything that needs to be managed in order to deliver an IT service.

Configuration items are under the control of change management; they are the things that change management actually changes. A CI might be an application, a piece of infrastructure, another IT service, a facility, a documented process or even a person.

If change management is changing the configuration items, then how do you make one in the first place? Or retire one once it is no longer useful? Where are configuration items recorded, stored, and maintained? Is there a process that spells all of this out for us that we could adopt? Stay tuned for much more information about the Service Asset and Configuration Management process.

Change Management Roles

Happy Autumn — Change is in the air! But who is making the changes and when and why? Here is a brief rundown of the roles involved in Change Management (there may be a quiz):

Change Manager
Person charged by the CIO to Stabilize the Core — ensure that improper Changes do not reduce the robustness of the IT environment
Also serves as Chair of the CAB
Currently, Dave Beitz is the Change Manager (and also the Champion of Production, reflecting his role to ensure a robust infrastructure).
Change Process Owner
Person charged with working with the Change Manager to ensure that an effective and efficient Change Process exists and is followed, so that changes interrupt running operations as little as possible.
Currently, Micah Cooper is the Change Process Owner
Requestor
Can be someone inside or outside IT Services who asks that a change be made in either technical or non-technical terminology.
e.g.: Bursar asks for a transaction backout — the Bursar is the Requestor.
Implementer
The person who actually executes the change.
e.g.: George backs out the transaction. George is the implementer.
Local Change Manager (LCM)
Often the Service Owner or First Line Manager (FLM) over the Service affected by the Change.
This person is responsible for verifying the risk of the change, scheduling the change, and either approving the change for low risk (Category 3 and Category 4) changes or escalating the change to the CAB for higher risk changes (Category 1 and Category 2).
If a change affects more than one Service, you will probably need approval from more than one LCM.
e.g.: Dan Johnson is the FLM and Service Owner over the portion of Banner where the transaction backout needs to occur. He is the Local Change Manager.
Change Authority
The person or persons charged with approving the Change. Depending on factors involved in the change, this may be the implementer, the Local Change Manager, CAB, or eCAB.
e.g.: If the Bursar’s requested Change is Category 3, Dan Johnson is also the Change Authority.
Change Advisory Board (CAB)
A standing team of individuals approved by the IT Services Leadership Team to provide ongoing advice on Changes and on improvements to Change Management processes.
The CAB also reviews and authorizes the list of Standard Changes
Responsibilities are listed here
The CAB meets for 1 hour on Monday afternoon and for 1.5 hours on Thursday morning.
Emergency Change Advisory Board (eCAB)
A team of individuals gathered to act as the Change Authority for a Change that does not fit into an approved Change Window.
For Category 3 & 4, the eCAB comprises the Implementer, optionally the Requestor, and at least one of the following: the Implementer’s manager, Service Owner, Major Incident Leader, an LT member, or the Change Manager.
For Category 1 & 2, the eCAB comprises the Implementer, an LT member, the Change Manager, and one or more of the following: the Implementer’s manager, Service Owner, or Major Incident Leader.
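
For those who think in code, the approval rules above can be sketched roughly as follows. This is only an illustration of the rules as described in this post, not part of any tool we run. The function names, role strings, and the `in_change_window` flag are hypothetical, and the case where the Implementer acts as the Change Authority is not modeled.

```python
# Hypothetical role labels, lowercased for easy comparison.
SUPPORT_ROLES = {"implementer's manager", "service owner", "major incident leader"}
SENIOR_ROLES = {"lt member", "change manager"}


def change_authority(category, in_change_window=True):
    """Pick the approving body for a change, per the role descriptions above.

    Assumes categories run from 1 (higher risk) to 4 (lower risk).
    """
    if category not in (1, 2, 3, 4):
        raise ValueError(f"unknown change category: {category}")
    if not in_change_window:
        # Changes outside an approved Change Window go to the eCAB.
        return "eCAB"
    if category in (3, 4):
        # Low-risk changes are approved by the Local Change Manager.
        return "Local Change Manager"
    # Higher-risk changes are escalated to the CAB.
    return "CAB"


def ecab_is_valid(category, roles_present):
    """Check whether the gathered roles satisfy the eCAB composition rules."""
    roles = set(roles_present)
    if "implementer" not in roles:
        return False
    if category in (1, 2):
        # Needs an LT member, the Change Manager, and at least one support role.
        return SENIOR_ROLES <= roles and bool(roles & SUPPORT_ROLES)
    # Category 3 and 4: at least one support or senior role alongside the Implementer.
    return bool(roles & (SUPPORT_ROLES | SENIOR_ROLES))


if __name__ == "__main__":
    print(change_authority(3))
    print(change_authority(2, in_change_window=False))
    print(ecab_is_valid(1, ["implementer", "lt member", "change manager", "service owner"]))
```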

These are the key roles in Change Management. There may be a few other roles that get brought up from time to time, but these are enough to get you started and fascinate friends at your next Fall Harvest party. Stand by for more communication about Change Management! If you have any questions, reach out to any of us.

Micah Cooper, Change Management Process Owner
Dave Beitz, Change Manager (and Champion of Production)
Release 2 Team

Why do we need change management?

You know change is going to happen, and since change is going to happen, you have two choices: manage it or let it happen on its own. Changes that are left to happen on their own never come out anywhere near as good as ones that are properly managed. The credibility of IT Services across the university begins with offering stable, predictable services and consistent experiences for all stakeholders involved.

A solid change management process enables us to:

  • Minimize disruption and protect the production environment.
  • Exert some control over the schedule as well as the change.
  • Reduce the overall pain of the change.
  • Get affected stakeholders involved in planning and executing the change, which leads to better adoption.
  • Establish a foundation for change management that will make future changes easier.

Change management is not magic, nor is it impossibly difficult. It requires careful planning, involving everyone affected; frequent communications, with all stakeholders; clearly defining the goal, including a way of measuring success; and finally, executing the plan with appropriate safety measures in place.

The change management process the Service Excellence Release 2 team is building will enable us to achieve these beneficial outcomes. For more information about this process, our timeline, and various design documents, visit our Google site at https://sites.google.com/a/miamioh.edu/service-excellence-project/

What’s in a name? Evidently Shakespeare wasn’t in IT

The Service Excellence Release 2 team has been designing our future change and configuration management processes. In this pursuit, we are defining the word “change”, which is critically important in establishing the scope of the change process.

Our working definition states that a change is “the addition, modification, or removal of anything that could have an effect on customer-facing services.”

As an example, “Wireless Internet” is an IT service, and one we will use here to illustrate the scope of “Change”. Things used to provide the Wireless Internet service include:

  • machine room and network architecture
  • software on servers and network devices
  • data (e.g. list of users that can use the service)
  • established IT processes (e.g. process for adding new access points)
  • metric definitions (e.g. how bandwidth is measured at a client’s workstation)
  • official end-user documentation (e.g. knowledge base cases)

Adding, removing, or changing any of the above constitutes a change, and is in-scope for Release 2!

For a more detailed look at the definition the team has created, see our full working definition.

We know what you’re thinking: Overhead. Bureaucracy! Waste. Fortunately the team shares that concern and is working hard to create a process that scales with the risk of each change we deploy to production.

Stay tuned for more information about how this new process will allow us to:

  • Respond to changing business requirements and align services with business needs
  • Ensure that changes are properly handled and result in positive outcomes
  • Ensure that all stakeholders are properly prepared for the change to take place