Monday, February 28, 2011

Monitoring

Most enterprise IDM systems are very complicated and complex beasts with redundancy in not only the presentation layer but in the application layer and the data persistence layer. This can make it hard to answer the simple question "Is the system working properly or not". In most cases you also want to be able to spot issues early so that you can fix them before they become a problem that may take down the entire system.

The best way to ensure that all components are healthy and all services are up is to implement a comprehensive monitoring program so what are the things that you tend to want monitor?

The most basic monitoring policy is the "wait until the end user yells" approach. In this approach you simply wait for the end user to start screaming and as long no one is screaming then things must be fine. This approach have some significant limitations so it is not the way I would recommend.

Once you start talking monitoring you usually discover that the corporation has some kind of standardized monitoring tool that you should/must use. These tools usually can provide the following functionality:

  • Host monitoring (is the server OS up)
  • CPU/Memory/disk monitoring
  • Process monitoring
Monitoring can be done either with a monitoring agent that is installed on each server or through an agentless approach. Having basic infrastructure monitoring can be very useful as it will alert you about creeping issues such as small memory leaks or logs that slowly but steadily eats up all available disk space. The trick is to make sure you actually can fine tune the alarm threshold and response level as you go along. In most cases you do want to be told if the CPU suddenly spikes from a max of 25% to 95% but as being woken up at 2 am every second Wednesday because the CPU load spikes for a few seconds during a batch load may not be ideal for you (or your marriage) you do want the ability to put in exceptions in the logic.

In most cases you will have some kind of network or port monitoring as part of your load balancer setup. Given that port oriented network monitoring configuration tends to get very complex I will write about this in a separate post.

Next step is to look at the application aware monitoring. This is usually accomplished by looking at the application logs and escalating entries that are following a certain pattern (i.e. whose log level is ERROR). You can also look for specific error messages that you know are thrown when a specific error condition occurs.

Once you have monitoring in place you should be able to sleep better at night. At least as long as your monitoring doesn't wake you up reporting nuisance errors.

Wednesday, February 9, 2011

UAT and requirements gathering

In provisioning projects requirements gathering and UAT testing is always an interesting area. The general problem with UAT and system level testing is that in order for the testing to be useful you really need to know what result is the correct result and what is a failure.

When you write bespoke software you can usually find the answer to this question in the design which in turn is derived from the functional requirements which are derived from the business requirements. In a typical IAM implementation on the other hand you tend to use very feature rich platforms that provides huge amounts of functionality most of which you will never use in any given project. It is also often hard to shut down this "bonus functionality" so the functionality often is available even if it really shouldn't be used. Many times when you get to UAT the testers starts touching all kinds of buttons and levers and unless you have an experienced team that has spent substantial amount of time on actively locking things down you will have some issues with this.

Business process wise IAM implementations can fall on anywhere scale ranging from totally transformative to literal refactoring where the business process doesn't change at all. In the first case the challenge is that the UAT needs to reflect the new business process and you also need to ensure that the new business process actually supports all the functions that the business needs. In the second case things are easier because you basically just need to capture an already existing process.

No matter the strategic focus of the IAM project a proper UAT needs to be business process focused. The whole point of UAT is to ensure that all business critical processes are present and are working as expected or at least in a way that is acceptable. The irony is of course that if you discover substantial process breaks in the UAT you are usually in deep trouble as it usually is too late to fix things before the planned go live.

The best way to avoid this situation is to ensure that you do a form of dry run UAT very early in the process. In my experience one of the best ways to do this is to utilize simple Visio workflows. You start with the requirements, which usually are more or less useless, and create Visio workflows. You then show the workflows for the people that knows or should know the business process. These people can rarely tell you what is needed but they can usually tell you if you get it wrong or tell you things that you should take into considerations. After a couple of cycles you usually have a pretty good process and you have also gained the buy in from one of the stakeholders.

If your architect is experienced he or she can usually produce a pretty good set of flows that can act as a starting point. Most more sophisticated consultancy organizations will be able to provide a standard set of flows as well. If your implementation team starts coding without first establishing, vetting and communicating the business process it is time to start getting alarmed. The project may still deliver successfully but it will most likely have a very painful UAT or even worse a very painful go live in front of it.

Tuesday, February 8, 2011

Pulse 2011

I will be in a panel at Pulse 2011 so if you want to hear me and some very distinguished copanelists talk about IAM you can visit session 1925 Identity and Access Management 2-3 pm on Mon Feb 28 in room 123 MGM Conference Center level 1.

My talk will be about the IAM challenges that BCBS MA currently are facing in the IAM space which is largely the same thing as what I am writing about in this blog so if you like this blog you may enjoy meeting me at Pulse as well.