Monday, November 29, 2010

Initial load

When you plan a typical internally focused provisioning system rollout, one of the problems you have to solve is how to load information about the already existing users and accounts into your new and shiny IDM system. Let's talk a little bit about design patterns for solving this problem.

In most cases you start out with one or more sources of basic user identities. These are the canonical truths about who actually works for your company. In most cases this includes a human resources system of some kind. If this system is connected to payroll it tends to contain very good data, as employees tend to complain if they don't get paid and the company doesn't like to keep paying people who no longer work for it. In some cases you will discover that the HR system is only linked to payroll in some parts of the world while other offices, e.g. the UK, use another system to feed payroll. This usually results in the data in the HR system being less well maintained, which can cause serious issues.

In many cases there are entities that need to go into the IDM system but aren't present in the HR system, e.g. contractors. Getting hold of data about these users is often not easy, but there may be a contractor database somewhere. Worst case, you may have to settle for data from the security badging system or from the corporate Active Directory. Even when you find basic information about the contractors you will often discover that the data quality can be very bad. Information such as manager ID, current status or end date may not be well maintained. If you are, for example, planning to send warning emails to a contractor's manager, it will not be good if all 200 contractors in the manufacturing division report to the division VP.

Assuming that you have managed to discover your base identities, the next step is to identify which target system accounts belong to which base identities. In a perfect world there would be a unique identifier in each target system account, such as an employee ID, that can be traced back to one and only one account in the trusted source (i.e. the HR system). In practice this is rarely true. In most cases some target system accounts contain the unique identifier while a large percentage will need to be linked using less exact methods such as email addresses or, in the worst case, names. Name-based linking can be very time consuming and there is also a substantial risk that you will end up with false matches.

There are many tools available that will support the linkage process, and if your account volume is reasonably high you want to start by doing automated matching using some form of script or program. Once you have cleared out the obvious matches you may want to switch over to a manual process that uses a simple Excel sheet to match trusted source accounts to target system accounts.

If you can use a divide and conquer approach and split the trusted source accounts and the target system accounts into distinct buckets, things get much easier. Let's say you have 3000 unmatched trusted source accounts and 5000 unmatched target system accounts. Check whether you can divide the accounts into ten country buckets based on the country attribute in both the trusted source account and the target account. This reduces the problem to ten instances of matching 200-400 trusted accounts to 300-700 target accounts, which is a much easier problem to solve. This approach of course assumes that a suitable "bucketing" attribute is available in the target system as well as in the trusted source.
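The two phases, exact automated matching followed by bucketing for manual review, can be sketched roughly like this. The attribute names (employeeId, accountName, country) are assumptions for illustration, not taken from any particular product:

```java
import java.util.*;

// Sketch of the two-phase matching described above. An account is just
// a map of attribute name -> value; the attribute names used here are
// illustrative assumptions.
public class AccountMatcher {

    // Phase 1: exact automated matching on a unique identifier.
    // Returns employeeId -> target account name for every exact hit.
    public static Map<String, String> matchByEmployeeId(
            List<Map<String, String>> trusted,
            List<Map<String, String>> targets) {
        Map<String, String> byId = new HashMap<>();
        for (Map<String, String> t : targets) {
            String id = t.get("employeeId");
            if (id != null) {
                byId.put(id, t.get("accountName"));
            }
        }
        Map<String, String> links = new HashMap<>();
        for (Map<String, String> s : trusted) {
            String hit = byId.get(s.get("employeeId"));
            if (hit != null) {
                links.put(s.get("employeeId"), hit);
            }
        }
        return links;
    }

    // Phase 2 helper: split the remaining unmatched accounts into
    // buckets on a shared attribute (e.g. country) so that each
    // manual matching problem is much smaller.
    public static Map<String, List<Map<String, String>>> bucketBy(
            List<Map<String, String>> accounts, String attribute) {
        Map<String, List<Map<String, String>>> buckets = new HashMap<>();
        for (Map<String, String> a : accounts) {
            buckets.computeIfAbsent(a.getOrDefault(attribute, "UNKNOWN"),
                                    k -> new ArrayList<>()).add(a);
        }
        return buckets;
    }
}
```

In a real project you would run matchByEmployeeId first, remove the linked accounts from both lists, and then feed each bucket from bucketBy into the manual Excel process.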

To sum things up, preparing for the initial load essentially consists of the following steps:

  1. Discovering the trusted sources that will provide the base identities
  2. Extracting the user data from the trusted sources
  3. Cleaning or at least evaluating the quality of the data contained in the trusted sources
  4. Discovering the target system accounts
  5. Linking the trusted identities to the target system accounts

If the data is in good shape this can be a quick process, but if the data is bad it is not uncommon to need 3-6 months for the data cleanup. It is therefore a good idea to include a data cleanup and correlation thread in your IDM program that starts at the same time as you kick off the provisioning implementation project.

Thursday, November 18, 2010

Polyarchies in practice

In A bridge(stream) too far I talked a little bit about using polyarchies to support role mining. Let's take a closer look at this concept with a simple example.

Let's say you have 1000 AD groups and 2000 users in your system, and you would like to do some role mining in order to figure out whether you could apply a role-based approach to automatically grant the correct entitlements (AD groups) to the right users.

First, look at the information you have available about your users. You may find that you are able to place them in a number of different hierarchies. Let's start by looking at the location based hierarchy.

The company has ten locations and the locations can be organized in a country -> city -> building pattern. So for example the US has offices in two cities: Boston and LA. In LA there is a single location while Boston contains two locations.

Now sort the users according to their location. You may end up with something like:

USA total: 380 users
-> Boston total: 250 users
---> Boston FS (Federal Square): 150 users
---> Boston Haymarket: 100 users
-> LA downtown: 130 users

The next step is to associate AD group memberships with each site and sort the groups according to how many members with that specific location each group contains. The Boston Federal Square location may have the following groups:

Boston Distribution List: 143 members
Boston FS Distribution List: 140 members
Sales Distribution List: 138 members
Boston FS printers and fileshare: 132 members

Out of these it looks like Boston FS Distribution List and Boston FS printers and fileshare should be given to any user that has a Boston FS location. The Boston Distribution List could be checked against the parent node to see if it is also given to the Boston Haymarket users. If not, then perhaps it is an additional group used for Boston FS.

The Sales Distribution List may be assigned through location, but it seems more likely that it is tied to the functional hierarchy. It just happens that many sales people are based out of the Boston Federal Square office.

Doing this work by hand using Excel or a small database is very time consuming, but it is fairly easy to automate using Java or whatever your favorite programming language is.

You basically need to:

  1. Extract your base user data out of the trusted source (often an HR CSV file feed)
  2. Enumerate the unique values of suitable attributes (i.e. list all unique locations) present in the trusted source
  3. Extract the group memberships (JNDI is my favorite) as well as user identities from the target system
  4. Correlate the users from the trusted source and the target system
  5. Calculate the user population for each unique attribute value
  6. Get the group memberships of the user population in step 5
  7. Sort the groups according to the number of members
  8. Output the result in a user friendly format (Excel sheets work great)
  9. Apply some kind of cutoff value, i.e. only list groups where at least 75% of the users in a particular location are members
  10. Look at the results and pick the likely candidates

As always in role mining this is not an exact science, but it will help you find the groups that are associated with a particular attribute.
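The core of the automation (steps 5 through 9) can be sketched in Java roughly like this. All attribute, user and group names are made up; real input would come from the HR feed and the target system extract:

```java
import java.util.*;

// Sketch of steps 5-9 above: given each user's location and group
// memberships, list the groups where at least `cutoff` (e.g. 0.75)
// of a location's user population are members.
public class GroupMiner {

    public static Map<String, List<String>> candidateGroups(
            Map<String, String> userLocation,    // user -> location
            Map<String, Set<String>> userGroups, // user -> group memberships
            double cutoff) {
        // Step 5: calculate the user population per location.
        Map<String, List<String>> usersAt = new HashMap<>();
        for (Map.Entry<String, String> e : userLocation.entrySet()) {
            usersAt.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                   .add(e.getKey());
        }
        Map<String, List<String>> result = new HashMap<>();
        for (Map.Entry<String, List<String>> site : usersAt.entrySet()) {
            // Step 6: count group memberships for this location's users.
            Map<String, Integer> counts = new HashMap<>();
            for (String user : site.getValue()) {
                for (String g : userGroups.getOrDefault(user, Set.of())) {
                    counts.merge(g, 1, Integer::sum);
                }
            }
            // Steps 7 and 9: sort by member count and apply the cutoff.
            int population = site.getValue().size();
            List<String> hits = new ArrayList<>();
            counts.entrySet().stream()
                  .filter(en -> en.getValue() >= cutoff * population)
                  .sorted((a, b) -> b.getValue() - a.getValue())
                  .forEach(en -> hits.add(en.getKey()));
            result.put(site.getKey(), hits);
        }
        return result;
    }
}
```

Writing the resulting map out per location (step 8) and eyeballing the candidates (step 10) is then straightforward.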

If you prefer the COTS approach there are lots of different options. In my opinion Oracle Identity Analytics (formerly Sun Role Manager, formerly Vaau RBACx) is a quite nice implementation. IBM has also included some capability in TIM 5.1 that is worth taking a closer look at if you are an IBM shop.

Oracle also published a whitepaper on this subject this summer that is well worth reading.

Happy mining!

Thursday, November 11, 2010

New to OIM

Back in the bad old days when you started with Thor Xellerate (or, a little later, OIM) you usually got the three-day Thor training workshop and then you basically had to figure things out on your own. There was some documentation, but most things in the core OIM platform you basically had to reverse engineer, and a good decompiler and sniffer were your best friends.

Oracle has made great steps forward on the documentation side over the last few years and a lot of the material is available to anyone.

If you are new to OIM I would suggest starting by downloading the OIM install and taking a look at the documentation folder (9.1 release). I would start by reading through the concepts document, as this will give you a good overview of what the OIM tool can actually do.

Next step is to implement a couple of the exercises in Oracle by Example. I would suggest the following:

  1. Install and configure OIM
  2. Flat file recon with GTC
  3. Database provisioning using GTC

This will give you an overview of the basic functions of OIM. Once you are done with the basics you can continue to explore how to customize the user interface and create your own connectors.

The Oracle IDM forum is another great resource and if you have a metalink account there is also a lot of good information in the support knowledge DB.

On a slightly more humorous note I suggest So You Want to be a Game Designer. The skill set that makes a good game designer is actually quite similar to the skill set you need as an access and identity designer, although knowledge of world myths and world religions may be slightly more useful to the game designer (though both can benefit from knowing what Kerberos is).

Welcome and best of luck!

OIM Howto: Add process form child

There are a number of posts on this blog that talk about managing target system groups by manipulating child form contents. In a recent OTN IDM discussion thread Deepika posted some useful example code, so in order to make the previous posts on this subject more useful I thought it would be a good idea to link in the example.

public void addProcessChildData(long pKey) {
   try {
      tcFormInstanceOperationsIntf f = (tcFormInstanceOperationsIntf)
         getUtility("Thor.API.Operations.tcFormInstanceOperationsIntf");
      tcResultSet childFormDef = f.getChildFormDefinition(
         f.getProcessFormDefinitionKey(pKey), f.getProcessFormVersion(pKey));

      // Assumes there is only one child table for the parent form;
      // otherwise you need to iterate through the result set.
      long childKey = childFormDef.getLongValue("Structure Utility.Child Tables.Child Key");

      Map attrChildData = new HashMap();
      String groupDN = "someValue";
      attrChildData.put("UD_ADUSRC_GROUPNAME", groupDN);

      // Add the row to the child table, which triggers the usual
      // process task (e.g. the group add) on the target system.
      f.addProcessFormChildData(childKey, pKey, attrChildData);
   } catch (Exception e) {
      e.printStackTrace();
   }
}

Monday, November 8, 2010

A bridge(stream) too far

Over the years I have run into a number of products that had lots of good ideas but perhaps simply didn't have enough resources behind them to implement those ideas properly. One of these is Bridgestream SmartRoles (aka Oracle Role Manager). The product is no longer with us, having been trumped by Sun Role Manager (aka Vaau RBACx), but I really think it came with some good concepts that you can use in your IDM implementation no matter what product you are using.

Business roles vs IT roles

Roles are basically a tool for grouping entitlements into manageable packets. If you ask an IT person and a business person what manageable packets are and how they should be named, the two groups tend to disagree a lot.

One approach to this problem is to let the business have their business roles (teller level two) and let the IT guys build their own roles (two AD groups plus logins to two applications configured with specific application roles). Then you combine the IT roles into business roles, and the business can be asked to certify that Emma Smith the teller should have a given business role. The fact that the business role actually results in three IT roles, which in turn result in a bucketload of entitlements (AD groups etc.), is not really relevant to the certification decision.
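The layering can be sketched as a couple of small classes. All role and entitlement names here are invented for illustration:

```java
import java.util.*;

// Sketch of the business role -> IT role -> entitlement layering.
public class RoleModel {
    // An IT role is a named bundle of raw entitlements (AD groups etc).
    public record ItRole(String name, Set<String> entitlements) {}

    // A business role is a named bundle of IT roles.
    public record BusinessRole(String name, List<ItRole> itRoles) {}

    // Flatten a business role into the entitlements it really grants.
    // The certifier only ever sees the business role name.
    public static Set<String> resolve(BusinessRole role) {
        Set<String> all = new TreeSet<>();
        for (ItRole it : role.itRoles()) {
            all.addAll(it.entitlements());
        }
        return all;
    }
}
```

The point of the split is that certification happens at the BusinessRole level while provisioning consumes the flattened entitlement set.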

In reality things rarely work out this smoothly, but I have found the approach useful.

Everything is temporal

One of the nifty features in Bridgestream was the temporal engine, which made the eternal problem of the ever changing nature of everything much easier to handle.

In many role-related IDM projects it is very easy to forget that everything, including the roles and the entitlements, has a lifecycle and will need to change at some point. Managing this without support in the base framework can get very complex, so building in proper support for temporality is key to making maintenance cheap and easy.


Polyarchies

Hierarchies are really useful when you want to organize users. One problem you often run into is that a specific user may belong to a number of different hierarchies. One may be the geographical location, another may be the reporting chain and a third may be the kind of position the user holds (e.g. embedded HR or IT departments that report into the business unit with dotted lines to corporate HR or IT). It is practically impossible to capture all these relationships in a single hierarchy.

Bridgestream introduced, at least to me, the concept of the polyarchy. Instead of trying to wrestle all these relationships into a single hierarchy, you simply create multiple hierarchies where each hierarchy reflects one aspect of the user's relationship with the surrounding nodes. If you are also able to divide the entitlements into buckets where each bucket is likely to be assigned due to the user's position in one specific hierarchy (roles called "Cambridge campus" or "Floor 3 - building 6 - Cambridge campus" are likely location based), you have a good tool that can reduce the complexity of role discovery substantially.
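A minimal sketch of the idea: each user carries one path per named hierarchy instead of one position in a single master tree. Hierarchy names and paths below are invented for illustration:

```java
import java.util.*;

// Minimal polyarchy sketch: one path per hierarchy per user.
public class Polyarchy {
    // user -> (hierarchy name -> path from root to leaf)
    private final Map<String, Map<String, List<String>>> paths = new HashMap<>();

    public void setPath(String user, String hierarchy, List<String> path) {
        paths.computeIfAbsent(user, k -> new HashMap<>()).put(hierarchy, path);
    }

    // The node the user occupies in one specific hierarchy, e.g. the
    // leaf of the location path. Returns null if the user has no
    // position in that hierarchy.
    public String nodeIn(String user, String hierarchy) {
        List<String> p = paths.getOrDefault(user, Map.of()).get(hierarchy);
        return (p == null || p.isEmpty()) ? null : p.get(p.size() - 1);
    }
}
```

Role discovery can then be run once per hierarchy, bucketing entitlements against nodeIn(user, "location"), nodeIn(user, "reporting") and so on.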

There is a more expanded example of polyarchies in action in the post Polyarchies in practice.