Monday, 5 September 2011

To DB or not to DB (part 1).



My previous position was data analyst at a Fortune 100 corporation. Since our era is data-centric, you might imagine that successful companies have a strong position in that area. Not at all. I can compare my job to that of a stamp collector. It was a real hunt for data: gathering it into one fold and matching it with what we already had. For example, I knew of 5 different sources for the list of countries (and those lists did not match), and every department used its own! The same went for the product hierarchy, the customer hierarchy, etc. Another problem was that data might reach your system through 4-5 other systems (so think about delays, forget about online data, and consider possible changes to the original).
There was even a separate project just to promote a single customer identifier, so that all parts of the company would use the same ID across all business processes, and it took about 1 year to implement.
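To make the country-list mismatch concrete, here is a minimal Python sketch of the kind of check I mean. The department names and the lists themselves are invented sample data, not the real sources, and compare_country_lists is just a helper name I made up for illustration.

```python
# Hypothetical country lists from three departments (made-up sample data).
sources = {
    "engineering": {"Germany", "France", "United States"},
    "sales": {"Germany", "France", "USA"},
    "finance": {"Germany", "United States", "Russia"},
}


def compare_country_lists(sources):
    """Return the entries shared by every source and, per source,
    the entries that are not shared by all of them."""
    common = set.intersection(*sources.values())
    extras = {name: sorted(values - common) for name, values in sources.items()}
    return common, extras


common, extras = compare_country_lists(sources)
print("Agreed by all sources:", sorted(common))
for name, values in extras.items():
    print(name, "entries not shared by all:", values)
```

Even in this tiny example only one country is agreed on by everyone, and "USA" versus "United States" shows that part of the mismatch is naming rather than content, which plain set comparison cannot resolve on its own.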
The root of that evil? Databases! Well, not as such, but the ease with which you can install, create and maintain your own database. Thanks to MS, the cost of ownership was really low and administration was so easy that even a manager could handle it :) I am not sure about licensing, but it seems we had something like an unlimited license. So no new project would bother to reuse an existing database; instead it created its own DB and multiplied the mess.
Another possible root cause is the project-oriented approach cultivated in modern corporations. Let’s say one project implemented a product hierarchy, for example for the engineering department. A year after that project is completed, the sales department decides to use that hierarchy on their web site. But in order to do this, the sales people want to filter products based on a couple of attributes which do not exist in the initial hierarchy. The easiest thing that could possibly work is to duplicate the database and create an additional table or tables with those attributes. The right thing is to add these attributes to the initial table. But the team which implemented the hierarchy for engineering is long gone; only one support guy is left, and he is afraid to make such changes. And the project management simply does not have the additional resources or political power to make changes in the source system, but management still wants a local success with the new project. So in one minute, bang! You have data duplicated and modified in a second system.
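The "right thing" from the example above can be sketched in a few lines. This is only an illustration under my own assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the shared source system, and the table, column and view names (product, sold_online, sales_product) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the shared source database

# Hypothetical product hierarchy as originally built for engineering.
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO product (id, name, parent_id) VALUES (?, ?, ?)",
    [(1, "Pumps", None), (2, "Pump A", 1), (3, "Pump B", 1)],
)

# Extend the source table in place; existing rows get the default value.
conn.execute("ALTER TABLE product ADD COLUMN sold_online INTEGER NOT NULL DEFAULT 0")
conn.execute("UPDATE product SET sold_online = 1 WHERE id = 2")

# Sales reads through a view of the same table, so there is no second copy to drift.
conn.execute(
    "CREATE VIEW sales_product AS SELECT id, name FROM product WHERE sold_online = 1"
)

sales_rows = conn.execute("SELECT id, name FROM sales_product").fetchall()

# The query engineering already used keeps working unchanged.
engineering_rows = conn.execute(
    "SELECT id, name, parent_id FROM product ORDER BY id"
).fetchall()

print(sales_rows)
print(engineering_rows)
```

The new column has a default, so queries that name their columns, like the engineering one here, are not affected, and the sales filter lives in a view instead of in a duplicated database. Of course this only shows that the technical part is small; the hard part described above, finding someone with the knowledge and authority to change the source system, is not solved by code.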
The example with additional columns is not complex. Sometimes you need not only to add new data attributes but also to modify existing data structures. The project-oriented approach will simply ignore any effort to maintain data quality and avoid data duplication.
All data should a priori be considered enterprise-level; there is no such thing as department-level data. Refactoring of data structures is crucial, or you will end up with hundreds of databases entangled with one another.