This is even more important when you realise that there is an issue in the patterns/framework and you need to recreate them. DVs are all about efficient build processes due to the shear volume of tables you will need to load. This is not a project where you want to be dragging and dropping, ETL has to be meta data driven. data modelers and ETL developers) have a solid understanding of the DV 2.0 concepts. Changing your mind later will be potentially very costly. Get your patterns and framework designed, tested and signed off up front.If your are going to capture all changes in the source system, within the raw vault, joining the data together in order to create values in the type 2 dimensions will be a nightmare if you haven’t decided on the grain of changes you want to present to end-users.In no particular order, these are some painful lessons learnt: Views can help to overcome this, however, they may not perform well. Querying a DV can be complex due to the level or normalisation and amount of history.Adds complexity when populating the Dims and Facts in the presentation layer..Does not provide added tangible business value.Takes longer to build than a simple staging area..Proliferation of tables therefore a larger number of ETL jobs to build..This means that you can create link records without doing a lookup to the relevant Hubs you are joining.ĭVs present some unique challenges due to the following Hash the business keys to generate a surrogate key.If you need a link satellite, keep the attributes to a minimum (i.e.Concatenate source system ID and source system business key to ensure uniqueness if required..Start building a model from business interviews, not from source systems.Use the date/time from the staging area as meta data, not from the source system..Place a measure or descriptive attribute on a hub or link..If you need to, keep it to a minimum, so as not to compromise the Data Vaults SOR responsibility. a type attribute on an address satellite). Building an overly complex model that links all reference tables..Having satellites on links if possible..When Hubs, Links and Satellites are linked together they form a logical Ensemble A satellite record can be updated to effectively end historic record.The key of a Satellite is made up of the surrogate key from the connecting Hub and the date time the record was loaded which enables tracking of all history from the source Satellite tables are where all the descriptive information about a Hub, or Link, are stored.It reflects the state of the record as it was first received. All links are many to many, therefore you do not need to change to accommodate changing relationships over time..Load date time, for when the record entered the data warehouse..Surrogate keys from each joining Hub that form that relationship.This is how one business subject area connects to another. Link tables form a relationship between Hubs.It reflects the state of the record as it was first received Any other meta data deemed necessary by the solution architect.. ![]() Load date time, for when the record entered the data warehouse.Hub tables contain the business key from the source system(s). A Hub table reflects a core business concept like Customer, Sale, Invoice, Employee etc.but they are beyond the scope of this post Hub There are other options like helpers, bridge, etc. Encourage direct access by business users (some data scientist types might stray in there)Īs mentioned above, the core objects are Hubs, Links and Satellites.Data Vault 2.0 (use of hash keys as the surrogate key, evolution of links, etc.).Business Vaults (Applied business rules and business key alignment).Raw Vaults (No business rules, auditable).The Data Vault approach has evolved in several areas: Allow data loading to commence before data modelling has finished (see Raw Vault).Base data modelling on three main objects (hubs, links, satellites) that create a logical Ensemble.Provide flexibility when a source schema change.Enable parallelism (loading data in parallel).It is not intended to dissuade or persuade you in the choice of using a Data Vault (DV), there are many of those posts already and typically the stray into fundamentalism … What is a data vaultĪ data vault (DV) is a layerwithin a data warehouse architecture that is designed to: This post is intended for those who are about to dip their toe into the world of Data Vaults.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |