addressalign-toparrow-leftarrow-rightbackbellblockcalendarcameraccwcheckchevron-downchevron-leftchevron-rightchevron-small-downchevron-small-leftchevron-small-rightchevron-small-upchevron-upcircle-with-checkcircle-with-crosscircle-with-pluscontroller-playcredit-cardcrossdots-three-verticaleditemptyheartexporteye-with-lineeyefacebookfolderfullheartglobe--smallglobegmailgooglegroupshelp-with-circleimageimagesinstagramFill 1languagelaunch-new-window--smalllight-bulblinklocation-pinlockm-swarmSearchmailmediummessagesminusmobilemoremuplabelShape 3 + Rectangle 1ShapeoutlookpersonJoin Group on CardStartprice-ribbonprintShapeShapeShapeShapeImported LayersImported LayersImported Layersshieldstartickettrashtriangle-downtriangle-uptwitteruserwarningyahooyoutube

Data Vault and DWH modeling interest group (Brisbane) Message Board › Comparisons between modelling techniques - Hybrid (including Data Vault)

Comparisons between modelling techniques - Hybrid (including Data Vault)

Roelant V.
user 75273612
Brisbane, AU
Post #: 6
This post provides an overview of the main pros and cons for various Data Modelling techniques:

  • Third Normal Form (3NF) - The Corporate Data Model.
  • Dimensional Modelling - Facts and Dimensions.
  • Hybrids - Best of both worlds? Data Vault, Anchor Modelling and similar.

Currently Hybrid techniques compete mainly with 3NF in the realm of the Enterprise or Corporate Data Warehouses. In these situations there is already consensus that a three-layered solution is required. Hybrid techniques are not meant to compete with pure reporting environments, except in some cases when run on MPPs where views can replace the Dimensional Model. In most cases a Hybrid model serves as core DWH model and is extended with a Dimensional Model as Presentation Layer or Data Mart.

The pros for Hybrids (e.g. Data Vault) are:

  • Completely auditable architecture.
  • DWH model is aligned with the business model.
  • Extremely adaptable to (business) changes.
  • Designed and optimized for the EDW.
  • Durable, consistent and predictable.
  • Consistency pays back over time.
  • Lends itself for real-time processing.
  • Simple and consistent.
  • Decoupling of key distribution enables a very high degree of parallelism (due to a reduction of ETL dependencies).
  • Easy to load to a Dimensional Model.
  • Layered isolation from change.
  • Can be incrementally built, easily extended. When properly modelled no rework is required when adding additional information to the core Data Warehouse model. If new information is added there is always work involved in the greater picture such as development of an interface or update of the view, report or Data Mart but no existing components of the core model require rework.
  • Scalability; provides for multi-terabyte storage.
  • Easier detection of 'dead data'.
  • Handles combinations of datasets with different arrival speeds (loading frequencies).
  • Isolated, flexible and incremental development (organic growth).

The downsides for Hybrids (e.g. Data Vault) are:

  • Scaling versus performance: lots of outer joins and tables in queries. Because Hybrid techniques allow denormalisation to a large extent this typically leads to more tables compared to a Dimensional Model (though not that more compared to a 3NF model). When building a Report, Cube or Dimension typically a lot of tables need to be joined to achieve the result. The counter argument is that Hybrid techniques are extremely consistent many of these tasks can be automated, either by using clever scripting or ETL automation. In a sense it's a lot of the same type of join but a very easy to understand one.
  • Additional storage of data. As with 3NF an Integration Layer / Core Data Warehouse layer is essentially a copy of the data which can be regarded as redundant.
  • Counter argument: Storage is cheap. Not all storage requires a database, other types of archives can be used as well. There always is a balance between this, performance and complexity. In this case, when weighting costs of additional storage, memory of performance against ease of maintenance, flexiblity and the other benefits (i.e. audit trail, ability to regenerate, no single point of failure etc.) it is clear to see that adding some additional hardware is by far the better option.
  • Not intended for ad-hoc end user access (including BI tools and OLAP). In other words: requires a Presentation Layer. A counter argument is that a Hub-and-Spoke architecture lends itself really well for Data Mining and in larger implementation Analytics / Data Scientist teams usually do have direct access. Another counter argument is that a Presentation Layer is implemented to remove the need for users to write complex queries and present data in an easily understandable structure.
  • Aging relationships. The counter argument is that there is no reason to remove older relationships as they still show the state of the system at a point in time and they can easily be removed from use in the Presentation Layer.
  • Currently not an ‘open’ platform as implementation guidelines are fairly hard to get hold off and/or get agreement on. There are no strict or formal freely available implementation guidelines. The conceptual side is fairly well documented. This may lead to different people having different views on the implementation side.
  • Does not provide solutions for Data Marts. The counter argument is that a Hybrid core Data Warehouse model is a perfect solution for the Data Staging concept in Dimensional Modelling and together they reduce some of the downsides of having a Dimensional Model.
  • Requires highly structured architect role.
  • A higher than average number of database and ETL objects. The number of objects in a DV vs. 3NF is approximately a 3:1 ratio at the beginning of the EDW build and approx. 2:1 for mature warehouses. Counter: Objects in DVs are generally smaller and easier to confederate.

Powered by mvnForum

People in this
Meetup are also in:

Sign up

Meetup members, Log in

By clicking "Sign up" or "Sign up using Facebook", you confirm that you accept our Terms of Service & Privacy Policy