Thursday, July 4, 2024

One Big Cluster Stuck: Data Asset Standardization

Data asset standardization is the purposeful and thoroughly deliberate consolidation of redundant, contradictory reports, processes, and databases into enterprise standards. The proliferation of data assets can have the greatest adverse impact on environmental health; standardization has many health benefits:

  • Reduces the likelihood that ill-constructed assets take down processes, nodes, and clusters
  • Reduces contention and competition for compute and storage
  • Reduces process and service failures and associated troubleshooting effort
  • Reduces effort spent maintaining and supporting redundant assets

Though the impacts of data asset standardization on environmental health can be greater than those of any other category in this series, the business value benefits far outweigh them: standard data definitions, improved data governance, consistent data interpretation, better data trustworthiness, and improved data-driven decision making. Ideally you’re realizing these benefits using Cloudera Data Catalog.

Complete data standardization is a multiyear journey and likely unnecessary, but the low-hanging fruit is ripe for the picking. We strongly recommend embarking on this journey and continuing until returns diminish.

Report Standardization

Take these steps:

  1. Inventory reports, including ownership, usage statistics, and report frequency.
  2. Target for retirement any reports unused in the last 12 months, then in the last 6 months. Pay particular attention to report frequency, as low usage of an annual report may be appropriate.
  3. Choose a report archival method commensurate with your customer partnership dynamic (we hope you’re not in data purgatory):
    1. Two weeks before, a week before, and the day of the archival, notify report owners which reports you intend to archive, allowing them a grace period to object and provide justification for the reports’ continued existence.
    2. Alternatively, archive them without notification and restore a report when anyone shouts about it.
  4. Archive targeted reports. In Tableau, we prefer to simply assign report ownership to a system user, which prohibits further use while enabling us to easily restore it if requested and justified.
  5. Repeat the exercise quarterly. In our experience, 80-90% of reporting inventory can be archived in as little as 2 quarters.
    1. If your visualization tool employs extract jobs, stop them, and note any database archival targets.
  6. Regularly check the appropriateness of report refresh rates and negotiate.
  7. Over time, consolidate additional assets by grafting heavily used report features and capabilities into enterprise standard dashboards, then retire redundant legacy reports. Admittedly, this is difficult and time-consuming work usually undertaken as a means to trusted data, not environmental health.
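
The targeting and notification steps above can be sketched in a few lines of Python. This is a minimal illustration, not our actual tooling: the `Report` fields, the function names, and the rule that a report's idle time must exceed both the review window and its own expected frequency are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Report:
    # Hypothetical inventory record; field names are illustrative only.
    name: str
    owner: str
    last_used: date
    frequency_days: int  # how often the report is expected to be used


def retirement_candidates(reports, today, unused_days=365):
    """Return names of reports idle longer than the review window.

    Assumption: a report is only a candidate when it has been idle longer
    than BOTH the window and its own expected frequency, so an annual
    report is not flagged by a 6-month review.
    """
    candidates = []
    for r in reports:
        idle_days = (today - r.last_used).days
        if idle_days > max(unused_days, r.frequency_days):
            candidates.append(r.name)
    return candidates


def notification_dates(archival_day):
    """Two weeks before, one week before, and the day of the archival."""
    return [
        archival_day - timedelta(weeks=2),
        archival_day - timedelta(weeks=1),
        archival_day,
    ]
```

Running `retirement_candidates` first with `unused_days=365` and then with `unused_days=180` mirrors the 12-month and 6-month passes; the frequency check is one possible way to avoid flagging an annual report that is simply between uses.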

DB Standardization

  1. As before, inventory database assets, ownership, refresh frequency, and associated usage statistics.
  2. Target temporary/testing databases and user databases owned by former FTEs.
  3. Communicate far and wide. We’re not so brave as to archive DBs without notification and permission in most cases. We enjoy our jobs and want to keep them.
  4. Archive the databases. We usually archive into a common archival database. In our experience, this can reduce production tables by 35-55%.
  5. Regularly negotiate refresh rates and data retention policies with database owners.
  6. We strongly recommend taking the multiyear journey to standardize centralized data assets into enterprise standards as much as possible, as it can significantly improve data trustworthiness and accurate data-driven decision making.
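
As a rough sketch of the targeting step above, the snippet below flags temporary/testing databases and user databases whose owner is no longer employed. Everything here is hypothetical: the `Database` record, the name prefixes in `TEMP_PREFIXES`, and the `active_employees` set stand in for whatever your own inventory and HR data provide.

```python
from dataclasses import dataclass

# Assumed naming convention for temporary/testing databases; adjust to
# whatever convention your environment actually uses.
TEMP_PREFIXES = ("tmp_", "temp_", "test_")


@dataclass
class Database:
    # Hypothetical inventory record; field names are illustrative only.
    name: str
    owner: str
    is_user_db: bool


def archival_targets(databases, active_employees):
    """Return names of temp/test databases and user databases of former staff."""
    targets = []
    for db in databases:
        is_temp = db.name.lower().startswith(TEMP_PREFIXES)
        orphaned = db.is_user_db and db.owner not in active_employees
        if is_temp or orphaned:
            targets.append(db.name)
    return targets
```

The output is only a candidate list; it is meant to feed the notification and permission step, not to replace it.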

Pipelines and Jobs Standardization

Database asset standardization will identify archival opportunities for (1) the pipeline inventory, here referring to processes which move data from one repository or source to another repository or curated dataset, as well as (2) the jobs inventory, here referring to queries which provide views or persist data within the environment. Standardizing processes is high effort with diminishing returns on environmental health; therefore, begin with processes that:

  • Frequently fail
  • Are most critical
  • Are most frequently updated
  • Are the most resource intensive
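
One possible way to turn the four criteria above into a work queue is a simple rank-sum: rank every process on each criterion, add up the ranks, and start with the lowest totals. The sketch below assumes equal weighting, a 90-day measurement window, and hypothetical field names; none of these come from a specific tool.

```python
from dataclasses import dataclass


@dataclass
class Process:
    # Hypothetical metrics per pipeline or job; names are illustrative only.
    name: str
    failures_90d: int
    criticality: int      # e.g. 1 (low) to 5 (high), assigned by the business
    updates_90d: int
    cpu_hours_90d: float


CRITERIA = ("failures_90d", "criticality", "updates_90d", "cpu_hours_90d")


def prioritize(processes):
    """Order processes by summed rank across the four criteria.

    Rank 0 is the highest value on a criterion, so the process with the
    lowest total comes first. Ties on a criterion fall back to input order.
    """
    totals = {p.name: 0 for p in processes}
    for field in CRITERIA:
        ranked = sorted(processes, key=lambda p: getattr(p, field), reverse=True)
        for rank, p in enumerate(ranked):
            totals[p.name] += rank
    return [p.name for p in sorted(processes, key=lambda p: totals[p.name])]
```

Equal weights are a deliberate simplification; if failures matter more to you than update frequency, multiply that criterion's rank by a weight before summing.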

As always, if you need assistance identifying or executing data asset standardization, engage our Professional Services experts. We did!
