Thursday, November 21, 2024

Improve monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

In Part 2 of this series, we discussed how to enable AWS Glue job observability metrics and integrate them with Grafana for real-time monitoring. Grafana provides powerful customizable dashboards to view pipeline health. However, to analyze trends over time, aggregate from different dimensions, and share insights across the organization, a purpose-built business intelligence (BI) tool like Amazon QuickSight may be more effective for your business. QuickSight makes it easy for business users to visualize data in interactive dashboards and reports.

In this post, we explore how to connect QuickSight to Amazon CloudWatch metrics and build graphs to uncover trends in AWS Glue job observability metrics. Analyzing historical patterns allows you to optimize performance, identify issues proactively, and improve planning. We walk through ingesting CloudWatch metrics into QuickSight using a CloudWatch metric stream and QuickSight SPICE. With this integration, you can use line charts, bar charts, and other graph types to uncover daily, weekly, and monthly patterns. QuickSight lets you perform aggregate calculations on metrics for deeper analysis. You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. With these insights, teams have the visibility to make data integration pipelines more efficient.

Solution overview

The following architecture diagram illustrates the workflow to implement the solution.

The workflow includes the following steps:

  1. AWS Glue jobs emit observability metrics to CloudWatch metrics.
  2. CloudWatch streams metric data through a metric stream into Amazon Data Firehose.
  3. Data Firehose uses an AWS Lambda function to transform data and ingest the transformed records into an Amazon Simple Storage Service (Amazon S3) bucket.
  4. An AWS Glue crawler scans data on the S3 bucket and populates table metadata on the AWS Glue Data Catalog.
  5. QuickSight periodically runs Amazon Athena queries to load query results to SPICE and then visualize the latest metric data.
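
To make steps 2 and 3 more concrete, the following is a minimal sketch of what a Firehose transformation function could look like. It is not the function shipped in the sample template. It assumes the metric stream uses the JSON output format, where each Firehose record holds newline-delimited metric objects, and the helper name flatten_metric and the dim_ and value_ column prefixes are hypothetical choices made for illustration.

```python
import base64
import json


def flatten_metric(metric: dict) -> dict:
    """Flatten one metric stream JSON object into a single flat row.

    The output column names (dim_*, value_*) are assumptions for this
    sketch, not the schema used by the sample template.
    """
    row = {
        "account_id": metric.get("account_id"),
        "region": metric.get("region"),
        "namespace": metric.get("namespace"),
        "metric_name": metric.get("metric_name"),
        "timestamp": metric.get("timestamp"),
        "unit": metric.get("unit"),
    }
    # Promote each dimension (for example, the job name) to its own column.
    for key, value in metric.get("dimensions", {}).items():
        row[f"dim_{key.lower()}"] = value
    # Promote each statistic (max, min, sum, count) to its own column.
    for stat, number in metric.get("value", {}).items():
        row[f"value_{stat}"] = number
    return row


def lambda_handler(event, context):
    """Firehose data transformation handler.

    Each incoming record carries base64-encoded, newline-delimited JSON.
    Each outgoing record must echo the recordId and report a result.
    """
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        rows = [
            flatten_metric(json.loads(line))
            for line in payload.splitlines()
            if line.strip()
        ]
        # Keep one JSON object per line so Athena can read the S3 objects.
        data = "".join(json.dumps(row) + "\n" for row in rows)
        output.append(
            {
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data.encode("utf-8")).decode("utf-8"),
            }
        )
    return {"records": output}
```

Flattening nested fields before the data lands in Amazon S3 keeps the crawled table simple, so Athena queries and QuickSight fields can refer to plain columns rather than nested structures.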

All of the resources are defined in a sample AWS Cloud Development Kit (AWS CDK) template. You can deploy the end-to-end solution to visualize and analyze trends of the observability metrics.

Sample AWS CDK template

This post provides a sample AWS CDK template for a dashboard using AWS Glue observability metrics.

Typically, you have multiple accounts to manage and run resources for your data pipeline.

In this template, we assume the following accounts:

  • Monitoring account – This hosts the central S3 bucket, central Data Catalog, and QuickSight-related resources
  • Source account – This hosts individual data pipeline resources on AWS Glue and the resources to send metrics to the monitoring account

The template works even when the monitoring account and source account are the same.

This sample template consists of four stacks:

  • Amazon S3 stack – This provisions the S3 bucket
  • Data Catalog stack – This provisions the AWS Glue database, table, and crawler
  • QuickSight stack – This provisions the QuickSight data source, dataset, and analysis
  • Metrics sender stack – This provisions the CloudWatch metric stream, Firehose delivery stream, and Lambda function for transformation

Prerequisites

You should have the following prerequisites:

  • Python 3.9 or later
  • AWS accounts for the monitoring account and source account
  • An AWS named profile for the monitoring account and source account
  • The AWS CDK Toolkit 2.87.0 or later

Initialize the CDK project

To initialize the project, complete the following steps:

  1. Clone the CDK template to your workplace:
    $ git clone git@github.com:aws-samples/aws-glue-cdk-baseline.git

    $ cd aws-glue-cdk-baseline

  2. Create a Python virtual environment specific to the project on the client machine:
    $ python3 -m venv .venv

We use a virtual environment in order to isolate the Python environment for this project and not install software globally.

  3. Activate the virtual environment according to your OS:
    • On MacOS and Linux, use the following code:
      $ source .venv/bin/activate

    • On a Windows platform, use the following code:
      % .venv\Scripts\activate.bat

After this step, the subsequent steps run within the bounds of the virtual environment on the client machine and interact with the AWS account as needed.

  4. Install the required dependencies described in requirements.txt to the virtual environment:
    $ pip install -r requirements.txt

  5. Edit the configuration file default-config.yaml based on your environments (replace each account ID with your own).
    create_s3_stack: false
    create_metrics_sender_stack: false
    create_catalog_stack: false
    create_quicksight_stack: true
    
    s3_bucket_name: glue-observability-demo-dashboard
    
    firehose_log_group_name: /aws/kinesisfirehose/observability-demo-metric-stream
    firehose_lambda_buffer_size_mb: 2
    firehose_lambda_buffer_interval_seconds: 60
    firehose_s3_buffer_size_mb: 128
    firehose_s3_buffer_interval_seconds: 300
    
    glue_database_name: observability_demo_db
    glue_table_name: metric_data
    glue_crawler_name: observability_demo_crawler
    glue_crawler_cron_schedule: "cron(42 * * * ? *)"
    
    athena_workgroup_name: primary

Bootstrap your AWS environments

Run the following commands to bootstrap your AWS environments:

  1. In the monitoring account, provide your monitoring account number, AWS Region, and monitoring profile:
    $ cdk bootstrap aws://<MONITORING-ACCOUNT-NUMBER>/<REGION> --profile <MONITORING-PROFILE> \
    --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess

  2. In the source account, provide your source account number, Region, and source profile:
    $ cdk bootstrap aws://<SOURCE-ACCOUNT-NUMBER>/<REGION> --profile <SOURCE-PROFILE> \
    --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess

When you use only one account for all environments, you can just run the cdk bootstrap command one time.

Deploy your AWS resources

Run the following commands to deploy your AWS resources:

  1. Run the following command using the monitoring account to deploy resources defined in the AWS CDK template:
    $ cdk deploy '*' --profile <MONITORING-PROFILE>

  2. Run the following command using the source account to deploy resources defined in the AWS CDK template:
    $ cdk deploy MetricSenderStack --profile <SOURCE-PROFILE>

Configure QuickSight permissions

Initially, the new QuickSight resources, including the dataset and analysis created by the AWS CDK template, are not visible to you because there are no QuickSight permissions configured yet.

To make the dataset and analysis visible to you, complete the following steps:

  1. On the QuickSight console, navigate to the user menu and choose Manage QuickSight.
  2. In the navigation pane, choose Manage assets.
  3. Under Browse assets, choose Analysis.
  4. Search for GlueObservabilityAnalysis, and select it.
  5. Choose SHARE.
  6. For User or Group, select your user, then choose SHARE (1).
  7. Wait for the share to be complete, then choose DONE.
  8. On the Manage assets page, choose Datasets.
  9. Search for observability_demo.metrics_data, and select it.
  10. Choose SHARE.
  11. For User or Group, select your user, then choose SHARE (1).
  12. Wait for the share to be complete, then choose DONE.

Explore the default QuickSight analysis

Now your QuickSight analysis and dataset are visible to you. You can return to the QuickSight console and choose GlueObservabilityAnalysis under Analysis. The following screenshot shows your dashboard.

The sample analysis has two tabs: Monitoring and Insights. By default, the Monitoring tab has the following charts:

  • [Reliability] Job Run Errors Breakdown
  • [Reliability] Job Run Errors (Total)
  • [Performance] Skewness Job
  • [Performance] Skewness Job per Job
  • [Resource Utilization] Worker Utilization
  • [Resource Utilization] Worker Utilization per Job
  • [Throughput] BytesRead, RecordsRead, FilesRead, PartitionRead (Avg)
  • [Throughput] BytesWritten, RecordsWritten, FilesWritten (Avg)
  • [Resource Utilization] Disk Available GB (Min)
  • [Resource Utilization] Max Disk Used % (Max)
  • [Driver OOM] OOM Error Count
  • [Driver OOM] Max Heap Memory Used % (Max)
  • [Executor OOM] OOM Error Count
  • [Executor OOM] Max Heap Memory Used % (Max)

By default, the Insights tab has the following insights:

  • Bottom Ranked Worker Utilization
  • Top Ranked Skewness Job
  • Forecast Worker Utilization
  • Top Mover readBytes

You can add any new graph charts or insights using the observability metrics based on your requirements.

Publish the QuickSight dashboard

When the analysis is ready, complete the following steps to publish the dashboard:

  1. Choose PUBLISH.
  2. Select Publish new dashboard as, and enter GlueObservabilityDashboard.
  3. Choose Publish dashboard.

Then you can view and share the dashboard.

Visualize and analyze with AWS Glue job observability metrics

Let’s use the dashboard to make AWS Glue usage more performant.

Looking at the Skewness Job per Job visualization, there was a spike on November 1, 2023. The skewness metric of the job multistage-demo showed 9.53, which is significantly higher than the others.
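
One way to express "significantly higher than the others" outside the dashboard is to compare each job's skewness against the median across jobs. The following sketch is illustrative only: the function name find_skew_outliers, the factor of 3, and the values for jobs other than multistage-demo are hypothetical placeholders, not data from the dashboard.

```python
from statistics import median


def find_skew_outliers(skew_by_job: dict[str, float], factor: float = 3.0) -> list[str]:
    """Return job names whose skewness exceeds `factor` times the median.

    The median is used as the baseline so that a single extreme job does
    not pull the baseline up the way a mean would.
    """
    if not skew_by_job:
        return []
    baseline = median(skew_by_job.values())
    return sorted(
        job for job, skew in skew_by_job.items() if skew > factor * baseline
    )


# Only the multistage-demo value comes from the post; the rest are placeholders.
skew = {"multistage-demo": 9.53, "placeholder-job-a": 1.1, "placeholder-job-b": 0.9}
print(find_skew_outliers(skew))
```

The right factor depends on how skewed your workloads normally are, so treat it as a starting point to tune rather than a recommended threshold.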

Let’s drill down into details. You can choose Controls, and change filter conditions based on date time, Region, AWS account ID, AWS Glue job name, job run ID, and the source and sink of the data stores. For now, let’s filter with the job name multistage-demo.

The filtered Worker Utilization per Job visualization shows 0.5, and its minimum value was 0.16. It seems that there is room for improvement in resource utilization. This observation guides you to enable auto scaling for this job to increase the worker utilization.
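
The same reasoning can be sketched in code: aggregate worker utilization per job, then flag jobs whose average falls below a cutoff as candidates for auto scaling. This is a minimal sketch under stated assumptions. The row field names (job_name, worker_utilization), the function names, and the 0.6 cutoff are hypothetical, and the sample data points are placeholders chosen only so that they reproduce the average of 0.5 and minimum of 0.16 mentioned above.

```python
from collections import defaultdict


def summarize_utilization(rows: list[dict]) -> dict[str, dict[str, float]]:
    """Compute the average and minimum worker utilization per job."""
    values_by_job = defaultdict(list)
    for row in rows:
        values_by_job[row["job_name"]].append(row["worker_utilization"])
    return {
        job: {"avg": sum(values) / len(values), "min": min(values)}
        for job, values in values_by_job.items()
    }


def autoscaling_candidates(summary: dict[str, dict[str, float]], cutoff: float = 0.6) -> list[str]:
    """Return jobs whose average utilization is below the (hypothetical) cutoff."""
    return sorted(job for job, stats in summary.items() if stats["avg"] < cutoff)


# Placeholder data points, not actual metric values from the dashboard.
rows = [
    {"job_name": "multistage-demo", "worker_utilization": 0.16},
    {"job_name": "multistage-demo", "worker_utilization": 0.84},
    {"job_name": "placeholder-job", "worker_utilization": 0.9},
]
summary = summarize_utilization(rows)
print(autoscaling_candidates(summary))
```

Looking at the minimum alongside the average helps distinguish a job that is consistently under-utilized from one that only dips during a particular stage.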

Clean up

Run the following commands to clean up your AWS resources:

  1. Run the following command using the monitoring account to clean up resources:
    $ cdk destroy '*' --profile <MONITORING-PROFILE>

  2. Run the following command using the source account to clean up resources:
    $ cdk destroy MetricSenderStack --profile <SOURCE-PROFILE>

Considerations

QuickSight integration is designed for analysis and better flexibility. You can aggregate metrics based on any fields. When dealing with many jobs at once, QuickSight insights help you identify problematic jobs.

QuickSight integration is achieved with more resources in your environments. The monitoring account needs an AWS Glue database, table, crawler, and S3 bucket, and the ability to run Athena queries to visualize metrics in QuickSight. Each source account needs to have one metric stream and one Firehose delivery stream. This can incur additional costs.

All the required resources are templatized in AWS CDK.

Conclusion

In this post, we explored how to visualize and analyze AWS Glue job observability metrics on QuickSight using CloudWatch metric streams and SPICE. By connecting the new observability metrics to interactive QuickSight dashboards, you can uncover daily, weekly, and monthly patterns to optimize AWS Glue job usage. The rich visualization capabilities of QuickSight allow you to analyze trends in metrics like worker utilization, error categories, throughput, and more. Aggregating metrics and slicing data by different dimensions such as job name can provide deeper insights.

The sample dashboard showed metrics over time, top errors, and comparative job analytics. These visualizations and reports can be securely shared with teams across the organization. With data-driven insights on the AWS Glue observability metrics, you can have deeper insights on performance bottlenecks, common errors, and more.


About the Authors

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He’s responsible for building software artifacts to help customers. In his spare time, he enjoys cycling with his new road bike.

Chuhan Liu is a Software Development Engineer on the AWS Glue team. He’s passionate about building scalable distributed systems for big data processing, analytics, and management. In his spare time, he enjoys playing tennis.

XiaoRun Yu is a Software Development Engineer on the AWS Glue team. He’s working on building new features for AWS Glue to help customers. Outside of work, Xiaorun enjoys exploring new places in the Bay Area.

Sean Ma is a Principal Product Manager on the AWS Glue team. He has a track record of more than 18 years innovating and delivering enterprise products that unlock the power of data for users. Outside of work, Sean enjoys scuba diving and college football.

Mohit Saxena is a Senior Software Development Manager on the AWS Glue team. His team focuses on building distributed systems to enable customers with interactive and simple-to-use interfaces to efficiently manage and transform petabytes of data seamlessly across data lakes on Amazon S3, databases and data warehouses on cloud.
