AWS Glue is a serverless data integration service that makes it easier to discover, prepare, and combine data for analytics, machine learning (ML), and application development. You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores.
One of the most frequent questions we get from customers is how to effectively optimize costs on AWS Glue. Over the years, we have built several features and tools to help customers manage their AWS Glue costs. For example, AWS Glue Auto Scaling and AWS Glue Flex can help you reduce the compute cost associated with processing your data. AWS Glue interactive sessions and notebooks can help you reduce the cost of developing your ETL jobs. For more information about cost-saving best practices, refer to Monitor and optimize cost on AWS Glue for Apache Spark. Additionally, to understand data transfer costs, refer to the Cost Optimization Pillar defined in the AWS Well-Architected Framework. For data storage, you can apply the general best practices defined for each data source. For a cost optimization strategy using Amazon Simple Storage Service (Amazon S3), refer to Optimizing storage costs using Amazon S3.
In this post, we tackle the remaining piece: the cost of logs written by AWS Glue.
Before we get into the cost analysis of logs, let's understand the reasons to enable logging for your AWS Glue job and the current options available. When you start an AWS Glue job, it sends real-time logging information to Amazon CloudWatch (every 5 seconds and before each executor stops) while the Spark application is running. You can view the logs on the AWS Glue console or the CloudWatch console dashboard. These logs provide you with insights into your job runs and help you optimize and troubleshoot your AWS Glue jobs. AWS Glue offers a variety of filters and settings to reduce the verbosity of your logs. As the number of job runs increases, so does the volume of logs generated.
To optimize CloudWatch Logs costs, AWS recently announced a new log class for infrequently accessed logs called Amazon CloudWatch Logs Infrequent Access (Logs IA). This new log class offers a tailored set of capabilities at a lower cost for infrequently accessed logs, enabling you to consolidate all your logs in one place in a cost-effective manner. This class provides a cheaper option for ingesting logs that only need to be accessed occasionally for auditing or debugging purposes.
In this post, we explain what the Logs IA class is, how it can help reduce costs compared to the standard log class, and how to configure your AWS Glue resources to use this new log class. By routing logs to Logs IA, you can achieve significant savings in your CloudWatch Logs spend without sacrificing access to essential debugging information when you need it.
CloudWatch log groups used by AWS Glue job continuous logging
When continuous logging is enabled, AWS Glue for Apache Spark writes Spark driver/executor logs and progress bar information into the log group /aws-glue/jobs/logs-v2.
If a security configuration is enabled for CloudWatch logs, AWS Glue for Apache Spark will create a log group named as follows for continuous logs:
The default and custom log groups will be as follows:
- The default continuous log group will be /aws-glue/jobs/logs-v2-<Security-Configuration-Name>
- The custom continuous log group will be <custom-log-group-name>-<Security-Configuration-Name>
You can provide a custom log group name through the job parameter --continuous-log-logGroup.
Getting started with the new Infrequent Access log class for AWS Glue workloads
To gain the benefits of Logs IA for your AWS Glue workloads, you need to complete the following two steps:
- Create a new log group using the new Logs IA class.
- Configure your AWS Glue job to point to the new log group.
Full the next steps to create a brand new log group utilizing the brand new Rare Entry log class:
- On the CloudWatch console, select Log teams beneath Logs within the navigation pane.
- Select Create log group.
- For Log group title, enter
/aws-glue/jobs/logs-v2-infrequent-access.
- For Log class, select Rare Entry.
- Select Create.
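If you prefer the AWS CLI, the console steps above can be reproduced with a single command. This is a sketch assuming AWS CLI v2 with support for the --log-group-class option and credentials with CloudWatch Logs permissions; the log group name matches the walkthrough:

```shell
# Create a log group in the Infrequent Access log class.
# Note: the log class can only be set at creation time and cannot be changed later.
aws logs create-log-group \
  --log-group-name /aws-glue/jobs/logs-v2-infrequent-access \
  --log-group-class INFREQUENT_ACCESS

# Verify the class of the newly created log group.
aws logs describe-log-groups \
  --log-group-name-prefix /aws-glue/jobs/logs-v2-infrequent-access \
  --query 'logGroups[0].logGroupClass'
```

The describe-log-groups call should report INFREQUENT_ACCESS for the new group, confirming the class before you route any job logs to it.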
Complete the following steps to configure your AWS Glue job to point to the new log group:
- On the AWS Glue console, choose ETL jobs in the navigation pane.
- Choose your job.
- On the Job details tab, choose Add new parameter under Job parameters.
- For Key, enter --continuous-log-logGroup.
- For Value, enter /aws-glue/jobs/logs-v2-infrequent-access.
- Choose Save.
- Choose Run to trigger the job.
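The same configuration can also be applied per run from the AWS CLI by overriding job arguments at launch, which leaves the saved job definition untouched. A hedged sketch (the job name my-glue-job is a placeholder, and continuous logging is assumed to already be enabled on the job):

```shell
# Start a job run that writes continuous logs to the Infrequent Access log group.
# --arguments overrides job parameters for this run only.
aws glue start-job-run \
  --job-name my-glue-job \
  --arguments '{"--continuous-log-logGroup":"/aws-glue/jobs/logs-v2-infrequent-access"}'
```

This is convenient for trying out the IA log group on a single run before committing the parameter change to the job itself.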
New log events are written into the new log group.
View the logs with the Infrequent Access log class
Now you're ready to view the logs with the Infrequent Access log class. Open the log group /aws-glue/jobs/logs-v2-infrequent-access on the CloudWatch console.
When you choose one of the log streams, you'll notice that it redirects you to the CloudWatch Logs Insights page with a pre-configured default query and your log stream selected by default. By choosing Run query, you can view the actual log events on the Logs Insights page.
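Because IA log events are read through Logs Insights, they can also be queried programmatically. A minimal sketch using the AWS CLI (the query string mirrors the default Logs Insights query; the GNU date invocations for the time range are assumptions and differ on macOS):

```shell
# Start a Logs Insights query against the IA log group for the last hour.
QUERY_ID=$(aws logs start-query \
  --log-group-name /aws-glue/jobs/logs-v2-infrequent-access \
  --start-time "$(date -d '1 hour ago' +%s)" \
  --end-time "$(date +%s)" \
  --query-string 'fields @timestamp, @message | sort @timestamp desc | limit 20' \
  --output text --query queryId)

# Fetch results; re-run until the status field reports Complete.
aws logs get-query-results --query-id "$QUERY_ID"
```

Note that start-query is asynchronous, so get-query-results may need to be polled briefly before the log events appear.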
Considerations
Keep in mind the following considerations:
- You can't change the log class of a log group after it's created. You need to create a new log group to use the Infrequent Access class.
- The Logs IA class offers a subset of CloudWatch Logs capabilities, including managed ingestion, storage, cross-account log analytics, and encryption, at a lower per-GB ingestion price. For example, you can't view log events through the standard CloudWatch Logs console. To learn more about the features offered across both log classes, refer to Log classes.
Conclusion
This post provided step-by-step instructions to guide you through enabling Logs IA for your AWS Glue job logs. If your AWS Glue ETL jobs generate large volumes of log data that make it a challenge to scale your applications, the best practices demonstrated in this post can help you cost-effectively scale while centralizing all your logs in CloudWatch Logs. Start using the Infrequent Access class with your AWS Glue workloads today and enjoy the cost benefits.
About the Authors
Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is based in Tokyo, Japan, and is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling on his road bike.
Abeetha Bala is a Senior Product Manager for Amazon CloudWatch, primarily focused on logs. Being customer obsessed, she solves observability challenges through innovative and cost-effective ways.
Kinshuk Pahare is a leader on AWS Glue's product management team. He drives efforts on the platform, developer experience, and big data processing frameworks like Apache Spark, Ray, and Python shell.