Thursday, November 7, 2024

Deliver decompressed Amazon CloudWatch Logs to Amazon S3 and Splunk using Amazon Data Firehose

You can use Amazon Data Firehose to aggregate and deliver log events from your applications and services captured in Amazon CloudWatch Logs to your Amazon Simple Storage Service (Amazon S3) bucket and Splunk destinations, for use cases such as data analytics, security analysis, and application troubleshooting. By default, CloudWatch Logs are delivered as gzip-compressed objects. You might want the data to be decompressed, or want logs delivered to Splunk, which requires decompressed data input, for application monitoring and auditing.

AWS released a feature to support decompression of CloudWatch Logs in Firehose. With this new feature, you can specify an option in Firehose to decompress CloudWatch Logs. You no longer have to perform additional processing using AWS Lambda or post-processing to get decompressed logs, and you can deliver decompressed data to Splunk. Additionally, you can use optional Firehose features such as record format conversion to convert CloudWatch Logs to Parquet or ORC, and dynamic partitioning to automatically group streaming records based on keys in the data (for example, by month) and deliver the grouped records to corresponding Amazon S3 prefixes.

In this post, we look at how to enable the decompression feature for Splunk and Amazon S3 destinations. We start with Splunk and then Amazon S3 for new streams, then we address the migration steps to take advantage of this feature and simplify your existing pipeline.

Decompress CloudWatch Logs for Splunk

You can use a subscription filter in CloudWatch log groups to ingest data directly into Firehose or through Amazon Kinesis Data Streams.
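For illustration, the following is a minimal boto3 sketch of creating a subscription filter that sends log events from a log group directly to a Firehose stream. The log group name, filter name, stream ARN, and IAM role ARN are placeholders for your own resources.

```python
import boto3

logs = boto3.client("logs")

# Create a subscription filter that forwards all events from the log group
# to a Firehose stream. An empty filter pattern matches every log event.
logs.put_subscription_filter(
    logGroupName="/aws/example/application-logs",   # hypothetical log group
    filterName="firehose-delivery",                 # any name you choose
    filterPattern="",                               # forward all events
    destinationArn="arn:aws:firehose:us-east-1:111122223333:deliverystream/cw-logs-to-splunk",
    roleArn="arn:aws:iam::111122223333:role/CWLtoFirehoseRole",  # role allowing CloudWatch Logs to write to Firehose
)
```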

Note: For the CloudWatch Logs decompression feature, you need an HTTP Event Collector (HEC) data input created in Splunk, with indexer acknowledgement enabled and the source type set. This is required to map the decompressed logs to the correct source type. When creating the HEC input, include the source type mapping (for example, aws:cloudtrail).

To create a Firehose delivery stream for the decompression feature, complete the following steps:

  1. Provide your destination settings and select Raw endpoint as the endpoint type.

You can use a raw endpoint for the decompression feature to ingest both raw and JSON-formatted event data into Splunk. For example, VPC Flow Logs data is raw data, and AWS CloudTrail data is in JSON format.

  2. Enter the HEC token for Authentication token.
  3. To enable the decompression feature, deselect Transform source records with AWS Lambda under Transform records.
  4. Select Turn on decompression and Turn on message extraction for Decompress source records from Amazon CloudWatch Logs.
  5. Select Turn on message extraction for the Splunk destination (see the API sketch after these steps).
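If you prefer to configure the stream programmatically, the following boto3 sketch shows the equivalent settings as I understand the Firehose API, using the "Decompression" and "CloudWatchLogProcessing" processor types; verify the processor and parameter names against the current API reference. All ARNs, the HEC endpoint, and the token are placeholders.

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="cw-logs-to-splunk",
    DeliveryStreamType="DirectPut",
    SplunkDestinationConfiguration={
        "HECEndpoint": "https://splunk.example.com:8088",  # your Splunk HEC endpoint
        "HECEndpointType": "Raw",                          # raw endpoint, per step 1
        "HECToken": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {   # decompress the gzip-compressed CloudWatch Logs records
                    "Type": "Decompression",
                    "Parameters": [
                        {"ParameterName": "CompressionFormat", "ParameterValue": "GZIP"},
                    ],
                },
                {   # extract only the log event messages (message extraction)
                    "Type": "CloudWatchLogProcessing",
                    "Parameters": [
                        {"ParameterName": "DataMessageExtraction", "ParameterValue": "true"},
                    ],
                },
            ],
        },
        "S3BackupMode": "FailedEventsOnly",
        "S3Configuration": {   # backup bucket for events that fail delivery to Splunk
            "RoleARN": "arn:aws:iam::111122223333:role/FirehoseDeliveryRole",
            "BucketARN": "arn:aws:s3:::my-firehose-backup-bucket",
        },
    },
)
```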

Message extraction feature

After decompression, CloudWatch Logs are in JSON format, as shown in the following figure. The decompressed data has metadata information such as logGroup, logStream, and subscriptionFilters, and the actual data is included within the message field under logEvents (the following example shows CloudTrail events delivered through CloudWatch Logs).

When you enable message extraction, Firehose extracts just the contents of the message fields and concatenates them with a new line between them, as shown in the following figure. With the CloudWatch Logs metadata filtered out by this feature, Splunk can successfully parse the actual log data and map it to the source type configured in the HEC token.
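To make the behavior concrete, the following snippet is an illustration only, not Firehose code: it sketches what the message extraction step conceptually produces from a decompressed CloudWatch Logs payload (the field values are made up).

```python
# A hypothetical decompressed CloudWatch Logs record containing CloudTrail events.
decompressed_record = {
    "messageType": "DATA_MESSAGE",
    "owner": "111122223333",
    "logGroup": "CloudTrail",
    "logStream": "111122223333_CloudTrail_us-east-1",
    "subscriptionFilters": ["firehose-delivery"],
    "logEvents": [
        {"id": "1", "timestamp": 1730937600000, "message": '{"eventSource":"ec2.amazonaws.com", ...}'},
        {"id": "2", "timestamp": 1730937601000, "message": '{"eventSource":"s3.amazonaws.com", ...}'},
    ],
}

# Message extraction keeps only the "message" field of each log event and joins
# the messages with newlines, dropping the CloudWatch Logs metadata.
extracted = "\n".join(event["message"] for event in decompressed_record["logEvents"])
print(extracted)
```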

Additionally, if you want to deliver these CloudWatch events to your Splunk destination in near real time, you can use zero buffering, a feature recently launched in Firehose. With it, you can set the buffer interval to 0 seconds, or any interval between 0–60 seconds, to deliver data to the Splunk destination within seconds.

With these settings, you can now seamlessly ingest decompressed CloudWatch log data into Splunk using Firehose.

Decompress CloudWatch Logs for Amazon S3

The CloudWatch Logs decompression feature for an Amazon S3 destination works similarly to Splunk: you turn off data transformation using Lambda and turn on the decompression and message extraction options. You can use the decompression feature to write the log data as a text file to the Amazon S3 destination, or combine it with other Amazon S3 destination features such as record format conversion to Parquet or ORC, or dynamic partitioning to partition the data.
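As a rough sketch under the same assumptions as the Splunk example (processor and parameter names should be verified against the current API reference), an S3 destination with decompression and message extraction might be created as follows; the ARNs and bucket name are placeholders.

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="cw-logs-to-s3",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/FirehoseDeliveryRole",  # placeholder
        "BucketARN": "arn:aws:s3:::my-decompressed-logs-bucket",           # placeholder
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {"Type": "Decompression",
                 "Parameters": [{"ParameterName": "CompressionFormat", "ParameterValue": "GZIP"}]},
                {"Type": "CloudWatchLogProcessing",
                 "Parameters": [{"ParameterName": "DataMessageExtraction", "ParameterValue": "true"}]},
            ],
        },
    },
)
```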

Dynamic partitioning with decompression

For the Amazon S3 destination, Firehose supports dynamic partitioning, which lets you continuously partition streaming data by using keys within the data and then deliver the data, grouped by these keys, into corresponding Amazon S3 prefixes. This enables you to run high-performance, cost-efficient analytics on streaming data in Amazon S3 using services such as Amazon Athena, Amazon EMR, Amazon Redshift Spectrum, and Amazon QuickSight. Partitioning your data minimizes the amount of data scanned, optimizes performance, and reduces the cost of your analytics queries on Amazon S3.

With the new decompression feature, you can perform dynamic partitioning without any Lambda function for mapping the partitioning keys on CloudWatch Logs. You can enable the Inline parsing for JSON option, scan the decompressed log data, and select the partitioning keys. The following screenshot shows an example where inline parsing is enabled for CloudTrail log data with a partitioning schema selected for the account ID and AWS Region in the CloudTrail record.
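For illustration, the following fragment sketches how the same partitioning could be expressed through the API by merging these settings into the ExtendedS3DestinationConfiguration from the earlier example. The JQ expression and the CloudTrail field names (recipientAccountId, awsRegion) are assumptions for this example; adjust them to your data.

```python
# Dynamic partitioning pieces of ExtendedS3DestinationConfiguration, partitioning
# CloudTrail events by account ID and Region extracted with inline JQ parsing.
dynamic_partitioning_settings = {
    "DynamicPartitioningConfiguration": {"Enabled": True},
    "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 60},  # dynamic partitioning needs a larger buffer size
    "Prefix": "cloudtrail/account_id=!{partitionKeyFromQuery:account_id}/"
              "region=!{partitionKeyFromQuery:region}/",
    "ErrorOutputPrefix": "cloudtrail-errors/!{firehose:error-output-type}/",
    "ProcessingConfiguration": {
        "Enabled": True,
        "Processors": [
            # The Decompression and CloudWatchLogProcessing processors from the
            # previous example go here as well, followed by inline JSON parsing:
            {
                "Type": "MetadataExtraction",
                "Parameters": [
                    {"ParameterName": "MetadataExtractionQuery",
                     "ParameterValue": "{account_id: .recipientAccountId, region: .awsRegion}"},
                    {"ParameterName": "JsonParsingEngine", "ParameterValue": "JQ-1.6"},
                ],
            },
        ],
    },
}
```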

File format conversion with decompression

For CloudWatch Logs data, you can use the record format conversion feature on decompressed data for the Amazon S3 destination. Firehose can convert the input data format from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3. Parquet and ORC are columnar data formats that save space and enable faster queries compared to row-oriented formats like JSON. You can use the record format conversion settings under Transform and convert records to convert the CloudWatch log data to Parquet or ORC format. The following screenshot shows an example of record format conversion settings for Parquet format using an AWS Glue schema and table for CloudTrail log data. When the dynamic partitioning settings are configured, record format conversion works along with dynamic partitioning to create the files in the output format with a partition folder structure in the target S3 bucket.
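The fragment below is a sketch of the record format conversion piece of ExtendedS3DestinationConfiguration, converting the decompressed JSON records to Parquet. The AWS Glue database, table, and role are hypothetical names standing in for a Glue table that describes your CloudTrail schema.

```python
# Record format conversion: deserialize JSON input and serialize Parquet output,
# using an AWS Glue table as the schema source.
format_conversion_settings = {
    "DataFormatConversionConfiguration": {
        "Enabled": True,
        "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
        "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
        "SchemaConfiguration": {
            "RoleARN": "arn:aws:iam::111122223333:role/FirehoseGlueAccessRole",  # placeholder
            "DatabaseName": "cloudtrail_db",   # hypothetical Glue database
            "TableName": "cloudtrail_logs",    # hypothetical Glue table
            "Region": "us-east-1",
        },
    },
}
```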

Migrate existing delivery streams for decompression

If you want to migrate an existing Firehose stream that uses Lambda for decompression to this new decompression feature of Firehose, refer to the steps outlined in Enabling and disabling decompression.

Pricing

The Firehose decompression feature decompresses the data and charges per GB of decompressed data. To understand decompression pricing, refer to Amazon Data Firehose pricing.

Clean up

To avoid incurring future charges, delete the resources you created in the following order:

  1. Delete the CloudWatch Logs subscription filter.
  2. Delete the Firehose delivery stream.
  3. Delete the S3 buckets.

Conclusion

The decompression and message extraction feature of Firehose simplifies delivery of CloudWatch Logs to Amazon S3 and Splunk destinations without requiring any code development or additional processing. For an Amazon S3 destination, you can use Parquet or ORC conversion and dynamic partitioning capabilities on decompressed data.

For more information, refer to the Amazon Data Firehose Developer Guide.


About the Authors

Ranjit Kalidasan is a Senior Solutions Architect with Amazon Web Services based in Boston, Massachusetts. He is a Partner Solutions Architect helping security ISV partners co-build and co-market solutions with AWS. He brings over 25 years of experience in information technology helping global customers implement complex solutions for security and analytics. You can connect with Ranjit on LinkedIn.

Phaneendra Vuliyaragoli is a Product Management Lead for Amazon Data Firehose at AWS. In this role, Phaneendra leads the product and go-to-market strategy for Amazon Data Firehose.
