Monday, July 8, 2024

Prepare and load Amazon S3 data into Teradata using AWS Glue through its native connector for Teradata Vantage

In this post, we explore how you can use the AWS Glue native connector for Teradata Vantage to streamline data integrations and unlock the full potential of your data.

Businesses often rely on Amazon Simple Storage Service (Amazon S3) for storing large amounts of data from various data sources in a cost-effective and secure manner. For those using Teradata for data analysis, integrations through the AWS Glue native connector for Teradata Vantage unlock new possibilities. AWS Glue enhances the flexibility and efficiency of data management, allowing companies to seamlessly integrate their data, regardless of its location, with Teradata’s analytical capabilities. This new connector eliminates technical hurdles related to configuration, security, and management, enabling companies to effortlessly export or import their datasets into Teradata Vantage. As a result, businesses can focus more on extracting meaningful insights from their data, rather than dealing with the intricacies of data integration.

AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. With AWS Glue, you can discover and connect to more than 100 diverse data sources and manage your data in a centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.

Teradata Corporation is a leading connected multi-cloud data platform for enterprise analytics, focused on helping companies use all of their data across an enterprise, at scale. As an AWS Data & Analytics Competency partner, Teradata offers a complete cloud analytics and data platform, including for Machine Learning.

Introducing the AWS Glue native connector for Teradata Vantage

AWS Glue provides support for Teradata, accessible through both AWS Glue Studio and AWS Glue ETL scripts. With AWS Glue Studio, you benefit from a visual interface that simplifies the process of connecting to Teradata and authoring, running, and monitoring AWS Glue ETL jobs. For data engineers, this support extends to AWS Glue ETL scripts, where you can use Python or Scala to create and manage more specific data integration and transformation tasks.

The AWS Glue native connector for Teradata Vantage allows you to efficiently read and write data from Teradata without the need to install or manage any connector libraries. You can add Teradata as both the source and target within AWS Glue Studio’s no-code, drag-and-drop visual interface, or use the connector directly in an AWS Glue ETL script job.
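For the script route, the following is a minimal sketch of how a Glue ETL job might invoke the connector. The connection name and table are illustrative assumptions, and the `glueContext` calls are shown as comments because they only run inside a Glue job:

```python
# Options a Glue ETL script could pass to the native Teradata connector.
# "teradata_connection" and "test.tickit" are illustrative values.
read_options = {
    "connectionName": "teradata_connection",  # Glue connection created in the console
    "dbtable": "test.tickit",                 # table in Teradata
}

# Inside a Glue job, the native connector is addressed via connection_type="teradata":
# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="teradata", connection_options=read_options)
# glueContext.write_dynamic_frame.from_options(
#     frame=dyf, connection_type="teradata", connection_options=read_options)
```

The same options dictionary works for both reads and writes; only the direction of the call changes.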

Solution overview

In this example, you use AWS Glue Studio to enrich and upload data stored on Amazon S3 to Teradata Vantage. You start by joining the Event and Venue files from the TICKIT dataset. Next, you filter the results to a single geographic region. Finally, you upload the refined data to Teradata Vantage.

The TICKIT dataset tracks sales activity for the fictional TICKIT website, where users buy and sell tickets online for sporting events, shows, and concerts. In this dataset, analysts can identify ticket movement over time, success rates for sellers, and best-selling events, venues, and seasons.

For this example, you use AWS Glue Studio to develop a visual ETL pipeline. This pipeline will read data from Amazon S3, perform transformations, and then load the transformed data into Teradata. The following diagram illustrates this architecture.

Solution Overview

By the end of this post, your visual ETL job will resemble the following screenshot.

Visual ETL Job Flow

Prerequisites

For this example, you should have access to an existing Teradata database endpoint with network reachability from AWS, and permissions to create tables and to load and query data.

AWS Glue needs network access to Teradata to read or write data. How this is configured depends on where your Teradata is deployed and your specific network configuration. For Teradata deployed on AWS, you may need to configure VPC peering or AWS PrivateLink, security groups, and network access control lists (NACLs) to allow AWS Glue to communicate with Teradata over TCP. If Teradata is outside AWS, networking services such as AWS Site-to-Site VPN or AWS Direct Connect may be required. Public internet access is not recommended due to security risks. If you choose public access, it’s safer to run the AWS Glue job in a VPC behind a NAT gateway. This approach lets you allowlist only one IP address for incoming traffic on your network firewall. For more information, refer to Infrastructure security in AWS Glue.

Set up Amazon S3

Every object in Amazon S3 is stored in a bucket. Before you can store data in Amazon S3, you must create an S3 bucket to hold the results. Complete the following steps:

  1. On the Amazon S3 console, choose Buckets in the navigation pane.
  2. Choose Create bucket.
  3. For Name, enter a globally unique name for your bucket; for example, tickit8530923.
  4. Choose Create bucket.
  5. Download the TICKIT dataset and unzip it.
  6. Create the folder tickit in your S3 bucket and upload the allevents_pipe.txt and venue_pipe.txt files.
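Both files are pipe-delimited with no header row, which is why the source nodes later in this post use the Pipe delimiter and positional column names. As a rough illustration (the sample line below is a made-up record in the dataset's format, not actual TICKIT data):

```python
# Parse one pipe-delimited venue record; without headers, fields arrive
# positionally (venueid, venuename, venuecity, venuestate, venueseats).
sample_line = "217|Sample Arena|Washington|DC|20500"
fields = sample_line.rstrip("\n").split("|")
record = dict(zip(["venueid", "venuename", "venuecity", "venuestate", "venueseats"], fields))
```

In AWS Glue these positional fields surface as col0 through col4 until you rename them with a Change Schema transform.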

Configure Teradata connections

To connect to Teradata from AWS Glue, see Configuring Teradata Connection.

You must create and store your Teradata credentials in an AWS Secrets Manager secret, and then associate that secret with a Teradata AWS Glue connection. We discuss these two steps in more detail later in this post.

Create an IAM role for the AWS Glue ETL job

When you create the AWS Glue ETL job, you specify an AWS Identity and Access Management (IAM) role for the job to use. The role must grant access to all resources used by the job, including Amazon S3 (for any sources, targets, scripts, driver files, and temporary directories) and Secrets Manager. For instructions, see Configure an IAM role for your ETL job.

Create a table in Teradata

Using your preferred database tool, log in to Teradata. Run the following code to create the table in Teradata where you will load your data:

CREATE MULTISET TABLE test.tickit, FALLBACK
   (venueid varchar(25),
    venuename varchar(100),
    venuecity varchar(100),
    venuestate varchar(25),
    venueseats varchar(25),
    eventid varchar(25),
    catid varchar(25),
    dateid varchar(25),
    eventname varchar(100),
    starttime varchar(100))
    NO PRIMARY INDEX
;

Store Teradata login credentials

An AWS Glue connection is a Data Catalog object that stores login credentials, URI strings, and more. The Teradata connector requires Secrets Manager for storing the Teradata user name and password that you use to connect to Teradata.

To store the Teradata user name and password in Secrets Manager, complete the following steps:

  1. On the Secrets Manager console, choose Secrets in the navigation pane.
  2. Choose Store a new secret.
  3. Select Other type of secret.
  4. Enter the key/value USER and teradata_user, then choose Add row.
  5. Enter the key/value PASSWORD and teradata_user_password, then choose Next.

Teradata Secrets Manager Configuration

  1. For Secret name, enter a descriptive name, then choose Next.
  2. Choose Next to move to the review step, then choose Store.
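The USER and PASSWORD key/value pairs entered in the console are stored as a small JSON document. If you prefer to script this step, a hedged sketch follows (the secret name is an assumption, and the boto3 call is commented out because it requires AWS credentials):

```python
import json

# The key/value pairs the Teradata connector expects to find in the secret.
secret_value = json.dumps({
    "USER": "teradata_user",
    "PASSWORD": "teradata_user_password",
})

# Requires AWS credentials and permissions; shown for illustration only:
# import boto3
# boto3.client("secretsmanager").create_secret(
#     Name="teradata-credentials", SecretString=secret_value)
```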

Create the Teradata connection in AWS Glue

Now you’re ready to create an AWS Glue connection to Teradata. Complete the following steps:

  1. On the AWS Glue console, choose Connections under Data Catalog in the navigation pane.
  2. Choose Create connection.
  3. For Name, enter a name (for example, teradata_connection).
  4. For Connection type, choose Teradata.
  5. For Teradata URL, enter jdbc:teradata://url_of_teradata/database=name_of_your_database.
  6. For AWS Secret, choose the secret with your Teradata credentials that you created earlier.

Teradata Connection access

Create an AWS Glue visual ETL job to transform and load data to Teradata

Complete the following steps to create your AWS Glue ETL job:

  1. On the AWS Glue console, under ETL Jobs in the navigation pane, choose Visual ETL.
  2. Choose Visual ETL.
  3. Choose the pencil icon to enter a name for your job.

We add venue_pipe.txt as our first dataset.

  1. Choose Add nodes and choose Amazon S3 on the Sources tab.

Amazon S3 source node

  1. Enter the following data source properties:
    1. For Name, enter Venue.
    2. For S3 source type, select S3 location.
    3. For S3 URL, enter the S3 path to venue_pipe.txt.
    4. For Data format, choose CSV.
    5. For Delimiter, choose Pipe.
    6. Deselect First line of source file contains column headers.

S3 data source properties

Now we add allevents_pipe.txt as our second dataset.

  1. Choose Add nodes and choose Amazon S3 on the Sources tab.
  2. Enter the following data source properties:
    1. For Name, enter Event.
    2. For S3 source type, select S3 location.
    3. For S3 URL, enter the S3 path to allevents_pipe.txt.
    4. For Data format, choose CSV.
    5. For Delimiter, choose Pipe.
    6. Deselect First line of source file contains column headers.

Next, we rename the columns of the Venue dataset.

  1. Choose Add nodes and choose Change Schema on the Transforms tab.
  2. Enter the following transform properties:
    1. For Name, enter Rename Venue data.
    2. For Node parents, choose Venue.
    3. In the Change Schema section, map the source keys to the target keys:
      1. col0: venueid
      2. col1: venuename
      3. col2: venuecity
      4. col3: venuestate
      5. col4: venueseats

Rename Venue data ETL Transform
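The Change Schema mapping above is equivalent to a simple key rename. A minimal Python sketch of the same idea, using an illustrative row rather than real dataset contents:

```python
# Map the positional CSV keys (col0..col4) to the Venue column names.
venue_mapping = {"col0": "venueid", "col1": "venuename", "col2": "venuecity",
                 "col3": "venuestate", "col4": "venueseats"}

row = {"col0": "1", "col1": "Sample Park", "col2": "Bridgeview", "col3": "IL", "col4": "0"}
renamed = {venue_mapping[k]: v for k, v in row.items()}
```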

Now we filter the Venue dataset to a specific geographic region.

  1. Choose Add nodes and choose Filter on the Transforms tab.
  2. Enter the following transform properties:
    1. For Name, enter Location Filter.
    2. For Node parents, choose Rename Venue data.
    3. For Filter condition, choose venuestate for Key, choose matches for Operation, and enter DC for Value.

Location Filter Settings

Now we rename the columns in the Event dataset.

  1. Choose Add nodes and choose Change Schema on the Transforms tab.
  2. Enter the following transform properties:
    1. For Name, enter Rename Event data.
    2. For Node parents, choose Event.
    3. In the Change Schema section, map the source keys to the target keys:
      1. col0: eventid
      2. col1: e_venueid
      3. col2: catid
      4. col3: dateid
      5. col4: eventname
      6. col5: starttime

Next, we join the Venue and Event datasets.

  1. Choose Add nodes and choose Join on the Transforms tab.
  2. Enter the following transform properties:
    1. For Name, enter Join.
    2. For Node parents, choose Location Filter and Rename Event data.
    3. For Join type, choose Inner join.
    4. For Join conditions, choose venueid for Location Filter and e_venueid for Rename Event data.

Join Properties
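The inner join keeps only event rows whose e_venueid matches the venueid of one of the filtered venues. A minimal Python sketch with made-up rows:

```python
# Inner join on venueid == e_venueid (illustrative rows, not real dataset contents).
venues = [{"venueid": "217", "venuename": "Sample Arena", "venuestate": "DC"}]
events = [
    {"eventid": "500", "e_venueid": "217", "eventname": "Sample Concert"},
    {"eventid": "501", "e_venueid": "1", "eventname": "Other Show"},
]
venues_by_id = {v["venueid"]: v for v in venues}
joined = [{**venues_by_id[e["e_venueid"]], **e}
          for e in events if e["e_venueid"] in venues_by_id]
```

Note that the joined rows carry both venueid and e_venueid with the same value, which is why the next step drops the duplicate.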

Now we drop the duplicate column.

  1. Choose Add nodes and choose Change Schema on the Transforms tab.
  2. Enter the following transform properties:
    1. For Name, enter Drop column.
    2. For Node parents, choose Join.
    3. In the Change Schema section, select Drop for e_venueid.

Drop column properties
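Dropping the column is a simple key removal; conceptually:

```python
# After the join, venueid and e_venueid hold the same value; drop the duplicate.
row = {"venueid": "217", "e_venueid": "217", "eventname": "Sample Concert"}
cleaned = {k: v for k, v in row.items() if k != "e_venueid"}
```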

Next, we load the data into the Teradata table.

  1. Choose Add nodes and choose Teradata on the Targets tab.
  2. Enter the following data sink properties:
    1. For Name, enter Teradata.
    2. For Node parents, choose Drop column.
    3. For Teradata connection, choose teradata_connection.
    4. For Table name, enter the schema.tablename of the table you created in Teradata.

Data sink properties Teradata

Finally, we run the job and load the data into Teradata.

  1. Choose Save, then choose Run.

A banner will display that the job has started.

  1. Choose Runs, which displays the status of the job.

The run status will change to Succeeded when the job is complete.

Run Status

  1. Connect to Teradata, then query the table that the data was loaded into.

The filtered and joined data from the two datasets will be in the table.

Filtered and joined data result

Clean up

To avoid incurring additional charges from resources created as part of this post, make sure you delete the items you created in the AWS account for this post:

  • The Secrets Manager secret created for the Teradata credentials
  • The AWS Glue connection for Teradata Vantage
  • The data loaded in the S3 bucket
  • The AWS Glue visual ETL job

Conclusion

In this post, you created a connection to Teradata using AWS Glue and then created an AWS Glue job to transform and load data into Teradata. The AWS Glue native connector for Teradata Vantage empowers your data analytics journey by providing a seamless and efficient pathway for integrating your data with Teradata. This new capability in AWS Glue not only simplifies your data integration workflows but also opens up new avenues for advanced analytics, business intelligence, and machine learning innovations.

With the AWS Teradata Connector, you have the best tool at your disposal for simplifying data integration tasks. Whether you’re looking to load Amazon S3 data into Teradata for analytics, reporting, or business insights, this new connector streamlines the process, making it more accessible and cost-effective.

To get started with AWS Glue, refer to Getting Started with AWS Glue.


About the Authors

Kamen Sharlandjiev is a Sr. Big Data and ETL Solutions Architect and AWS Glue expert. He’s on a mission to make life easier for customers who are facing complex data integration challenges. His secret weapon? Fully managed, low-code AWS services that can get the job done with minimal effort and no coding. Follow Kamen on LinkedIn to keep up to date with the latest AWS Glue news!

Sean Bjurstrom is a Technical Account Manager in ISV accounts at Amazon Web Services, where he specializes in analytics technologies and draws on his background in consulting to support customers on their analytics and cloud journeys. Sean is passionate about helping businesses harness the power of data to drive innovation and growth. Outside of work, he enjoys running and has participated in several marathons.

Vinod Jayendra is an Enterprise Support Lead in ISV accounts at Amazon Web Services, where he helps customers solve their architectural, operational, and cost-optimization challenges. With a particular focus on serverless technologies, he draws from his extensive background in application development to help customers build top-tier solutions. Beyond work, he finds joy in quality family time, embarking on biking adventures, and coaching youth sports teams.

Doug Mbaya is a Senior Partner Solutions Architect with a focus in analytics and machine learning. Doug works closely with AWS partners and helps them integrate their solutions with AWS analytics and machine learning solutions in the cloud.
