Many customers are extending their data warehouse capabilities to their data lake with Amazon Redshift. They want to further enhance their security posture by implementing access policies on their data lakes based on Amazon Simple Storage Service (Amazon S3). Furthermore, they’re adopting security models that require access to the data lake through their private networks.
Amazon Redshift Spectrum enables you to run Amazon Redshift SQL queries on data stored in Amazon S3. Redshift Spectrum uses the AWS Glue Data Catalog as a Hive metastore. With a provisioned Redshift data warehouse, Redshift Spectrum compute capacity runs on separate dedicated Redshift servers owned by Amazon Redshift that are independent of your Redshift cluster. When enhanced VPC routing is enabled for your Redshift cluster, Redshift Spectrum connects from the Redshift VPC to an elastic network interface (ENI) in your VPC. Because Redshift Spectrum uses these separate dedicated servers, forcing all traffic between Amazon Redshift and Amazon S3 through your VPC requires that you turn on enhanced VPC routing and create a specific network path between your Redshift data warehouse VPC and your S3 data sources.
When using an Amazon Redshift Serverless instance, Redshift Spectrum uses the same compute capacity as your serverless workgroup. To access your S3 data sources from Redshift Serverless without traffic leaving your VPC, you can use the enhanced VPC routing option without any additional network configuration.
AWS Lake Formation offers a straightforward and centralized approach to access management for S3 data sources. Lake Formation allows organizations to manage access control for Amazon S3-based data lakes using familiar database concepts such as tables and columns, along with more advanced options such as row-level and cell-level security. Lake Formation uses the AWS Glue Data Catalog to provide access control for Amazon S3.
In this post, we demonstrate how to configure your network for Redshift Spectrum to use a Redshift provisioned cluster’s enhanced VPC routing to access Amazon S3 data through Lake Formation access control. You can set up this integration in a private network with no connectivity to the internet.
Solution overview
With this solution, network traffic is routed through your VPC by enabling Amazon Redshift enhanced VPC routing. This routing option gives the VPC endpoint first route priority over an internet gateway, NAT instance, or NAT gateway. To prevent your Redshift cluster from communicating with resources outside of your VPC, you must remove all other routing options. This ensures that all communication is routed through the VPC endpoints.
The following diagram illustrates the solution architecture.
The solution consists of the following steps:
- Create a Redshift cluster in a private subnet network configuration:
  - Enable enhanced VPC routing for your Redshift cluster.
  - Modify the route table to ensure no connectivity to the public network.
- Create the following VPC endpoints for Redshift Spectrum connectivity:
  - AWS Glue interface endpoint.
  - Lake Formation interface endpoint.
  - Amazon S3 gateway endpoint.
- Analyze Amazon Redshift connectivity and network routing:
  - Verify network routes for Amazon Redshift in a private network.
  - Verify network connectivity from the Redshift cluster to various VPC endpoints.
- Test connectivity using the Amazon Redshift query editor v2.
This integration uses VPC endpoints to establish a private connection from your Redshift data warehouse to Lake Formation, Amazon S3, and AWS Glue.
Prerequisites
To set up this solution, you need basic familiarity with the AWS Management Console, an AWS account, and access to the AWS services used in this solution.
Additionally, you should have already integrated Lake Formation with Amazon Redshift to access your S3 data lake in a non-private network. For instructions, refer to Centralize governance for your data lake using AWS Lake Formation while enabling a modern data architecture with Amazon Redshift Spectrum.
Create a Redshift cluster in a private subnet network configuration
The first step is to configure your Redshift cluster to only allow network traffic through your VPC and prevent any public routes. To accomplish this, you must enable enhanced VPC routing for your Redshift cluster. Complete the following steps:
- On the Amazon Redshift console, navigate to your cluster.
- Edit your network and security settings.
- For Enhanced VPC routing, select Turn on.
- Disable the Publicly accessible option.
- Choose Save changes and modify the cluster to apply the updates.
You now have a Redshift cluster that can only communicate through the VPC. Next, modify the route table to ensure no connectivity to the public network:
- On the Amazon Redshift console, make a note of the subnet group and identify the subnet associated with this subnet group.
- On the Amazon VPC console, identify the route table associated with this subnet and edit it to remove the default route to the NAT gateway.
If your cluster is in a public subnet, you might need to remove the internet gateway route. If the subnet is shared with other resources, this change might impact their connectivity.
Your cluster is now in a private network and can’t communicate with any resources outside of your VPC.
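If you prefer to script these console steps, the following Python sketch shows one possible shape using the AWS SDK for Python (Boto3). It is a minimal illustration, not part of the original walkthrough: the function names are hypothetical, the cluster identifier and route table ID are values you supply, and it assumes the route to remove is the IPv4 default route (0.0.0.0/0).

```python
def private_cluster_settings(cluster_id: str) -> dict:
    """Build arguments for redshift.modify_cluster that mirror the console steps."""
    return {
        "ClusterIdentifier": cluster_id,
        "EnhancedVpcRouting": True,    # "Enhanced VPC routing: Turn on"
        "PubliclyAccessible": False,   # "Publicly accessible" disabled
    }


def default_route_removal(route_table_id: str) -> dict:
    """Build arguments for ec2.delete_route that drop the IPv4 default route."""
    return {"RouteTableId": route_table_id, "DestinationCidrBlock": "0.0.0.0/0"}


def apply_private_network(cluster_id: str, route_table_id: str) -> None:
    """Apply both changes. Requires boto3 and suitable AWS credentials."""
    import boto3  # imported lazily so the helpers above work without the SDK

    boto3.client("redshift").modify_cluster(**private_cluster_settings(cluster_id))
    boto3.client("ec2").delete_route(**default_route_removal(route_table_id))
```

Keeping the request parameters in small pure functions makes them easy to review before anything is changed in your account.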
Create VPC endpoints for Redshift Spectrum connectivity
After you configure your Redshift cluster to operate within a private network without external connectivity, you must establish connectivity to the following services through VPC endpoints:
- AWS Glue
- Lake Formation
- Amazon S3
Create an AWS Glue endpoint
First, Redshift Spectrum connects to AWS Glue endpoints to retrieve information from the AWS Glue Data Catalog. To create a VPC endpoint for AWS Glue, complete the following steps:
- On the Amazon VPC console, choose Endpoints in the navigation pane.
- Choose Create endpoint.
- For Name tag, enter an optional name.
- For Service category, select AWS services.
- In the Services section, search for and select your AWS Glue interface endpoint.
- Choose the appropriate VPC and subnets for your endpoint.
- Configure the security group settings and review your endpoint settings.
- Choose Create endpoint to complete the process.
After you create the AWS Glue VPC endpoint, Redshift Spectrum can retrieve information from the AWS Glue Data Catalog within your VPC.
Create a Lake Formation endpoint
Repeat the same process to create a Lake Formation endpoint:
- On the Amazon VPC console, choose Endpoints in the navigation pane.
- Choose Create endpoint.
- For Name tag, enter an optional name.
- For Service category, select AWS services.
- In the Services section, search for and select your Lake Formation interface endpoint.
- Choose the appropriate VPC and subnets for your endpoint.
- Configure the security group settings and review your endpoint settings.
- Choose Create endpoint.
You now have connectivity from Amazon Redshift to Lake Formation and AWS Glue, which allows you to retrieve the catalog and validate permissions on the data lake.
Create an Amazon S3 endpoint
The next step is to create a VPC endpoint for Amazon S3 so that Redshift Spectrum can access data stored in Amazon S3 through VPC endpoints:
- On the Amazon VPC console, choose Endpoints in the navigation pane.
- Choose Create endpoint.
- For Name tag, enter an optional name.
- For Service category, select AWS services.
- In the Services section, search for and select your Amazon S3 gateway endpoint.
- Choose the appropriate VPC and the route tables for your endpoint. A gateway endpoint is associated with route tables rather than with subnets and security groups.
- Review your endpoint settings.
- Choose Create endpoint.
With the VPC endpoint for Amazon S3 in place, you have completed all the steps needed for your Redshift cluster to communicate privately with the required services through VPC endpoints within your VPC.
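The three endpoints can also be described programmatically. The sketch below builds the arguments for the EC2 `create_vpc_endpoint` call in Boto3. The function names are hypothetical, the IDs are placeholders you supply, and enabling private DNS on the interface endpoints is an assumption that requires DNS support and DNS hostnames to be turned on for the VPC. Note that the Amazon S3 gateway endpoint takes route table IDs, whereas the interface endpoints take subnet and security group IDs.

```python
def endpoint_requests(region, vpc_id, subnet_ids, security_group_ids, route_table_ids):
    """Build create_vpc_endpoint arguments for AWS Glue, Lake Formation, and Amazon S3."""
    interface = [
        {
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
            "PrivateDnsEnabled": True,  # assumption: VPC DNS attributes are enabled
        }
        for service in ("glue", "lakeformation")
    ]
    gateway = {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),  # gateway endpoints attach to route tables
    }
    return interface + [gateway]


def create_endpoints(region, vpc_id, subnet_ids, security_group_ids, route_table_ids):
    """Create all three endpoints. Requires boto3 and suitable AWS credentials."""
    import boto3  # imported lazily so endpoint_requests works without the SDK

    ec2 = boto3.client("ec2", region_name=region)
    requests = endpoint_requests(region, vpc_id, subnet_ids, security_group_ids, route_table_ids)
    return [ec2.create_vpc_endpoint(**request) for request in requests]
```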
It’s important to ensure that the security groups attached to the VPC endpoints are properly configured, because an incorrect inbound rule can cause your connection to time out. Verify that the security group inbound rules are correctly set up to allow the necessary traffic to pass through the VPC endpoint.
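As an illustration of such an inbound rule, the following sketch builds the arguments for the EC2 `authorize_security_group_ingress` call. It assumes the interface endpoints are reached over HTTPS (TCP port 443) and that you allow traffic by referencing the Redshift cluster's security group. The function names and both security group IDs are placeholders, so adapt the rule to your own network design.

```python
def https_ingress_rule(endpoint_sg_id: str, cluster_sg_id: str) -> dict:
    """Allow HTTPS from the cluster's security group to the endpoint's security group."""
    return {
        "GroupId": endpoint_sg_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": 443,
                "ToPort": 443,
                # Reference the source security group instead of a CIDR range.
                "UserIdGroupPairs": [{"GroupId": cluster_sg_id}],
            }
        ],
    }


def allow_cluster_to_endpoint(endpoint_sg_id: str, cluster_sg_id: str) -> None:
    """Add the rule. Requires boto3 and suitable AWS credentials."""
    import boto3  # imported lazily so the helper above works without the SDK

    rule = https_ingress_rule(endpoint_sg_id, cluster_sg_id)
    boto3.client("ec2").authorize_security_group_ingress(**rule)
```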
Analyze traffic and network topology
You can use the following methods to verify the network paths from Amazon Redshift to other endpoints.
Verify network routes for Amazon Redshift in a private network
You can use an Amazon VPC resource map to visualize Amazon Redshift connectivity. The resource map shows the interconnections between resources within a VPC and the flow of traffic between subnets, NAT gateways, internet gateways, and gateway endpoints. As shown in the following screenshot, the highlighted subnet where the Redshift cluster is running doesn’t have connectivity to a NAT gateway or internet gateway. The route table associated with the subnet can reach Amazon S3 through the VPC endpoint only.
Note that the AWS Glue and Lake Formation endpoints are interface endpoints and are not visible on a resource map.
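The same check can be approximated in code. The sketch below inspects one route table in the shape returned by the EC2 `describe_route_tables` call and lists any default routes (0.0.0.0/0 or ::/0) that remain. The function name is hypothetical, and treating every default route as a potential public path is a simplifying assumption. Routes to a gateway endpoint use a prefix list destination, so they are not flagged.

```python
def default_routes(route_table: dict) -> list:
    """Return the routes whose destination is the IPv4 or IPv6 default route."""
    found = []
    for route in route_table.get("Routes", []):
        destination = route.get("DestinationCidrBlock") or route.get("DestinationIpv6CidrBlock")
        if destination in ("0.0.0.0/0", "::/0"):
            found.append(route)
    return found


# Example: a private route table with only the local route and an S3 gateway endpoint.
private_table = {
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
        {"DestinationPrefixListId": "pl-example", "GatewayId": "vpce-example"},
    ]
}
print(default_routes(private_table))
```

An empty list suggests that the subnet has no default path to a NAT gateway or internet gateway.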
Verify network connectivity from the Redshift cluster to various VPC endpoints
You can verify connectivity from your Redshift cluster subnet to all VPC endpoints using the Reachability Analyzer. The Reachability Analyzer is a configuration analysis tool that enables you to perform connectivity testing between a source resource and a destination resource in your VPCs. Complete the following steps:
- On the Amazon Redshift console, navigate to the Redshift cluster configuration page and note the internal IP address.
- On the Amazon EC2 console, search for your ENI by filtering by the IP address.
- Choose the ENI associated with your Redshift cluster and choose Run Reachability Analyzer.
- For Source type, choose Network interfaces.
- For Source, choose the Redshift ENI.
- For Destination type, choose VPC endpoints.
- For Destination, choose your VPC endpoint.
- Choose Create and analyze path.
- When the analysis is complete, view the analysis to see reachability.
As shown in the following screenshot, the Redshift cluster has connectivity to the Lake Formation endpoint.
You can repeat these steps to verify network reachability for all other VPC endpoints.
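The Reachability Analyzer steps above can be sketched with Boto3 as follows. This is an illustrative outline under stated assumptions: the function names are hypothetical, the path is tested over TCP port 443, and the ENI and VPC endpoint IDs are placeholders you look up in your own account.

```python
def eni_filter(private_ip: str) -> list:
    """Filter for ec2.describe_network_interfaces to find the ENI by private IP."""
    return [{"Name": "addresses.private-ip-address", "Values": [private_ip]}]


def reachability_path(eni_id: str, endpoint_id: str) -> dict:
    """Build arguments for ec2.create_network_insights_path."""
    return {
        "Source": eni_id,            # the Redshift ENI
        "Destination": endpoint_id,  # the VPC endpoint to test
        "Protocol": "tcp",
        "DestinationPort": 443,      # assumption: HTTPS traffic to the endpoint
    }


def is_reachable(eni_id: str, endpoint_id: str, poll_seconds: int = 5) -> bool:
    """Create a path, run an analysis, and wait for the result. Requires boto3."""
    import time
    import boto3  # imported lazily so the helpers above work without the SDK

    ec2 = boto3.client("ec2")
    path = ec2.create_network_insights_path(**reachability_path(eni_id, endpoint_id))
    path_id = path["NetworkInsightsPath"]["NetworkInsightsPathId"]
    analysis = ec2.start_network_insights_analysis(NetworkInsightsPathId=path_id)
    analysis_id = analysis["NetworkInsightsAnalysis"]["NetworkInsightsAnalysisId"]
    while True:
        result = ec2.describe_network_insights_analyses(
            NetworkInsightsAnalysisIds=[analysis_id]
        )["NetworkInsightsAnalyses"][0]
        if result["Status"] != "running":
            return result["Status"] == "succeeded" and result.get("NetworkPathFound", False)
        time.sleep(poll_seconds)
```

Calling `is_reachable` once per endpoint mirrors repeating the console steps for each VPC endpoint.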
Test connectivity by running a SQL query from the Amazon Redshift query editor v2
You can verify connectivity by running a SQL query against your Redshift Spectrum table using the Amazon Redshift query editor v2, as shown in the following screenshot.
Congratulations! You can now query Redshift Spectrum tables from a provisioned cluster with enhanced VPC routing enabled, so traffic stays within your AWS network.
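If you want to run a comparable check from a script instead of the query editor v2, one alternative is the Amazon Redshift Data API. The sketch below is not the method used in this post, and every name in it is an assumption: the function names are hypothetical, and the cluster identifier, database, database user, and the Spectrum table name passed in are placeholders.

```python
def spectrum_query_request(cluster_id: str, database: str, db_user: str, table: str) -> dict:
    """Build arguments for the Data API execute_statement call."""
    return {
        "ClusterIdentifier": cluster_id,
        "Database": database,
        "DbUser": db_user,
        # The table name is interpolated directly, so pass only a trusted identifier.
        "Sql": f"SELECT COUNT(*) FROM {table};",
    }


def run_spectrum_query(cluster_id: str, database: str, db_user: str, table: str) -> str:
    """Submit the query and return the statement ID. Requires boto3."""
    import boto3  # imported lazily so the helper above works without the SDK

    client = boto3.client("redshift-data")
    response = client.execute_statement(
        **spectrum_query_request(cluster_id, database, db_user, table)
    )
    return response["Id"]
```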
Clean up
You should clean up the resources you created as part of this exercise to avoid unnecessary cost to your AWS account. Complete the following steps:
- On the Amazon VPC console, choose Endpoints in the navigation pane.
- Select the endpoints you created and on the Actions menu, choose Delete VPC endpoints.
- On the Amazon Redshift console, navigate to your Redshift cluster.
- Edit the cluster network and security settings and select Turn off for Enhanced VPC routing.
- You can also delete your Amazon S3 data and Redshift cluster if you are not planning to use them further.
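The cleanup steps can be scripted in the same style. This sketch assumes you recorded the IDs of the endpoints you created. The function names are hypothetical, and the code intentionally leaves the Amazon S3 data and the cluster itself untouched.

```python
def cleanup_requests(cluster_id: str, endpoint_ids: list) -> dict:
    """Build the arguments for the two cleanup calls."""
    return {
        "delete_vpc_endpoints": {"VpcEndpointIds": list(endpoint_ids)},
        "modify_cluster": {"ClusterIdentifier": cluster_id, "EnhancedVpcRouting": False},
    }


def run_cleanup(cluster_id: str, endpoint_ids: list) -> None:
    """Delete the endpoints and turn off enhanced VPC routing. Requires boto3."""
    import boto3  # imported lazily so the helper above works without the SDK

    requests = cleanup_requests(cluster_id, endpoint_ids)
    boto3.client("ec2").delete_vpc_endpoints(**requests["delete_vpc_endpoints"])
    boto3.client("redshift").modify_cluster(**requests["modify_cluster"])
```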
Conclusion
By moving your Redshift data warehouse to a private network setting and enabling enhanced VPC routing, you can enhance the security posture of your Redshift cluster by limiting access to only authorized networks.
We want to acknowledge our fellow AWS colleagues Harshida Patel, Fabricio Pinto, and Soumyajeet Patra for providing their insights on this blog post.
If you have any questions or suggestions, leave your feedback in the comments section. If you need further assistance with securing your S3 data lakes and Redshift data warehouses, contact your AWS account team.
Additional resources
About the Authors
Kanwar Bajwa is an Enterprise Support Lead at AWS who works with customers to optimize their use of AWS services and achieve their business objectives.
Swapna Bandla is a Senior Solutions Architect in the AWS Analytics Specialist SA Team. Swapna has a passion for understanding customers’ data and analytics needs and empowering them to develop cloud-based, well-architected solutions. Outside of work, she enjoys spending time with her family.