Thursday, July 4, 2024

Simplify access management with Amazon Redshift and AWS Lake Formation for users in an External Identity Provider

Many organizations use identity providers (IdPs) to authenticate users and to manage their attributes and group memberships for secure, efficient, and centralized identity management. You might be modernizing your data architecture using Amazon Redshift to enable access to your data lake and the data in your data warehouse, and are looking for a centralized and scalable way to define and manage data access based on IdP identities. AWS Lake Formation makes it straightforward to centrally govern, secure, and globally share data for analytics and machine learning (ML). Currently, you may have to map user identities and groups to AWS Identity and Access Management (IAM) roles, and data access permissions are defined at the IAM role level within Lake Formation. This setup is inefficient for two reasons: setting up and maintaining the mapping between IdP groups and IAM roles as new groups are created is time consuming, and the mapping makes it difficult to determine which data was accessed from which service at a given time.

Amazon Redshift, Amazon QuickSight, and Lake Formation now integrate with the new trusted identity propagation capability in AWS IAM Identity Center to authenticate users seamlessly across services. In this post, we discuss two use cases to configure trusted identity propagation with Amazon Redshift and Lake Formation.

Solution overview

Trusted identity propagation provides a new authentication option for organizations that want to centralize data permissions management and authorize requests based on their IdP identity across service boundaries. With IAM Identity Center, you can configure an existing IdP to manage users and groups and use Lake Formation to define fine-grained access control permissions on catalog resources for these IdP identities. Amazon Redshift supports identity propagation when querying data with Amazon Redshift Spectrum and with Amazon Redshift data sharing, and you can use AWS CloudTrail to audit data access by IdP identities to help your organization meet its regulatory and compliance requirements.

With this new capability, users can connect to Amazon Redshift from QuickSight with a single sign-on experience and create direct query datasets. This is enabled by using IAM Identity Center as a shared identity source. With trusted identity propagation, when QuickSight assets like dashboards are shared with other users, the database permissions of each QuickSight user are applied by propagating their end-user identity from QuickSight to Amazon Redshift and enforcing their individual data permissions. Depending on the use case, the author can apply additional row-level and column-level security in QuickSight.

The following diagram illustrates an example of the solution architecture.

In this post, we walk through how to configure trusted identity propagation with Amazon Redshift and Lake Formation. We cover the following use cases:

  • Redshift Spectrum with Lake Formation
  • Redshift data sharing with Lake Formation

Prerequisites

This walkthrough assumes you have set up a Lake Formation administrator role or a similar role to follow along with the instructions in this post. To learn more about setting up permissions for a data lake administrator, see Create a data lake administrator.

Additionally, you must create the following resources as detailed in Integrate Okta with Amazon Redshift Query Editor V2 using AWS IAM Identity Center for seamless Single Sign-On:

  • An Okta account integrated with IAM Identity Center to sync users and groups
  • A Redshift managed application with IAM Identity Center
  • A Redshift source cluster with IAM Identity Center integration enabled
  • A Redshift target cluster with IAM Identity Center integration enabled (you can skip the section to set up Amazon Redshift role-based access)
  • Users and groups from IAM Identity Center assigned to the Redshift application
  • A permission set assigned to AWS accounts to enable Redshift Query Editor v2 access
  • The following permissions added to the IAM role used in the Redshift managed application for integration with IAM Identity Center:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "lakeformation:GetDataAccess",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:SearchTables",
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetPartitions",
                    "lakeformation:GetResourceLFTags",
                    "lakeformation:ListLFTags",
                    "lakeformation:GetLFTag",
                    "lakeformation:SearchTablesByLFTags",
                    "lakeformation:SearchDatabasesByLFTags"
                ],
                "Resource": "*"
            }
        ]
    }

Use case 1: Redshift Spectrum with Lake Formation

This use case assumes you have completed the following prerequisites:

  1. Log in to the AWS Management Console as an IAM administrator.
  2. Go to CloudShell or your AWS CLI and run the following AWS CLI command, providing your bucket name to copy the data:
aws s3 sync s3://redshift-demos/data/NY-Pub/ s3://<bucketname>/data/NY-Pub/

In this post, we use an AWS Glue crawler to create the external table ny_pub, stored in Apache Parquet format in the Amazon S3 location s3://<bucketname>/data/NY-Pub/. In the next step, we create the solution resources using AWS CloudFormation to create a stack named CrawlS3Source-NYTaxiData in us-east-1.

  3. Download the .yml file or launch the CloudFormation stack.

The stack creates the following resources:

  • The crawler NYTaxiCrawler along with the new IAM role AWSGlueServiceRole-RedshiftAutoMount
  • The AWS Glue database automountdb

When the stack is complete, continue with the following steps to finish setting up your resources:

  1. On the AWS Glue console, under Data Catalog in the navigation pane, choose Crawlers.
  2. Open NYTaxiCrawler and choose Edit.
  3. Under Choose data sources and classifiers, choose Edit.
     1. For Data source, choose S3.
     2. For S3 path, enter s3://<bucketname>/data/NY-Pub/.
     3. Choose Update S3 data source.
  4. Choose Next and choose Update.
  5. Choose Run crawler.

After the crawler is complete, you can see a new table called ny_pub in the Data Catalog under the automountdb database.

After you create the resources, complete the steps in the next sections to set up Lake Formation permissions on the AWS Glue table ny_pub for the sales IdP group and access the table through Redshift Spectrum.

Enable Lake Formation propagation for the Redshift managed application

Complete the following steps to enable Lake Formation propagation for the Redshift managed application created in Integrate Okta with Amazon Redshift Query Editor V2 using AWS IAM Identity Center for seamless Single Sign-On:

  1. Log in to the console as admin.
  2. On the Amazon Redshift console, choose IAM Identity Center connection in the navigation pane.
  3. Select the managed application that starts with redshift-iad and choose Edit.
  4. Select Enable AWS Lake Formation access grants under Trusted identity propagation and save your changes.

Set up Lake Formation as an IAM Identity Center application

Complete the following steps to set up Lake Formation as an IAM Identity Center application:

  1. On the Lake Formation console, under Administration in the navigation pane, choose IAM Identity Center integration.
  2. Review the options and choose Submit to enable Lake Formation integration.

The integration status will update to Success.
Alternatively, you can run the following command:

aws lakeformation create-lake-formation-identity-center-configuration \
    --cli-input-json '{"CatalogId": "<catalog_id>","InstanceArn": "<identitycenter_arn>"}'

Register the data with Lake Formation

In this section, we register the data with Lake Formation. Complete the following steps:

  1. On the Lake Formation console, under Administration in the navigation pane, choose Data lake locations.
  2. Choose Register location.
  3. For Amazon S3 path, enter the location where the table data resides (s3://<bucketname>/data/NY-Pub/).
  4. For IAM role, choose a Lake Formation user-defined role. For more information, refer to Requirements for roles used to register locations.
  5. For Permission mode, select Lake Formation.
  6. Choose Register location.
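
The console steps above can also be approximated from the command line with `aws lakeformation register-resource`. The following is a minimal sketch, not part of the original walkthrough: it only builds and prints the command, and `<registration_role_arn>` is a hypothetical placeholder for the ARN of the user-defined role chosen in step 4. The helper names are illustrative.

```python
import shlex


def s3_uri_to_arn(s3_uri):
    """Convert s3://bucket/prefix/ into the ARN form arn:aws:s3:::bucket/prefix/."""
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    return "arn:aws:s3:::" + s3_uri[len(prefix):]


def register_command(s3_uri, role_arn):
    """Build the register-resource command line as a list of arguments."""
    return [
        "aws", "lakeformation", "register-resource",
        "--resource-arn", s3_uri_to_arn(s3_uri),
        "--role-arn", role_arn,
    ]


# Placeholders: substitute your bucket name and role ARN before running the output.
cmd = register_command("s3://<bucketname>/data/NY-Pub/", "<registration_role_arn>")
print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the command rather than executing it keeps the sketch free of credentials and lets you review the arguments first.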

Next, verify that the IAMAllowedPrincipals group doesn't have permission on the database.

  1. On the Lake Formation console, under Data catalog in the navigation pane, choose Databases.
  2. Select automountdb and on the Actions menu, choose View permissions.
  3. If IAMAllowedPrincipals is listed, select the principal and choose Revoke.
  4. Repeat these steps to verify permissions for the table ny_pub.

Grant the IAM Identity Center group permissions on the AWS Glue database and table

Complete the following steps to grant database permissions to the IAM Identity Center group:

  1. On the Lake Formation console, under Data catalog in the navigation pane, choose Databases.
  2. Select the database automountdb and on the Actions menu, choose Grant.
  3. Choose Grant database.
  4. Under Principals, select IAM Identity Center and choose Add.
  5. In the pop-up window, if this is the first time assigning users and groups, choose Get started.
  6. Enter the IAM Identity Center group in the search bar and choose the group.
  7. Choose Assign.
  8. Under LF-Tags or catalog resources, confirm that automountdb is already selected for Databases.
  9. Select Describe for Database permissions.
  10. Choose Grant to apply the permissions.

Alternatively, you can run the following command:

aws lakeformation grant-permissions --cli-input-json '
{
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:identitystore:::group/<identitycenter_group_name>"
    },
    "Resource": {
        "Database": {
            "Name": "automountdb"
        }
    },
    "Permissions": [
        "DESCRIBE"
    ]
}'

Next, you grant table permissions to the IAM Identity Center group.

  1. Under Data catalog in the navigation pane, choose Databases.
  2. Select the database automountdb and on the Actions menu, choose Grant.
  3. Under Principals, select IAM Identity Center and choose Add.
  4. Enter the IAM Identity Center group in the search bar and choose the group.
  5. Choose Assign.
  6. Under LF-Tags or catalog resources, confirm that automountdb is already selected for Databases.
  7. For Tables, choose ny_pub.
  8. Select Describe and Select for Table permissions.
  9. Choose Grant to apply the permissions.

Alternatively, you can run the following command:

aws lakeformation grant-permissions --cli-input-json '
{
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:identitystore:::group/<identitycenter_group_name>"
    },
    "Resource": {
        "Table": {
            "DatabaseName": "automountdb",
            "Name": "ny_pub"
        }
    },
    "Permissions": [
        "SELECT",
        "DESCRIBE"
    ]
}'

Set up Redshift Spectrum table access for the IAM Identity Center group

Complete the following steps to set up Redshift Spectrum table access:

  1. Sign in to the Amazon Redshift console using the admin role.
  2. Navigate to Query Editor v2.
  3. Choose the options menu (three dots) next to the cluster and choose Create connection.
  4. Connect as the admin user and run the following commands to make the ny_pub data in the S3 data lake available to the sales group:
    create external schema if not exists nyc_external_schema from DATA CATALOG database 'automountdb' catalog_id '<accountid>';
    grant usage on schema nyc_external_schema to role "awsidc:awssso-sales";
    grant select on all tables in schema nyc_external_schema to role "awsidc:awssso-sales";
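
The role name in these grants appears to be the namespace prefix used in this post (awsidc) joined with a colon to the IdP group name (awssso-sales). Because a stray space inside the quoted role name would make the grant target a different role, it can help to generate both statements from one value. The following is a minimal sketch under that assumption; the function names are hypothetical and the statements are only printed, not executed.

```python
def idc_role(group_name, namespace="awsidc"):
    """Return the Redshift role name for an IAM Identity Center group.

    Assumes the "<namespace>:<group>" convention seen in this post.
    """
    group = group_name.strip()
    if not group:
        raise ValueError("group name must not be empty")
    return f"{namespace}:{group}"


def spectrum_grants(schema, group_name):
    """Build the usage and select grants for one external schema."""
    role = idc_role(group_name)
    return [
        f'grant usage on schema {schema} to role "{role}";',
        f'grant select on all tables in schema {schema} to role "{role}";',
    ]


statements = spectrum_grants("nyc_external_schema", "awssso-sales")
for stmt in statements:
    print(stmt)
```

Both statements then reference exactly the same role string, so a typo cannot affect one grant and not the other.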

Validate Redshift Spectrum access as an IAM Identity Center user

Complete the following steps to validate access:

  1. On the Amazon Redshift console, navigate to Query Editor v2.
  2. Choose the options menu (three dots) next to the cluster and choose Create connection.
  3. For Connect option, select IAM Identity Center, then provide your Okta user name and password in the browser pop-up.
  4. After you're connected as a federated user, run the following SQL command to query the ny_pub data lake table:
select * from nyc_external_schema.ny_pub;

Use case 2: Redshift data sharing with Lake Formation

This use case assumes you have IAM Identity Center integration with Amazon Redshift set up, with Lake Formation propagation enabled as per the instructions provided in the previous section.

Create a data share with objects and share it with the Data Catalog

Complete the following steps to create a data share:

  1. Sign in to the Amazon Redshift console using the admin role.
  2. Navigate to Query Editor v2.
  3. Choose the options menu (three dots) next to the Redshift source cluster and choose Create connection.
  4. Connect as the admin user using the Temporary credentials using a database user name option, and run the following SQL commands to create a data share:
    CREATE DATASHARE salesds;
    ALTER DATASHARE salesds ADD SCHEMA sales_schema;
    ALTER DATASHARE salesds ADD TABLE sales_schema.store_sales;
    GRANT USAGE ON DATASHARE salesds TO ACCOUNT '<accountid>' VIA DATA CATALOG;

  5. Authorize the data share by choosing Data shares in the navigation pane and selecting the data share salesds.
  6. Select the data share and choose Authorize.

Now you can register the data share in Lake Formation as an AWS Glue database.

  1. Sign in to the Lake Formation console as the data lake administrator IAM user or role.
  2. Under Data catalog in the navigation pane, choose Data sharing and view the Redshift data share invitations on the Configuration tab.
  3. Select the data share salesds and choose Review Invitation.
  4. After you review the details, choose Accept.
  5. Provide a name for the AWS Glue database (for example, salesds) and choose Skip to Review and create.

After the AWS Glue database is created on the Redshift data share, you can view it under Shared databases.

Grant the IAM Identity Center user group permission on the AWS Glue database and table

Complete the following steps to grant database permissions to the IAM Identity Center group:

  1. On the Lake Formation console, under Data catalog in the navigation pane, choose Databases.
  2. Select the database salesds and on the Actions menu, choose Grant.
  3. Choose Grant database.
  4. Under Principals, select IAM Identity Center and choose Add.
  5. In the pop-up window, enter awssso in the search bar and choose the awssso-sales group.
  6. Choose Assign.
  7. Under LF-Tags or catalog resources, confirm that salesds is already selected for Databases.
  8. Select Describe for Database permissions.
  9. Choose Grant to apply the permissions.

Next, grant table permissions to the IAM Identity Center group.

  1. Under Data catalog in the navigation pane, choose Databases.
  2. Select the database salesds and on the Actions menu, choose Grant.
  3. Under Principals, select IAM Identity Center and choose Add.
  4. In the pop-up window, enter awssso in the search bar and choose the awssso-sales group.
  5. Choose Assign.
  6. Under LF-Tags or catalog resources, confirm that salesds is already selected for Databases.
  7. For Tables, choose sales_schema.store_sales.
  8. Select Describe and Select for Table permissions.
  9. Choose Grant to apply the permissions.
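
Use case 1 offered `aws lakeformation grant-permissions` commands as an alternative to the console. The same payload shape can likely be reused for the two grants above. The following is a minimal sketch, not taken from the original post: it assumes the table name is sales_schema.store_sales as it appears in the console step, it keeps the `<identitycenter_group_name>` placeholder unchanged, and it has not been verified against a data share-backed database. The helper name `grant_payload` is hypothetical.

```python
import json


def grant_payload(principal_arn, database, table=None):
    """Build a --cli-input-json payload for `aws lakeformation grant-permissions`.

    Database grants get DESCRIBE; table grants get SELECT and DESCRIBE,
    mirroring the console steps in this post.
    """
    if table is None:
        resource = {"Database": {"Name": database}}
        permissions = ["DESCRIBE"]
    else:
        resource = {"Table": {"DatabaseName": database, "Name": table}}
        permissions = ["SELECT", "DESCRIBE"]
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": permissions,
    }


# Placeholder kept from the commands in use case 1; substitute your own value.
group_arn = "arn:aws:identitystore:::group/<identitycenter_group_name>"

payloads = [
    grant_payload(group_arn, "salesds"),
    grant_payload(group_arn, "salesds", "sales_schema.store_sales"),
]

for payload in payloads:
    print("aws lakeformation grant-permissions --cli-input-json '" + json.dumps(payload) + "'")
```

Review the printed commands before running them; the first grants DESCRIBE on the database and the second grants SELECT and DESCRIBE on the table.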

Mount the external schema in the target Redshift cluster and enable access for the IAM Identity Center user

Complete the following steps:

  1. Sign in to the Amazon Redshift console using the admin role.
  2. Navigate to Query Editor v2.
  3. Connect as an admin user and run the following SQL commands to mount the AWS Glue database salesds as an external schema and grant access to the sales group:
create external schema if not exists sales_datashare_schema from DATA CATALOG database 'salesds' catalog_id '<accountid>';
create role "awsidc:awssso-sales"; -- If the role was not already created
grant usage on schema sales_datashare_schema to role "awsidc:awssso-sales";
grant select on all tables in schema sales_datashare_schema to role "awsidc:awssso-sales";

Access Redshift data shares as an IAM Identity Center user

Complete the following steps to access the data shares:

  1. On the Amazon Redshift console, navigate to Query Editor v2.
  2. Choose the options menu (three dots) next to the cluster and choose Create connection.
  3. Connect with IAM Identity Center and provide the IAM Identity Center user name and password in the browser login.
  4. Run the following SQL command to query the shared table:
SELECT * FROM "dev"."sales_datashare_schema"."sales_schema.store_sales";

With trusted identity propagation, you can now audit user access to a dataset from the Lake Formation dashboard, including the service used to access the dataset, which provides complete traceability. For the federated user Ethan, whose Identity Center user ID is 459e10f6-a3d0-47ae-bc8d-a66f8b054014, you can see the following event log.

    "eventSource": "lakeformation.amazonaws.com",
    "eventName": "GetDataAccess",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "redshift.amazonaws.com",
    "userAgent": "redshift.amazonaws.com",
    "requestParameters": {
        "tableArn": "arn:aws:glue:us-east-1:xxxx:table/automountdb/ny_pub",
        "durationSeconds": 3600,
        "auditContext": {
            "additionalAuditContext": "{\"invokedBy\":\"arn:aws:redshift:us-east-1:xxxx:dbuser:redshift-consumer/awsidc:ethan.doe@gmail.com\", \"transactionId\":\"961953\", \"queryId\":\"613842\", \"isConcurrencyScalingQuery\":\"false\"}"
        },
        "cellLevelSecurityEnforced": true
    },
    "responseElements": null,
    "additionalEventData": {
        "requesterService": "REDSHIFT",
        "LakeFormationTrustedCallerInvocation": "true",
        "lakeFormationPrincipal": "arn:aws:identitystore:::user/459e10f6-a3d0-47ae-bc8d-a66f8b054014",
        "lakeFormationRoleSessionName": "AWSLF-00-RE-726034267621-K7FUMxovuq"
    }
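
To turn an event like this into an audit record, you can pull out the table, the Identity Center user ID, the Redshift database user, and the calling service. The following is a minimal sketch that works on a trimmed, hand-built copy of the event above. It assumes the additionalAuditContext value is a JSON document stored as a string, and the function name `summarize_access` is hypothetical.

```python
import json

# Trimmed copy of the GetDataAccess event shown above. The nested
# additionalAuditContext value is itself JSON encoded as a string.
event = {
    "eventSource": "lakeformation.amazonaws.com",
    "eventName": "GetDataAccess",
    "requestParameters": {
        "tableArn": "arn:aws:glue:us-east-1:xxxx:table/automountdb/ny_pub",
        "auditContext": {
            "additionalAuditContext": json.dumps({
                "invokedBy": "arn:aws:redshift:us-east-1:xxxx:dbuser:redshift-consumer/awsidc:ethan.doe@gmail.com",
                "queryId": "613842",
            })
        },
    },
    "additionalEventData": {
        "requesterService": "REDSHIFT",
        "lakeFormationPrincipal": "arn:aws:identitystore:::user/459e10f6-a3d0-47ae-bc8d-a66f8b054014",
    },
}


def summarize_access(evt):
    """Return who accessed which table, and through which service."""
    params = evt["requestParameters"]
    audit = json.loads(params["auditContext"]["additionalAuditContext"])
    extra = evt["additionalEventData"]
    return {
        "table": params["tableArn"].split("/", 1)[1],
        "identity_center_user_id": extra["lakeFormationPrincipal"].rsplit("/", 1)[1],
        "redshift_db_user": audit["invokedBy"].rsplit("/", 1)[1],
        "service": extra["requesterService"],
        "query_id": audit["queryId"],
    }


summary = summarize_access(event)
print(summary)
```

The Identity Center user ID in the summary is the value to match against your IdP directory when tracing a query back to a person.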

Clean up

Complete the following steps to clean up your resources:

  1. Delete the data from the S3 bucket.
  2. Delete the Lake Formation application and the Redshift provisioned cluster that you created for testing.
  3. Sign in to the CloudFormation console as the IAM admin used for creating the CloudFormation stack, and delete the stack you created.

Conclusion

In this post, we covered how to simplify access management for analytics by propagating user identity across Amazon Redshift and Lake Formation using IAM Identity Center. We learned how to get started with trusted identity propagation by connecting to Amazon Redshift and Lake Formation. We also learned how to configure Redshift Spectrum and data sharing to support trusted identity propagation.

Learn more about IAM Identity Center with Amazon Redshift and AWS Lake Formation. Leave your questions and feedback in the comments section.


About the Authors

Harshida Patel is an Analytics Specialist Principal Solutions Architect with AWS.

Srividya Parthasarathy is a Senior Big Data Architect on the AWS Lake Formation team. She enjoys building data mesh solutions and sharing them with the community.

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Poulomi Dasgupta is a Senior Analytics Solutions Architect with AWS. She is passionate about helping customers build cloud-based analytics solutions to solve their business problems. Outside of work, she likes travelling and spending time with her family.
