Thursday, November 7, 2024

Bring your workforce identity to Amazon EMR Studio and Athena

Customers today may struggle to implement proper access controls and auditing at the user level when multiple applications are involved in data access workflows. The key challenge is to implement proper least-privilege access controls based on user identity when one application accesses data on behalf of the user in another application. It forces you to either give all users broad access through the application with no auditing, or try to implement complex bespoke solutions to map roles to users.

Using AWS IAM Identity Center, you can now propagate user identity to a set of AWS services and minimize the need to build and maintain complex custom systems to vend roles between applications. IAM Identity Center also provides a consolidated view of users and groups in one place that the interconnected applications can use for authorization and auditing.

IAM Identity Center enables centralized management of user access to AWS accounts and applications using identity providers (IdPs) like Okta. This allows users to log in once with their existing corporate credentials and seamlessly access downstream AWS services supporting identity propagation. With IAM Identity Center, Okta user identities and groups can be automatically synced using SCIM 2.0 for accurate user information in AWS.

Amazon EMR Studio is a unified data analysis environment where you can develop data engineering and data science applications. You can now develop and run interactive queries on Amazon Athena from EMR Studio (for more details, refer to Amazon EMR Studio adds interactive query editor powered by Amazon Athena). Athena users can access EMR Studio without logging in to the AWS Management Console by enabling federated access from your IdP via IAM Identity Center. This removes the complexity of maintaining different identities and mapping user roles across your IdP, EMR Studio, and Athena.

You can govern Athena workgroups based on user attributes from Okta to control query access and costs. AWS Lake Formation can also use Okta identities to enforce fine-grained access controls through granting and revoking permissions.

IAM Identity Center and Okta single sign-on (SSO) integration streamlines access to EMR Studio and Athena with centralized authentication. Users can have a familiar sign-in experience with their workforce credentials to securely run queries in Athena. Access policies on Athena workgroups and Lake Formation permissions provide governance based on Okta user profiles.

This blog post explains how to enable single sign-on to EMR Studio using IAM Identity Center integration with Okta. It shows how to propagate Okta identities to Athena and Lake Formation to provide granular access controls on queries and data. The solution streamlines access to analytics tools with centralized authentication using workforce credentials. It leverages AWS IAM Identity Center, Amazon EMR Studio, Amazon Athena, and AWS Lake Formation.

Solution overview

IAM Identity Center allows users to connect to EMR Studio without needing administrators to manually configure AWS Identity and Access Management (IAM) roles and permissions. It enables mapping of IAM Identity Center groups to existing corporate identity roles and groups. Admins can then assign privileges to roles and groups and assign users to them, enabling granular control over user access. IAM Identity Center provides a central repository of all users in AWS. You can create users and groups directly in IAM Identity Center or connect existing users and groups from providers like Okta, Ping Identity, or Azure AD. It handles authentication through your chosen identity source and maintains a user and group directory for EMR Studio access. Known user identities and logged data access facilitate compliance through auditing user access in AWS CloudTrail.

The following diagram illustrates the solution architecture.

Solution Overview

The EMR Studio workflow consists of the following high-level steps:

  1. The end-user launches EMR Studio using the AWS access portal URL. This URL is provided by an IAM Identity Center administrator via the IAM Identity Center dashboard.
  2. The URL redirects the end-user to the workforce IdP Okta, where the user enters workforce identity credentials.
  3. After successful authentication, the user will be logged in to the AWS console as a federated user.
  4. The user opens EMR Studio and navigates to the Athena query editor using the link available on EMR Studio.
  5. The user selects the correct workgroup as per the user role to run Athena queries.
  6. The query results are stored in separate Amazon Simple Storage Service (Amazon S3) locations with a prefix that is based on user identity.

To implement the solution, we complete the following steps:

  1. Integrate Okta with IAM Identity Center to sync users and groups.
  2. Integrate IAM Identity Center with EMR Studio.
  3. Assign users or groups from IAM Identity Center to EMR Studio.
  4. Set up Lake Formation with IAM Identity Center.
  5. Configure granular role-based entitlements using Lake Formation on propagated corporate identities.
  6. Set up workgroups in Athena for governing access.
  7. Set up Amazon S3 Access Grants for fine-grained access to Amazon S3 resources like buckets, prefixes, or objects.
  8. Access EMR Studio through the AWS access portal using IAM Identity Center.
  9. Run queries on the Athena SQL editor in EMR Studio.
  10. Review the end-to-end audit trail of workforce identity.

Prerequisites

To follow along with this post, you should have the following:

  • An AWS account – If you don't have one, you can sign up for one.
  • An Okta account that has an active subscription – You need an administrator role to set up the application on Okta. If you're new to Okta, you can sign up for a free trial or a developer account.

For instructions to configure Okta with IAM Identity Center, refer to Configure SAML and SCIM with Okta and IAM Identity Center.

Integrate Okta with IAM Identity Center to sync users and groups

After you have successfully synced users or groups from Okta to IAM Identity Center, you can see them on the IAM Identity Center console, as shown in the following screenshot. For this post, we created and synced two user groups:

  • Data Engineer
  • Data Scientists

Workforce Identity groups in IAM Identity Center

Next, create a trusted token issuer in IAM Identity Center:

  1. On the IAM Identity Center console, choose Settings in the navigation pane.
  2. Choose Create trusted token issuer.
  3. For Issuer URL, enter the URL of the trusted token issuer.
  4. For Trusted token issuer name, enter Okta.
  5. For Map attributes, map the IdP attribute Email to the IAM Identity Center attribute Email.
  6. Choose Create trusted token issuer.
    Create a Trusted Token Issuer in IAM Identity Center

The following screenshot shows your new trusted token issuer on the IAM Identity Center console.

Okta Trusted Token Issuer in Identity Center
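If you prefer to script the console steps for the trusted token issuer, the following is a minimal sketch using the AWS SDK for Python (Boto3). The instance ARN and issuer URL are placeholders, and the attribute paths (`email` on the Okta side, `emails.value` on the IAM Identity Center side) are assumptions that mirror the Email-to-Email mapping chosen above; verify them against your own configuration before use.

```python
def build_trusted_token_issuer_request(instance_arn: str, issuer_url: str) -> dict:
    """Build keyword arguments for the sso-admin CreateTrustedTokenIssuer call."""
    return {
        "InstanceArn": instance_arn,
        "Name": "Okta",
        "TrustedTokenIssuerType": "OIDC_JWT",
        "TrustedTokenIssuerConfiguration": {
            "OidcJwtConfiguration": {
                "IssuerUrl": issuer_url,
                # Assumed mapping: Okta "email" claim -> Identity Center email attribute.
                "ClaimAttributePath": "email",
                "IdentityStoreAttributePath": "emails.value",
                "JwksRetrievalOption": "OPEN_ID_DISCOVERY",
            }
        },
    }


def create_trusted_token_issuer(request: dict) -> str:
    """Send the request to AWS and return the new issuer ARN."""
    import boto3  # imported lazily so the builder works without the SDK installed

    client = boto3.client("sso-admin")
    response = client.create_trusted_token_issuer(**request)
    return response["TrustedTokenIssuerArn"]


# Placeholder values for illustration only; nothing is sent to AWS here.
example_request = build_trusted_token_issuer_request(
    instance_arn="arn:aws:sso:::instance/ssoins-EXAMPLE",
    issuer_url="https://example.okta.com",
)
```

Calling `create_trusted_token_issuer(example_request)` with real values and valid AWS credentials would create the issuer; the builder alone is enough to review the payload first.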

Integrate IAM Identity Center with EMR Studio

We start by creating an EMR Studio with trusted identity propagation enabled.

An EMR Studio administrator must perform the steps to configure EMR Studio as an IAM Identity Center-enabled application. This enables EMR Studio to discover and connect to IAM Identity Center automatically to receive sign-in and user directory services.

The point of enabling EMR Studio as an IAM Identity Center-managed application is so you can control user and group permissions from within IAM Identity Center or from a source third-party IdP that is integrated with it (Okta in this case). When your users sign in to EMR Studio, for example data-engineer or data-scientist, it checks their groups in IAM Identity Center, and these are mapped to roles and entitlements in Lake Formation. In this manner, a group can map to a Lake Formation database role that allows read access to a set of tables or columns.

The following steps show how to create EMR Studio as an AWS-managed application with IAM Identity Center. We then see how downstream applications like Lake Formation and Athena propagate these roles and entitlements using existing corporate credentials.

  1. On the Amazon EMR console, navigate to EMR Studio.
  2. Choose Create a Studio.
  3. For Setup options, select Custom.
  4. For Studio name, enter a name.
  5. For S3 location for Workspace storage, select Select existing location and enter the Amazon S3 location.

Create EMR Studio with Custom Set up option

  6. Configure permission details for the EMR Studio.

Note that when you choose View permission details under Service role, a new pop-up window opens. You need to create an IAM role with the same policies as shown in the pop-up window. You can use the same role for your service role and IAM role.

Permission details for EMR studio

  7. On the Create a Studio page, for Authentication, select AWS IAM Identity Center.
  8. For User role, choose your user role.
  9. Under Trusted identity propagation, select Enable trusted identity propagation.
  10. Under Application access, select Only assigned users and groups.
  11. For VPC, enter your VPC.
  12. For Subnets, enter your subnet.
  13. For Security and access, select Default security group.
  14. Choose Create Studio.

Enable Identity Center and Trusted Identity Propagation

You should now see an IAM Identity Center-enabled EMR Studio on the Amazon EMR console.

IAM Identity Center enabled EMR Studio

After the EMR Studio administrator finishes creating the trusted identity propagation-enabled EMR Studio and saves the configuration, the instance of the EMR Studio appears as an IAM Identity Center-enabled application on the IAM Identity Center console.

EMR Studio appears under AWS Managed app in IAM Identity Center
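The Studio settings chosen on the console can also be expressed as parameters to the Amazon EMR `CreateStudio` API. The following Boto3 sketch is an assumption-laden illustration, not the exact configuration used in this post: every ARN, ID, and bucket name is a placeholder, and the API requires explicit workspace and engine security group IDs where the console offered a default security group.

```python
def build_create_studio_request(
    name: str,
    workspace_s3_location: str,
    service_role_arn: str,
    user_role_arn: str,
    vpc_id: str,
    subnet_ids: list,
    workspace_sg_id: str,
    engine_sg_id: str,
    idc_instance_arn: str,
) -> dict:
    """Build keyword arguments for emr.create_studio with trusted identity propagation."""
    return {
        "Name": name,
        "AuthMode": "SSO",  # IAM Identity Center authentication
        "DefaultS3Location": workspace_s3_location,
        "ServiceRole": service_role_arn,
        "UserRole": user_role_arn,
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        "WorkspaceSecurityGroupId": workspace_sg_id,
        "EngineSecurityGroupId": engine_sg_id,
        "IdcInstanceArn": idc_instance_arn,
        "TrustedIdentityPropagationEnabled": True,
        # "REQUIRED" corresponds to "Only assigned users and groups" on the console.
        "IdcUserAssignment": "REQUIRED",
    }


def create_studio(request: dict) -> str:
    """Create the Studio and return its ID."""
    import boto3  # lazy import keeps the builder usable offline

    response = boto3.client("emr").create_studio(**request)
    return response["StudioId"]


# Placeholder values for illustration only.
studio_request = build_create_studio_request(
    name="tip-enabled-studio",
    workspace_s3_location="s3://example-bucket/emr-studio/",
    service_role_arn="arn:aws:iam::111122223333:role/ExampleStudioServiceRole",
    user_role_arn="arn:aws:iam::111122223333:role/ExampleStudioUserRole",
    vpc_id="vpc-0example",
    subnet_ids=["subnet-0example"],
    workspace_sg_id="sg-0workspace",
    engine_sg_id="sg-0engine",
    idc_instance_arn="arn:aws:sso:::instance/ssoins-EXAMPLE",
)
```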

Assign users or groups from IAM Identity Center to EMR Studio

You can assign users and groups from your IAM Identity Center directory to the EMR Studio application after syncing with IAM Identity Center. The EMR Studio administrator decides which IAM Identity Center users or groups to include in the app. For example, if you have 10 total groups in IAM Identity Center but don't want all of them accessing this instance of EMR Studio, you can select which groups to include in the EMR Studio-enabled IAM Identity Center app.

The following steps assign groups to the EMR Studio-enabled IAM Identity Center application:

  1. On the EMR Studio console, navigate to the new EMR Studio instance.
  2. On the Assigned groups tab, choose Assign groups.
  3. Choose which IAM Identity Center groups you want to include in the application. For example, you may choose the Data-Scientist and Data-Engineer groups.
  4. Choose Done.

This allows the EMR Studio administrator to choose specific IAM Identity Center groups to be assigned access to this specific instance integrated with IAM Identity Center. Only the selected groups will be synced and given access, not all groups from the IAM Identity Center directory.

Assign Trusted Identity Propagation enabled EMR studio to your user groups by selecting groups from Studio settings
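As a rough programmatic counterpart to the group assignment above, Amazon EMR exposes a `CreateStudioSessionMapping` API. The sketch below assumes that API is an acceptable substitute for the console's Assign groups action; the Studio ID and session policy ARN are hypothetical placeholders, and the group names are the ones used in this post.

```python
def build_group_mappings(studio_id: str, group_names: list, session_policy_arn: str) -> list:
    """Build one emr.create_studio_session_mapping request per Identity Center group."""
    return [
        {
            "StudioId": studio_id,
            "IdentityName": group_name,
            "IdentityType": "GROUP",
            "SessionPolicyArn": session_policy_arn,
        }
        for group_name in group_names
    ]


def assign_groups(requests: list) -> None:
    """Send each mapping request to Amazon EMR."""
    import boto3  # lazy import keeps the builder usable offline

    emr = boto3.client("emr")
    for request in requests:
        emr.create_studio_session_mapping(**request)


# Placeholder Studio ID and policy ARN for illustration only.
group_mappings = build_group_mappings(
    studio_id="es-EXAMPLE",
    group_names=["Data-Engineer", "Data-Scientist"],
    session_policy_arn="arn:aws:iam::111122223333:policy/ExampleStudioSessionPolicy",
)
```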

Set up Lake Formation with IAM Identity Center

To set up Lake Formation with IAM Identity Center, make sure that you have configured Okta as the IdP for IAM Identity Center, and confirm that the users and groups from Okta are now available in IAM Identity Center. Then complete the following steps:

  1. On the Lake Formation console, choose IAM Identity Center Integration under Administration in the navigation pane.

You will see the message "IAM Identity Center enabled" along with the ARN for the IAM Identity Center application.

  2. Choose Create.

In a few minutes, you will see a message indicating that Lake Formation has been successfully integrated with your centralized identities from Okta in IAM Identity Center. Specifically, the message will state "Successfully created identity center integration with application ARN," signifying the integration is now in place between Lake Formation and the identities managed in Okta.

IAM Identity Center enabled AWS Lake Formation
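The same integration can be sketched with the Lake Formation `CreateLakeFormationIdentityCenterConfiguration` API. The account ID and instance ARN below are placeholders; the returned application ARN is what the console message refers to.

```python
def build_lf_identity_center_request(catalog_id: str, idc_instance_arn: str) -> dict:
    """Build keyword arguments for the Lake Formation / Identity Center integration."""
    return {
        "CatalogId": catalog_id,  # the AWS account ID that owns the Data Catalog
        "InstanceArn": idc_instance_arn,
    }


def integrate_lake_formation(request: dict) -> str:
    """Create the integration and return the Identity Center application ARN."""
    import boto3  # lazy import keeps the builder usable offline

    client = boto3.client("lakeformation")
    response = client.create_lake_formation_identity_center_configuration(**request)
    return response["ApplicationArn"]


# Placeholder values for illustration only.
lf_request = build_lf_identity_center_request(
    catalog_id="111122223333",
    idc_instance_arn="arn:aws:sso:::instance/ssoins-EXAMPLE",
)
```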

Configure granular role-based entitlements using Lake Formation on propagated corporate identities

We will now set up granular entitlements for our data access in Lake Formation. For this post, we summarize the steps needed to use the existing corporate identities on the Lake Formation console to provide relevant controls and governance on the data, which we will later query through the Athena query editor. To learn about setting up databases and tables in Lake Formation, refer to Getting started with AWS Lake Formation.

This post will not go into the full details about Lake Formation. Instead, we will focus on a new capability that has been introduced in Lake Formation: the ability to set up permissions based on your existing corporate identities that are synchronized with IAM Identity Center.

This integration allows Lake Formation to use your organization's IdP and access management policies to control permissions to data lakes. Rather than defining permissions from scratch specifically for Lake Formation, you can now rely on your existing users, groups, and access controls to determine who can access data catalogs and underlying data sources. Overall, this new integration with IAM Identity Center makes it straightforward to manage permissions for your data lake workloads using your corporate identities. It reduces the administrative overhead of keeping permissions aligned across separate systems. As AWS continues enhancing Lake Formation, features like this will further improve its viability as a full-featured data lake management environment.

In this post, we created a database called zipcode-db-tip and granted full access to the user group Data-Engineer to query the underlying table in the database. Complete the following steps:

  1. On the Lake Formation console, choose Grant data lake permissions.
  2. For Principals, select IAM Identity Center.
  3. For Users and groups, select Data-Engineer.
  4. For LF-Tags or catalog resources, select Named Data Catalog resources.
  5. For Databases, choose zipcode-db-tip.
  6. For Tables, choose tip-zipcode.
    Grant Data Lake permissions to users in IAM Identity Center

Similarly, we need to provide the relevant access on the underlying tables to the users and groups so that they can query the data.

  1. Repeat the preceding steps to provide access to the Data-Engineer group so that it can query the data.
  2. For Table permissions, select Select, Describe, and Super.
  3. For Data permissions, select All data access.

You can grant selective access on rows and columns as per your specific requirements.

Grant Table permissions in AWS Data Lake
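The table grant above can be approximated with the Lake Formation `GrantPermissions` API. This sketch assumes the Identity Center group is referenced by an `arn:aws:identitystore:::group/<group-id>` principal and that the console's Super permission maps to `ALL` in the API; the group ID is a hypothetical placeholder, while the database and table names come from this post.

```python
def identity_center_group_principal(group_id: str) -> str:
    """Return the principal identifier Lake Formation uses for an Identity Center group."""
    return f"arn:aws:identitystore:::group/{group_id}"


def build_table_grant(group_id: str, database: str, table: str) -> dict:
    """Build keyword arguments for lakeformation.grant_permissions on one table."""
    return {
        "Principal": {
            "DataLakePrincipalIdentifier": identity_center_group_principal(group_id)
        },
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        # Console "Select, Describe, and Super"; Super is assumed to be ALL in the API.
        "Permissions": ["SELECT", "DESCRIBE", "ALL"],
    }


def grant_table_access(request: dict) -> None:
    """Apply the grant in Lake Formation."""
    import boto3  # lazy import keeps the builder usable offline

    boto3.client("lakeformation").grant_permissions(**request)


# The group ID is a placeholder; look up the real one in IAM Identity Center.
table_grant = build_table_grant(
    group_id="11111111-2222-3333-4444-555555555555",
    database="zipcode-db-tip",
    table="tip-zipcode",
)
```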

Set up workgroups in Athena

Athena workgroups are an AWS feature that allows you to isolate data and queries within an AWS account. They provide a way to segregate data and control access so that each group can only access the data that is relevant to them. Athena workgroups are useful for organizations that want to restrict access to sensitive datasets or help prevent queries from impacting one another. When you create a workgroup, you can assign users and roles to it. Queries launched within a workgroup run with the access controls and settings configured for that workgroup. They enable governance, security, and resource controls at a granular level. Athena workgroups are an important feature for managing and optimizing Athena usage across large organizations.

In this post, we create a workgroup specifically for members of our Data Engineering team. Later, when logged in under Data Engineer user profiles, we run queries from within this workgroup to demonstrate how access to Athena workgroups can be restricted based on the user profile. This allows governance policies to be enforced, making sure users can only access permitted datasets and queries based on their role.

  1. On the Athena console, choose Workgroups under Administration in the navigation pane.
  2. Choose Create workgroup.
  3. For Authentication, select AWS Identity Center.
  4. For Service role to authorize Athena, select Create and use a new service role.
  5. For Service role name, enter a name for your role.
    Select IAM Identity Center for Athena Authentication option
  6. For Location of query result, enter an Amazon S3 location for saving your Athena query results.

This is a mandatory field when you specify IAM Identity Center for authentication.

Configure location for query result and enable user identity based S3 prefix

After you create the workgroup, you need to assign users and groups to it. For this post, we create a workgroup named data-engineer and assign the group Data-Engineer (propagated through trusted identity propagation from IAM Identity Center).

  1. On the Groups tab on the data-engineer details page, select the user group to assign and choose Assign groups.
    Assign groups option is available in the Groups tab of Workgroup settings
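For reference, the workgroup settings described in this section can be sketched as an Athena `CreateWorkGroup` request. The role ARN, instance ARN, and S3 location are placeholders, and the S3 Access Grants block (including the user-level prefix) is an assumption about how the identity-based result prefix shown in the screenshot maps to the API.

```python
def build_workgroup_request(
    name: str,
    result_location: str,
    execution_role_arn: str,
    idc_instance_arn: str,
) -> dict:
    """Build keyword arguments for athena.create_work_group with Identity Center auth."""
    return {
        "Name": name,
        "Configuration": {
            # A result location is mandatory for Identity Center-enabled workgroups.
            "ResultConfiguration": {"OutputLocation": result_location},
            "EnforceWorkGroupConfiguration": True,
            "ExecutionRole": execution_role_arn,
            "IdentityCenterConfiguration": {
                "EnableIdentityCenter": True,
                "IdentityCenterInstanceArn": idc_instance_arn,
            },
            # Assumed mapping of the "user identity based S3 prefix" console option.
            "QueryResultsS3AccessGrantsConfiguration": {
                "EnableS3AccessGrants": True,
                "CreateUserLevelPrefix": True,
                "AuthenticationType": "DIRECTORY_IDENTITY",
            },
        },
    }


def create_workgroup(request: dict) -> None:
    """Create the workgroup in Athena."""
    import boto3  # lazy import keeps the builder usable offline

    boto3.client("athena").create_work_group(**request)


# Placeholder values for illustration only.
workgroup_request = build_workgroup_request(
    name="data-engineer",
    result_location="s3://example-bucket/athena-results/",
    execution_role_arn="arn:aws:iam::111122223333:role/ExampleAthenaServiceRole",
    idc_instance_arn="arn:aws:sso:::instance/ssoins-EXAMPLE",
)
```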

Set up Amazon S3 Access Grants to separate the query results for each workforce identity

Next, we set up Amazon S3 Access Grants.

You can watch the following video to set up the grants or refer to Use Amazon EMR with S3 Access Grants to scale Spark access to Amazon S3 for instructions.
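As a hedged illustration of what such a grant might look like in code, the following sketch builds an S3 Control `CreateAccessGrant` request for an Identity Center group. It assumes an S3 Access Grants instance and a registered location already exist; the account ID, location ID, sub-prefix, and group ID are all hypothetical placeholders rather than values from this post.

```python
def build_access_grant_request(
    account_id: str,
    location_id: str,
    sub_prefix: str,
    idc_group_id: str,
) -> dict:
    """Build keyword arguments for s3control.create_access_grant for a directory group."""
    return {
        "AccountId": account_id,
        "AccessGrantsLocationId": location_id,
        "AccessGrantsLocationConfiguration": {"S3SubPrefix": sub_prefix},
        "Grantee": {
            "GranteeType": "DIRECTORY_GROUP",
            "GranteeIdentifier": idc_group_id,
        },
        # Athena needs to write query results and read them back.
        "Permission": "READWRITE",
    }


def create_access_grant(request: dict) -> str:
    """Create the grant and return its ID."""
    import boto3  # lazy import keeps the builder usable offline

    response = boto3.client("s3control").create_access_grant(**request)
    return response["AccessGrantId"]


# Placeholder values for illustration only.
access_grant_request = build_access_grant_request(
    account_id="111122223333",
    location_id="example-location-id",
    sub_prefix="athena-results/*",
    idc_group_id="11111111-2222-3333-4444-555555555555",
)
```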

Initiate login through AWS federated access using the IAM Identity Center access portal

Now we are ready to connect to EMR Studio and perform a federated login using IAM Identity Center authentication:

  1. On the IAM Identity Center console, navigate to the dashboard and choose the AWS access portal URL.
  2. A browser pop-up directs you to the Okta login page, where you enter your Okta credentials.
  3. After successful authentication, you will be logged in to the AWS console as a federated user.
  4. Choose the EMR Studio application.
  5. After you federate to EMR Studio, choose Query Editor in the navigation pane to open a new tab with the Athena query editor.

The following video shows a federated user using the AWS access portal URL to access EMR Studio using IAM Identity Center authentication.

Run queries with granular access on the editor

On EMR Studio, the user can open the Athena query editor and then specify the correct workgroup in the query editor to run the queries.

Athena Query result in data-engineer workgroup

The data engineer can query only the tables to which the user has access. The query results appear under the S3 prefix, which is separate for each workforce identity.

Review the end-to-end audit trail of workforce identity

The IAM Identity Center administrator can look into the downstream apps that are trusted for identity propagation, as shown in the following screenshot of the IAM Identity Center console.

AWS IAM Identity Center view of the trusted applications

On the CloudTrail console, the event history displays the event name and resource accessed by the specific workforce identity.

Auditors can see the workforce identity who executed the query on AWS Data Lake

When you choose an event in CloudTrail, the auditors can see the unique user ID that accessed the underlying AWS Analytics services.
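To illustrate how an auditor might pull that user ID out of an event record, here is a small sketch. It assumes the propagated identity appears under `userIdentity.onBehalfOf.userId` in the CloudTrail record; the sample event is fabricated for illustration and is not real audit output.

```python
import json
from typing import Optional


def workforce_user_id(cloudtrail_event_json: str) -> Optional[str]:
    """Return the Identity Center user ID from a CloudTrail record, if present."""
    event = json.loads(cloudtrail_event_json)
    on_behalf_of = event.get("userIdentity", {}).get("onBehalfOf", {})
    return on_behalf_of.get("userId")


# Hypothetical, heavily trimmed record used only to exercise the function.
SAMPLE_EVENT = json.dumps(
    {
        "eventSource": "athena.amazonaws.com",
        "eventName": "StartQueryExecution",
        "userIdentity": {
            "type": "AssumedRole",
            "onBehalfOf": {
                "userId": "11111111-2222-3333-4444-555555555555",
                "identityStoreArn": "arn:aws:identitystore::111122223333:identitystore/d-EXAMPLE",
            },
        },
    }
)

print(workforce_user_id(SAMPLE_EVENT))
```

When events are fetched with the CloudTrail `lookup_events` API, each returned item carries the full record as a JSON string in its `CloudTrailEvent` field, which can be passed to this function directly.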

Clean up

Complete the following steps to clean up your resources:

  1. Delete the Okta applications that you created to integrate with IAM Identity Center.
  2. Delete the IAM Identity Center configuration.
  3. Delete the EMR Studio that you created for testing.
  4. Delete the IAM role that you created for IAM Identity Center and EMR Studio integration.

Conclusion

In this post, we showed you a detailed walkthrough to bring your workforce identity to EMR Studio and propagate the identity to connected AWS applications like Athena and Lake Formation. This solution provides your workforce with a familiar sign-in experience, without the need to remember additional credentials or maintain complex role mapping across different analytics systems. In addition, it provides auditors with end-to-end visibility into workforce identities and their access to analytics services.

To learn more about trusted identity propagation and EMR Studio, refer to Integrate Amazon EMR with AWS IAM Identity Center.


About the authors

Manjit Chakraborty is a Senior Solutions Architect at AWS. He is a seasoned, results-driven professional with extensive experience in the financial domain, having worked with customers on advising, designing, leading, and implementing core-business enterprise solutions across the globe. In his spare time, Manjit enjoys fishing, practicing martial arts, and playing with his daughter.

Neeraj Roy is a Principal Solutions Architect at AWS based out of London. He works with Global Financial Services customers to accelerate their AWS journey. In his spare time, he enjoys reading and spending time with his family.
