Thursday, November 7, 2024

Use Amazon OpenSearch Ingestion to migrate to Amazon OpenSearch Serverless

Amazon OpenSearch Serverless is an on-demand auto-scaling configuration for Amazon OpenSearch Service. Since its launch, interest in OpenSearch Serverless has been growing steadily. Customers prefer to let the service manage capacity automatically rather than having to manually provision it. Until now, customers have had to rely on custom code or third-party solutions to move data between provisioned OpenSearch Service domains and OpenSearch Serverless.

We recently launched a feature in Amazon OpenSearch Ingestion (OSI) to make this migration even more straightforward. OSI is a fully managed, serverless data collector that delivers real-time log, metric, and trace data to OpenSearch Service domains and OpenSearch Serverless collections.

In this post, we outline the steps to migrate data between provisioned OpenSearch Service domains and OpenSearch Serverless. Migration of metadata such as security roles and dashboard objects will be covered in a subsequent post.

Solution overview

The following diagram shows the components required to move data between OpenSearch Service provisioned domains and OpenSearch Serverless using OSI. You will use OSI with OpenSearch Service as the source and an OpenSearch Serverless collection as the sink.

Prerequisites

Before getting started, complete the following steps to create the required resources:

  1. Create an AWS Identity and Access Management (IAM) role that the OpenSearch Ingestion pipeline will assume to write to the OpenSearch Serverless collection. This role must be specified in the sts_role_arn parameter of the pipeline configuration.
  2. Attach a permissions policy to the role to allow it to read data from the OpenSearch Service domain. The following is a sample policy with least privileges:
    {
       "Version":"2012-10-17",
       "Statement":[
          {
             "Effect":"Allow",
             "Action":"es:ESHttpGet",
             "Resource":[
                "arn:aws:es:us-east-1:{account-id}:domain/{domain-name}/",
                "arn:aws:es:us-east-1:{account-id}:domain/{domain-name}/_cat/indices",
                "arn:aws:es:us-east-1:{account-id}:domain/{domain-name}/_search",
                "arn:aws:es:us-east-1:{account-id}:domain/{domain-name}/_search/scroll",
                "arn:aws:es:us-east-1:{account-id}:domain/{domain-name}/*/_search"
             ]
          },
          {
             "Effect":"Allow",
             "Action":"es:ESHttpPost",
             "Resource":[
                "arn:aws:es:us-east-1:{account-id}:domain/{domain-name}/*/_search/point_in_time",
                "arn:aws:es:us-east-1:{account-id}:domain/{domain-name}/*/_search/scroll"
             ]
          },
          {
             "Effect":"Allow",
             "Action":"es:ESHttpDelete",
             "Resource":[
                "arn:aws:es:us-east-1:{account-id}:domain/{domain-name}/_search/point_in_time",
                "arn:aws:es:us-east-1:{account-id}:domain/{domain-name}/_search/scroll"
             ]
          }
       ]
    }

  3. Attach a permissions policy to the role to allow it to send data to the collection. The following is a sample policy with least privileges:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": [
            "aoss:BatchGetCollection",
            "aoss:APIAccessAll"
          ],
          "Effect": "Allow",
          "Resource": "arn:aws:aoss:{region}:{your-account-id}:collection/{collection-id}"
        },
        {
          "Action": [
            "aoss:CreateSecurityPolicy",
            "aoss:GetSecurityPolicy",
            "aoss:UpdateSecurityPolicy"
          ],
          "Effect": "Allow",
          "Resource": "*",
          "Condition": {
            "StringEquals": {
              "aoss:collection": "{collection-name}"
            }
          }
        }
      ]
    }

  4. Configure the trust relationship on the role so the OpenSearch Ingestion pipeline can assume it, as follows:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "osis-pipelines.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }

  5. It's recommended to add the aws:SourceAccount and aws:SourceArn condition keys to the policy to protect against the confused deputy problem:
    "Condition": {
        "StringEquals": {
            "aws:SourceAccount": "{your-account-id}"
        },
        "ArnLike": {
            "aws:SourceArn": "arn:aws:osis:{region}:{your-account-id}:pipeline/*"
        }
    }
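    Putting the two together, the trust policy with the confused deputy protection applied looks like the following (the account ID and Region values are placeholders):
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "osis-pipelines.amazonaws.com"
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {
                        "aws:SourceAccount": "{your-account-id}"
                    },
                    "ArnLike": {
                        "aws:SourceArn": "arn:aws:osis:{region}:{your-account-id}:pipeline/*"
                    }
                }
            }
        ]
    }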

  6. Map the OpenSearch Ingestion pipeline role ARN as a backend user (as an all_access user) on the domain. We show a simplified example using the all_access role. For production scenarios, make sure to use a role with just enough permissions to read and write.
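    If fine-grained access control is enabled on the domain, one way to create this mapping is through the security plugin REST API in OpenSearch Dashboards Dev Tools. The following is a minimal sketch; the account ID and role name are placeholders, and this request replaces the existing mapping for all_access, so include any backend roles that are already mapped:
    PUT _plugins/_security/api/rolesmapping/all_access
    {
      "backend_roles": [
        "arn:aws:iam::{your-account-id}:role/pipeline-role"
      ]
    }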
  7. Create an OpenSearch Serverless collection, which is where the data will be ingested.
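    You can create the collection on the OpenSearch Service console or with the AWS CLI, as in the following sketch; the collection name is a placeholder, and a matching encryption policy must already exist for the call to succeed:
    aws opensearchserverless create-collection \
        --name {collection-name} \
        --type SEARCH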
  8. Associate a data access policy, as shown in the following code, to grant the OpenSearch Ingestion role permissions on the collection:
    [
      {
        "Rules": [
          {
            "Resource": [
              "index/collection-name/*"
            ],
            "Permission": [
              "aoss:CreateIndex",
              "aoss:UpdateIndex",
              "aoss:DescribeIndex",
              "aoss:WriteDocument"
            ],
            "ResourceType": "index"
          }
        ],
        "Principal": [
          "arn:aws:iam::{account-id}:role/pipeline-role"
        ],
        "Description": "Pipeline role access"
      }
    ]

  9. If the collection is defined as a VPC collection, you need to create a network policy and configure it in the ingestion pipeline, as sketched following this step.
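    A minimal sketch of such a network policy follows; the VPC endpoint ID is a placeholder for the interface endpoint you create for the collection:
    [
      {
        "Rules": [
          {
            "ResourceType": "collection",
            "Resource": [
              "collection/{collection-name}"
            ]
          }
        ],
        "AllowFromPublic": false,
        "SourceVPCEs": [
          "vpce-0123456789abcdef0"
        ]
      }
    ]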

Now you're ready to move data from your provisioned domain to OpenSearch Serverless.

Move data from provisioned domains to Serverless

Set up Amazon OpenSearch Ingestion
To get started, you must have an active OpenSearch Service domain (source) and an OpenSearch Serverless collection (sink). Complete the following steps to set up an OpenSearch Ingestion pipeline for migration:

  1. On the OpenSearch Service console, choose Pipelines under Ingestion in the navigation pane.
  2. Choose Create a pipeline.
  3. For Pipeline name, enter a name (for example, octank-migration).
  4. For Pipeline capacity, you can define the minimum and maximum capacity to scale the resources. For now, you can leave the default minimum of 1 and maximum of 4.
  5. For Configuration Blueprint, select AWS-OpenSearchDataMigrationPipeline.
  6. Update the following information for the source:
    1. Uncomment hosts and specify the endpoint of the existing OpenSearch Service domain.
    2. Uncomment distribution_version if your source cluster is an OpenSearch Service cluster with compatibility mode enabled; otherwise, leave it commented.
    3. Uncomment indices, include, index_name_regex, and add an index name or pattern that you want to migrate (for example, octank-iot-logs-2023.11.0*).
    4. Update region under aws to the Region where your source domain is (for example, us-west-2).
    5. Update sts_role_arn under aws to the role that has permission to read data from the OpenSearch Service domain (for example, arn:aws:iam::111122223333:role/osis-pipeline). This role should be added as a backend role within the OpenSearch Service security roles.
  7. Update the following information for the sink (a completed sample configuration follows these steps):
    1. Uncomment hosts and specify the endpoint of the existing OpenSearch Serverless collection.
    2. Update sts_role_arn under aws to the role that has permission to write data into the OpenSearch Serverless collection (for example, arn:aws:iam::111122223333:role/osis-pipeline). This role should be added as part of the data access policy in the OpenSearch Serverless collection.
    3. Update the serverless flag to be true.
    4. For index, you can leave the default, which picks up the index name from the source document metadata and writes to an index of the same name in the destination. Alternatively, if you want a different index name at the destination, modify this value with your desired name.
    5. For document_id, you can get the ID from the document metadata in the source and use the same in the target. Note that custom document IDs are supported only for the SEARCH type of collection; if your collection is TIMESERIES or VECTORSEARCH, you should comment out this line.
  8. Next, you can validate your pipeline to check the connectivity of the source and sink and confirm that the endpoints exist and are accessible.
  9. For Network settings, choose your preferred setting:
    1. Choose VPC access and select your VPC, subnet, and security group to set up the access privately.
    2. Choose Public to use public access. AWS recommends that you use a VPC endpoint for all production workloads, but for this walkthrough, select Public.
  10. For Log Publishing Option, you can either create a new Amazon CloudWatch group or use an existing CloudWatch group to write the ingestion logs. This provides access to information about errors and warnings raised during the operation, which can help during troubleshooting. For this walkthrough, choose Create new group.
  11. Choose Next, and verify the details you specified for your pipeline settings.
  12. Choose Create pipeline.

It should take a couple of minutes to create the ingestion pipeline.
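For reference, a filled-in version of the AWS-OpenSearchDataMigrationPipeline blueprint looks roughly like the following sketch; the endpoints, Region, role ARN, and index pattern are placeholder values from the preceding steps, and the blueprint you see may differ slightly by version:

    version: "2"
    opensearch-migration-pipeline:
      source:
        opensearch:
          # Source: the provisioned OpenSearch Service domain endpoint
          hosts: [ "https://search-{domain-name}.us-west-2.es.amazonaws.com" ]
          indices:
            include:
              # Index name or pattern to migrate
              - index_name_regex: "octank-iot-logs-2023.11.0*"
          aws:
            region: "us-west-2"
            sts_role_arn: "arn:aws:iam::111122223333:role/osis-pipeline"
      sink:
        - opensearch:
            # Sink: the OpenSearch Serverless collection endpoint
            hosts: [ "https://{collection-id}.us-west-2.aoss.amazonaws.com" ]
            aws:
              region: "us-west-2"
              sts_role_arn: "arn:aws:iam::111122223333:role/osis-pipeline"
              serverless: true
            # Reuse the source index name and document ID from metadata;
            # comment out document_id for TIMESERIES or VECTORSEARCH collections
            index: "${getMetadata(\"opensearch-index\")}"
            document_id: "${getMetadata(\"opensearch-document_id\")}"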
The following graphic provides a quick demonstration of creating the OpenSearch Ingestion pipeline via the preceding steps.

Verify ingested data in the target OpenSearch Serverless collection

After the pipeline is created and active, log in to OpenSearch Dashboards for your OpenSearch Serverless collection and run the following command to list the indexes:

GET _cat/indices?v

The following graphic provides a quick demonstration of listing the indexes before and after the pipeline becomes active.

Conclusion

In this post, we saw how OpenSearch Ingestion can ingest data into an OpenSearch Serverless collection without the need for third-party solutions. With minimal data producer configuration, it automatically ingested data into the collection. OSI also allows you to transform or reindex data from an ES 7.x version before ingesting it into an OpenSearch Service domain or OpenSearch Serverless collection. OSI eliminates the need to provision, scale, or manage servers. AWS offers various resources for you to quickly start building pipelines using OpenSearch Ingestion. You can use various built-in pipeline integrations to quickly ingest data from Amazon DynamoDB, Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Security Lake, Fluent Bit, and many more. OpenSearch Ingestion blueprints help you build data pipelines with minimal configuration changes.


About the Authors

Muthu Pitchaimani is a Search Specialist with Amazon OpenSearch Service. He builds large-scale search applications and solutions. Muthu is interested in the topics of networking and security, and is based out of Austin, Texas.

Prashant Agrawal is a Sr. Search Specialist Solutions Architect with Amazon OpenSearch Service. He works closely with customers to help them migrate their workloads to the cloud and helps existing customers fine-tune their clusters to achieve better performance and save on cost. Before joining AWS, he helped various customers use OpenSearch and Elasticsearch for their search and log analytics use cases. When not working, you can find him traveling and exploring new places. In short, he likes doing Eat → Travel → Repeat.

Rahul Sharma is a Technical Account Manager at Amazon Web Services. He is passionate about the data technologies that help leverage data as a strategic asset and is based out of New York City, New York.
