
codewhisperer-for-awsglue-integration's Introduction

Building data integration faster with Amazon CodeWhisperer for AWS Glue

This lab is provided as part of AWS Innovate Data Edition.

ℹ️ You will run this lab in your own AWS account. Please follow directions at the end of the lab to remove resources to avoid future costs.

Introduction

Amazon CodeWhisperer is an AI coding companion that uses foundation models under the hood to improve developer productivity.

In this lab, you learn how the AWS Glue Studio notebook integration with Amazon CodeWhisperer helps you build data integration jobs faster.

Prerequisites

Before starting this lab, complete the following prerequisites:

  1. Set up AWS Glue Studio.
  2. Navigate to the IAM console and choose Create policy.

[Image: IAM Create Policy]

  3. Choose JSON, paste the following JSON document into the policy editor, and then choose Next:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "glue:GetTableVersions",
                "glue:GetPartitions",
                "glue:GetDevEndpoint",
                "glue:GetJobs",
                "s3:GetBucketWebsite",
                "s3:GetMultiRegionAccessPoint",
                "s3:GetObjectAttributes",
                "s3:GetObjectLegalHold",
                "s3:GetBucketNotification",
                "s3:DescribeMultiRegionAccessPointOperation",
                "s3:GetReplicationConfiguration",
                "glue:GetPartition",
                "glue:DeleteConnection",
                "glue:BatchDeleteConnection",
                "s3:GetStorageLensDashboard",
                "s3:GetLifecycleConfiguration",
                "s3:GetInventoryConfiguration",
                "s3:GetBucketTagging",
                "s3:GetAccessPointPolicyForObjectLambda",
                "glue:BatchDeletePartition",
                "glue:CreateUserDefinedFunction",
                "s3:ListBucket",
                "glue:DeleteJob",
                "codewhisperer:GenerateRecommendations",
                "glue:CreateJob",
                "iam:PassRole",
                "glue:GetConnection",
                "glue:ResetJobBookmark",
                "glue:CreatePartition",
                "glue:UpdatePartition",
                "s3:GetMultiRegionAccessPointPolicyStatus",
                "glue:BatchGetPartition",
                "s3:GetBucketVersioning",
                "s3:GetAccessPointConfigurationForObjectLambda",
                "glue:GetTable",
                "glue:GetDatabase",
                "s3:GetMultiRegionAccessPointRoutes",
                "s3:GetStorageLensConfiguration",
                "s3:GetAccountPublicAccessBlock",
                "s3:ListAllMyBuckets",
                "glue:CreateDatabase",
                "s3:GetBucketCORS",
                "s3:GetObjectVersion",
                "glue:BatchCreatePartition",
                "s3:GetObjectVersionTagging",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "s3:GetStorageLensConfigurationTagging",
                "s3:GetObjectAcl",
                "s3:GetBucketObjectLockConfiguration",
                "s3:GetIntelligentTieringConfiguration",
                "glue:GetUserDefinedFunction",
                "s3:GetObjectVersionAcl",
                "glue:GetUserDefinedFunctions",
                "s3:GetBucketPolicyStatus",
                "glue:UpdateDatabase",
                "s3:GetObjectRetention",
                "glue:CreateTable",
                "glue:GetTables",
                "s3:GetJobTagging",
                "glue:DeleteUserDefinedFunction",
                "glue:CreateConnection",
                "s3:GetObject",
                "glue:GetDevEndpoints",
                "s3:DescribeJob",
                "glue:BatchDeleteTable",
                "s3:GetAnalyticsConfiguration",
                "s3:GetObjectVersionForReplication",
                "glue:DeletePartition",
                "s3:GetAccessPointForObjectLambda",
                "glue:GetJob",
                "glue:GetConnections",
                "s3:GetAccessPoint",
                "glue:DeleteDatabase",
                "s3:GetBucketLogging",
                "s3:GetAccelerateConfiguration",
                "s3:GetObjectVersionAttributes",
                "s3:GetBucketPolicy",
                "glue:*",
                "s3:GetEncryptionConfiguration",
                "s3:GetObjectVersionTorrent",
                "s3:GetBucketRequestPayment",
                "s3:GetAccessPointPolicyStatus",
                "s3:GetObjectTagging",
                "glue:UpdateJob",
                "s3:GetMetricsConfiguration",
                "s3:GetBucketOwnershipControls",
                "glue:GetJobBookmark",
                "s3:GetBucketPublicAccessBlock",
                "glue:UpdateUserDefinedFunction",
                "s3:GetMultiRegionAccessPointPolicy",
                "s3:GetAccessPointPolicyStatusForObjectLambda",
                "glue:GetDatabases",
                "s3:GetBucketAcl",
                "s3:GetObjectTorrent",
                "glue:UpdateConnection",
                "glue:UpdateDevEndpoint",
                "s3:GetBucketLocation",
                "s3:GetAccessPointPolicy"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": [
                "arn:aws:s3:::crawler-public*",
                "arn:aws:s3:::aws-glue*"
            ]
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::aws-glue*"
        }
    ]
}
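
The first statement lists many individual `glue:` actions alongside `glue:*`. Because the wildcard already matches every AWS Glue action, the individual entries have no additional effect. As a rough illustration (not part of the original lab), the following Python sketch flags actions that are already covered by a wildcard entry in the same list. The helper name `redundant_actions` and the shortened sample list are hypothetical, and the matching uses Python's `fnmatch`, which only approximates IAM wildcard semantics.

```python
from fnmatch import fnmatchcase


def redundant_actions(actions):
    """Return actions already covered by a wildcard entry in the same list.

    Assumption: IAM action names are matched case-insensitively and `*`
    matches any run of characters; fnmatch is used as an approximation.
    """
    lowered = [a.lower() for a in actions]
    wildcards = [a for a in lowered if "*" in a]
    covered = []
    for original, action in zip(actions, lowered):
        if "*" in action:
            continue  # wildcard entries themselves are kept
        if any(fnmatchcase(action, pattern) for pattern in wildcards):
            covered.append(original)
    return covered


# Shortened, hypothetical excerpt of the first statement's Action list.
sample_actions = [
    "glue:GetJobs",
    "glue:CreateJob",
    "glue:*",
    "s3:GetObject",
    "codewhisperer:GenerateRecommendations",
    "iam:PassRole",
]

print(redundant_actions(sample_actions))
```

Running a check like this before saving the policy can help you decide whether to keep the wildcard or replace it with the explicit list of actions that the lab actually needs.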

[Image: IAM Policy]

  4. Name the policy Glue-CodeWhisperer-GenerateRecommendations-Policy and choose Create policy.

[Image: Create Policy]

  5. Navigate to IAM Roles and choose Create role.

  6. Select Glue as the trusted entity and choose Next.

[Image: Trusted Entity]

  7. Select Glue-CodeWhisperer-GenerateRecommendations-Policy under Permissions policies and choose Next.

[Image: Permission Policies]

  8. Name the role Glue-CodeWhisperer-GenerateRecommendations-Role and choose Create role.

[Image: Create Role]
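
Selecting Glue as the trusted entity attaches a trust policy to the role that lets the AWS Glue service assume it. As a sketch of what that document typically looks like (the console generates it for you, and the variable names here are illustrative), it can be expressed in Python as:

```python
import json

# Trust policy that allows the AWS Glue service to assume the role.
# Assumption: this mirrors what the IAM console generates when Glue is
# selected as the trusted entity; check the role's trust relationship
# in your account for the exact document.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

trust_policy_json = json.dumps(trust_policy, indent=4)
print(trust_policy_json)
```

The permissions policy created earlier controls what the role can do, while this trust policy controls who can assume the role.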

Getting Started

  1. Navigate to the AWS Glue Studio console.
  2. Select Jupyter Notebook and choose Create.

[Image: Create Notebook]

  3. For Job name, enter codewhisperer-demo.
  4. For IAM Role, select the IAM role that you created as a prerequisite (Glue-CodeWhisperer-GenerateRecommendations-Role).
  5. Choose Start notebook.

[Image: Notebook Setup]

A new notebook is created with sample cells.

At the bottom, there is a menu named CodeWhisperer. By choosing this menu, you can see the shortcuts and several options, including disabling auto-suggestions.

Let’s try your first recommendation by Amazon CodeWhisperer.

Note that this lab contains examples of recommendations, but you may see different code snippets recommended by Amazon CodeWhisperer.

Add a new cell and enter a comment that describes what you want to achieve. After you press Enter, the recommended code is shown.

Press Tab to accept the recommendation, or use the arrow keys to select other recommendations. You can learn more in User actions.

Read JSON File Example

Now let’s read a JSON file from Amazon Simple Storage Service (Amazon S3). Enter the following code comment into a notebook cell and press Enter:

# Create a Spark DataFrame from a json file

CodeWhisperer will recommend a code snippet similar to the following:

# Create a Spark DataFrame from a json file
df = spark.read.json("s3://bucket_name/folder_name/file_name.json")

Now adapt the suggested snippet to read the lab dataset:

df = spark.read.json("s3://awsinnovate2023-data/persons.json")
df.show()

The preceding code returns the following output:

[Image: JSON Output]

As you can see from the result, you can quickly utilize the code snippet recommended by Amazon CodeWhisperer.
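
By default, `spark.read.json` expects JSON Lines input, that is, one complete JSON object per line, rather than a single pretty-printed document. The contents of `persons.json` are not shown in this lab, so the following plain-Python sketch uses invented records (only the column names `given_name`, `birth_date`, and `death_date` come from the later examples) to illustrate that layout:

```python
import json

# Hypothetical records; the values are invented for illustration only.
records = [
    {"given_name": "Alex", "birth_date": "1900-01-01", "death_date": "2000-01-01"},
    {"given_name": "Sam", "birth_date": "1950-06-15", "death_date": "2010-03-02"},
]

# JSON Lines: serialize each record on its own line.
json_lines = "\n".join(json.dumps(record) for record in records)
print(json_lines)

# Reading it back line by line recovers the original records.
parsed = [json.loads(line) for line in json_lines.splitlines()]
```

If a file instead holds one multi-line JSON document, Spark's `multiLine` read option (for example, `spark.read.option("multiLine", True).json(...)`) is the usual way to load it.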

In the following sections, we provide additional examples of code recommendations. Note that these are just our examples, and different code snippets may be suggested by Amazon CodeWhisperer.

Count Values

You can ask Amazon CodeWhisperer to recommend code to count unique values:

# Count unique values on the birth_date column

Amazon CodeWhisperer will recommend a code snippet similar to the following:

df.select("birth_date").distinct().count()

The preceding code returns the following output:

1784

[Image: Count Distinct]
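
Note that `distinct().count()` counts distinct rows, so a null `birth_date` counts as one value, whereas Spark's `countDistinct` aggregate skips nulls. The following plain-Python sketch (with invented sample values, and `None` standing in for a null) mirrors the two behaviors:

```python
# Invented sample values; None stands in for a null birth_date.
birth_dates = ["1900-01-01", "1950-06-15", "1900-01-01", None, None]

# Analogue of df.select("birth_date").distinct().count():
# every distinct value, including null, counts once.
distinct_rows = len(set(birth_dates))

# Analogue of countDistinct("birth_date"): nulls are ignored.
distinct_non_null = len({d for d in birth_dates if d is not None})

print(distinct_rows, distinct_non_null)
```

If the two numbers differ on your data, the column contains at least one null.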

Sort Records

You can use Amazon CodeWhisperer for sorting data and extracting records within a Spark DataFrame as well:

# Sort DataFrame by column given_name Descending

Amazon CodeWhisperer will recommend a code snippet similar to the following:

df.sort("given_name", ascending=False).show()

The preceding code returns the following output:

[Image: Sort]

Add a column with a calculation

In extract, transform, and load (ETL) use cases, it’s common to derive new columns from existing ones. When you need to add a column to a Spark DataFrame, describe the column and its attributes to Amazon CodeWhisperer in as much detail as you can:

# Calculate Age in Years At Death

Amazon CodeWhisperer will recommend a code snippet similar to the following:

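from pyspark.sql.functions import col, datediff

# Approximate age in years: days between the two dates divided by 365
df.withColumn("age_at_death", datediff(col("death_date"), col("birth_date"))/365).show()

The preceding code returns the following output:

[Image: Age At Death]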
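
Dividing the day difference by 365 ignores leap days, so the result drifts slightly above the calendar age over long spans. As a worked check in plain Python (the dates and the helper `calendar_age` are hypothetical, not taken from the lab dataset):

```python
from datetime import date


def calendar_age(birth, death):
    """Whole years between two dates, counting completed birthdays."""
    years = death.year - birth.year
    # Subtract one year if the final birthday had not been reached yet.
    if (death.month, death.day) < (birth.month, birth.day):
        years -= 1
    return years


birth = date(1900, 1, 1)
death = date(2000, 1, 1)

days = (death - birth).days      # same quantity that datediff computes
approx_age = days / 365          # the day-difference formula used above
exact_age = calendar_age(birth, death)

print(days, round(approx_age, 2), exact_age)
```

In Spark, dividing by 365.25 or using `months_between(...) / 12` are common ways to reduce this drift.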

Generate sample datasets in a Spark DataFrame

Amazon CodeWhisperer can also generate sample Spark DataFrames:

# Generate sample Spark DataFrame of country name and country code
# First column name is country_name, and second column name is country_code

Amazon CodeWhisperer will recommend a code snippet similar to the following:

df = spark.createDataFrame(
    [("United States", "US"), ("Canada", "CA"), ("Mexico", "MX")],
    ["country_name", "country_code"],  # column names requested in the comment
)

[Image: Generate]

Tear Down

Delete the notebook instance

  1. Choose Stop notebook.

[Image: Stop Notebook]

  2. Select Action in the top-right corner of the notebook, and then choose Delete.

[Image: Delete Notebook]

Delete IAM Role and Policy

  1. Navigate to the IAM console and delete the Glue-CodeWhisperer-GenerateRecommendations-Role role created previously.
  2. Delete the Glue-CodeWhisperer-GenerateRecommendations-Policy policy created previously.
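
If you prefer to script the IAM cleanup, note that a managed policy must be detached from the role before either can be deleted. The following Python sketch only builds the corresponding AWS CLI commands in that order; it does not run them. The function name `iam_cleanup_commands` and the placeholder account ID `123456789012` are illustrative assumptions.

```python
ROLE_NAME = "Glue-CodeWhisperer-GenerateRecommendations-Role"
POLICY_NAME = "Glue-CodeWhisperer-GenerateRecommendations-Policy"


def iam_cleanup_commands(account_id):
    """Return AWS CLI commands, in order, that remove the lab's role and policy."""
    # ARN format for a customer managed policy in the given account.
    policy_arn = f"arn:aws:iam::{account_id}:policy/{POLICY_NAME}"
    return [
        # 1. Detach the policy so the role and policy can be deleted.
        f"aws iam detach-role-policy --role-name {ROLE_NAME} --policy-arn {policy_arn}",
        # 2. Delete the role.
        f"aws iam delete-role --role-name {ROLE_NAME}",
        # 3. Delete the now-unattached policy.
        f"aws iam delete-policy --policy-arn {policy_arn}",
    ]


# 123456789012 is a placeholder account ID; substitute your own.
for command in iam_cleanup_commands("123456789012"):
    print(command)
```

Review the printed commands before running them, and confirm that the role is not used by any other AWS Glue job in your account.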

Contributors

phonghuule
