
License: MIT No Attribution


How to use SQL to query data in S3 Bucket with Amazon Athena and AWS SDK for .NET

This project provides a sample implementation that shows how to use Amazon Athena from a .NET Core application, via the AWS SDK for .NET, to run standard SQL and analyze a large amount of data in Amazon S3. To showcase a more realistic use case, it includes a WebApp UI developed with ReactJS. This WebApp contains components that fetch COVID-19 data from an API server, which uses the AWS SDK for .NET to connect to Amazon Athena and run standard SQL queries against datasets stored as Amazon S3 files in a Data Lake account. This Data Lake account is the aws-covid19-lake account, made available on the Registry of Open Data on AWS.

These React components call a .NET Core API that runs the Amazon Athena query, checks the execution status, and lists the results. Each menu option presents a different view.

Menu option Testing By Date: shows a filter by date and presents a table with the following columns: Date, State, Positive, Negative, Pending, Hospitalized, Death, Positive Increase

Menu option Testing By State: shows a filter by state and presents a table with the following columns: Date, State, Positive, Negative, Pending, Hospitalized, Death, Positive Increase

Menu option Hospitals (Run&Go): runs a request to the API server, receives a 200 response with the query ID, and checks the execution status; when the execution is completed, it presents a table with the following columns: Name, State, Type, Zip Code, Licensed Beds, Staffed Beds, Potential Increase in Beds

Menu option Hospitals (Run&Wait): runs a request to the API server, waits for the result, and presents a table with the following columns: Name, State, Type, Zip Code, Licensed Beds, Staffed Beds, Potential Increase in Beds
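The run, poll, fetch pattern the API implements with the AWS SDK for .NET can be sketched in AWS CLI terms. This is a minimal sketch, not the project's code; the bucket URL is a placeholder (the real one comes from step 2 below), and the helper names are hypothetical:

```shell
# Sketch of the run -> poll -> fetch pattern (placeholder bucket URL).
run_query() {                     # prints the QueryExecutionId
  aws athena start-query-execution \
    --query-string "$1" \
    --result-configuration "OutputLocation=s3://your-results-bucket/athena/results/" \
    --query QueryExecutionId --output text
}

query_state() {                   # prints QUEUED, RUNNING, SUCCEEDED, FAILED, or CANCELLED
  aws athena get-query-execution --query-execution-id "$1" \
    --query QueryExecution.Status.State --output text
}

fetch_results() {                 # prints the result rows as JSON
  aws athena get-query-results --query-execution-id "$1"
}
```

Typical usage: `qid=$(run_query 'SELECT 1')`, then call `query_state "$qid"` until it prints SUCCEEDED, and finally `fetch_results "$qid"`.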

Steps

To run this project, follow the instructions below:

1) Deploy Glue Catalog & Athena Database/Tables

#1) Deploy
aws cloudformation create-stack --stack-name covid-lake-stack --template-url https://covid19-lake.s3.us-east-2.amazonaws.com/cfn/CovidLakeStack.template.json --region us-west-2

#2) Check deployment Status
aws cloudformation  describe-stacks --stack-name covid-lake-stack --region us-west-2

Below is the result of the status check; wait for "StackStatus": "CREATE_COMPLETE"

{
    "Stacks": [
        {
            "StackId": "arn:aws:cloudformation:us-west-2:XXXXXXXX9152:stack/covid-lake-stack/xxxxxxxx-100d-11eb-87ef-xxxxxxxxxxx",
            "StackName": "covid-lake-stack",
            "CreationTime": "2020-10-17T00:12:09.151Z",
            "RollbackConfiguration": {},
            "StackStatus": "CREATE_COMPLETE",
            "DisableRollback": false,
            "NotificationARNs": [],
            "Tags": [],
            "EnableTerminationProtection": false,
            "DriftInformation": {
                "StackDriftStatus": "NOT_CHECKED"
            }
        }
    ]
}

2) Create S3 bucket for Athena Result

#1) Deploy S3 Bucket
aws cloudformation deploy --stack-name athena-results-netcore --template-file ./src/cloud-formation-templates/s3-athena-result.template.yaml --region us-west-2

#2) Check deployment Status
aws cloudformation  describe-stacks --stack-name athena-results-netcore --region us-west-2

Below is the result of the status check. Wait for "StackStatus": "CREATE_COMPLETE", then copy the output bucket name "OutputValue": "s3://athena-results-netcore-s3bucket-xxxxxxxxxxxx/athena/results/"; you will need it to run your code

{
    "Stacks": [
        {
            "StackId": "arn:aws:cloudformation:us-west-2:XXXXXXXX9152:stack/athena-results-netcore/xxxxxxxx-100c-11eb-889f-xxxxxxxxxxx",
            "StackName": "athena-results-netcore",
            "Description": "Amazon S3 bucket to store Athena query results",
            "CreationTime": "2020-10-17T00:02:44.968Z",
            "LastUpdatedTime": "2020-10-17T00:21:13.692Z",
            "RollbackConfiguration": {
                "RollbackTriggers": []
            },
            "StackStatus": "CREATE_COMPLETE",
            "DisableRollback": false,
            "NotificationARNs": [],
            "Outputs": [
                {
                    "OutputKey": "BucketName",
                    "OutputValue": "s3://athena-results-netcore-s3bucket-xxxxxxxxxxxx/athena/results/",
                    "Description": "Name of the Amazon S3 bucket to store Athena query results"
                }
            ],
            "Tags": [],
            "EnableTerminationProtection": false,
            "DriftInformation": {
                "StackDriftStatus": "NOT_CHECKED"
            }
        }
    ]
}
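Copying the OutputValue out of that JSON can also be scripted. The commented one-liner uses the CLI's JMESPath --query flag; the `extract_bucket` helper (a hypothetical name) does the same parse on captured JSON via python3, so it also works offline:

```shell
# The CLI can pull the output directly:
# aws cloudformation describe-stacks --stack-name athena-results-netcore --region us-west-2 \
#   --query "Stacks[0].Outputs[?OutputKey=='BucketName'].OutputValue" --output text

# Same parse over describe-stacks JSON read from stdin:
extract_bucket() {
  python3 -c '
import json, sys
for out in json.load(sys.stdin)["Stacks"][0].get("Outputs", []):
    if out["OutputKey"] == "BucketName":
        print(out["OutputValue"])
'
}
```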

3) COVID-19 Analysis (optional)

Here are some SQL queries you can try on your own in the Amazon Athena console. This step is optional for this demo, but it helps you explore and learn more about Amazon Athena through the console UI.

-- The following query returns the growth of confirmed cases for the past 7 days joined side-by-side with hospital bed availability, broken down by US county:
SELECT 
  cases.fips, 
  admin2 as county, 
  province_state, 
  confirmed,
  growth_count, 
  sum(num_licensed_beds) as num_licensed_beds, 
  sum(num_staffed_beds) as num_staffed_beds, 
  sum(num_icu_beds) as num_icu_beds
FROM 
  "covid-19"."hospital_beds" beds, 
  ( SELECT 
      fips, 
      admin2, 
      province_state, 
      confirmed, 
      last_value(confirmed) over (partition by fips order by last_update) - first_value(confirmed) over (partition by fips order by last_update) as growth_count,
      first_value(last_update) over (partition by fips order by last_update desc) as most_recent,
      last_update
    FROM  
      "covid-19"."enigma_jhu" 
    WHERE 
      from_iso8601_timestamp(last_update) > now() - interval '200' day AND country_region = 'US') cases
WHERE 
  beds.fips = cases.fips AND last_update = most_recent
GROUP BY cases.fips, confirmed, growth_count, admin2, province_state
ORDER BY growth_count desc;

--Last 10 records regarding Testing and deaths
SELECT * FROM "covid-19"."world_cases_deaths_testing" order by "date" desc limit 10;

-- Last 10 records regarding Testing and deaths with JOIN on us_state_abbreviations to list State name
SELECT 
   date,
   positive,
   negative,
   pending,
   hospitalized,
   death,
   total,
   deathincrease,
   hospitalizedincrease,
   negativeincrease,
   positiveincrease,
   sta.state AS state_abbreviation,
   abb.state 

FROM "covid-19"."covid_testing_states_daily" sta
JOIN "covid-19"."us_state_abbreviations" abb ON sta.state = abb.abbreviation
limit 500;

4) Build & Run .NET Web Application

  1. Go to the app root dir
cd ./src/app/AthenaNetCore/
  2. Create the AWS credentials file. As a security precaution, the *.env file extension is listed in .gitignore to avoid an accidental commit
code aws-credentials-do-not-commit.env # You can use any text editor, e.g. vi: vi aws-credentials-do-not-commit.env

Below is an example of the env file content. Replace the XXXX... values with your real AWS credentials, and set S3_RESULT_BUCKET_NAME to the output you copied in step 2)

AWS_DEFAULT_REGION=us-west-2
AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
AWS_SESSION_TOKEN=XXXXX # Optional, used only with temporary credentials; remove this comment in the .env file
S3_RESULT_BUCKET_NAME=s3://athena-results-netcore-s3bucket-xxxxxxxxxxxx/athena/results/ # Paste the bucket name you copied in step 2; remove this comment in the .env file
  3. Build the .NET app using docker-compose
docker-compose -f ./docker-compose.yml build
  4. Run the .NET app with docker-compose
docker-compose -f ./docker-compose.yml up
  5. Test the .NET app via the URL http://localhost:8089/
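Once the containers are up, a quick smoke test confirms the app answers on the published port (`check_app` is a hypothetical helper; port 8089 comes from the step above):

```shell
# Returns 0 if the URL answers with a success status, non-zero otherwise.
check_app() {
  curl -fsS "$1" >/dev/null
}

# check_app http://localhost:8089/ && echo "app is up"
```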

5) Clean up

# 1) Clean local resources
docker-compose down -v

# 2) Clean s3 objects created by Athena to store Results metadata
 aws s3 rm --recursive s3://athena-results-netcore-s3bucket-xxxxxxxxxxxx/athena/results/

# 3) Delete S3 bucket
aws cloudformation delete-stack --stack-name athena-results-netcore --region us-west-2

# 4) Delete Athena Tables
aws cloudformation delete-stack --stack-name covid-lake-stack --region us-west-2

 

References

https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-19-data/

https://docs.aws.amazon.com/athena/latest/ug/code-samples.html

https://aws.amazon.com/blogs/apn/using-athena-express-to-simplify-sql-queries-on-amazon-athena/

https://docs.aws.amazon.com/sdk-for-net/v3/developer-guide/net-dg-config-creds.html

https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html

https://docs.aws.amazon.com/sdk-for-net/latest/developer-guide/creds-assign.html

https://github.com/awsdocs/aws-cloud9-user-guide/blob/master/LICENSE-SAMPLECODE


Issues

The S3 location provided to save your query results is invalid

Describe the bug
I'm trying to use this project as a scaffold for a project that I'm building with C# and Athena. I configured every step in the readme and was able to start. However, when I clicked on any COVID-19 query, I got the error below:

athenanetcore.webapp_1 | fail: Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware[1]
athenanetcore.webapp_1 | An unhandled exception has occurred while executing the request.
athenanetcore.webapp_1 | Amazon.Athena.Model.InvalidRequestException: The S3 location provided to save your query results is invalid. Please check your S3 location is correct and is in the same region and try again. If you continue to see the issue, contact customer support for further assistance.

This error does not make sense, because I tested the bucket generated by the CF stack in step 2 inside DBeaver and the results were returned without problems (picture1/picture2).

Below is my env configuration:

AWS_DEFAULT_REGION=us-east-1
AWS_ACCESS_KEY_ID=xxxxxxx
AWS_SECRET_ACCESS_KEY=yyyyyyyyyyyyyyy
S3_RESULT_BUCKET_NAME=s3://athena-results-netcore-s3bucket-cycw2c22srf5/athena/results/

Are there additional steps I'm missing to make this project work?

Thanks!

Screenshots
picture1
picture2
