Is there any way to determine which bucket it cannot access? Is it the bucket where the PySpark code resides, or a bucket where the data resides? Can you look in the logs for any more details?
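Here is a minimal sketch of one way to narrow this down. It assumes you can run the AWS CLI with the job's execution role credentials. Take every S3 location the job uses (the entry point, the `--bucket` argument, the log URI), reduce each one to its bucket name, and probe that bucket with `aws s3api head-bucket`. The function names are hypothetical. `head-bucket` only checks bucket-level access, so a denial on a specific object or prefix would not show up here.

```shell
#!/bin/sh
# Hypothetical helpers for finding which S3 bucket a job cannot reach.

# Print the bucket part of an S3 URI, e.g. s3://my-bucket/a/b.py -> my-bucket.
# A plain bucket name (no scheme, no slash) is printed unchanged.
bucket_from_uri() {
  uri=${1#s3://}               # drop the s3:// scheme if present
  printf '%s\n' "${uri%%/*}"   # keep everything before the first slash
}

# Probe each argument's bucket with head-bucket. Requires the AWS CLI and
# credentials for the role being tested (assumed: the job's execution role).
probe_buckets() {
  for uri in "$@"; do
    bucket=$(bucket_from_uri "$uri")
    if aws s3api head-bucket --bucket "$bucket" >/dev/null 2>&1; then
      echo "OK      $bucket"
    else
      echo "FAILED  $bucket (missing, or not accessible with current credentials)"
    fi
  done
}
```

For example, you could call `probe_buckets s3://<artifacts-bucket>/dags/ros-image-demo/dags-aws/spark_scripts/detect_scenes.py <curated-bucket> s3://<logs-bucket>/emr-on-eks` after assuming the execution role. The bucket names here are placeholders.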
from autonomous-driving-data-framework.
One thing I don't understand: my EMR virtual cluster's EKS namespace is emr-eks-spark (see point 1 below), but there is no namespace called "emr-eks-spark" in the EKS cluster (see point 2 below). My understanding is that the EKS namespace has to be created first, and only then can the EMR virtual cluster be created (I may be wrong).
But if a namespace is a hard prerequisite for an EMR virtual cluster, then the namespace emr-eks-spark must have existed at some point. Yet I cannot find any place in ADDF where the EKS namespace emr-eks-spark is created.
1. Describe the virtual cluster to get its namespace:
```shell
aws emr-containers describe-virtual-cluster --id <masked>
```
```json
{
    "virtualCluster": {
        "id": "<masked>",
        "name": "addf-ros-image-demo-emr-emr-<masked>",
        "arn": "arn:aws:emr-containers:<masked>:<masked>:/virtualclusters/<masked>",
        "state": "RUNNING",
        "containerProvider": {
            "type": "EKS",
            "id": "addf-ros-image-demo-core-eks-cluster",
            "info": {
                "eksInfo": {
                    "namespace": "emr-eks-spark"
                }
            }
        },
        "createdAt": "<masked>",
        "tags": {
            "Deployment": "addf-ros-image-demo"
        }
    }
}
```
2. Describe all namespaces:
```shell
kubectl describe ns
```
```
Name: default
Labels: kubernetes.io/metadata.name=default
Annotations: <none>
Status: Active
No resource quota.
No LimitRange resource.

Name: kube-node-lease
Labels: kubernetes.io/metadata.name=kube-node-lease
Annotations: <none>
Status: Active
No resource quota.
No LimitRange resource.

Name: kube-public
Labels: kubernetes.io/metadata.name=kube-public
Annotations: <none>
Status: Active
No resource quota.
No LimitRange resource.

Name: kube-system
```
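Here is a minimal sketch for checking point 1 against point 2 automatically. It assumes kubectl's current context points at addf-ros-image-demo-core-eks-cluster. The sketch reads the namespace from the virtual cluster description and looks for it in `kubectl get namespaces -o name`, which prints one `namespace/<name>` line per namespace. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: does an EMR virtual cluster's namespace exist in its EKS cluster?

# Pure check: succeed if namespace $1 appears in `kubectl get namespaces -o name`
# output read from stdin (whole-line, fixed-string match).
namespace_listed() {
  grep -qxF "namespace/$1"
}

# Look up the virtual cluster's namespace, then search for it in the current
# kubectl context (assumed to be the virtual cluster's EKS cluster).
check_vc_namespace() {
  vc_id=$1
  ns=$(aws emr-containers describe-virtual-cluster --id "$vc_id" \
    --query 'virtualCluster.containerProvider.info.eksInfo.namespace' \
    --output text) || return 2
  if kubectl get namespaces -o name | namespace_listed "$ns"; then
    echo "namespace '$ns' exists"
  else
    echo "namespace '$ns' not found in the current kubectl context"
    return 1
  fi
}
```

To run it, use `check_vc_namespace <virtual-cluster-id>`. A non-zero exit status means the namespace the virtual cluster was registered with is not present.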
I tried submitting the PySpark job with an S3 log bucket and CloudWatch Logs configured (see the command below). The job was submitted successfully but failed with the same error described in the issue, and no logs appeared in either S3 or CloudWatch Logs. I had granted temporary elevated permissions to the execution role (`--execution-role-arn`) before submitting the job, so this does not appear to be an IAM access issue on the EMR job's side.
```shell
# Each element of entryPointArguments is passed to detect_scenes.py as a separate argument.
aws emr-containers start-job-run \
  --virtual-cluster-id <masked> \
  --name scene_detection_manual \
  --execution-role-arn arn:aws:iam::<masked>:role/addf-ros-image-demo-emr-e-<masked> \
  --release-label emr-6.8.0-latest \
  --job-driver '{
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://addf-ros-image-demo-artifacts-bucket-<masked>/dags/ros-image-demo/dags-aws/spark_scripts/detect_scenes.py",
      "entryPointArguments": [
        "--batch-metadata-table-name", "addf-ros-image-demo-dags-aws-drive-tracking",
        "--batch-id", "6_dec_2022",
        "--bucket", "addf-ros-image-demo-curated-bucket-<masked>",
        "--region", "ap-south-1",
        "--output-dynamo-table", "addf-ros-image-demo-dags-aws-scenes"
      ],
      "sparkSubmitParameters": "--conf spark.executor.instances=3 --conf spark.executor.memory=4G --conf spark.driver.memory=2G --conf spark.executor.cores=2 --conf spark.sql.shuffle.partitions=60 --conf spark.dynamicAllocation.enabled=false --packages com.audienceproject:spark-dynamodb_2.12:1.1.1"
    }
  }' \
  --configuration-overrides '{
    "monitoringConfiguration": {
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "/emr-on-eks/emr-on-eks-to-delete",
        "logStreamNamePrefix": "detect_scenes_todelete"
      },
      "s3MonitoringConfiguration": {
        "logUri": "s3://addf-ros-image-demo-logs-bucket-<masked>/emr-on-eks"
      }
    }
  }'
```
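No driver logs reached S3 or CloudWatch, so the job run record itself is the next place to look. The following sketch assumes the AWS CLI is configured for the same account and region. `describe-job-run` returns `state`, `failureReason` and `stateDetails`. The helper below queries those three fields and prints them one per line. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show why an EMR on EKS job run failed, even when no
# driver logs were delivered to S3 or CloudWatch.

# Pure formatter: reads "STATE<TAB>FAILURE_REASON<TAB>STATE_DETAILS" on stdin
# and prints one labelled field per line.
format_job_run() {
  awk -F '\t' '{ printf "state=%s\nfailureReason=%s\nstateDetails=%s\n", $1, $2, $3 }'
}

# Query the job run. With --output text, the AWS CLI prints a JMESPath
# list of scalars as one tab-separated line (null values appear as "None").
describe_failure() {
  vc_id=$1
  job_id=$2
  aws emr-containers describe-job-run --virtual-cluster-id "$vc_id" --id "$job_id" \
    --query 'jobRun.[state, failureReason, stateDetails]' --output text | format_job_run
}
```

To run it, use `describe_failure <virtual-cluster-id> <job-run-id>`, where the job run ID is the `id` returned by `start-job-run`.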
Since the namespace 'emr-eks-spark' did not exist, we followed this doc link to test further:
- Created a new namespace "spark":
  ```shell
  kubectl create namespace spark
  ```
- Added emr-containers to the EKS cluster's aws-auth ConfigMap:
  ```shell
  eksctl create iamidentitymapping --cluster addf-ros-image-demo-core-eks-cluster --namespace spark --service-name "emr-containers"
  ```
- Enabled IAM roles for service accounts (IRSA) by associating an IAM OIDC provider with the cluster:
  ```shell
  eksctl utils associate-iam-oidc-provider --cluster addf-ros-image-demo-core-eks-cluster --approve
  ```
- Created a new virtual cluster:
  ```shell
  aws emr-containers create-virtual-cluster \
    --name to-delete \
    --container-provider '{
      "id": "addf-ros-image-demo-core-eks-cluster",
      "type": "EKS",
      "info": {
        "eksInfo": {
          "namespace": "spark"
        }
      }
    }'
  ```
- I was able to submit Spark jobs to this new virtual cluster, but they stayed in the scheduled state for 15 minutes and then failed.
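The setup steps above can be spot-checked with a small sketch. It assumes that `eksctl create iamidentitymapping --service-name "emr-containers"` leaves a Role and a RoleBinding named `emr-containers` in the target namespace. That assumption comes from the EMR on EKS cluster-access setup, so verify the object names in your own cluster. The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that a namespace carries the RBAC objects that
# `eksctl create iamidentitymapping --service-name emr-containers` is assumed
# to create (a Role and a RoleBinding named "emr-containers").

# Pure check: succeed if an object named "emr-containers" appears in
# `kubectl get <kind> -o name` output read from stdin.
has_emr_containers_object() {
  grep -q '/emr-containers$'
}

# Report presence of the Role and RoleBinding in namespace $1 (default: spark).
check_emr_rbac() {
  ns=${1:-spark}
  for kind in role rolebinding; do
    if kubectl get "$kind" -n "$ns" -o name | has_emr_containers_object; then
      echo "$kind emr-containers: present in $ns"
    else
      echo "$kind emr-containers: MISSING in $ns"
    fi
  done
}
```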
Observation: earlier, the Spark jobs failed as soon as they were submitted (in under 2 seconds). On the new EMR virtual cluster, created with a namespace that actually exists, the jobs stay in the scheduled state for 15 minutes and then fail. While a job is scheduled, Spark negotiates resource allocation with the cluster manager, so I think the root cause is the communication or resource allocation between Spark and the EKS cluster.
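When a job never leaves the waiting state, the pods and events in its namespace are one place to look. This sketch assumes kubectl points at addf-ros-image-demo-core-eks-cluster and that the job runs in the `spark` namespace. It lists pods whose STATUS is `Pending` and prints the Events section for each one. Scheduling failures, such as insufficient CPU or memory on the nodes, typically appear in that section. The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: explain why pods in a namespace are not starting.

# Pure filter: read `kubectl get pods --no-headers` output on stdin and print
# the names of pods whose STATUS column (3rd field) is Pending.
pending_pods() {
  awk '$3 == "Pending" { print $1 }'
}

# Describe each pending pod in namespace $1 (default: spark), showing only the
# Events section, then list namespace-wide events sorted by lastTimestamp.
explain_pending() {
  ns=${1:-spark}
  kubectl get pods -n "$ns" --no-headers | pending_pods | while read -r pod; do
    echo "=== $pod"
    kubectl describe pod -n "$ns" "$pod" | sed -n '/^Events:/,$p'
  done
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
}
```

Run `explain_pending spark` while a job is still in the scheduled state. After the job fails, its pods may already have been cleaned up.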
PS: I have discussed this with @kevinsoucy.
@srinivasreddych @manojrajpurohit Sooo... I think you may have hosed your cluster when you ran the `eksctl utils associate-iam-oidc-provider --cluster addf-ros-image-demo-core-eks-cluster --approve` command, because the ADDF cluster already has an OIDC provider. It sounds like the module did not install the service account for EMR on EKS correctly? @srinivasreddych can you take a look, and we can circle back later?
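Before associating an OIDC provider, you can check whether the cluster's issuer already has a matching IAM OIDC provider. This sketch takes the last path segment of `cluster.identity.oidc.issuer` and searches for it in the output of `aws iam list-open-id-connect-providers`. The function names are hypothetical, and the caller passes in the cluster name.

```shell
#!/bin/sh
# Hypothetical helper: check whether an EKS cluster's OIDC issuer already has a
# matching IAM OIDC provider, before running `eksctl utils associate-iam-oidc-provider`.

# Pure helper: the issuer ID is the last path segment of the issuer URL,
# e.g. https://oidc.eks.<region>.amazonaws.com/id/ABC123 -> ABC123.
oidc_id_from_issuer() {
  printf '%s\n' "${1##*/}"
}

check_oidc_provider() {
  cluster=$1
  issuer=$(aws eks describe-cluster --name "$cluster" \
    --query 'cluster.identity.oidc.issuer' --output text) || return 2
  id=$(oidc_id_from_issuer "$issuer")
  if aws iam list-open-id-connect-providers --output text | grep -qF "$id"; then
    echo "IAM OIDC provider for issuer $id already exists"
  else
    echo "no IAM OIDC provider found for issuer $id"
    return 1
  fi
}
```

To run it, use `check_oidc_provider addf-ros-image-demo-core-eks-cluster`. If a provider already exists, associating another one should not be necessary.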
Closing due to inactivity. Please reopen once eyes can focus on it.