The amazon-codeguru-profiler-for-spark from amzn

Onboarding difficulties

Hi, I've spent a few hours getting started with the CodeGuru profiler and thought I'd report the issues I ran into, in case that helps improve the documentation.

I only use EMR occasionally, so not all of these may be issues that most users would run into.

The official documentation doesn't mention Spark.

I started from the blog post, but the number of outgoing links made me focus on the official CodeGuru documentation and the "Setup Instructions" page of the new profiling group. I had never used CodeGuru before, and all those other references made me believe that I should use the Java software.amazon.codeguruprofilerjavaagent.Profiler. It actually worked, but obviously only installed the agent on the driver. So some of those pages could also mention that Spark has its own dedicated plugin.

I didn't notice that the yarn-env.export JSON way of specifying PROFILING_CONTEXT and ENABLE_AMAZON_PROFILER also required setting spark.plugins

It's somewhat mentioned that "an alternative way to specify PROFILING_CONTEXT and ENABLE_AMAZON_PROFILER" doesn't include setting the plugin, and it's also listed in Prerequisites, but highlighting this closer to the paragraph on the JSON approach would make it easier to spot. To be honest, I didn't read that page very carefully given the amount of text encountered in the whole process. At first it also wasn't clear to me whether those environment variables were read by something inside Spark or by something around it.
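To make this pitfall concrete, here is a minimal sketch of a pre-flight check for an EMR configurations list. It only flags the case where the profiler variables appear somewhere in the configuration but no `spark-defaults` classification sets `spark.plugins`. The function names are hypothetical and not part of the plugin or of any AWS tool, and the check assumes the capitalized key names that the aws CLI expects.

```python
from typing import Any

# Environment variable names taken from the README text quoted above.
PROFILER_KEYS = ("PROFILING_CONTEXT", "ENABLE_AMAZON_PROFILER")


def _mentions_profiler(node: Any) -> bool:
    """Return True if any key in the (possibly nested) node names a profiler variable."""
    if isinstance(node, dict):
        return any(
            any(marker in str(key) for marker in PROFILER_KEYS)
            or _mentions_profiler(value)
            for key, value in node.items()
        )
    if isinstance(node, list):
        return any(_mentions_profiler(item) for item in node)
    return False


def _sets_spark_plugins(configurations: list) -> bool:
    """Return True if a top-level spark-defaults classification sets spark.plugins."""
    for entry in configurations:
        if (
            entry.get("Classification") == "spark-defaults"
            and "spark.plugins" in entry.get("Properties", {})
        ):
            return True
    return False


def missing_plugin_setting(configurations: list) -> bool:
    """Hypothetical lint: profiler variables are configured but spark.plugins is not."""
    return _mentions_profiler(configurations) and not _sets_spark_plugins(configurations)
```

A `True` result is only a hint: `spark.plugins` can also be passed at submit time, which this sketch cannot see.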

It's not clear that the plugin must be included in the fat JAR

The README also doesn't mention that it's available on Maven Central; only the blog post does.

That became obvious pretty quickly, but a quick note and a link to a pom.xml snippet could have spared me an iteration.

The JSON property names casing seems wrong in the README and in the blog post

At least the aws CLI reported this error, and I had to capitalize the property names.

Parameter validation failed:
Unknown parameter in Configurations[0]: "classification", must be one of: Classification, Configurations, Properties
Unknown parameter in Configurations[0]: "properties", must be one of: Classification, Configurations, Properties
Unknown parameter in Configurations[0]: "configurations", must be one of: Classification, Configurations, Properties
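As a workaround sketch, the lowercase JSON copied from the README or blog post can be converted mechanically before handing it to the aws CLI. The snippet below renames only the three structural keys named in the error message and leaves the contents of each `Properties` map untouched. `fix_casing` is a hypothetical helper name, not something shipped with the aws CLI.

```python
from typing import Any

# The three structural keys listed in the CLI validation error above.
EMR_KEYS = {
    "classification": "Classification",
    "configurations": "Configurations",
    "properties": "Properties",
}


def fix_casing(node: Any, inside_properties: bool = False) -> Any:
    """Capitalize structural keys recursively; never touch user-defined property names."""
    if isinstance(node, list):
        return [fix_casing(item) for item in node]
    if isinstance(node, dict) and not inside_properties:
        fixed = {}
        for key, value in node.items():
            new_key = EMR_KEYS.get(key, key)
            # Everything below a Properties key is user data and is copied as-is.
            fixed[new_key] = fix_casing(value, inside_properties=(new_key == "Properties"))
        return fixed
    return node
```

Running it on an already capitalized configuration returns an equal structure, so it is safe to apply unconditionally.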

The yarn-env.export classification didn't work for me (emr-6.13.0)

This was the most tedious issue.

I connected to the workers through SSH and found that the environment variables were properly set in /etc/hadoop/conf.empty/yarn-env.sh, but somehow those environment variables didn't seem to reach the plugin in my Spark worker process and it didn't report "Profiling is enabled".

After ending up on that page, I tried this and it finally worked 🎉:

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executorEnv.PROFILING_CONTEXT": "{\\\"profilingGroupName\\\":\\\"CodeGuru-Spark-Demo\\\"}",
      "spark.executorEnv.ENABLE_AMAZON_PROFILER": "true"
    }
  }
]
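The nested escaping in PROFILING_CONTEXT is easy to get wrong by hand, so here is a small sketch that generates the same configuration. It assumes that the backslash-escaped quotes in the value above are what the plugin needs (that is copied from the working configuration, not verified independently). `profiler_spark_defaults` is a hypothetical helper name.

```python
import json


def profiler_spark_defaults(profiling_group_name: str) -> list:
    """Build a spark-defaults classification mirroring the JSON shown above."""
    # Compact JSON for the context: {"profilingGroupName":"..."}
    context = json.dumps(
        {"profilingGroupName": profiling_group_name}, separators=(",", ":")
    )
    # Assumption: inner quotes must reach Spark backslash-escaped, as in the post.
    escaped_context = context.replace('"', '\\"')
    return [
        {
            "Classification": "spark-defaults",
            "Properties": {
                "spark.executorEnv.PROFILING_CONTEXT": escaped_context,
                "spark.executorEnv.ENABLE_AMAZON_PROFILER": "true",
            },
        }
    ]


if __name__ == "__main__":
    # Serializing adds the second level of escaping seen in the JSON file.
    print(json.dumps(profiler_spark_defaults("CodeGuru-Spark-Demo"), indent=2))
```

Letting `json.dumps` produce the outer layer of escaping avoids counting backslashes manually when the profiling group name changes.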

The profiler has already been useful for us and the flamegraph is actually quite nice, so thank you for putting this in place.

Could you help upgrade the vulnerable dependency in amazon-codeguru-profiler-for-spark?

Hi @xiongbo-sjtu, I'd like to report a vulnerable dependency in software.amazon.profiler:codeguru-profiler-for-spark:1.0.

Issue Description

I noticed that software.amazon.profiler:codeguru-profiler-for-spark:1.0 directly depends on org.apache.spark:spark-core_2.12:3.0.0 in the pom. However, as shown in the following dependency graph, org.apache.spark:spark-core_2.12:3.0.0 suffers from a vulnerability exposed by the C library zstd (version 1.4.4): CVE-2021-24032.

Dependency Graph between Java and Shared Libraries

Suggested Vulnerability Patch Versions

org.apache.spark:spark-core_2.12:3.2.0 (>=3.2.0) has upgraded this vulnerable C library zstd to the patched version 1.5.0.

Java build tools cannot report vulnerable C libraries, which may introduce security issues into many downstream Java projects. Could you please upgrade this vulnerable dependency?

Thanks for your help~
Best regards,
Helen Parr
