amzn / amazon-codeguru-profiler-for-spark
A Spark plugin for CPU and memory profiling
License: Apache License 2.0
Hi, I've spent a few hours going through my first use of the CodeGuru profiler and thought I'd report the issues I faced, in case that helps improve the documentation.
I only use EMR occasionally, so not all of these may be issues that most users would run into.
I started from the blog post, but the number of outgoing links made me focus on the official CodeGuru documentation and the "Setup Instructions" page of the new profiling group. I had never used CodeGuru before, and all those other references made me believe that I should use the Java software.amazon.codeguruprofilerjavaagent.Profiler. It actually worked, but obviously only installed the agent on the driver. So some of those pages could also mention that Spark has its own dedicated plugin.
It is mentioned, somewhat, that "an alternative way to specify PROFILING_CONTEXT and ENABLE_AMAZON_PROFILER" doesn't include setting the plugin, and it's also listed in Prerequisites, but highlighting this closer to the JSON approach paragraph would make it easier to spot. To be honest, I didn't read that page very carefully, given the amount of text encountered in the whole process. At first it also wasn't clear to me whether those environment variables were read by something inside Spark or by something around it.
The README also doesn't mention that the plugin is available on Maven Central; only the blog post does.
That became obvious pretty quickly, but a quick note and a link to a pom.xml snippet could have spared me an iteration.
At least the aws CLI reported this error, and I had to capitalize the property names.
Parameter validation failed:
Unknown parameter in Configurations[0]: "classification", must be one of: Classification, Configurations, Properties
Unknown parameter in Configurations[0]: "properties", must be one of: Classification, Configurations, Properties
Unknown parameter in Configurations[0]: "configurations", must be one of: Classification, Configurations, Properties
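For anyone hitting the same validation error, here is a minimal sketch of how the key names could be capitalized before passing the file to the aws CLI. The function name `capitalize_config_keys` and the sample classifications in the usage note are hypothetical; only the three key names come from the error message above.

```python
import json

# The three key names the aws CLI accepts, taken from the validation error above.
_KEY_MAP = {
    "classification": "Classification",
    "properties": "Properties",
    "configurations": "Configurations",
}


def capitalize_config_keys(configs):
    """Return a copy of an EMR-style configuration list with capitalized keys.

    Hypothetical helper: only the three known keys are renamed, and the
    contents of "Properties" are left untouched because they are user data.
    """
    fixed = []
    for entry in configs:
        out = {}
        for key, value in entry.items():
            new_key = _KEY_MAP.get(key, key)
            if new_key == "Configurations":
                # Nested configuration lists use the same key names.
                value = capitalize_config_keys(value)
            out[new_key] = value
        fixed.append(out)
    return fixed


if __name__ == "__main__":
    # Illustrative input only; the classification names are assumptions.
    lowercase = [{
        "classification": "yarn-env",
        "properties": {},
        "configurations": [{
            "classification": "export",
            "properties": {"ENABLE_AMAZON_PROFILER": "true"},
        }],
    }]
    print(json.dumps(capitalize_config_keys(lowercase), indent=2))
```

The property names inside "Properties" are deliberately not modified, since those are passed through as-is.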
This was the most tedious issue.
I connected to the workers through SSH and found that the environment variables were properly set in /etc/hadoop/conf.empty/yarn-env.sh, but somehow those environment variables didn't seem to reach the plugin in my Spark worker process, and it didn't report "Profiling is enabled".
After ending up on that page, I tried this and it finally worked:
[
{
"Classification": "spark-defaults",
"Properties": {
"spark.executorEnv.PROFILING_CONTEXT": "{\\\"profilingGroupName\\\":\\\"CodeGuru-Spark-Demo\\\"}",
"spark.executorEnv.ENABLE_AMAZON_PROFILER": "true"
}
}
]
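Since the nested escaping in PROFILING_CONTEXT is easy to get wrong by hand, here is a minimal sketch that generates the same configuration programmatically. The function name `build_spark_defaults` is hypothetical, and the escaping simply reproduces the value that worked for me above; I have not checked whether other escaping styles are also accepted.

```python
import json


def build_spark_defaults(profiling_group_name):
    """Build the spark-defaults classification shown above (hypothetical helper)."""
    # Compact JSON for the profiling context: {"profilingGroupName":"..."}
    context = json.dumps(
        {"profilingGroupName": profiling_group_name},
        separators=(",", ":"),
    )
    # Put a literal backslash in front of every quote, matching the value
    # that worked in the configuration above.
    escaped = context.replace('"', '\\"')
    return [{
        "Classification": "spark-defaults",
        "Properties": {
            "spark.executorEnv.PROFILING_CONTEXT": escaped,
            "spark.executorEnv.ENABLE_AMAZON_PROFILER": "true",
        },
    }]


if __name__ == "__main__":
    # json.dumps adds the outer layer of JSON escaping when writing the file.
    print(json.dumps(build_spark_defaults("CodeGuru-Spark-Demo"), indent=2))
```

Letting `json.dumps` handle the outer layer means only one level of escaping has to be written by hand.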
The profiler has already been useful for us and the flamegraph is actually quite nice, so thank you for putting this in place.
Hi @xiongbo-sjtu, I'd like to report a vulnerable dependency in software.amazon.profiler:codeguru-profiler-for-spark:1.0.
I noticed that software.amazon.profiler:codeguru-profiler-for-spark:1.0 directly depends on org.apache.spark:spark-core_2.12:3.0.0 in the pom. However, as shown in the following dependency graph, org.apache.spark:spark-core_2.12:3.0.0 suffers from a vulnerability exposed by the C library zstd (version 1.4.4): CVE-2021-24032.
org.apache.spark:spark-core_2.12:3.2.0 (>=3.2.0) has upgraded this vulnerable C library zstd to the patched version 1.5.0.
Java build tools cannot report vulnerable C libraries, which may introduce potential security issues into many downstream Java projects. Could you please upgrade this vulnerable dependency?
Thanks for your help~
Best regards,
Helen Parr