marcelmay / hadoop-hdfs-fsimage-exporter
Exports Hadoop HDFS content statistics to Prometheus
License: Apache License 2.0
And yes, having a (pre-built) Docker image with this exporter would be highly useful 👍
Currently, time is formatted as the number of milliseconds elapsed. Replace this with ISO 8601 format for log events.
Example:
2018-11-20 19:40:53,157
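Assuming the exporter uses Log4j 1.x for logging, a minimal pattern-layout sketch that switches to ISO 8601 timestamps could look like the fragment below (the appender name is an assumption, not the exporter's actual configuration):

```properties
# Hypothetical log4j.properties fragment — appender name 'console' is illustrative.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
# %d{ISO8601} prints e.g. 2018-11-20 19:40:53,157 instead of elapsed milliseconds (%r)
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c - %m%n
```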
Apart from bugfixes, the new version 1.1 brings major performance improvements.
Note: Update to latest version 1.1.1
Compile/runtime dependencies:
Test dependencies:
Add a JMH benchmark for micro profiling (GC) and stress testing.
The exporter exports size distribution histogram metrics by default.
This is useful for user/group/total stats, but not that interesting for path based stats.
Make it configurable whether group/user/path-based file size stats are exported as a Summary (no distribution) or a Histogram (size distribution, potentially many time series).
Hi all, I ran into a problem: when I use a nested regular expression to compute per-path stats, it doesn't work. For example:
'/user/hive/warehouse/.*/.*'
fails, but /user/hive/warehouse/ods_ds_test_bdpms.db/.*
works. How can I make this work? Thanks.
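Since the config documents regex matching only for direct child directories, one possible workaround (a sketch, assuming one directory level per path entry) is to list each database directory explicitly instead of nesting wildcards:

```yaml
paths:
  # one explicit entry per database directory, matching only its direct children
  - '/user/hive/warehouse/ods_ds_test_bdpms.db/.*'
  - '/user/hive/warehouse/another_db.db/.*'   # hypothetical second database
```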
# Path where HDFS NameNode stores the fsimage files
# See https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml#dfs.namenode.name.dir
fsImagePath : '/fsimage-location'
# Skip file size distribution for group based stats
skipFileDistributionForGroupStats : false
# Skip file size distribution for user based stats
# Good for figuring out who has too many small files.
skipFileDistributionForUserStats : false
# Compute per path stats
# Supports regex matching for direct child directories
paths:
- '/tmp'
- '/datalake/a.*'
- '/user/m.*'
- '/user/hive/warehouse/.*/.*'
# - '/user/hive/warehouse/ods_ds_test_bdpms.db/.*'
# Skip file size distribution for path based stats
skipFileDistributionForPathStats : false
# Path sets group multiple paths under a single identifier
pathSets:
  'userMmAndFooAndAsset1' : [
    '/datalake/asset3',
    '/user/mm',
    '/user/foo'
  ]
  'datalakeAsset1and2' : [
    '/datalake/asset1',
    '/datalake/asset2'
  ]
# Skip file size distribution for path sets based stats
skipFileDistributionForPathSetStats : false
# Configure file size distribution buckets, supporting IEC units of KiB, MiB, GiB, TiB, PiB
fileSizeDistributionBuckets: ['0','1MiB', '32MiB', '64MiB', '128MiB', '1GiB', '10GiB']
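For reference, the IEC units above are powers of 1024. A standalone sketch (illustrative only, not the exporter's own parsing code) converting bucket labels to byte values:

```python
# Convert IEC-unit bucket labels (KiB, MiB, GiB, TiB, PiB) to byte values.
# Illustrative sketch — the exporter parses these internally.
IEC_UNITS = {"KiB": 1 << 10, "MiB": 1 << 20, "GiB": 1 << 30,
             "TiB": 1 << 40, "PiB": 1 << 50}

def parse_iec(label: str) -> int:
    for unit, factor in IEC_UNITS.items():
        if label.endswith(unit):
            return int(label[:-len(unit)]) * factor
    return int(label)  # bare number, e.g. '0', is plain bytes

buckets = ['0', '1MiB', '32MiB', '64MiB', '128MiB', '1GiB', '10GiB']
print([parse_iec(b) for b in buckets])
# [0, 1048576, 33554432, 67108864, 134217728, 1073741824, 10737418240]
```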
The current Jetty is fairly old, so upgrade to the latest 8.1.x.
Upgrade the dependency from 8.1.7.v20120910 to 8.1.22.v20160922.
Note: In a micro benchmark, 9.x is slower but more GC friendly...?
I am trying to run the jar file and not getting any logs:
java -dsa -da -XX:+UseG1GC -Xmx2048m -Dlog.level=DEBUG -jar fsimage-exporter-1.4.jar localhost 9709 fsimage-conf.yml
Nothing is being logged and no process is reporting open on 9709. Any pointers?
The previous base image was openjdk:11-jre-slim-buster, which for exporter version 1.4 resolved to OpenJDK build 11.0.8+10.
Parse FSImage in background for fast scrapes.
Avoids blocking scrapes for long time.
For a given list of configurable directory paths, export stats.
Useful for a hierarchical datalake, where assets have mixed user/group ownership.
Supports stats for single directory paths as well as groups of paths.
Update from openjdk:8u171-jdk-alpine3.8 to latest openjdk:8u181-jdk-alpine3.8
https://github.com/marcelmay/hfsa/releases/tag/release-1.3.3
Brings support for Hadoop 3.3.1 FSIMAGE version.
Switch from openjdk:8u212-jre-alpine3.9 to openjdk:11-jre-slim-buster .
OpenJDK 11 performs better with the G1 default garbage collector than OpenJDK 8.
TODO: Benchmark results on GiB-sized FSImage
Export the fsimage file size as a metric so that we can track its growth.
run script:
java -dsa -da -XX:+UseG1GC -Xmx1024m -jar fsimage-exporter-1.2-new.jar localhost 9092 fsimage.yml
problem:
0 [main] INFO de.m3y.hadoop.hdfs.hfsa.core.FSImageLoader - Loaded 2 strings [7ms]
3 [main] INFO de.m3y.hadoop.hdfs.hfsa.core.FSImageLoader - Loaded 1 inodes [2ms]
3 [main] INFO de.m3y.hadoop.hdfs.hfsa.core.FSImageLoader - Sorted 1 inodes [0ms]
7 [main] INFO de.m3y.prometheus.exporter.fsimage.FsImageWatcher - Loaded /home/zwx602706/program/hadoop/tmp/dfs/name/current/fsimage_0000000000000006624 with 0.0MiB in 75ms
50 [main] ERROR de.m3y.prometheus.exporter.fsimage.FsImageWatcher - Can not preload FSImage
java.lang.NullPointerException
at de.m3y.hadoop.hdfs.hfsa.core.FSImageLoader.visitParallel(FSImageLoader.java:325)
at de.m3y.hadoop.hdfs.hfsa.core.FSImageLoader.visitParallel(FSImageLoader.java:311)
at de.m3y.prometheus.exporter.fsimage.FsImageReporter.computeStatsReport(FsImageReporter.java:303)
at de.m3y.prometheus.exporter.fsimage.FsImageWatcher.run(FsImageWatcher.java:99)
at de.m3y.prometheus.exporter.fsimage.FsImageWatcher.getFsImageReport(FsImageWatcher.java:114)
at de.m3y.prometheus.exporter.fsimage.FsImageCollector.collect(FsImageCollector.java:206)
at io.prometheus.client.CollectorRegistry.collectorNames(CollectorRegistry.java:100)
at io.prometheus.client.CollectorRegistry.register(CollectorRegistry.java:50)
at io.prometheus.client.Collector.register(Collector.java:139)
at io.prometheus.client.Collector.register(Collector.java:132)
at de.m3y.prometheus.exporter.fsimage.WebServer.configure(WebServer.java:20)
at de.m3y.prometheus.exporter.fsimage.WebServer.main(WebServer.java:61)
52 [main] ERROR de.m3y.prometheus.exporter.fsimage.FsImageCollector - FSImage scrape failed
java.lang.NullPointerException
at de.m3y.prometheus.exporter.fsimage.FsImageWatcher.run(FsImageWatcher.java:103)
at de.m3y.prometheus.exporter.fsimage.FsImageWatcher.getFsImageReport(FsImageWatcher.java:114)
at de.m3y.prometheus.exporter.fsimage.FsImageCollector.collect(FsImageCollector.java:206)
at io.prometheus.client.CollectorRegistry.collectorNames(CollectorRegistry.java:100)
at io.prometheus.client.CollectorRegistry.register(CollectorRegistry.java:50)
at io.prometheus.client.Collector.register(Collector.java:139)
at io.prometheus.client.Collector.register(Collector.java:132)
at de.m3y.prometheus.exporter.fsimage.WebServer.configure(WebServer.java:20)
at de.m3y.prometheus.exporter.fsimage.WebServer.main(WebServer.java:61)
114 [main] INFO org.eclipse.jetty.server.Server - jetty-8.y.z-SNAPSHOT
164 [main] INFO org.eclipse.jetty.server.AbstractConnector - Started SelectChannelConnector@localhost:9092
Report file replication as Summary (count, sum)
to detect replication (mis)use.
@marcelmay
We are trying to run the exporter for our Hadoop v3.2.1 fsimage.
We are getting this error:
2020-09-07 01:34:04,110 [pool-1-thread-1] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-09-07 01:34:04,113 [pool-1-thread-1] ERROR de.m3y.prometheus.exporter.fsimage.FsImageUpdateHandler - Can not load FSImage
java.lang.NullPointerException
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:124)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
at org.apache.hadoop.hdfs.server.namenode.FSImageCompression.createCompression(FSImageCompression.java:100)
at org.apache.hadoop.hdfs.server.namenode.FSImageUtil.wrapInputStreamForCompression(FSImageUtil.java:88)
at de.m3y.hadoop.hdfs.hfsa.core.FSImageLoader.loadSection(FSImageLoader.java:156)
at de.m3y.hadoop.hdfs.hfsa.core.FSImageLoader.load(FSImageLoader.java:186)
at de.m3y.prometheus.exporter.fsimage.FsImageUpdateHandler.loadFsImage(FsImageUpdateHandler.java:212)
at de.m3y.prometheus.exporter.fsimage.FsImageUpdateHandler.onFsImageChange(FsImageUpdateHandler.java:191)
at de.m3y.prometheus.exporter.fsimage.FsImageWatcher.run(FsImageWatcher.java:63)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
We have tried a lower Hadoop version, where this works fine, but we need to use version 3.2.1.
How can we overcome this issue?
Create a docker image for exporter:
Update docker jdk base image from openjdk:8u151-jre-alpine3.7 to latest openjdk:8u212-jre-alpine3.9
I want to run HBase in an aarch64 environment, but I'm missing an aarch64 image for hadoop-hdfs-fsimage-exporter. Could you provide the image? Thanks very much.
When trying this against Hadoop 3.1.2, I get
2019-03-20 14:28:12,098 [main] ERROR de.m3y.prometheus.exporter.fsimage.FsImageWatcher - Can not preload FSImage
java.io.IOException: Unsupported layout version -64
de.m3y.hadoop.hdfs.hfsa:hfsa-lib ... 1.3.0 -> 1.3.1
Use standard JVM memory metrics instead of custom JVM memory metrics.
Configuring non-existent paths in path stats results in a FileNotFoundException or NoSuchElementException. These should be ignored, as that simplifies sharing a generic configuration across multiple clusters.
Pin version of docker base image used, for reproducible builds.
Previous 1.1 release contained JRE 8u131.
Latest base image is now 8u151-jre-alpine3.7.
Include the HDFS storage policy for collected metrics.
For consistency (e.g. with node_exporter).
Hi @marcelmay,
Is there any reason I'm missing not to add and use a new default port for this exporter?
Thanks!
Extend current file size metrics by additionally reporting the effective file size (file size * replication factor).
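The effective size is simple arithmetic; a sketch of the idea (names are illustrative, not the exporter's API):

```python
# Effective file size = logical file size * replication factor.
# Illustrative only; the exporter derives this internally from the fsimage.
def effective_size(file_size_bytes: int, replication: int) -> int:
    return file_size_bytes * replication

# A 128 MiB file with replication factor 3 occupies 384 MiB of raw capacity.
print(effective_size(128 * 1024 ** 2, 3))  # 402653184
```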
docker run -i -t -p 9709:9709 -v /data/dfs/name/current:/fsimage-location -e "JAVA_OPTS=-server -XX:+UseG1GC -Xmx1024m" marcelmay/hadoop-hdfs-fsimage-exporter
2021-08-16 09:51:14,575 [main] INFO de.m3y.prometheus.exporter.fsimage.WebServer - FSImage exporter started and listening on http://0.0.0.0:9709
2021-08-16 09:51:14,615 [pool-3-thread-1] ERROR de.m3y.prometheus.exporter.fsimage.FsImageUpdateHandler - Can not load FSImage /fsimage-location/fsimage_0000000000004734441
java.io.IOException: Unsupported layout version -66
at org.apache.hadoop.hdfs.server.namenode.FSImageUtil.loadSummary(FSImageUtil.java:78)
at de.m3y.hadoop.hdfs.hfsa.core.FsImageLoader.load(FsImageLoader.java:286)
at de.m3y.prometheus.exporter.fsimage.FsImageUpdateHandler.loadFsImage(FsImageUpdateHandler.java:207)
at de.m3y.prometheus.exporter.fsimage.FsImageUpdateHandler.onFsImageChange(FsImageUpdateHandler.java:185)
at de.m3y.prometheus.exporter.fsimage.FsImageWatcher.run(FsImageWatcher.java:64)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
This is a nice-to-have suggestion. Consider the conf file below:
fsImagePath : $FSIMAGE_PATH
skipFileDistributionForGroupStats : true
skipFileDistributionForUserStats : false
paths:
- '/data/*'
skipFileDistributionForPathStats : true
skipFileDistributionForPathSetStats : true
I would like metrics displayed for all subdirectories under /data. That way I can create a Prometheus query that automatically picks up newly added paths, and I don't have to add them to the conf file. Currently, an error is thrown saying the path could not be found. Please let me know your thoughts.
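Assuming per-path stats are exported with a path label (the metric name below is hypothetical; check the exporter's actual metric names), such a Prometheus query might look like:

```promql
# Hypothetical metric name — adjust to the exporter's actual per-path metric.
sum by (path) (fsimage_path_fsize_count{path=~"/data/.*"})
```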
Use the Prometheus Info metric type instead of the custom BuildInfoExporter collector. Note: fsimage_exporter_build_info does not contain the appName label anymore.
Support configuring the log level (INFO as default, WARN and DEBUG) when starting the exporter, to simplify analysing configuration issues.
jmx_exporter recently switched to reduce footprint.
Currently the file size histogram buckets are hardcoded.
Make the buckets configurable, for more flexibility.
Test dependencies:
Please help resolve the issue below. I am hitting this error during the Maven build:
[ERROR] Failed to execute goal de.m3y.maven:inject-maven-plugin:1.1:inject (default) on project fsimage-exporter: Value is null for injection Injection{value='null', pointCut='de.m3y.prometheus.exporter.fsimage.BuildInfoExporter.getBuildScmVersion', pointCuts=null} -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal de.m3y.maven:inject-maven-plugin:1.1:inject (default) on project fsimage-exporter: Value is null for injection Injection{value='null', pointCut='de.m3y.prometheus.exporter.fsimage.BuildInfoExporter.getBuildScmVersion', pointCuts=null}
Caused by: org.apache.maven.plugin.MojoFailureException: Value is null for injection Injection{value='null', pointCut='de.m3y.prometheus.exporter.fsimage.BuildInfoExporter.getBuildScmVersion', pointCuts=null}
Compile scope:
Test scope:
@marcelmay I have been trying to set up the exporter on a machine separate from the NameNode. A new fsimage is pushed from the NameNode to the exporter machine every 30 minutes, but the metrics are still not updated. Could you please point me in the right direction as to why it's not working?
The exporter should always skip previously parsed FSImage files, as there is no benefit for re-parsing.