lucacanali / miscellaneous

Includes notes on using Apache Spark in general, notes on using Spark for Physics, how to run TPCDS on PySpark, how to create histograms with Spark, tools for performance testing CPUs, Jupyter notebooks examples for Spark, examples for Oracle and other DB systems.

License: Apache License 2.0

PLSQL 0.06% Jupyter Notebook 99.39% Python 0.34% Scala 0.08% Rust 0.04% Shell 0.01% HTML 0.10% Dockerfile 0.01% Roff 0.01%
apache-spark database jupyter-notebooks performance-analysis performance-monitoring performance-testing

miscellaneous's Introduction

Miscellaneous projects and scripts.

Author and contact: [email protected]

Spark and Performance Engineering

Folder Description
Spark Dashboard A tool for Apache Spark monitoring, used to build a performance dashboard and troubleshoot Spark jobs.
Spark Notes Miscellaneous tips and code snippets about Apache Spark.
Spark for Physics Examples, with code and data of how Apache Spark can be used in the domain of High Energy Physics data analysis.
Performance Testing Code and examples, includes:
- A tool to run TPCDS at scale with PySpark and collect execution metrics
- Tools for load-testing CPUs, written in Python and Rust
- Notes on how to use tooling for performance measurements

Data Engineering and Data Science

Folder Description
Deep Learning Notes Notes and examples on Deep Learning tools and related data pipelines.
Pyspark_SQL_Magic_Jupyter How to write Jupyter SQL magic functions for PySpark and Spark SQL.
Trino and Presto on Jupyter Example of using Trino or Presto on a Jupyter notebook.
PostgreSQL and YugabyteDB on Jupyter Example of using PostgreSQL or YugabyteDB on a Jupyter notebook.
Oracle_Jupyter Examples of how to query Oracle using Jupyter/IPython notebooks.
Impala_SQL_Jupyter Examples of how to run SQL on Apache Impala using Jupyter/IPython notebooks.
SQL_color_Mandelbrot How to use SQL to compute and display the Mandelbrot set with colors. Examples for Oracle and PostgreSQL.
PLSQL_Neural_Network An example of how to deploy a DL serving engine for Oracle using PL/SQL.

miscellaneous's Issues

Scaling questions

Hey Luca!

Thanks again for your spark dashboarding work. It gave me a great leg up on implementing our own metrics solution.

One thing I'm noticing though is that the Spark metrics, being per app-id, have really high cardinality, and our metrics receivers (Prometheus and VictoriaMetrics) seem to be struggling as the number of series grows (seeing up to 30MM series per cluster in some cases).

Have you seen anything like this on your installation? Does influx maybe just handle it better?
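As a rough illustration of why per-application-id metric names inflate the series count, here is a minimal sketch. The workload numbers are hypothetical and the formula is only a back-of-the-envelope estimate. The `spark.metrics.namespace` property is a documented Spark setting that replaces the default `spark.app.id` root namespace; whether a shared namespace suits a given setup is an assumption to verify, since runs of the same application would then report under the same metric names.

```python
def estimate_series(namespaces: int, executors_per_app: int, metrics_per_instance: int) -> int:
    """Rough count of distinct series: one driver plus N executors per namespace."""
    instances_per_app = 1 + executors_per_app
    return namespaces * instances_per_app * metrics_per_instance


# Hypothetical workload, chosen only for illustration.
APP_RUNS = 10_000          # every run gets a fresh application id
DISTINCT_APP_NAMES = 50    # far fewer distinct application names

# Default behaviour: the root namespace is the application id, so every run adds new series.
per_app_id = estimate_series(APP_RUNS, executors_per_app=20, metrics_per_instance=100)

# With a stable namespace, repeated runs of the same application reuse metric names.
per_app_name = estimate_series(DISTINCT_APP_NAMES, executors_per_app=20, metrics_per_instance=100)

# Spark setting that swaps the application id for the application name as root namespace.
stable_namespace_conf = {"spark.metrics.namespace": "${spark.app.name}"}

print(f"series with per-app-id namespace:   {per_app_id:,}")
print(f"series with per-app-name namespace: {per_app_name:,}")
```

The estimate ignores retention and label handling on the receiver side, so it only shows the order of magnitude of the difference.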

worth adding

Hi, I came across your repository and thought it may be worth adding another method of acquiring flame graphs in a k8s environment, for any Java application and many more.

The easiest way is to set up an account on profiler.granulate, then download the ready-to-use YAML template and deploy it on k8s. Special workers profile the cluster and dump the data to the web, with ready-to-use flame graphs.

gprofiler-k8s-deploy-yaml

Regards,
Patryk.

Spark 3.2 support?

Hi @LucaCanali - hope this is an okay channel to reach out on. Love your spark dashboards! 🧡

I was trying to set this up for work on AWS EMR and it seems like the metrics listed on https://spark.apache.org/docs/latest/monitoring.html#component-instance--executor don't get produced from applications on spark prior to 3.3

But I see 3.2 listed in the tags of apache/spark@1ffe03d

Do you happen to know if there's a way to get your dashboards working on 3.2? Perhaps the metrics.properties just needs to be different?

I'm especially curious about the active jobs and executor run time per process graphs.

These work fine for me on 3.3


But if I set up an EMR running spark 3.2 to publish to the same influx, it's just blank.

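For readers unfamiliar with what the metrics configuration looks like, the sketch below builds the settings as `spark.metrics.conf.*` properties, which Spark accepts as an alternative to a metrics.properties file. It assumes the Graphite sink is the one feeding InfluxDB. The host, port, and prefix are hypothetical placeholders. It does not address the 3.2 versus 3.3 difference described above; it only shows the shape of the configuration being discussed.

```python
from typing import Dict, List


def graphite_sink_conf(host: str, port: int, prefix: str, period_s: int = 10) -> Dict[str, str]:
    """Build Spark properties that enable the Graphite metrics sink for all components."""
    base = "spark.metrics.conf.*.sink.graphite."
    return {
        base + "class": "org.apache.spark.metrics.sink.GraphiteSink",
        base + "host": host,
        base + "port": str(port),
        base + "period": str(period_s),
        base + "unit": "seconds",
        base + "prefix": prefix,
    }


def as_submit_flags(conf: Dict[str, str]) -> List[str]:
    """Render the properties as --conf arguments for spark-submit."""
    return ['--conf "{}={}"'.format(key, value) for key, value in conf.items()]


# Hypothetical endpoint and prefix, for illustration only.
conf = graphite_sink_conf("influxdb.example.org", 2003, "emr_cluster")
for flag in as_submit_flags(conf):
    print(flag)
```

Comparing the properties actually applied on the 3.2 and 3.3 clusters against a list like this is one way to rule out a plain configuration difference before looking at which metrics each Spark version emits.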

add jars to hbase server side

I added the jars to the HBase server side according to https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_HBase_Connector.md, but it does not work for me. I get the error below. Please help me.

java.lang.NoSuchMethodError: org.apache.hadoop.hbase.spark.protobuf.generated.SparkFilterProtos$SQLPredicatePushDownFilter$Builder.addValueFromQueryArray(Lorg/apache/hbase/thirdparty/com/google/protobuf/ByteString;)Lorg/apache/hadoop/hbase/spark/protobuf/generated/SparkFilterProtos$SQLPredicatePushDownFilter$Builder;
	at org.apache.hadoop.hbase.spark.SparkSQLPushDownFilter.toByteArray(SparkSQLPushDownFilter.java:257)
	at org.apache.hadoop.hbase.spark.datasources.SerializedFilter$.$anonfun$toSerializedTypedFilter$1(HBaseTableScanRDD.scala:273)
	at scala.Option.map(Option.scala:230)
	at org.apache.hadoop.hbase.spark.datasources.SerializedFilter$.toSerializedTypedFilter(HBaseTableScanRDD.scala:273)
	at org.apache.hadoop.hbase.spark.datasources.HBaseTableScanRDD.$anonfun$getPartitions$2(HBaseTableScanRDD.scala:85)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.hadoop.hbase.spark.datasources.HBaseTableScanRDD.getPartitions(HBaseTableScanRDD.scala:77)
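
The first line of the error contains a JVM method descriptor, which is hard to read in raw form. The sketch below is a small hypothetical helper, not part of any HBase or Spark tooling, that decodes such a descriptor into readable type names. Applied to the descriptor above, it shows that the caller expects `addValueFromQueryArray` to accept the shaded `org.apache.hbase.thirdparty` `ByteString`. A `NoSuchMethodError` of this kind generally means the class loaded at runtime was built with a different signature than the one the caller was compiled against, so a mismatch between connector jars is a plausible cause here, though not a confirmed one.

```python
# Single-letter JVM primitive type codes.
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float", "I": "int",
    "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _parse_types(encoded):
    """Decode a run of JVM field types such as 'I[JLjava/lang/String;'."""
    types = []
    i = 0
    while i < len(encoded):
        dims = 0
        while encoded[i] == "[":          # each '[' adds one array dimension
            dims += 1
            i += 1
        if encoded[i] == "L":             # object type: L<class name with slashes>;
            end = encoded.index(";", i)
            name = encoded[i + 1:end].replace("/", ".")
            i = end + 1
        else:                             # primitive type: single letter
            name = PRIMITIVES[encoded[i]]
            i += 1
        types.append(name + "[]" * dims)
    return types


def parse_descriptor(descriptor):
    """Split '(params)return' into a list of parameter types and a return type."""
    if not descriptor.startswith("(") or ")" not in descriptor:
        raise ValueError("not a method descriptor: " + descriptor)
    params, ret = descriptor[1:].split(")", 1)
    return _parse_types(params), _parse_types(ret)[0]


# Descriptor copied from the NoSuchMethodError message above.
DESCRIPTOR = (
    "(Lorg/apache/hbase/thirdparty/com/google/protobuf/ByteString;)"
    "Lorg/apache/hadoop/hbase/spark/protobuf/generated/"
    "SparkFilterProtos$SQLPredicatePushDownFilter$Builder;"
)
param_types, return_type = parse_descriptor(DESCRIPTOR)
print("parameters:", param_types)
print("returns:", return_type)
```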
