Comments (13)

CodingCat avatar CodingCat commented on August 17, 2024

Sorry, I didn't understand the question here. You do not want to save to a file in a shutdown hook, but then why do you have

sys.ShutdownHookThread {
  println("Gracefully stopping Spark Streaming Application")
  println("*****************************************************########################################################################################################################")
  // val conf = new SparkConf().setAppName("Merging Files")
  // val sc = new SparkContext(conf)
  // sc.textFile("/EventCheckpoint_0.1/ParquetEvent/part*").coalesce(1).saveAsTextFile("/EventCheckpoint_0.1/SingleFile")
  // sc.stop()
  // Arguments are stopSparkContext = true, stopGracefully = true
  streamingContext.stop(true, true)
  println("Application stopped")
}

from azure-event-hubs-spark.

ankushreddy avatar ankushreddy commented on August 17, 2024

@CodingCat Hi, sorry for the confusion. I have modified the code, that is, removed all the commented-out lines.
If we look at println("COUNT" +dataString.count()), it is not executed until I press Ctrl+C.

Thank you for your help.

CodingCat avatar CodingCat commented on August 17, 2024

How did you detect that this line is not executed? Can you post a screenshot of your streaming UI?

CodingCat avatar CodingCat commented on August 17, 2024

We also have a few other questions:

  • How do you start the application?

  • How many resources do you allocate to the application?

@arijitt

ankushreddy avatar ankushreddy commented on August 17, 2024

I am printing a dummy line just to debug.
I started the application in local mode, and in YARN client and cluster modes.
In YARN client and cluster modes I used 2 executors with 5 cores each, 2 GB of driver memory, and 2 GB of executor memory.
As this is a streaming application, I expect it to write the data in batches every 10 seconds.

CodingCat avatar CodingCat commented on August 17, 2024

Wait... you have 10 partitions in Event Hubs, and you only give the application 2 executors with 5 cores each, which means you have 10 cores in total.

Because you are using the receiver-based connection, all 10 of these cores are used to run 10 receivers, and no resources are left for your data processing tasks. Once you kill the application, those receivers are killed and some pending data processing tasks are started (but quickly killed, since the whole app is going down).
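
The arithmetic behind this can be sketched in a few lines of plain Scala. This is only an illustration under the assumption stated in this thread (one receiver per Event Hubs partition, each occupying one core); `CoreBudget` and `coresLeftForProcessing` are hypothetical names, not part of Spark or the Event Hubs connector.

```scala
// Hypothetical helper for illustration only; not a Spark or Event Hubs API.
object CoreBudget {
  /** Cores left for batch processing once every receiver has occupied one core. */
  def coresLeftForProcessing(executors: Int, coresPerExecutor: Int, receivers: Int): Int =
    executors * coresPerExecutor - receivers

  def main(args: Array[String]): Unit = {
    // Numbers from this thread: 2 executors x 5 cores, 10 partitions (one receiver each).
    val left = coresLeftForProcessing(executors = 2, coresPerExecutor = 5, receivers = 10)
    println(s"cores left for processing: $left") // prints "cores left for processing: 0"
  }
}
```

With zero cores left over, batch jobs can only queue up, which matches the behaviour described above.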

Are you using Spark 2.0+? If yes, you can try the direct DStream integration of Spark and Event Hubs, which is much faster and more resource-efficient.

ankushreddy avatar ankushreddy commented on August 17, 2024

Hi @CodingCat, whenever there is a single record in the stream, it takes forever to finish processing, and the rest of the streaming jobs accumulate in the queue.

Please find the screenshot and let me know if I am missing anything.

[image: screenshot]

Thank you,
Ankush Reddy.

CodingCat avatar CodingCat commented on August 17, 2024

As I said, this happens because there are no resources left for your processing. You can either increase the number of cores or use the direct DStream.
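
As a rough sketch of the first option, the executor count could be raised so that some cores remain after the 10 receivers are placed. The values and the application name below are assumptions chosen for illustration, not settings taken from this thread. On YARN the same settings are commonly passed to spark-submit instead, via `--num-executors` and `--executor-cores`.

```scala
import org.apache.spark.SparkConf

object MoreCoresConf {
  // Assumed sizing: 3 executors x 5 cores = 15 cores in total,
  // so 5 cores would remain for processing after 10 receivers take one each.
  def build(): SparkConf =
    new SparkConf()
      .setAppName("EventHubsStreaming")      // hypothetical application name
      .set("spark.executor.instances", "3")  // number of executors requested on YARN
      .set("spark.executor.cores", "5")      // cores per executor
}
```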

CodingCat avatar CodingCat commented on August 17, 2024

Any update on this? Otherwise I will close it.

ankushreddy avatar ankushreddy commented on August 17, 2024

I am using createDirectStream.

Could you please let me know if there is any documentation on the differences between the union stream and the direct stream?

Thanks,
Ankush.

CodingCat avatar CodingCat commented on August 17, 2024

We are uploading docs but are still waiting for approval. Basically, it is the same high-level idea as the difference between the receiver-based and direct DStream-based Kafka integrations.

CodingCat avatar CodingCat commented on August 17, 2024

Any update on this? Otherwise, I will close it.

ankushreddy avatar ankushreddy commented on August 17, 2024

@CodingCat Thank you for your inputs. I am closing this.
