
Comments (11)

pabman commented on June 28, 2024

Sorry, this was a bug on my part.

The task was retrieving the list of files from the watch folder, processing them all, and then retrieving the list of files in the folder a second time to decide what to move. Any files added after processing had started were therefore never processed, but were still moved to the completed folder.

This should be fixed now.
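A minimal Java sketch of the listing bug and of one way to fix it. The thread doesn't show the actual Gradle task, so the class and method names here are hypothetical and `process` merely stands in for the mail merge step. `buggyRun` lists the folder twice, as described above; `fixedRun` takes a single snapshot and moves each file right after it is processed.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: the real task is a Gradle script that is not shown.
// "process" stands in for the mail merge step (e.g. running mailmerger-cmdline).
class WatchFolderRun {

    // Lists the CSV files currently in dir, sorted for a predictable order.
    static List<Path> listCsv(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(p -> p.getFileName().toString().endsWith(".csv"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    // Buggy ordering: the folder is listed twice, so a file that arrives while
    // processing is under way is moved without ever being processed.
    static void buggyRun(Path watch, Path completed, Consumer<Path> process) throws IOException {
        for (Path p : listCsv(watch)) {
            process.accept(p);
        }
        for (Path p : listCsv(watch)) {   // second listing includes late arrivals
            Files.move(p, completed.resolve(p.getFileName()));
        }
    }

    // Fixed ordering: one snapshot, and each file is moved only after it has
    // been processed. Late arrivals stay put for the next invocation.
    static void fixedRun(Path watch, Path completed, Consumer<Path> process) throws IOException {
        for (Path p : listCsv(watch)) {
            process.accept(p);
            Files.move(p, completed.resolve(p.getFileName()));
        }
    }
}
```

A file written into the watch folder from inside `process` ends up in the completed folder unprocessed with `buggyRun`, but stays in the watch folder with `fixedRun`.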

from mailmerger.

distributev commented on June 28, 2024

image

pabman commented on June 28, 2024

My original intention was to process all the files in the directory (as of when the task started) and then move them all once processing was complete. This was to avoid Gradle triggering the task again and again as it detected each file being moved.

I've just looked at this in more detail, and Gradle is clever enough not to attempt to run the task again while the initial execution is still in progress.

I also discovered another problem, which I'll push a fix for in a moment. When spawning the Java process to run mailmerger-cmdline, the task wasn't waiting for that process to finish before attempting to move the files, which was causing problems.
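The ordering fix can be sketched in Java with `ProcessBuilder` and `Process.waitFor()`, which blocks until the child process exits. This is an illustration under assumptions, not the Gradle script's code: `RunThenMove` is a hypothetical name, the command list is whatever would launch mailmerger-cmdline, and keeping the file in place on a non-zero exit code is a choice made for this sketch.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch: "command" would be whatever launches mailmerger-cmdline;
// its exact form is not shown in the thread.
class RunThenMove {

    // Runs the command, waits for it to exit, and moves the input file only if
    // the process succeeded. Returns true if the file was moved.
    static boolean runThenMove(List<String> command, Path input, Path completedDir)
            throws IOException, InterruptedException {
        Process proc = new ProcessBuilder(command)
                .inheritIO()                  // child output goes to our console
                .start();
        int exitCode = proc.waitFor();        // block until the child has exited
        if (exitCode != 0) {
            return false;                     // assumption: keep failed inputs in place
        }
        Files.move(input, completedDir.resolve(input.getFileName()));
        return true;
    }
}
```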

distributev commented on June 28, 2024

Here is the situation I have in mind.

Step 1 - I push a CSV file into the poll folder and it is picked up and processed fine.
Step 2 - I concurrently drop 2 CSV files into the poll folder ==> everything is processed fine.
Step 3 - I concurrently drop 200 small CSV files into the poll folder ==> can I rest assured that all 200 CSV files will be processed and no CSV file will "get lost"? I'm fine if processing the 200 CSV files takes more time, but how can I be sure all 200 will be processed and nothing lost?

What mechanisms protect against this situation?

pabman commented on June 28, 2024

With the latest changes, this should not happen.

The file is moved from the watching directory to the complete directory after it has been processed.

The list of files processed by a single invocation of the task is obtained only once. Any additional files placed into the watching directory while the current task invocation is in progress will be ignored by that invocation. Gradle will then realise the files in the watching directory have changed and trigger the task again.

Once all files have been processed and moved and no more files have been placed in the directory, Gradle will trigger the task one more time (as it reacts to the files being moved from the watching directory), the task will find no files to process this time and do nothing.

Gradle will then keep watching the directory for any further changes.
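The cycle described above can be simulated in Java. This is a hypothetical sketch, not Gradle code: the `do`/`while` loop stands in for Gradle noticing that files were moved and re-running the task, and the log lines only imitate the ones in the test output posted later in this thread.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical simulation of the continuous-build cycle; not Gradle code.
class ContinuousCycle {

    // One task invocation: snapshot the CSV files once, then process and move
    // each of them. Returns the number of files handled.
    static int invoke(Path watch, Path completed, Consumer<Path> process) throws IOException {
        List<Path> snapshot;
        try (Stream<Path> files = Files.list(watch)) {
            snapshot = files.filter(p -> p.getFileName().toString().endsWith(".csv"))
                            .sorted()
                            .collect(Collectors.toList());
        }
        System.out.println("Starting to process " + snapshot.size() + " files");
        for (int i = 0; i < snapshot.size(); i++) {
            Path p = snapshot.get(i);
            process.accept(p);
            Files.move(p, completed.resolve(p.getFileName()));
            System.out.println("Processed " + (i + 1) + " of " + snapshot.size());
        }
        return snapshot.size();
    }

    // Stand-in for Gradle re-running the task: keep invoking until an
    // invocation finds no files. Returns the total number of invocations.
    static int runUntilIdle(Path watch, Path completed, Consumer<Path> process) throws IOException {
        int invocations = 0;
        int handled;
        do {
            handled = invoke(watch, completed, process);
            invocations++;
        } while (handled > 0);
        return invocations;
    }
}
```

With `a.csv` and `b.csv` present and a third file dropped in while the first invocation is running, `runUntilIdle` makes three invocations: the two original files, then the late file, then an empty run that does nothing.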

distributev commented on June 28, 2024

Gradle should have a threshold for what "new" files means. If the current task invocation takes 30 minutes to complete (very possible, since it is generating PDF files, which can take a long time to render if many CSV rows are processed), what happens when additional CSV files are dropped in the middle of those 30 minutes while Gradle is busy processing reports ==> do we have problems in this situation?

pabman commented on June 28, 2024

I don't think so.

I've just run a test that generates 100 files to be processed, e.g.:

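# Create test-001.csv through test-100.csv: a header row plus one data row each.
# Zero-padded ranges like {001..100} need bash 4+ or zsh.
for n in {001..100}; do
    echo 'value1,value2,value3,value4' > "test-$n.csv"
    echo "${n},aa,bb,cc" >> "test-$n.csv"
done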

I added some extra logging to the Gradle script to show how many files it was processing. Partway through the 100 files, I added another 50 to the directory, and these were all processed fine. The results were as follows (I've removed some lines from the middle to save space):

Waiting for changes to input files of tasks... (ctrl-d to exit)
new file: /Users/paul/jambray/upwork/client/mailmerger/test/watch/test-055.csv
new file: /Users/paul/jambray/upwork/client/mailmerger/test/watch/test-041.csv
new file: /Users/paul/jambray/upwork/client/mailmerger/test/watch/test-069.csv
and some more changes
Change detected, executing build...


> Task :mailMerger
Starting to process 100 files
Processed 1 of 100 at Thu May 03 20:29:23 BST 2018
Processed 2 of 100 at Thu May 03 20:29:35 BST 2018
Processed 3 of 100 at Thu May 03 20:29:47 BST 2018
Processed 4 of 100 at Thu May 03 20:30:02 BST 2018
Processed 5 of 100 at Thu May 03 20:30:15 BST 2018
...
...
Processed 97 of 100 at Thu May 03 20:49:35 BST 2018
Processed 98 of 100 at Thu May 03 20:49:48 BST 2018
Processed 99 of 100 at Thu May 03 20:50:00 BST 2018
Processed 100 of 100 at Thu May 03 20:50:13 BST 2018

BUILD SUCCESSFUL in 21m 2s
1 actionable task: 1 executed

Waiting for changes to input files of tasks... (ctrl-d to exit)
deleted: /Users/paul/jambray/upwork/client/mailmerger/test/watch/test-055.csv
deleted: /Users/paul/jambray/upwork/client/mailmerger/test/watch/test-041.csv
deleted: /Users/paul/jambray/upwork/client/mailmerger/test/watch/test-069.csv
and some more changes
Change detected, executing build...


> Task :mailMerger
Starting to process 50 files
Processed 1 of 50 at Thu May 03 20:50:26 BST 2018
Processed 2 of 50 at Thu May 03 20:50:38 BST 2018
Processed 3 of 50 at Thu May 03 20:50:51 BST 2018
Processed 4 of 50 at Thu May 03 20:51:03 BST 2018
...

Under the covers, Gradle uses the standard Java WatchService, which is what the incompatible gradle-watch-plugin used and what I was planning to use had this solution been implemented without Gradle.

See https://docs.oracle.com/javase/7/docs/api/java/nio/file/WatchKey.html and https://docs.oracle.com/javase/7/docs/api/java/nio/file/WatchService.html for details of how events are detected and queued.
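For reference, a minimal Java round of the standard `java.nio.file.WatchService` loop: register a directory, wait for a signalled `WatchKey`, drain it with `pollEvents()`, then `reset()` it. One detail relevant to the 200-file question: the API defines an `OVERFLOW` event kind, meaning events may have been lost or discarded, so this sketch falls back to listing the directory when it sees one. `WatchRound` and its methods are hypothetical names, and this is not Gradle's implementation.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.TimeUnit;
import static java.nio.file.StandardWatchEventKinds.*;

// Hypothetical helper, not Gradle's implementation: a single round of the
// standard WatchService loop.
class WatchRound {

    // Creates a WatchService and registers dir for file-creation events.
    static WatchService watch(Path dir) throws IOException {
        WatchService ws = dir.getFileSystem().newWatchService();
        dir.register(ws, ENTRY_CREATE);
        return ws;
    }

    // Waits up to timeoutSeconds for a signalled key and returns the names of
    // newly created files. On OVERFLOW, falls back to listing the directory.
    static List<String> nextBatch(WatchService ws, long timeoutSeconds)
            throws IOException, InterruptedException {
        WatchKey key = ws.poll(timeoutSeconds, TimeUnit.SECONDS);
        List<String> names = new ArrayList<>();
        if (key == null) {
            return names;                       // timed out: nothing signalled
        }
        boolean overflow = false;
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == OVERFLOW) {
                overflow = true;                // events may have been lost
            } else if (event.kind() == ENTRY_CREATE) {
                names.add(event.context().toString());   // context is the relative Path
            }
        }
        if (overflow) {
            names.clear();                      // rescan instead of trusting the events
            try (DirectoryStream<Path> entries = Files.newDirectoryStream((Path) key.watchable())) {
                for (Path p : entries) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        key.reset();                            // allow the key to be signalled again
        return names;
    }
}
```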

pabman commented on June 28, 2024

In answer to your question:

what happens when additional CSV files are dropped in the middle of the 30 minutes when gradle is busy processing reports

The WatchService will detect this and queue the events until the WatchKey has been reset and is ready to be signalled again. This means no events should be dropped or lost.
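A small Java experiment along these lines, assuming a scratch directory (`QueuedWhileBusy` is a hypothetical name): a second file is created after the first key has been taken but before `reset()` is called, as if the task were still busy. Per the `WatchKey` documentation, events detected while the key is signalled stay pending and the key is re-queued immediately on `reset()`; if the event arrives after the reset instead, the key is simply signalled again. Either way the second poll should report `second.csv`.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.TimeUnit;
import static java.nio.file.StandardWatchEventKinds.*;

// Hypothetical experiment: is a file created while the task is still busy
// (key signalled, not yet reset) reported afterwards?
class QueuedWhileBusy {

    // Drains a key's ENTRY_CREATE events; a null key (poll timed out) yields an empty list.
    static List<String> drain(WatchKey key) {
        List<String> names = new ArrayList<>();
        if (key != null) {
            for (WatchEvent<?> event : key.pollEvents()) {
                if (event.kind() == ENTRY_CREATE) {
                    names.add(event.context().toString());
                }
            }
        }
        return names;
    }

    // Returns two batches: the events from the first poll and from the second.
    static List<List<String>> run(Path dir, long timeoutSeconds)
            throws IOException, InterruptedException {
        List<List<String>> batches = new ArrayList<>();
        try (WatchService ws = dir.getFileSystem().newWatchService()) {
            dir.register(ws, ENTRY_CREATE);

            Files.write(dir.resolve("first.csv"), "x\n".getBytes());
            WatchKey key = ws.poll(timeoutSeconds, TimeUnit.SECONDS);   // key is now signalled
            batches.add(drain(key));

            // "Busy" period: a new file arrives before the key is reset.
            Files.write(dir.resolve("second.csv"), "x\n".getBytes());
            if (key != null) {
                key.reset();   // re-queued at once if the event is already pending
            }

            WatchKey next = ws.poll(timeoutSeconds, TimeUnit.SECONDS);
            batches.add(drain(next));
            if (next != null) {
                next.reset();
            }
        }
        return batches;
    }
}
```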

distributev commented on June 28, 2024

Thank you. This is a good answer.

What if, instead of the current command-line execution from Gradle to call mail merge, Gradle only queued JMS messages, which would be consumed asynchronously by the mailmerger API to render the PDF files ==> do you think this could add any further confidence that 101% of files will be processed and nothing will be lost? Or do you think we already have 101% confidence and JMS would only add unnecessary bloat?

P.S. As I said previously, I have seen other similar implementations based on folder/file polling that seemed to work in 98% of cases, only to "lose" some input files semi-randomly when input concurrency was higher.

pabman commented on June 28, 2024

I don't think introducing JMS will give any more confidence regarding files being missed. I'm confident that nothing should be missed, but it's hard to prove 100% without extensive testing.

distributev commented on June 28, 2024

OK
