Comments (11)
Sorry, this was a bug on my part.
It was retrieving the list of files from the watch folder, processing them all, and then retrieving the list of all files in the folder again to decide what to move. Any files added after the initial processing had started were therefore not processed, but were still moved to the completed folder.
This should be fixed now.
from mailmerger.
My original intention was to process all the files in the directory (at the time the task started) and then move them all once the processing was completed. This was to avoid Gradle triggering the task again and again when it detected that each file had been moved.
I've just looked at this in more detail, and Gradle is clever enough not to attempt to run the task again while the initial execution is still in progress.
I also discovered another problem, which I'll push a fix for in a moment. When spawning the java process to run mailmerger-cmdline, the task wasn't waiting for this process to finish before attempting to move the files, which was causing problems.
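To illustrate the second problem, here is a minimal Java sketch of spawning a child process and blocking until it exits. It is not the code from the Gradle script (which isn't shown in this thread); the class name RunAndWait and the method runJava are made up for the example, and it simply launches the java executable of the current JVM with whatever arguments it is given.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunAndWait {

    // Hypothetical helper: starts a child JVM with the given arguments and
    // blocks until it exits. Returns the child's exit code.
    static int runJava(List<String> args) throws IOException, InterruptedException {
        // Locate the java launcher of the JVM that is running this code.
        Path java = Paths.get(System.getProperty("java.home"), "bin", "java");

        List<String> command = new ArrayList<>();
        command.add(java.toString());
        command.addAll(args);

        Process process = new ProcessBuilder(command)
                .inheritIO() // forward the child's output to this console
                .start();

        // start() returns immediately. Without waitFor() the caller would
        // carry on (and move the input files) while the child is still running.
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        int exitCode = runJava(Arrays.asList("-version"));
        System.out.println("child exited with " + exitCode);
    }
}
```

The exit code returned by waitFor() could also be used to decide whether a file should be moved at all, although that is a design choice this thread doesn't cover.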
from mailmerger.
Here is the situation I have in mind.
Step 1 - I push a CSV file into the poll folder and it is picked up and processed fine.
Step 2 - Concurrently I drop 2 CSV files in the poll folder ==> everything is processed fine.
Step 3 - Concurrently I drop 200 small CSV files in the poll folder ==> can I rest assured all 200 CSV files will get processed and no CSV file will "get lost" - I'm fine if it will take more time to process the 200 csv files but how can I be sure all 200 will be processed and nothing lost?
What mechanisms protect against this situation?
from mailmerger.
With the latest changes this should not happen.
Each file is moved from the watching directory to the complete directory after it has been processed.
The list of files processed by a single invocation of the task is obtained only once. Any additional files placed into the watching directory while the current task invocation is in progress will be ignored by that invocation. Gradle will then realise the files in the watching directory have changed and trigger the task again.
Once all files have been processed and moved and no more files have been placed in the directory, Gradle will trigger the task one more time (as it reacts to the files being moved from the watching directory); this time the task will find no files to process and do nothing.
Gradle will then continue watching the directory for any further changes.
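The behaviour described above can be sketched in plain Java as follows. This is only an illustration under my own assumptions, not the Gradle task itself: SnapshotProcessor, processOnce and the processor callback are hypothetical names, and the callback stands in for the call to mailmerger-cmdline.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SnapshotProcessor {

    // Processes the files present in watchDir at the time of the call and
    // moves each one to completeDir after it has been processed.
    // Returns the number of files handled by this invocation.
    static int processOnce(Path watchDir, Path completeDir, Consumer<Path> processor)
            throws IOException {
        // Take the snapshot exactly once. Files that arrive later are not in
        // this list, so this call neither processes nor moves them.
        List<Path> snapshot;
        try (Stream<Path> entries = Files.list(watchDir)) {
            snapshot = entries
                    .filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
        }

        Files.createDirectories(completeDir);
        for (Path file : snapshot) {
            processor.accept(file);
            // Move only the file that was just processed.
            Files.move(file, completeDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        return snapshot.size();
    }
}
```

A file that lands in the directory part-way through is left untouched by the current call and is picked up by the next one, which is what the re-triggered Gradle task does.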
from mailmerger.
Gradle must have some threshold for what counts as a "new" file. If the current task invocation takes 30 minutes to complete (quite possible, since it is generating PDF files, which can take a long time to render when many CSV rows are processed), what happens when additional CSV files are dropped in the middle of the 30 minutes when gradle is busy processing reports? Do we have problems in this situation?
from mailmerger.
I don't think so.
I've just run a test which generates 100 files to be processed, e.g.
for n in {001..100}; do
echo 'value1,value2,value3,value4' > test-$n.csv
echo ${n},aa,bb,cc >> test-$n.csv
done
I added some additional logging to the Gradle script to show how many files it was processing. When it was part way through the 100 files, I added another 50 to the directory. These files were all processed fine. The results were (I've removed some lines from the middle to save space):
Waiting for changes to input files of tasks... (ctrl-d to exit)
new file: /Users/paul/jambray/upwork/client/mailmerger/test/watch/test-055.csv
new file: /Users/paul/jambray/upwork/client/mailmerger/test/watch/test-041.csv
new file: /Users/paul/jambray/upwork/client/mailmerger/test/watch/test-069.csv
and some more changes
Change detected, executing build...
> Task :mailMerger
Starting to process 100 files
Processed 1 of 100 at Thu May 03 20:29:23 BST 2018
Processed 2 of 100 at Thu May 03 20:29:35 BST 2018
Processed 3 of 100 at Thu May 03 20:29:47 BST 2018
Processed 4 of 100 at Thu May 03 20:30:02 BST 2018
Processed 5 of 100 at Thu May 03 20:30:15 BST 2018
...
...
Processed 97 of 100 at Thu May 03 20:49:35 BST 2018
Processed 98 of 100 at Thu May 03 20:49:48 BST 2018
Processed 99 of 100 at Thu May 03 20:50:00 BST 2018
Processed 100 of 100 at Thu May 03 20:50:13 BST 2018
BUILD SUCCESSFUL in 21m 2s
1 actionable task: 1 executed
Waiting for changes to input files of tasks... (ctrl-d to exit)
deleted: /Users/paul/jambray/upwork/client/mailmerger/test/watch/test-055.csv
deleted: /Users/paul/jambray/upwork/client/mailmerger/test/watch/test-041.csv
deleted: /Users/paul/jambray/upwork/client/mailmerger/test/watch/test-069.csv
and some more changes
Change detected, executing build...
> Task :mailMerger
Starting to process 50 files
Processed 1 of 50 at Thu May 03 20:50:26 BST 2018
Processed 2 of 50 at Thu May 03 20:50:38 BST 2018
Processed 3 of 50 at Thu May 03 20:50:51 BST 2018
Processed 4 of 50 at Thu May 03 20:51:03 BST 2018
...
Gradle is using the standard WatchService under the covers, which is what the incompatible gradle-watch-plugin was using and what I was planning to use if this solution were implemented without Gradle.
See https://docs.oracle.com/javase/7/docs/api/java/nio/file/WatchKey.html and https://docs.oracle.com/javase/7/docs/api/java/nio/file/WatchService.html for details of how events are detected and queued.
from mailmerger.
In answer to your question:
what happens when additional CSV files are dropped in the middle of the 30 minutes when gradle is busy processing reports
The WatchService will detect this and queue events until the WatchKey has been reset and is ready to be signalled again. This means no events should be dropped or lost.
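For reference, a minimal sketch of that WatchService/WatchKey cycle in plain Java looks roughly like this. It is not what Gradle does internally (I haven't shown or checked Gradle's code here); WatchSketch and nextCreated are names invented for the example. One caveat from the WatchKey documentation is worth keeping in mind: an implementation may limit how many events it accumulates and report an OVERFLOW event instead, so code that must never miss a file should re-list the directory when it sees one.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class WatchSketch {

    // Waits up to timeoutSeconds for the next signalled key and returns the
    // (relative) names of the entries reported by its pending events.
    static List<Path> nextCreated(WatchService watcher, long timeoutSeconds)
            throws InterruptedException {
        List<Path> created = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
        if (key == null) {
            return created; // nothing was signalled within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                // Events may have been discarded; a real caller should
                // re-list the directory here rather than trust the events.
                continue;
            }
            created.add((Path) event.context());
        }
        // While the key is signalled, new events are queued on it but the key
        // is not handed out again. reset() returns it to the ready state.
        key.reset();
        return created;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Paths.get(args[0]); // directory to watch
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                for (Path name : nextCreated(watcher, 60)) {
                    System.out.println("new file: " + dir.resolve(name));
                }
            }
        }
    }
}
```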
from mailmerger.
Thank you. This is a good answer.
Do you think that if, instead of the current command-line execution of mailmerger from Gradle, Gradle only queued JMS messages that were then consumed asynchronously by the mailmerger API to render the PDF files, this would add any further confidence that 101% of files will be processed and nothing will be lost? Or do you think that we already have 101% confidence and JMS would only add unnecessary bloat?
P.S. As I said previously, I have seen other similar implementations based on folder/file polling which seemed to work in 98% of cases, only to "lose" some input files semi-randomly when the input concurrency was higher.
from mailmerger.
I don't think introducing JMS will give any more confidence regarding files being missed. I'm confident that nothing should be missed, but it is hard to prove 100% without extensive testing.
from mailmerger.
OK
from mailmerger.