Comments (19)
@idoyehe looking at the plot, did you create it with the python notebook?
from lithops.
@JosepSampe yes, I just took a screenshot of the graph, so it looks a little blurry...
from lithops.
Are you running your app code from a Python notebook or from a .py file?
I ask because, if you use a .py file, I have integrated a method into PyWren that generates the plots, so you can drop some code and simply call that method. Read this: #50
Also, it is preferable to upgrade PyWren to the latest commit, as I slightly changed the way results are retrieved.
from lithops.
@idoyehe so, when you have everything updated, can you paste here the log of the reduce function when the result is 5 MB?
from lithops.
@JosepSampe Amazing, I just tried the plotting method and it works great!
from lithops.
@JosepSampe
When returning an empty dictionary:
returned empty dict.txt
When returning the 5MB list (inside dictionary):
returned list indside dict.txt
Just to explain: this job uses the partitioner on a file in COS, so the log contains all the invocations. For the reducer, please look at the last one...
from lithops.
@idoyehe These logs are from the partitioner. I need the logs from the reducer ;)
Activation ID: 2921787e26814dcea1787e2681edceee and 9ccdc448d766425f8dc448d766925f16
from lithops.
returned list indside dict.txt
from lithops.
Ok... as I thought, this issue is related to cloudpickle. It takes 1 minute to pickle 5MB of data, which is converted into 36MB of pickled information.
I was already aware that cloudpickle takes a long time to pickle megabytes of data. I think @gilv already ran tests on this.
As I don't have the code of your app or the related data in COS, you can try modifying the PyWren code (locally, on your laptop), just one line, and test whether there is any improvement.
Just comment out this line: https://github.com/pywren/pywren-ibm-cloud/blob/master/pywren/pywren_ibm_cloud/action/jobrunner.py#L28
and add: import pickle
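Before redeploying, a rough local comparison along these lines may help. It is only a sketch: the `result` payload is a hypothetical stand-in for the reducer's output, `measure` is a helper name made up for illustration, and cloudpickle is only timed if it happens to be installed.

```python
import pickle
import time

try:
    import cloudpickle  # third-party and optional; skipped if not installed
except ImportError:
    cloudpickle = None


def measure(dumps, obj):
    """Return (elapsed seconds, serialized size in bytes) for one dumps() call."""
    start = time.perf_counter()
    blob = dumps(obj)
    return time.perf_counter() - start, len(blob)


# Hypothetical stand-in for the reducer's result (not the real data from COS)
result = {"values": [float(i) for i in range(200_000)]}

report = {"pickle": measure(pickle.dumps, result)}
if cloudpickle is not None:
    report["cloudpickle"] = measure(cloudpickle.dumps, result)

for name, (seconds, size) in report.items():
    print(f"{name}: {seconds:.3f} s, {size / 1e6:.2f} MB")
```

Numbers from a laptop will not match the ones inside the action, but the relative difference between the two serializers on your real result should already be visible.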
from lithops.
@idoyehe thanks for reporting this. @JosepSampe maybe we should move to pickle instead of cloudpickle?
from lithops.
@JosepSampe I did what you suggested: changed the line, reinstalled PyWren, and redeployed...
I could see an improvement: the runtime is ~47 sec when returning the list and ~16 sec when returning an empty dict.
Here is the log file of the reducer:
returned_list_inside_dict.txt
By the way, are there no other packages you could use for that? From what I found online, pickle is not highly recommended...
from lithops.
@idoyehe It seems pickle now takes only 7 seconds (instead of 1 minute with cloudpickle). I changed the pickle package everywhere in the project, so when @gilv merges #52, try again, because you may get a major improvement.
Also, in the return statement of your reduce function, you don't return futures, right?
from lithops.
@JosepSampe
It took me around 44 sec without returning run_statuses and invoke_statuses, and around 50 sec when returning them.
from lithops.
@idoyehe ok
I just wanted to understand why, if you return 5MB of data, the output.pickle file takes 56.9MB... You can see this at the end of the log.
from lithops.
@JosepSampe
If I execute this process locally on my computer and export the returned list to a text file, the text file size is around 24 MB and the size of the Python object is 4.8 MB (calculated using sys.getsizeof(result_list)).
Then I executed the code using PyWren, returning just my list (without run_statuses and invoke_statuses), and according to the reducer log the output.pickle size is 52MB.
Can we say the pickle overhead is 28 MB? That sounds like too much.
from lithops.
In my experience, you should not rely on sys.getsizeof, as it does not return the correct size for dicts, lists, or other complex objects: it measures only the container itself, not the objects it references.
Yep, it seems the overhead of pickle is 28MB.
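To illustrate the point about sys.getsizeof, here is a small sketch. The `deep_sizeof` helper and the `result_list` data are hypothetical, made up for illustration; the helper only walks the built-in containers and gives an approximation, not an exact memory footprint.

```python
import pickle
import sys


def deep_sizeof(obj, seen=None):
    """Approximate size of obj plus everything it contains (built-in containers only)."""
    seen = set() if seen is None else seen
    if id(obj) in seen:  # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size


# Hypothetical data: a list of small lists of floats
result_list = [[float(i), float(i + 1)] for i in range(10_000)]

shallow = sys.getsizeof(result_list)      # outer list only
deep = deep_sizeof(result_list)           # outer list + inner lists + floats
pickled = len(pickle.dumps(result_list))  # bytes actually written by pickle

print(f"shallow: {shallow} B, deep: {deep} B, pickled: {pickled} B")
```

The shallow figure is the one sys.getsizeof(result_list) reports, so comparing it against the size of output.pickle can make the overhead look larger than it really is.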
from lithops.
@JosepSampe So that is more than 100% overhead, which sounds like way too much...
Is there no other method to perform that job?
from lithops.
@idoyehe What are you pickling? Because, as stated here: https://stackoverflow.com/questions/31304006/why-is-there-a-large-overhead-in-pickling-numpy-arrays , pickling is just a binary copy. So if you store the list in a text file, it only stores the numbers; when pickling, however, it stores the complete object.
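A quick way to see this per-object cost is sketched below. The actual data is not shown in this thread, so as an assumption the sketch uses many small `array.array` objects from the standard library as a stand-in for "a list full of array objects", and compares them with a flat list holding the same numbers.

```python
import array
import pickle

# Same 50,000 numbers, stored two ways (hypothetical data)
values = [float(i) for i in range(50_000)]

# Flat: one list of plain floats
flat = values

# Nested: 25,000 tiny array objects holding two doubles each
nested = [array.array("d", values[i:i + 2]) for i in range(0, len(values), 2)]

flat_size = len(pickle.dumps(flat))
nested_size = len(pickle.dumps(nested))

print(f"flat list:      {flat_size / 1e6:.2f} MB")
print(f"list of arrays: {nested_size / 1e6:.2f} MB")
print(f"ratio:          {nested_size / flat_size:.2f}x")
```

Every small object carries its own reconstruction information in the pickle stream, so flattening the structure before returning it from the reduce function is one way to shrink output.pickle.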
from lithops.
@JosepSampe
You're right. I found a way to make my object much simpler, without arrays, and the overhead dropped significantly and the runtime decreased as well!
Thank you!
from lithops.