Comments (19)

JosepSampe commented on July 20, 2024

@idoyehe Looking at the plot, did you create it with the Python notebook?

from lithops.

idoyehe commented on July 20, 2024

@JosepSampe Yes, I just took a screenshot of the graph, so it looks a little blurry.

JosepSampe commented on July 20, 2024

Are you running your app code from a Python notebook or from a .py file?

I ask because, if you use a .py file, I integrated a method into PyWren that generates the plots, so you can drop some code and simply call that method. See #50.

Also, it is preferable to upgrade PyWren to the latest commit, as I slightly changed the way results are retrieved.

JosepSampe commented on July 20, 2024

@idoyehe So, once you have everything updated, can you paste here the log of the reduce function when the result is 5 MB?

idoyehe commented on July 20, 2024

> Are you running your app code from a Python notebook or from a .py file?
>
> I ask because, if you use a .py file, I integrated a method into PyWren that generates the plots, so you can drop some code and simply call that method. See #50.
>
> Also, it is preferable to upgrade PyWren to the latest commit, as I slightly changed the way results are retrieved.

Amazing, I just tried it and it works great!

idoyehe commented on July 20, 2024

@JosepSampe
When returning an empty dictionary:
returned empty dict.txt

When returning the 5MB list (inside a dictionary):
returned list indside dict.txt

Just to explain: this job uses the partitioner on a file in COS, so the log contains all invocations. For the reducer, please look at the last one.

JosepSampe commented on July 20, 2024

@idoyehe These logs are from the partitioner. I need the logs from the reducer ;)
Activation IDs: 2921787e26814dcea1787e2681edceee and 9ccdc448d766425f8dc448d766925f16

idoyehe commented on July 20, 2024

returned empty dict.txt

returned list indside dict.txt

JosepSampe commented on July 20, 2024

Ok... as I thought, this issue is related to cloudpickle. It takes 1 minute to pickle 5MB of data, which is converted into 36MB of pickled data.

I was already aware that cloudpickle is slow when pickling megabytes of data. I think @gilv already ran tests on this.

As I don't have your app code or the related data in COS, you can try modifying the PyWren code locally on your laptop, just one line, and test whether there is any improvement.

Just comment out this line: https://github.com/pywren/pywren-ibm-cloud/blob/master/pywren/pywren_ibm_cloud/action/jobrunner.py#L28

and add: import pickle
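
To check locally how long serialization takes and how large the payload gets, a minimal sketch along these lines may help. It uses only the standard-library pickle module; measure_pickle and the sample result dict are hypothetical stand-ins, not part of PyWren or of the app discussed here.

```python
import pickle
import time


def measure_pickle(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Return (seconds, size_in_bytes) for serializing obj with stdlib pickle."""
    start = time.perf_counter()
    payload = pickle.dumps(obj, protocol=protocol)
    elapsed = time.perf_counter() - start
    return elapsed, len(payload)


# Hypothetical stand-in for a reducer result: a dict wrapping a large list.
result = {"values": [float(i) for i in range(500_000)]}

elapsed, size = measure_pickle(result)
print(f"pickled {size / 1e6:.1f} MB in {elapsed:.3f} s")

# Round-trip check: unpickling gives back an equal object.
assert pickle.loads(pickle.dumps(result)) == result
```

Running the same measurement on the real return value, before and after swapping the import, should show whether serialization is where the time goes.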

gilv commented on July 20, 2024

@idoyehe Thanks for reporting this. @JosepSampe Maybe we should move to pickle instead of cloudpickle?

idoyehe commented on July 20, 2024

> Ok... as I thought, this issue is related to cloudpickle. It takes 1 minute to pickle 5MB of data, which is converted into 36MB of pickled data.
>
> I was already aware that cloudpickle is slow when pickling megabytes of data. I think @gilv already ran tests on this.
>
> As I don't have your app code or the related data in COS, you can try modifying the PyWren code locally on your laptop, just one line, and test whether there is any improvement.
>
> Just comment out this line: https://github.com/pywren/pywren-ibm-cloud/blob/master/pywren/pywren_ibm_cloud/action/jobrunner.py#L28
>
> and add: import pickle

@JosepSampe I did what you suggested: changed the line, reinstalled PyWren, and redeployed.
I could see an improvement: the runtime is ~47 sec when returning the list and ~16 sec when returning an empty dict.

Here is the log file of the reducer:
returned_list_inside_dict.txt

By the way, are there no other packages you could use for that? From what I found online, pickle is not highly recommended.

JosepSampe commented on July 20, 2024

@idoyehe It seems pickle now takes only 7 seconds (instead of 1 minute with cloudpickle). I switched the pickle package everywhere in the project, so when @gilv merges #52, try again because you may get a major improvement.

Also, in the return statement of your reduce function, you don't return futures, right?

idoyehe commented on July 20, 2024

> @idoyehe It seems pickle now takes only 7 seconds (instead of 1 minute with cloudpickle). I switched the pickle package everywhere in the project, so when @gilv merges #52, try again because you may get a major improvement.
>
> Also, in the return statement of your reduce function, you don't return futures, right?

@JosepSampe
It took me around 44 sec without returning run_statuses and invoke_statuses, and around 50 sec when returning them.

JosepSampe commented on July 20, 2024

@idoyehe ok
I just wanted to understand why, if you return 5MB of data, the output.pickle file takes 56.9MB... You can see this at the end of the log.

idoyehe commented on July 20, 2024

@JosepSampe
If I run this process locally on my computer and export the returned list to a text file, the text file is around 24 MB and the size of the Python object is 4.8 MB (calculated with sys.getsizeof(result_list)).
Then I ran the code using PyWren, returning just my list (without run_statuses and invoke_statuses), and according to the reducer log the output.pickle size is 52MB.

Can we say the pickle overhead is 28 MB? That sounds like too much.

JosepSampe commented on July 20, 2024

In my experience, you shouldn't rely on sys.getsizeof, as it does not return the full size when used on dicts, lists, or other complex objects.

Yep, it seems the overhead of pickle is 28MB.
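
The reason sys.getsizeof undercounts containers is that it reports only the container's own footprint, not the objects it references. The sketch below compares it with a simple recursive walk. deep_getsizeof is a hypothetical helper written for illustration, it only descends into built-in containers, and its result is an approximation.

```python
import sys


def deep_getsizeof(obj, seen=None):
    """Approximate size in bytes of obj plus the objects it contains."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # count shared objects only once
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    return size


# Hypothetical nested result: a list of small lists of floats.
result_list = [[float(i), float(i + 1)] for i in range(100_000)]

shallow = sys.getsizeof(result_list)   # outer list only
deep = deep_getsizeof(result_list)     # outer list + inner lists + floats
print(f"sys.getsizeof: {shallow / 1e6:.2f} MB, recursive: {deep / 1e6:.2f} MB")
```

For a nested structure like this one, the recursive figure is larger than what sys.getsizeof reports for the outer list, so a shallow measurement is not a fair baseline for judging pickle overhead.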

idoyehe commented on July 20, 2024

@JosepSampe So it is more than 100% overhead, which sounds like way too much...

Are there no other methods to perform that job?

JosepSampe commented on July 20, 2024

@idoyehe What are you pickling? As stated here: https://stackoverflow.com/questions/31304006/why-is-there-a-large-overhead-in-pickling-numpy-arrays , pickling is just a binary copy. So if you store the list in a text file, it only stores the numbers; however, when pickling, it stores the complete object.
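
To get a feel for how much the shape of an object affects the pickled size, here is a minimal sketch using only the standard library. The two layouts and the field names x and y are hypothetical and not taken from the app in this thread; the point is only that the same numbers can pickle to different sizes depending on how they are wrapped.

```python
import pickle

N = 100_000

# Hypothetical layout A: one small dict per record.
records = [{"x": i, "y": i * 2} for i in range(N)]

# Hypothetical layout B: the same numbers stored as two flat lists.
columns = {
    "x": [r["x"] for r in records],
    "y": [r["y"] for r in records],
}

records_size = len(pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL))
columns_size = len(pickle.dumps(columns, protocol=pickle.HIGHEST_PROTOCOL))

print(f"list of dicts: {records_size / 1e6:.2f} MB")
print(f"dict of lists: {columns_size / 1e6:.2f} MB")
```

Layout A pays for a dict wrapper on every record, so it pickles larger than layout B even though both hold identical values. Flattening a result before returning it is one way to shrink the payload.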

idoyehe commented on July 20, 2024

@JosepSampe
You're right. I found a way to make my object much simpler, without an array, and the overhead dropped significantly and the runtime decreased as well!

Thank you!
