
Comments (10)

Dieterbe commented on May 11, 2024

I'm currently working around this issue by using http://jsonpickle.github.com/

diff --git a/src/gensim/utils.py b/src/gensim/utils.py
index 817f3b7..3d797a9 100644
--- a/src/gensim/utils.py
+++ b/src/gensim/utils.py
@@ -1,4 +1,4 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 #
 # Copyright (C) 2010 Radim Rehurek <[email protected]>
@@ -13,6 +13,7 @@ from __future__ import with_statement
 import logging
 import re
 import unicodedata
+import jsonpickle
 import cPickle
 import itertools
 from functools import wraps # for `synchronous` function lock
@@ -421,16 +422,24 @@ def chunkize(corpus, chunks, maxsize=0):
         for chunk in chunkize_serial(corpus, chunks):
             yield chunk


 def pickle(obj, fname, protocol=-1):
     """Pickle object `obj` to file `fname`."""
-    with open(fname, 'wb') as fout: # 'b' for binary, needed on Windows
-        cPickle.dump(obj, fout, protocol=protocol)
+    with open(fname, 'w') as fout:
+        fout.write(jsonpickle.encode(obj))


 def unpickle(fname):
     """Load pickled object from `fname`"""
-    return cPickle.load(open(fname, 'rb'))
+    with open(fname, 'r') as fin:
+        return jsonpickle.decode(fin.read())

from gensim.

piskvorky commented on May 11, 2024

Can jsonpickle handle very large objects (reasonably memory efficient during save/load)? Dedan had another issue with cPickle, see #31, so perhaps completely switching from pickle to json would solve both at the same time...


Dieterbe commented on May 11, 2024

Radim, your question triggered this little experiment:
http://dieter.plaetinck.be/poor_mans_pickle_implementations_benchmark.html
I shall check out your numpy-based approach, it is probably better than my jsonpickle approach.
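For readers who want to try a comparison of this kind themselves, here is a minimal sketch using only the standard library. It is not the benchmark from the linked post: the payload, the function names and the repeat counts are all made up for illustration, and it is written for a current Python (where `pickle` replaces `cPickle`).

```python
import json
import pickle
import timeit

# Hypothetical payload: a dict of float lists standing in for a model's state.
payload = {"topic_%d" % i: [float(j) / 7 for j in range(200)] for i in range(50)}


def roundtrip_pickle(obj):
    """Serialize with pickle (highest protocol) and load the result back."""
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def roundtrip_json(obj):
    """Serialize with the stdlib json module and load the result back."""
    return json.loads(json.dumps(obj))


def benchmark(func, obj, repeats=20):
    """Return the best-of-3 wall time in seconds for `repeats` round trips."""
    return min(timeit.repeat(lambda: func(obj), number=repeats, repeat=3))


if __name__ == "__main__":
    for name, func in [("pickle", roundtrip_pickle), ("json", roundtrip_json)]:
        print("%-6s %.4f s" % (name, benchmark(func, payload)))
```

Timings depend heavily on the shape of the data, so numbers from a toy payload like this one should not be generalized to large models.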


piskvorky commented on May 11, 2024

Nice! I like benchmarks :)

How about the standard json package? (simplejson in python <2.6)


Dieterbe commented on May 11, 2024

What do you mean? What about it?
The jsonpickle page says "The standard Python libraries for encoding Python into JSON, such as the stdlib’s json, simplejson, and demjson, can only handle Python primitives that have a direct JSON equivalent (e.g. dicts, lists, strings, ints, etc.). jsonpickle builds on top of these libraries".

http://jsonpickle.github.com/
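The limitation described in that quote is easy to reproduce with the standard `json` module alone. The sketch below uses a hypothetical `Corpus` class (not a gensim class) to show that primitives encode fine, an arbitrary object raises `TypeError`, and a manual workaround via `vars()` saves the attributes but loses the type. As far as I understand it, jsonpickle closes that gap by storing type information alongside the attributes.

```python
import json


class Corpus(object):
    """Hypothetical class standing in for an arbitrary Python object."""

    def __init__(self, name, doc_ids):
        self.name = name
        self.doc_ids = doc_ids


def try_dump(obj):
    """Return the JSON string for `obj`, or the error message if encoding fails."""
    try:
        return json.dumps(obj)
    except TypeError as err:
        return "TypeError: %s" % err


# Dicts, lists, strings and ints have a direct JSON equivalent.
primitive_result = try_dump({"name": "demo", "doc_ids": [1, 2, 3]})

# A custom object does not, so the stdlib encoder refuses it.
object_result = try_dump(Corpus("demo", [1, 2, 3]))

# Manual workaround: dump the instance attributes. The class itself is lost,
# so loading gives back a plain dict rather than a Corpus.
manual_result = json.dumps(vars(Corpus("demo", [1, 2, 3])), sort_keys=True)

if __name__ == "__main__":
    print(primitive_result)
    print(object_result)
    print(manual_result)
```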


piskvorky commented on May 11, 2024

Oh, I didn't know it builds on json. In that case its performance is prolly nearly identical, no need to test.

Btw I remembered reading about json speed on metaoptimize some time ago, and I managed to google it up: http://metaoptimize.com/blog/2009/03/22/fast-deserialization-in-python/


Dieterbe commented on May 11, 2024

Well, the article explicitly discourages using it because it's buggy and unmaintained.


piskvorky commented on May 11, 2024

?? It's part of the standard python library. You probably mean cjson.


Dieterbe commented on May 11, 2024

Yes, I meant cjson. Anyway I don't feel the need to test more things right now, as the numpy native persistency thing you did is probably best anyway. Or am I missing something?


piskvorky commented on May 11, 2024

For numpy arrays, I think you're right :) Numpy is also very actively developed/maintained, so there's a good chance potential bugs will be fixed quickly. The core numpy guys are very good engineers.
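As an illustration of what numpy-native persistence looks like, here is a minimal sketch using `numpy.save` and `numpy.load`. It is not gensim's actual implementation; the array, the temporary file name and the variable names are assumptions made for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical data: a small dense matrix standing in for a model's weights.
matrix = np.arange(12, dtype=np.float64).reshape(3, 4)

# Write the array in numpy's binary .npy format.
fname = os.path.join(tempfile.mkdtemp(), "matrix.npy")
np.save(fname, matrix)

# Load it back fully into memory.
restored = np.load(fname)

# Or memory-map it read-only, so data is read from disk only when accessed.
mapped = np.load(fname, mmap_mode="r")

if __name__ == "__main__":
    print(restored.shape, restored.dtype)
    print(mapped[0])
```

The `mmap_mode` option is the part most relevant to the earlier memory-efficiency question, since a memory-mapped array does not have to fit in RAM all at once.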
