chrisdev / django-pandas

Tools for working with pandas in your Django projects

License: BSD 3-Clause "New" or "Revised" License

Makefile 2.50% Python 97.50%

django-pandas's Introduction

Django Pandas

Tools for working with pandas in your Django projects

What's New

This release facilitates running the tests with Python 3.10 and automates publishing the package to PyPI as per PR #146 (again, many thanks @graingert). As usual, we have attempted to support legacy versions of Python/Django/Pandas, and this sometimes results in deprecation warnings being displayed when the tests are run. To treat these warnings as errors, use python -Werror runtests.py

Dependencies

django-pandas supports Django (>= 1.4.5) and requires django-model-utils (>= 1.4.0) and Pandas (>= 0.12.0). Note that because of problems with the requires directive of setuptools, you probably need to install numpy in your virtualenv before you install this package or run the test suite:

pip install numpy
pip install -e .[test]
python runtests.py

Some pandas functionality requires parts of the Scipy stack. You may wish to consult http://www.scipy.org/install.html for more information on installing the Scipy stack.

You need to install your preferred version of Django yourself, as Django 2 does not support Python 2.

Contributing

Please file bugs and send pull requests to the GitHub repository and issue tracker.

Installation

Start by creating a new virtualenv for your project

mkvirtualenv myproject

Next install numpy and pandas and optionally scipy

pip install numpy
pip install pandas

You may want to consult the scipy documentation for more information on installing the Scipy stack.

Finally, install django-pandas using pip:

pip install django-pandas

or install the development version from github

pip install https://github.com/chrisdev/django-pandas/tarball/master

Usage

IO Module

The django_pandas.io module provides some convenience methods to facilitate the creation of DataFrames from Django QuerySets.

read_frame

Parameters

qs: A Django QuerySet.

fieldnames: A list of model field names to use in creating the DataFrame.

You can span a relationship in the usual Django way by using double underscores to specify a related field in another model

index_col: Specify the field name to use for the DataFrame index. If the index field is not in the field list it will be appended.

coerce_float: Boolean, defaults to True. Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point.

verbose: If this is True then populate the DataFrame with the human readable versions of any foreign key or choice fields, else use the actual values set in the model.

column_names: If not None, use to override the column names in the DataFrame.
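As a rough illustration of what coerce_float is meant to achieve, the sketch below converts decimal.Decimal values to float while leaving other values untouched. The helper name coerce_value and the sample row are hypothetical; this is not the library's implementation, only a plain-Python picture of the idea.

```python
from decimal import Decimal


def coerce_value(value):
    """Hypothetical helper: turn Decimal into float, leave everything else alone."""
    if isinstance(value, Decimal):
        return float(value)
    return value


# A made-up record, shaped like one row of MyModel with a Decimal wage.
row = {"full_name": "Ann", "age": 31, "wage": Decimal("1250.50")}
coerced = {key: coerce_value(val) for key, val in row.items()}
print(coerced)
```

With coerce_float left at its default, numeric operations such as sums and means on the resulting column work on floats rather than on Decimal objects.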

Examples

Assume that this is your model:

from django.db import models

class MyModel(models.Model):

    full_name = models.CharField(max_length=25)
    age = models.IntegerField()
    department = models.CharField(max_length=3)
    wage = models.FloatField()

First create a query set:

from django_pandas.io import read_frame
qs = MyModel.objects.all()

To create a dataframe using all the fields in the underlying model

df = read_frame(qs)

The df will contain human readable column values for foreign key and choice fields. The DataFrame will include all the fields in the underlying model including the primary key. To create a DataFrame using specified field names:

df = read_frame(qs, fieldnames=['age', 'wage', 'full_name'])

To set full_name as the DataFrame index

df = read_frame(qs, fieldnames=['age', 'wage'], index_col='full_name')

You can use filters and excludes

df = read_frame(qs.filter(age__gt=20, department='IT'), index_col='full_name')

DataFrameManager

django-pandas provides a custom manager to use with models that you want to render as Pandas DataFrames. The DataFrameManager provides the to_dataframe method that returns your model's queryset as a Pandas DataFrame. To use the DataFrameManager, first override the default manager (objects) in your model's definition as shown in the example below

# models.py

from django.db import models
from django_pandas.managers import DataFrameManager

class MyModel(models.Model):

    full_name = models.CharField(max_length=25)
    age = models.IntegerField()
    department = models.CharField(max_length=3)
    wage = models.FloatField()

    objects = DataFrameManager()

This will give you access to the following QuerySet methods:

to_dataframe

to_timeseries

to_pivot_table

to_dataframe

Returns a DataFrame from the QuerySet

Parameters

fieldnames: The model field names to utilise in creating the frame. To span a relationship, use the field name of related fields across models, separated by double underscores.

index: Specify the field to use for the index. If the index field is not in the field list it will be appended.

coerce_float: Attempt to convert numeric non-string data (such as object, decimal etc.) to float if possible.

verbose: If this is True then populate the DataFrame with the human readable versions of any foreign key or choice fields, else use the actual value set in the model.

Examples

Create a dataframe using all the fields in your model as follows

qs = MyModel.objects.all()

df = qs.to_dataframe()

This will include your primary key. To create a DataFrame using specified field names:

df = qs.to_dataframe(fieldnames=['age', 'department', 'wage'])

To set full_name as the index

qs.to_dataframe(['age', 'department', 'wage'], index='full_name')

You can use filters and excludes

qs.filter(age__gt=20, department='IT').to_dataframe(index='full_name')

to_timeseries

A convenience method for creating a time series, i.e. a DataFrame whose index is an instance of a DatetimeIndex or PeriodIndex

Parameters

fieldnames: The model field names to utilise in creating the frame. To span a relationship, just use the field name of related fields across models, separated by double underscores.

index: Specify the field to use for the index. If the index field is not in the field list it will be appended. This is mandatory.

storage: Specify if the queryset uses the wide or long format for data.

pivot_columns: Required once you specify long format storage. This can be either a list or a string identifying the field name or combination of fields. If pivot_columns is a single column, then the unique values in this column become new columns in the DataFrame. If pivot_columns is a list, the values in these columns are concatenated (using '-' as a separator) and these values are used for the new timeseries columns.

values: Also required if you utilize the long storage format. The values column name is used for populating the new frame values.

freq: The offset string or object representing a target conversion.

rs_kwargs: Arguments based on pandas.DataFrame.resample.

verbose: If this is True then populate the DataFrame with the human readable versions of any foreign key or choice fields, else use the actual value set in the model.
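To make the long-format parameters more concrete, here is a minimal plain-Python sketch of the reshaping they describe: one new column per unique pivot value, with several pivot columns joined by '-'. The function pivot_long, the region field and the sample rows are all hypothetical and exist only for illustration; the library performs this step with pandas, not with this code.

```python
from collections import defaultdict


def pivot_long(rows, index, pivot_columns, values):
    """Hypothetical long-to-wide pivot over a list of dict records."""
    if isinstance(pivot_columns, str):
        pivot_columns = [pivot_columns]
    wide = defaultdict(dict)
    for row in rows:
        # Several pivot columns are joined with '-' to name the new column.
        column = "-".join(str(row[name]) for name in pivot_columns)
        wide[row[index]][column] = row[values]
    return dict(wide)


# Made-up long-format records.
rows = [
    {"date_ix": "2010-01-01", "series_name": "gdp", "region": "north", "value": 1.0},
    {"date_ix": "2010-01-01", "series_name": "wages", "region": "north", "value": 2.0},
    {"date_ix": "2010-02-01", "series_name": "gdp", "region": "north", "value": 3.0},
]

single = pivot_long(rows, "date_ix", "series_name", "value")
combined = pivot_long(rows, "date_ix", ["series_name", "region"], "value")
print(single)
print(combined)
```

In the single-column case the new columns are named gdp and wages; with a list of pivot columns they become gdp-north and wages-north.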

Examples

Using a long storage format

#models.py

class LongTimeSeries(models.Model):
    date_ix = models.DateTimeField()
    series_name = models.CharField(max_length=100)
    value = models.FloatField()

    objects = DataFrameManager()

Some sample data:

==========  ===========  ======
date_ix     series_name  value
==========  ===========  ======
2010-01-01  gdp          204699
2010-01-01  inflation    2.0
2010-01-01  wages        100.7
2010-02-01  gdp          204704
2010-02-01  inflation    2.4
2010-02-01  wages        100.4
2010-03-01  gdp          205966
2010-03-01  inflation    2.5
2010-03-01  wages        100.5
==========  ===========  ======

Create a QuerySet

qs = LongTimeSeries.objects.filter(date_ix__year__gte=2010)

Create a timeseries dataframe

df = qs.to_timeseries(index='date_ix',
                      pivot_columns='series_name',
                      values='value',
                      storage='long')
df.head()

date_ix      gdp      inflation   wages
2010-01-01   204699   2.0         100.7
2010-02-01   204704   2.4         100.4
2010-03-01   205966   2.5         100.5

Using a wide storage format

class WideTimeSeries(models.Model):
    date_ix = models.DateTimeField()
    col1 = models.FloatField()
    col2 = models.FloatField()
    col3 = models.FloatField()
    col4 = models.FloatField()

    objects = DataFrameManager()

qs = WideTimeSeries.objects.all()

rs_kwargs = {'how': 'sum', 'kind': 'period'}
df = qs.to_timeseries(index='date_ix', storage='wide',
                      freq='M', rs_kwargs=rs_kwargs)

to_pivot_table

A convenience method for creating a pivot table from a QuerySet

Parameters

fieldnames: The model field names to utilise in creating the frame. To span a relationship, just use the field name of related fields across models, separated by double underscores.

values : column to aggregate, optional

rows : list of column names or arrays to group on

Keys to group on the x-axis of the pivot table

cols : list of column names or arrays to group on

Keys to group on the y-axis of the pivot table

aggfunc : function, default numpy.mean, or list of functions

If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)

fill_value : scalar, default None

Value to replace missing values with

margins : boolean, default False

Add all row / columns (e.g. for subtotal / grand totals)

dropna : boolean, default True

Example

# models.py
class PivotData(models.Model):
    row_col_a = models.CharField(max_length=15)
    row_col_b = models.CharField(max_length=15)
    row_col_c = models.CharField(max_length=15)
    value_col_d = models.FloatField()
    value_col_e = models.FloatField()
    value_col_f = models.FloatField()

    objects = DataFrameManager()

Usage

qs = PivotData.objects.all()

rows = ['row_col_a', 'row_col_b']
cols = ['row_col_c']

pt = qs.to_pivot_table(values='value_col_d', rows=rows, cols=cols)


django-pandas's Issues

django-model-utils 2.4 breaks DataFrameManager because of missing import

If you upgrade to django-model-utils 2.4, the DataFrameManager model manager breaks. It appears that 2.4 removed support for PassThroughManager. The last supported django-model-utils version was 2.3.1.

Remove the django>=1.4.2 dependency in setup.py

Would it be possible to remove the django>=1.4.2 dependency from setup.py?

Reason: now that Django 2 is out it makes me download it which is not what I want. It is annoying for a module I have that depends on django-pandas.

Generate docs using Sphinx

We should generate docs using Sphinx and put them on ReadTheDocs. This would avoid outdated API documentation, as is currently the case in README.rst (for example, fill_na is still mentioned as a valid argument of to_dataframe).

Django 1.7 support

While installing django-pandas in a Django 1.7 project, I noticed that Django 1.7 was uninstalled and version 1.6.10 was installed instead.

Is Django 1.7 a problem for django-pandas? Can we change this setup directive to allow Django 1.7?

Below is a link to the code showing the Django 1.7 limitation.
https://github.com/chrisdev/django-pandas/blob/master/setup.py#L24

Get all fields in related model

Is there a way to get all the fields in a related model instead of having to explicitly follow individual relationships with the double underscore notation ('offer__provider')?

For example I have the following statement:

productapps = ProductApplication.objects.all().select_related('offer')

And I want to turn that queryset into a dataframe with all the fields from ProductApplication and Offer.

README: "pivot_column"

The readme text for to_timeseries says "pivot_column" whereas the parameter name is "pivot_columns" with an S.

Speedup read_frame

Use pandas.io.sql.read_frame for performance speedup

from django.db import connection
import pandas as pd
import resource

def get_frame(qs):
    """ proposed solution: get an sql and pass it to pandas.io.sql.read_frame """
    compiler = qs.query.get_compiler(using='default')
    sql, args = compiler.as_sql()
    return pd.io.sql.read_frame(sql, connection, params=args)

def get_list(qs):
    """ That is under the hood of django-pandas 
    _clone is called to avoid django result cache usage
    """
    return pd.DataFrame.from_records(list(qs._clone()))

def get_iter(qs):
    """ First solution, replace list with iterator."""
    qs = qs._clone()
    compiler = qs.query.get_compiler(using='default')
    return pd.DataFrame.from_records(compiler.results_iter())

First, we need sample data. Three columns, 800k rows mysql table.

# Sample data from MySQL database
qs = Views.objects.filter(date='2014-05-13').values_list('platform_id', 'video_id', 'video_views')

qs.count() # 800K rows

Next, let's measure memory consumption (with interpreter restart after each test)

old_memory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
get_frame(qs) # replace with get_iter and get_list
new_memory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print "allocated:", new_memory - old_memory

In my case get_frame worked better, 390MB instead of 423MB for other variants.

Next, timeit!

print "as_sql (alloc 390M)"
%timeit get_frame(qs)
print "results_iter (alloc 423M)"
%timeit get_iter(qs)
print "django_pandas (alloc 424M)"
%timeit get_list(qs)

as_sql (alloc 390M)
1 loops, best of 3: 6.51 s per loop
results_iter (alloc 423M)
1 loops, best of 3: 11.7 s per loop
django_pandas (alloc 424M)
1 loops, best of 3: 12 s per loop

So, with pandas.io.sql.read_frame we get lower memory usage and an almost twofold speed-up.

A more verbose output

In some scenarios, the rendered output is not verbose enough. I see three cases:

verbose field names should be rendered (seems easy, just needs to change the names argument in this line)
DB fields with choices should have their values rendered with get_FOO_display
foreign keys should be more verbose than just the id (this can be tricky, because calling __str__ can trigger tons of queries & code)

Example showing those three cases:

Expected output:

"Année" is the verbose name of the field "annee".
"Hommes" & "Femmes" are choices from the "sexe" field.
"Abbeville", "Agen", etc are __str__ representation of the model of a foreign key.

read_frame columns with model properties

Reading model properties (methods of a model annotated with @property) into a DataFrame doesn't work yet, because QuerySet.values_list() only works with model fields.
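One possible workaround, sketched here under assumptions rather than taken from the library, is to iterate over model instances and read fields and properties alike with getattr. The Item class below is a plain-Python stand-in for a Django model instance, and records_with_properties is a hypothetical helper name.

```python
class Item:
    """Stand-in for a model instance; a real Django model would be used instead."""

    def __init__(self, name, price, quantity):
        self.name = name
        self.price = price
        self.quantity = quantity

    @property
    def total(self):
        # A computed value that values_list() could not return.
        return self.price * self.quantity


def records_with_properties(instances, names):
    """Hypothetical helper: read fields and properties alike via getattr."""
    return [{name: getattr(obj, name) for name in names} for obj in instances]


items = [Item("bolt", 0.5, 10), Item("nut", 0.25, 4)]
records = records_with_properties(items, ["name", "total"])
print(records)
```

The resulting list of dicts can be handed to pandas.DataFrame.from_records. The trade-off is that every row is instantiated as a full object, which is slower than values_list() on large querysets.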

how to render the pivot table in django template

How can I render the dictionary object into an html table in the template?

pandas .17 support

io.read_frame no longer works with fields referencing related models

I've got a Trade model that references Security as a foreign field

df = io.read_frame(qs,fieldnames=['security__isin', 'security__category'])

Where the error occurs in get_field('security__isin')

*** FieldDoesNotExist: Trade has no field named 'security__isin'

runtests is broken in Django 1.8

django.test.simple was removed in 1.8

Error in readme

This isn't valid Python:

qs.to_dataframe(['age', 'wage', index='full_name'])

Did you mean?:

qs.to_dataframe(['age', 'wage'], index='full_name')
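The report can be checked without Django installed by asking Python's own parser, for instance with the standard library ast module. The helper name is_valid_python is only illustrative.

```python
import ast


def is_valid_python(source):
    """Return True if the source parses, False on a SyntaxError."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


readme_version = "qs.to_dataframe(['age', 'wage', index='full_name'])"
suggested_fix = "qs.to_dataframe(['age', 'wage'], index='full_name')"

print(is_valid_python(readme_version), is_valid_python(suggested_fix))
```

The README version fails to parse because the keyword argument sits inside the list literal; the suggested fix parses cleanly.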

Is it necessary to add django-pandas to INSTALLED_APPS?

(First of all, thanks for this module, of course :) )

Since there is no model, no static file, no template, and no template tag, I don't see why one has to add it to INSTALLED_APPS. Is it for the tests?

Column Translations in read_frame

A common use case (I'm guessing, since that's what I'm doing) is to:
Define models => do operations in dfs => save back down as models

This is kind of fiddly because of the column translations when dealing with fk columns.
read_frame(verbose = False) returns an _id suffix for the pk, and not for the fks
but Model(**kwargs) requires an _id suffix for the pk and also for the fks

So there's this translation that's happening that I either circumvent or do some fiddling to undo, the origin is the line:

fieldnames = [f.name for f in fields]

I'm wondering whether a default of:

fieldnames = [f.attname for f in fields]

might make more sense, although it's easy enough to wrap this and pass those fieldnames in. I'm curious whether there's a deeper reason for the current default.
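For reference, the wrapping mentioned above might look roughly like the sketch below. It relies only on the fact that Django field objects expose both .name and .attname; the Field namedtuple is a stand-in for real field objects, and to_attnames and the sample record are hypothetical.

```python
from collections import namedtuple

# Stand-in for Django field objects, which expose .name and .attname.
Field = namedtuple("Field", ["name", "attname"])

# For a foreign key named "poll", Django stores the raw value under "poll_id".
fields = [Field("id", "id"), Field("poll", "poll_id"), Field("comment", "comment")]


def to_attnames(record, fields):
    """Hypothetical helper: rename keys from field names to attnames."""
    mapping = {f.name: f.attname for f in fields}
    return {mapping.get(key, key): value for key, value in record.items()}


record = {"id": 1, "poll": 7, "comment": "ok"}
model_kwargs = to_attnames(record, fields)
print(model_kwargs)
```

The renamed dict carries raw foreign key values under their _id names, which is the form Model(**kwargs) accepts for them.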

to_timeseries - how to pivot?

I have the following model:

class Vote(models.Model):
    user = models.ForeignKey(User, blank=True, null=True)
    poll = models.ForeignKey(Poll)
    choice = models.ForeignKey(Choice)
    comment = models.TextField(max_length=144, blank=True, null=True)
    created = models.DateTimeField(auto_now_add=True)

I am trying to create a timeseries object that counts votes by day or week and poll:

votes = Vote.objects.to_timeseries(['id', 'poll'], pivot_columns='poll', verbose=True,
                                 index='created', freq='D', rs_kwargs=dict(how='count'))
votes.head()
=>
                            id  poll
created     
2015-05-28 00:00:00+00:00   15  15
2015-05-29 00:00:00+00:00   55  55
2015-05-30 00:00:00+00:00   61  61
2015-05-31 00:00:00+00:00   15  15
2015-06-01 00:00:00+00:00   112 112
(...)

Poll has several values, say 'A', 'B' as its string representation. I would expect something along the lines of:

                            id  poll_A  poll_B
created     
2015-05-28 00:00:00+00:00   15  7   8
2015-05-29 00:00:00+00:00   55  20  35
2015-05-30 00:00:00+00:00   61  30  31
2015-05-31 00:00:00+00:00   15  8   7
2015-06-01 00:00:00+00:00   112 60  52
(...)

What am I missing?

Upgrade to Django 1.10

get_all_field_names() removed in Django 1.10

to_pivot_table method failing with pandas == 0.16

Previously deprecated rows and cols arguments have been removed

rows : index
cols : columns
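A small compatibility shim could keep the README's rows and cols spelling while passing the names that pandas.pivot_table accepts (index and columns). The helper translate_pivot_kwargs is hypothetical and only sketches the renaming; it is not the fix that was applied in the library.

```python
def translate_pivot_kwargs(kwargs):
    """Hypothetical helper: rename legacy rows/cols keys to index/columns."""
    renames = {"rows": "index", "cols": "columns"}
    return {renames.get(key, key): value for key, value in kwargs.items()}


legacy = {
    "values": "value_col_d",
    "rows": ["row_col_a", "row_col_b"],
    "cols": ["row_col_c"],
}
modern = translate_pivot_kwargs(legacy)
print(modern)
```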

Explicitly specify DataFrameQuerySet.to_[*] kwargs

Currently, a kwargs dict is used to parse arguments in those methods.

A lot of code could be removed by explicitly specifying those arguments in the method declaration. Plus, this eases code autocompletion, introspection, etc.

Before working on #9, I would like to fix this. Can I?

DataFrameManager.get_query_set method warning.

Hello,

I have been seeing the following warning when using django-pandas. Should this function be renamed?

C:\Python27\lib\site-packages\django_pandas\managers.py:183: RemovedInDjango18Warning: DataFrameManager.get_query_set method should be renamed get_queryset.
class DataFrameManager(PassThroughManager):

Kind regards,

Dan.

Django 1.9 support

Started working with Django Pandas lately and like it a lot :)
A few days ago I migrated my Django install to a new server running Django 1.9 and a couple of things broke, like django-pandas. I figured this is partially due to the DataFrameManager class in managers.py, which depends on PassThroughManager, but PassThroughManager has been removed from the latest django-model-utils. I'm not sure how to resolve this; perhaps something like https://docs.djangoproject.com/en/1.9/topics/db/managers/ (from_queryset). For example:

managers.py

from django.db import models
...
...
class BaseManager(models.Manager):
    def manager_only_method(self):
        return

DataFrameManager = BaseManager.from_queryset(DataFrameQuerySet)

Apart from that, io.py requires ValuesQuerySet but that has been removed from Django 1.9 as well..

Update io.py to something like:


    if fieldnames:
        if index_col is not None and index_col not in fieldnames:
            # Add it to the field names if not already there
            fieldnames = tuple(fieldnames) + (index_col,)

        fields = to_fields(qs, fieldnames)
    elif isinstance(qs, django.db.models.query.QuerySet):
        if django.VERSION < (1, 8):
            annotation_field_names = qs.aggregate_names
        else:
            annotation_field_names = list(qs.values().query.annotation_select)

        fieldnames = list(qs.values().query.values_select) + annotation_field_names + list(qs.values().query.extra_select)

        fields = [qs.model._meta.get_field(f) for f in list(qs.values().query.values_select)] + \
                 [None] * (len(annotation_field_names) + len(list(qs.values().query.extra_select)))
    else:
        fields = qs.model._meta.fields
        fieldnames = [f.name for f in fields]

    if isinstance(qs, django.db.models.query.QuerySet):
        vqs = qs.values()
        recs = list(vqs)

Dataframe and many-to-many relationship

Is it possible to use django-pandas with models that are in a many-to-many relationship? It seems that this feature is currently not supported. E.g., for

class Topping(models.Model):
    name = models.CharField(max_length=30)

class Pizza(models.Model):
    name = models.CharField(max_length=50)
    toppings = models.ManyToManyField(Topping)

qs_pizza_with_toppings = Pizza.objects.all().prefetch_related('toppings')
df_pizza_with_toppings = read_frame(qs_pizza_with_toppings)

df_pizza_with_toppings does not contain any topping names.

'Options' object has no attribute 'module_name' at Django 1.8.2

I'm facing the error:

Django Version: 1.8.2
Exception Type: AttributeError
Exception Value:
'Options' object has no attribute 'module_name'

what should I do?

Unable to use django-pandas

My first attempt, so I may have missed something. Running Django 1.6 and pandas 0.14. I have tried this:

from locales.models import Place
# the Place model has a pandas_data = DataFrameManager()
qs = Place.pandas_data.all()  
df = qs.to_dataframe(['code', 'type',])

but got this error trace:

  File "<console>", line 1, in <module>
  File "/home/creation/.virtualenvs/s2s/local/lib/python2.7/site-packages/django_pandas/managers.py", line 166, in to_dataframe
    qs = self.values_list(*fields)
  File "/home/creation/.virtualenvs/s2s/local/lib/python2.7/site-packages/django/db/models/query.py", line 535, in values_list
    _fields=fields)
  File "/home/creation/.virtualenvs/s2s/local/lib/python2.7/site-packages/django/db/models/query.py", line 849, in _clone
    c._setup_query()
  File "/home/creation/.virtualenvs/s2s/local/lib/python2.7/site-packages/django/db/models/query.py", line 992, in _setup_query
    self.query.add_fields(self.field_names, True)
  File "/home/creation/.virtualenvs/s2s/local/lib/python2.7/site-packages/django/db/models/sql/query.py", line 1533, in add_fields
    name.split(LOOKUP_SEP), opts, alias, None, allow_m2m,
AttributeError: 'list' object has no attribute 'split'

I found I was able to access data directly just using "plain" pandas as follows:

import pandas as pd
from locales.models import Place
qs = Place.objects.all()
df = pd.DataFrame.from_records(qs.values('code', 'type'))

UnboundLocalError: local variable 'field' referenced before assignment

First, thanks for this great package, which is so useful! I love it :)

I'm getting this error while trying to do a queryset.to_dataframe(['field'], index='id'). With queryset.to_dataframe(['id', 'field']).set_index('id') it works perfectly.

Here is my traceback:

  File "/cygdrive/c/Users/N6601004/.virtualenvs/intermediation/lib/python2.7/site-packages/django_pandas/managers.py", line 258, in to_dataframe
    index_col=index, coerce_float=coerce_float)
  File "/cygdrive/c/Users/N6601004/.virtualenvs/intermediation/lib/python2.7/site-packages/django_pandas/io.py", line 113, in read_frame
    update_with_verbose(df, fieldnames, fields)
  File "/cygdrive/c/Users/N6601004/.virtualenvs/intermediation/lib/python2.7/site-packages/django_pandas/utils.py", line 83, in update_with_verbose
    for fieldname, function in build_update_functions(fieldnames, fields):
  File "/cygdrive/c/Users/N6601004/.virtualenvs/intermediation/lib/python2.7/site-packages/django_pandas/utils.py", line 72, in build_update_functions
    for fieldname, field in zip(fieldnames, fields):
  File "/cygdrive/c/Users/N6601004/.virtualenvs/intermediation/lib/python2.7/site-packages/django_pandas/io.py", line 26, in to_fields
    yield field

My environment:

$ pip freeze
astroid==1.4.5
backports.shutil-get-terminal-size==1.0.0
colorama==0.3.7
Cython==0.24
decorator==4.0.9
Django==1.8.13
django-nose==1.4.3
django-pandas==0.4.1
geopy==1.11.0
h5py==2.6.0
ipython==4.2.0
ipython-genutils==0.1.0
lazy-object-proxy==1.2.2
mysql-connector-python==2.1.3
names==0.3.0
nose==1.3.7
numpy==1.11.0
pandas==0.17.1
pathlib2==2.1.0
pep8==1.7.0
pexpect==4.1.0
pickleshare==0.7.2
ptyprocess==0.5.1
pylint==1.5.5
python-dateutil==2.5.3
pytz==2016.3
PyYAML==3.11
scipy==0.17.1
simplegeneric==0.8.1
six==1.10.0
sqlparse==0.1.19
Tempita==0.5.2
traitlets==4.2.1
Unidecode==0.4.19
wrapt==1.10.8

Loading data back to the database model

Got this via email:

I have a question related to putting the data back into the database. In the documentation it is stated that django_pandas.io also handles saving data to the underlying model. However, I don't see how it works or which method to call. Can you help me out?

Modernize and optimize travis and tox config

Obviously we need to work on tox.ini and .travis.yml
But can we integrate them?

Avoid dependency management redundancy in travis ci build instructions

The current .travis.yml file install section looks like:

install:
  - pip install $DJANGO
  - pip install coverage coveralls
  - pip install numpy>=1.6.1
  - pip install django-model-utils >=1.4.0
  - pip install pandas>=0.12.0

To avoid dependency management redundancy, would it be a good idea to move to something like this?

install:
  - pip install $DJANGO
  - pip install coverage coveralls
  - pip install .

Memory-efficient iteration

Been loving django-pandas so far!

One issue I've been having is with python eating up way too much memory for some of my larger tables. Once read in, a dataframe I have is only 2.1MB, built from around 150k rows. I'm getting memory quota errors on Heroku due to this.

A more memory-efficient iteration for io.read_frame would be great
Something like this: http://www.poeschko.com/2012/02/memory-efficient-django-queries/
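The general idea can be sketched as consuming rows in fixed-size batches instead of materialising the whole result as one list. The chunked helper below is hypothetical and works on any iterable; with Django, the rows could come from a lazy source such as QuerySet.iterator(), which is an assumption about usage and not something read_frame does today.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# A generator stands in for a lazily evaluated queryset.
rows = ((i, i * 2) for i in range(7))
sizes = [len(chunk) for chunk in chunked(rows, 3)]
print(sizes)
```

Each chunk could be turned into a small DataFrame and the pieces concatenated at the end, so that only one batch of Python row objects is alive at a time.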

Add compatibility with django-polymorphic

Essentially we need a polymorphic version of DataFrameManager.

Add Django 1.9 to .travis.yml file

It would be nice to automatically test new PR with Django 1.9 :)

tox.ini needs to be refactored

For example we need to support Python 3.4

Support for Django 2.0 and above

Django 2.0 only supports Python 3.4, 3.5, and 3.6. Django 1.11.x series is the last to support Python 2.7.

Build failing with python3

Failure at test_io.verbose() as qs.values_list('trader__pk', flat=True) returns a list of strings but df1.trader.tolist() returns a list of objects.

object has no attribute '_iterable_class' error

qs = Measurement.objects.get(id = id)
df = read_frame(qs)

I am using the code above to convert my queryset to a pandas DataFrame for a single object, but it shows the following error:

object has no attribute '_iterable_class'

Doesn't the library support the get() method?

to_timeseries method on DataFrameManager should have coerce_float option

It is easier to compute sums and means if decimal fields are correctly converted to floats.

Not working for me, are the docs wrong?

None of the manager methods are working for me. I've installed the latest version from Github using pip as described in the docs, also added django_pandas to INSTALLED_APPS. Then I am doing:

from django_pandas.managers import DataFrameManager
class Foo(models.Model):
    bar = models.DecimalField(decimal_places=2, max_digits=18)
    foo_date = models.DateTimeField()
    objects = DataFrameManager()

And when I do:

Foo.to_timeseries()

I get the following error:

AttributeError: type object 'Foo' has no attribute 'to_timeseries'

Full traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/handlers.py", line 72, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/wsgi.py", line 255, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 178, in get_response
    response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 217, in handle_uncaught_exception
    return debug.technical_500_response(request, *exc_info)
  File "/home/dan/Envs/acerayenv/lib/python2.7/site-packages/django_extensions/management/technical_response.py", line 5, in null_technical_500_response
    six.reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 115, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/home/dan/test_app/app/views.py", line 79, in calcs
    print  Foo.to_timeseries()
AttributeError: type object 'Foo' has no attribute 'to_timeseries'

Are properties accessible in to_dataframe()

If I have a model with a property

    class Item(models.Model):
        objects = DataFrameManager()
        # fields

        @property
        def someProperty(self):
            ...  # returns something

Is there any way to include this property in the dataframe via something like this:

    Item.objects.to_dataframe(['field_1', ..., 'field_n'], properties=['someProperty'])

Thanks!

Can “objects = DataFrameManager()” be inherited by child classes?

Is there a way for DataFrameManager() to be inherited by child classes?

model._meta.get_all_related_objects_with_model() fails in newer django-versions

Hello,

I changed the get_queryset method for my model via django.db.models.Manager, like:

from django.db.models import F, IntegerField, Manager, Model

class NewModelManager(Manager):
    def get_queryset(self):
        return super(NewModelManager, self).get_queryset().annotate(
            new_field=F('a') * F('b')
        )
class NewModel(Model):
    a = IntegerField('First')
    b = IntegerField('Second')

    objects = NewModelManager()

After doing so, read_frame() does not work with the parameter verbose set to True (setting verbose=False works), because the function model._meta.get_all_related_objects_with_model() is deprecated:

qs = cls.objects.all()
df = read_frame(
    qs,
    fieldnames=['a', 'b', 'new_field'],
    verbose=True
)

DataFrameManager still using np.core.records.fromrecords

We've found that:

 list(self.values_list(Ifields))

is more performant than

 recs = np.core.records.fromrecords(qs, names=qs.field_names)

So why is it being used in io and not in the Manager?

unable to install pandas in django but with python

I have installed Django 1.6 and pandas 0.14.0. When I import pandas in the Python shell, it shows no error and works fine, but after I added "django_pandas" to INSTALLED_APPS, my Django shell no longer runs and shows:
ImportError: No module named django_pandas

Setup.py contains a redundant requirement for numpy

pandas requires numpy, so this dependency will be taken care of once pandas is installed.

Question about using dataframe_from_qs

Hey there, I'm not a real dev, just trying to hack together a prototype of something. I want to use pandas to transpose a table from my database. The first step is to convert the queryset to a DataFrame; I stumbled upon dataframe_from_qs and was hoping it would help me out.

I put the function in my views.py and then called it like so:

def index(request):
    qs = Tablet.objects.all()
    df = dataframe_from_qs(qs)
    df = df.transpose()  # transpose() returns a new DataFrame
    return HttpResponse(df)

When I run it, I get an error saying that a list object has no attribute next, referring to this line in dataframe_from_qs():
df = DataFrame.from_records(rows.next(), columns=columns, coerce_float=True)

Is there something simple I'm missing here? I should add that I know HttpResponse is kind of pointless here. I'm just using it to see what I get. I'm planning to then pass everything to a proper template.

Thanks,
-patrick
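A likely cause, assuming `rows` in dataframe_from_qs() is a plain list: lists are iterables but not iterators, so they have no `.next()` method (and in Python 3 even iterators expose `__next__`, used through the built-in `next()`). A minimal sketch of the difference, with made-up sample rows:

```python
rows = [("tablet-1", 10), ("tablet-2", 20)]  # hypothetical sample data

# Calling .next() on a list raises AttributeError, matching the reported error.
try:
    rows.next()
except AttributeError as exc:
    error_message = str(exc)

# Wrapping the list in iter() yields an iterator that the built-in next() accepts.
row_iterator = iter(rows)
first_row = next(row_iterator)
remaining_rows = list(row_iterator)
```

If the function only needs all rows, passing the list itself to `DataFrame.from_records(rows, columns=columns, coerce_float=True)` avoids the iterator entirely.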

Handling of django.db.models.query.ValuesQuerySet

These queries are not properly handled by the library, since it uses the values_list method.

For instance, if you define the following queryset
qs = Object.objects.all().values("a", "b").annotate(c=Sum("c"))

it should return a dataframe such as

   a  b  c
0  a  b  1

which is not the case.
Do you agree?

Question: How to get back to a Django data type?

Django-Pandas works great. I can then use pandas to do cool things like .transpose(). But how do I get back to a data type that I can use with Django add-ons that only work with the ORM? I don't want to write data back to the model, just pass it off to, say, django-tables2.

This would be a great additional feature!
No one else seems to know either:
http://stackoverflow.com/questions/32733958/how-to-use-pandas-dataframe-with-django-tables2

README : typo in first `to_dataframe` example

qs.to_dataframe(['age', 'wage'], index='full_name')

instead of

qs.to_dataframe(['age', 'wage', index='full_name'])

UnicodeDecodeError when installing django-pandas with pip3 in Docker

build	25-Jan-2018 09:47:12	Collecting django-pandas==0.5.0 (from -r /tmp/requirements.txt (line 40))
build	25-Jan-2018 09:47:12	  Downloading django-pandas-0.5.0.tar.gz
build	25-Jan-2018 09:47:12	    Complete output from command python setup.py egg_info:
build	25-Jan-2018 09:47:12	    Traceback (most recent call last):
build	25-Jan-2018 09:47:12	      File "<string>", line 1, in <module>
build	25-Jan-2018 09:47:12	      File "/tmp/pip-build-_xb4994q/django-pandas/setup.py", line 5, in <module>
build	25-Jan-2018 09:47:12	        open('README.rst').read() + '\n\n' +
build	25-Jan-2018 09:47:12	      File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
build	25-Jan-2018 09:47:12	        return codecs.ascii_decode(input, self.errors)[0]
build	25-Jan-2018 09:47:12	    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1046: ordinal not in range(128)

This is caused by the line "Hélio Meira Lins" in README.rst.

Similar issues in other projects:

Here is how to fix it by using codecs.open() instead of plain open():

https://github.com/keleshev/schema/pull/69/files#diff-2eeaed663bd0d25b7e608891384b7298
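A minimal sketch of that fix, assuming the README is UTF-8 encoded (the byte 0xc3 in the traceback is the first byte of "é" in UTF-8). The helper name `read_text` and the temporary file are illustrative only; in setup.py the helper would read README.rst directly.

```python
import codecs
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default."""
    with codecs.open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Demonstration with a temporary stand-in for README.rst.
tmp_dir = tempfile.mkdtemp()
readme_path = os.path.join(tmp_dir, "README.rst")
with codecs.open(readme_path, "w", encoding="utf-8") as handle:
    handle.write(u"Contributors\n\nHélio Meira Lins\n")

long_description = read_text(readme_path)
```

Because the encoding is passed explicitly, the result no longer depends on the container's locale, which is what makes plain `open()` fall back to ASCII in a minimal Docker image.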

install numpy and pandas in django virtualenv

I tried to install numpy and pandas with pip in my Django virtualenv, but it didn't work. Every time I get the same response: "Unable to find vcvarsall.bat". I tried to install those libraries in the "Scripts" folder, where I normally install libraries alongside Django.
Can someone help me?
