
django-pandas's Introduction

Django Pandas


Tools for working with pandas in your Django projects

What's New

This release facilitates running the tests with Python 3.10 and automates publishing the package to PyPI as per PR #146 (again, many thanks to @graingert). As usual, we have attempted to support legacy versions of Python/Django/Pandas, and this sometimes results in deprecation warnings being displayed when the tests are run. To turn these warnings into errors so they can be tracked down, use python -Werror runtests.py

Dependencies

django-pandas supports Django (>= 1.4.5) and requires django-model-utils (>= 1.4.0) and Pandas (>= 0.12.0). Note that, because of problems with the requires directive of setuptools, you probably need to install numpy in your virtualenv before you install this package or run the test suite:

pip install numpy
pip install -e .[test]
python runtests.py

Some pandas functionality requires parts of the Scipy stack. You may wish to consult http://www.scipy.org/install.html for more information on installing the Scipy stack.

You need to install your preferred version of Django yourself, since Django 2 does not support Python 2.

Contributing

Please file bugs and send pull requests to the GitHub repository and issue tracker.

Installation

Start by creating a new virtualenv for your project

mkvirtualenv myproject

Next install numpy and pandas and optionally scipy

pip install numpy
pip install pandas

You may want to consult the scipy documentation for more information on installing the Scipy stack.

Finally, install django-pandas using pip:

pip install django-pandas

or install the development version from github

pip install https://github.com/chrisdev/django-pandas/tarball/master

Usage

IO Module

The django_pandas.io module provides some convenience methods to facilitate the creation of DataFrames from Django QuerySets.

read_frame

Parameters

  • qs: A Django QuerySet.
  • fieldnames: A list of model field names to use in creating the DataFrame.
    You can span a relationship in the usual Django way by using double underscores to specify a related field in another model
  • index_col: Specify the field name to use for the DataFrame index.
    If the index field is not in the field list it will be appended
  • coerce_float : Boolean, defaults to True
    Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point.
  • verbose: If this is True then populate the DataFrame with the
    human readable versions of any foreign key or choice fields else use the actual values set in the model.
  • column_names: If not None, use to override the column names in the
    DataFrame
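
The effect of index_col and coerce_float can be tried without a Django project. The sketch below is a plain-pandas analogue, not django-pandas code: the rows list is invented data standing in for what a QuerySet would yield, and the only assumption is that pandas is installed.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical rows standing in for the values a QuerySet would yield.
rows = [
    ("Ann", 31, Decimal("1500.50")),
    ("Bob", 45, Decimal("2100.00")),
]
fieldnames = ["full_name", "age", "wage"]

# coerce_float=True asks pandas to try converting non-string, non-numeric
# objects such as decimal.Decimal to floating point.
df = pd.DataFrame.from_records(rows, columns=fieldnames, coerce_float=True)

# Use one of the fields as the index, which is the role index_col plays.
df = df.set_index("full_name")

print(df.dtypes)
print(df)
```

Printing df.dtypes is a quick way to check whether the wage column was converted on the pandas version in use.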

Examples

Assume that this is your model:

class MyModel(models.Model):

    full_name = models.CharField(max_length=25)
    age = models.IntegerField()
    department = models.CharField(max_length=3)
    wage = models.FloatField()

First create a query set:

from django_pandas.io import read_frame
qs = MyModel.objects.all()

To create a dataframe using all the fields in the underlying model

df = read_frame(qs)

The df will contain human readable column values for foreign key and choice fields. The DataFrame will include all the fields in the underlying model including the primary key. To create a DataFrame using specified field names:

df = read_frame(qs, fieldnames=['age', 'wage', 'full_name'])

To set full_name as the DataFrame index

df = read_frame(qs, fieldnames=['age', 'wage'], index_col='full_name')

You can use filters and excludes

df = read_frame(qs.filter(age__gt=20, department='IT'), index_col='full_name')

DataFrameManager

django-pandas provides a custom manager to use with models that you want to render as Pandas DataFrames. The DataFrameManager provides the to_dataframe method, which returns your model's queryset as a Pandas DataFrame. To use the DataFrameManager, first override the default manager (objects) in your model's definition as shown in the example below:

# models.py

from django.db import models
from django_pandas.managers import DataFrameManager

class MyModel(models.Model):

    full_name = models.CharField(max_length=25)
    age = models.IntegerField()
    department = models.CharField(max_length=3)
    wage = models.FloatField()

    objects = DataFrameManager()

This will give you access to the following QuerySet methods:

  • to_dataframe
  • to_timeseries
  • to_pivot_table

to_dataframe

Returns a DataFrame from the QuerySet

Parameters

  • fieldnames: The model field names to utilise in creating the frame.
    To span a relationship, use the field name of related fields across models, separated by double underscores.
  • index: Specify the field to use for the index. If the index
    field is not in the field list it will be appended.
  • coerce_float: Attempt to convert non-string, non-numeric objects
    (like decimal.Decimal) to float if possible.
  • verbose: If this is True then populate the DataFrame with the
    human readable versions of any foreign key or choice fields else use the actual value set in the model.

Examples

Create a dataframe using all the fields in your model as follows

qs = MyModel.objects.all()

df = qs.to_dataframe()

This will include your primary key. To create a DataFrame using specified field names:

df = qs.to_dataframe(fieldnames=['age', 'department', 'wage'])

To set full_name as the index

qs.to_dataframe(['age', 'department', 'wage'], index='full_name')

You can use filters and excludes

qs.filter(age__gt=20, department='IT').to_dataframe(index='full_name')

to_timeseries

A convenience method for creating a time series, i.e. a DataFrame whose index is a DatetimeIndex or PeriodIndex.

Parameters

  • fieldnames: The model field names to utilise in creating the frame.
    to span a relationship, just use the field name of related fields across models, separated by double underscores,
  • index: specify the field to use for the index. If the index
    field is not in the field list it will be appended. This is mandatory.
  • storage: Specify whether the queryset uses the wide or long format
    for data.
  • pivot_columns: Required once you specify long format
    storage. This can be either a string or a list identifying the field name or combination of fields. If pivot_columns is a single column, the unique values in this column become new columns in the DataFrame. If pivot_columns is a list, the values in these columns are concatenated (using '-' as a separator) and the resulting values are used for the new timeseries columns.
  • values: Also required if you use long storage. The
    values column name is used for populating the new frame's values.
  • freq: the offset string or object representing a target conversion
  • rs_kwargs: Arguments based on pandas.DataFrame.resample
  • verbose: If this is True then populate the DataFrame with the
    human readable versions of any foreign key or choice fields else use the actual value set in the model.

Examples

Using a long storage format

#models.py

class LongTimeSeries(models.Model):
    date_ix = models.DateTimeField()
    series_name = models.CharField(max_length=100)
    value = models.FloatField()

    objects = DataFrameManager()

Some sample data:

==========  ===========  ======
date_ix     series_name  value
==========  ===========  ======
2010-01-01  gdp          204699
2010-01-01  inflation    2.0
2010-01-01  wages        100.7
2010-02-01  gdp          204704
2010-02-01  inflation    2.4
2010-02-01  wages        100.4
2010-03-01  gdp          205966
2010-03-01  inflation    2.5
2010-03-01  wages        100.5
==========  ===========  ======

Create a QuerySet

qs = LongTimeSeries.objects.filter(date_ix__year__gte=2010)

Create a timeseries dataframe

df = qs.to_timeseries(index='date_ix',
                      pivot_columns='series_name',
                      values='value',
                      storage='long')
df.head()

date_ix      gdp      inflation  wages
2010-01-01   204699   2.0        100.7
2010-02-01   204704   2.4        100.4
2010-03-01   205966   2.5        100.5
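
The long-to-wide reshaping can also be reproduced with plain pandas, which may help when checking what pivot_columns and values are expected to do. This is only a sketch of the idea: the frame is built by hand from the sample values, with one row per date and series, and whether django-pandas uses this exact pandas call internally is not stated here.

```python
import pandas as pd

# Sample rows in long format: one row per (date, series) pair.
long_df = pd.DataFrame(
    {
        "date_ix": pd.to_datetime(
            ["2010-01-01"] * 3 + ["2010-02-01"] * 3 + ["2010-03-01"] * 3
        ),
        "series_name": ["gdp", "inflation", "wages"] * 3,
        "value": [204699, 2.0, 100.7, 204704, 2.4, 100.4, 205966, 2.5, 100.5],
    }
)

# Each unique series_name becomes a column; date_ix becomes the index.
wide_df = long_df.pivot(index="date_ix", columns="series_name", values="value")
print(wide_df)
```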

Using a wide storage format

class WideTimeSeries(models.Model):
    date_ix = models.DateTimeField()
    col1 = models.FloatField()
    col2 = models.FloatField()
    col3 = models.FloatField()
    col4 = models.FloatField()

    objects = DataFrameManager()

qs = WideTimeSeries.objects.all()

rs_kwargs = {'how': 'sum', 'kind': 'period'}
df = qs.to_timeseries(index='date_ix', storage='wide',
                      freq='M', rs_kwargs=rs_kwargs)
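
What the freq argument does to wide-format data can be sketched with plain pandas resampling. The frame below is hypothetical (constant values per day), and the sketch calls .sum() on the resampler because recent pandas releases no longer accept a how keyword in resample. It uses the 'MS' (month start) alias, since the spelling of the month-end alias differs between pandas releases.

```python
import pandas as pd

# Hypothetical wide-format data: one row per day, one column per series.
# 59 days cover January (31 days) and February (28 days) of 2010.
index = pd.date_range("2010-01-01", periods=59, freq="D")
wide_df = pd.DataFrame({"col1": 1.0, "col2": 2.0}, index=index)

# Aggregate the daily rows into monthly sums, labelled by month start.
monthly = wide_df.resample("MS").sum()
print(monthly)
```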

to_pivot_table

A convenience method for creating a pivot table from a QuerySet

Parameters

  • fieldnames: The model field names to utilise in creating the frame.
    to span a relationship, just use the field name of related fields across models, separated by double underscores,
  • values : column to aggregate, optional
  • rows : list of column names or arrays to group on
    Keys to group on the x-axis of the pivot table
  • cols : list of column names or arrays to group on
    Keys to group on the y-axis of the pivot table
  • aggfunc : function, default numpy.mean, or list of functions
    If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
  • fill_value : scalar, default None
    Value to replace missing values with
  • margins : boolean, default False
    Add all row / columns (e.g. for subtotal / grand totals)
  • dropna : boolean, default True
    Do not include columns whose entries are all NaN

Example

# models.py
class PivotData(models.Model):
    row_col_a = models.CharField(max_length=15)
    row_col_b = models.CharField(max_length=15)
    row_col_c = models.CharField(max_length=15)
    value_col_d = models.FloatField()
    value_col_e = models.FloatField()
    value_col_f = models.FloatField()

    objects = DataFrameManager()

Usage

qs = PivotData.objects.all()

rows = ['row_col_a', 'row_col_b']
cols = ['row_col_c']

pt = qs.to_pivot_table(values='value_col_d', rows=rows, cols=cols)
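
For comparison, a similar pivot can be built directly in pandas. This is a sketch with invented rows, not the django-pandas implementation. Note that current pandas names the grouping arguments index and columns, whereas the rows and cols names used above follow the older pandas spelling.

```python
import pandas as pd

# Hypothetical rows mirroring the PivotData fields used above.
df = pd.DataFrame(
    {
        "row_col_a": ["x", "x", "y", "y"],
        "row_col_b": ["p", "p", "q", "q"],
        "row_col_c": ["m", "n", "m", "n"],
        "value_col_d": [1.0, 3.0, 5.0, 7.0],
    }
)

# Group on two row keys and one column key; the default aggregation is the mean.
pt = pd.pivot_table(
    df,
    values="value_col_d",
    index=["row_col_a", "row_col_b"],
    columns=["row_col_c"],
)
print(pt)
```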

django-pandas's Issues

Remove the django>=1.4.2 dependency in setup.py

Would it be possible to remove the django>=1.4.2 dependency from setup.py?

Reason: now that Django 2 is out it makes me download it which is not what I want. It is annoying for a module I have that depends on django-pandas.

Generate docs using Sphinx

We should generate docs using Sphinx and put them on ReadTheDocs. This would avoid outdated API documentation, as is currently the case in README.rst (for example, fill_na is still mentioned as a valid argument of to_dataframe).

Get all fields in related model

Is there a way to get all the fields in a related model instead of having to explicitly follow individual relationships with the double underscore notation ('offer__provider')?

For example I have the following statement:

productapps = ProductApplication.objects.all().select_related('offer')

And I want to turn that queryset into a dataframe with all the fields from ProductApplication and Offer.

README: "pivot_column"

The readme text for to_timeseries says "pivot_column" whereas the parameter name is "pivot_columns" with an S.

Speedup read_frame

Use pandas.io.sql.read_frame for performance speedup

from django.db import connection
import pandas as pd
import resource

def get_frame(qs):
    """ proposed solution: get an sql and pass it to pandas.io.sql.read_frame """
    compiler = qs.query.get_compiler(using='default')
    sql, args = compiler.as_sql()
    return pd.io.sql.read_frame(sql, connection, params=args)

def get_list(qs):
    """ That is under the hood of django-pandas 
    _clone is called to avoid django result cache usage
    """
    return pd.DataFrame.from_records(list(qs._clone()))

def get_iter(qs):
    """ First solution, replace list with iterator."""
    qs = qs._clone()
    compiler = qs.query.get_compiler(using='default')
    return pd.DataFrame.from_records(compiler.results_iter())

First, we need sample data. Three columns, 800k rows mysql table.

# Sample data from MySQL database
qs = Views.objects.filter(date='2014-05-13').values_list('platform_id', 'video_id', 'video_views')

qs.count() # 800K rows

Next, let's measure memory consumption (with interpreter restart after each test)

old_memory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
get_frame(qs) # replace with get_iter and get_list
new_memory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print "allocated:", new_memory - old_memory

In my case get_frame worked better, 390MB instead of 423MB for other variants.

Next, timeit!

print "as_sql (alloc 390M)"
%timeit get_frame(qs)
print "results_iter (alloc 423M)"
%timeit get_iter(qs)
print "django_pandas (alloc 424M)"
%timeit get_list(qs)
as_sql (alloc 390M)
1 loops, best of 3: 6.51 s per loop
results_iter (alloc 423M)
1 loops, best of 3: 11.7 s per loop
django_pandas (alloc 424M)
1 loops, best of 3: 12 s per loop

So, with pandas.io.sql.read_frame we get lower memory usage and an almost twofold speed-up.
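
pandas.io.sql.read_frame was later removed from pandas; pandas.read_sql_query accepts the same kind of arguments (an SQL string, a connection and params). A self-contained sketch of the idea follows, using an in-memory SQLite database from the standard library in place of the MySQL table. The table and column names echo the sample above, but the rows are invented, and no timing or memory claims are made for it.

```python
import sqlite3

import pandas as pd

# In-memory database standing in for the real table (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE views "
    "(date TEXT, platform_id INTEGER, video_id INTEGER, video_views INTEGER)"
)
conn.executemany(
    "INSERT INTO views VALUES (?, ?, ?, ?)",
    [
        ("2014-05-13", 1, 10, 100),
        ("2014-05-13", 2, 11, 250),
        ("2014-05-12", 1, 10, 90),
    ],
)

# With Django, sql and params would come from compiler.as_sql() as in get_frame.
sql = (
    "SELECT platform_id, video_id, video_views FROM views "
    "WHERE date = ? ORDER BY platform_id"
)
params = ("2014-05-13",)

# Let pandas run the query and build the DataFrame from the result set.
df = pd.read_sql_query(sql, conn, params=params)
conn.close()
print(df)
```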

A more verbose output

In some scenarios, the rendered output is not verbose enough. I see three cases:

  • verbose field names should be rendered (seems easy, just needs to change the names argument in this line)
  • DB fields with choices should have their values rendered with get_FOO_display
  • foreign keys should be more verbose than just the id (this can be tricky, because calling __str__ can trigger tons of queries & code)

Example showing those three cases:
[image: current-table]
Expected output:
[image: expected-table]

  • "Année" is the verbose name of the field "annee".
  • "Hommes" & "Femmes" are choices from the "sexe" field.
  • "Abbeville", "Agen", etc are __str__ representation of the model of a foreign key.

read_frame columns with model properties

Reading model properties (methods of a model annotated with @property) into a DataFrame doesn't work yet, because QuerySet.values_list() only works with model fields.

Error in readme

This isn't valid Python:

qs.to_dataframe(['age', 'wage', index='full_name'])

Did you mean?:

qs.to_dataframe(['age', 'wage'], index='full_name')

Column Translations in read_frame

A common use case (I'm guessing, since that's what I'm doing) is to:
Define models => do operations in dfs => save back down as models

This is kind of fiddly because of the column translations when dealing with fk columns.
read_frame(verbose = False) returns an _id suffix for the pk, and not for the fks
but Model(**kwargs) requires an _id suffix for the pk and also for the fks

So there's this translation that's happening that I either circumvent or do some fiddling to undo, the origin is the line:

fieldnames = [f.name for f in fields]

I'm wondering whether a default of:

fieldnames = [f.attname for f in fields]

might make more sense, although it's easy enough to wrap this and pass those fieldnames in. I'm curious whether there's a deeper reason for the current default.

to_timeseries - how to pivot?

I have the following model:

class Vote(models.Model):
    user = models.ForeignKey(User, blank=True, null=True)
    poll = models.ForeignKey(Poll)
    choice = models.ForeignKey(Choice)
    comment = models.TextField(max_length=144, blank=True, null=True)
    created = models.DateTimeField(auto_now_add=True)

I am trying to create a timeseries object that counts votes by day or week and poll:

votes = Vote.objects.to_timeseries(['id', 'poll'], pivot_columns='poll', verbose=True,
                                 index='created', freq='D', rs_kwargs=dict(how='count'))
votes.head()
=>
                            id  poll
created     
2015-05-28 00:00:00+00:00   15  15
2015-05-29 00:00:00+00:00   55  55
2015-05-30 00:00:00+00:00   61  61
2015-05-31 00:00:00+00:00   15  15
2015-06-01 00:00:00+00:00   112 112
(...)

Poll has several values, say 'A', 'B' as its string representation. I would expect something along the lines of:

                            id  poll_A  poll_B
created     
2015-05-28 00:00:00+00:00   15  7   8
2015-05-29 00:00:00+00:00   55  20  35
2015-05-30 00:00:00+00:00   61  30  31
2015-05-31 00:00:00+00:00   15  8   7
2015-06-01 00:00:00+00:00   112 60  52
(...)

What am I missing?

Explicitly specify DataFrameQuerySet.to_[*] kwargs

Currently, a kwargs dict is used to parse arguments in those methods.

A lot of code could be removed by explicitly specifying those arguments in the method declaration. Plus, this eases code autocompletion, introspection, etc.

Before working on #9, I would like to fix this. Can I?

DataFrameManager.get_query_set method warning.

Hello,

I have been seeing the following warning when using django-pandas. Should this function be renamed?

C:\Python27\lib\site-packages\django_pandas\managers.py:183: RemovedInDjango18Warning: DataFrameManager.get_query_set method should be renamed get_queryset.
class DataFrameManager(PassThroughManager):

Kind regards,

Dan.

Django 1.9 support

Started working with Django Pandas lately and like it a lot :)
A few days ago I migrated my Django install to a new server running Django 1.9, and a couple of things broke, including django-pandas. I figured this is partially due to the DataFrameManager class in managers.py, which depends on PassThroughManager, but PassThroughManager has been removed from the latest django-model-utils. I'm not sure how to resolve this; perhaps something like from_queryset (https://docs.djangoproject.com/en/1.9/topics/db/managers/). For example:

managers.py

from django.db import models
...
...
class BaseManager(models.Manager):
    def manager_only_method(self):
        return

DataFrameManager = BaseManager.from_queryset(DataFrameQuerySet)

Apart from that, io.py requires ValuesQuerySet, but that has been removed from Django 1.9 as well.

Update io.py to something like:


    if fieldnames:
        if index_col is not None and index_col not in fieldnames:
            # Add it to the field names if not already there
            fieldnames = tuple(fieldnames) + (index_col,)

        fields = to_fields(qs, fieldnames)
    elif isinstance(qs, django.db.models.query.QuerySet):
        if django.VERSION < (1, 8):
            annotation_field_names = qs.aggregate_names
        else:
            annotation_field_names = list(qs.values().query.annotation_select)

        fieldnames = list(qs.values().query.values_select) + annotation_field_names + list(qs.values().query.extra_select)

        fields = [qs.model._meta.get_field(f) for f in list(qs.values().query.values_select)] + \
                 [None] * (len(annotation_field_names) + len(list(qs.values().query.extra_select)))
    else:
        fields = qs.model._meta.fields
        fieldnames = [f.name for f in fields]

    if isinstance(qs, django.db.models.query.QuerySet):
        vqs = qs.values()
        recs = list(vqs)

Dataframe and many-to-many relationship

Is it possible to use django-pandas with models that have a many-to-many relationship? It seems that this feature is currently not supported. E.g., for

class Topping(models.Model):
    name = models.CharField(max_length=30)

class Pizza(models.Model):
    name = models.CharField(max_length=50)
    toppings = models.ManyToManyField(Topping)

qs_pizza_with_toppings = Pizza.objects.all().prefetch_related('toppings')
df_pizza_with_toppings = read_frame(qs_pizza_with_toppings)

df_pizza_with_toppings does not contain any topping names.

Unable to use django-pandas

My first attempt, so I may have missed something. Running Django 1.6 and pandas 0.14. I have tried this:

from locales.models import Place
# the Place model has a pandas_data = DataFrameManager()
qs = Place.pandas_data.all()  
df = qs.to_dataframe(['code', 'type',])

but got this error trace:

  File "<console>", line 1, in <module>
  File "/home/creation/.virtualenvs/s2s/local/lib/python2.7/site-packages/django_pandas/managers.py", line 166, in to_dataframe
    qs = self.values_list(*fields)
  File "/home/creation/.virtualenvs/s2s/local/lib/python2.7/site-packages/django/db/models/query.py", line 535, in values_list
    _fields=fields)
  File "/home/creation/.virtualenvs/s2s/local/lib/python2.7/site-packages/django/db/models/query.py", line 849, in _clone
    c._setup_query()
  File "/home/creation/.virtualenvs/s2s/local/lib/python2.7/site-packages/django/db/models/query.py", line 992, in _setup_query
    self.query.add_fields(self.field_names, True)
  File "/home/creation/.virtualenvs/s2s/local/lib/python2.7/site-packages/django/db/models/sql/query.py", line 1533, in add_fields
    name.split(LOOKUP_SEP), opts, alias, None, allow_m2m,
AttributeError: 'list' object has no attribute 'split'

I found I was able to access data directly just using "plain" pandas as follows:

import pandas as pd
from locales.models import Place
qs = Place.objects.all()
df = pd.DataFrame.from_records(qs.values('code', 'type'))

UnboundLocalError: local variable 'field' referenced before assignment

First, thanks for this great package, which is so useful! I love it :)

I'm getting this error while trying to do a queryset.to_dataframe(['field'], index='id'). With queryset.to_dataframe(['id', 'field']).set_index('id') it works perfectly.

Here is my traceback:

  File "/cygdrive/c/Users/N6601004/.virtualenvs/intermediation/lib/python2.7/site-packages/django_pandas/managers.py", line 258, in to_dataframe
    index_col=index, coerce_float=coerce_float)
  File "/cygdrive/c/Users/N6601004/.virtualenvs/intermediation/lib/python2.7/site-packages/django_pandas/io.py", line 113, in read_frame
    update_with_verbose(df, fieldnames, fields)
  File "/cygdrive/c/Users/N6601004/.virtualenvs/intermediation/lib/python2.7/site-packages/django_pandas/utils.py", line 83, in update_with_verbose
    for fieldname, function in build_update_functions(fieldnames, fields):
  File "/cygdrive/c/Users/N6601004/.virtualenvs/intermediation/lib/python2.7/site-packages/django_pandas/utils.py", line 72, in build_update_functions
    for fieldname, field in zip(fieldnames, fields):
  File "/cygdrive/c/Users/N6601004/.virtualenvs/intermediation/lib/python2.7/site-packages/django_pandas/io.py", line 26, in to_fields
    yield field

My environment:

$ pip freeze
astroid==1.4.5
backports.shutil-get-terminal-size==1.0.0
colorama==0.3.7
Cython==0.24
decorator==4.0.9
Django==1.8.13
django-nose==1.4.3
django-pandas==0.4.1
geopy==1.11.0
h5py==2.6.0
ipython==4.2.0
ipython-genutils==0.1.0
lazy-object-proxy==1.2.2
mysql-connector-python==2.1.3
names==0.3.0
nose==1.3.7
numpy==1.11.0
pandas==0.17.1
pathlib2==2.1.0
pep8==1.7.0
pexpect==4.1.0
pickleshare==0.7.2
ptyprocess==0.5.1
pylint==1.5.5
python-dateutil==2.5.3
pytz==2016.3
PyYAML==3.11
scipy==0.17.1
simplegeneric==0.8.1
six==1.10.0
sqlparse==0.1.19
Tempita==0.5.2
traitlets==4.2.1
Unidecode==0.4.19
wrapt==1.10.8

Loading data back to the database model

Got this via email:

I have a question related to putting the data back into the database. The documentation states that django_pandas.io also handles saving data to the underlying model. However, I don't see how it works or which method to call. Can you help me out?

Avoid dependency management redundancy in travis ci build instructions

The current .travis.yml file install section looks like:

install:
  - pip install $DJANGO
  - pip install coverage coveralls
  - pip install numpy>=1.6.1
  - pip install django-model-utils >=1.4.0
  - pip install pandas>=0.12.0

To avoid dependency management redundancy, would it be a good idea to move to something like this?

install:
  - pip install $DJANGO
  - pip install coverage coveralls
  - pip install .

Build failing with python3

Failure at test_io.verbose() as qs.values_list('trader__pk', flat=True) returns a list of strings but df1.trader.tolist() returns a list of objects.

object has no attribute '_iterable_class' error

qs = Measurement.objects.get(id = id)
df = read_frame(qs)

I am using the following code to convert my queryset to pandas dataframe for a single object but it is showing the following error

object has no attribute '_iterable_class'

Doesn't the library support the get() method?

Not working for me, are the docs wrong?

None of the manager methods are working for me. I've installed the latest version from Github using pip as described in the docs, also added django_pandas to INSTALLED_APPS. Then I am doing:

from django_pandas.managers import DataFrameManager
class Foo(models.Model):
    bar = models.DecimalField(decimal_places=2, max_digits=18)
    foo_date = models.DateTimeField()
    objects = DataFrameManager()

And when I do:

Foo.to_dataframe()

I get the following error:

AttributeError: type object 'Foo' has no attribute 'to_timeseries'

Full traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/handlers.py", line 72, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/wsgi.py", line 255, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 178, in get_response
    response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 217, in handle_uncaught_exception
    return debug.technical_500_response(request, *exc_info)
  File "/home/dan/Envs/acerayenv/lib/python2.7/site-packages/django_extensions/management/technical_response.py", line 5, in null_technical_500_response
    six.reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 115, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/home/dan/test_app/app/views.py", line 79, in calcs
    print  Foo.to_timeseries()
AttributeError: type object 'Foo' has no attribute 'to_timeseries'

Are properties accessible in to_dataframe()

If I have a model with a property

class Item(models.Model):
    objects = DataFrameManager()
    # fields

    @property
    def someProperty(self):
        # returns something
        pass

Is there any way to include this property in the dataframe via something like this:

Item.objects.to_dataframe(['field_1',...,'field_n'], properties=['someProperty'])

Thanks!

model._meta.get_all_related_objects_with_model() fails in newer django-versions

Hello,

I changed the get_queryset method for my model via django.db.models.Manager, like:

from django.db.models import F, IntegerField, Manager, Model

class NewModelManager(Manager):
    def get_queryset(self):
        return super(NewModelManager, self).get_queryset().annotate(
            new_field=F('a') * F('b')
        )
class NewModel(Model):
    a = IntegerField('First')
    b = IntegerField('Second')

    objects = NewModelManager()

After doing so, read_frame() does not work with the parameter verbose set to True (setting verbose=False works), because the function model._meta.get_all_related_objects_with_model() is deprecated:

qs = cls.objects.all()
df = read_frame(
    qs,
    fieldnames=['a', 'b', 'new_field'],
    verbose=True
)

unable to install pandas in django but with python

I have installed Django 1.6 and pandas 0.14.0. When I import pandas in the Python shell, it shows no error and works fine, but after I added "django_pandas" to INSTALLED_APPS, my Django shell no longer runs and shows:
ImportError: No module named django_pandas

Question about using dataframe_from_qs

Hey there, I'm not a real dev, just trying to hack together a prototype of something. I want to use pandas to transpose a table from my database, and the first step is to convert the queryset to a dataframe. I stumbled upon dataframe_from_qs and was hoping it would help me out.

I put the function in my views.py and then called it like so:

def index(request):
    qs = Tablet.objects.all()
    df = dataframe_from_qs(qs)
    df.transpose()
    return HttpResponse(df)

When I run it I get an error that a list object has no next referring to this line in dataframe_from_qs() :
df = DataFrame.from_records(rows.next(), columns=columns, coerce_float=True)

Is there something simple I'm missing here? I should add, I know that HttpResponse is kind of pointless. I'm just using it to see what I get. I'm planning to then pass everything to a proper template.

Thanks,
-patrick

Handling of django.db.models.query.ValuesQuerySet

Those queries are not properly handled by the library since it uses the values_list method.

For instance, if you define the following queryset
qs = Object.objects.all().values("a", "b").annotate(c=Sum("c"))

it should return a dataframe such as

   a  b  c
0  a  b  1

which is not the case.
Do you agree?

UnicodeDecodeError when installing django-pandas with pip3 in Docker

build	25-Jan-2018 09:47:12	Collecting django-pandas==0.5.0 (from -r /tmp/requirements.txt (line 40))
build	25-Jan-2018 09:47:12	  Downloading django-pandas-0.5.0.tar.gz
build	25-Jan-2018 09:47:12	    Complete output from command python setup.py egg_info:
build	25-Jan-2018 09:47:12	    Traceback (most recent call last):
build	25-Jan-2018 09:47:12	      File "<string>", line 1, in <module>
build	25-Jan-2018 09:47:12	      File "/tmp/pip-build-_xb4994q/django-pandas/setup.py", line 5, in <module>
build	25-Jan-2018 09:47:12	        open('README.rst').read() + '\n\n' +
build	25-Jan-2018 09:47:12	      File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
build	25-Jan-2018 09:47:12	        return codecs.ascii_decode(input, self.errors)[0]
build	25-Jan-2018 09:47:12	    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1046: ordinal not in range(128)

This is caused by line "Hélio Meira Lins" in README.rst.

Similar issues in other projects:

Here is how to fix it by using codecs.open() instead of plain open():
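
The linked fix is not reproduced on this page. A minimal sketch of the approach, assuming README.rst is UTF-8 encoded: passing an explicit encoding to codecs.open() means the read no longer depends on the locale's default codec. The temporary file and the long_description variable name are illustrative only.

```python
import codecs
import os
import tempfile

# Write a small UTF-8 file containing a non-ASCII name, as README.rst does.
path = os.path.join(tempfile.mkdtemp(), "README.rst")
with codecs.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"Contributors: H\u00e9lio Meira Lins\n")

# Reading with an explicit encoding does not rely on the locale default,
# so it also works where the default codec is ASCII.
with codecs.open(path, encoding="utf-8") as handle:
    long_description = handle.read()

# ascii() keeps the output printable on terminals without UTF-8 support.
print(ascii(long_description))
```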

install numpy and pandas in django virtualenv

I tried to install numpy and pandas with pip in my Django virtualenv, but it didn't work. Every time I get the same response: "Unable to find vcvarsall.bat". I tried to install those libraries in the "Scripts" folder, where I normally install libraries alongside Django.
Could someone help me?
