Move ChartController's non-django methods to a lower level python module

Goal: move all of ChartController's non-Django datasource methods to an external, lower-level Python module.

I need those generators outside of Django: the generators using dictionary and Altair data objects as datasource will move to a chartflo python module. Our ChartController class would inherit from it, so that our api will not change due to this modification. The class will be responsible only for all the generators that take Django queries as datasources.

Advantages: the generators are usable outside of Django, as wrappers around the Altair api that generate html. We also get better maintainability through simplification and single responsibility.
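A minimal sketch of the split described above. The names `BaseChartGenerator`, `generate_from_dict` and `generate_from_query` are hypothetical, and the html produced here is only a stand-in for what the Altair wrappers would generate; the real signatures may differ.

```python
from html import escape


class BaseChartGenerator:
    """Framework-agnostic generators (hypothetical name): no Django imports here."""

    def generate_from_dict(self, data, title="Chart"):
        # Stand-in for the Altair-based rendering: emit a simple html chunk
        rows = "".join(
            f"<li>{escape(str(label))}: {escape(str(value))}</li>"
            for label, value in data.items()
        )
        return f'<div class="chart"><h3>{escape(title)}</h3><ul>{rows}</ul></div>'


class ChartController(BaseChartGenerator):
    """Django-facing layer: turns query rows into plain data, then delegates."""

    def generate_from_query(self, query, label_field, value_field, title="Chart"):
        # `query` is assumed to be any iterable of dict-like rows,
        # for example the result of a Django QuerySet.values() call
        data = {row[label_field]: row[value_field] for row in query}
        return self.generate_from_dict(data, title=title)
```

Because the subclass only adds the query-based entry point, code that already calls the dictionary-based generators on `ChartController` would keep working unchanged.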

Composable dashboards

The goal is to be able to compose dashboards embedding different charts.

Principle: the data is pre-aggregated and chunks of html are generated for each chart; the dashboard assembles these chunks. It would be too costly to query data for multiple charts at each request. Options for building the charts:

Query constructor

This option has already been explored in the server_side_charts branch. Principle: to use a dashboard constructor in the admin interface with these models:

Query: an individual query with filters, using a custom line protocol to define the filters
Question: a chart composed by one or multiple queries. This is where the chunks of html live: a question must produce the html to render the chart. It uses the Vega Lite conventions to declare fields and for data transformation. It stores the chart data in html format and Vega Lite format. Has m2m relation to Query.
Dashboard: assemble questions to render in a view. Has m2m relation with Question

This question constructor currently generates the charts html on save. The question of chart regeneration when data changes is set aside and will be treated separately.

Generators

Principle: define questions in the code, assembling queries just like we do now to define the charts data. Some process should be able to generate the data based on triggers.

The generated html chunks would live in a model like Chart and could be assembled to compose a dashboard.
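To illustrate how stored chunks could be assembled, here is a small sketch. The `Chart` dataclass is only a stand-in for the model mentioned above, and `assemble_dashboard` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class Chart:
    """Stand-in for a Chart model holding one pre-generated html chunk."""
    slug: str
    title: str
    html: str


def assemble_dashboard(title, charts):
    """Concatenate stored chunks into one page fragment; no data queries involved."""
    sections = "\n".join(
        f'<section id="{c.slug}"><h2>{c.title}</h2>{c.html}</section>' for c in charts
    )
    return f"<h1>{title}</h1>\n{sections}"
```

The dashboard only reads what the generators wrote earlier, so the cost of a request does not depend on how expensive the underlying queries were.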

Support for Chartjs

I started to add Chartjs support. It works with the same api in Dataswim and is integrated in the Chartflo dashboards. The Chartjs related code is isolated in a Pychartjs module.

Chartjs has advantages: the charts look nice and it is easy to use, as it has fewer abstraction layers than the other rendering engines we use.

Serve ChartsView over rest

The ChartsView's default template currently extends base.html. It would be good to have a way to serve this over rest without extending the base template.

Will implement with a request.is_ajax() check.
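A rough sketch of that check. Django's `request.is_ajax()` tested whether the `X-Requested-With` header equals `XMLHttpRequest` (the method was deprecated in Django 3.1), so the sketch does the same test on a plain headers mapping. The template names are hypothetical.

```python
def is_ajax(headers):
    """Return True when the request carries the XMLHttpRequest marker header."""
    return headers.get("X-Requested-With") == "XMLHttpRequest"


def pick_template(
    headers,
    full="chartflo/charts.html",  # hypothetical: extends base.html
    partial="chartflo/chart_fragment.html",  # hypothetical: chart markup only
):
    """Serve the bare fragment to ajax callers and the full page otherwise."""
    return partial if is_ajax(headers) else full
```

In a view, `request.headers` could be passed as `headers`. A plain dict lookup is case-sensitive, so the header name has to match exactly in this sketch.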

Server side rendering engine

I made a new branch using the Bokeh library to render the charts from python. This approach is very different from what we use now.

[Edit]: it is here

How it works

The queries are constructed in the admin interface. When a Question object is saved from the admin the chart is generated by python: the resulting html and js are stored in the database.

When a user requests a chart, it will just fetch the corresponding html without expensive queries. This made it possible to code a dashboard that aggregates several charts. It would also be possible to generate files and call the charts from templates resulting in no queries at all.

Datastructure

Models:

Filter: defines a specific filter
Query: defines a query, m2m link to filters
Question: defines a chart, aggregates queries in m2m relation
Dashboard: defines a set of questions to assemble in a view, m2m with Question
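The relations above, and the generate-on-save behaviour, can be sketched with plain dataclasses. This is not the actual Django model code: the field names and the `render` callable are assumptions, and lists stand in for the m2m relations.

```python
from dataclasses import dataclass, field


@dataclass
class Filter:
    name: str
    value: str


@dataclass
class Query:
    model_path: str
    filters: list = field(default_factory=list)  # m2m to Filter


@dataclass
class Question:
    title: str
    queries: list = field(default_factory=list)  # m2m to Query
    html: str = ""  # filled in when the question is saved

    def save(self, render):
        # `render` is a hypothetical callable turning the question into html;
        # in the branch described above this is where Bokeh would run
        self.html = render(self)
        return self


@dataclass
class Dashboard:
    title: str
    questions: list = field(default_factory=list)  # m2m to Question

    def render(self):
        # Serving a dashboard only reads the stored html: no queries are run
        return "\n".join(q.html for q in self.questions)
```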

Demo code

To run the demo code see the install instructions

Status

This has just been put together and needs quite a lot of work to be usable. Partial list:

Add options for operators
Add timeline chart and manage date fields
Make a models registration mechanism to regenerate the data on change and/or use a time based worker
Make the png images generation work
Add more chart types

I will definitely continue to code in this direction, as I find this approach better suited to what I need, as well as very productive and maintainable.

Remove empty files

When a file is empty, it should likely be deleted.

Goal

Clean codebase by removing unused/empty files.

Task

Remove the following empty files:

/chartflo/admin.py
/chartflo/models.py
/chartflo/tests.py - that or write at least one test! 😄

Add pypi badges to project README

After publishing on pypi (#9), we can add badges to our README to show our pypi pride. Consider the pypi badges on shields.io and add the relevant pypi badges to this project's README

Create documentation and host it at Read the docs

The readme is getting big. It would be good to find the time to start proper documentation.

Freeze the api

In order for this project to be easy to integrate into other projects, we need a stable API.

Goal

Create a stable API for 0.x series releases.

Sub-tasks

create a low-fidelity description of the hypothetical API
get feedback on the API design/description
make modifications to the API design/description based on feedback
write code to provide the basic API (remember the API can be extended, but should not break backwards compatibility in the 0.x series)

Add LICENSE file

After removing the dependency on AmCharts (issue #1), this project can be offered under an open source license.

Choose a license for the project, and add the license file to the root of the repository. The conventional name for the file is LICENSE or LICENSE.md.

Chart streaming data

Ability to update the charts with streaming data coming from websockets or other sources

Roadmap

Clean up the code
Move to Chart.js instead of Amcharts (see #1)
Freeze the api
Release on pip

And then add features:

Consider adding a ready to use view that could draw charts from model instances or paths
Merge django-mptt-graph: to draw hierarchical trees
Maybe merge django-chartmodels: to draw charts for models stats

Use the Vega Lite specification for charts data

I checked the Vega Lite spec and it looks very clear and useful to structure the data. As suggested by @brylie, we could start using it for the rest views (#14), isolating the serialization logic in something like serializers.py so that we can reuse it later if needed.
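As an illustration of what such a serializer could return, here is a minimal sketch that builds a Vega Lite spec from a list of rows. The function name `to_vega_lite` and the sample rows are hypothetical, and the schema version in `$schema` is an assumption.

```python
import json


def to_vega_lite(rows, x_field, y_field, mark="bar"):
    """Serialize a list of dict rows to a minimal Vega Lite chart spec."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": list(rows)},
        "mark": mark,
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


# Sample rows, for illustration only
rows = [
    {"group": "Users", "number": 12},
    {"group": "Staff", "number": 3},
]
spec = to_vega_lite(rows, "group", "number")
payload = json.dumps(spec)  # what a rest view could return
```

Keeping this in one function means the views only deal with rows, and the spec format can evolve in a single place.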

About Altair: it looks very similar to Bokeh. I'll definitely give it a shot and explore the python-generated charts way

Clean up the code for 0.2.0 release

Identify parts of the code to clean up in preparation for 0.2.0 release, such as:

software libraries incompatible with open source license terms
inconsistent naming conventions
boilerplate code or markup (e.g. placeholder paragraphs or 'dummy' headers)
commented out code
unused files/folders

Support for Holoviews / Bokeh rendering engine

Support for Holoviews with the Bokeh rendering engine has been added (#13). We can now choose which rendering engine to use for each chart. It is possible to compose dashboards that use both engines.

Technically many things have changed. All the chart generation logic now lives in an external module: Dataswim, a data analytics library that I wrote and that is equipped to handle charts.
The generation logic for Altair is isolated in a specific module and frozen while waiting for Altair 2.

Chartflo is now only responsible for Django related stuff: mostly the dashboards view and the events-based charts generation. As a result both the code and api have been drastically simplified. For example:

from chartflo.charts import chart
from django.contrib.auth.models import User

# get the data
all_users = User.objects.filter(is_active=True)
staff = all_users.filter(is_staff=True).count()
superusers = all_users.filter(is_superuser=True).count()
users = all_users.filter(is_superuser=False, is_staff=False).count()

# declare the data
data = [users, staff, superusers]
index = ["Users", "Staff", "Superusers"]
columns = ["Number"]
chart.load_data("Groups", data, columns=columns, index=index)

# get the chart
c = chart.draw("Groups", "Number", chart_type="bar")
# now in a jupyter notebook you can just type 'c' to draw the chart
# store the chart for later exporting
chart.stack("registrations1", "User registrations 1", c)
# ... make other charts
# then export to files in the folder templates/data/html
chart.export("data/html")

Example of a generator that takes the user registration dates, aggregates them by day and draws a line chart using Bokeh and a points chart using Altair:

from dataswim import ds
from django.contrib.auth.models import User
from chartflo.charts import chart


def run(events=None):
    # 1. crunch data
    q = User.objects.all()
    # load data from a django query
    ds.load_django(q, dateindex="date_joined")
    # keep only the relevant data
    ds.keep("date_joined", "username")
    # resample data by one day periods
    ds.rsum("1D")
    # 2. draw charts
    # Note: ds.df is a pandas DataFrame instance
    c = chart.draw("date", "num", ds.df, "line")
    chart.stack("registrations1", "User registrations 1", c)
    chart.engine = "altair"
    x = ("date", "date:T")
    y = ("num", "num:Q")
    # if no dataset is passed, it will use the previously declared one
    c2 = chart.draw(x, y, chart_type="circle")
    chart.stack("user_registrations", "User registrations", c2)
    # Write the charts html to files
    chart.export("data/html")

The Holoviews Bokeh rendering engine is now the default. Altair 1 is getting old and is lagging behind: we are waiting for Altair 2. Furthermore Bokeh is more powerful and offers very nice interactivity.

To sum up, we now have:

More power
More laziness

I will update the doc and the examples soon and release.

[Edit]: comments in code

Online demo

An online demo is now available. It is a dashboard with inflation numbers, showing line charts and layouts with Bokeh.

To run the generator install the demo locally from the repository and run:

python3 manage.py gen inflation

Make an 'initial' release

Since this code works, it deserves an 'official' release. Currently, the GitHub releases tab for this project is empty. Go ahead and make a '0.1.0' release, if you want to use semantic versioning.

Support multiple rendering engines

It could be good to add support for multiple rendering engines. The current plan is to replace Amcharts with Chart.js. We could take the opportunity of the refactoring to implement it. This would make the module extensible and less tightly coupled to one library. There are a lot of great javascript libraries that we could benefit from.

I will implement this so that we can keep the current working js library as the default, work on the switch, and just change the defaults when it is done. We kick out Amcharts after that.
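One possible shape for this, sketched as a small registry with a switchable default. The names `ENGINES`, `register_engine` and `DEFAULT_ENGINE` are hypothetical, and the render functions only return placeholder markup rather than real library output.

```python
ENGINES = {}


def register_engine(name):
    """Decorator adding a rendering function to the registry."""
    def decorator(func):
        ENGINES[name] = func
        return func
    return decorator


@register_engine("amcharts")
def render_amcharts(data):
    # Placeholder markup standing in for the current engine
    return f"<div class='amcharts' data-values='{data}'></div>"


@register_engine("chartjs")
def render_chartjs(data):
    # Placeholder markup standing in for the new engine
    return f"<canvas class='chartjs' data-values='{data}'></canvas>"


DEFAULT_ENGINE = "amcharts"  # flip this once the new engine is ready


def render(data, engine=None):
    """Render with the requested engine, falling back to the default."""
    name = engine or DEFAULT_ENGINE
    try:
        return ENGINES[name](data)
    except KeyError:
        raise ValueError(f"Unknown rendering engine: {name}") from None
```

With this layout, removing the old engine later comes down to deleting one function and changing one constant.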

I am currently learning Bokeh, which renders charts directly from python. It would work differently but has significant advantages: it generates chunks of html, so there is much less frontend maintenance work and zero javascript fatigue. It can also produce png and svg images, which can be nice as a lightweight alternative for embedded material: once generated they would be much cheaper to serve and render than js and db hits.
This approach could match the case of using pre-aggregated data in proposal #10, as it always has to pregenerate the charts: on-demand generation would be too costly. I may later make a branch to see if it could fit in the mix, once I am more comfortable working with Bokeh.

Ability to draw charts

Thinking about it, I should put back the ability to draw charts inside this module: it would be more convenient for people, and it was made for this after all. This would mean inheriting from the Plot class of Dataswim, where the charting logic now lives, and taking on this dependency.

I will also refactor with a class that has methods instead of importing functions all the time. So the api will change ... again, but we are close to reaching something usable: stability is not that far off anymore.

Eliminate ChartsView

Proposal: to eliminate the old ChartsView that was used to display only one chart. Now that we have the dashboards this appears to be useless: charts can be included directly in templates now.

I made a generic dashboard view so that users don't have to set up a view.

Add docstrings to all class and function definitions

In preparation for the 0.2 release, it would be helpful to make the source code as well-documented as possible. To this end, every class and function should have a docstring.

Goal

Well-documented source code.

Task

Add docstrings in the following places:

Development roadmap and directions

Some work has been done on our main immediate objective: to stop using Amcharts. We are currently structuring the module so that it can be extensible.

What we have now

A serialization engine that uses Altair to produce some Vega Lite data
A generic view that can draw one chart from a query or a dataset
The Vega Lite rendering engine for charts

What we would need

A first prototype of external javascript library integration to get an idea on how it would work and how to convert the data in the appropriate format for the particular lib. Chart.js was our initial idea #1
To refine the serializers and review the possible options

Proposal: to switch from Amcharts using the Vega Lite rendering engine and make the Vega Lite serialization a standard to pass data around and try to make a first stable release with this. It is easy to add new rendering engines using the CHARTFLO_ENGINE setting.

@brylie: for now I will concentrate on the core, especially the serializers and the api design. If you wish to take care of the external js integration mechanisms and try with one lib, feel free to go ahead in a branch. This is an important task to see if we are really heading in the right direction.

Directions of research

The goal is to make composable views that can render multiple charts. Different ways are currently being explored:

Server side data aggregation with a query constructor in the admin interface. The chart data is stored in the database and rendered statically. The data can be stored in VL format or as pre-generated html
If we have static data we can generate files to be automatically included in templates, eliminating queries, which could be interesting
Consider serving charts from templatetags
Make dashboards that integrate multiple charts

The query constructor and dashboard parts are quite advanced but still need some work, see the dashboards branch. I'll post details in another issue about the possible options for data pre-aggregation: this brings some challenges, especially how to handle data changes.

Create 0.2.0 milestone

If you want to use semantic versioning, it might be helpful to create a GitHub milestone to track work towards the next release.

After creating the initial release (issue #4), the next version would be 0.2.0. Create the 0.2.0 milestone, and assign the '0.2.0 Roadmap' task (issue #3) to that milestone.

Completely remove amcharts from repository if no longer in use

Based on discussion in #20, the amcharts library is being deprecated. If it is no longer needed, remove the amcharts directory entirely from this repository.

Display tabular data in dashboards

For info I made a module that can generate chunks of html to display tabular data in the dashboards: https://github.com/synw/django-tabular

I isolated it from Chartflo since it is not strictly related to charts. No screenshot yet: the module is just starting and some features are yet to be implemented, like sorting, pagination and custom filters.

It is designed to be included in Chartflo's generators and work the same way as chart generation: it produces chunks of html to be included in dashboards

Improve project test coverage

Let's start playing the test coverage game. The only rule is:

a change to the codebase should not reduce test coverage

Right now, test coverage is at zero % (woohoo! We're winning! Dangit, I lost the game.)

Goal

Increase the project test coverage by a teensy-weensy bit.

Task

Pick one or more function(s) and write a test case.

Add support for hierarchical trees

It would be nice if this charting package could support hierarchical trees.

Goal

Add support for hierarchical trees, by incorporating a charting library/component that provides tree layouts.

Possible solution

Merge django-mptt-graph to draw hierarchical trees

Use Altair to encode data to the Vega Lite format

Instead of a custom encoder, use Altair to encode the data to the Vega Lite format. It is very convenient, for example to set the encoding. I started to implement this in the vegalite branch.

This means depending on altair and pandas: it sounds like the cleanest way to serialize the data. Anyway, we would have had to depend on pandas one day or another, considering how good and widely used it is for the kind of tasks we are doing.

The declarative approach is really nice and we can still use other js libs to render the charts, isolating the boring imperative stuff in individual units for different rendering engines

About the API surface

I wonder about what the top level api should cover. I made a few methods for convenience and these essentially wrap the equivalent ones in the underlying library. In fact I don't use them and tend to generate the charts fully with Dataswim. Where Chartflo is interesting is for distributing the charts and composing dashboards. This and the events-based autogeneration mechanism are the features we need in this module.

My question is: should I keep the methods draw, stack and export that are just wrappers? I also made a convert_dataset method that translates a Django orm query into a dataframe, but I don't use it... [Edit]: I have found a use for it since

My approach in a case like this is to remove everything that I don't need. I don't want to maintain useless code, and the module has been vastly simplified by the use of external libs, so this is not the moment to bloat it again. What do you really need? Is it ok to proceed like that?

Use consistent words for 'chart'

We may be mixing terminology when referring to charts. In the following example, we use both 'chart' and 'graph'.

django-chartflo/chartflo/views.py

Lines 26 to 28 in 6961287

    
    context["graph_type"] = self.graph_type
    context["title"] = context["label"] = self.title
    context["chart_url"] = self._get_chart_url()

In general, we should use the same words for things, so our code is clean and easy to follow.

Charts regeneration on data change

The responsibility of charts regeneration on data changes is left to the user. A simple example of how it can be done is in the readme. It uses an events queue to watch models and trigger regeneration when data changes.
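This is not the readme example, only a rough sketch of the idea under stated assumptions: the `WATCHED` mapping, the function names and the chart slug are hypothetical, and a plain `deque` stands in for the events queue.

```python
from collections import deque

# Hypothetical mapping: model path -> slugs of the charts built from it
WATCHED = {"auth.User": ["user_registrations"]}
events = deque()
regenerated = []


def on_model_saved(model_path):
    """Called by a save hook (e.g. a Django signal receiver) to queue an event."""
    if model_path in WATCHED:
        events.append(model_path)


def process_events(generate):
    """Drain the queue, regenerating each affected chart only once."""
    pending = []
    while events:
        for slug in WATCHED[events.popleft()]:
            if slug not in pending:
                pending.append(slug)
    for slug in pending:
        generate(slug)
    return pending
```

Draining the queue in batches means several saves on the same model trigger a single regeneration of each chart.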

Improvements in version 0.5

A complete rewrite has been made for version 0.5. Main improvements:

Use Bulma css instead of Admin lte
Use Vuejs to make it a single page app and improve the user experience while navigating in the dashboards
Simplify the dashboards creation and focus on productivity
Remove all the bloat: the dependency tree is much lighter

Note: this module is now responsible only for the dashboards management and widgets creation. The charts generation logic has been externalized to the Dataswim module. It is possible to use any other library or code to generate the charts.

The doc has been updated as well as the demo project. The doc is a bit light for the moment and needs to be improved. I'm too close to the code to see if the doc is clear enough: feedback is welcome

Release on pypi

In order for this package to be used in other projects, it would be helpful to release a pip-installable package.

Goal

Make it easy to integrate this package into Django projects by releasing on PyPI

Sub-tasks

choose package license (issue #6)
create package description file
add relevant metadata to description file
create package (preferably with Python Wheels format)

Dashboards views

Goal: make it easy to compose dashboards with a base view and template

We provide a base template, a simplified version of Admin LTE and views to load different pages in a dashboard. Dashboards are registered in the database with the authorized groups for each.

TODO: demo and update the docs

Example Jupyter notebooks

I created a repository for Jupyter demo notebooks to run with django-extensions.

There is only one for now demonstrating how to do a basic bar chart.

Move from Amcharts to Chart.js

Due to license concerns pointed out by @brylie in this discussion, we plan to move to Chart.js

[Edit] I made an amcharts branch for the old code. As this module is not yet released we can use the master to work on the new code.

Add a ready to use view that could draw charts from model instances or paths

Goal

Find an easy declarative way to render charts from input parameters without having to code anything. This means some kind of view that can draw charts on demand.

Prerequisite

In all cases a questions constructor is needed in the backend. It must take the input parameters from several queries or filters, construct orm queries, get the data and package it into template variables.

I suggest adopting the Metabase terminology here: a question is an aggregator of several queries.

Options

Url based constructor

The parameters are declared in the path that hits an endpoint and returns the chart. Using something inspired by the Influxdb line protocol can probably do the job: I will use the simple example on the readme page that compares the user types:

model_path::filter1:val1

/chart/q=auth.User::is_staff:False+auth.User::is_superuser:True+auth.User::is_staff:True;is_superuser:False

Advantage: very declarative, nothing to do for the user other than constructing a line for their request

Drawback: not very clear, can probably get too complex or confusing at some point
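To make the line format concrete, here is a minimal parser sketch for the example above. It assumes `+` separates queries, `::` separates the model path from its filters, `;` separates filters and `:` separates a filter name from its value. The function names are hypothetical.

```python
def parse_value(raw):
    """Map the literal strings True/False to booleans, leave the rest as text."""
    return {"True": True, "False": False}.get(raw, raw)


def parse_question(line):
    """Parse 'model::key:val;key:val+model::...' into a list of query dicts."""
    queries = []
    for chunk in line.split("+"):
        model_path, _, raw_filters = chunk.partition("::")
        filters = {}
        # Skip empty segments so a model path without filters is accepted
        for pair in filter(None, raw_filters.split(";")):
            key, _, value = pair.partition(":")
            filters[key] = parse_value(value)
        queries.append({"model": model_path, "filters": filters})
    return queries
```

Each resulting `filters` dict could then be passed to an orm call such as `Model.objects.filter(**filters)` once the model path has been resolved, which is left out of this sketch.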

Models based constructor

The questions are constructed and stored in the database as model instances. The user then requests a question and receives a chart.

Advantage: powerful, it makes it possible to extend the concept to custom dashboards

Drawback: hits the db. Lots of work to do to build a questions constructor on the frontend

The user creates a question and constructs some queries to be attached to it. To serve the data we have two possibilities:

The question object is just aware of how to get the data and will hit the db to grab it at each request. Caching may be possible.
The data is pre-aggregated and stored so that when the user requests a question the data is instantly returned without any extra db hit. The data aggregation can be done automatically, either with a time based worker or by registering the involved models so that they update the aggregated data on each save (see the django-mqueue code, where models are registered in settings and connected to signals that watch them and perform actions on update/save/delete)

So now?

I realize that this approach will, at the end of the day, translate into building a data visualization tool in Django. It could be very useful. And why not? This is a bit ambitious but the hard work has to be done if we want to be lazy.

I will start researching the questions constructor for the backend in another module: this is a job for django-introspection. It is an interesting challenge, and we must get this done first to be able to go forward.

@brylie: what do you think of this, does it sound more or less realistic to you?


synw / django-chartflo
