peopledoc / django-ltree-demo

A demo for storing and querying trees in Django using PostgreSQL

License: MIT License

Python 100.00%
django hierarchical-data ltree postgresql python trees ghec-mig-migrated approved-public

django-ltree-demo's Introduction

How to store trees with Django & PostgreSQL

Rationale

If you have ever needed to store hierarchical data (trees) with Django, you probably used a library like django-mptt or django-treebeard. Those libraries work fine at a small scale, but here at PeopleDoc we have run into a lot of issues when using them at a larger scale (tables with hundreds of thousands of rows and frequent writes).

It turns out that storing trees in a database has been a solved problem for a long time, at least with PostgreSQL. The ltree extension provides a convenient data type that is very fast on reads and has almost no impact on writes. The approach is very close to django-treebeard's materialized paths, but with all the power of PostgreSQL.

The main downside of using ltree is that you have to maintain the materialized path yourself. It doesn't come with any tool to do it automatically. But fortunately, it's actually quite simple to maintain this path using PostgreSQL triggers!

Integration with Django

In demo/categories/ltree.py you will find a very simple Django field for the ltree data type. This field can be used in any Django model, and adds two lookups: descendant and ancestor. Those lookups allow you to query the descendants or the ancestors of any object with a very simple SQL query.

For example, let's say you have the following model:

from django.db import models

from project.ltree import LtreeField


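class Category(models.Model):
    # on_delete is mandatory since Django 2.0; CASCADE is an assumption here
    parent = models.ForeignKey('self', null=True, on_delete=models.CASCADE)
    code = models.CharField(max_length=32, unique=True)
    path = LtreeField()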

The path field represents the path from the root to the node, where each node is represented by its code (it could also be its id, but using the code is more readable when debugging). For example, if you have a genetic category, under a science category, under a top category, its path would be top.science.genetic.
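As a rough illustration of how such a path is assembled, here is a minimal pure-Python sketch. The build_path helper is hypothetical and is not part of the demo; it only mirrors the rule "parent's path, a dot, then the node's code".

```python
from typing import Optional


def build_path(code: str, parent_path: Optional[str] = None) -> str:
    """Return the materialized path of a node from its code and its parent's path."""
    if parent_path is None:
        # A root node has no parent, so its path is just its own code.
        return code
    return f"{parent_path}.{code}"


# Build the three nested categories from the example above.
top = build_path("top")
science = build_path("science", top)
genetic = build_path("genetic", science)
print(genetic)
```

In the demo itself this work is delegated to the database rather than done in Python.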

Thanks to the descendant and ancestor lookups, the get_descendants method in django-mptt can be rewritten as:

def get_descendants(self):
    return Category.objects.filter(path__descendant=self.path)

This would generate a SQL query close to:

SELECT * FROM category WHERE path <@ 'science.biology'
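To make the semantics of the <@ operator concrete, the following sketch mimics it in plain Python. It assumes paths are dot-separated strings; the is_descendant helper is hypothetical and only approximates what PostgreSQL does natively (and with index support).

```python
def is_descendant(path: str, ancestor: str) -> bool:
    """Mimic ltree's `path <@ ancestor`.

    True when `ancestor` is a label-wise prefix of `path`.
    As with ltree, a path counts as a descendant of itself.
    """
    labels = path.split(".")
    prefix = ancestor.split(".")
    return labels[:len(prefix)] == prefix


paths = [
    "science",
    "science.biology",
    "science.biology.genetic",
    "science.physics",
]

# Rough equivalent of: SELECT * FROM category WHERE path <@ 'science.biology'
descendants = [p for p in paths if is_descendant(p, "science.biology")]
```

Note that the comparison is done label by label, so a path such as science.biologyx would not match, and that <@ also matches the node itself.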

The magic part: PostgreSQL triggers

If you add an ltree field to your model, you will have to keep the field up to date when inserting or updating instances. We could do that with Django signals, but it turns out that PostgreSQL is far better suited to maintaining integrity and writing efficient code.

Every time we insert or update a row, we can reconstruct its path by appending its code to the path of its parent. If the path has changed, we'll also need to update the path of the children, which can be written as a simple UPDATE query.
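The logic described above can be sketched in memory as follows. This is only a simulation under simplifying assumptions (a dict standing in for the table, hypothetical Node, compute_path and move names); the real implementation lives in the SQL triggers.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    parent: Optional[str]  # code of the parent node, None for a root
    path: str              # materialized path, e.g. "top.science"


def compute_path(rows: Dict[str, Node], code: str) -> str:
    """Rebuild a node's path by appending its code to its parent's path."""
    parent = rows[code].parent
    if parent is None:
        return code
    return f"{rows[parent].path}.{code}"


def move(rows: Dict[str, Node], code: str, new_parent: Optional[str]) -> None:
    """Re-parent a node, then rewrite the paths of all its descendants."""
    old_path = rows[code].path
    rows[code].parent = new_parent
    new_path = compute_path(rows, code)
    rows[code].path = new_path
    # Similar in spirit to a single UPDATE on every row below the old path:
    # swap the old prefix for the new one and keep the rest of the path.
    old_prefix = old_path + "."
    for node in rows.values():
        if node.path.startswith(old_prefix):
            node.path = new_path + node.path[len(old_path):]


rows = {
    "top": Node(None, "top"),
    "science": Node("top", "top.science"),
    "genetic": Node("science", "top.science.genetic"),
    "hobbies": Node("top", "top.hobbies"),
}
move(rows, "science", "hobbies")
```

After the move, both the science node and its genetic child sit under top.hobbies, while unrelated rows are left untouched.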

All that can be done easily with PostgreSQL triggers. You can find an implementation of those triggers in the file demo/categories/sql/triggers.sql.

The demo

In the demo, the following files are the most important:

How to install the demo

  • Create & activate a virtualenv
  • Install the dependencies with pip install -r requirements.txt
  • Install PostgreSQL using your preferred method
  • Export the PGHOST and PGUSER variables accordingly
  • Create the django_ltree_demo database
  • Run python manage.py migrate
  • Run the tests with pytest -v

Conclusion

With a few lines of declarative, idiomatic Django code and ~50 lines of SQL, we have implemented a fast and consistent solution for storing and querying trees.

Sometimes it's good to delegate complicated data manipulation to the database instead of doing everything in Python :)

django-ltree-demo's People

Contributors

amaury1093, k4nar

django-ltree-demo's Issues

ltree demo adaptation- issues with path

Hello, thanks for the repo and the example. I tried to adapt it as follows.
This is the 'employee' table, generated from a Django model:

created       | timestamp with time zone |           | not null |
modified      | timestamp with time zone |           | not null |
guid          | uuid                     |           | not null |
comp_user_id  | character varying(150)   |           | not null |
comp_username | character varying(150)   |           | not null |
comp_email    | character varying(254)   |           |          |
id            | integer                  |           | not null |
path          | ltree                    |           |          |
first_name    | character varying(150)   |           |          |
last_name     | character varying(150)   |           |          |
manager_id    | integer                  |           |          |
user_id       | integer                  |           |          |
Indexes:
    "corp_companyeuserprofile_pkey" PRIMARY KEY, btree (id)
    "corp_companyeuserprofile_user_id_key" UNIQUE CONSTRAINT, btree (user_id)
    "corp_companyeuserprofile_comp_email_6f7503d7" btree (comp_email)
    "corp_companyeuserprofile_comp_email_6f7503d7_like" btree (comp_email varchar_pattern_ops)
    "corp_companyeuserprofile_comp_user_id_328cb82e" btree (comp_user_id)
    "corp_companyeuserprofile_comp_user_id_328cb82e_like" btree (comp_user_id varchar_pattern_ops)
    "corp_companyeuserprofile_comp_username_9eec69b2" btree (comp_username)
    "corp_companyeuserprofile_comp_username_9eec69b2_like" btree (comp_username varchar_pattern_ops)
    "corp_companyeuserprofile_guid_e3160b25" btree (guid)
    "corp_companyeuserprofile_manager_id_2491a6e2" btree (manager_id)
    "cup_path_btree_idx" btree (path)
    "cup_path_gist_idx" gist (path)
Check constraints:
    "check_no_recursion" CHECK (index(path, id::text::ltree) = (nlevel(path) - 1))
Foreign-key constraints:
    "corp_comp_manager_id_2491a6e2_fk_cornersto" FOREIGN KEY (manager_id) REFERENCES corp_companyeuserprofile(id) DEFERRABLE INITIALLY DEFERRED
    "corp_comp_user_id_39765502_fk_users_use" FOREIGN KEY (user_id) REFERENCES users_user(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "corp_companyeuserprofile" CONSTRAINT "corp_comp_manager_id_2491a6e2_fk_cornersto" FOREIGN KEY (manager_id) REFERENCES corp_companyeuserprofile(id) DEFERRABLE INITIALLY DEFERRED
Triggers:
    cup_path_after_trg AFTER UPDATE ON corp_companyeuserprofile FOR EACH ROW WHEN (new.path IS DISTINCT FROM old.path) EXECUTE PROCEDURE _update_descendants_manager_path()
    cup_path_insert_trg BEFORE INSERT ON corp_companyeuserprofile FOR EACH ROW EXECUTE PROCEDURE _update_manager_path()
    cup_path_update_trg_two BEFORE UPDATE ON corp_companyeuserprofile FOR EACH ROW EXECUTE PROCEDURE _update_manager_path()

I also adapted the triggers:

-- function to calculate the path of any given manager
CREATE OR REPLACE FUNCTION _update_manager_path() RETURNS TRIGGER AS
$$
BEGIN
    IF NEW.manager_id IS NULL THEN
        NEW.path = NEW.id::text::ltree;
    ELSE
--           SELECT concat_ws('.', path::text, NEW.id::text)::ltree
          SELECT concat_ws('.', path::text, NEW.id::text)::ltree
          FROM comp_companyuserprofile
         WHERE NEW.manager_id IS NULL or id = NEW.manager_id
          INTO NEW.path;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- function to update the path of the descendants of a c.u.p.
CREATE OR REPLACE FUNCTION _update_descendants_manager_path() RETURNS TRIGGER AS
$$
BEGIN
    UPDATE comp_companyuserprofile
       SET path = concat_ws('.', NEW.path::text, subpath(comp_companyuserprofile.path, nlevel(OLD.path))::text)::ltree
     WHERE comp_companyuserprofile.path <@ OLD.path AND id != NEW.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- calculate the path every time we insert a new c.u.p.
DROP TRIGGER IF EXISTS cup_path_insert_trg ON comp_companyuserprofile;
CREATE TRIGGER cup_path_insert_trg
               BEFORE INSERT ON comp_companyuserprofile
               FOR EACH ROW
               EXECUTE PROCEDURE _update_manager_path();

-- calculate the path when updating the manager or the csod_user_id
DROP TRIGGER IF EXISTS cup_path_update_trg ON comp_companyuserprofile;
CREATE TRIGGER cup_path_update_trg
               BEFORE UPDATE ON comp_companyuserprofile
               FOR EACH ROW
               WHEN (OLD.manager_id IS DISTINCT FROM NEW.manager_id
                     OR OLD.csod_user_id IS DISTINCT FROM NEW.csod_user_id)
               EXECUTE PROCEDURE _update_descendants_manager_path();

-- if the path was updated, update the path of the descendants
DROP TRIGGER IF EXISTS cup_path_after_trg ON comp_companyuserprofile;
CREATE TRIGGER cup_path_after_trg
               AFTER UPDATE ON comp_companyuserprofile
               FOR EACH ROW
               WHEN (NEW.path IS DISTINCT FROM OLD.path)
               EXECUTE PROCEDURE _update_descendants_manager_path();

However, path always remains empty after inserting a row, updating the entire row, or updating the 'manager_id' column.
Am I doing something wrong?
Is there a way to run the update functions at the end of the import process instead of implementing them as triggers?

Is the AFTER UPDATE trigger really needed?

-- if the path was updated, update the path of the descendants
DROP TRIGGER IF EXISTS category_path_after_trg ON categories_category;
CREATE TRIGGER category_path_after_trg
AFTER UPDATE ON categories_category
FOR EACH ROW
WHEN (NEW.path IS DISTINCT FROM OLD.path)
EXECUTE PROCEDURE _update_descendants_category_path();

I can't get my head around why the above trigger is needed. IMO it just repeats the job already done by the BEFORE UPDATE trigger.

Can you provide an example where the AFTER does something different than the BEFORE?

Consider adding a License file

I would like to take inspiration from this project, but am reluctant to do so now, because of the lack of a license.

Thanks.
