maxtepkeev / architect Goto Github PK
View Code? Open in Web Editor NEWA set of tools which enhances ORMs written in Python with more features
License: Other
A set of tools which enhances ORMs written in Python with more features
License: Other
The current release on PyPi (0.5.0) has the following code in architect/databases/mysql/partition.py:
class Partition(BasePartition):
def prepare(self):
"""
Prepares table for partitioning by reconstructing table's primary key.
"""
if self.column_name not in self.pks:
self.database.execute("""
ALTER TABLE {parent_table} DROP PRIMARY KEY, ADD PRIMARY KEY ({pk}, {column});
""".format(pk=', '.join(pk for pk in self.pks), parent_table=self.table, column=self.column_name))
return self.database.execute("""
DROP PROCEDURE IF EXISTS {parent_table}_create_partition;
DELIMITER $$
CREATE PROCEDURE {parent_table}_create_partition(IN column_value VARCHAR(255))
BEGIN
DECLARE
tablename VARCHAR(255);
columntype VARCHAR(30);
{declarations}
{common_variables}
IF NOT EXISTS(
SELECT 1 FROM information_schema.partitions
WHERE table_name='{parent_table}' AND partition_name=tablename})
THEN
SELECT data_type INTO columntype
FROM information_schema.columns
WHERE table_name = '{parent_table}' AND column_name = '{column}';
{creation_variables}
SET @sql := CONCAT(
'ALTER TABLE {parent_table} ADD PARTITION ('
{checks}
');'
);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END IF;
END $$
DELIMITER ;
""".format(parent_table=self.table))
As expected, it blows up with missing keys on the .format().
Python Version: 3.5
Architect Version: 0.5.5 (same with 0.5.6)
Django Version: 1.10.5
Mysql Version: 5.7.16 ( also tested with 5.7.18)
mysqlclient Version: 1.3.9 (same with 1.3.10)
Operating System: Windows 7 64bit (also tested on Ubuntu 16.04 LTS docker image)
My model:
@architect.install('partition', type='range', subtype='date', constraint='year', column='ontime')
class Measurement(models.Model):
id = models.BigAutoField(primary_key=True)
service_id = models.IntegerField(db_index=True)
value = models.FloatField(blank=False, null=True)
ontime = models.DateTimeField(blank=False, null=False)
I have made sure that architect partition command is run beforehand.
I am getting the following error when I try to create an Measurement objects in any of my views:
[2017-05-12 12:14:12,265: ERROR/MainProcess] Task api.tasks.create_measurements[c2f94f7c-0ce2-48f0-9b60-f22a2dc21cb8] raised unexpected: InterfaceError(0, '')
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/opt/path/to/my/service/tasks.py", line 43, in create_measurements
Measurement.objects.create(service_id=measurement['service_id'], value=measurement['value'], ontime=timestamp)
File "/usr/local/lib/python3.5/dist-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/django/db/models/query.py", line 399, in create
obj.save(force_insert=True, using=self.db)
File "/usr/local/lib/python3.5/dist-packages/architect/orms/django/features.py", line 103, in wrapper
if not partition.exists():
File "/usr/local/lib/python3.5/dist-packages/architect/databases/mysql/partition.py", line 36, in exists
""".format(parent_table=self.table, name=self._get_name()))
File "/usr/local/lib/python3.5/dist-packages/architect/orms/django/features.py", line 47, in select_one
self.execute(sql)
File "/usr/local/lib/python3.5/dist-packages/architect/orms/django/features.py", line 44, in execute
return self.connection.execute(sql)
File "/usr/local/lib/python3.5/dist-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.5/dist-packages/django/db/utils.py", line 94, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/usr/local/lib/python3.5/dist-packages/django/utils/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.5/dist-packages/django/db/backends/utils.py", line 62, in execute
return self.cursor.execute(sql)
File "/usr/local/lib/python3.5/dist-packages/django/db/backends/mysql/base.py", line 110, in execute
return self.cursor.execute(query, args)
File "/usr/local/lib/python3.5/dist-packages/MySQLdb/cursors.py", line 250, in execute
self.errorhandler(self, exc, value)
File "/usr/local/lib/python3.5/dist-packages/MySQLdb/connections.py", line 42, in defaulterrorhandler
raise errorvalue
File "/usr/local/lib/python3.5/dist-packages/MySQLdb/cursors.py", line 247, in execute
res = self._query(query)
File "/usr/local/lib/python3.5/dist-packages/MySQLdb/cursors.py", line 411, in _query
rowcount = self._do_query(q)
File "/usr/local/lib/python3.5/dist-packages/MySQLdb/cursors.py", line 374, in _do_query
db.query(q)
File "/usr/local/lib/python3.5/dist-packages/MySQLdb/connections.py", line 270, in query
_mysql.connection.query(self, query)
django.db.utils.InterfaceError: (0, '')
The problem exists when invoking object creation in any of the two ways:
and the creation is performed in any of my views, or a celery task. When I create the object via the shell (by running python manage.py shell ...) the object is created with no problem. More specifically:
Object creation in a view : fails
Object creation in a celery task: succeeds
Object creation via the shell: succeeds
Object creation in a view : fails
Object creation in a celery task: fails
Object creation via the shell: succeeds
I decorated a model with @architect.uninstall('partition')
then ran architect partition --module myapp.models
with the following result:
Traceback (most recent call last):
File "~/.virtualenvs/myproject/bin/architect", line 11, in <module>
sys.exit(main())
File "~/.virtualenvs/myproject/lib/python2.7/site-packages/architect/commands/__init__.py", line 93, in main
commands[command]['parser'].result(args.func(vars(args)))
File "~/.virtualenvs/myproject/lib/python2.7/site-packages/architect/commands/partition.py", line 26, in run
module_clss = filter(lambda obj: isinstance(obj, type), __import__(module, fromlist=module).__dict__.values())
File "~/Projects/myproject/myapp/models/__init__.py", line 8, in <module>
from .mymodels import * # noqa
File "~/Projects/myproject/myapp/models/mymodels.py", line 1048, in <module>
class MyModel(BaseModel):
File "~/.virtualenvs/myproject/lib/python2.7/site-packages/architect/orms/decorators.py", line 155, in __call__
self.deinit_feature(self.feature, model)
File "~/.virtualenvs/myproject/lib/python2.7/site-packages/architect/orms/decorators.py", line 171, in deinit_feature
allowed=[name for name, obj in model.architect.__dict__.items() if isinstance(obj, BaseFeature)])
AttributeError: type object 'MyModel' has no attribute 'architect'
I might be doing something wrong. The docs seem a tad too brief, though :)
I have tried configuring range partitioning based on the key date
. It works fine for inserts, but if the date
timestamp is modified to fit into another partition, the item in inserted into a new table, but the previous version stays in the original partition.
I am getting the following traceback when trying to setup my tables to include partitioning on postgres.
Traceback (most recent call last):
File "/lib/python3.5/site-packages/django/db/backends/utils.py", line 62, in execute
return self.cursor.execute(sql)
psycopg2.ProgrammingError: invalid type name "stats_repstat.date%TYPE"
LINE 6: match stats_repstat.date%TYPE;
^
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/bin/architect", line 11, in <module>
sys.exit(main())
File "/lib/python3.5/site-packages/architect/commands/__init__.py", line 93, in main
commands[command]['parser'].result(args.func(vars(args)))
File "/lib/python3.5/site-packages/architect/commands/partition.py", line 32, in run
cls.architect.partition.get_partition().prepare()
File "/lib/python3.5/site-packages/architect/databases/postgresql/partition.py", line 107, in prepare
column=self.column_name
File "/lib/python3.5/site-packages/architect/orms/django/features.py", line 44, in execute
return self.connection.execute(sql)
File "/lib/python3.5/site-packages/django/db/backends/utils.py", line 79, in execute
return super(CursorDebugWrapper, self).execute(sql, params)
File "/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
File "/lib/python3.5/site-packages/django/db/utils.py", line 95, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/lib/python3.5/site-packages/django/utils/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/lib/python3.5/site-packages/django/db/backends/utils.py", line 62, in execute
return self.cursor.execute(sql)
django.db.utils.ProgrammingError: invalid type name "stats_repstat.date%TYPE"
LINE 6: match stats_repstat.date%TYPE;
^
architect==0.5.3
psycopg2==2.6.1
Django==1.9.7
version
-----------------------------------------------------------------------------------------------
PostgreSQL 9.4.8 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
(1 row)
I've implemented a move_data
command as suggested in this issue. I'd like to contribute it back to the project, but my employer requires that there be a CLA in place before I am allowed to do so. Can Architect add a CLA to make it easier for us to give our enhancements back to the project?
How would be the best way to make a partition by example name:
White_list
blacklist
general_list
list_first_income
I just create model
@architect.install('partition', type='range', subtype='date', constraint='month', column='date_from')
class GroupUserActivity(models.Model):
date_from = models.DateField()
date_to = models.DateField()
period = models.PositiveSmallIntegerField(db_index=True)
and put into the table dataset with different months in date_from column. And I don't see any difference in database comparing to using without architect.
What should I do to create triggers all other staff for handling child tables automatically
Database - Postgres 9.1,
Django 1.5.4
I have a table with big amount of data inside. Just applied
@architect.install('partition', type='range', subtype='date', constraint='month', column='date_from')
and created triggers. But my data is still in master table. How I can move it to the children tables in the most efficient way?
I use Django framework
DJango 1.7, MySQL (RDS) 5.6, latest Architect.
The model prefix:
@architect.install ("partition", type = "range", subtype = "date",
constraint = "month", column = "date")
class QEvent (TimeStampedModel):
...stuff...
./manage migrate run cleanly. New database. Empty table.
$ architect partition --module qeventlog.models
architect partition: result: successfully (re)configured the database for the following models: QEvent
What MySQL has to think about it:
mysql> create database qeventlog;
Query OK, 1 row affected (0.00 sec)
mysql> show create table qeventlog_qevent\G
ERROR 1046 (3D000): No database selected
mysql> use qeventlog;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show create table qeventlog_qevent\G
*************************** 1. row ***************************
Table: qeventlog_qevent
Create Table: CREATE TABLE `qeventlog_qevent` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`created` datetime NOT NULL,
`modified` datetime NOT NULL,
`entity` varchar(64) NOT NULL,
`source` varchar(64) NOT NULL,
`timestamp` decimal(30,6) DEFAULT NULL,
`keyname` varchar(64) NOT NULL,
`value_num` decimal(30,6) DEFAULT NULL,
`value_str` varchar(2048) DEFAULT NULL,
PRIMARY KEY (`id`,`created`),
KEY `qeventlog_qevent_c096cf48` (`entity`),
KEY `qeventlog_qevent_a34b03a6` (`source`),
KEY `qeventlog_qevent_d80b9c9a` (`timestamp`),
KEY `qeventlog_qevent_6e313b5a` (`keyname`),
KEY `qeventlog_qevent_101309c5` (`entity`,`source`,`timestamp`,`keyname`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (TO_DAYS(created))
(PARTITION qeventlog_qevent_y0000m00 VALUES LESS THAN (0) ENGINE = InnoDB) */
1 row in set (0.00 sec)
But on an attempted write into the table:
File "/home/jcl/src/qeventlog/qeventlog/qetask.py", line 39, in record
QEvent.bulk_import (record)
File "/home/jcl/src/qeventlog/qeventlog/models.py", line 53, in bulk_import
obj.save ()
File "/usr/local/lib/python2.7/dist-packages/architect/orms/django/features.py", line 103, in wrapper
if not partition.exists():
File "/usr/local/lib/python2.7/dist-packages/architect/databases/mysql/partition.py", line 36, in exists
""".format(parent_table=self.table, name=self._get_name()))
File "/usr/local/lib/python2.7/dist-packages/architect/orms/django/features.py", line 47, in select_one
self.execute(sql)
File "/usr/local/lib/python2.7/dist-packages/architect/orms/django/features.py", line 44, in execute
return self.connection.execute(sql)
File "/usr/local/lib/python2.7/dist-packages/django/db/backends/utils.py", line 65, in execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python2.7/dist-packages/django/db/utils.py", line 94, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/usr/local/lib/python2.7/dist-packages/django/db/backends/utils.py", line 63, in execute
return self.cursor.execute(sql)
File "/usr/local/lib/python2.7/dist-packages/django/db/backends/mysql/base.py", line 128, in execute
return self.cursor.execute(query, args)
File "build/bdist.linux-x86_64/egg/MySQLdb/cursors.py", line 205, in execute
File "build/bdist.linux-x86_64/egg/MySQLdb/connections.py", line 36, in defaulterrorhandler
InterfaceError: (0, '')
Suspect I'm doing something stupid?
I'm working on a data model, and in the process of creating it I'm finding that if partitions are created and I try to use SQLAlchemy's MetaData.drop_all()
method with Postgresql 9.4 the table is not dropped due to other tables inheriting my parent table.
My motivation for this question is to be able to fully automate my unit / integration tests. I'd like to be able to drop my project on a clean server with no outside dependencies except for what I've declared.
I've been trying to go over the SQLAlchemy docs to see if there's an easy way to overload the 'DROP TABLE' clause to use Postgresql's 'CASCADING' option to no avail.
Additionally, I'd like to be able to clean up my data model to drop the functions which architect installs to partition the table's data.
My current solution is to do these tasks with a SQLAlchemy event monitor for 'before_drop' where I run some queries against the information_schema. Unfortunately this must be done on a table-by-table basis so I was wondering if there's a cleaner way to perform these tasks.
Hi @maxtepkeev - Looks like several of the tests relating to Pony are failing. Is this keeping new release builds from being pushed out to pypi?
I'm hoping to help get some changes I need for a project im working on so if this is the case let me know if I can contribute.
Hello,
I'm happily using architect with django and there's a use-case I'd love to see simplified.
I'm partitioning a table on a date field by month, and the insert pattern on this table is the following :
Every month, I use a python/django/psycopg script to COPY a lot of rows with the date field set to Y-m-01. I could totally preempt table creation and load that data directly to the partition table, thus avoiding the overhead.
The requested feature is :
Extract a partition creation function from the _insert_child trigger, and optionally make it available as API on the Django model.
What do you think ?
Hi Max,
First of all, I'm excited to see that this project exists. I believe there's something missing in either the documentation or in my understanding of partitioning.
When I go to use this model, should this not have already installed at the database level? All of my data is being written to only one table, meaning partitioning doesn't seem to be working.
Any insights would be appreciated. I am likely just missing a step.
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from project.app import example_engine
Base = declarative_base()
@architect.install(
'partition', type='range', subtype='string_lastchars',
constraint='2', column='example_field',
db='postgresql+psycopg2://{}:{}@{}:{}/{}'.format( 'omitted', 'omitted', 'omitted', 'omitted', 'omitted')
)
class SomeModel(Base):
__tablename__ = 'some_table_name'
__table_args__ = (Index('_field', 'example_field', 'another_example_field'), )
id = Column('id', Integer, primary_key=True)
example_field = Column('example_field', String(32))
another_example_field = Column('another_example_field', String(32))
Base.metadata.create_all(example_engine)```
if has more one db connection, have to do like this:
from architect.orms.django.mixins import PartitionableMixin
from django.db import connections
from django.db.utils import ConnectionDoesNotExist
class RoutingPartitionableMixin(PartitionableMixin):
def get_cursor(self):
try:
cursor = connections[self.PartitionableMeta.db_name].cursor()
except AttributeError:
cursor = super(RoutingPartitionableMixin, self).get_cursor()
except ConnectionDoesNotExist as err:
# need self.PartitionableMeta.db_name
raise err
return cursor
I would like something native implementation for this. Please, think about.
This is what I'm getting when trying to install the partition feature on an existing model
Source:
@architect.install('partition', type='range', subtype='date',
constraint='month', column='day')
Result:
Traceback (most recent call last):
File "/Users/alexei/.virtualenvs/myproject/bin/architect", line 11, in <module>
sys.exit(main())
File "/Users/alexei/.virtualenvs/myproject/lib/python2.7/site-packages/architect/commands/__init__.py", line 93, in main
commands[command]['parser'].result(args.func(vars(args)))
File "/Users/alexei/.virtualenvs/myproject/lib/python2.7/site-packages/architect/commands/partition.py", line 32, in run
cls.architect.partition.get_partition().prepare()
File "/Users/alexei/.virtualenvs/myproject/lib/python2.7/site-packages/architect/orms/bases.py", line 91, in get_partition
database = get_database(self.model_meta['dialect'])
File "/Users/alexei/.virtualenvs/myproject/lib/python2.7/site-packages/architect/orms/django/features.py", line 89, in model_meta
'pk': meta.pk.column,
AttributeError: 'NoneType' object has no attribute 'column'
Hello,
Thank you so much for this awesome tool.
I use postgresql 9.4 and SQLAlchemy 1.0.9.
I added this as in the doc:
@architect.install('partition', type='range', subtype='date', constraint='month', column='time_received', db=config.database.engine)
Then I run:
architect partition -m core.db.models, it returns success.
Then, when I start inserting rows from sqlalchemy, architect it creates the right tables but I find my rows in the parent and also the child table, they should be available only in my child tables.
Do you know why ?
Thanks !
SUBSTR(LOWER(NEW.<column_name>), 1, <n>)
is used in the trigger function. If the column value contains special characters it breaks. And if you chose n large enough SQL injections become possible. We fixed this using:
SUBSTR(LOWER(regexp_replace(NEW.<column_name>, '[^a-zA-Z]', '', 'g')), 1, <n>);
Which makes everything much more stable
good morning
I create index on one table for example LoginInfo column uid, all of my partition tables are also has index on column uid, but the master table doesn't. The master table's index is another column which I didn't create manually, what is the mechenism of the architect creating index, especially why it create an index on the master table which I did not specify
Thank you !
i 'm trying save
into a partitioned table. the following exception raises:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/lib/python2.7/site-packages/django/db/models/manager.py", line 92, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/lib/python2.7/site-packages/django/db/models/query.py", line 409, in bulk_create
self._batched_insert(objs_without_pk, fields, batch_size)
File "/lib/python2.7/site-packages/django/db/models/query.py", line 938, in _batched_insert
using=self.db)
File "/lib/python2.7/site-packages/django/db/models/manager.py", line 92, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/lib/python2.7/site-packages/django/db/models/query.py", line 921, in _insert
return query.get_compiler(using=using).execute_sql(return_id)
File "/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 920, in execute_sql
cursor.execute(sql, params)
File "/lib/python2.7/site-packages/django/db/backends/utils.py", line 81, in execute
return super(CursorDebugWrapper, self).execute(sql, params)
File "/lib/python2.7/site-packages/django/db/backends/utils.py", line 65, in execute
return self.cursor.execute(sql, params)
File "/lib/python2.7/site-packages/django/db/utils.py", line 94, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/lib/python2.7/site-packages/django/db/backends/utils.py", line 65, in execute
return self.cursor.execute(sql, params)
DataError: query string argument of EXECUTE is null
CONTEXT: PL/pgSQL function gadget_metadata_insert_child() line 17 at EXECUTE statement
I'm using python 2.7 + django 1.7 + postgresql 9.4
the partition_column
is of type DateField
, which creates a postgresql date
column.
as columntype
casting handled properly in date partition function, my use case has nothing special,
I wonder if any one had faced this before?
ProgrammingError: record "new" has no field "end"
LINE 1: SELECT NEW.end IS NULL
^
QUERY: SELECT NEW.end IS NULL
CONTEXT: PL/pgSQL function test_table_insert_child() line 8 at IF
Hi! Thank you for making this library. I am really happy that someone has done this. I am having trouble getting it to work as-expected, though (maybe related to #34 ?)
I have annotated a model as follows. This works rather nicely to shuttle incoming data into different tables:
@architect.install('partition', type='range', subtype='string_firstchars',
constraint='2', column='state')
class Parcel(models.Model):
I then query it as follows:
qs = Parcel.objects.filter(state='15', boundary__intersects=polygon)
I would expect this query to only try to hit the table distribution_parcel_15
, but when I didn't see any performance improvement post-adding-partitions, I ran explain()
on the above, and saw that every child table was being hit.
-> Index Scan using distribution_parcel_34_boundary_idx on distribution_parcel_34 (cost=0.41..8.68 rows=1 width=1089)
Index Cond: (boundary && '0106000020E61000000100000001030000000100000005000000969EB6EA0DB763C060D33019BE503540F9408D560EB763C021FADE3CC45035402FD0AC070EB763C0E82FABB7FA503540406BAFC50DB763C0E625D2CAF8503540969EB6EA0DB763C060D33019BE503540'::geometry)
Filter: (((state)::text = '15'::text) AND _st_intersects(boundary, '0106000020E61000000100000001030000000100000005000000969EB6EA0DB763C060D33019BE503540F9408D560EB763C021FADE3CC45035402FD0AC070EB763C0E82FABB7FA503540406BAFC50DB763C0E625D2CAF8503540969EB6EA0DB763C060D33019BE503540'::geometry))
-> Index Scan using distribution_parcel_12_boundary_idx on distribution_parcel_12 (cost=0.41..8.68 rows=1 width=1805)
Index Cond: (boundary && '0106000020E61000000100000001030000000100000005000000969EB6EA0DB763C060D33019BE503540F9408D560EB763C021FADE3CC45035402FD0AC070EB763C0E82FABB7FA503540406BAFC50DB763C0E625D2CAF8503540969EB6EA0DB763C060D33019BE503540'::geometry)
Filter: (((state)::text = '15'::text) AND _st_intersects(boundary, '0106000020E61000000100000001030000000100000005000000969EB6EA0DB763C060D33019BE503540F9408D560EB763C021FADE3CC45035402FD0AC070EB763C0E82FABB7FA503540406BAFC50DB763C0E625D2CAF8503540969EB6EA0DB763C060D33019BE503540'::geometry))
...
and EXPLAIN SELECT COUNT(*) FROM distribution_parcel WHERE state='15';
gives:
-> Parallel Index Only Scan using state_34 on distribution_parcel_34 (cost=0.43..4.44 rows=1 width=0)
Index Cond: (state = '15'::text)
-> Parallel Index Only Scan using state_12 on distribution_parcel_12 (cost=0.43..4.44 rows=1 width=0)
Index Cond: (state = '15'::text)
...
Am I fundamentally misunderstanding partitions, or what?
Hi.
I'm not able to initialize architect using Django
I'm trying using:
architect partition --module mdata.models
where mdata is my app and models the module with my partitioned model.
I'm using virtualenv and my current path is where "mdata" exists.
I have a database owned by django:
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
------------------+----------+----------+-------------+-------------+-----------------------
mobilesurvey | django | UTF8 | en_US.utf8 | en_US.utf8 |
... and my app connects to the database with the django database user, but the partitions are being created for the postgres user:
Schema | Name | Type | Owner
--------+----------------------------+-------+----------
...
public | rm_signup_survey_19_19 | table | postgres
public | rm_signup_survey_23_23 | table | postgres
...
My app was having problems inserting records, so I tried it by hand, using the django user, and I received:
ERROR: permission denied for relation rm_signup_survey_19_19
I have another server where this is working correctly (inserts succeeding), and the partition tables are being created as the django user there:
Schema | Name | Type | Owner
--------+----------------------------+-------+----------
...
public | rm_signup_survey_5_5 | table | django
public | rm_signup_survey_6_6 | table | django
...
... and I assume that's the problem.
The database owner is still django, but the permissions do look different:
Name | Owner | Encoding | Collate | Ctype | Access privileges
--------------+----------+----------+-------------+-------------+-----------------------
mobilesurvey | django | UTF8 | en_US.utf8 | en_US.utf8 | =Tc/django +
| | | | | django=CTc/django +
| | | | | reader=c/django
How can I fix this?
architect is creating the partitions correctly, but how can I query the partition directly if I'm using Django?
Also, is there a way to delete the records from the parent table after inserting them into the child table?
Hello.
Is the foreignkey limitation only when you try to partition based on a column which is a foreignkey? Or is it like you cannot partition a table if any of its columns are involved in foreignkeys?
I'm using PostgreSQL and have followed all the methods described in the docs. When I add partition to a table with the foreignkey column as the column name, it prevents with foreignkey violation, again if I re-create the partition based on a date field which is not a foreignkey it also gives the same error.
I was just asking if I was doing something very wrong here?
Now that Postgresql 10 and 11 support declarative partitioning, it would be nice if the architect install decorator had an option where the BEFORE INSERT trigger could be created that generates CREATE TABLE {tablename} PARTITION OF {parent_table}
, rather than the (legacy) "INHERITS" form.
Is this supposed to be functional? If, for example I have a Django model that is partitioned over date and one changes the date of that object such that it would cross a partitioning boundary (to go in the next month and therefore the next partition, say), shouldn't the object simply get saved in the new partition? This appears to not be supported and gives me an IntegrityError claiming to violate the check constraint of the date field. Should this be supported or is there some behind the scenes reason this doesn't work or assumption on how this library should be used that this use case violates? Thanks
Hi,
Please check below error is coming
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/neha/venvs/django-env/bin/architect", line 10, in
sys.exit(main())
File "/home/neha/venvs/django-env/lib/python3.6/site-packages/architect/commands/init.py", line 93, in main
commands[command]['parser'].result(args.func(vars(args)))
File "/home/neha/venvs/django-env/lib/python3.6/site-packages/architect/commands/partition.py", line 32, in run
cls.architect.partition.get_partition().prepare()
File "/home/neha/venvs/django-env/lib/python3.6/site-packages/architect/databases/postgresql/partition.py", line 107, in prepare
column='"{0}"'.format(self.column_name)
File "/home/neha/venvs/django-env/lib/python3.6/site-packages/architect/orms/django/features.py", line 48, in execute
return self.connection.execute(sql)
File "/home/neha/venvs/django-env/lib/python3.6/site-packages/django/db/backends/utils.py", line 100, in execute
return super().execute(sql, params)
File "/home/neha/venvs/django-env/lib/python3.6/site-packages/django/db/backends/utils.py", line 68, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/home/neha/venvs/django-env/lib/python3.6/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/home/neha/venvs/django-env/lib/python3.6/site-packages/django/db/backends/utils.py", line 85, in _execute
return self.cursor.execute(sql, params)
File "/home/neha/venvs/django-env/lib/python3.6/site-packages/django/db/utils.py", line 89, in exit
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/home/neha/venvs/django-env/lib/python3.6/site-packages/django/db/backends/utils.py", line 83, in _execute
return self.cursor.execute(sql)
django.db.utils.ProgrammingError: syntax error at or near "_insert_child"
LINE 3: ...REATE OR REPLACE FUNCTION "general"."email_master"_insert_ch...
^
when i am trying to run command
architect partition --module ti_webapp.models.EmailMaster
my configurations are
addict 2.2.0
alabaster 0.7.3
architect 0.5.6
Babel 2.6.0
beautifulsoup4 4.7.1
blessings 1.7
bpython 0.17.1
bs4 0.0.1
certifi 2018.11.29
chardet 3.0.4
curtsies 0.3.0
Django 2.1.7
django-composite-foreignkey 1.1.0
django-redis 4.10.0
django-redis-sessions 0.6.1
docutils 0.14
Dumper 1.2.0
elasticsearch 1.9.0
greenlet 0.4.15
idna 2.8
imagesize 1.1.0
Jinja2 2.10
MarkupSafe 1.1.1
nltk 3.4
packaging 19.0
pip 19.0.3
pkg-resources 0.0.0
psycopg2 2.7.7
psycopg2-binary 2.7.7
pycrypto 2.6.1
Pygments 2.3.1
pyketama 0.2.1
pyparsing 2.3.1
python-decouple 3.1
python-memcached 1.59
pytz 2018.9
redis 3.2.0
requests 2.21.0
setuptools 40.8.0
simple-crypt 4.1.7
simplejson 3.16.0
singledispatch 3.4.0.3
six 1.12.0
snowballstemmer 1.2.1
soupsieve 1.9
Sphinx 2.0.0
sphinxcontrib-applehelp 1.0.1
sphinxcontrib-devhelp 1.0.1
sphinxcontrib-htmlhelp 1.0.1
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.2
sphinxcontrib-serializinghtml 1.1.1
typing 3.6.6
urllib3 1.24.1
wcwidth 0.1.7
wheel 0.33.0
Hi!
With Postgres 12.0 release we finally get proper support for ForeignKeys to partitioned tables: https://www.postgresql.org/docs/current/release-12.html
In other words, one could say that table partitioning is finally feature complete in Postgres.
Are there any plans to update architect
to support that? If I wanted to add such support myself, where should I start?
Thanks,
Bartosz
I add the partition configuration in my class:
@architect.install('partition', type='range', subtype='integer', constraint='100', column='id')
class Article(ArticleGeneric,
ConfidenceScoreModel,
EditableModel,
HitTrackedModel,
HotnessScoreModel,
VotableModel):
I export the path: $ export DJANGO_SETTINGS_MODULE=prisvo.settings
But when i execute architect partition --module article.models
Traceback (most recent call last):
File "/home/developer/.virtualenvs/prisvo/bin/architect", line 11, in <module>
sys.exit(main())
File "/home/developer/.virtualenvs/prisvo/local/lib/python2.7/site-packages/architect/commands/__init__.py", line 91, in main
orms.init()
File "/home/developer/.virtualenvs/prisvo/local/lib/python2.7/site-packages/architect/orms/__init__.py", line 17, in init
getattr(__import__(name, globals(), level=1), 'init', lambda: None)() # if yes, run init() for this ORM
File "/home/developer/.virtualenvs/prisvo/local/lib/python2.7/site-packages/architect/orms/django/__init__.py", line 7, in init
django.setup()
File "/home/developer/.virtualenvs/prisvo/local/lib/python2.7/site-packages/django/__init__.py", line 21, in setup
apps.populate(settings.INSTALLED_APPS)
File "/home/developer/.virtualenvs/prisvo/local/lib/python2.7/site-packages/django/apps/registry.py", line 108, in populate
app_config.import_models(all_models)
File "/home/developer/.virtualenvs/prisvo/local/lib/python2.7/site-packages/django/apps/config.py", line 202, in import_models
self.models_module = import_module(models_module_name)
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
File "/home/developer/projects/prisvo/prisvo-backend/article/models.py", line 17, in <module>
from edition.base import EditableModel
File "/home/developer/projects/prisvo/prisvo-backend/edition/base.py", line 6, in <module>
from user.models import User
ImportError: No module named models
My site works with no problem, my model user work perfectly, my migrations are done without errors. Only here is giving me an error. And if i comment where the import is imported is give another import error with the same model but in another code.
The project is running on Django. I set up my model as follows:
import architect
from django.contrib.postgres.fields import JSONField
@architect.install('partition', type='range', subtype='string_firstchars', constraint='10', column='collection')
class PartitionedDocument(models.Model):
collection = models.CharField(max_length=10)
content = JSONField(default=dict)
In other words, I am partitioning using the first 10 characters of the collection
field. Plus, I ensure the collection
is always a string of 10 characters (from a pre-determined list), so its a nice partitioning system. The list looks like this:
collection_labels = [
'91IJAYXLPc',
'kzMZFCHavP',
'SMyJzyY9ze',
'EAW95zQtED',
'AzDWtgZ6MA',
'Kv7wrKX9m3',
'BO9CsGIqpd',
'fQhDlqVNJs',
'tvj1zF3Nmm',
'IAPzdfsjFM',
'hLuumRvCK4',
'ErwIeGjMwE',
'ebS3sy7xmJ',
'aLyd8Hrcpe',
'8hdhGCFmJU',
'AfT7kLR0ep',
'0MPS0KLsrT',
'k17vDmktNB',
'SluHmP1X2K',
'9OMXJZJqHE'
]
Anyway, I added some data to my database after doing all this, and then ran a query like so:
PartitionedDocument.objects.annotate(
full_name = RawSQL("(content->>'full_name')::text", []),
age = RawSQL("(content->>'age')::int", [])
).filter(
age__gte = 30
).extra(where=["lower(substr(collection, 1, 10)) = '91IJAYXLPc'"])
If everything went correctly, I should only be hitting the table that corresponds to the collection
with a value of '91IJAYXLPc', and then I would filter based on the JSONField. I know for a fact that objects that fall under this condition exists in my database, however, the queryset returned is empty.
Did I do something wrong? Thanks.
Hello,
I think many of us run into the situation when the primary key of the table which is partitioned is referenced somehow in another table and it would be great if architect could detect this and creates shadow table which will keep track of FKs as show here:
https://postgresinfo.wordpress.com/2012/05/29/table-partition-with-foreign-keys/
Thank you so much.
Hi there,
Just a bump for this question on the dbparti repo: maxtepkeev/django-db-parti#14 (comment) .
Also, it would be great if sqlite could be supported for Django's orm. I use SQLite in Dev and parity with production would be great.
Thanks
Kevin
Hi,
Do you have a plan to support postgresql 10's native partitioning mechanism?
BR
# APP = create_app(mode=os.getenv("ENV", "dev"))
# DB.init_app(APP)
@architect.install('partition', type='range', subtype='date', constraint='month', column='created')
class TodoHistory(DB.Model):
__tablename__ = "todo_history"
__table_args__ = {'mysql_collate': 'utf8_general_ci'}
id = DB.Column("todo_id", DB.Integer, primary_key=True)
description = DB.Column("description", DB.String(270), nullable=False)
created = DB.Column(DB.TIMESTAMP, server_default=text("CURRENT_TIMESTAMP"), nullable=False)
def __init__(self, description):
self.description = description
architect.exceptions.ORMError: Unsupported ORM "flask_sqlalchemy" requested for class "MyModel", available ORMs are: django, peewee, pony, sqlalchemy, sqlobject
from sqlalchemy.ext.declarative import declarative_base
# APP = create_app(mode=os.getenv("ENV", "dev"))
# DB.init_app(APP)
@architect.install('partition', type='range', subtype='date', constraint='month', column='created')
class TodoHistory(declarative_base(DB.Model)):
__tablename__ = "todo_history"
__table_args__ = {'mysql_collate': 'utf8_general_ci'}
id = DB.Column("todo_id", DB.Integer, primary_key=True)
description = DB.Column("description", DB.String(270), nullable=False)
created = DB.Column(DB.TIMESTAMP, server_default=text("CURRENT_TIMESTAMP"), nullable=False)
def __init__(self, description):
self.description = description
I solve it!
def test_history_partition(app_context):
""" history test """
from app.models.todo.orm import TodoHistory
history = TodoHistory(description="test")
import datetime
history.created = datetime.datetime.utcnow()
history.metadata.bind=DB.engine
DB.session.add(history)
DB.session.commit()
Hi
I have a simple PostgreSQL database / data model ...
Backup -> Snapshot
Snapshot -> Entry
The data that is being backed up are file system records and a record in the Entry table can correspond to either a directory or a file.
The reason why I'm considering partitioning my data is that a Backup of a directory can easily have 10 million Entry records in a given Snapshot.
I'm using Django and my Entry model looks like this (only showing relevant details) ...
class Entry(models.Model):
snapshot = models.ForeignKey(Snapshot, verbose_name='Snapshot', on_delete=models.CASCADE)
path = models.TextField(blank=False)
...
I want to partition the entries table on snapshot_id.
That makes the most sense to me - but I'm not sure what value to use for the 'constraint' in the architect.install decorator ...
@architect.install('partition', type='range', subtype='integer', constraint=???, column='snapshot_id')
To be 100% clear - I want all the entries associated with a given snapshot_id to be in the same partition.
I've organized my models into separate files within a models directory (within the arc/models directory). My models directory is a module.
I call the 'partition' command as follows ...
architect partition --module arc.models.entry
this command executes but generates the following error ...
architect partition: error: unsupported partition constraint "integer" in "Entry" model, supported partition constraints for "postgresql" database are: positive integer
what changes do I need to make to the decorator so that the partition command works ?
Also - presumably its possible to run the 'partition' command as soon as I create a Snapshot - but before I run my script to identify and insert Entry records - correct ?
Thanks
Dave
I have a Django model like this:
@architect.install('partition', type='range', subtype='integer', constraint='100', column='search_id')
class SearchResult(models.Model):
search = models.ForeignKey(Search, on_delete=models.CASCADE)
...
When I am selecting with ORM filtering by partition column
SearchResult.objects.filter(search_id=3502).only('id').first()
Django selects from master table:
(0.005) SELECT "app_searchresult"."id" FROM "app_searchresult" WHERE "app_searchresultapp_searchresult"."search_id" = 3502 ORDER BY "app_searchresult"."id" ASC LIMIT 1; args=(3502,)
How I can select something with ORM from tables caled like "app_searchresult_3501_3600"?
I believe that I don't need to know a table name for this.
Django ConnectionMixin supports DEFAULT_DB_ALIAS only. https://github.com/maxtepkeev/architect/blob/master/architect/orms/django/features.py#L20
There should be support for multi db Django configuration https://docs.djangoproject.com/en/1.9/topics/db/multi-db/
I have a high loaded postgres database - a lot of inserts and updates on partitioned table. I faced with bloat problem and as solution I'm using pg_reorg tool. it's working perfectly with any child separately and parent. But I found that parent table get bloated as well as children without any row inside. What the reason of it? For my opinion it should be empty, because all data is inserting into children tables.
It would be nice to have some examples for each of the supported db abstraction libraries. Right now, I can not easily judge if architect could maybe help my with my use case.
Hi,
I saw a similar issue about this problem but I don't have the same configuration:
If the app name is uppercase and you want to execute the command architect, you will have the error invalid type name.
To correct this I changed in architect/databases/postgresql/partition.py the PSQL code.
`
return self.database.execute("""
-- We need to create a before insert function
CREATE OR REPLACE FUNCTION {{parent_table}}_insert_child()
RETURNS TRIGGER AS $$
DECLARE
match {{parent_table_origin}}.{{column}}%TYPE;
tablename VARCHAR;
checks TEXT;
{declarations}
BEGIN
IF NEW.{{column}} IS NULL THEN
tablename := '{{parent_table}}_null';
checks := '{{column}} IS NULL';
ELSE
{variables}
END IF;
IF NOT EXISTS(
SELECT 1 FROM information_schema.tables WHERE table_name=tablename)
THEN
BEGIN
EXECUTE 'CREATE TABLE ' || tablename || ' (
CHECK (' || checks || '),
LIKE "{{parent_table}}" INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES
) INHERITS ("{{parent_table}}");';
EXCEPTION WHEN duplicate_table THEN
-- pass
END;
END IF;
EXECUTE 'INSERT INTO ' || tablename || ' VALUES (($1).*);' USING NEW;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Then we create a trigger which calls the before insert function
DO $$
BEGIN
IF NOT EXISTS(
SELECT 1
FROM information_schema.triggers
WHERE event_object_table = '{{parent_table}}'
AND trigger_name = 'before_insert_{{parent_table_lowcase}}_trigger'
) THEN
CREATE TRIGGER before_insert_{{parent_table}}_trigger
BEFORE INSERT ON "{{parent_table}}"
FOR EACH ROW EXECUTE PROCEDURE {{parent_table}}_insert_child();
END IF;
END $$;
-- Then we create a function to delete duplicate row from the master table after insert
CREATE OR REPLACE FUNCTION {{parent_table}}_delete_master()
RETURNS TRIGGER AS $$
BEGIN
DELETE FROM ONLY "{{parent_table}}" WHERE {{pk}};
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Lastly we create the after insert trigger that calls the after insert function
DO $$
BEGIN
IF NOT EXISTS(
SELECT 1
FROM information_schema.triggers
WHERE event_object_table = '{{parent_table}}'
AND trigger_name = 'after_insert_{{parent_table_lowcase}}_trigger'
) THEN
CREATE TRIGGER after_insert_{{parent_table}}_trigger
AFTER INSERT ON "{{parent_table}}"
FOR EACH ROW EXECUTE PROCEDURE {{parent_table}}_delete_master();
END IF;
END $$;
""".format(**definitions).format(
pk=' AND '.join('{pk} = NEW.{pk}'.format(pk=pk) for pk in self.pks),
parent_table=self.table,
parent_table_lowcase=str(self.table).lower(),
parent_table_origin='"{0}"'.format(self.table),
column='{0}'.format(self.column_name)
))`
As I use only Postgresql, I couldn't do any other tests for others db.
The before-insert trigger in Postgres is something like:
IF NOT EXISTS(
SELECT 1 FROM information_schema.tables WHERE table_name=tablename)
THEN
BEGIN
EXECUTE 'CREATE TABLE ' || tablename || ' (
CHECK (' || checks || '),
LIKE "{{parent_table}}" INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES
) INHERITS ("{{parent_table}}");';
EXCEPTION WHEN duplicate_table THEN
-- pass
END;
END IF;
And it will check whether table's existense by SELETC 1
, then create table if select failed.I tried to replace with CREATE TABLE IF NOT EXIST
BEGIN
EXECUTE 'CREATE TABLE IF NOT EXISTS ' || tablename || ' (
CHECK (' || checks || '),
LIKE "{{parent_table}}" INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES
) INHERITS ("{{parent_table}}");';
EXCEPTION WHEN duplicate_table THEN
-- pass
END;
In my case, it will speed up insert action about 25% (I really care about the decrease of insert action's speed). But I doubt that it is too special, so I open this issue to discuss about it. Thanks!
Hi.
I would like to work with an existing partitioned database. I thought that I would need architect
to deal with the partitioning, and I would like to use the django inspectdb
command to retrieve the database models. Is there any support for this command with architect? I could not find any documentation about it. If no, is there any plans to support it?
Thanks for your help.
Insert performance, when using partioning with PostgreSQL, could be increased significantly by avoiding the extra insert and delete in the parent table. This can be achieved by returning NULL instead of NEW in the before insert trigger function, thus aborting the insert action in the parent table. The only downside to this is that subsequent triggers aren't fired. Because of this, a good solution would be to include this optimization as an option in the partition configuration.
How would be the best way to make a partition by example name:
White_list
blacklist
general_list
list_first_income # @
I want to bulk insert rows into a specific partition which all have the same date. Assume that data are partitioned by month on date_column. Does the architect API allow for:
p = get_partition(column_name='date_column', value='2015-01-01')
then run: "copy from '/my/file.txt' to %s" % p.table_name
Perhaps I am unclear on how to use the get_partition API. Can you provide an example?
Hello! I would like to know if we have old AdminSite features as we had in django-db-parti
Regards!
Hi!
I tried to use architect but I encountered a problem I don't really know how to handle.
I described it here: http://stackoverflow.com/questions/33260386/partitioning-django-with-architect-package-translation-error
I don't know if it is the Architect issue or my lack of knowledge how to handle Django, but if you could help me on the subject I would be grateful.
Hi,
Running this:
$ architect partition --module myapp.models
And getting this error:
django.db.utils.ProgrammingError: invalid type name "myapp_modelname."reportingdteonly"%TYPE"
LINE 6: match myapp_modelname."reportingdteonly"...
^
the reportingdteonly is of type Datetime
DB postgres 9.4
Really appreciate your help here
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.