
sead_change_control's Issues

20200203_DML_ADD_YEAR_TYPES: Populate tbl_years_types

Populate tbl_years_types as follows (values from Bugs plus an extra 'Unknown', which maps to NULLs and '?' in the Bugs import):

years_type_id | name        | description                                                        | date_updated
------------- | ----------- | ------------------------------------------------------------------ | ------------
              | Calendar    | Calendar years                                                     |
              | C14         | Radiocarbon years                                                  |
              | Radiometric | Radiometric years (but not C14)                                    |
              | Unknown     | Unknown years type (either to be defined or unspecified in source) |
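A minimal DML sketch of the change (column names are taken from the table above; that years_type_id is generated and that date_updated has a default are assumptions to be checked against the actual DDL):

```sql
-- Sketch: populate tbl_years_types with the four values above.
-- Assumes years_type_id is generated and date_updated defaults;
-- adjust to the actual table definition.
insert into tbl_years_types (name, description)
values
    ('Calendar',    'Calendar years'),
    ('C14',         'Radiocarbon years'),
    ('Radiometric', 'Radiometric years (but not C14)'),
    ('Unknown',     'Unknown years type (either to be defined or unspecified in source)');
```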

20191213_BUGS_IMPORT_POST_APPLY: Bugs Sample codes need to be stored as alt refs in SEAD

In BugsCEP, the unique identifier for a sample is TSample.SampleCODE.
This value is stored in bugs_trace during import but not in the main SEAD database.
SampleCODE needs to be stored in SEAD as:
tbl_sample_alt_refs.alt_ref with type "Database identifier" (a new alt_ref_type)

This is needed for tracing samples through SEAD to Bugs, for example when Francesca creates a matrix of all insect samples.
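A hedged sketch of the change (the bugs_trace column names and the exact layout of tbl_alt_ref_types/tbl_sample_alt_refs are assumptions; adjust to the actual schema):

```sql
-- Sketch: register the new alt_ref_type and copy Bugs SampleCODEs
-- into tbl_sample_alt_refs. Column names in bugs_trace are assumed.
insert into tbl_alt_ref_types (alt_ref_type, description)
values ('Database identifier', 'Identifier of the record in the source database');

insert into tbl_sample_alt_refs (physical_sample_id, alt_ref, alt_ref_type_id)
select bt.physical_sample_id,
       bt.sample_code,          -- assumed column holding TSample.SampleCODE
       t.alt_ref_type_id
from bugs_trace bt
join tbl_alt_ref_types t on t.alt_ref_type = 'Database identifier';
```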

Dendrochronology record type needed

Dendrochronology data needs a record type to enable some query and filter functionality.

tbl_record_types.record_name = 'Dendrochronology'
tbl_record_types.record_name = SEE ROGER'S QUERY

Update tbl_methods.record_type_id to match the new record_type
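The two steps above could be sketched as follows (the column record_name follows the issue text, the description column name and the method-selection predicate are placeholders):

```sql
-- Sketch: add the new record type and point the relevant methods at it.
insert into tbl_record_types (record_name, description)  -- description column assumed
values ('Dendrochronology', 'Dendrochronology data');

update tbl_methods
set record_type_id = (select record_type_id
                      from tbl_record_types
                      where record_name = 'Dendrochronology')
where method_id in (0);  -- placeholder: replace with the dendro method_ids
```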

Raä-IDs

Raä IDs should be displayed next to the site name in the site filter; this requires back-end support.

Isotope: Remaining errors in Isotope data

Remaining errors in Isotope data:

  • Reference to missing method_id:
    • method_id 23 in tbl_isotope_measurement
    • method_id 5 in tbl_sample_groups
  • Reference to missing sample_type_id:
    • sample_type_id 15 in tbl_physical_samples
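Queries along these lines can locate the orphaned rows (a sketch; standard foreign-key column names are assumed):

```sql
-- Find isotope measurements whose method_id has no match in tbl_methods.
select im.*
from tbl_isotope_measurement im
left join tbl_methods m on m.method_id = im.method_id
where im.method_id is not null
  and m.method_id is null;

-- The same left-join pattern applies to tbl_sample_groups (method_id)
-- and tbl_physical_samples (sample_type_id vs tbl_sample_types).
```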

Filter has corrupt name

Filter with facetCode "tbl_denormalized_measured_values_37" has a DisplayTitle of "P┬░". The sequence "┬░" is what the UTF-8 bytes for "°" look like when decoded as CP437, so this is an encoding translation error.
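If the stored value really is the mis-decoded UTF-8 for "P°", the repair could look like this (the facet table and column names here are assumptions about the facet schema):

```sql
-- Sketch: restore the intended title. Table and column names assumed.
update facet.facet
set display_title = 'P°'
where facet_code = 'tbl_denormalized_measured_values_37';
```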

Deploy of submission failed: "Cannot change to directory ./general"

Error output from the sqitch command:

Cannot change to directory ./general: No such file or directory

Trace begun at /bin/../lib/perl5/App/Sqitch.pm line 279
App::Sqitch::_parse_core_opts('App::Sqitch', 'ARRAY(0x558dc7e04d30)') called at /bin/../lib/perl5/App/Sqitch.pm line 182
App::Sqitch::go('App::Sqitch') called at /bin/sqitch line 17

Integration of temperature proxy data

Timeline needs temperature data in the database. I have created the table tbl_temperatures for this purpose, but it should be integrated into the sead change control so that it carries over between database versions. The name is also wrong: according to Phil this is not actually temperature data but only a proxy for it. Perhaps this data should also be inserted into an existing table somewhere rather than having its own? If you don't mind I'll leave it up to you @visead and @roger-mahler to figure out where/how to best integrate this data into the SEAD database in a more proper manner.

Attaching SQL for creating the table and inserting the data.

tbl_temperatures.zip

tbl_measured_value_dimensions and tbl_analysis_entity_dimensions are empty

These two tables are currently empty and not in use. They appear to be intended as another layer for storing information about subsamples and their measured dimensions, e.g. before- and after-burning weights for LOI analysis.

This functionality seems to exist indirectly in other, populated tables, although I might be confusing things.

Since the tables aren't in use and sample information is available from other tables, I would suggest looking into removing them.

20200203_DML_METHOD_UPDATE: Update characters in method description

20200203_DML_METHOD_UPDATE

The method description needs editing to remove the linebreak between 'use.' and 'See', as it causes problems when exporting data.

"Swedish Rikets Nät (National Grid) system. Full name "RT 90 2.5 gon V 0:-15". X = south-north, Y = west-east. Essentially superceded by SWEREF 99 although still in extensive use.
See http://www.lantmateriet.se/templates/LMV_Page.aspx?id=4766&lang=EN (NOTE: include URL as biblio link)"
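A sketch of the edit (assumes PostgreSQL escape-string syntax; the exact whitespace around the linebreak may differ, hence the regex):

```sql
-- Replace the linebreak between 'use.' and 'See' with a single space.
update tbl_methods
set description = regexp_replace(description, E'use\\.\\s*\\n\\s*See', 'use. See')
where description like '%RT 90 2.5 gon V%';
```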

20190101_DDL_FACET_SCHEMA: Region facet request

Can we have a "Region" facet under the space-time group?

Contents would be something like (filtering sites?):

select location_name, site_id, location_type_id
from tbl_locations
inner join tbl_site_locations
  on tbl_locations.location_id = tbl_site_locations.location_id
where tbl_locations.location_type_id in (2, 7, 14, 16, 18)
order by location_name;

Use case: A point of entry for all sites in Västerbotten linked from https://sparfran10000ar.se/

New result facet that targets datasets

A new result facet that targets datasets is added using the following JSON specification:

{
   "facet_id": 40,
   "facet_code": "result_facet_dataset",
   "display_title": "Datasets",
   "description": "Datasets",
   "facet_group_id":"99",
   "facet_type_id": 1,
   "category_id_expr": "tbl_datasets.dataset_id",
   "category_name_expr": "tbl_datasets.dataset_name",
   "sort_expr": "tbl_datasets.dataset_name",
   "is_applicable": false,
   "is_default": false,
   "aggregate_type": "count",
   "aggregate_title": "Number of datasets",
   "aggregate_facet_code": null,
   "tables": [
   {
   	"sequence_id": 1,
   	"table_name": "tbl_datasets",
   	"udf_call_arguments": null,
   	"alias":  null
   } ],
   "clauses": [  ]
}

20190101_DDL_FACET_SCHEMA: New facet "Master datasets"

A new facet with the following JSON specification:

{
	"facet_id": 38,
	"facet_code": "dataset_master",
	"display_title": "Master datasets",
	"description": "Master datasets",
	"facet_group_id": "2",
	"facet_type_id": 1,
	"category_id_expr": "tbl_dataset_masters.master_set_id ",
	"category_name_expr": "tbl_dataset_masters.master_name",
	"sort_expr": "tbl_dataset_masters.master_name",
	"is_applicable": true,
	"is_default": false,
	"aggregate_type": "count",
	"aggregate_title": "Number of samples",
	"aggregate_facet_code": "result_facet",
	"tables": [
    	{
    		"sequence_id": 1,
    		"table_name": "tbl_dataset_masters",
    		"udf_call_arguments": null,
    		"alias":  null
    	}
    ],
	"clauses": [  ]
}

Inconsistent sqitch.plan prevents deploy

Manual changes (deletions) in the plan of already-deployed changes prevent deploy.
Changes:
[x] Make the system forward-only for now (remove reverts)
[x] Make all changes idempotent
[x] Do a sqitch rebase (performs a revert and deploy in sequence)

Update all MAL datasets whose master_set_id is NULL to ID 2

Erik's mail, May 5, 2018:

I have trouble with the datasets in sead_master_9. There are a bunch in there without master_set_id. I don't know if these should be there or not?

Should we have some kind of collection of all the lookup data somewhere so we can include it in a datastore separately?
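The NULL-to-2 update itself is a one-liner (assuming master_set_id lives on tbl_datasets and that the NULL rows are all MAL's; add a restricting predicate if other masters can also have NULLs):

```sql
-- Sketch: assign master_set_id 2 (MAL) to datasets lacking one.
update tbl_datasets
set master_set_id = 2
where master_set_id is null;
```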

Consolidate public schema.

[x] Set up sqitch project

Compare the different SEAD versions for schema changes, excluding bugs-related data.
Create a DDL change (using sqitch) for each identified difference.

Wrong record_type for some methods?

Methods relating to geoarchaeology might have been marked with the wrong record_type: they have been marked as "Non-biological taxa" (record_type_id 10) instead of "Soil chemistry/property" (record_type_id 12). The mix-up is visible in the web browser when sorting sites by record type in the table view.

Only samples containing soil chemistry seem to have this error.
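A hedged correction sketch (the predicate identifying the affected geoarchaeology methods is a placeholder and must be replaced with the actual method_ids):

```sql
-- Sketch: move mis-marked soil chemistry methods from record_type 10
-- ('Non-biological taxa') to 12 ('Soil chemistry/property').
update tbl_methods
set record_type_id = 12
where record_type_id = 10
  and method_id in (0);  -- placeholder: replace with the affected method_ids
```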

SEAD @2019.12: Release checklist

Deploy new SEAD database

Note! This checklist _only_ deploys the database. Relevant systems need to be deployed separately and in conjunction.

Prerequisites

  • Docker
  • Git

Description

This will be the first deploy of the SEAD database using the new change control system. The starting point of the deployment is an SQL dump of sead_master_9, stored as the zipped file sead_master_9_public.sql.gz in the starting_point folder of the sead_change_control repository.

The overall deployment process is as follows:

  1. Prepare deploy
  2. Create new staging
  3. Import pending submissions
  4. Deploy submissions
  5. Import bugs data
  6. Switch production database
  7. Deploy relevant systems

Prepare deploy

Make sure you have an up-to-date version of the SEAD repositories sead_change_control and sead_clearinghouse*.

Run git clone if you haven't checked out the repository before, or update the code using git pull

% mkdir -p ~/source && cd ~/source
% git clone https://github.com/humlab-sead/sead_change_control.git
% git clone https://github.com/humlab-sead/sead_clearinghouse.git
% git clone https://github.com/humlab-sead/sead_clearinghouse_import.git

Freeze deployment by adding tag @2019.12 to all Sqitch plans (for each subfolder xyz) to be included in the release (this can also be done manually by editing the .plan files):

% sqitch tag --tag @2019.12 --plan-file ./xyz/sqitch.plan --note "2019 December release"

Add tag @2019.12 to all relevant repositories.

Create new staging

$ ./bin/deploy_staging --user humlab_admin --target-db-name sead_staging --create-database --source-type dump --on-conflict drop --deploy-to-tag @2019.12

Import pending submissions

The following workflow imports and commits submissions that have been prepared in Excel files, i.e. Ceramics, Dendrochronology and Isotope data. The flow assumes that relevant updates to the SEAD database model have been committed and that all lookup data referenced by the new data submissions has been inserted (separate change requests).

Check out or update the sead_clearinghouse and sead_clearinghouse_import systems.

  • Prepare submission data (Excel) files and copy them to sead_clearinghouse_import/data/input.
  • Import data to the SEAD Clearinghouse System using sead_clearinghouse_import. Edit options in ~/source/sead_clearinghouse_import/runner.py, i.e. specify the names of input files and data types.
  • Verify imported data using SEAD Clearinghouse.
  • Commit & deploy submission data to SEAD Change Control System
  • Deploy submission change request to SEAD staging

Run the following commands to import the data to Clearinghouse:

$ cd ~/source/sead_clearinghouse_import
$ pipenv shell
$ vi runner.py # edit run options
$ SEAD_CH_PASSWORD=qwerty python runner.py

Store staging phase 1 database:

create database sead_staging_phase_1 owner sead_master template sead_staging;

Deploy submissions to SEAD Change Control system:

$ cd ~/source/sead_clearinghouse/transport_system
$ ./deploy_submission.sh --dbname=sead_staging --id=1 --force --add-change-request
$ ./deploy_submission.sh --dbname=sead_staging --id=2 --force --add-change-request
$ ./deploy_submission.sh --dbname=sead_staging --id=3 --force --add-change-request
$ ./deploy_submission.sh --dbname=sead_staging --id=4 --force --add-change-request

Store staging phase 2 database:

create database sead_staging_phase_2 owner sead_master template sead_staging;

Deploy submissions

$ sqitch deploy --target staging --mode change --no-verify -C ./submissions 20191220_DML_SUBMISSION_CERAMICS_001_COMMIT
$ sqitch deploy --target staging --mode change --no-verify -C ./submissions 20191220_DML_SUBMISSION_DENDRO_BUILDING_002_COMMIT
$ sqitch deploy --target staging --mode change --no-verify -C ./submissions 20191220_DML_SUBMISSION_DENDRO_ARCHEOLOGY_003_COMMIT
$ sqitch deploy --target staging --mode change --no-verify -C ./submissions 20191220_DML_SUBMISSION_ISOTOPE_004_COMMIT

Store staging phase 3 database:

create database sead_staging_phase_3 owner sead_master template sead_staging;

Import bugs data

$ cd ~/source/sead_bugs_import || cd ~/source && git clone https://github.com/humlab-sead/sead_bugs_import
$ cd ~/source/sead_bugs_import && git pull
$ mvn -Dmaven.test.skip=true clean
$ mvn -Dmaven.test.skip=true package
$ vi config/application.properties # set sead_staging as target database
$ java -jar target/bugs.import-0.1-SNAPSHOT.jar --file=./bugsdata/bugsdata_20190503.mdb  > bugsimport_20191220.log 2>&1 &
$ # Execute post_bugs_import.sh (not yet created), or run:
$ psql -d sead_staging -c "select bugs_import.post_import_updates();"

Create Bugs import change request:

% cd ~/source/sead_change_control/
% sqitch add --change-name 20191221_DML_SUBMISSION_BUGS_20190303_COMMIT --note "Initial Bugs Import" --chdir ./submissions
% cd ~/source/sead_change_control/submissions/deploy/20191221_DML_SUBMISSION_BUGS_20190303_COMMIT
% # Use Navicat to diff the public schema pre/post bugs import into public_data_diff.sql
% pg_dump --data-only --blobs -d sead_staging --schema bugs_import -h seadserv.humlab.umu.se -F p -U humlab_admin -f bugs_import_schema.sql
% gzip bugs_import_schema.sql
% gzip public_data_diff.sql

Switch production database

-- alter database sead_production rename to sead_yyyymm;
create database sead_production owner sead_master template sead_staging;

Deploy relevant systems

Rebuild SEAD Query API:

% cd ~/source/sead_query_api/
% # wget https://github.com/humlab-sead/..././docker-build.sh && chmod +x ./docker-build.sh
% vi conf/appsettings.Production.json # edit run settings
% docker-compose down
% ./docker-build.sh

Start API

% docker-compose up -d

20200203_DML_FACET_UI_ELEMENTS_UPDATE: Correct Filter group and item titles

Correct the visible filter elements as follows (current → new):

measured_values → measurements
  MS → Magnetic sus.
  LOI → Loss on Ignition
  PT? → Phosphates

space_time → space/time
  Site → Sites
  Country → Countries
  Sample group → Sample groups

Taxa → Taxon

Reorder filter groups:
space/time
ecology
measurements
taxonomy

Reorder items within the space/time filter:
Master datasets
Countries
Sites
Sample groups
Time periods
Geochronology

20191125_DML_CERAMICS_LOOKUP: Ceramics lookup data

The CH import system has guards that prevent import of new lookup data via Clearinghouse. The ceramics data import files were created before these guards were activated.

The lookup data are added to SEAD via a change request in the change control system, and the Excel file is updated with the new SEAD system identities returned by the inserts.

20200315_DML_ECOCODE_ADD_SYSTEM_4: Anolds & van der Maarel (plants)

tbl_ecocode_system.name 'Anolds & van der Maarel (plants)' was not transferred from SEAD master 8 during migration. This needs doing, including code assignments to taxa and a reference for the system.
The system provides ecology codes for plants in the Netherlands, in environments as classified for biodiversity surveys. Part of Ida Lundberg's Magister project.
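The first step could be sketched as below (the exact system table name and columns are assumptions; the values and the taxa code assignments must come from sead_master_8):

```sql
-- Sketch: re-create the missing ecocode system. Column names assumed;
-- the related code-to-taxa rows and the reference must follow.
insert into tbl_ecocode_system (name, definition)
values ('Anolds & van der Maarel (plants)',
        'Ecology codes for plants in the Netherlands, classified for biodiversity surveys');
```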

New facet "Dataset methods"

A new facet with the following specification:

{
	"facet_id": 39,
	"facet_code": "dataset_methods",
	"display_title": "Dataset methods",
	"description": "Dataset methods",
	"facet_group_id": "2",
	"facet_type_id": 1,
	"category_id_expr": "tbl_methods.method_id ",
	"category_name_expr": "tbl_methods.method_name",
	"sort_expr": "tbl_methods.method_name",
	"is_applicable": true,
	"is_default": false,
	"aggregate_type": "count",
	"aggregate_title": "Number of datasets",
	"aggregate_facet_code": "result_facet_dataset",
	"tables": [
    	{
    		"sequence_id": 1,
    		"table_name": "tbl_dataset_masters",
    		"udf_call_arguments": null,
    		"alias":  null
    	},
    	{
    		"sequence_id": 2,
    		"table_name": "tbl_datasets",
    		"udf_call_arguments": null,
    		"alias":  null
    	}
    ],
	"clauses": [  ]
}

20170101_DDL_BIBLIO_REFACTOR_MODEL: Biblio refactoring MAL data transfer

tbl_biblio is being simplified, and existing MAL data in this and related tables needs to be merged into the simplified fields. The order of fields to be concatenated depends on the publication type, e.g. book vs. journal article.

Create strings for title and full_reference.

Phil will create the field list for each type, e.g.:

if publication_type = journal then
    tbl_biblio.title = tbl_biblio.title & " " & tbl...

for publication_type_id in: 9, 5, 4, 6, 12, 26, 23, 22
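The per-type concatenation could be shaped as a CASE expression (only the shape is shown; the field names used here are placeholders for Phil's per-type lists):

```sql
-- Sketch: build full_reference per publication type. Field names
-- after publication_type_id are placeholders for Phil's lists.
update tbl_biblio
set full_reference =
    case publication_type_id
        when 9 then concat_ws(' ', authors, year, title, journal, volume, pages)
        else concat_ws(' ', authors, year, title)
    end
where publication_type_id in (9, 5, 4, 6, 12, 26, 23, 22);
```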

20190503_DDL_TAXA_ATTRIBUTE_TYPE_LENGTH SqlException

Import fails with exception "ERROR: value too long for type character varying(25)" when values are inserted into tbl_taxa_measured_attributes, column attribute_type.

Sample values that exceed 25 characters:

  • Length larva final (10th) instar
  • Length (brachypterous females)
  • Length female brachypterous
  • Length female macropterous
  • Length, head-end of elytra
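A sketch of the fix implied by the change name (varchar(50) is an assumption; pick a bound that fits the data, or use text):

```sql
-- Sketch: widen the column so the longer attribute names fit.
alter table tbl_taxa_measured_attributes
    alter column attribute_type type character varying(50);
```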
