Comments (7)
Prepacte za neskoru odpoved - minuly tyzden som toho mal vela.
Tu to mam s upravami a komentarmi:
SELECT
-- MT: Tu je otazka ci je tento identifikator dostatocne stabilny aby sme ho puzivali. standarne pouzivame uuid alebo triplet institutionCode:collectionCode:catalogNumber
CONCAT(p.institution_id, ':', p.abbrev, ':', r.id) AS occurrenceID,
-- VK: chybně t.scientific_name AS scientificName,
-- MT: pravda - netusim odkial som nabral scientific_name :)
t.name_lat AS scientificName,
r.latitude AS decimalLatitude,
r.longitude AS decimalLongitude,
r.datum AS eventDate,
r.locality,
r.altitude_min AS minimumElevationInMeters,
r.altitude_max AS maximumElevationInMeters,
r.comment AS occurrenceRemarks,
-- VK: Co tady potřebujete? Záznam má více autorů. Je to vazba M : N. Předpokládám, že id z databáze Pladias Vám nepomůže nebo ano?
-- MT: Mate pravdu - to id nepotrebujem - skoda ze nepouzivate ORCID. Mohlo byt is ist takto? :
string_agg(CONCAT(ra.name, ' ', ra.surname), '|') AS recordedBy
r.source,
r.environment,
-- VK: Zde Vám stačí id integer z tabulky atlas.projects?
-- MT: Toto mozeme vypustit - to bola GPT halucinacia
-- r.project_id AS associatedSequences, -- Example of mapping project ID to associated sequences; adjust based on actual data mapping
-- VK: Co by mělo být zde? Jedná se o id z tabulky atlas.record_validation_status
-- MT: tu neviem ake su moznosti - v priklade co ste mi poslal mali vsetky hodnotu 3. viete mi o tom povedat viac?
r.validation_status AS identificationVerificationStatus,
r.original_name AS verbatimScientificName,
r.gps_coords_precision AS coordinatePrecision,
-- VK: možné hodnoty Y, M nebo D, které určují, jestli je z pole datum platný pouze rok nebo i měsíc nebo celé datum.
-- I see takze r.datum je vzdy ISO date ale podla tohoto stlpca treba ignorovat mesiac/den ?
-- r.datum_precision AS verbatimEventDate,
-- VK: duplicitní s r.locality
-- r.locality AS locality,
-- VK: jedná se o integer id z tabulky fytochorionů (geodata.phytochorions)
-- MT: Kedze sa nechystame publikovat aj tabulku fytochorionov, tak mozeme tento stlpec zhodit tiez
-- r.phytochorion_id AS locationID -- Example of using phytochorion_id as a location ID
FROM
atlas.records AS r
LEFT JOIN public.taxons AS t ON r.taxon_id = t.id
LEFT JOIN atlas.records_authors AS ra ON r.id = ra.records_id
LEFT JOIN public.projects AS p ON r.records_id = p.id
WHERE
-- VK: U tohoto příznaku je to komplikovanější, protože v určitých případech chtěli v českém Pladiasu, aby bylo možné vložit do mapy i odmítnuté záznamy. Možná by bylo vhodnější vybírat podle validation status.
r.include_in_map = TRUE;
-- pridame toto nech vieme robit agregaciu
GROUP BY r.id
from pladias-dwc-export.
Upravil jsem:
SELECT
-- MT: Tu je otazka ci je tento identifikator dostatocne stabilny aby sme ho puzivali. standarne pouzivame uuid alebo triplet institutionCode:collectionCode:catalogNumber
-- VK2: Myslím, že by se to nemělo brát z projektu. Projekt je pro nás například databáze ze které jsou záznamy importovány.
-- Dával by mi smysl konstantní prefix pro identifikaci instituce. Např. BU-SAV.
CONCAT(p.institution_id, ':', p.abbrev, ':', r.id) AS occurrenceID_wrong,
CONCAT('BU-SAV:', r.id) AS occurrenceID,
-- VK: chybně t.scientific_name AS scientificName,
-- MT: pravda - netusim odkial som nabral scientific_name :)
t.name_lat AS scientificName,
r.latitude AS decimalLatitude,
r.longitude AS decimalLongitude,
r.datum AS eventDate,
r.locality,
r.altitude_min AS minimumElevationInMeters,
r.altitude_max AS maximumElevationInMeters,
r.comment AS occurrenceRemarks,
-- VK: Co tady potřebujete? Záznam má více autorů. Je to vazba M : N. Předpokládám, že id z databáze Pladias Vám nepomůže nebo ano?
-- MT: Mate pravdu - to id nepotrebujem - skoda ze nepouzivate ORCID. Mohlo byt is ist takto? :
-- VK2: zkoušel jsem to a zdá se, že subquery je výrazně efektivnější
(
SELECT string_agg(CASE WHEN a.name = '' THEN a.surname ELSE CONCAT(a.name, ' ', a.surname) END, '|')
FROM atlas.records_authors
INNER JOIN atlas.authors AS a ON records_authors.authors_id = a.id
WHERE records_id = r.id) AS recordedBy,
r.source,
r.environment,
-- VK: Zde Vám stačí id integer z tabulky atlas.projects?
-- MT: Toto mozeme vypustit - to bola GPT halucinacia
-- r.project_id AS associatedSequences, -- Example of mapping project ID to associated sequences; adjust based on actual data mapping
-- VK: Co by mělo být zde? Jedná se o id z tabulky atlas.record_validation_status
-- MT: tu neviem ake su moznosti - v priklade co ste mi poslal mali vsetky hodnotu 3. viete mi o tom povedat viac?
r.validation_status AS identificationVerificationStatus,
r.original_name AS verbatimScientificName,
r.gps_coords_precision AS coordinatePrecision
-- VK: možné hodnoty Y, M nebo D, které určují, jestli je z pole datum platný pouze rok nebo i měsíc nebo celé datum.
-- I see takze r.datum je vzdy ISO date ale podla tohoto stlpca treba ignorovat mesiac/den ?
-- r.datum_precision AS verbatimEventDate,
-- VK: duplicitní s r.locality
-- r.locality AS locality,
-- VK: jedná se o integer id z tabulky fytochorionů (geodata.phytochorions)
-- MT: Kedze sa nechystame publikovat aj tabulku fytochorionov, tak mozeme tento stlpec zhodit tiez
-- r.phytochorion_id AS locationID -- Example of using phytochorion_id as a location ID
FROM
atlas.records AS r
LEFT JOIN public.taxons AS t ON r.taxon_id = t.id
LEFT JOIN atlas.projects AS p ON r.project_id = p.id
WHERE
-- VK: U tohoto příznaku je to komplikovanější, protože v určitých případech chtěli v českém Pladiasu, aby bylo možné vložit do mapy i odmítnuté záznamy. Možná by bylo vhodnější vybírat podle validation status.
r.include_in_map = TRUE
LIMIT 1000;
from pladias-dwc-export.
Dakujem za promptnu odpoved. myslim ze nam ostava len zopar detailov tak ich tu rozpisem:
- occurrenceID: Co si myslite o niecom takomto
BU-SAV:PLADIAS:{r.id}
? v tom pripade ak by BU-SAV publikoval v buducnosti nejaky iny dataset, mohli by pouzit podobnu strukturu. - recordedBy: zaujimave - ja som vzdy pokladal subquery ako najpomalsi :) clovek sa uci kazdy den
- r.validation_status: ake moznosti su tam?
- r.datum_precision: ak tomu rozumiem spravne tak v
r.datum
mate vzdy plny datum ale kedze DATE nepodporuje nekompletne datumy, tak to mate ako flag. Vec je ze veventDate
by som idealne potreboval ISO datum ktory podporuje aj reduced precision. Rozumiem tomu spravne?
from pladias-dwc-export.
- occurrenceID: BU-SAV:PLADIAS:{r.id} je za mě v pohodě, ale myslím, že bych neměl rozhodovat, jak bude vypadat identifikátor slovenských dat
- recordedBy: Přemýšlel jsem nad tím a mohlo to být rychlejší jen díky tomu, že jsem použil LIMIT 1000. V případě GROUP BY musel procházet všechny záznamy, které vyhovují podmínce, ale v případě subquery prošel prvních 1000 a měl hotovo.
- r.validation_status: posílal jsem na e-mail s výstupem skriptu
- r.datum_precision: na toto jsem zapomněl odpovědět. Je to jak píšete. Pokud má být výstupem string, tak je potřeba ho poskládat podle datum_precision.
from pladias-dwc-export.
Tak som to precistil a implementoval komentare. vyzera to rozumne? Trebalo by skusit ktore je rychleisie na celom datasete.
subquerry:
SELECT
CONCAT('BU-SAV:PLADIAS:', r.id) AS occurrenceID,
t.name_lat AS scientificName,
r.latitude AS decimalLatitude,
r.longitude AS decimalLongitude,
CASE
WHEN r.datum_precision = 'Y' THEN TO_CHAR(r.datum, 'YYYY')
WHEN r.datum_precision = 'M' THEN TO_CHAR(r.datum, 'YYYY-MM')
WHEN r.datum_precision = 'D' THEN TO_CHAR(r.datum, 'YYYY-MM-DD')
ELSE NULL
END AS eventDate,
r.locality,
r.altitude_min AS minimumElevationInMeters,
r.altitude_max AS maximumElevationInMeters,
r.comment AS occurrenceRemarks,
(
SELECT string_agg(CASE WHEN a.name = '' THEN a.surname ELSE CONCAT(a.name, ' ', a.surname) END, '|')
FROM atlas.records_authors
INNER JOIN atlas.authors AS a ON records_authors.authors_id = a.id
WHERE records_id = r.id) AS recordedBy,
r.source,
r.environment,
CASE
WHEN r.validation_status IN (0, 1) THEN 'verification required'
WHEN r.validation_status = 3 THEN 'verified'
ELSE NULL
END AS identificationVerificationStatus
r.original_name AS verbatimScientificName,
r.gps_coords_precision AS coordinatePrecision
FROM
atlas.records AS r
LEFT JOIN public.taxons AS t ON r.taxon_id = t.id
LEFT JOIN atlas.projects AS p ON r.project_id = p.id
WHERE
r.validation_status != 2
alebo aggregate
SELECT
CONCAT('BU-SAV:PLADIAS:', r.id) AS occurrenceID,
t.name_lat AS scientificName,
r.latitude AS decimalLatitude,
r.longitude AS decimalLongitude,
CASE
WHEN r.datum_precision = 'Y' THEN TO_CHAR(r.datum, 'YYYY')
WHEN r.datum_precision = 'M' THEN TO_CHAR(r.datum, 'YYYY-MM')
WHEN r.datum_precision = 'D' THEN TO_CHAR(r.datum, 'YYYY-MM-DD')
ELSE NULL
END AS eventDate,
r.locality,
r.altitude_min AS minimumElevationInMeters,
r.altitude_max AS maximumElevationInMeters,
r.comment AS occurrenceRemarks,
string_agg(CONCAT(ra.name, ' ', ra.surname), '|') AS recordedBy
r.source,
r.environment,
CASE
WHEN r.validation_status IN (0, 1) THEN 'verification required'
WHEN r.validation_status = 3 THEN 'verified'
ELSE NULL
END AS identificationVerificationStatus
r.original_name AS verbatimScientificName,
r.gps_coords_precision AS coordinatePrecision
FROM
atlas.records AS r
LEFT JOIN public.taxons AS t ON r.taxon_id = t.id
LEFT JOIN atlas.projects AS p ON r.project_id = p.id
WHERE
r.validation_status != 2
GROUP BY r.id
from pladias-dwc-export.
Este som si vsimol, r.gps_coords_precision
je v metroch alebo v stupnoch?
from pladias-dwc-export.
Ta r.gps_coords_precision
je v metrech.
Upravil jsem SQL, aby se daly spustit. Otestoval jsem export na 14 mil. záznamech z českého Pladiasu. Dopadlo to lépe pro subquery (3:26) oproti group by(12:20). Možná je problém, že group by seskupuje podle všech sloupců včetně double latitude a longitude. Pro úplnost jsem ještě vymyslel jeden přístup přes join(4:19). Ten je o něco horší než subquery.
subquery
SELECT
CONCAT('BU-SAV:PLADIAS:', r.id) AS occurrenceID,
t.name_lat AS scientificName,
r.latitude AS decimalLatitude,
r.longitude AS decimalLongitude,
CASE
WHEN r.datum_precision = 'Y' THEN TO_CHAR(r.datum, 'YYYY')
WHEN r.datum_precision = 'M' THEN TO_CHAR(r.datum, 'YYYY-MM')
WHEN r.datum_precision = 'D' THEN TO_CHAR(r.datum, 'YYYY-MM-DD')
ELSE NULL
END AS eventDate,
r.locality,
r.altitude_min AS minimumElevationInMeters,
r.altitude_max AS maximumElevationInMeters,
r.comment AS occurrenceRemarks,
(
SELECT string_agg(CASE WHEN a.name = '' THEN a.surname ELSE CONCAT(a.name, ' ', a.surname) END, '|')
FROM atlas.records_authors
INNER JOIN atlas.authors AS a ON records_authors.authors_id = a.id
WHERE records_id = r.id) AS recordedBy,
r.source,
r.environment,
CASE
WHEN r.validation_status IN (0, 1) THEN 'verification required'
WHEN r.validation_status = 3 THEN 'verified'
ELSE NULL
END AS identificationVerificationStatus,
r.original_name AS verbatimScientificName,
r.gps_coords_precision AS coordinatePrecision
FROM
atlas.records AS r
LEFT JOIN public.taxons AS t ON r.taxon_id = t.id
LEFT JOIN atlas.projects AS p ON r.project_id = p.id
WHERE
r.validation_status != 2
group by
SELECT
CONCAT('BU-SAV:PLADIAS:', r.id) AS occurrenceID,
t.name_lat AS scientificName,
r.latitude AS decimalLatitude,
r.longitude AS decimalLongitude,
CASE
WHEN r.datum_precision = 'Y' THEN TO_CHAR(r.datum, 'YYYY')
WHEN r.datum_precision = 'M' THEN TO_CHAR(r.datum, 'YYYY-MM')
WHEN r.datum_precision = 'D' THEN TO_CHAR(r.datum, 'YYYY-MM-DD')
ELSE NULL
END AS eventDate,
r.locality,
r.altitude_min AS minimumElevationInMeters,
r.altitude_max AS maximumElevationInMeters,
r.comment AS occurrenceRemarks,
string_agg(CASE WHEN a.name = '' THEN a.surname ELSE CONCAT(a.name, ' ', a.surname) END, '|') AS recordedBy,
r.source,
r.environment,
CASE
WHEN r.validation_status IN (0, 1) THEN 'verification required'
WHEN r.validation_status = 3 THEN 'verified'
ELSE NULL
END AS identificationVerificationStatus,
r.original_name AS verbatimScientificName,
r.gps_coords_precision AS coordinatePrecision
FROM
atlas.records AS r
LEFT JOIN public.taxons AS t ON r.taxon_id = t.id
LEFT JOIN atlas.projects AS p ON r.project_id = p.id
LEFT JOIN atlas.records_authors AS ra ON ra.records_id = r.id
LEFT JOIN atlas.authors AS a ON ra.authors_id = a.id
WHERE
r.validation_status != 2
GROUP BY r.id, t.name_lat, r.latitude, r.longitude, r.datum_precision, r.locality, r.altitude_min,
r.altitude_max, r.comment, r.source, r.environment, r.validation_status,
r.original_name, r.gps_coords_precision;
join
SELECT
CONCAT('BU-SAV:PLADIAS:', r.id) AS occurrenceID,
t.name_lat AS scientificName,
r.latitude AS decimalLatitude,
r.longitude AS decimalLongitude,
CASE
WHEN r.datum_precision = 'Y' THEN TO_CHAR(r.datum, 'YYYY')
WHEN r.datum_precision = 'M' THEN TO_CHAR(r.datum, 'YYYY-MM')
WHEN r.datum_precision = 'D' THEN TO_CHAR(r.datum, 'YYYY-MM-DD')
ELSE NULL
END AS eventDate,
r.locality,
r.altitude_min AS minimumElevationInMeters,
r.altitude_max AS maximumElevationInMeters,
r.comment AS occurrenceRemarks,
a.names AS recordedBy,
r.source,
r.environment,
CASE
WHEN r.validation_status IN (0, 1) THEN 'verification required'
WHEN r.validation_status = 3 THEN 'verified'
ELSE NULL
END AS identificationVerificationStatus,
r.original_name AS verbatimScientificName,
r.gps_coords_precision AS coordinatePrecision
FROM
atlas.records AS r
LEFT JOIN public.taxons AS t ON r.taxon_id = t.id
LEFT JOIN atlas.projects AS p ON r.project_id = p.id
LEFT JOIN (
SELECT records_id, string_agg(CASE WHEN a.name = '' THEN a.surname ELSE CONCAT(a.name, ' ', a.surname) END, '|') AS names
FROM atlas.records_authors
INNER JOIN atlas.authors AS a ON records_authors.authors_id = a.id
GROUP BY records_id) AS a ON r.id = a.records_id
WHERE
r.validation_status != 2
;
from pladias-dwc-export.
Related Issues (2)
- Ochrana HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pladias-dwc-export.