mjanez / ckan-docker

This project forked from ckan/ckan-docker

Custom CKAN Docker Compose Deployment

Home Page: https://mjanez.github.io/ckan-docs

Shell 57.50% HTML 10.51% Dockerfile 31.99%
ckan dcat-ap docker docker-compose geodcat-ap inspire iso-19139 linked-data metadata spatial-data

ckan-docker's Introduction

👋 Hi, I’m @mjanez

🌍 Geospatial enthusiast and software developer working mainly with containers 🐋 & Python 🐍. Passionate about data, spatial analysis and environmental solutions. Specialised in Geographic Information Systems (GIS) development, Open Data portals and compliance with international standards (ISO, INSPIRE, OGC, DCAT).


👷 Check out what I'm currently working on ...

GIS & Open Data
CKAN improvements
Your One-Stop Shop for All Things CKAN
  • mjanez/ckan-docs - CKAN Docs: A comprehensive guide for deploying CKAN in various environments, complete with API documentation, tips, and more, all in a multilang Docusaurus website (EN/ES).
  • mjanez/ckan-openapi - Documents the CKAN API using Swagger, offering clear and concise reference documentation for CKAN users and developers.


🤝 Collaboration

Open to collaborating on geospatial projects, open data initiatives, and open-source GIS development. Let's connect and innovate together!

ckan-docker's People

Contributors

ajs6f, alasdairgray, amercader, avdata99, clementmouchet, gauravp-nec, kowh-ai, mjanez, niryuu, wardi

ckan-docker's Issues

Load data into datastore

Now that the datapusher is deprecated (7db1611), there are several alternatives for loading structured data into the CKAN datastore database:

  1. ckanext-xloader (in background ckan-xloader container)

    • Fix xloader API Token update in CKAN_INI.

      #!/bin/bash
      # Add ckanext.xloader.api_token to the CKAN config file
      echo "Loading ckanext-xloader settings in the CKAN config file"
      ckan config-tool $CKAN_INI \
      "ckanext.xloader.api_token = xxx" \
      "ckanext.xloader.jobs_db.uri = $CKANEXT__XLOADER__JOBS__DB_URI"
      # Create ckanext-xloader API_TOKEN
      echo "Set up ckanext.xloader.api_token in the CKAN config file"
      ckan config-tool $CKAN_INI "ckanext.xloader.api_token = $(ckan -c $CKAN_INI user token add ckan_admin xloader | tail -n 1 | tr -d '\t')"
      #TODO: Setup worker background
      #echo "Set up CKAN jobs worker"
      #ckan -c $CKAN_INI jobs worker default

    • Update ckan/setup/supervisord to include xloader worker in the background.

      supervisor.conf

      [unix_http_server]
      file = /tmp/supervisor.sock
      chmod = 0777
      chown = nobody:nogroup
      
      [supervisord]
      logfile = /tmp/supervisord.log
      logfile_maxbytes = 50MB
      logfile_backups=10
      loglevel = info
      pidfile = /tmp/supervisord.pid
      nodaemon = true
      umask = 022
      identifier = supervisor
      
      [supervisorctl]
      serverurl = unix:///tmp/supervisor.sock
      
      [rpcinterface:supervisor]
      supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
      
      [include]
      files = /etc/supervisord.d/*.conf
      

      supervisor.worker.conf

      [program:ckan-worker]
      command=ckan -c /srv/app/ckan.ini jobs worker
      priority=501
      autostart=true
      autorestart=true
      redirect_stderr=true
      stdout_logfile=/dev/stdout
      stdout_logfile_maxbytes=0
      stderr_logfile=/dev/stdout
      stderr_logfile_maxbytes=0
      user=ckan
      environment=HOME="/srv/app",USER="ckan"
      
    • Update setup/prerun.py

           import os
           import sys
           import subprocess
           import psycopg2
           try:
               from urllib.request import urlopen
               from urllib.error import URLError
           except ImportError:
               from urllib2 import urlopen
               from urllib2 import URLError
           
           import time
           import re
           import json
           
           ckan_ini = os.environ.get("CKAN_INI", "/srv/app/ckan.ini")
           
           RETRY = 5
           
           
           def update_plugins():
           
               plugins = os.environ.get("XLOADER__PLUGINS", "")
               print(("[prerun] Setting the following plugins in {}:".format(ckan_ini)))
               print(plugins)
               cmd = ["ckan", "config-tool", ckan_ini, "ckan.plugins = {}".format(plugins)]
               subprocess.check_output(cmd, stderr=subprocess.STDOUT)
               print("[prerun] Plugins set.")
           
           
           def check_main_db_connection(retry=None):
           
               conn_str = os.environ.get("CKAN_SQLALCHEMY_URL")
               if not conn_str:
                   print("[prerun] CKAN_SQLALCHEMY_URL not defined, not checking db")
                   # Nothing to connect to, so skip the check instead of passing None on
                   return
               return check_db_connection(conn_str, retry)
           
           
           def check_datastore_db_connection(retry=None):
           
               conn_str = os.environ.get("CKAN_DATASTORE_WRITE_URL")
               if not conn_str:
                   print("[prerun] CKAN_DATASTORE_WRITE_URL not defined, not checking db")
                   # Nothing to connect to, so skip the check instead of passing None on
                   return
               return check_db_connection(conn_str, retry)
           
           
           def check_db_connection(conn_str, retry=None):
           
               if retry is None:
                   retry = RETRY
               elif retry == 0:
                   print("[prerun] Giving up after 5 tries...")
                   sys.exit(1)
           
               try:
                   connection = psycopg2.connect(conn_str)
           
               except psycopg2.Error as e:
                   print(str(e))
                   print("[prerun] Unable to connect to the database, waiting...")
                   time.sleep(10)
                   check_db_connection(conn_str, retry=retry - 1)
               else:
                   connection.close()
           
           
           def check_solr_connection(retry=None):
           
               if retry is None:
                   retry = RETRY
               elif retry == 0:
                   print("[prerun] Giving up after 5 tries...")
                   sys.exit(1)
           
               url = os.environ.get("CKAN_SOLR_URL", "")
               search_url = '{url}/schema/name?wt=json'.format(url=url)
           
               try:
                   connection = urlopen(search_url)
               except URLError as e:
                   print(str(e))
                   print("[prerun] Unable to connect to solr, waiting...")
                   time.sleep(10)
                   check_solr_connection(retry=retry - 1)
               else:
                   conn_info = connection.read()
                   schema_name = json.loads(conn_info)
                   if 'ckan' in schema_name['name']:
                       print('[prerun] Successfully connected to solr and CKAN schema loaded')
                   else:
                       print('[prerun] Successfully connected to solr, but CKAN schema not found')
           
           
           def init_db():
           
               db_command = ["ckan", "-c", ckan_ini, "db", "init"]
               print("[prerun] Initializing or upgrading db - start")
               try:
                   subprocess.check_output(db_command, stderr=subprocess.STDOUT)
                   print("[prerun] Initializing or upgrading db - end")
               except subprocess.CalledProcessError as e:
                   # check_output returns bytes on Python 3, so compare against a bytes literal
                   if b"OperationalError" in e.output:
                       print(e.output)
                       print("[prerun] Database not ready, waiting a bit before exit...")
                       time.sleep(5)
                       sys.exit(1)
                   else:
                       print(e.output)
                       raise e
           
           
           def init_datastore_db():
           
               conn_str = os.environ.get("CKAN_DATASTORE_WRITE_URL")
               if not conn_str:
                   print("[prerun] Skipping datastore initialization")
                   return
           
               datastore_perms_command = ["ckan", "-c", ckan_ini, "datastore", "set-permissions"]
           
               connection = psycopg2.connect(conn_str)
               cursor = connection.cursor()
           
               print("[prerun] Initializing datastore db - start")
               try:
                   datastore_perms = subprocess.Popen(
                       datastore_perms_command, stdout=subprocess.PIPE
                   )
           
                   perms_sql = datastore_perms.stdout.read()
                   # Remove internal pg command as psycopg2 does not like it
                   perms_sql = re.sub(b'\\\\connect "(.*)"', b"", perms_sql)
                   cursor.execute(perms_sql)
                   for notice in connection.notices:
                       print(notice)
           
                   connection.commit()
           
                   print("[prerun] Initializing datastore db - end")
                   print(datastore_perms.stdout.read())
               except psycopg2.Error as e:
                   print("[prerun] Could not initialize datastore")
                   print(str(e))
           
               except subprocess.CalledProcessError as e:
                   # e.output is bytes on Python 3, so compare against a bytes literal
                   if b"OperationalError" in e.output:
                       print(e.output)
                       print("[prerun] Database not ready, waiting a bit before exit...")
                       time.sleep(5)
                       sys.exit(1)
                   else:
                       print(e.output)
                       raise e
               finally:
                   cursor.close()
                   connection.close()
           
           
           def create_sysadmin():
           
               name = os.environ.get("CKAN_SYSADMIN_NAME")
               password = os.environ.get("CKAN_SYSADMIN_PASSWORD")
               email = os.environ.get("CKAN_SYSADMIN_EMAIL")
           
               if name and password and email:
           
                   # Check if user exists
                   command = ["ckan", "-c", ckan_ini, "user", "show", name]
           
                   out = subprocess.check_output(command)
                   if b"User:None" not in re.sub(b"\\s", b"", out):
                       print("[prerun] Sysadmin user exists, skipping creation")
                       return
           
                   # Create user
                   command = [
                       "ckan",
                       "-c",
                       ckan_ini,
                       "user",
                       "add",
                       name,
                       "password=" + password,
                       "email=" + email,
                   ]
           
                   subprocess.call(command)
                   print("[prerun] Created user {0}".format(name))
           
                   # Make it sysadmin
                   command = ["ckan", "-c", ckan_ini, "sysadmin", "add", name]
           
                   subprocess.call(command)
                   print("[prerun] Made user {0} a sysadmin".format(name))
           
                   # cleanup permissions
                   # We're running as root before pivoting to uwsgi and dropping privs
                   data_dir = "%s/storage" % os.environ['CKAN_STORAGE_PATH']
           
                   command = ["chown", "-R", "ckan:ckan", data_dir]
                   subprocess.call(command)
                   print("[prerun] Ensured storage directory is owned by ckan")
           
           if __name__ == "__main__":
           
               maintenance = os.environ.get("MAINTENANCE_MODE", "").lower() == "true"
           
               if maintenance:
                   print("[prerun] Maintenance mode, skipping setup...")
               else:
                   check_main_db_connection()
                   init_db()
                   update_plugins()
                   check_datastore_db_connection()
                   init_datastore_db()
                   check_solr_connection()
                   create_sysadmin()
    • Update setup/start_ckan.sh

         #!/bin/sh

         # Add ckan.datapusher.api_token to the CKAN config file (updated with corrected value later)
         ckan config-tool "$CKAN_INI" ckan.datapusher.api_token=xxx

         # Add ckanext.xloader.api_token to the CKAN config file (updated with corrected value later)
         ckan config-tool "$CKAN_INI" ckanext.xloader.api_token=xxx

         # Set up the Secret key used by Beaker and Flask
         # This can be overridden using a CKAN___BEAKER__SESSION__SECRET env var
         if grep -E "beaker.session.secret ?= ?$" "$CKAN_INI"
         then
             echo "Setting beaker.session.secret in ini file"
             ckan config-tool "$CKAN_INI" "beaker.session.secret=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')"
             ckan config-tool "$CKAN_INI" "WTF_CSRF_SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')"
             JWT_SECRET=$(python3 -c 'import secrets; print("string:" + secrets.token_urlsafe())')
             ckan config-tool "$CKAN_INI" "api_token.jwt.encode.secret=${JWT_SECRET}"
             ckan config-tool "$CKAN_INI" "api_token.jwt.decode.secret=${JWT_SECRET}"
         fi

         # Run the prerun script to init CKAN and create the default admin user
         python3 prerun.py
         # Remember the prerun exit status; it is checked before starting CKAN
         PRERUN_STATUS=$?

         echo "Set up ckan.datapusher.api_token in the CKAN config file"
         ckan config-tool "$CKAN_INI" "ckan.datapusher.api_token=$(ckan -c "$CKAN_INI" user token add ckan_admin datapusher | tail -n 1 | tr -d '\t')"

         echo "Set up ckanext.xloader.api_token in the CKAN config file"
         ckan config-tool "$CKAN_INI" "ckanext.xloader.api_token=$(ckan -c "$CKAN_INI" user token add ckan_admin xloader | tail -n 1 | tr -d '\t')"

         echo "Set up ckanext.xloader.jobs_db.uri in the CKAN config file"
         ckan config-tool "$CKAN_INI" "ckanext.xloader.jobs_db.uri=${CKAN_SQLALCHEMY_URL}"

         # Run any startup scripts provided by images extending this one
         # ([ ... ] instead of [[ ... ]] because the script runs under /bin/sh)
         if [ -d "/docker-entrypoint.d" ]
         then
             for f in /docker-entrypoint.d/*; do
                 case "$f" in
                     *.sh)     echo "$0: Running init file $f"; . "$f" ;;
                     *.py)     echo "$0: Running init file $f"; python3 "$f"; echo ;;
                     *)        echo "$0: Ignoring $f (not an sh or py file)" ;;
                 esac
                 echo
             done
         fi

         # Set the common uwsgi options
         UWSGI_OPTS="--plugins http,python \
                     --socket /tmp/uwsgi.sock \
                     --wsgi-file /srv/app/wsgi.py \
                     --module wsgi:application \
                     --uid 92 --gid 92 \
                     --http 0.0.0.0:5000 \
                     --master --enable-threads \
                     --lazy-apps \
                     -p 2 -L -b 32768 --vacuum \
                     --harakiri $UWSGI_HARAKIRI"

         # Only start CKAN if prerun.py succeeded
         if [ "$PRERUN_STATUS" -eq 0 ]
         then
             # Start supervisord
             supervisord --configuration /etc/supervisord.conf &
             # Start uwsgi
             uwsgi $UWSGI_OPTS
         else
             echo "[prerun] failed...not starting CKAN with XLoader."
         fi
      
      
  2. Standalone container (aircan) with ckanext-aircan. [Preferred]
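
For the xloader option above, the shell scripts take the API token from the output of `ckan user token add` with `tail -n 1 | tr -d '\t'`. The same step can be sketched in Python, for example if it were moved into prerun.py. This is a minimal sketch: the function name `extract_api_token` is hypothetical, and the sample output format (a message line followed by a tab-indented token) is an assumption inferred from that pipeline, not taken from the CKAN documentation.

```python
def extract_api_token(cli_output: str) -> str:
    """Mimic `tail -n 1 | tr -d '\\t'`: take the last line and drop tab characters."""
    lines = cli_output.splitlines()
    if not lines:
        raise ValueError("empty output from 'ckan user token add'")
    return lines[-1].replace("\t", "")


# Assumed output shape: a message line, then the token indented with a tab.
sample_output = "API Token created:\n\texample.token.value\n"
print(extract_api_token(sample_output))
```

Raising on empty output makes a failed token creation visible instead of silently writing an empty value to the config file.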

Update docker.yml actions

Fix warnings:

Node.js 12 actions are deprecated. Please update the following actions to use Node.js 16: actions/checkout@v2, docker/setup-buildx-action@v1, docker/setup-qemu-action@v1. For more information see: https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/

name: Build & Push CKAN-Spatial Docker image
on:
  # Trigger the workflow after build.yml,
  # but only for the master branch
  workflow_run:
    workflows: ["Build CKAN Docker auxiliary images"]
    branches: [master]
    types:
      - completed
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: mjanez/ckan-spatial
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      - name: Login to registry
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract Docker metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      - name: CKAN Build and push
        uses: docker/build-push-action@v3
        with:
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          context: ./ckan
          file: ./ckan/Dockerfile
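
One way to find the steps that trigger this warning is to scan the workflow text for `uses:` entries below a minimum major version. The sketch below is illustrative only: the `MIN_MAJOR` table is an assumption (the first major versions believed to run on Node.js 16) and should be checked against each action's release notes, and `outdated_actions` is a hypothetical helper name.

```python
import re

# Assumed minimum major versions that run on Node.js 16; verify before relying on them.
MIN_MAJOR = {
    "actions/checkout": 3,
    "docker/setup-buildx-action": 2,
    "docker/setup-qemu-action": 2,
}

# Matches lines such as "- uses: actions/checkout@v2" or "uses: owner/repo@v1".
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([\w.-]+/[\w.-]+)@v(\d+)", re.MULTILINE)


def outdated_actions(workflow_text):
    """Return (action, current_major, minimum_major) for each outdated step."""
    found = []
    for name, major in USES_RE.findall(workflow_text):
        minimum = MIN_MAJOR.get(name)
        if minimum is not None and int(major) < minimum:
            found.append((name, int(major), minimum))
    return found


workflow = """
steps:
  - uses: actions/checkout@v2
  - name: Set up Docker Buildx
    uses: docker/setup-buildx-action@v1
  - name: Login to registry
    uses: docker/login-action@v2
"""
for name, current, minimum in outdated_actions(workflow):
    print(f"{name}: v{current} -> v{minimum}")
```

Actions missing from the table are ignored rather than flagged, so the table has to list every action that should be checked.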

Improve .env documentation

Add documentation on environment variables. The `.env` file is large and lacks detail and clarity, which can lead to confusion and errors when setting up and configuring the Docker Compose files.
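
As a starting point for this task, a small script could list the variables that have no explanatory comment. The sketch below assumes a convention that is not stated in the repository: a variable counts as documented only when the line directly above it is a `#` comment. The function name `undocumented_vars` and the sample values are illustrative.

```python
def undocumented_vars(env_text):
    """Return variable names that are not directly preceded by a '#' comment line."""
    missing = []
    previous_was_comment = False
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            previous_was_comment = True
            continue
        if "=" in line:
            name = line.split("=", 1)[0].strip()
            if not previous_was_comment:
                missing.append(name)
        # Any non-comment line (including a blank one) ends the comment's scope.
        previous_was_comment = False
    return missing


sample_env = """\
# Public URL of the CKAN site
CKAN_SITE_URL=http://localhost:5000

CKAN_SYSADMIN_NAME=ckan_admin
# Solr core used by CKAN
CKAN_SOLR_URL=http://solr:8983/solr/ckan
"""
print(undocumented_vars(sample_env))
```

A stricter convention, such as one comment per group of related variables, would need a different rule for when a comment stops applying.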

Feature - ckanext-fluent

Add the mjanez/ckanext-fluent extension to allow multilingual metadata in the catalogue.

Tasks

Plugins

ckan-docker components

Feature - Separate containers for harvest (gather/consumer) and xloader

Info

Background

The deployment was first developed with all workers running in the background inside the main CKAN container. To support high availability and scalability, the proposal is to move the workers into separate containers that are connected to each other.

More info: #102

To do

To allow horizontal scalability, it would be better to run the workers in separate containers rather than as background jobs within the `ckan` container.

The workers communicate with the ckan container via Redis, and need access to the database, Solr and the filesystem (config, storage, logs, etc.).

References:

Example

  gather_consumer:
    << : *default-common-ckan
    image: ${COMPOSE_PROJECT_NAME}-dcatapit_ckan:latest
    container_name: ${COMPOSE_PROJECT_NAME}-gather
    depends_on:
      ckan:
        condition: service_healthy
    entrypoint: /consumer-entrypoint.sh
    volumes:
      - ckan-data:/var/lib/ckan
      - ckan-config:/etc/ckan
      - ckan-logs:/var/log/ckan
      - ./docker/ckan/ckan_harvesting_gather.conf:/etc/supervisor/conf.d/ckan_harvesting.conf

  fetch_consumer:
    << : *default-common-ckan
    image: ${COMPOSE_PROJECT_NAME}-dcatapit_ckan:latest
    container_name: ${COMPOSE_PROJECT_NAME}-fetch
    depends_on:
      ckan:
        condition: service_healthy
    entrypoint: /consumer-entrypoint.sh
    volumes:
      - ckan-data:/var/lib/ckan
      - ckan-config:/etc/ckan
      - ckan-logs:/var/log/ckan
      - ./docker/ckan/ckan_harvesting_fetch.conf:/etc/supervisor/conf.d/ckan_harvesting.conf
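
Since each worker container needs Redis, the database and Solr, it can help to derive the host and port of each dependency from the environment before the worker starts. The sketch below covers only that parsing step. It assumes the worker receives the same `CKAN_REDIS_URL`, `CKAN_SQLALCHEMY_URL` and `CKAN_SOLR_URL` variables as the main container, and the helper names and default-port table are illustrative.

```python
from urllib.parse import urlsplit

# Fallback ports used when the URL does not state one (assumed defaults).
DEFAULT_PORTS = {"redis": 6379, "postgresql": 5432, "http": 80, "https": 443}


def service_endpoint(url):
    """Return (hostname, port) for a service URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port


def worker_dependencies(env):
    """Map each configured dependency variable to its (hostname, port)."""
    names = ("CKAN_REDIS_URL", "CKAN_SQLALCHEMY_URL", "CKAN_SOLR_URL")
    return {name: service_endpoint(env[name]) for name in names if env.get(name)}


example_env = {
    "CKAN_REDIS_URL": "redis://redis:6379/1",
    "CKAN_SQLALCHEMY_URL": "postgresql://ckan:secret@db/ckandb",
    "CKAN_SOLR_URL": "http://solr:8983/solr/ckan",
}
for name, (host, port) in worker_dependencies(example_env).items():
    print(f"{name}: {host}:{port}")
```

An entrypoint script could then wait until each host and port accepts connections before starting the gather or fetch consumer.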

Upload large files to CKAN

CKAN's filestore was not designed to handle large files (in the gigabyte range), so the alternative is to use cloud storage. ckanext-cloudstorage implements support for using S3, Azure, or any of the 15 different storage providers supported by libcloud for CKAN.

Check ckanext-cloudstorage setup and:

  • Add ckanext-cloudstorage extension to /ckan/Dockerfile
  • Add ckanext vars to .env (plugins, cloudstorage vars, etc.)
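
The second step could follow the double-underscore naming already used for `CKANEXT__XLOADER__JOBS__DB_URI`, where `__` maps to a dot in the config key. The sketch below turns such variables into `key = value` strings of the kind passed to `ckan config-tool` elsewhere in this document. The variable names and values are hypothetical, and the resulting setting names (`ckanext.cloudstorage.driver`, `ckanext.cloudstorage.container_name`) should be checked against the ckanext-cloudstorage README before use.

```python
def env_to_config_options(env, prefix="CKANEXT__CLOUDSTORAGE__"):
    """Convert prefixed environment variables into 'key = value' config strings."""
    options = []
    for name in sorted(env):
        if not name.startswith(prefix):
            continue
        # CKANEXT__CLOUDSTORAGE__DRIVER -> ckanext.cloudstorage.driver
        key = name.lower().replace("__", ".")
        options.append(f"{key} = {env[name]}")
    return options


# Hypothetical values for illustration only.
example_env = {
    "CKANEXT__CLOUDSTORAGE__DRIVER": "S3",
    "CKANEXT__CLOUDSTORAGE__CONTAINER_NAME": "my-bucket",
    "CKAN_SITE_URL": "http://localhost:5000",
}
for option in env_to_config_options(example_env):
    print(option)
```

Each resulting string could be passed as one argument to `ckan config-tool "$CKAN_INI"`, in the same way as the xloader settings.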

Update actions

Update actions/checkout@v2 to actions/checkout@v3

Node.js 12 actions are deprecated. Please update the following actions to use Node.js 16: actions/checkout@v2. For more information see: https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/.
