merative / spm-kubernetes

This repository contains artifacts to assist IBM Cúram SPM customers in their journey to Kubernetes

License: Apache License 2.0

Dockerfile 4.06% Shell 3.52% Smarty 8.25% JavaScript 5.64% SCSS 0.90% MDX 77.62%

spm-kubernetes's Issues

Helm labelling improvements - Service Mesh

I have been setting up SPM with OpenShift Service Mesh and came across a few inconsistencies in the naming of ports and pod labels, which caused some problems getting this set up.

  1. Port names should follow the pattern protocol-suffix.
    For example, the MQ server has:
    name: console-https
    To show metrics in Kiali, the expected name is:
    name: https-console

  2. Add the pod labels app and version to the Helm charts, because these are what is displayed on the graphs. Otherwise each user has to go through and add them individually.
    Labels to add: name and version for each deployment
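
As an illustration of the convention in item 1, the sketch below moves a trailing protocol token to the front of a port name. It assumes the mesh reads the protocol from a name of the form protocol-suffix, as described above. The helper name mesh_port_name and the short protocol list are hypothetical and only cover common cases.

```shell
# Hypothetical helper: rewrite "<suffix>-<protocol>" as "<protocol>-<suffix>".
# The protocol list is an illustrative subset, not a complete list.
mesh_port_name() {
  name=$1
  # A name that already starts with a protocol is returned unchanged.
  for proto in https http2 http grpc tcp tls; do
    case $name in
      "$proto"|"$proto"-*)
        printf '%s\n' "$name"
        return 0 ;;
    esac
  done
  # Otherwise, move a trailing protocol token to the front.
  for proto in https http2 http grpc tcp tls; do
    case $name in
      *-"$proto")
        suffix=${name%-$proto}
        printf '%s-%s\n' "$proto" "$suffix"
        return 0 ;;
    esac
  done
  # No known protocol token: leave the name as it is.
  printf '%s\n' "$name"
}

mesh_port_name console-https
```

The same check could be run over the port names in each chart to list the ones that would need renaming.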

Potential documentation update

While going through the SPM Runbook (https://ibm.github.io/spm-kubernetes/prereq/kubernetes/minikube), I found that some commands seem to be outdated or incorrect.

For example, for "minikube start --vm-driver=virtualbox --cpus 4 --memory 8G --insecure-registry "192.168.0.0/16" --disk-size='30G' --kubernetes-version v1.18.6", minikube reports "--vm-driver='': DEPRECATED, use driver instead", and there is no need for the quotes around the disk-size value, etc.

If I am correct, the document might need to be updated. Thanks.

  • Minikube version is v1.17.1 and docker is 20.10.2
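
A possible updated form of the command, based only on the deprecation message quoted above: --vm-driver becomes --driver and the quotes around the disk size are dropped. The other values are copied from the runbook line and have not been re-validated against newer minikube releases. The sketch only prints the command so that it can be reviewed first; the function name minikube_start_cmd is made up for this example.

```shell
# Print the adjusted start command for review (hypothetical helper).
# Only --vm-driver -> --driver and the unquoted disk size differ from the runbook.
minikube_start_cmd() {
  echo "minikube start --driver=virtualbox --cpus 4 --memory 8G" \
    "--insecure-registry 192.168.0.0/16 --disk-size=30G" \
    "--kubernetes-version v1.18.6"
}

minikube_start_cmd
```

Copy the printed line into a terminal once it looks right for your environment.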

Inconsistent MQ tlsSecretName variable logic

Issue:

When setting variables for MQ as follows:

  mq:
    version: 9.1.3.0
    # Set to True if running MQ in HA mode
    useConnectionNameList: true
    tlsSecretName: 'spm-dev01-mq-secret'

The pods for mqserver-curam and mqserver-rest don't start correctly and throw the following error:

MountVolume.SetUp failed for volume "service-certs" : secret "spm-dev01-spm-dev01-mq-secret" not found

This seems to be caused by inconsistencies between apps/templates/deployment-consumer.yaml, apps/templates/deployment-producer.yaml and mqserver/templates/deployment.yaml

The apps producer/consumer deployment templates set the mq-certs secret volume as follows:

        {{- if $.Values.global.mq.tlsSecretName }}
        - name: mq-certs
          secret:
            {{- if $.Values.global.mq.useConnectionNameList }}
            secretName: {{ $.Values.global.mq.tlsSecretName }}
            {{- else }}
            secretName: {{ $.Release.Name }}-mq-secret
            {{- end }}
        {{- end}}

Whereas the MQ deployment.yaml sets service-certs as:

        {{- if $.Values.global.mq.tlsSecretName }}
        - name: service-certs
          secret:
            secretName: {{ $.Release.Name }}-{{ $.Values.global.mq.tlsSecretName }}
        {{- end}}

This leads to the release name, in this case spm-dev01, being prefixed to the secret name for service-certs but not for mq-certs.

If tlsSecretName is left at the default of mq-secret, the opposite occurs and the consumer/producer pods fail to deploy.

Solution:

Change mqserver/templates/deployment.yaml and mqserver/templates/statefulset.yaml

from:

        {{- if $.Values.global.mq.tlsSecretName }}
        - name: service-certs
          secret:
            secretName: {{ $.Release.Name }}-{{ $.Values.global.mq.tlsSecretName }}
        {{- end}}

to:

        {{- if $.Values.global.mq.tlsSecretName }}
        - name: service-certs
          secret:
            {{- if $.Values.global.mq.useConnectionNameList }}
            secretName: {{ $.Values.global.mq.tlsSecretName }}
            {{- else }}
            secretName: {{ $.Release.Name }}-mq-secret
            {{- end }}
        {{- end}}

This fixes the issue, although I'm a bit unsure about some of the logic here. Don't the if statements need to be the other way round? For example:

        {{- if $.Values.global.mq.useConnectionNameList }}
        - name: service-certs
          secret:
            {{- if $.Values.global.mq.tlsSecretName }}
            secretName: {{ $.Values.global.mq.tlsSecretName }}
            {{- else }}
            secretName: {{ $.Release.Name }}-mq-secret
            {{- end }}
        {{- end}}
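
To make the mismatch concrete, the sketch below mirrors the two original templates quoted above as plain shell functions and prints the secret name each side would look for. It assumes tlsSecretName is non-empty (the outer if in both templates). The function names are invented for this illustration and are not part of the charts.

```shell
# Mirrors the apps consumer/producer templates quoted above.
# Args: release name, tlsSecretName, useConnectionNameList (true/false).
apps_secret_name() {
  if [ "$3" = "true" ]; then
    printf '%s\n' "$2"
  else
    printf '%s-mq-secret\n' "$1"
  fi
}

# Mirrors the original mqserver template quoted above.
# Args: release name, tlsSecretName.
mqserver_secret_name() {
  printf '%s-%s\n' "$1" "$2"
}

# Values from this issue: the two sides disagree.
apps_secret_name spm-dev01 spm-dev01-mq-secret true
mqserver_secret_name spm-dev01 spm-dev01-mq-secret
```

With the values from this issue, the second line printed is the name from the MountVolume.SetUp error, while the apps side resolves to the secret that actually exists.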

helm commands not working

The following command does not work. Note that I am on Helm 3.

helm push apps local-development

Error: unknown command "push" for "helm"
Did you mean this?
pull
Run 'helm --help' for usage.
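
One likely cause, stated here as an assumption because the issue does not confirm it: with Helm 3, pushing to a ChartMuseum repository is provided by the separately installed helm-push plugin rather than by the helm binary itself, and recent versions of that plugin expose the command as helm cm-push. The sketch below installs the plugin only when no push-style plugin is listed. The function name ensure_helm_push_plugin is hypothetical.

```shell
# Assumption: the runbook's "helm push" relies on the ChartMuseum helm-push plugin.
ensure_helm_push_plugin() {
  if ! command -v helm >/dev/null 2>&1; then
    echo "helm not found on PATH" >&2
    return 1
  fi
  # Install the plugin only if no push-style plugin is listed yet.
  if ! helm plugin list | grep -q 'push'; then
    helm plugin install https://github.com/chartmuseum/helm-push
  fi
}

# Usage (not run here):
#   ensure_helm_push_plugin && helm cm-push apps local-development
```

Older plugin versions register the command as push instead of cm-push, so check helm plugin list to see which name applies.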

CuramBirtViewer chart still causes frequent logouts

Issue:

When building the CuramBirtViewer image like so:

# Building curambirtviewer image
cd "${SPM_CONTAINERISATION_HOME}/dockerfiles/Liberty/"
docker build \
     --tag curambirtviewer:latest \
     --file ClientEAR.Dockerfile \
     --build-arg "SERVERCODE_IMAGE=servercode:latest" \
     --build-arg "EAR_NAME=CuramBIRTViewer" .

When this container is deployed using the SPM Helm charts, SPM constantly logs out the logged-in user. This was an issue in OpenLiberty/open-liberty#9663, which has now been fixed in Liberty version 20.0.0.6. That version seems to be the one used to build the container images; however, the logouts still occur.

SPM Install fails with Invalid Secret Key format error message

When I try to install SPM (https://ibm.github.io/spm-kubernetes/01-deploy-spm/SPM-sw)

java -jar IBM\ Curam\ Social\ Program\ Management\ Platform\ Development.jar

I get the following error message:

Apr 14, 2020 8:41:50 PM java.io.ObjectInputStream filterCheck
INFO: ObjectInputFilter REJECTED: class com.sun.crypto.provider.SealedObjectForKeyProtector, array length: -1, nRefs: 1, depth: 1, bytes: 70, ex: n/a
java.io.IOException: Invalid secret key format
	at com.ibm.crypto.provider.JceKeyStore.engineLoad(Unknown Source)
	at java.security.KeyStore.load(KeyStore.java:1445)
	at curam.util.security.Encryption$CryptoConfig.getKeyFromKeyStore(Encryption.java:1040)
	at curam.util.security.Encryption$CryptoConfig.getCipherKey(Encryption.java:947)
	at curam.util.security.Encryption.setConfiguration(Encryption.java:201)
	at curam.util.security.EncryptionConfiguration.<init>(EncryptionConfiguration.java:31)
	at curam.util.security.EncryptPassword.encryptDBPassword(EncryptPassword.java:16)
	at curam.installerinf.panelhelpers.BootstrapHelper.encryptPassword(BootstrapHelper.java:204)
	at curam.installerinf.panelhelpers.BootstrapHelper.updateBoostrapFile(BootstrapHelper.java:97)
	at curam.installerinf.processmanager.BootstrapManager.run(BootstrapManager.java:84)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.izforge.izpack.installer.ProcessPanelWorker$ExecutableClass.run(Unknown Source)
	at com.izforge.izpack.installer.ProcessPanelWorker$ProcessingJob.run(Unknown Source)
	at com.izforge.izpack.installer.ProcessPanelWorker.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:748)
Apr 14, 2020 8:41:50 PM java.io.ObjectInputStream filterCheck
INFO: ObjectInputFilter REJECTED: class com.sun.crypto.provider.SealedObjectForKeyProtector, array length: -1, nRefs: 1, depth: 1, bytes: 70, ex: n/a
java.io.IOException: Invalid secret key format
	at com.ibm.crypto.provider.JceKeyStore.engineLoad(Unknown Source)

I have even tried adding the unrestricted JCE jar files to my jre > security folder.

Docker execution fails

I am trying to download the WebSphere Liberty image and run the following command from https://ibm.github.io/spm-kubernetes/02-build-images/setup_docker_context:

docker run --rm \
  -v $ANT_HOME:/tmp/ant \
  -v $SPM_HOME/dockerfiles/Liberty/content/release-stage:/work/dir \
  -v $SPM_HOME/dockerfiles/Liberty/content/release-stage/SetEnvironment.sh:/work/SetEnvironment.sh \
  -w /work/dir \
  -u root \
  -e ANT_HOME=/tmp/ant \
  -e WLP_HOME=/opt/ibm/wlp \
  websphere-liberty:19.0.0.12-full-java8-ibmjava \
  bash -c 'export PATH=$ANT_HOME/bin:$PATH:.; build.sh internal.update.crypto.jar'

There are a couple of issues:

  1. I get "image not found". However, it works if I pull the image first:
    docker pull websphere-liberty:19.0.0.12-full-java8-ibmjava
    and then run the docker run command.

  2. While running the docker run command, I get the following error:
    Unable to locate tools.jar. Expected to find it in /opt/ibm/java/lib/tools.jar

The underlying reason is that the /opt/ibm/java/lib folder is missing.

helm install is getting "toomanyrequests: You have reached your pull rate limit" error

My helm install failed with:

Error: failed pre-install: timed out waiting for the condition
helm.go:81: [debug] failed pre-install: timed out waiting for the condition

I used the following to get details on the pod: kubectl describe pod releasename-apps-create-ltpa-keys-xrdp5 - note the "toomanyrequests: You have reached your pull rate limit" error:

Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  3m46s               default-scheduler  Successfully assigned ocp/releasename-apps-create-ltpa-keys-fnkq7 to crc-ctj2r-master-0
  Normal   Pulled     3m6s                kubelet            Container image "ibmcom/websphere-liberty:kernel-java8-ibmjava-ubi" already present on machine
  Normal   Created    2m58s               kubelet            Created container create-ltpa-keys
  Normal   Started    2m48s               kubelet            Started container create-ltpa-keys
  Warning  Failed     60s                 kubelet            Failed to pull image "bitnami/kubectl:1.19": rpc error: code = Unknown desc = Error reading manifest 1.19 in docker.io/bitnami/kubectl: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
  Warning  Failed     60s                 kubelet            Error: ErrImagePull
  Normal   BackOff    59s                 kubelet            Back-off pulling image "bitnami/kubectl:1.19"
  Warning  Failed     59s                 kubelet            Error: ImagePullBackOff 
  Normal   Pulling    46s (x2 over 2m9s)  kubelet            Pulling image "bitnami/kubectl:1.19"

To try to get around that, I authenticated to Docker Hub, pulled the "kubectl" image, tagged it as just "ocp/kubectl:latest" and ran:
docker push default-route-openshift-image-registry.apps-crc.testing/ocp/kubectl:latest

I then modified the Helm charts to look for "kubectl:latest" instead of "bitnami/kubectl:1.19", ran
helm push apps + helm repo update
and then tried it all again, but that also failed. Note the "unauthorized: authentication required":

Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       10m                    default-scheduler  Successfully assigned ocp/releasename-apps-create-ltpa-keys-rfq2j to crc-ctj2r-master-0
  Normal   AddedInterface  9m56s                  multus             Add eth0 [10.217.0.189/23]
  Normal   Pulled          9m41s                  kubelet            Container image "ibmcom/websphere-liberty:kernel-java8-ibmjava-ubi" already present on machine
  Normal   Created         9m32s                  kubelet            Created container create-ltpa-keys
  Normal   Started         9m19s                  kubelet            Started container create-ltpa-keys
  Normal   Pulling         7m3s (x4 over 8m45s)   kubelet            Pulling image "kubectl:latest"
  Warning  Failed          6m58s (x4 over 8m40s)  kubelet            Failed to pull image "kubectl:latest": rpc error: code = Unknown desc = Error reading manifest latest in docker.io/library/kubectl: errors:
denied: requested access to the resource is denied
unauthorized: authentication required
  Warning  Failed   6m58s (x4 over 8m40s)   kubelet  Error: ErrImagePull
  Normal   BackOff  6m44s (x5 over 8m14s)   kubelet  Back-off pulling image "kubectl:latest"
  Warning  Failed   4m32s (x14 over 8m14s)  kubelet  Error: ImagePullBackOff

Question: it seems that, one way or another, I need to authenticate to a registry so that the docker pulls can succeed. Please advise. Thanks!
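
The second set of events shows "kubectl:latest" being looked up at docker.io/library/kubectl rather than in the CRC registry the image was pushed to. That is consistent with the usual Docker-style expansion of image names that carry no registry host. The sketch below is a simplified, hypothetical model of that expansion (normalize_image is not a real CLI command, and real runtimes have additional rules), shown only to explain the log line.

```shell
# Simplified model of Docker-style image name expansion (hypothetical helper).
normalize_image() {
  ref=$1
  first=${ref%%/*}
  case $ref in
    */*)
      case $first in
        # First path component looks like a registry host: keep as is.
        *.*|*:*|localhost) printf '%s\n' "$ref" ;;
        # Otherwise it is treated as a namespace on Docker Hub.
        *) printf 'docker.io/%s\n' "$ref" ;;
      esac ;;
    # Bare name: Docker Hub's library namespace.
    *) printf 'docker.io/library/%s\n' "$ref" ;;
  esac
}

normalize_image kubectl:latest
normalize_image bitnami/kubectl:1.19
normalize_image default-route-openshift-image-registry.apps-crc.testing/ocp/kubectl:latest
```

If that is what happened here, referencing the image in the chart by a fully qualified name for a registry the cluster can reach should avoid the Docker Hub lookup. The exact hostname to use from inside the cluster is not shown in this issue.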

Missing Batch debug-file configmap

Issue:

The batch pod no longer starts with the following error:

MountVolume.SetUp failed for volume "debug-file" : configmap "spm-dev01-debug" not found

The batch/templates/cronjob.yaml file references {{ $.Release.Name }}-debug per:

            - name: debug-file
              configMap:
                name: {{ $.Release.Name }}-debug

However, there doesn't seem to be a configmap-debug.yaml file, or anything else in the project, for that volume to reference.

Discrepancy in the pre-req information for Openshift

On the pre-reqs page (https://ibm.github.io/spm-kubernetes/prereq/prereq/) the prereq minimum for Openshift says 4.6 in the table, but note 8 still has 4.5 "(8) | IBM Cúram Social Program Management supports OpenShift 4.5 or later".

Frankly, Note 8 doesn't provide any more information than the table itself - it may not be needed.

There is also a typo in Note 10 "(10) | Support for Docker 20.10 was introducted as part of the SPM@Kubernetes 21.2.0 release."

MQ statefulset.yaml init container uses mqserver image instead of ibmcom/mq

Issue:

In mqservers/templates/statefulsets.yaml, the container uses the ibmcom/mq image from the public IBM repo, whereas the init container uses the mqserver image.

container:

      containers:
        - name: {{ $.Chart.Name }}-{{ $name }}
          image: ibmcom/mq:{{ $.Values.global.mq.version }}

init-container:

      initContainers:
      - name: {{ $.Chart.Name }}-{{ $name }}-init
        image: {{ include "mqserver.imageFullName" $.Values.global.images }}

Shouldn't these be the same? The instructions in https://ibm.github.io/spm-kubernetes/03-build-images/build_images don't mention building or pushing the mqserver image. Because of this, I get errors when pulling the image for the init containers.

Solution:

Change the image to:

      initContainers:
      - name: {{ $.Chart.Name }}-{{ $name }}-init
        image: ibmcom/mq:{{ $.Values.global.mq.version }}

Once this is done the init container pulls OK.

Incorrect formatting of heredoc in createSCC.sh causes script to fail

When running the createSCC.sh script as follows:

./createSCC.sh -n spm-deploy

the script throws the following error on macOS running zsh and on Windows using Git Bash:

./createSCC.sh: line 77: syntax error: unexpected end of file

This is due to the indentation in the usage function. Changing this:

function usage() {
  cat <<-USAGE #| fmt
  Usage: $0 [OPTIONS] [arg]
    OPTIONS:
    =======
    --namespace        [namespace]           - The name of an existing namespace for the SPM deployment.

    USAGE
}

to this:

function usage() {
  cat <<-USAGE #| fmt
  Usage: $0 [OPTIONS] [arg]
    OPTIONS:
    =======
    --namespace        [namespace]           - The name of an existing namespace for the SPM deployment.

USAGE
} 

fixes the issue.
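
For context, the <<- form of a here-document strips leading tab characters only, so a closing USAGE that is indented with spaces is never recognised and the shell reads on to the end of the file. A minimal sketch of a variant that does not depend on tabs surviving an editor or a checkout is a plain << with the body and the delimiter at column 0. This is an alternative illustration, not the repository's actual script:

```shell
# Plain "<<" here-document: the closing delimiter must start at column 0,
# so it cannot be broken by tab-to-space conversion.
usage() {
  cat <<USAGE
Usage: $0 [OPTIONS] [arg]
  OPTIONS:
  =======
  --namespace        [namespace]           - The name of an existing namespace for the SPM deployment.

USAGE
}

usage
```

Keeping <<- is also fine as long as every indented line inside the here-document, including the closing delimiter, starts with real tab characters.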

MQ Server pods fail when deploying to AWS using EFS storage

Describe the bug
When deploying the solution to an OpenShift 4.4 cluster running in AWS, using AWS EFS for persistent storage, the spm-mqserver-curam-0 and spm-mqserver-rest-0 pods fail with the message: Error setting admin password: /usr/bin/sudo: exit status 1: sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?

To Reproduce
Steps to reproduce the behavior:

  1. Provision an EFS Filesystem in AWS and give access to the security groups used by the OCP nodes
  2. Create required directories on the EFS filesystem - Deployment fails if directories do not exist
  3. Deploy helm charts using the command
    helm upgrade --install spm local-development/spm -f curam_containerisation/static/resources/os-values.yaml
  4. Rerun the deployment command - it always fails first time with the message Error: secrets "spm-mq-credentials" already exists
  5. Wait for deployment to complete

Expected behavior
Solution to be deployed to Openshift and the spm-mqserver-curam-0 and spm-mqserver-rest-0 pods to move into a running state.

Please complete the following information:
* Openshift Version: [4.4.26]
* Cúram SPM Version: [7.0.10]

Log Collection

2020-10-23T11:59:07.670Z CPU architecture: amd64
2020-10-23T11:59:07.670Z Linux kernel version: 4.18.0-193.23.1.el8_2.x86_64
2020-10-23T11:59:07.670Z Container runtime: kube
2020-10-23T11:59:07.670Z Base image: Red Hat Enterprise Linux Server 7.6 (Maipo)
2020-10-23T11:59:07.672Z Running as user ID 1000590000 (1000590000 user) with primary group 0, and supplementary groups 1000590000
2020-10-23T11:59:07.672Z Capabilities (bounding set): chown,dac_override,fowner,fsetid,setpcap,net_bind_service,net_raw,sys_chroot
2020-10-23T11:59:07.672Z seccomp enforcing mode: disabled
2020-10-23T11:59:07.672Z Process security attributes: system_u:system_r:container_t:s0:c19,c24
2020-10-23T11:59:07.672Z Detected 'nfs4' volume mounted to /mnt/mqm-log
2020-10-23T11:59:07.672Z Detected 'nfs4' volume mounted to /mnt/mqm-data
2020-10-23T11:59:07.672Z Detected 'nfs4' volume mounted to /mnt/mqm
2020-10-23T11:59:07.672Z Multi-instance queue manager: enabled
2020-10-23T11:59:07.674Z Error setting admin password: /usr/bin/sudo: exit status 1: sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?

Can't access Curam after helm deployment

Looking at https://www.ibm.com/support/knowledgecenter/SS8S5A_7.0.11/com.ibm.curam.wlp.doc/Deployment_WLP/cWLPTestingDeployment.html it says to get the URL for the deployed app to search for CWWKT0016I.
For me this gives:
[AUDIT ] CWWKT0016I: Web application available (default_host): https://rel6-apps-curam-producer-88b8cc888-rml4q:8443/Curam/

This fails to return anything.

I also tried using the IP given from 'minikube ip', i.e. https://192.168.99.104:8443/Curam/, but this gives a 403:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/Curam/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

I tried kubectl describe service rel6-web and kubectl describe service rel6-apps-curam to see if there was an Ingress IP as suggested by some googling, but that didn't yield anything

attached output of kubectl logs for web, producer and consumer.

I can see in the consumer some issue with curamtimerdb (possibly missed a WLS config step) but wouldn't have thought this would cause the basic access issue? Any thoughts on what to do next much appreciated.

(I also connected to the actual consumer/producer containers to see if there was anything in the logs, but the kubectl version seemed to have better info, unless I was looking in the wrong place on WLS.)

thanks
consumer.log
producer.log
web.log

NFS Mount paths don't mount unless the directories exist

Issue:

When running MQ with the following properties:

  mq:
    version: 9.1.3.0
    # Set to True if running MQ in HA mode
    useConnectionNameList: true
    tlsSecretName: 'spm-dev01-mq-secret'
    queueManager:
      name: 'QM1'
      secret:
        # name is the secret that contains the 'admin' user password and the 'app' user password to use for messaging
        name: ''
        # adminPasswordKey is the secret key that contains the 'admin' user password
        adminPasswordKey: 'adminPasswordKey'
        # appPasswordKey is the secret key that contains the 'app' user password
        appPasswordKey: 'appPasswordKey'
    metrics:
      enabled: false
    resources: {}
    multiInstance:
      cephEnabled: false
      storageClassName: 'nfs'
      nfsEnabled: true
      nfsIP: 'fs-xxxxxxxx.efs.eu-west-2.amazonaws.com'
      nfsFolder: 'spm-dev01'
      nfsMountOptions:
        - "nfsvers=4.1"
        - "rsize=1048576"
        - "wsize=1048576"
        - "hard"
        - "timeo=600"
        - "retrans=2"
        - "noresvport"

When the curam-mq and rest-mq pods start, they try to mount the AWS EFS file system, and Kubernetes (EKS) returns the following error:

  Warning  FailedMount  2m31s  kubelet, ip-100-64-18-180.eu-west-2.compute.internal  MountVolume.SetUp failed for volume "spm-dev01-curam-pv-qm" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/11e52fd5-fb8c-40b6-9cf7-b252f1f4e1ac/volumes/kubernetes.io~nfs/spm-dev01-curam-pv-qm --scope -- mount -t nfs -o hard,nfsvers=4.1,noresvport,retrans=2,rsize=1048576,timeo=600,wsize=1048576 fs-1c2e6eed.efs.eu-west-2.amazonaws.com:/spm-dev01/curam /var/lib/kubelet/pods/11e52fd5-fb8c-40b6-9cf7-b252f1f4e1ac/volumes/kubernetes.io~nfs/spm-dev01-curam-pv-qm
Output: Running scope as unit run-12552.scope.
mount.nfs: mounting fs-xxxxxx.efs.eu-west-2.amazonaws.com:/spm-dev01/curam failed, reason given by server: No such file or directory

Solution:

The only solution I've found for this is to manually mount the EFS filesystem on one of the worker nodes and create the directories using:

sudo mount -t nfs -o hard,nfsvers=4.1,noresvport,retrans=2,rsize=1048576,timeo=600,wsize=1048576 fs-xxxxx.efs.eu-west-2.amazonaws.com:/ /mnt/
sudo mkdir -p /mnt/spm-dev01/curam/logs
sudo mkdir -p /mnt/spm-dev01/curam/data
sudo mkdir -p /mnt/spm-dev01/rest/data
sudo mkdir -p /mnt/spm-dev01/rest/logs

I'm not sure if there's an easier way to do this using IKS, but this seems to be the only way for AWS EFS. Other suggestions include using an init container (see here) that mounts the filesystem and then creates the paths.

Couldn't the file system just be mounted as / and any paths created by the pods themselves at runtime?
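
The manual mkdir steps above can be wrapped in a small loop, so that the same helper works for any mount point and nfsFolder value. This is only a convenience sketch for the workaround described in this issue. The function name create_mq_dirs is hypothetical, and the directory layout (curam and rest, each with data and logs) is taken from the commands above.

```shell
# Hypothetical helper: create the MQ directories under an already mounted export.
# Args: mount point, nfsFolder value.
create_mq_dirs() {
  for qm in curam rest; do
    for sub in data logs; do
      mkdir -p "$1/$2/$qm/$sub" || return 1
    done
  done
}

# Usage (not run here; on a worker node, after mounting the EFS export at /mnt,
# with root privileges if the mount requires them):
#   create_mq_dirs /mnt spm-dev01
```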

JVM Segmentation error when building the database from batch image

Since the 12th of December 2020, a JVM segmentation error occurs when building the database from the batch image.

Prior to the 12th of December this was not an issue.

See the error below:

dispmsg:
     [echo] 10:48:54 Starting batchlauncher
[batchlauncher] Using configured properties for logging.
[batchlauncher] Running a Single Batch Program.
[batchlauncher] Connecting to DB2 data source : com.ibm.db2.jcc.DB2SimpleDataSource.
[batchlauncher] 'batch.username' not found.
[batchlauncher] Batch invoking : 'curam.util.internal.userpreference.intf.UserPreferenceLoader.insertUserPreferencesToDatabase'.
[batchlauncher] Unhandled exception
[batchlauncher] Type=Segmentation error vmState=0x00080002
[batchlauncher] J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
[batchlauncher] Handler1=00007F4165F26CA0 Handler2=00007F416580B140 InaccessibleAddress=00007F41AE96D817
[batchlauncher] RDI=00007F4160015CA0 RSI=00007F41AE96D817 RAX=0000000000000000 RBX=00007F41AE96D817
[batchlauncher] RCX=00007F4165FEBD98 RDX=00007F41601F1450 R8=00007F415F606840 R9=000000000000000A
[batchlauncher] R10=0000000000000001 R11=0000000000000000 R12=0000000000000005 R13=0000000000000041
[batchlauncher] R14=00007F41AE96D817 R15=00000000000000A8
[batchlauncher] RIP=00007F4165FB4C26 GS=0000 FS=0000 RSP=00007F4166E61460
[batchlauncher] EFlags=0000000000010202 CS=0033 RBP=00007F4166E61650 ERR=0000000000000004
[batchlauncher] TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=00007F41AE96D817
[batchlauncher] xmm0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[batchlauncher] xmm1 534e656c62616e65 (f: 1650552448.000000, d: 1.981380e+93)
[batchlauncher] xmm2 ffffffffffffffff (f: 4294967296.000000, d: -nan)
[batchlauncher] xmm3 120c00b42a1100c6 (f: 705757376.000000, d: 9.683534e-222)
[batchlauncher] xmm4 011700b92b2e01a7 (f: 724435392.000000, d: 2.096455e-303)
[batchlauncher] xmm5 0500990b00b61f00 (f: 11935488.000000, d: 1.395228e-284)
[batchlauncher] xmm6 c60c00b42a1800c6 (f: 706216128.000000, d: -2.773258e+29)
[batchlauncher] xmm7 00b42ab02b050099 (f: 721748096.000000, d: 2.871841e-305)
[batchlauncher] xmm8 2a1100c60c00b42a (f: 201372720.000000, d: 4.633484e-106)
[batchlauncher] xmm9 ffff00ff0000ffff (f: 65535.000000, d: -nan)
[batchlauncher] xmm10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[batchlauncher] xmm11 000000004d570a3d (f: 1297549824.000000, d: 6.410748e-315)
[batchlauncher] xmm12 000000004a09a025 (f: 1242144768.000000, d: 6.137011e-315)
[batchlauncher] xmm13 000000004b2c0833 (f: 1261176832.000000, d: 6.231042e-315)
[batchlauncher] xmm14 000000004be50e00 (f: 1273302528.000000, d: 6.290950e-315)
[batchlauncher] xmm15 000000004a373e68 (f: 1245134464.000000, d: 6.151782e-315)
[batchlauncher] Module=/opt/ibm/java/jre/lib/amd64/compressedrefs/libj9vm29.so
[batchlauncher] Module_base_address=00007F4165E91000
[batchlauncher] Target=2_90_20201102_458768 (Linux 4.4.0-148-generic)
[batchlauncher] CPU=amd64 (48 logical CPUs) (0x3ef3728000 RAM)
[batchlauncher] ----------- Stack Backtrace -----------
[batchlauncher] (0x00007F4165FB4C26 [libj9vm29.so+0x123c26])
[batchlauncher] (0x00007F4165FB54D3 [libj9vm29.so+0x1244d3])
[batchlauncher] (0x00007F4165F9ACAF [libj9vm29.so+0x109caf])
[batchlauncher] (0x00007F4165FB6BF1 [libj9vm29.so+0x125bf1])
[batchlauncher] (0x00007F4165F9184F [libj9vm29.so+0x10084f])
[batchlauncher] (0x00007F4165F984C6 [libj9vm29.so+0x1074c6])
[batchlauncher] (0x00007F4165F8D8CA [libj9vm29.so+0xfc8ca])
[batchlauncher] (0x00007F4165F8EFC0 [libj9vm29.so+0xfdfc0])
[batchlauncher] (0x00007F4165F8F689 [libj9vm29.so+0xfe689])
[batchlauncher] (0x00007F4165F8FDBE [libj9vm29.so+0xfedbe])
[batchlauncher] (0x00007F4165F84788 [libj9vm29.so+0xf3788])
[batchlauncher] (0x00007F4165F859D6 [libj9vm29.so+0xf49d6])
[batchlauncher] (0x00007F4165F16607 [libj9vm29.so+0x85607])
[batchlauncher] (0x00007F4165F17BBB [libj9vm29.so+0x86bbb])
[batchlauncher] (0x00007F4165F19E01 [libj9vm29.so+0x88e01])
[batchlauncher] (0x00007F4165EA809F [libj9vm29.so+0x1709f])
[batchlauncher] (0x00007F4165EA3C50 [libj9vm29.so+0x12c50])
[batchlauncher] (0x00007F4165F61DB2 [libj9vm29.so+0xd0db2])
[batchlauncher] ---------------------------------------
[batchlauncher] JVMDUMP039I Processing dump event "gpf", detail "" at 2020/12/14 10:49:02 - please wait.
[batchlauncher] JVMDUMP032I JVM requested System dump using '/opt/ibm/Curam/release/buildlogs/core.20201214.231413.421.0001.dmp' in response to an event
[batchlauncher] JVMDUMP010I System dump written to /opt/ibm/Curam/release/buildlogs/core.20201214.231413.421.0001.dmp
[batchlauncher] JVMDUMP032I JVM requested Java dump using '/opt/ibm/Curam/release/buildlogs/javacore.20201214.231413.421.0002.txt' in response to an event
[batchlauncher] JVMDUMP010I Java dump written to /opt/ibm/Curam/release/buildlogs/javacore.20201214.231413.421.0002.txt
[batchlauncher] JVMDUMP032I JVM requested Snap dump using '/opt/ibm/Curam/release/buildlogs/Snap.20201214.231413.421.0003.trc' in response to an event
[batchlauncher] JVMDUMP010I Snap dump written to /opt/ibm/Curam/release/buildlogs/Snap.20201214.231413.421.0003.trc 
[batchlauncher] JVMDUMP032I JVM requested JIT dump using '/opt/ibm/Curam/release/buildlogs/jitdump.20201214.231413.421.0004.dmp' in response to an event

Failed to pull image at Helm install step

Prereqs all followed (I believe)
Images built and pushed to Docker Registry ok
Helm charts prepared, packaged and pushed ok.

Now run:
helm install release2 local-development/spm -f ../static/resources/crc-values.yaml

in separate tab:
kubectl get pods -w
NAME READY STATUS RESTARTS AGE
release1-apps-apply-customsql-mbl2f 0/1 ImagePullBackOff 0 18m
release2-apps-apply-customsql-fc8pb 0/1 ImagePullBackOff 0 53s
releasename-apps-apply-customsql-2bm8l 0/1 ImagePullBackOff 0 22m
release2-apps-apply-customsql-fc8pb 0/1 ErrImagePull 0 56s

(note releasename and release1 were also earlier fails)

Viewing log:
kubectl logs -f pod/release1-apps-apply-customsql-mbl2f
Error from server (BadRequest): container "apply-customsql" in pod "release1-apps-apply-customsql-mbl2f" is waiting to start: trying and failing to pull image

OS = MacOSx Catalina 10.15.7
ChartMuseum installed locally and run like this:
chartmuseum --debug --port=8080 --storage="local" --storage-local-rootdir="./chartstorage"

CRC installed locally:
crc version
CodeReady Containers version: 1.15.0+e317bed
OpenShift version: 4.5.7 (embedded in binary)

Docker version 19.03.12

Documentation on the values.yaml Settings

Kev and I have completed the deployment we have been working on. We struggled a bit with the values.yaml settings that were needed. We got through it by comparing with various values.yaml files from working deployments, but it felt a bit haphazard. We know that the configuration settings are documented in the runbook, but in some cases it was difficult to put two and two together and figure out what the documentation was telling us.

One suggestion is to add some comments to the provided YAML files. We also feel that a worked example would be very helpful: perhaps a values.yaml file with a set of commented overrides in place, based on an example project name and registry, and assuming a deployment of curam, CE, birt, rest, etc.

We are also still a bit confused by some of the main blocks in the values.yaml file, for example the later blocks that are "turned off" by default using "{}". We are still not sure what they are there for. A comment highlighting the need to remove these braces when using any of the subsequent overrides would have been useful. Yes, I know we should be able to figure that out, and we did, but anything that reduces the cognitive load around this would be appreciated.

In general, though, the documentation was great. We are not suggesting a fundamental change here, rather the provision of some worked examples.

NFS needs mountOptions defining when using AWS EFS

Issue:

When running MQ with the following properties:

  mq:
    version: 9.1.3.0
    # Set to True if running MQ in HA mode
    useConnectionNameList: true
    tlsSecretName: 'spm-dev01-mq-secret'
    queueManager:
      name: 'QM1'
      secret:
        # name is the secret that contains the 'admin' user password and the 'app' user password to use for messaging
        name: ''
        # adminPasswordKey is the secret key that contains the 'admin' user password
        adminPasswordKey: 'adminPasswordKey'
        # appPasswordKey is the secret key that contains the 'app' user password
        appPasswordKey: 'appPasswordKey'
    metrics:
      enabled: false
    resources: {}
    multiInstance:
      cephEnabled: false
      storageClassName: 'nfs'
      nfsEnabled: true
      nfsIP: 'fs-xxxxxxxx.efs.eu-west-2.amazonaws.com'
      nfsFolder: 'spm-dev01'

When the curam-mq and rest-mq pods start, they try to mount the AWS EFS file system, and Kubernetes (EKS) returns the following error:

  Warning  FailedMount  2m31s  kubelet, ip-100-64-18-180.eu-west-2.compute.internal  MountVolume.SetUp failed for volume "spm-dev01-curam-pv-qm" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/11e52fd5-fb8c-40b6-9cf7-b252f1f4e1ac/volumes/kubernetes.io~nfs/spm-dev01-curam-pv-qm --scope -- mount -t nfs -o hard,nfsvers=4.1,noresvport,retrans=2,rsize=1048576,timeo=600,wsize=1048576 fs-xxxxxxx.efs.eu-west-2.amazonaws.com:/spm-dev01/curam /var/lib/kubelet/pods/11e52fd5-fb8c-40b6-9cf7-b252f1f4e1ac/volumes/kubernetes.io~nfs/spm-dev01-curam-pv-qm
Output: Running scope as unit run-12552.scope.
mount.nfs: Connection timed out

Solution:

Adding mountOptions with the following properties, as recommended here, seems to resolve the issue.

To keep this portable across different service providers, I've added the following code to mqserver/templates/pv-data.yaml, mqserver/templates/pv-logs.yaml and mqserver/templates/pv-qm.yaml:

  {{- if $.Values.global.mq.multiInstance.nfsMountOptions }}
  mountOptions:
    {{- range $.Values.global.mq.multiInstance.nfsMountOptions }}
    - {{ . | quote }}
    {{- end }}
  {{- end}}

Adding the following element to the values then sets the mountOptions:

      nfsMountOptions:
        - "nfsvers=4.1"
        - "rsize=1048576"
        - "wsize=1048576"
        - "hard"
        - "timeo=600"
        - "retrans=2"
        - "noresvport"
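For illustration, a PersistentVolume rendered by the patched template might look roughly like the following. The volume name, server and path are taken from the values and the error message above; the capacity and access mode are assumptions and are not values from the chart.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: spm-dev01-curam-pv-qm
spec:
  capacity:
    storage: 2Gi              # assumed size, not taken from the chart
  accessModes:
    - ReadWriteMany           # assumed access mode
  storageClassName: nfs
  # Rendered from global.mq.multiInstance.nfsMountOptions
  mountOptions:
    - "nfsvers=4.1"
    - "rsize=1048576"
    - "wsize=1048576"
    - "hard"
    - "timeo=600"
    - "retrans=2"
    - "noresvport"
  nfs:
    server: fs-xxxxxxxx.efs.eu-west-2.amazonaws.com
    path: /spm-dev01/curam
```

Because the template wraps the block in an if, charts deployed without nfsMountOptions render exactly as before, which keeps the change backwards compatible.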

Generic SAML2 integration vs ISAM, multiple IdPs

Regarding the ISAM integration functionality, which seems to be based on the generic SAML2 integration feature (samlWeb-2.0) from WebSphere Liberty:

  • Do you see any reason why the ISAM config would not work with other SAML2 IdPs? Have you tested with other IdP solutions?
  • Is there a way to configure multiple SAML2 based IdPs, in a scenario where users are managed in different IAM solutions?

internal.docker.repository causes building of CE and StaticContent Dockerfiles to fail

Issue:

In the Dockerfiles for CE and StaticContent, the FROM statement in both files is currently set to:

ARG DOCKER_REGISTRY="internal.docker.repository"

#Final
FROM ${DOCKER_REGISTRY}/ubi7/ibm-http-server:${HTTP_VERSION}

This repository can't be resolved and is inconsistent with the Liberty, batch and MQ Dockerfiles.

Solution:

This should be reverted to:

FROM ibmcom/ibm-http-server:${HTTP_VERSION}

helm install spm fails on DB connection

When I execute the helm install command to deploy SPM, the first job fails on connecting to DB2. See log below.

From my terminal, I can connect to DB2 successfully. A Google search suggests that DB2 should be configured with tcpip enabled for the DB2COMM config parameter, which is already set. From my local SPM installation, a build configtest also connects to the DB.

Any suggestions?

`Unable to locate tools.jar. Expected to find it in /opt/ibm/java/lib/tools.jar
Buildfile: /opt/ibm/Curam/release/CuramSDEJ/util/loadsql.xml

check.parameter.type:

check.db.type:

check.props.inside.file:

check.curam.environment.bindings.location.isset:

check.curam.environment.bindings.location.valid:

run.database.db2:

ora.use.servicename:

run.database.ora:

run.database.zos:

get.decrypted.db.password:

load:

targetDirectory:

BUILD FAILED
/opt/ibm/Curam/release/CuramSDEJ/util/loadsql.xml:40: The following error occurred while executing this line:
/opt/ibm/Curam/release/CuramSDEJ/util/loadsql.xml:53: com.ibm.db2.jcc.am.DisconnectNonTransientConnectionException: [jcc][t4][2043][11550][4.21.29] Exception java.net.ConnectException: Error opening socket to server localhost/127.0.0.1 on port 50,000 with message: Connection refused (Connection refused). ERRORCODE=-4499, SQLSTATE=08001
at com.ibm.db2.jcc.am.kd.a(kd.java:338)
at com.ibm.db2.jcc.am.kd.a(kd.java:435)
at com.ibm.db2.jcc.t4.ac.a(ac.java:440)
at com.ibm.db2.jcc.t4.ac.<init>(ac.java:96)
at com.ibm.db2.jcc.t4.a.b(a.java:366)
at com.ibm.db2.jcc.t4.b.newAgent_(b.java:2076)
at com.ibm.db2.jcc.am.Connection.initConnection(Connection.java:812)
at com.ibm.db2.jcc.am.Connection.<init>(Connection.java:754)
at com.ibm.db2.jcc.t4.b.<init>(b.java:339)
at com.ibm.db2.jcc.DB2SimpleDataSource.getConnection(DB2SimpleDataSource.java:233)
at com.ibm.db2.jcc.DB2SimpleDataSource.getConnection(DB2SimpleDataSource.java:199)
at com.ibm.db2.jcc.DB2Driver.connect(DB2Driver.java:482)
at com.ibm.db2.jcc.DB2Driver.connect(DB2Driver.java:116)
at org.apache.tools.ant.taskdefs.JDBCTask.getConnection(JDBCTask.java:364)
at org.apache.tools.ant.taskdefs.SQLExec.getConnection(SQLExec.java:953)
at org.apache.tools.ant.taskdefs.SQLExec.execute(SQLExec.java:649)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:292)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:508)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:99)
at org.apache.tools.ant.Task.perform(Task.java:350)
at org.apache.tools.ant.Target.execute(Target.java:449)
at org.apache.tools.ant.Target.performTasks(Target.java:470)
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1391)
at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:36)
at org.apache.tools.ant.Project.executeTargets(Project.java:1254)
at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:437)
at org.apache.tools.ant.taskdefs.CallTarget.execute(CallTarget.java:106)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:292)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:508)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:99)
at org.apache.tools.ant.Task.perform(Task.java:350)
at org.apache.tools.ant.Target.execute(Target.java:449)
at org.apache.tools.ant.Target.performTasks(Target.java:470)
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1391)
at org.apache.tools.ant.Project.executeTarget(Project.java:1364)
at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
at org.apache.tools.ant.Project.executeTargets(Project.java:1254)
at org.apache.tools.ant.Main.runBuild(Main.java:830)
at org.apache.tools.ant.Main.startAnt(Main.java:223)
at org.apache.tools.ant.launch.Launcher.run(Launcher.java:284)
at org.apache.tools.ant.launch.Launcher.main(Launcher.java:101)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:380)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:236)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:218)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.net.Socket.connect(Socket.java:682)
at com.ibm.db2.jcc.t4.w.run(w.java:49)
at java.security.AccessController.doPrivileged(AccessController.java:734)
at com.ibm.db2.jcc.t4.ac.a(ac.java:426)
... 42 more

Total time: 16 seconds`

Helm dependency for ce-app needs conditional adding

Describe the bug
When the instructions in the packaging-the-helm-charts section of the documentation are followed, pushing the Helm charts fails with the error Error: ce-app chart not found in repo http://chartmuseum.spm-poc.scotgov-dt.internal. This is because the ce-app dependency doesn't have a condition that is set to false by default, and pushing the ce-app chart isn't mentioned in the instructions.

A condition should be added to the dependency so that it can be disabled by users who don't need it, for example:

- name: ce-app
  version: "~1.0.0"
  repository: "@local-development"
  condition: global.ceApp.enabled

To Reproduce
Steps to reproduce the behavior:

  1. run the following commands:
helm repo add local-development ${CHART_REPO}
cd $SPM_CONTAINERISATION_HOME/helm-charts
helm push apps local-development
helm push mqserver local-development
helm push configmaps local-development
helm push xmlserver local-development
helm push batch local-development
helm push ihs local-development
helm repo update
helm dep up $SPM_CONTAINERISATION_HOME/helm-charts/spm/

Expected behavior
Helm packages dependencies to upload them to chartmuseum.

Screenshots
Logs from jenkins build server:

+ ./build-charts.sh
"local-development" has been added to your repositories
Pushing apps-2.0.0.tgz to local-development...
Done.
Pushing mqserver-1.2.0.tgz to local-development...
Done.
Pushing configmaps-1.2.0.tgz to local-development...
Done.
Pushing xmlserver-1.1.1.tgz to local-development...
Done.
Pushing batch-1.1.1.tgz to local-development...
Done.
Pushing ihs-2.0.0.tgz to local-development...
Done.
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "local-development" chart repository
Update Complete. ⎈ Happy Helming!⎈ 
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "local-development" chart repository
Update Complete. ⎈Happy Helming!⎈
Error: ce-app chart not found in repo http://chartmuseum.spm-poc.scotgov-dt.internal

Please complete the following information:
* OS: CentOS Linux release 7.6.1810 (Core)
* Docker Version: Docker version 18.09.5, build e8ff056
* Minikube Version: N/a
* Ant Version: 1.10.6
* Java Version: Java 8 (version packaged with Liberty)
* Liberty Version: 19.0.0.12-full-java8-ibmjava
* Cúram SPM Version: 7.0.10

Additional context

Currently our build process uses Helm 3, which has the following bug around conditionally including dependencies: helm/helm#5780. However, the issue can still be worked around by commenting out the ce-app and openldap dependencies in the spm/requirements.yaml file. With those entries commented out, the build output is:

+ ./build-charts.sh
"local-development" has been added to your repositories
Pushing apps-2.0.0.tgz to local-development...
Done.
Pushing mqserver-1.2.0.tgz to local-development...
Done.
Pushing configmaps-1.2.0.tgz to local-development...
Done.
Pushing xmlserver-1.1.1.tgz to local-development...
Done.
Pushing batch-1.1.1.tgz to local-development...
Done.
Pushing ihs-2.0.0.tgz to local-development...
Done.
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "local-development" chart repository
Update Complete. ⎈ Happy Helming!⎈ 
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "local-development" chart repository
Update Complete. ⎈Happy Helming!⎈
Saving 6 charts
Downloading apps from repo http://chartmuseum.spm-poc.scotgov-dt.internal
Downloading batch from repo http://chartmuseum.spm-poc.scotgov-dt.internal
Downloading configmaps from repo http://chartmuseum.spm-poc.scotgov-dt.internal
Downloading ihs from repo http://chartmuseum.spm-poc.scotgov-dt.internal
Downloading mqserver from repo http://chartmuseum.spm-poc.scotgov-dt.internal
Downloading xmlserver from repo http://chartmuseum.spm-poc.scotgov-dt.internal
Deleting outdated charts
Pushing spm-1.2.0.tgz to local-development...
Done.
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "local-development" chart repository
Update Complete. ⎈ Happy Helming!⎈ 
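The commented-out dependencies described in this workaround might look roughly like the sketch below. Only the ce-app fields are quoted in this issue; the other entries in spm/requirements.yaml are left out rather than guessed.

```yaml
# helm-charts/spm/requirements.yaml (sketch, other entries omitted)
dependencies:
  # ...remaining dependencies (apps, batch, configmaps, ihs, mqserver, xmlserver) unchanged...

  # Commented out so that `helm dep up` no longer looks for the chart:
  # - name: ce-app
  #   version: "~1.0.0"
  #   repository: "@local-development"

  # The openldap entry is commented out in the same way. Its version and
  # repository are not quoted in this issue, so they are not reproduced here.
```

This matches the "Saving 6 charts" line in the log above: with ce-app and openldap excluded, only the six pushed charts are resolved.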

Log Collection
see Screenshots

Database connection authorization Failure in Producer log

We can get to the Curam login screen, but when we attempt to login, we just get a white screen. Looking in the logs of a producer pod we see the error: [ERROR ] Exception is: [jcc][t4][2013][11249][4.28.11] Connection authorization failure occurred. Reason: User ID or Password invalid. ERRORCODE=-4214, SQLSTATE=28000 DSRA0010E: SQL State = 28000, Error Code = -4,214.

We believe that we have set up the relevant database properties correctly in our values.yaml file, and have encrypted the password correctly. We are struggling to debug this further and get to the root of the problem. We are not sure what the Helm install scripts are doing with these properties. If we knew that, we could dig around further. The log file from the producer pod is attached here: producer.log.

ESDC HFP - Technical Exception Encountered in IBM MQ with OpenShift Containerized Environment

We are supporting ESDC's High Fidelity Prototype Project, which aims to demonstrate Curam in an OpenShift containerized environment. Our SI is following the runbook provided by PD to build and deploy the Curam application. The following technical exception is encountered when a simple Universal Access application is submitted and then processed by the REST consumer. This is blocking the High Fidelity Prototype project, which has huge implications for restoring customer confidence in using Curam in the longer term in upcoming ESDC projects.

Support Case WH00012069 has also been raised to track this issue.
https://ibmwatsonhealth.force.com/mysupport/s/case/5001U00000lEOFZQA4/esdc-hfp-technical-exception-encountered-in-ibm-mq-with-openshift-containerized-environment?openCase=true

[INFO ] FFDC1015I: An FFDC Incident has been created: "javax.transaction.xa.XAException: The method 'xa_start' has failed with errorCode '-6'. com.ibm.tx.jta.impl.JTAXAResourceImpl.start 307" at ffdc_21.04.28_20.53.01.0.log
[INFO ] FFDC1015I: An FFDC Incident has been created: "javax.transaction.xa.XAException: The method 'xa_start' has failed with errorCode '-6'. com.ibm.tx.jta.impl.RegisteredResources.startRes 1053" at ffdc_21.04.28_20.53.01.1.log
[ERROR ] WTRN0078E: An attempt by the transaction manager to call start on a transactional resource has resulted in an error. The error code was XAER_PROTO. The exception stack trace follows: javax.transaction.xa.XAException: The method 'xa_start' has failed with errorCode '-6'.
at com.ibm.mq.jmqi.JmqiXAResource.start(JmqiXAResource.java:980)
at com.ibm.mq.connector.xa.XARWrapper.start(XARWrapper.java:680)
at com.ibm.ws.Transaction.JTA.JTAResourceBase.start(JTAResourceBase.java:121)
at [internal classes]
at com.ibm.mq.connector.inbound.AbstractWorkImpl.run(AbstractWorkImpl.java:210)
at com.ibm.ws.jca.inbound.security.JCASecurityContextService.runInInboundSecurityContext(JCASecurityContextService.java:49)
at [internal classes]
[INFO ] FFDC1015I: An FFDC Incident has been created: "javax.transaction.SystemException: XAResource start association error:XAER_PROTO com.ibm.tx.jta.impl.RegisteredResources.enlistResource 523" at ffdc_21.04.28_20.53.01.2.log
[INFO ] FFDC1015I: An FFDC Incident has been created: "javax.transaction.SystemException: XAResource start association error:XAER_PROTO com.ibm.tx.jta.TransactionImpl.enlistResource 2042" at ffdc_21.04.28_20.53.01.3.log
[INFO ] FFDC1015I: An FFDC Incident has been created: "javax.transaction.SystemException: XAResource start association error:XAER_PROTO com.ibm.ws.ejbcontainer.mdb.MessageEndpointBase.beforeDelivery 1244" at ffdc_21.04.28_20.53.01.4.log
[INFO ] Message delivery to an MDB 'curam.util.jms.MDBProxyDPEnactmentMDB_49ac78a4@99d46d8f(BeanId(CuramServerCode#coreinf-ejb.jar#DPEnactmentMDB, null))' failed with exception: 'beforeDelivery failure'.
[INFO ] FFDC1015I: An FFDC Incident has been created: "com.ibm.websphere.csi.CSITransactionRolledbackException: Transaction marked rollbackonly com.ibm.ejs.container.EJSContainer.postInvoke 2326" at ffdc_21.04.28_20.53.01.5.log
[INFO ] FFDC1015I: An FFDC Incident has been created: "javax.ejb.TransactionRolledbackLocalException: ; nested exception is: com.ibm.websphere.csi.CSITransactionRolledbackException: Transaction marked rollbackonly com.ibm.ws.ejbcontainer.mdb.MessageEndpointBase.afterDelivery 1280" at ffdc_21.04.28_20.53.01.6.log
[INFO ] WTRN0006W: Transaction 000001791A41120D00000001681E949CCC38DCA6F5D43D42714F62BDC24547D1E7EFE6F5000001791A41120D00000001681E949CCC38DCA6F5D43D42714F62BDC24547D1E7EFE6F500000001 has timed out after 180 seconds.
[INFO ] WTRN0124I: When the timeout occurred the thread with which the transaction is, or was most recently, associated was Thread[Default Executor-thread-6,5,Default Executor Thread Group]. The stack trace of this thread when the timeout occurred was:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:218)
com.ibm.ws.threading.internal.BoundedBuffer.waitGet_(BoundedBuffer.java:176)
com.ibm.ws.threading.internal.BoundedBuffer.take(BoundedBuffer.java:647)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1085)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
java.lang.Thread.run(Thread.java:822)

And here is what the MQ does:

2021-04-28T20:53:01.444Z AMQ6125E: An internal IBM MQ error has occurred.
2021-04-28T20:53:01.444Z AMQ6184W: An internal IBM MQ error has occurred on queue manager QM1.

Unable to build S2I Core base image from source

While building the S2I Core base image from source, the base image build fails on the tagging phase:

(base) Mikkos-MacBook-Pro-3:s2i-base-container makelm$ make build TARGET=rhel8 VERSIONS=core VERBOSE=1
Makefile:15: warning: overriding commands for target `core'
common/common.mk:87: warning: ignoring old commands for target `core'
Makefile:15: warning: overriding commands for target `core'
common/common.mk:87: warning: ignoring old commands for target `core'
VERSIONS="core" SKIP_SQUASH=1 UPDATE_BASE= OS=rhel8 CLEAN_AFTER= DOCKER_BUILD_CONTEXT=. OPENSHIFT_NAMESPACES="" CUSTOM_REPO="" REGISTRY="""" /usr/bin/env bash common/build.sh
-> Version core: building image from 'Dockerfile.rhel8' ...
-> Pulling image registry.access.redhat.com/ubi8:latest before building image from Dockerfile.rhel8.
The image registry.access.redhat.com/ubi8:latest is already pulled.
[+] Building 0.2s (10/10) FINISHED                                                                                   
 => [internal] load build definition from Dockerfile.rhel8                                                      0.0s
 => => transferring dockerfile: 43B                                                                             0.0s
 => [internal] load .dockerignore                                                                               0.0s
 => => transferring context: 2B                                                                                 0.0s
 => [internal] load metadata for registry.access.redhat.com/ubi8:latest                                         0.0s
 => [1/5] FROM registry.access.redhat.com/ubi8:latest                                                           0.0s
 => [internal] load build context                                                                               0.0s
 => => transferring context: 573B                                                                               0.0s
 => CACHED [2/5] RUN INSTALL_PKGS="bsdtar   findutils   groff-base   glibc-locale-source   glibc-langpack-en    0.0s
 => CACHED [3/5] COPY ./root/ /                                                                                 0.0s
 => CACHED [4/5] WORKDIR /opt/app-root/src                                                                      0.0s
 => CACHED [5/5] RUN rpm-file-permissions &&   useradd -u 1001 -r -g 0 -d /opt/app-root/src -s /sbin/nologin    0.0s
 => exporting to image                                                                                          0.0s
 => => exporting layers                                                                                         0.0s
 => => writing image sha256:2001f8f3115e301855c50ee940924cd688e95411e968fd2553663f6fe38468ad                    0.0s
VERSIONS="core" SKIP_SQUASH=1 UPDATE_BASE= OS=rhel8 CLEAN_AFTER= DOCKER_BUILD_CONTEXT=. OPENSHIFT_NAMESPACES="" CUSTOM_REPO="" REGISTRY="""" /usr/bin/env bash common/tag.sh
Error: No such object: 
make[1]: *** [core] Error 1
make: *** [build-serial] Error 2

Before running the build, I installed coreutils and md2man and updated my PATH so that the GNU utilities take priority.

ESDC HFP - Technical Exception Encountered since the May 2021 update when using Helm

We support ESDC's High Fidelity Prototype Project whose goal is to successfully demonstrate Curam in an OpenShift environment. Our SI incorporates this repo, including charts, directly into a pipeline whenever we build and deploy the Curam application. Since the May 2021 update to the SPM-kubernetes repo, builds have been failing with:

Error: template: spm/charts/apps/templates/configmaps/configmap-sessions.yaml:38:12: executing "spm/charts/apps/templates/configmaps/configmap-sessions.yaml" at <include "apps.dsprops.fragment" (list . "CURAMSESSDB")>: error calling include: template: spm/charts/apps/templates/_database.tpl:50:10: executing "apps.dsprops.fragment" at <include "apps.oracleurl" .>: error calling include: template: spm/charts/apps/templates/_database.tpl:68:24: executing "apps.oracleurl" at <.Values.global.database>: can't evaluate field Values in type []interface {}

We would appreciate your support, as this error fails our builds and has delayed progress on current sprints. We are available to provide further information on request.

db2 dependency in spm/requirements.yaml

Describe the bug
In SPM Umbrella chart under helm-charts/spm/requirements.yaml there is a dependency on db2 as follows:

- name: db2
  version: "~1.3.0"
  repository: "@local-development"

There are no valid charts for this dependency, so it should be removed, as it causes the following issue when packaging the spm umbrella chart:

Error: db2 chart not found in repo http://chartmuseum.apps-crc.testing

Duplicate ihs elements in spm/values.yaml

Describe the bug
In SPM Umbrella chart under helm-charts/spm/values.yaml the ihs element is set twice with similar keys present

  ihs:
    runAs: 1000
    replicaCount: 1
    readinessPath: /CuramStatic
    ingressPath: /CuramStatic/*
    resources: {}

and further down:

  ihs:
    serviceType: NodePort
    config:
      readinessPath: /CuramStatic
      ingressPath: /CuramStatic/*

This is slightly confusing, and the second section's ingressPath element isn't picked up by the ingress.yaml template:

          - path: {{ .Values.global.ihs.ingressPath | default "/CuramStatic" }}
            backend:
              serviceName: {{ $.Release.Name }}-ihs
              servicePort: http
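One possible consolidated layout is sketched below. It assumes both blocks sit under global and that the chart reads serviceType from the same level, neither of which is confirmed by this issue; the only path confirmed by the template above is global.ihs.ingressPath.

```yaml
global:
  ihs:
    runAs: 1000
    replicaCount: 1
    serviceType: NodePort        # assumed to be read from this level
    readinessPath: /CuramStatic
    # ingress.yaml reads .Values.global.ihs.ingressPath, so the path must live
    # directly under ihs and not under a nested "config" key.
    ingressPath: /CuramStatic/*
    resources: {}
```

Keeping a single ihs block also avoids relying on how the YAML parser treats a repeated key within one mapping, where the later block may replace the earlier one entirely.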

Could not allocate resources as specified

I use a Mac and am not able to allocate 4 CPUs and 8 GB of RAM. Do we really need that much to run Curam? With the settings given in the instructions, I always get a timeout. The following downgraded configuration worked:

minikube start --vm-driver=vmware --cpus 2 --memory 4G --insecure-registry "192.168.3.0/16" --disk-size='20G' --kubernetes-version v1.16.7

helm install fails waiting for response

The runbook step to execute helm install using

helm install releasename local-development/spm

is failing on a timeout. I can't find any detailed logs. Please see the attached screenshot of the dashboard; it shows a timeout event on the customsql job. However, when I execute a curl command from the command line, I get a response immediately.

[ibmadmin@localserver helm-charts]$ curl http://minikube.local:5000/v2
<a href="/v2/">Moved Permanently</a>.

If I attempt to install again, I get this error message.

[ibmadmin@localserver helm-charts]$ helm install spm-v1 local-development/spm
Error: cannot re-use a name that is still in use

Please advise.

image

Certificate Error when logging in to the Open Shift Registry

I am following the SPM Runbook for CodeReady Containers: https://ibm.github.io/spm-kubernetes/prereq/openshift/codeready-containers

When executing the following step:

docker login -u kubeadmin -p $(oc whoami -t) $(oc registry info --public)

I get the following Certificate error:

Error response from daemon: Get https://default-route-openshift-image-registry.apps-crc.testing/v1/users/: x509: certificate signed by unknown authority

Let me know if you have any thoughts or advice. Thanks!

Can't push to Registry - getting error "connect: connection refused" from minikube on RHEL

I am following the SPM Runbook and have built my Docker images. I am using Minikube and I'm trying to push the images to the local repository. When I do so I get the following error:


[vnc@bluffing1 ~]$ docker push $DOCKER_REGISTRY/$PROJECT/xmlserver:latest
The push refers to repository [minikube.local:5000/minikubemtess/xmlserver]
Get http://minikube.local:5000/v2/: dial tcp 172.17.0.2:5000: connect: connection refused

Any advice?

Background info:

OS: RHEL 7.9
Docker Version: 20.10.2
Minikube Version: v1.13.1
Ant Version: 1.10.6 
Java Version: 1.8.0_201
Liberty  Version: 20.0.0.9
Cúram SPM Version: 7.0.11

Other info from my environment:


[vnc@bluffing1 ~]$ env | egrep "DOCKER_REGISTRY|PROJECT"
PROJECT=minikubemtess
DOCKER_REGISTRY=minikube.local:5000
[vnc@bluffing1 ~]$ minikube ip
172.17.0.2
[vnc@bluffing1 ~]$ minikube addons list
|-----------------------------|----------|--------------|
|         ADDON NAME          | PROFILE  |    STATUS    |
|-----------------------------|----------|--------------|
| ambassador                  | minikube | disabled     |
| csi-hostpath-driver         | minikube | disabled     |
| dashboard                   | minikube | disabled     |
| default-storageclass        | minikube | enabled ✅   |
| efk                         | minikube | disabled     |
| freshpod                    | minikube | disabled     |
| gcp-auth                    | minikube | disabled     |
| gvisor                      | minikube | disabled     |
| helm-tiller                 | minikube | disabled     |
| ingress                     | minikube | enabled ✅   |
| ingress-dns                 | minikube | disabled     |
| istio                       | minikube | disabled     |
| istio-provisioner           | minikube | disabled     |
| kubevirt                    | minikube | disabled     |
| logviewer                   | minikube | disabled     |
| metallb                     | minikube | disabled     |
| metrics-server              | minikube | disabled     |
| nvidia-driver-installer     | minikube | disabled     |
| nvidia-gpu-device-plugin    | minikube | disabled     |
| olm                         | minikube | disabled     |
| pod-security-policy         | minikube | disabled     |
| registry                    | minikube | enabled ✅   |
| registry-aliases            | minikube | disabled     |
| registry-creds              | minikube | disabled     |
| storage-provisioner         | minikube | enabled ✅   |
| storage-provisioner-gluster | minikube | disabled     |
| volumesnapshots             | minikube | disabled     |
|-----------------------------|----------|--------------|
[vnc@bluffing1 bin]$ cat /etc/docker/daemon.json
{
  "insecure-registries": [
    "172.17.0.2/16"
  ]
}

NetworkManager issues when configuring CRC

I'm following the instructions to install CodeReady Containers (https://ibm.github.io/spm-kubernetes/prereq/openshift/codeready-containers/#creating-a-crc-project). I'm using an RHEL VM, and when I attempt the "crc start" command I get the following error regarding NetworkManager:

[vnc@bluffing1 ~]$ crc start
INFO Checking if running as non-root              
INFO Checking if podman remote executable is cached 
INFO Checking if admin-helper executable is cached 
INFO Checking minimum RAM requirements            
INFO Checking if Virtualization is enabled        
INFO Checking if KVM is enabled                   
INFO Checking if libvirt is installed             
INFO Checking if user is part of libvirt group    
INFO Checking if libvirt daemon is running        
INFO Checking if a supported libvirt version is installed 
INFO Checking if crc-driver-libvirt is installed  
INFO Checking if systemd-networkd is running      
INFO Checking if NetworkManager is installed      
INFO Checking if NetworkManager service is running 
INFO Checking if /etc/NetworkManager/conf.d/crc-nm-dnsmasq.conf exists 
File not found: /etc/NetworkManager/conf.d/crc-nm-dnsmasq.conf: stat /etc/NetworkManager/conf.d/crc-nm-dnsmasq.conf: no such file or directory

Can someone advise how the /etc/NetworkManager/conf.d/crc-nm-dnsmasq.conf file should be configured?

There is some info here, but I assume that is just an example? I'm not sure what values I should put in.

Please clarify Open Shift CRC installation documentation in the SPM Runbook

I am following the SPM Runbook and I want to install OpenShift CRC (please see https://ibm.github.io/spm-kubernetes/prereq/openshift/codeready-containers). I clicked the latest release link to download CodeReady Containers, but the page caught me off guard. I was expecting to see the software listed with download options. Instead I had to poke around a bit before I discovered that what I needed was under the "Sandbox" tab. We should update the documentation to clarify this and eliminate confusion for the developer/customer.

It looks like this when you click the latest release link:
image

The installer and pull secret are actually under the Sandbox tab:

image

Build errors with Liberty EAR Build (gss.jar missing)

I am running an automated CI build for the SPM 7.0.11 codebase (project codebase) and am experiencing the following Ant build error with the libertyEAR Ant target:

dispmsg:
     [echo] 04:04:32 Starting buildEAR
     [echo] Using properties file '/home/jenkins/agent/workspace/masked-project/masked-project-spm-application-build-pipeline/EJBServer/project/properties/AppServer.properties'.

check.properties.exists:
    [mkdir] Created dir: /home/jenkins/agent/workspace/masked-project/masked-project-spm-application-build-pipeline/EJBServer/build/ear/temp
    [mkdir] Created dir: /home/jenkins/agent/workspace/masked-project/masked-project-spm-application-build-pipeline/EJBServer/build/ear/combined
    [mkdir] Created dir: /home/jenkins/agent/workspace/masked-project/masked-project-spm-application-build-pipeline/EJBServer/build/ear/combined/META-INF
    [mkdir] Created dir: /home/jenkins/agent/workspace/masked-project/masked-project-spm-application-build-pipeline/EJBServer/build/ear/temp/extrajars
    [mkdir] Created dir: /home/jenkins/agent/workspace/masked-project/masked-project-spm-application-build-pipeline/EJBServer/build/ear/WLP

BUILD FAILED
/home/jenkins/agent/workspace/masked-project/masked-project-spm-application-build-pipeline/CuramSDEJ/bin/build.xml:147: The following error occurred while executing this line:
/home/jenkins/agent/workspace/masked-project/masked-project-spm-application-build-pipeline/CuramSDEJ/bin/app_buildEAR.xml:170: Warning: Could not find file /home/jenkins/agent/workspace/masked-project/masked-project-spm-application-build-pipeline/EJBServer/tools/search/GlobalSearchServer_lib/gss.jar to copy.

As per the documentation, GSS should be deprecated in the latest SPM releases, so why is it referenced by the Ant build?

CE Ingress controller Rules created when the Application isn't deployed

Describe the bug
In SPM Umbrella chart under helm-charts/spm/values.yaml when ceApp has the following values set:

  ceApp:
    enabled: false
    replicaCount: 1
    imageLibrary: ''
    imageName: ce-ihs
    imageTag: latest
    ingressPath: /universal/*
    resources: {}

The Helm charts still produce the ingress controller rules, which causes the ingress controller provisioning to fail. This is due to the following if statement in helm-charts/spm/templates/ingress.yaml:

          {{- if .Values.global.ceApp.imageTag }}
          - path: {{ .Values.global.ceApp.ingressPath | default "/universal" }}
            backend:
              serviceName: {{ $.Release.Name }}-ce-app
              servicePort: http
          {{- end }}

Shouldn't this be set to:

          {{- if .Values.global.ceApp.enabled }}
          - path: {{ .Values.global.ceApp.ingressPath | default "/universal" }}
            backend:
              serviceName: {{ $.Release.Name }}-ce-app
              servicePort: http
          {{- end }}

To Reproduce
Steps to reproduce the behavior:

  1. Edit the Helm values with the ceApp section shown above
  2. Run helm install
  3. Check the ingress controller config, which should include the following rule:
            {
              "path": "/universal",
              "backend": {
                "serviceName": "spm-dev01-ce-app",
                "servicePort": "http"
              }

Since that service is not installed:

[matt.smithson@ssvc-c-403 dashboard]$ kubectl get service
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
spm-dev01-apps-curam       ClusterIP   172.20.35.250    <none>        8443/TCP                        5h1m
spm-dev01-apps-rest        ClusterIP   172.20.103.175   <none>        8443/TCP                        5h1m
spm-dev01-ihs              NodePort    172.20.120.160   <none>        8443:31010/TCP                  5h1m
spm-dev01-mqserver-curam   NodePort    172.20.254.199   <none>        9443:30411/TCP,1414:31859/TCP   137m
spm-dev01-mqserver-rest    NodePort    172.20.247.88    <none>        9443:30562/TCP,1414:32102/TCP   137m
spm-dev01-xmlserver        ClusterIP   172.20.16.199    <none>        1800/TCP                        5h1m

ingress controller setup fails.

Expected behavior
Helm charts should not produce ingress rules for components that aren't enabled.
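
One way to check this expectation without installing anything is to render the chart and look for a backend that points at the ce-app service. The helper below is only a sketch: the function name `has_ce_app_rule` is hypothetical, and it assumes the rendered ingress still uses the `serviceName:` field shown in the template above.

```shell
# Hypothetical helper: reads rendered manifests on stdin and succeeds (exit 0)
# if any ingress backend references a service whose name ends in "-ce-app".
# Assumes the "serviceName:" field used by the ingress template in this issue.
has_ce_app_rule() {
  grep -q 'serviceName: .*-ce-app'
}
```

Assuming Helm 3 syntax and the chart path from this issue, it could be used as `helm template spm-dev01 helm-charts/spm --set global.ceApp.enabled=false | has_ce_app_rule && echo "ce-app rule still rendered"`. With the condition switched to `.Values.global.ceApp.enabled`, the message should no longer be printed.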

Screenshots
See logs above

Please complete the following information:
* OS: AWS EKS 1.14
* Docker Version: Docker version 18.09.5, build e8ff056
* Minikube Version: N/a
* Ant Version: 1.10.6
* Java Version: Java 8 (version packaged with Liberty)
* Liberty Version: 19.0.0.12-full-java8-ibmjava
* Cúram SPM Version: 7.0.10

Log Collection

See above.

Producer and consumer pods fail to stabilize initially

The following symptoms appear on minikube cluster v1.15.1:

When the SPM Helm chart is installed and the deployments are created on k8s, the SPM producer and consumer pods fail to start initially, and they stabilize to the Running state only after a few rounds of retries. On the failed start attempts, errors of the following type can be observed in the pod events:

MountVolume.SetUp failed for volume "ejb-bindings" : failed to sync configmap cache: timed out waiting for the condition

The same error is repeated for all volumes required by the pod.
