Describe the bug Readiness & liveness probes failed. Connecti

[BUG] about helm-charts HOT 10 CLOSED

mailu commented on May 5, 2024

[BUG]

from helm-charts.

Comments (10)

micw commented on May 5, 2024

Probably not a subnet issue. Check the logs of the pods. If there are no errors, get a shell to the pods and check if the services respond at the given ports.
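
The port check suggested above can be scripted once you have a shell in a pod that ships Python (an assumption; not every image in the chart does). A minimal sketch, where `port_open` is a hypothetical helper and not part of Mailu or the chart:

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and DNS failures alike.
        return False
```

Calling `port_open("<service-name>", 80)` from inside a pod, with the service name taken from your own release, tells you whether anything is listening there. It does not tell you why a probe fails.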

zbagdzevicius commented on May 5, 2024

> Probably not a subnet issue. Check the logs of the pods. If there are no errors, get a shell to the pods and check if the services respond at the given ports.

I checked the pods and there are no errors, but still nothing works properly.
Thank you for your suggestion. I will check whether the services respond.

micw commented on May 5, 2024

Any results?

zbagdzevicius commented on May 5, 2024

@micw I have some results from further investigation. To find out whether this is an ingress problem, I configured a custom ingress-nginx load balancer that points directly at the required services. With that in place I can reach the admin dashboard and log in, and I can reach the Roundcube login page. When I try to log in to Roundcube I get an error that says "connection to database failed". I tried configuring an external MySQL database, but the same error occurs. My database deployment ran successfully without any errors or warnings.

I also get warnings on the pods that expose ports 80/443: "liveness probe: connection refused". I can access them from outside, but as far as I can see they are not accessible from inside the pods themselves. I even tried doing everything without the helm charts, but I hit the same problem as with the default helm-charts configuration: the service is unavailable from outside. The existing volume claim was used successfully.

One related problem that I did resolve was the cert-manager health check (same as the liveness probe). I resolved it by changing the ingress-nginx configuration to use the Cluster external traffic policy instead of Local, so that my pods could call each other. I have spent 20 hours on this without success.

micw commented on May 5, 2024

One thing I just saw is "tlsFlavor: letsencrypt". That won't work with cert-manager. The option was added by a PR recently. Unfortunately the doc was a bit misleading; I updated it a few days ago. You need to use the default "cert" here.
This is probably not the only issue in your setup, but it is at least one.

zbagdzevicius commented on May 5, 2024

@micw OK. I will try that tomorrow and leave a comment on how it goes. Thank you.

micw commented on May 5, 2024

BTW, which cloud provider are you using?

zbagdzevicius commented on May 5, 2024

@micw It still failed to deploy successfully. I'm not using any cloud provider. Everything I do is on bare metal (5-10x cheaper): MicroK8s plus add-ons (metallb, dns, storage, helm). By the way, everything else I have deployed works fine, including cert-manager. I think the problem could be the ingress configuration.

micw commented on May 5, 2024

I see. Nevertheless, I'll close this issue since it's neither a report of a particular bug nor a feature request. The issue tracker is not the right place to debug your installation. Please use the chat. You will find me there, as well as others who deploy on various k8s flavors.

lmcdasm commented on May 5, 2024

Hello

I have the same issue with the latest chart, pulled today.

Steps:
helm repo add
helm template -name mail-service mailu/mailu -f values.xml >> my_manifest.yaml

Updated the SUBNET values that are there from 10.42.0.0/16 to our values.

kubectl apply -f my_manifest.yaml

I get the same issue as the OP: we can see that admin receives a liveness probe and then hangs on setting up the SQL DB.

Here are the logs from admin:
[SQL: SELECT domain.created_at AS domain_created_at, domain.updated_at AS domain_updated_at, domain.comment AS domain_comment, domain.name AS domain_name, domain.max_users AS domain_max_users, domain.max_aliases AS domain_max_aliases, domain.max_quota_bytes AS domain_max_quota_bytes, domain.signup_enabled AS domain_signup_enabled
FROM domain
WHERE domain.signup_enabled = 1]
(Background on this error at: http://sqlalche.me/e/e3q8)
10.196.225.51 - - [14/May/2021:22:36:39 +0000] "GET /ui/login HTTP/1.1" 500 290 "-" "kube-probe/1.18"
[2021-05-14 22:36:44,897] ERROR in app: Exception on /ui/login [GET]
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1243, in _execute_context
self.dialect.do_execute(
File "/usr/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
cursor.execute(statement, parameters)
sqlite3.OperationalError: no such table: domain

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/usr/lib/python3.8/site-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/lib/python3.8/site-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/python3.8/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/usr/lib/python3.8/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/lib/python3.8/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/app/mailu/ui/views/base.py", line 26, in login
return flask.render_template('login.html', form=form)
File "/usr/lib/python3.8/site-packages/flask/templating.py", line 133, in render_template
ctx.app.update_template_context(context)
File "/usr/lib/python3.8/site-packages/flask/app.py", line 792, in update_template_context
context.update(func())
File "/app/mailu/__init__.py", line 37, in inject_defaults
signup_domains = models.Domain.query.filter_by(signup_enabled=True).all()
File "/usr/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3161, in all
return list(self)
File "/usr/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3317, in __iter__
return self._execute_and_instances(context)
File "/usr/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3342, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/usr/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 988, in execute
return meth(self, multiparams, params)
File "/usr/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1101, in _execute_clauseelement
ret = self._execute_context(
File "/usr/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1247, in _execute_context
self._handle_dbapi_exception(
File "/usr/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1466, in _handle_dbapi_exception
util.raise_from_cause(sqlalchemy_exception, exc_info)
File "/usr/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 383, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/usr/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 128, in reraise
raise value.with_traceback(tb)
File "/usr/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1243, in _execute_context
self.dialect.do_execute(
File "/usr/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: domain
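
The final `sqlite3.OperationalError` says that the SQLite database the admin container opened has no `domain` table, which is consistent with an empty or never-initialised database file. A minimal sketch that reproduces the same message against an in-memory database and shows one way to check for the table. `table_exists` is a hypothetical helper, the table columns are reduced for illustration, and the location of the real database file is not stated in this thread:

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if a table called `name` exists in the connected database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


# An in-memory database starts empty, like a file that was never initialised.
conn = sqlite3.connect(":memory:")
before = table_exists(conn, "domain")

try:
    conn.execute("SELECT name FROM domain WHERE signup_enabled = 1")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Once the table exists (illustrative columns only), the same query succeeds.
conn.execute("CREATE TABLE domain (name TEXT, signup_enabled INTEGER)")
after = table_exists(conn, "domain")
rows = conn.execute("SELECT name FROM domain WHERE signup_enabled = 1").fetchall()
```

Running the same `table_exists` check against a copy of the real database file would show whether the schema was ever created.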

Since this is the default SQLite implementation, we then look at the deployment and see that three services aren't working:

[dasm@tiserv-ci-00 part2]$ kubectl get pods -n secops-system
NAME                                                  READY   STATUS    RESTARTS   AGE
secops-mail-server-mailu-admin-868487c86d-wvtfz       0/1     Running   1          5m59s
secops-mail-server-mailu-admin-d9947d758-mbsbf        0/1     Running   0          3m39s
secops-mail-server-mailu-clamav-7975784689-gzkqs      1/1     Running   0          5m59s
secops-mail-server-mailu-dovecot-549f8fbb4-zr6fd      1/1     Running   0          5m59s
secops-mail-server-mailu-front-7c5dd96647-5r4pw       1/1     Running   0          5m59s
secops-mail-server-mailu-postfix-58f4575d8c-fxtpl     0/1     Running   1          5m59s
secops-mail-server-mailu-redis-5b84b5d987-sbsfd       1/1     Running   0          5m59s
secops-mail-server-mailu-roundcube-5d4685b7f8-7f72v   0/1     Running   3          5m58s
secops-mail-server-mailu-rspamd-6984749897-rt7w2      1/1     Running   0          5m58s
[dasm@tiserv-ci-00 part2]$

ADMIN, POSTFIX and ROUNDCUBE: the common element there is the SQLite implementation using the shared PVC.

Looking at ADMIN gives nothing more than the above.
ROUNDCUBE doesn't emit any logs by default.

POSTFIX gives us a clue, however: the mounted PVC doesn't have the correct permissions for writing, or at least is not owned by the user Postfix expects.

[dasm@tiserv-ci-00 part2]$ kubectl logs -f secops-mail-server-mailu-postfix-58f4575d8c-fxtpl -n secops-system
May 14 22:37:21 mail postfix[51]: Postfix is running with backwards-compatible default settings
May 14 22:37:21 mail postfix[51]: See http://www.postfix.org/COMPATIBILITY_README.html for details
May 14 22:37:21 mail postfix[51]: To disable backwards compatibility use "postconf compatibility_level=2" and "postfix reload"
May 14 22:37:22 mail postfix[266]: Postfix is running with backwards-compatible default settings
May 14 22:37:22 mail postfix[266]: See http://www.postfix.org/COMPATIBILITY_README.html for details
May 14 22:37:22 mail postfix[266]: To disable backwards compatibility use "postconf compatibility_level=2" and "postfix reload"
May 14 22:37:22 mail postfix/postfix-script[316]: warning: group or other writable: /queue/.
May 14 22:37:22 mail postfix/postfix-script[317]: warning: group or other writable: /queue/pid
May 14 22:37:22 mail postfix/postfix-script[330]: warning: not owned by postfix: /queue/active
May 14 22:37:22 mail postfix/postfix-script[331]: warning: not owned by postfix: /queue/bounce
May 14 22:37:22 mail postfix/postfix-script[332]: warning: not owned by postfix: /queue/corrupt
May 14 22:37:22 mail postfix/postfix-script[333]: warning: not owned by postfix: /queue/defer
May 14 22:37:22 mail postfix/postfix-script[334]: warning: not owned by postfix: /queue/deferred
May 14 22:37:22 mail postfix/postfix-script[335]: warning: not owned by postfix: /queue/flush
May 14 22:37:22 mail postfix/postfix-script[336]: warning: not owned by postfix: /queue/hold
May 14 22:37:22 mail postfix/postfix-script[337]: warning: not owned by postfix: /queue/incoming
May 14 22:37:22 mail postfix/postfix-script[338]: warning: not owned by postfix: /queue/private
May 14 22:37:22 mail postfix/postfix-script[339]: warning: not owned by postfix: /queue/public
May 14 22:37:22 mail postfix/postfix-script[340]: warning: not owned by postfix: /queue/saved
May 14 22:37:22 mail postfix/postfix-script[341]: warning: not owned by postfix: /queue/trace
May 14 22:37:22 mail postfix/postfix-script[343]: warning: not owned by postfix: /queue/maildrop
May 14 22:37:23 mail postfix/postfix-script[345]: warning: not owned by group postdrop: /queue/public
May 14 22:37:23 mail postfix/postfix-script[346]: warning: not owned by group postdrop: /queue/maildrop
May 14 22:37:23 mail postfix/postfix-script[349]: starting the Postfix mail system
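
To see at a glance which queue paths are affected, the warnings in a log like the one above can be grouped by message. A minimal sketch, assuming the log has been saved as lines of text. `group_warnings` and the regular expression are hypothetical helpers written for this log format, not part of Postfix:

```python
import re
from collections import defaultdict

# Matches e.g. "... warning: not owned by postfix: /queue/active"
WARNING_RE = re.compile(r"warning: (?P<problem>[^:]+): (?P<path>/\S*)")


def group_warnings(lines):
    """Map each warning text to the list of paths it was reported for."""
    grouped = defaultdict(list)
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            grouped[match.group("problem")].append(match.group("path"))
    return dict(grouped)


# A few lines copied from the log above.
sample = [
    "May 14 22:37:22 mail postfix/postfix-script[316]: warning: group or other writable: /queue/.",
    "May 14 22:37:22 mail postfix/postfix-script[330]: warning: not owned by postfix: /queue/active",
    "May 14 22:37:23 mail postfix/postfix-script[345]: warning: not owned by group postdrop: /queue/public",
    "May 14 22:37:23 mail postfix/postfix-script[349]: starting the Postfix mail system",
]
summary = group_warnings(sample)
```

Lines without a `warning:` marker, such as the final "starting the Postfix mail system" line, are ignored.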

So I think the chain is this: ADMIN is not answering the liveness probe (thus you don't see it running) because it fails to start, since it can't get the SQLite DB. POSTFIX starts, but then doesn't answer its liveness probe either, and neither does Roundcube. So I suspect the SQLite implementation (which ties these together) is affected.

Looking into Roundcube now, I see the same 10.42.0.0/16 SUBNET defined in there, so I will try to fix that. But I suspect the liveness probe failure still comes back to something on the disk when SQLite comes up, since none of the three components that use it are working now. The network error shown by "kubectl describe pod" is not a networking issue: until the components have their DB, they can't return HTTP 200.

[dasm@tiserv-ci-00 part2]$ kubectl describe pod secops-mail-server-mailu-roundcube-5d4685b7f8-7f72v -n secops-system
Name: secops-mail-server-mailu-roundcube-5d4685b7f8-7f72v
Namespace: secops-system
Priority: 0
Node: aks-default2-25272590-vmss000001/10.196.225.51
Start Time: Fri, 14 May 2021 22:31:45 +0000
Labels: app=secops-mail-server-mailu
component=roundcube
pod-template-hash=5d4685b7f8
Annotations:
Status: Running
IP: 10.196.225.105
IPs:
IP: 10.196.225.105
Controlled By: ReplicaSet/secops-mail-server-mailu-roundcube-5d4685b7f8
Containers:
roundcube:
Container ID: docker://82fb6a599f6d905e6877df94a3b5157655bed3e86f52e062c55fa33fcb9f4bb0
Image: mailu/roundcube:1.8
Image ID: docker-pullable://mailu/roundcube@sha256:2f8ef3466fae7445c60ba35f72cf8535fd768490a7ede819f69f18d27ec08010
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 14 May 2021 22:38:22 +0000
Finished: Fri, 14 May 2021 22:40:23 +0000
Ready: False
Restart Count: 4
Limits:
cpu: 200m
memory: 200Mi
Requests:
cpu: 100m
memory: 100Mi
Liveness: http-get http://:http/ delay=0s timeout=5s period=5s #success=1 #failure=30
Readiness: http-get http://:http/ delay=0s timeout=5s period=10s #success=1 #failure=1
Environment:
MESSAGE_SIZE_LIMIT: 20971520
IMAP_ADDRESS: secops-mail-server-mailu-dovecot
FRONT_ADDRESS: secops-mail-server-mailu-front
SECRET_KEY: S0m3th!ngGr38t
SUBNET: 10.42.0.0/16
ROUNDCUBE_DB_FLAVOR: sqlite
Mounts:
/data from data (rw,path="roundcube")
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9lsgv (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: secops-mail-server-mailu-storage
ReadOnly: false
default-token-9lsgv:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9lsgv
Optional: false
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type     Reason     Age                 From               Message
Normal   Scheduled  9m15s               default-scheduler  Successfully assigned secops-system/secops-mail-server-mailu-roundcube-5d4685b7f8-7f72v to aks-default2-25272590-vmss000001
Normal   Pulling    9m9s                kubelet            Pulling image "mailu/roundcube:1.8"
Normal   Pulled     9m9s                kubelet            Successfully pulled image "mailu/roundcube:1.8"
Normal   Created    9m9s                kubelet            Created container roundcube
Normal   Started    9m8s                kubelet            Started container roundcube
Warning  Unhealthy  8m (x14 over 9m2s)  kubelet            Liveness probe failed: Get http://10.196.225.105:80/: dial tcp 10.196.225.105:80: connect: connection refused
Warning  Unhealthy  8m (x7 over 9m)     kubelet            Readiness probe failed: Get http://10.196.225.105:80/: dial tcp 10.196.225.105:80: connect: connection refused
Warning  BackOff    4m8s                kubelet            Back-off restarting failed container
[dasm@tiserv-ci-00 part2]$
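
The Liveness and Readiness lines in the output above bound how long the container has to start answering on port 80. A rough sketch of that arithmetic, parsing the probe description as printed by `kubectl describe`. `failure_window_seconds` is a hypothetical helper, and the formula (delay plus period times failure threshold) is only an approximation of the kubelet's actual timing:

```python
import re

PROBE_RE = re.compile(
    r"delay=(?P<delay>\d+)s timeout=(?P<timeout>\d+)s period=(?P<period>\d+)s "
    r"#success=(?P<success>\d+) #failure=(?P<failure>\d+)"
)


def failure_window_seconds(probe_line: str) -> int:
    """Approximate seconds of continuous failure before the threshold is hit."""
    match = PROBE_RE.search(probe_line)
    if match is None:
        raise ValueError(f"unrecognised probe description: {probe_line!r}")
    values = {key: int(value) for key, value in match.groupdict().items()}
    # Approximation: initial delay plus one period per tolerated failure.
    return values["delay"] + values["period"] * values["failure"]


# Probe descriptions copied from the describe output above.
liveness = "http-get http://:http/ delay=0s timeout=5s period=5s #success=1 #failure=30"
readiness = "http-get http://:http/ delay=0s timeout=5s period=10s #success=1 #failure=1"

liveness_window = failure_window_seconds(liveness)
readiness_window = failure_window_seconds(readiness)
```

With these settings the liveness probe tolerates roughly 150 seconds of failures before a restart, while a single failed readiness check is enough to mark the pod not ready.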
