Egeria-IGC-connectivity issues - Part II - Connectivity to IGC can't be restored after failed asset mapping (closed)

odpi commented on August 24, 2024

Egeria-IGC-connectivity issues - Part II - Connectivity to IGC can't be restored after failed asset mapping

from egeria-connector-ibm-information-server.

Comments (13)

jan-frommann commented on August 24, 2024

@cmgrote absolutely, just give me a couple of hours...

cmgrote commented on August 24, 2024

@planetf1 ^^ fyi -- I added those lines into our images that build from Git to ensure the cache is invalidated when there's a code change (included as part of odpi/egeria#1127)

cmgrote commented on August 24, 2024

Thanks again for submitting -- ~~I've reproduced this one on my side~~, investigating a fix.

Edit: actually while I can reproduce the errors above, communication still works fine for types that are mapped (it doesn't become blocked from that point on). However, it probably makes sense to anyway consider a default mapping to Referenceable for all IGC types that are not otherwise mapped rather than having them error-out.

jan-frommann commented on August 24, 2024

Hi Christopher,
I just realized I could have added a bit more information:

2019-06-11 11:37:06.253 ERROR 1 --- [       Thread-4] o.o.a.r.i.e.IGCOMRSRepositoryEventMapper : EntityDetail could not be retrieved for RID: b1c497ce.54bd3a08.9g868p8ap.1n94o64.biqmmt.d2vu7883hkqtnc5gknm6a
2019-06-11 11:37:06.253  WARN 1 --- [       Thread-4] o.o.o.a.r.i.r.stores.EntityMappingStore  : Unable to find mapping for IGC type: view
2019-06-11 11:37:06.254 ERROR 1 --- [       Thread-4] o.o.a.r.i.e.IGCOMRSRepositoryEventMapper : Failed trying to consume IGC events from Kafka.
java.lang.NullPointerException: null
    at org.odpi.openmetadata.adapters.repositoryservices.igc.repositoryconnector.IGCOMRSMetadataCollection.getMappers(IGCOMRSMetadataCollection.java:3748) ~[igc-repository-connector-1.1-SNAPSHOT.jar!/:na]
    at org.odpi.openmetadata.adapters.repositoryservices.igc.repositoryconnector.IGCOMRSMetadataCollection.getIgcPropertiesToRelationshipMappings(IGCOMRSMetadataCollection.java:3381) ~[igc-repository-connector-1.1-SNAPSHOT.jar!/:na]
    at org.odpi.openmetadata.adapters.repositoryservices.igc.eventmapper.IGCOMRSRepositoryEventMapper.processAsset(IGCOMRSRepositoryEventMapper.java:597) ~[igc-repository-connector-1.1-SNAPSHOT.jar!/:na]
    at org.odpi.openmetadata.adapters.repositoryservices.igc.eventmapper.IGCOMRSRepositoryEventMapper.processAssetEventV115(IGCOMRSRepositoryEventMapper.java:354) ~[igc-repository-connector-1.1-SNAPSHOT.jar!/:na]
    at org.odpi.openmetadata.adapters.repositoryservices.igc.eventmapper.IGCOMRSRepositoryEventMapper.processEventV115(IGCOMRSRepositoryEventMapper.java:278) ~[igc-repository-connector-1.1-SNAPSHOT.jar!/:na]
    at org.odpi.openmetadata.adapters.repositoryservices.igc.eventmapper.IGCOMRSRepositoryEventMapper.processEvent(IGCOMRSRepositoryEventMapper.java:238) ~[igc-repository-connector-1.1-SNAPSHOT.jar!/:na]
    at org.odpi.openmetadata.adapters.repositoryservices.igc.eventmapper.IGCOMRSRepositoryEventMapper$IGCKafkaConsumerThread.run(IGCOMRSRepositoryEventMapper.java:204) ~[igc-repository-connector-1.1-SNAPSHOT.jar!/:na]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_201]
2019-06-11 11:37:06.254  WARN 1 --- [       Thread-4] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-1, groupId=IGCOMRSRepositoryEventMapper_consumer] Synchronous auto-commit of offsets {InfosphereEvents-0=OffsetAndMetadata{offset=4357, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-11 11:37:06.254  INFO 1 --- [       Thread-4] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-1, groupId=IGCOMRSRepositoryEventMapper_consumer] Revoking previously assigned partitions [InfosphereEvents-0]
2019-06-11 11:37:06.254  INFO 1 --- [       Thread-4] o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-1, groupId=IGCOMRSRepositoryEventMapper_consumer] (Re-)joining group
2019-06-11 11:37:06.486  INFO 1 --- [       Thread-4] o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-1, groupId=IGCOMRSRepositoryEventMapper_consumer] Successfully joined group with generation 1
2019-06-11 11:37:06.487  INFO 1 --- [       Thread-4] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-1, groupId=IGCOMRSRepositoryEventMapper_consumer] Setting newly assigned partitions [InfosphereEvents-0]
2019-06-11 11:38:41.272  INFO 1 --- [       Thread-4] o.o.a.r.i.e.IGCOMRSRepositoryEventMapper : Skipping asset - no changes detected: 6662c0f2.e1b1ec6c.9g86m9t3m.t2imv1k.qiqbqo.6tfuum9v5ncp288rbemgc
2019-06-11 11:38:42.361  INFO 1 --- [       Thread-4] o.o.a.r.i.e.IGCOMRSRepositoryEventMapper : Skipping asset - no changes detected: 6662c0f2.e1b1ec6c.9g86m9t3m.t2imv1k.qiqbqo.6tfuum9v5ncp288rbemgc
2019-06-11 11:42:06.876  WARN 1 --- [Mapper_consumer] o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-1, groupId=IGCOMRSRepositoryEventMapper_consumer] This member will leave the group because consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-11 11:42:06.876  INFO 1 --- [Mapper_consumer] o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-1, groupId=IGCOMRSRepositoryEventMapper_consumer] Sending LeaveGroup request to coordinator xxxxx.xxxx.xx.xx:59092 (id: 2147483646 rack: null)
2019-06-11 11:42:52.275  INFO 1 --- [       Thread-4] o.o.a.r.i.e.IGCOMRSRepositoryEventMapper : Skipping asset - no changes detected: b1c497ce.60641b50.9g86all5v.kpa48s0.65v6qb.jtgk2uvn4kr74k56g8cdk
2019-06-11 11:43:57.901  INFO 1 --- [       Thread-4] o.o.a.r.i.e.IGCOMRSRepositoryEventMapper : Skipping asset - no changes detected: b1c497ce.60641b50.9g86all5v.kpa48s0.65v6qb.jtgk2uvn4kr74k56g8cdk

cmgrote commented on August 24, 2024

@jan-frommann thanks for the additional details. Per my comments on #21 it looks like you're using an older version of the connector... Any chance you can re-test with the latest?

cmgrote commented on August 24, 2024

Hi @jan-frommann -- just a polite follow-up to see if by any chance you've been able to re-test with the latest connector, and had better luck or are still running into the same (or other) problems?

jan-frommann commented on August 24, 2024

Hi @cmgrote, sorry it took me so long to respond! I was able to make the majority run, but now job number 3 is having connectivity issues and ultimately fails. I rolled back to the old version and was happy to see that it still works.
I used most parameters from my old values.yaml, but in some cases (credentials, for instance) I'm not entirely sure if my settings are correct. For some reason, these errors look familiar...
Also, I spotted in the logs that the host name can't be resolved, and I haven't figured out why yet - the hosts files look good to me. I tested the connectivity using ping and nc -vz for all ports used, but found nothing out of the ordinary.
As far as I can tell, the job fails during instantiation.

cmgrote commented on August 24, 2024

> I used most parameters from my old values.yaml, but in some cases (credentials, for instance) I'm not entirely sure if my settings are correct. For some reason, these errors look familiar...

There have been some changes to the values.yaml over time, so it would be worth doing a diff between yours and the latest in master to see if there are missing settings that you need to configure, or if some of the settings have been slightly re-organised for consistency.

The overall IGC area should look like this in the latest values.yaml:

ibmigc:
  enabled: true
  user: isadmin
  password: isadmin
  proxyserver: ibmigc
  internal:
    enabled: false
  external:
    enabled: true
    hostname: "your.hostname.com"
    ip: "192.168.0.100"
    ports:
      https: "9446"
      broker: "59092"

(That's assuming a cluster-external environment running IGC: naturally, change the hostname, IP, credentials, ports, etc. as needed.)

> Also, I spotted in the logs that the host name can't be resolved, and I haven't figured out why yet - the hosts files look good to me. I tested the connectivity using ping and nc -vz for all ports used, but found nothing out of the ordinary.

Hosts can be tricky, with k8s managing its own internal network and DNS... Are you using a pre-existing IGC host (outside your k8s cluster) or a container that you're running within the cluster? Make sure you're using the appropriate settings for internal vs external above -- if you're using a container, the hostname of the IGC container must basically be hard-coded to infosvr to be resolvable from everywhere.

planetf1 commented on August 24, 2024

I would recommend adding your overrides to a new file, i.e. ~/etc/cloud.yaml.
You would set ONLY the values you want to modify from the defaults in values.yaml.
This avoids changing the file itself. It won't help if we change the meaning of a parameter, but at least it will help if new values are added.
Helm would then be run with:
helm install vdc -f ~/etc/cloud.yaml

This doesn't address the failure, but is IMO a better practice for using helm charts
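
As a minimal sketch of what such an override file might contain, assuming an external IGC and the key layout shown in the values.yaml excerpt earlier in this thread (the hostname, IP and ports are the placeholder values from that excerpt, and the assumption that these keys differ from the chart defaults is untested):

```yaml
# ~/etc/cloud.yaml: only the values that should differ from the chart defaults.
# Key names are copied from the values.yaml excerpt quoted earlier in this
# thread; check them against the values.yaml of the chart version you deploy.
ibmigc:
  internal:
    enabled: false
  external:
    enabled: true
    hostname: "your.hostname.com"   # placeholder, replace with your IGC host
    ip: "192.168.0.100"             # placeholder, replace with your IGC IP
    ports:
      https: "9446"
      broker: "59092"
```

Helm merges a file passed with -f over the chart's own values.yaml, so any key not listed here keeps its default.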

planetf1 commented on August 24, 2024

@jan-frommann If you're using an external IGC, we basically end up creating a host/IP mapping in the containers. But we might be getting deep into the specifics of your network setup. Feel free to ping me and/or @cmgrote on Slack, as I suspect we'll be getting into network addresses/ports that might not be best shared on GitHub. Also, if you're using multiple IGCs, that may require other changes.
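
For illustration, one standard Kubernetes mechanism for a host/IP mapping of this kind is the pod-level `hostAliases` field, which adds entries to the container's /etc/hosts. Whether the chart uses exactly this mechanism is an assumption, and the pod name, image and addresses below are hypothetical placeholders. A throwaway pod like this can help check what a container would actually resolve:

```yaml
# Illustrative pod spec only; not taken from the Egeria chart.
apiVersion: v1
kind: Pod
metadata:
  name: hostalias-check            # hypothetical name
spec:
  restartPolicy: Never
  hostAliases:
    - ip: "192.168.0.100"          # placeholder IP of the external IGC host
      hostnames:
        - "your.hostname.com"      # placeholder hostname of the IGC host
  containers:
    - name: check
      image: busybox:1.36
      # Print the hosts file so the injected mapping can be inspected in the pod logs.
      command: ["cat", "/etc/hosts"]
```

If the mapping shows up in the pod's /etc/hosts but the connector still cannot resolve the name, the problem is more likely in how the hostname is passed to the connector than in the cluster network.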

jan-frommann commented on August 24, 2024

Hi @cmgrote, I'm using an external IGC. In the last version I used, the connection worked mostly fine (the initial connection and the first few updates). The credentials used to be encrypted in the values.yaml - does it have to be clear text now?
Also, I left 'ibmigc' as proxyserver...is that right?

Hi @planetf1, I promise you, I will create such a file! I haven't ignored the advice, I swear - I just haven't gotten around to it yet.
Sounds like a great idea Nigel. I'll reach out to you!

cmgrote commented on August 24, 2024

Initial offline analysis currently points to at least one problem (maybe not the root problem, still TBD): container images are built with cached intermediate layers by default, and therefore do not pick up changes to the master codebase when the image is rebuilt... Looks like there are some suggested workarounds for this, which we will most likely want to adopt!

eg. https://stackoverflow.com/questions/36996046/how-to-prevent-dockerfile-caching-git-clone

planetf1 commented on August 24, 2024

We're building a bit of a backlog on old atlas. I'm getting quite keen to just get things moving again now especially with the super work on IGC.

So... I'm going to start by merging some of the PRs we have outstanding .. and will test on IBM cloud & then fix any discrepancies & then remove old atlas/update build team.
