Egeria-IGC-connectivity issues - Part II - Connectivity to IGC can't be restored after failed asset mapping about egeria-connector-ibm-information-server HOT 13 CLOSED

odpi commented on August 24, 2024
Egeria-IGC-connectivity issues - Part II - Connectivity to IGC can't be restored after failed asset mapping

from egeria-connector-ibm-information-server.

Comments (13)

jan-frommann commented on August 24, 2024

@cmgrote absolutely, just give me a couple of hours...

cmgrote commented on August 24, 2024

@planetf1 ^^ fyi -- I added those lines into our images that build from Git to ensure the cache is invalidated when there's a code change (included as part of odpi/egeria#1127)

cmgrote commented on August 24, 2024

Thanks again for submitting -- I've reproduced this one on my side, investigating a fix.

Edit: actually, while I can reproduce the errors above, communication still works fine for types that are mapped (it doesn't become blocked from that point on). However, it probably makes sense anyway to consider a default mapping to Referenceable for all IGC types that are not otherwise mapped, rather than having them error out.
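
A minimal sketch of what such a fallback could look like, assuming a simple lookup table from IGC type name to open metadata type name. The class and method names (`DefaultMappingSketch`, `resolveOmrsType`) and the two example mappings are hypothetical and are not the connector's actual API; only the unmapped `view` type and the `Referenceable` default come from this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the connector's real mapping store.
class DefaultMappingSketch {

    // Fallback type suggested in the comment above.
    static final String DEFAULT_OMRS_TYPE = "Referenceable";

    // Assumed example mappings; the values here are illustrative.
    private static final Map<String, String> MAPPINGS = new HashMap<>();
    static {
        MAPPINGS.put("term", "GlossaryTerm");
        MAPPINGS.put("category", "GlossaryCategory");
    }

    // Returns the mapped type, or the default instead of null for unmapped types.
    static String resolveOmrsType(String igcType) {
        if (igcType == null) {
            return DEFAULT_OMRS_TYPE;
        }
        return MAPPINGS.getOrDefault(igcType, DEFAULT_OMRS_TYPE);
    }

    public static void main(String[] args) {
        System.out.println(resolveOmrsType("term"));
        System.out.println(resolveOmrsType("view"));
    }
}
```

The point of the design is that callers never receive null for an unmapped type, so a lookup for something like `view` degrades to a generic entity rather than failing.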

jan-frommann commented on August 24, 2024

Hi Christopher,
I just realized I could have added a bit more information:

2019-06-11 11:37:06.253 ERROR 1 --- [       Thread-4] o.o.a.r.i.e.IGCOMRSRepositoryEventMapper : EntityDetail could not be retrieved for RID: b1c497ce.54bd3a08.9g868p8ap.1n94o64.biqmmt.d2vu7883hkqtnc5gknm6a
2019-06-11 11:37:06.253  WARN 1 --- [       Thread-4] o.o.o.a.r.i.r.stores.EntityMappingStore  : Unable to find mapping for IGC type: view
2019-06-11 11:37:06.254 ERROR 1 --- [       Thread-4] o.o.a.r.i.e.IGCOMRSRepositoryEventMapper : Failed trying to consume IGC events from Kafka.
java.lang.NullPointerException: null
    at org.odpi.openmetadata.adapters.repositoryservices.igc.repositoryconnector.IGCOMRSMetadataCollection.getMappers(IGCOMRSMetadataCollection.java:3748) ~[igc-repository-connector-1.1-SNAPSHOT.jar!/:na]
    at org.odpi.openmetadata.adapters.repositoryservices.igc.repositoryconnector.IGCOMRSMetadataCollection.getIgcPropertiesToRelationshipMappings(IGCOMRSMetadataCollection.java:3381) ~[igc-repository-connector-1.1-SNAPSHOT.jar!/:na]
    at org.odpi.openmetadata.adapters.repositoryservices.igc.eventmapper.IGCOMRSRepositoryEventMapper.processAsset(IGCOMRSRepositoryEventMapper.java:597) ~[igc-repository-connector-1.1-SNAPSHOT.jar!/:na]
    at org.odpi.openmetadata.adapters.repositoryservices.igc.eventmapper.IGCOMRSRepositoryEventMapper.processAssetEventV115(IGCOMRSRepositoryEventMapper.java:354) ~[igc-repository-connector-1.1-SNAPSHOT.jar!/:na]
    at org.odpi.openmetadata.adapters.repositoryservices.igc.eventmapper.IGCOMRSRepositoryEventMapper.processEventV115(IGCOMRSRepositoryEventMapper.java:278) ~[igc-repository-connector-1.1-SNAPSHOT.jar!/:na]
    at org.odpi.openmetadata.adapters.repositoryservices.igc.eventmapper.IGCOMRSRepositoryEventMapper.processEvent(IGCOMRSRepositoryEventMapper.java:238) ~[igc-repository-connector-1.1-SNAPSHOT.jar!/:na]
    at org.odpi.openmetadata.adapters.repositoryservices.igc.eventmapper.IGCOMRSRepositoryEventMapper$IGCKafkaConsumerThread.run(IGCOMRSRepositoryEventMapper.java:204) ~[igc-repository-connector-1.1-SNAPSHOT.jar!/:na]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_201]
2019-06-11 11:37:06.254  WARN 1 --- [       Thread-4] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-1, groupId=IGCOMRSRepositoryEventMapper_consumer] Synchronous auto-commit of offsets {InfosphereEvents-0=OffsetAndMetadata{offset=4357, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-11 11:37:06.254  INFO 1 --- [       Thread-4] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-1, groupId=IGCOMRSRepositoryEventMapper_consumer] Revoking previously assigned partitions [InfosphereEvents-0]
2019-06-11 11:37:06.254  INFO 1 --- [       Thread-4] o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-1, groupId=IGCOMRSRepositoryEventMapper_consumer] (Re-)joining group
2019-06-11 11:37:06.486  INFO 1 --- [       Thread-4] o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-1, groupId=IGCOMRSRepositoryEventMapper_consumer] Successfully joined group with generation 1
2019-06-11 11:37:06.487  INFO 1 --- [       Thread-4] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-1, groupId=IGCOMRSRepositoryEventMapper_consumer] Setting newly assigned partitions [InfosphereEvents-0]
2019-06-11 11:38:41.272  INFO 1 --- [       Thread-4] o.o.a.r.i.e.IGCOMRSRepositoryEventMapper : Skipping asset - no changes detected: 6662c0f2.e1b1ec6c.9g86m9t3m.t2imv1k.qiqbqo.6tfuum9v5ncp288rbemgc
2019-06-11 11:38:42.361  INFO 1 --- [       Thread-4] o.o.a.r.i.e.IGCOMRSRepositoryEventMapper : Skipping asset - no changes detected: 6662c0f2.e1b1ec6c.9g86m9t3m.t2imv1k.qiqbqo.6tfuum9v5ncp288rbemgc
2019-06-11 11:42:06.876  WARN 1 --- [Mapper_consumer] o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-1, groupId=IGCOMRSRepositoryEventMapper_consumer] This member will leave the group because consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2019-06-11 11:42:06.876  INFO 1 --- [Mapper_consumer] o.a.k.c.c.internals.AbstractCoordinator  : [Consumer clientId=consumer-1, groupId=IGCOMRSRepositoryEventMapper_consumer] Sending LeaveGroup request to coordinator xxxxx.xxxx.xx.xx:59092 (id: 2147483646 rack: null)
2019-06-11 11:42:52.275  INFO 1 --- [       Thread-4] o.o.a.r.i.e.IGCOMRSRepositoryEventMapper : Skipping asset - no changes detected: b1c497ce.60641b50.9g86all5v.kpa48s0.65v6qb.jtgk2uvn4kr74k56g8cdk
2019-06-11 11:43:57.901  INFO 1 --- [       Thread-4] o.o.a.r.i.e.IGCOMRSRepositoryEventMapper : Skipping asset - no changes detected: b1c497ce.60641b50.9g86all5v.kpa48s0.65v6qb.jtgk2uvn4kr74k56g8cdk
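
The Kafka warnings in this log name two consumer settings, `max.poll.interval.ms` and `max.poll.records`. Below is a minimal sketch of how such overrides could be assembled, assuming plain `java.util.Properties` that would be handed to a Kafka consumer. The class name and the values used (10 minutes, 50 records) are illustrative assumptions, not recommended settings and not how the connector configures itself. Tuning these would at most relieve the poll-timeout symptom; the NullPointerException itself would still need a fix.

```java
import java.util.Properties;

// Hypothetical helper; the connector's real consumer configuration may differ.
class PollTuningSketch {

    // Builds consumer overrides using the setting names quoted in the Kafka warnings.
    static Properties consumerOverrides(long maxPollIntervalMs, int maxPollRecords) {
        Properties props = new Properties();
        // Group id as it appears in the log output.
        props.setProperty("group.id", "IGCOMRSRepositoryEventMapper_consumer");
        // Allow more time between successive poll() calls.
        props.setProperty("max.poll.interval.ms", Long.toString(maxPollIntervalMs));
        // Return fewer records per poll() so each batch is processed faster.
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        consumerOverrides(600_000L, 50)
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```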

cmgrote commented on August 24, 2024

@jan-frommann thanks for the additional details. Per my comments on #21 it looks like you're using an older version of the connector... Any chance you can re-test with the latest?

cmgrote commented on August 24, 2024

Hi @jan-frommann -- just a polite follow-up to see if by any chance you've been able to re-test with the latest connector, and had better luck or are still running into the same (or other) problems?

jan-frommann commented on August 24, 2024

Hi @cmgrote, sorry it took me so long to respond! I was able to get the majority running, but now job number 3 is having connectivity issues and ultimately fails. I rolled back to the old version and was happy to see that it still works.
I used most parameters from my old values.yaml, but in some cases (credentials, for instance) I'm not entirely sure whether my settings are correct. For some reason, these errors look familiar....
Also, I spotted in the logs that the host name can't be resolved, and I haven't figured out why yet - the hosts files look good to me. I tested the connectivity using ping and nc -vz for all ports used, but found nothing out of the ordinary.
As far as I can tell, the job fails during instantiation.

cmgrote commented on August 24, 2024

> I used most parameters from my old values.yaml, but in some cases (credentials, for instance) I'm not entirely sure whether my settings are correct. For some reason, these errors look familiar....

There have been some changes to the values.yaml over time, so it would be worth doing a diff between yours and the latest in master to see if there are missing settings that you need to configure, or if some of the settings have been slightly re-organised for consistency.

The overall IGC area should look like this in the latest values.yaml:

ibmigc:
  enabled: true
  user: isadmin
  password: isadmin
  proxyserver: ibmigc
  internal:
    enabled: false
  external:
    enabled: true
    hostname: "your.hostname.com"
    ip: "192.168.0.100"
    ports:
      https: "9446"
      broker: "59092"

(That's assuming a cluster-external environment running IGC: naturally change the details around hostname, IP, credentials, ports, etc as needed.)

> Also, I spotted in the logs that the host name can't be resolved, and I haven't figured out why yet - the hosts files look good to me. I tested the connectivity using ping and nc -vz for all ports used, but found nothing out of the ordinary.

Hosts can be tricky, with k8s managing its own internal network and DNS... Are you using a pre-existing IGC host (outside your k8s cluster) or a container that you're running within the cluster? Make sure you're using the appropriate settings for internal vs external above -- if you're using a container, the hostname of the IGC container must basically be hard-coded to infosvr to be resolvable from everywhere.
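
As a rough aid for checking name resolution from inside a container, here is a minimal sketch that asks the JVM's resolver for a hostname and prints what comes back. The class name `HostResolveSketch` is hypothetical, and `infosvr` is only used as the default because it is the hostname mentioned above; pass your own hostname as the first argument. This only tests name resolution as the JVM sees it, not whether the ports are reachable.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of the connector or the Helm charts.
class HostResolveSketch {

    // Returns every address the hostname resolves to, or an empty list if it cannot be resolved.
    static List<String> resolve(String hostname) {
        List<String> addresses = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(hostname)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Unresolvable name: report it as an empty result.
        }
        return addresses;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "infosvr";
        List<String> addresses = resolve(host);
        System.out.println(addresses.isEmpty()
                ? host + " could not be resolved"
                : host + " -> " + addresses);
    }
}
```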

planetf1 commented on August 24, 2024

I would recommend adding your overrides to a new file, e.g. ~/etc/cloud.yaml.
You would set ONLY the values you want to change from the defaults in values.yaml.
This avoids changing the file itself. It won't help if we change the meaning of a parameter, but at least it will help if new values are added.
Helm would then be run with:
helm install vdc -f ~/etc/cloud.yaml

This doesn't address the failure, but is IMO a better practice for using helm charts

planetf1 commented on August 24, 2024

@jan-frommann If you're using an external IGC, we basically end up creating a host/IP mapping in the containers. But we might be getting deep into the specifics of your network setup. Feel free to ping me and/or @cmgrote on Slack, as I suspect we'll be getting into network addresses/ports that might not be best shared on GitHub. Also, if you're using multiple IGCs, that may require other changes.

jan-frommann commented on August 24, 2024

Hi @cmgrote, I'm using an external IGC. In the last version I used, the connection worked mostly fine (the initial connection and the first few updates). The credentials used to be encrypted in the values.yaml - does it have to be clear text now?
Also, I left 'ibmigc' as proxyserver...is that right?

Hi @planetf1, I promise you, I will create such a file! I haven't ignored the advice, I swear - I just haven't gotten around to it yet.
Sounds like a great idea Nigel. I'll reach out to you!

cmgrote commented on August 24, 2024

Initial offline analysis currently points to at least one problem (maybe not the root problem, still TBD): container images being built cache their intermediate layers by default, and therefore do not pick up changes to the master codebase on a rebuild of the image... Looks like there are some suggested workarounds for this, which we will most likely want to adopt!

eg. https://stackoverflow.com/questions/36996046/how-to-prevent-dockerfile-caching-git-clone

planetf1 commented on August 24, 2024

We're building up a bit of a backlog on old Atlas. I'm getting quite keen to just get things moving again now, especially with the super work on IGC.

So... I'm going to start by merging some of the PRs we have outstanding, then test on IBM Cloud, fix any discrepancies, and then remove old Atlas / update build team.
