Comments (22)

cwperks avatar cwperks commented on May 26, 2024 2

Would Plugin.onIndexModule be useful here?

I tried the snippet that @msfroh provided, using the GuiceHolder pattern in a plugin to get the IndicesService, and tried to resolve the index name within Plugin.createComponents or ClusterPlugin.onNodeStarted. Neither worked: at that point in execution, the IndexService returned from IndexService indexService = indicesService.indexService(indexForUuid); was null.


Before the node is fully initialized, Plugin.onIndexModule is called for every index in the cluster, and there you can obtain each index's name and UUID. On node bootstrap, you may see messages like:

[2024-04-02T00:13:33,394][INFO ][o.o.p.PluginsService     ] [smoketestnode] PluginService:onIndexModule index:[.opendistro_security/tk6k2nCkRJuc5Gr4mXUxEQ]

and this line directly comes from onIndexModule here:

logger.info("PluginService:onIndexModule index:" + indexModule.getIndex());

Example of security plugin overriding onIndexModule: https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/OpenSearchSecurityPlugin.java#L664-L665
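
As a rough illustration of how a plugin might keep what onIndexModule reports, here is a minimal sketch of a UUID-to-name lookup. IndexUuidRegistry and its methods are hypothetical names, not part of OpenSearch; the only OpenSearch calls assumed are indexModule.getIndex(), Index.getUUID() and Index.getName(), and they appear only in a comment so the snippet runs on a plain JDK.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of OpenSearch: a thread-safe UUID -> index name lookup.
class IndexUuidRegistry {
    private final Map<String, String> nameByUuid = new ConcurrentHashMap<>();

    // Intended to be called from Plugin.onIndexModule, for example:
    //   Index index = indexModule.getIndex();
    //   registry.register(index.getUUID(), index.getName());
    void register(String uuid, String indexName) {
        nameByUuid.put(uuid, indexName);
    }

    // Returns the index name for a UUID, or empty if onIndexModule has not seen it yet.
    Optional<String> resolve(String uuid) {
        return Optional.ofNullable(nameByUuid.get(uuid));
    }
}
```

A lookup for a UUID that has not been registered yet returns an empty Optional, which makes the "called too early" case explicit to the caller.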

from opensearch.

pakshi-titaniam avatar pakshi-titaniam commented on May 26, 2024 2

@cwperks Thanks. This looks promising. I will run it by my engineering team. Will comment again in a few days. Much appreciated.

reta avatar reta commented on May 26, 2024 1

This mapping could be easily obtained using _cat APIs:

$ curl 'http://localhost:9200/_cat/indices?h=index,uuid'

index1 SbFwt5hhSviSj1YTFtwyEg
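
If that two-column output is fetched programmatically, it can be turned into a lookup map with a few lines of Java. This is a minimal sketch that assumes the body has exactly the "index uuid" columns requested above; CatIndicesParser is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses the plain-text body of GET _cat/indices?h=index,uuid.
class CatIndicesParser {
    // Each non-empty line is expected to be "<index> <uuid>"; other lines are skipped.
    static Map<String, String> parse(String body) {
        Map<String, String> nameByUuid = new LinkedHashMap<>();
        for (String line : body.split("\\R")) {
            String[] columns = line.trim().split("\\s+");
            if (columns.length == 2) {
                nameByUuid.put(columns[1], columns[0]);
            }
        }
        return nameByUuid;
    }
}
```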

msfroh avatar msfroh commented on May 26, 2024 1

From an internal API standpoint, if you have a reference to IndicesService, you can get it in a slightly roundabout way:

public static String resolveIndexName(IndicesService indicesService, String uuid) {
  // The name is left empty; only the UUID is known at this point.
  Index indexForUuid = new Index("", uuid);
  // indexServiceSafe throws an exception instead of returning null when the
  // node has no IndexService for this UUID, which avoids a
  // NullPointerException on the next line.
  IndexService indexService = indicesService.indexServiceSafe(indexForUuid);
  return indexService.getIndexSettings().getIndex().getName();
}

anto-tl avatar anto-tl commented on May 26, 2024 1

@cwperks The onIndexModule method is useful for collecting the UUID-to-index-name mapping. However, on cluster startup/restart, the Lucene read path is invoked for existing indices (which is where we need the index name for a UUID) even before onIndexModule is called. Is there any other way to collect the UUID and index name?

Alternatively, we could store this UUID-to-index-name mapping in persistent storage such as the file system, similar to how OpenSearch maintains cluster state, with the plugin given permission to write to a file. On cluster startup, we could then read the mapping from the file and load it into a map in the plugin before any Lucene read happens (in AbstractLifecycleComponent -> doStart).

Any pointers would be helpful.
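
The file-based idea above could look roughly like the following sketch, which stores the mapping as a java.util.Properties file. UuidMappingFile is a hypothetical name, and the file location and the permissions the plugin would need to write there are assumptions left out of the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: persists a UUID -> index name map as a properties file.
class UuidMappingFile {
    // Overwrites the file with the current mapping.
    static void save(Path file, Map<String, String> nameByUuid) {
        Properties props = new Properties();
        props.putAll(nameByUuid);
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            props.store(out, "index uuid -> index name");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the mapping back; returns an empty map if the file does not exist yet.
    static Map<String, String> load(Path file) {
        Map<String, String> nameByUuid = new HashMap<>();
        if (!Files.exists(file)) {
            return nameByUuid;
        }
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String uuid : props.stringPropertyNames()) {
            nameByUuid.put(uuid, props.getProperty(uuid));
        }
        return nameByUuid;
    }
}
```

The mapping would still have to be refreshed whenever indices are created or deleted, otherwise the file goes stale between restarts.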

cwperks avatar cwperks commented on May 26, 2024 1

@anto-tl I do see this when I look in the data directory of a node:

> cat data/nodes/0/indices/WUqRmN4DQtKFE5dtOSpb0A/_state/state-8.st
?�lstate:)
�.address-book�versionԎmapping_versionďsettings_version�aliases_version�routing_num_shards$ ��stateCopen�settings�index.creation_dateL1713919726108�index.number_of_replicas@1�index.number_of_shards@1�index.provided_nameL.address-book�index.replication.typeGDOCUMENT�index.uuidUWUqRmN4DQtKFE5dtOSpb0A�index.version.createdH137217827�mappings���DFL�V�O�OV��V*(�/H-*�L-��2��K�sSA��� �T�ZQ���J�I+�N�,�/JAR�Q�L��/J�OL�/�525����aliases��primary_terms���in_sync_allocations�0�UxTQZjX_GSJaGz-udmOvwcg��rollover_info��system#���(��

Where WUqRmN4DQtKFE5dtOSpb0A is the UUID of the .address-book index:

> curl -XGET http://localhost:9200/_cat/indices
yellow open .address-book WUqRmN4DQtKFE5dtOSpb0A 1 1 4 0 15kb 15kb

It is on disk, but I'm not sure of how to read that in from a plugin before onIndexModule is called.

pakshi-titaniam avatar pakshi-titaniam commented on May 26, 2024

Thanks @reta and @msfroh. One of our constraints is as follows.
When the OpenSearch cluster starts up, it reads each index (headers).
Are these methods available at cluster startup time?
I think the first one (curl /_cat/indices) is not available at cluster startup. How about the second one (resolveIndexName)?
Thanks both.

peternied avatar peternied commented on May 26, 2024

[Triage - attendees 1 2 3 4 5 6 7 8]
@pakshi-titaniam It looks like this issue has been resolved. Please open a new issue if this is not the case.

anto-tl avatar anto-tl commented on May 26, 2024

@cwperks Thanks for checking the details.

  • I found that the OpenSearch service is partially up before onIndexModule is called, at the point where our Lucene-related code runs. So I called /_cat/indices there and got the UUID-to-index mapping.
  • On a single node, calling the node's own IP is fine. In a multi-node cluster where a load balancer URL is used, the API call can go to any available node during restart/bootstrap. It is not guaranteed that all nodes are up at that point, so the call might fail with "service unavailable".

So, to avoid that complexity, I am looking for a way to get the /_cat/indices response through an internal call from plugin code, without making an external API call to an IP.

I found that the RestIndicesAction class is used internally for the _cat/indices call. Any idea how to invoke it from a plugin to get the _cat/indices response without an external API call?

anto-tl avatar anto-tl commented on May 26, 2024

@cwperks Any idea? I have also tried several listeners, such as ClusterStateListener. They only have the information after the state is loaded. Most of the samples I have tried do not provide the necessary information before the Lucene loading is completed and the cluster state has changed.

cwperks avatar cwperks commented on May 26, 2024

I was trying to take a deep dive to see how the files are read from disk on node bootstrap, but I haven't been able to fully grok the code path.

@reta @dblock @msfroh any other ideas for getting a full list of index names and UUIDs before a cluster has fully initialized?

reta avatar reta commented on May 26, 2024

@anto-tl could you please clarify what you mean by

> the lucene reading related things are called for existing indices (where we required the indexName from uuid) - called even before onIndexModule. Is there any other way to collect the uuid and indexName ?

anto-tl avatar anto-tl commented on May 26, 2024

@reta

  • On cluster restart, I noticed that the code flow goes through the following pieces of code:

Calling: Store.tryOpenIndex

https://github.com/opensearch-project/OpenSearch/blob/2.9.0/server/src/main/java/org/opensearch/gateway/TransportNodesListGatewayStartedShards.java#L185

Calling: Lucene.readSegmentInfos

https://github.com/opensearch-project/OpenSearch/blob/2.9.0/server/src/main/java/org/opensearch/index/store/Store.java#L602
...
...

  • At this point we can get the UUID from the file system location. Here we want to get the index name from the UUID.
  • The problem with onIndexModule is that it is called after the index is read by Lucene, so we couldn't collect the UUID-to-index-name mapping in the restart scenario.
  • As mentioned above, one solution we are considering is to make an internal call from plugin code to get the _cat/indices response, since the index name and UUID mapping is available there at this point. I have already explained the problem with an external _cat/indices call here, so we would like to get the information from the currently running node itself.
  • Is there a way to get an instance of RestIndicesAction from plugin code and get the _cat/indices response without making an external API call?

Note:
I have also tried to get the indices information from the clusterService and indicesService references, but the index information is not loaded at this point. I can only get it after the Lucene segment read has completed and the clusterChanged event is triggered. We need the UUID-to-index-name information before that point.
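
For the step of getting the UUID from the file system location, a minimal sketch follows. It assumes the directory layout shown earlier in this thread (data/nodes/0/indices/<uuid>/...), where the path element after "indices" is the index UUID; IndexPathUuid is a hypothetical name.

```java
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper: extracts the index UUID from a path under the node's data directory.
class IndexPathUuid {
    // Returns the path element that follows "indices", or empty if there is none.
    static Optional<String> uuidFrom(Path path) {
        for (int i = 0; i < path.getNameCount() - 1; i++) {
            if (path.getName(i).toString().equals("indices")) {
                return Optional.of(path.getName(i + 1).toString());
            }
        }
        return Optional.empty();
    }
}
```

This only recovers the UUID; resolving it to an index name still needs one of the lookups discussed in this thread.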

reta avatar reta commented on May 26, 2024

Ah I see, thanks @anto-tl for the detailed explanation ... The Store method is passed a ShardId, which in turn has a reference to the index name and UUID. Shouldn't that be sufficient? I think there is no mechanism to propagate this contextual information down the line, is that the problem you are running into?

The sequence of initialization looks valid to me: the index has to be initialized before the onIndexModule call. That may be too late for you, since you apparently need the information at the codec level.

anto-tl avatar anto-tl commented on May 26, 2024

Hello @reta, thanks for the pointer on ShardId. I need to check how I can hook this into the plugin and grab the information from Store -> shardId. Let me take a look and let you know.

anto-tl avatar anto-tl commented on May 26, 2024

@reta Is there a way to get the shardId information in a plugin when TransportNodesListGatewayStartedShards -> nodeOperation is called? I am trying to find a way to extend this class and override the nodeOperation method to get the shardId information (or get it from Store.tryOpenIndex). I am still not sure how we can register or use this in a plugin. Any idea how we can achieve this?

reta avatar reta commented on May 26, 2024

> Still not sure how can we register or use this in plugin to use that. Any idea how can we achieve this?

@anto-tl I don't think you could alter TransportNodesListGatewayStartedShards::nodeOperation or Store.tryOpenIndex in any way, those are static methods. Here is another idea: since you need it at the codec level, the CodecService is supplied with IndexSettings, which also has the index details. Plus, you could provide your own using EnginePlugin::getCustomCodecServiceFactory.

Besides that, I think we are getting to the end of the possible options. The feature you are working on would need to be looked at to suggest a path forward.

anto-tl avatar anto-tl commented on May 26, 2024

@reta Will take a look.
Also, is there any option for calling the _cat/indices logic internally, via code on the same node, without making an external API call from plugin code? If we can call the same node to get the details, that's fine.

I explained the multi-node problem here, which is why I wanted to avoid an external API call. If mTLS is enabled, or the call goes to another node on startup, we can't get the needed _cat/indices response.

reta avatar reta commented on May 26, 2024

> Also, Is there any option for calling _cat/indices logic internally via code in the same node without doing an external API call in plugin code?

I thought you ran into the initialization sequence here, when the cluster was not ready to handle requests at the time you called the API? In any case, createComponents provides the Client instance:

    public Collection<Object> createComponents(
        Client client,
        ClusterService clusterService,
        ThreadPool threadPool,
        ResourceWatcherService resourceWatcherService,
        ScriptService scriptService,
        NamedXContentRegistry xContentRegistry,
        Environment environment,
        NodeEnvironment nodeEnvironment,
        NamedWriteableRegistry namedWriteableRegistry,
        IndexNameExpressionResolver indexNameExpressionResolver,
        Supplier<RepositoriesService> repositoriesServiceSupplier
    ) {
 ...
}

The Client is an instance of NodeClient (sadly, this may need a type check), which allows local execution: NodeClient::executeLocally

anto-tl avatar anto-tl commented on May 26, 2024

@reta Thanks. If possible, could you share a code snippet showing how to call _cat/indices with client.executeLocally, i.e. how to build the ActionType, ActionRequest, and ActionListener for a _cat/indices request? I see different samples of client.executeLocally in the OpenSearch code, but I am not sure how to build one for _cat/indices, since there is no specific ActionType instance or request in the RestIndicesAction class.

One more clarification: when we call client.executeLocally() from plugin code and OpenSearch has basic auth/mTLS enabled, does any authentication need to be passed, as we do for an external API call? (I am assuming client.executeLocally is an internal call that does not require authentication to be passed.)

anto-tl avatar anto-tl commented on May 26, 2024

@reta I finally found a way to get the index name and UUID mapping on startup, with the code below.

Works

  • Get the settings of all indices and collect the index name and UUID from them.
client.admin().indices().getSettings(new GetSettingsRequest()).actionGet().getIndexToSettings().values();

Did not work

  • I also tried the code below to get the indices stats. But the indicesStatsResponse.getIndices() map is empty on startup, so I couldn't use it.
// Create a request to get indices stats
IndicesStatsRequest indicesStatsRequest = new IndicesStatsRequest();
indicesStatsRequest.indices();
indicesStatsRequest.indicesOptions(IndicesOptions.lenientExpandHidden());
indicesStatsRequest.all();
indicesStatsRequest.includeUnloadedSegments(false);

ActionListener<IndicesStatsResponse> actionListener = new ActionListener<>() {
    @Override
    public void onResponse(IndicesStatsResponse indicesStatsResponse) {
        Map<String, IndexStats> indices = indicesStatsResponse.getIndices();
        log.info("Indices: {}", indices);
    }

    @Override
    public void onFailure(Exception e) {
        log.error("error: {}", e.getCause());
    }
};

client.executeLocally(IndicesStatsAction.INSTANCE, indicesStatsRequest, actionListener);
  • Also tried this
client.admin().indices().prepareStats().all().get().getIndices(); // always returns 0 size
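
The getSettings result that worked above is keyed by index name, and each index's settings carry the UUID under the index.uuid key (the same key is visible in the state file dump earlier in this thread). A minimal sketch of inverting that into a UUID-to-name map follows. It uses plain Map<String, String> values as a stand-in for OpenSearch Settings objects so it runs without the server on the classpath, and SettingsUuidIndex is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds a UUID -> index name map from per-index settings.
class SettingsUuidIndex {
    // indexToSettings: index name -> that index's settings
    // (a stand-in for the map returned by getIndexToSettings()).
    static Map<String, String> nameByUuid(Map<String, Map<String, String>> indexToSettings) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> entry : indexToSettings.entrySet()) {
            String uuid = entry.getValue().get("index.uuid");
            if (uuid != null) {
                result.put(uuid, entry.getKey());
            }
        }
        return result;
    }
}
```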

reta avatar reta commented on May 26, 2024

> @reta Finally I have found one way to get indexName and uuid mapping on the startup by below code

This is great, @anto-tl , I haven't looked into client.executeLocally, but I suspect it is not relevant anymore, thanks a lot for the update.
