Comments (22)
Would Plugin.onIndexModule be useful here?
I tried the snippet that @msfroh provided, using the GuiceHolder pattern in a plugin to get the IndicesService, and tried to resolve the index name within Plugin.createComponents or ClusterPlugin.onNodeStarted. Neither worked: the IndexService returned from IndexService indexService = indicesService.indexService(indexForUuid); was null at that point in execution.
Before the node is fully initialized, Plugin.onIndexModule is called for every index in the cluster where you can obtain the index name and UUID for all indices in the cluster. On Node bootstrap, you may see messages like:
[2024-04-02T00:13:33,394][INFO ][o.o.p.PluginsService ] [smoketestnode] PluginService:onIndexModule index:[.opendistro_security/tk6k2nCkRJuc5Gr4mXUxEQ]
This log line comes directly from onIndexModule.
Example of security plugin overriding onIndexModule: https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/OpenSearchSecurityPlugin.java#L664-L665
from opensearch.
@cwperks Thanks. This looks promising. I will run it by my engineering team. Will comment again in a few days. Much appreciated.
from opensearch.
This mapping can be obtained easily with the _cat APIs:
$ curl http://localhost:9200/_cat/indices?h=index,uuid
index1 SbFwt5hhSviSj1YTFtwyEg
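If the two-column output above is consumed by code, it can be turned into a UUID to index name lookup. The following is a minimal sketch in plain Java, assuming the response uses exactly the `h=index,uuid` columns shown; the class and method names are hypothetical and not part of OpenSearch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses "_cat/indices?h=index,uuid" text output.
final class CatIndicesParser {
    private CatIndicesParser() {}

    /** Returns a map keyed by index UUID with the index name as value. */
    static Map<String, String> parseUuidToName(String catOutput) {
        Map<String, String> uuidToName = new LinkedHashMap<>();
        for (String line : catOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Assumption: exactly two whitespace-separated columns, index then uuid.
            String[] columns = trimmed.split("\\s+");
            if (columns.length != 2) {
                throw new IllegalArgumentException("Expected 'index uuid' but got: " + line);
            }
            uuidToName.put(columns[1], columns[0]);
        }
        return uuidToName;
    }
}
```

For the sample response above, looking up `SbFwt5hhSviSj1YTFtwyEg` in the returned map yields `index1`.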
from opensearch.
From an internal API standpoint, if you have a reference to IndicesService, you can get it in a slightly roundabout way:
public static String resolveIndexName(IndicesService indicesService, String uuid) {
    // The name is left empty; the lookup is expected to match on the UUID.
    Index indexForUuid = new Index("", uuid);
    IndexService indexService = indicesService.indexService(indexForUuid);
    if (indexService == null) {
        // Could also call indicesService.indexServiceSafe(indexForUuid),
        // which will throw the exception for you.
        throw new IllegalArgumentException("No index found on this node for UUID [" + uuid + "]");
    }
    return indexService.getIndexSettings().getIndex().getName();
}
from opensearch.
@cwperks The onIndexModule method is useful for collecting the UUID to index name mapping. However, on cluster startup/restart, the Lucene read path runs for existing indices (which is where we need the index name for a UUID) even before onIndexModule is called. Is there any other way to collect the UUID and index name?
An alternative could be to store this UUID to index name mapping in persistent storage such as the file system, similar to how OpenSearch maintains cluster state, with the plugin granted permission to write to a file. On cluster startup we could then read the mapping from the file and load it into the plugin as a Map before the Lucene read happens (in AbstractLifecycleComponent -> doStart).
Any pointers would be helpful.
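The file-based idea could be sketched roughly as follows, in plain Java with `java.util.Properties`. The class name, file format, and file location are assumptions for illustration only; inside a plugin, the write would additionally need whatever file permissions the node's security policy requires.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: persists the UUID -> index name mapping as a properties file.
final class UuidNameMappingFile {
    private UuidNameMappingFile() {}

    /** Writes the whole mapping, replacing any previous file content. */
    static void save(Path file, Map<String, String> uuidToName) {
        Properties props = new Properties();
        props.putAll(uuidToName);
        try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            props.store(writer, "index UUID -> index name");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads the mapping; returns an empty map if the file does not exist yet. */
    static Map<String, String> load(Path file) {
        Map<String, String> uuidToName = new HashMap<>();
        if (!Files.exists(file)) {
            return uuidToName;
        }
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String uuid : props.stringPropertyNames()) {
            uuidToName.put(uuid, props.getProperty(uuid));
        }
        return uuidToName;
    }
}
```

A plugin could call `save` whenever it learns about an index (for example from onIndexModule) and `load` early during startup. Note that such a file can go stale if indices are created or deleted while the plugin is not recording them.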
from opensearch.
@anto-tl I do see this when I look in the data directory of a node:
> cat data/nodes/0/indices/WUqRmN4DQtKFE5dtOSpb0A/_state/state-8.st
?�lstate:)
�.address-book�versionԎmapping_versionďsettings_version�aliases_version�routing_num_shards$ ��stateCopen�settings�index.creation_dateL1713919726108�index.number_of_replicas@1�index.number_of_shards@1�index.provided_nameL.address-book�index.replication.typeGDOCUMENT�index.uuidUWUqRmN4DQtKFE5dtOSpb0A�index.version.createdH137217827�mappings���DFL�V�O�OV��V*(�/H-*�L-��2��K�sSA��� �T�ZQ���J�I+�N�,�/JAR�Q�L��/J�OL�/�525����aliases��primary_terms���in_sync_allocations�0�UxTQZjX_GSJaGz-udmOvwcg��rollover_info��system#���(��
Where WUqRmN4DQtKFE5dtOSpb0A is the UUID of the .address-book index:
> curl -XGET http://localhost:9200/_cat/indices
yellow open .address-book WUqRmN4DQtKFE5dtOSpb0A 1 1 4 0 15kb 15kb
It is on disk, but I'm not sure how to read it from a plugin before onIndexModule is called.
from opensearch.
Thanks @reta and @msfroh. One of our constraints is as follows.
When the OpenSearch cluster starts up, it reads each index (headers).
Are these methods available at the cluster start up time?
I think the first one (curl /_cat/indices) is not available at cluster startup time. How about the second one (resolveIndexName)?
Thanks both.
from opensearch.
[Triage - attendees 1 2 3 4 5 6 7 8]
@pakshi-titaniam It looks like this issue has been resolved. Please open a new issue if this is not the case.
from opensearch.
@cwperks Thanks for checking the details.
- I found that the OpenSearch service is already partially up before onIndexModule is called, at the point where our Lucene-related code runs. So I tried calling /_cat/indices and got the UUID to index name mapping.
- On a single node, calling the node's own IP works fine. On a multi-node cluster behind a load balancer URL, the API call can go to any available node during restart/bootstrap. It is not guaranteed that all nodes are up at that point, so the call might fail with service unavailable.
To avoid that complexity, I am looking for a way to make an internal call from plugin code to get the /_cat/indices response, without an external API call to an IP.
I found that the RestIndicesAction class handles the _cat/indices call internally. Any idea how to call it from a plugin to get the _cat/indices response details without an external API call?
from opensearch.
@cwperks Any idea? I have also tried several listeners, such as ClusterStateListener. They all receive the information only after the state is loaded. Most of the samples I have tried do not provide the necessary information before the Lucene loading is completed and the cluster state has changed.
from opensearch.
I was trying to take a deep dive to see how the files are read from disk on node bootstrap, but I haven't been able to fully grok the code path.
@reta @dblock @msfroh any other ideas for getting a full list of index names and UUIDs before a cluster has fully initialized?
from opensearch.
@anto-tl could you please clarify what you mean by:
"the lucene reading related things are called for existing indices (where we required the indexName from uuid) - called even before onIndexModule. Is there any other way to collect the uuid and indexName ?"
from opensearch.
- On cluster restart I have noticed that the code flow goes through the following pieces of code:
Calling: Store.tryOpenIndex
Calling: Lucene.readSegmentInfos
- From here it goes to the Lucene codec, which loads the indices. We have implemented/overridden the Lucene codec for encryption.
https://github.com/apache/lucene/blob/releases/lucene/9.5.0/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L407
- At this point we can get the UUID from the file system location. Here we want to get the index name from the UUID.
- The problem with onIndexModule is that it is called after the index has been read by Lucene, so we cannot collect the UUID to index name mapping in the restart scenario.
- As mentioned above, one solution we are considering is an internal call from plugin code to get the _cat/indices response, since the index name and UUID mapping is available from it at this point. I have already explained the problem with an external _cat/indices call in an earlier comment, so we would like to get the information from the currently running node itself.
- Is there a way to get the instance of RestIndicesAction from plugin code and get the _cat/indices response without an external API call?
Note:
I have also tried to get the indices information from the clusterService and indicesService references, but the index information is not loaded at this point. I can get it only after the Lucene segment read has completed and the clusterChanged event is triggered. We need the UUID to index name information before that point.
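The step of getting the UUID from the file system location could look roughly like the sketch below. It assumes the on-disk layout seen elsewhere in this thread (`data/nodes/0/indices/<uuid>/...`), where the path element after `indices` is the index UUID; the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper: extracts the index UUID from a path under the node data directory.
final class IndexPathUuid {
    private IndexPathUuid() {}

    /**
     * Returns the path element that follows the last "indices" element,
     * or empty if the path does not match the assumed layout.
     */
    static Optional<String> uuidFromPath(Path path) {
        // Search from the end so that a parent directory that happens to be
        // named "indices" higher up in the tree does not win.
        for (int i = path.getNameCount() - 2; i >= 0; i--) {
            if ("indices".equals(path.getName(i).toString())) {
                return Optional.of(path.getName(i + 1).toString());
            }
        }
        return Optional.empty();
    }
}
```

The extracted UUID can then be used as the key into whatever UUID to index name mapping the plugin has available at that point.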
from opensearch.
Ah I see, thanks @anto-tl for the detailed explanation ... The Store method is passed a ShardId, which in turn has a reference to the index name and UUID. Shouldn't that be sufficient? I think there is no mechanism to propagate this contextual information down the line; is that the problem you are running into?
The sequence of initialization looks valid to me: the index has to be initialized before the onIndexModule call. That may be too late for you, since you apparently need the information at the codec level.
from opensearch.
Hello @reta Thanks for the pointer on ShardId. I need to check how I can hook into this from the plugin and grab the information from Store -> shardId. Let me take a look and let you know.
from opensearch.
@reta Is there a way to get the shardId information in a plugin when the TransportNodesListGatewayStartedShards->nodeOperation method is called? I am trying to find a way to extend this class and override the nodeOperation method to get the shardId information (or to get it from Store.tryOpenIndex). I am still not sure how we can register or use this in a plugin. Any idea how we can achieve this?
from opensearch.
Still not sure how can we register or use this in plugin to use that. Any idea how can we achieve this?
@anto-tl I don't think you could alter TransportNodesListGatewayStartedShards::nodeOperation or Store.tryOpenIndex in any way, those are static methods. Here is another idea: since you need it at the codec level, the CodecService is supplied with IndexSettings, which also has the index details. Plus, you could provide your own using EnginePlugin::getCustomCodecServiceFactory.
Besides that, I think we are getting to the end of the possible options. It seems the feature you are working on would need to be looked at in order to suggest a path forward.
from opensearch.
@reta Will take a look.
Also, is there any option for calling the _cat/indices logic internally, via code on the same node, without an external API call from plugin code? If we can call the same node to get the details, that is fine.
I explained the multi-node problem in an earlier comment, and this is why I want to avoid an external API call. If mTLS is enabled, or the call goes to another node on startup, we can't get the needed _cat/indices response.
from opensearch.
Also, Is there any option for calling _cat/indices logic internally via code in the same node without doing an external API call in plugin code?
I thought you ran into the initialization sequence here, where the cluster was not ready to handle requests when you called the API? In any case, createComponents provides the Client instance:
public Collection<Object> createComponents(
    Client client,
    ClusterService clusterService,
    ThreadPool threadPool,
    ResourceWatcherService resourceWatcherService,
    ScriptService scriptService,
    NamedXContentRegistry xContentRegistry,
    Environment environment,
    NodeEnvironment nodeEnvironment,
    NamedWriteableRegistry namedWriteableRegistry,
    IndexNameExpressionResolver indexNameExpressionResolver,
    Supplier<RepositoriesService> repositoriesServiceSupplier
) {
    ...
}
The Client is an instance of NodeClient (sadly, it may need a type check), which allows local execution: NodeClient::executeLocally
from opensearch.
@reta Thanks. If possible, could you share a code snippet showing how to call _cat/indices with client.executeLocally, i.e. how to build the ActionType, ActionRequest, and ActionListener for a _cat/indices request? I see different samples in the OpenSearch code for client.executeLocally, but I am not sure how to build one for _cat/indices, since there is no specific ActionType instance or request in the RestIndicesAction class.
Also, one more clarification: when we use the client.executeLocally() method from plugin code and OpenSearch has basic auth/mTLS enabled, does any authentication need to be passed, like we do for an external API call? (I assume client.executeLocally is an internal call and does not require any authentication to be passed.)
from opensearch.
@reta I finally found a way to get the index name and UUID mapping at startup, using the code below.
Works
- Get the settings of all indices and collect the index name and UUID information.
client.admin().indices().getSettings(new GetSettingsRequest()).actionGet().getIndexToSettings().values();
Not worked
- I also tried the code below to get the indices stats. But the indicesStatsResponse.getIndices() map is empty at startup, so I couldn't use it.
// Create a request to get indices stats
IndicesStatsRequest indicesStatsRequest = new IndicesStatsRequest();
indicesStatsRequest.indices();
indicesStatsRequest.indicesOptions(IndicesOptions.lenientExpandHidden());
indicesStatsRequest.all();
indicesStatsRequest.includeUnloadedSegments(false);
ActionListener<IndicesStatsResponse> actionListener = new ActionListener<>() {
    @Override
    public void onResponse(IndicesStatsResponse indicesStatsResponse) {
        Map<String, IndexStats> indices = indicesStatsResponse.getIndices();
        log.info("Indices: {}", indices);
    }
    @Override
    public void onFailure(Exception e) {
        log.error("error: {}", e.getCause());
    }
};
client.executeLocally(IndicesStatsAction.INSTANCE, indicesStatsRequest, actionListener);
- Also tried this
client.admin().indices().prepareStats().all().get().getIndices(); // always returns 0 size
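Turning the settings result into a UUID to index name map could look roughly like this. It is a plain-Java sketch that uses nested maps as a stand-in for the real response, assuming each index's settings contain the `index.uuid` key (the same key visible in the on-disk state file earlier in this thread); in plugin code the values would be OpenSearch Settings objects rather than maps, and the class name here is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: inverts "index name -> settings" into "index UUID -> index name".
final class SettingsUuidIndex {
    // Assumption: the UUID is stored under this settings key.
    static final String UUID_KEY = "index.uuid";

    private SettingsUuidIndex() {}

    static Map<String, String> uuidToName(Map<String, Map<String, String>> indexToSettings) {
        Map<String, String> result = new HashMap<>();
        indexToSettings.forEach((indexName, settings) -> {
            String uuid = settings.get(UUID_KEY);
            if (uuid != null) {
                result.put(uuid, indexName); // skip entries without a UUID
            }
        });
        return result;
    }
}
```

Iterating over the entries rather than only the values keeps the index name next to its settings, so the name does not have to be recovered from a separate setting.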
from opensearch.
@reta Finally I have found one way to get indexName and uuid mapping on the startup by below code
This is great, @anto-tl. I haven't looked into client.executeLocally, but I suspect it is not relevant anymore. Thanks a lot for the update.
from opensearch.