Skip to content

Conversation

@HoustonPutman
Copy link
Contributor

https://issues.apache.org/jira/browse/SOLR-18011

The Admin APIs will need to take in information in the request about the lock acquired for the parent Admin API call, so that it doesn't try to acquire that lock as well.

@HoustonPutman HoustonPutman marked this pull request as ready for review January 28, 2026 00:02
Copy link
Contributor

@gerlowskija gerlowskija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No big changes jump out to me here, just a few questions and comments. Would be happy to approve as soon as you're able to clarify those and I can understand this just a bit better.

@@ -0,0 +1,9 @@
# See https://github.com/apache/solr/blob/main/dev-docs/changelog.adoc
title: Allow locked Admin APIs to call other locked AdminAPIs without deadlocking
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[0] If these deadlocks are a real thing that a user might've seen, then I think we should reword the changelog to highlight the benefit they get out of this change. Maybe something like:

Resolve locking-related deadlock in INSTALLSHARD and similar Admin API calls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So currently no Admin APIs call other Admin APIs, they will run the Cmd themselves perhaps, but they won't submit it to the overseer or anything. Or they will call the coreAPIs themselves that the other collection API would do. But we currently get around this deadlock by just not doing it. I think we should be able to re-use code a bit better and have 1 entry point for a lot of these APIs, so this is a necessary part of that.

}
}

public String getCallingLockIds() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[0] I understand that a comma-delimited-string is our serialization format for when this is header, but it'd probably be nicer to have this method and others expose a List<String> or something similar. Saves callers from needing to understand the serialization format and parse it out themselves...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, I went back and forth on this. I can easily change it back though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'll just add a method to do the deserialization here, it's nice to keep it as a string throughout the process if we can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've added this and it works quite well.

params.remove(CommonParams.WT); // use default (currently javabin)
QueryRequest req = createQueryRequest(sreq, params, shard);
req.setMethod(SolrRequest.METHOD.POST);
if (sreq.headers != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] More thinking aloud than raising an objection. In general I've seen bugs before where headers are copied over blindly because certain things like auth headers, content-length, etc. may not be relevant for the sub-request.

But, in this case the ShardRequest is something that we've built by hand and that doesn't have all the headers from the original user request. So it should be safe to do a blind-copy here.

Do I have that right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, headers are added on an individual basis, and will probably only have this header for a while.

@HoustonPutman
Copy link
Contributor Author

Ultimately I think there is 2 things left to do here:

  • Currently we are keeping track of all previous locks in the command-calling chain. We only need the previous lock. It doesn't matter who else called that API, but each API should only guarantee locking for the things it calls, not 2 APIs down the chain.
  • I think there is a possibility of a dead-lock here. Where API-a locks collection an and API-b locks collection b, and API-a calls API-b and API-b calls API-a. Then calling API-a and API-b at the same time might cause a deadlock. We can fix this by insisting that sub-API requests are required to be in the same locking chain as the parent API request. I don't love making this caveat, but it would save us from dead-locking, and we can always relax the requirements later if we find a better way to eliminate deadlocks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants