@sarutak
Member

@sarutak sarutak commented Jan 27, 2026

What changes were proposed in this pull request?

This PR proposes allowing SparkConnectServer to be launched in YARN cluster mode.
Thanks to #46182, we can now easily find out where SparkConnectServer is running on a cluster.

Why are the changes needed?

Service providers can run SparkConnectServer on their Hadoop cluster for better resource utilization and high availability.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I confirmed that SparkConnectServer launches on my Hadoop cluster with sbin/start-connect-server.sh --master yarn --deploy-mode cluster and can be stopped with yarn application -kill <application id>.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions

JIRA Issue Information

=== Sub-task SPARK-55239 ===
Summary: Allow to launch SparkConnectServer in YARN cluster mode
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

@github-actions github-actions bot added the CORE label Jan 27, 2026
        logInfo("SparkConnectServer is starting in cluster deploy mode." +
          "Use `yarn application -kill` command or YARN client API to stop the server.")
      case (_, CLUSTER) if isConnectServer(args.mainClass) =>
        error("Cluster deploy mode is not applicable to Spark Connect server.")
Member

This message is not accurate, then.

BTW, seems it's also applicable for Standalone and K8s cluster mode?

Member Author

BTW, seems it's also applicable for Standalone and K8s cluster mode?

For K8s, I think users have an alternative way to run SparkConnectServer on the cluster, as with ThriftServer.
https://issues.apache.org/jira/browse/SPARK-23078?focusedCommentId=16705282&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16705282

For Standalone, as far as I know, we need further code change. So I'd like to fix it for YARN for now, as that requires only a simple code change.

Member

One advantage of Spark is that spark-submit provides a consistent experience across different resource managers in most cases, so users can easily switch between them (e.g., migrate from YARN to K8s) by passing a different --master <foo>. I suggest making this work in K8s cluster mode as well, if there is no technical challenge (it may require just another one-line change, as for YARN).

For Standalone, as far as I know, we need further code change.

Good to know, thanks for the information.
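
For readers following this thread, here is a minimal, self-contained sketch of the case ordering under discussion. It is not the actual SparkSubmit code: the types, the `check` helper, the fully qualified class name, and the reworded error message are all assumptions made for illustration. It only shows that a YARN-specific case placed before the wildcard case lets YARN through while other cluster managers still hit the error.

```scala
// Hypothetical stand-ins for SparkSubmit's cluster manager and deploy mode constants.
sealed trait ClusterManager
case object Yarn extends ClusterManager
case object Kubernetes extends ClusterManager
case object Standalone extends ClusterManager

sealed trait DeployMode
case object Client extends DeployMode
case object Cluster extends DeployMode

object DeployModeCheckSketch {
  // Assumed main class name, used only for this illustration.
  val ConnectServerClass = "org.apache.spark.sql.connect.service.SparkConnectServer"

  def isConnectServer(mainClass: String): Boolean = mainClass == ConnectServerClass

  // Right(message) means the launch is allowed; Left(message) means it is rejected.
  def check(manager: ClusterManager, mode: DeployMode, mainClass: String): Either[String, String] =
    (manager, mode) match {
      // The specific YARN case must come before the wildcard case below.
      case (Yarn, Cluster) if isConnectServer(mainClass) =>
        Right("SparkConnectServer is starting in cluster deploy mode. " +
          "Use `yarn application -kill` command or YARN client API to stop the server.")
      // Hypothetical wording: says what is actually supported once YARN is allowed.
      case (_, Cluster) if isConnectServer(mainClass) =>
        Left("Cluster deploy mode is only applicable to Spark Connect server on YARN.")
      case _ =>
        Right("No Spark Connect server restriction applies.")
    }
}
```

If K8s cluster mode were supported later, as suggested above, it would presumably mean adding one more specific case ahead of the wildcard; whether that is sufficient in the real code is not confirmed in this thread.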

      case (_, CLUSTER) if isThriftServer(args.mainClass) =>
        error("Cluster deploy mode is not applicable to Spark Thrift server.")
      case (YARN, CLUSTER) if isConnectServer(args.mainClass) =>
        logInfo("SparkConnectServer is starting in cluster deploy mode." +
Member

nit: a space

Suggested change
- logInfo("SparkConnectServer is starting in cluster deploy mode." +
+ logInfo("SparkConnectServer is starting in cluster deploy mode. " +
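
To make the nit concrete, the small sketch below concatenates the two string literals from the diff with and without the trailing space. The object and value names are hypothetical and exist only for this illustration.

```scala
object LogMessageSpacingSketch {
  // As in the original line: no space at the end of the first literal.
  val withoutSpace: String =
    "SparkConnectServer is starting in cluster deploy mode." +
      "Use `yarn application -kill` command or YARN client API to stop the server."

  // As in the suggested change: trailing space inside the first literal.
  val withSpace: String =
    "SparkConnectServer is starting in cluster deploy mode. " +
      "Use `yarn application -kill` command or YARN client API to stop the server."
}
```

Without the space, the two sentences run together as "mode.Use" in the logged message.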
