[SPARK-55239][CONNECT][YARN] Allow to launch SparkConnectServer in YARN cluster mode #54004
base: master
Conversation
JIRA Issue Information: Sub-task SPARK-55239. (This comment was automatically generated by GitHub Actions.)
```scala
        logInfo("SparkConnectServer is starting in cluster deploy mode." +
          "Use `yarn application -kill` command or YARN client API to stop the server.")
      case (_, CLUSTER) if isConnectServer(args.mainClass) =>
        error("Cluster deploy mode is not applicable to Spark Connect server.")
```
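To make the shape of this change concrete, here is a hedged, self-contained sketch of the kind of `(clusterManager, deployMode)` match being discussed. The names `DeployModeCheckSketch`, `validate`, and the simplified enums are illustrative stand-ins, not Spark's actual `SparkSubmit` internals; the point is only that a specific `(YARN, CLUSTER)` case is matched before the catch-all rejection:

```scala
// Hypothetical, simplified model of the deploy-mode validation discussed in
// this PR. ClusterManager, DeployMode, and validate() are stand-ins and do
// not mirror Spark's real SparkSubmit code.
object DeployModeCheckSketch {
  sealed trait ClusterManager
  case object YARN extends ClusterManager
  case object STANDALONE extends ClusterManager

  sealed trait DeployMode
  case object CLIENT extends DeployMode
  case object CLUSTER extends DeployMode

  def isConnectServer(mainClass: String): Boolean =
    mainClass == "org.apache.spark.sql.connect.service.SparkConnectServer"

  // Returns Some(errorMessage) if the combination is rejected, None if allowed.
  def validate(cm: ClusterManager, mode: DeployMode, mainClass: String): Option[String] =
    (cm, mode) match {
      // After this PR: YARN cluster mode is explicitly allowed for Connect server.
      case (YARN, CLUSTER) if isConnectServer(mainClass) => None
      // Every other cluster manager still rejects cluster mode for Connect server.
      case (_, CLUSTER) if isConnectServer(mainClass) =>
        Some("Cluster deploy mode is not applicable to Spark Connect server.")
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    val cls = "org.apache.spark.sql.connect.service.SparkConnectServer"
    assert(validate(YARN, CLUSTER, cls).isEmpty)       // now permitted
    assert(validate(STANDALONE, CLUSTER, cls).isDefined) // still rejected
    println("ok")
  }
}
```

Because the `(YARN, CLUSTER)` case sits above the generic `(_, CLUSTER)` case, only the YARN combination is carved out; this is also why the reviewers note below that the remaining error message ("Cluster deploy mode is not applicable...") becomes inaccurate once one cluster manager is allowed.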
This message is not accurate then.
BTW, seems it's also applicable for Standalone and K8s cluster mode?
> BTW, seems it's also applicable for Standalone and K8s cluster mode?
For K8s, I think users have an alternative way to run SparkConnectServer on the cluster, like ThriftServer.
https://issues.apache.org/jira/browse/SPARK-23078?focusedCommentId=16705282&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16705282
For Standalone, as far as I know, we need a further code change. So, I'd like to fix it for YARN for now, as it only requires a simple code change.
One of the advantages of Spark is that spark-submit provides a consistent experience across different resource managers in most cases, so users can easily switch between them, e.g., migrate from YARN to K8s, just by providing a different --master <foo>. I suggest making it also work in K8s cluster mode if there is no technical challenge (it may require just another one-line change, like for YARN).
> For Standalone, as far as I know, we need further code change.
Good to know, thanks for the information.
```scala
      case (_, CLUSTER) if isThriftServer(args.mainClass) =>
        error("Cluster deploy mode is not applicable to Spark Thrift server.")
      case (YARN, CLUSTER) if isConnectServer(args.mainClass) =>
        logInfo("SparkConnectServer is starting in cluster deploy mode." +
```
nit: a space
```diff
-        logInfo("SparkConnectServer is starting in cluster deploy mode." +
+        logInfo("SparkConnectServer is starting in cluster deploy mode. " +
```
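The nit above is easy to reproduce: when a log message is split across string fragments joined with `+`, the separating space must be written explicitly, typically at the end of the first fragment, or the words run together. A minimal standalone demonstration (the object name `LogMessageSpacing` is just for illustration):

```scala
// Demonstrates why the suggested one-character fix matters: Scala string
// concatenation inserts nothing between fragments, so a missing trailing
// space glues the sentences together in the log output.
object LogMessageSpacing {
  val missingSpace: String =
    "SparkConnectServer is starting in cluster deploy mode." +
      "Use `yarn application -kill` command or YARN client API to stop the server."

  val withSpace: String =
    "SparkConnectServer is starting in cluster deploy mode. " +
      "Use `yarn application -kill` command or YARN client API to stop the server."

  def main(args: Array[String]): Unit = {
    assert(missingSpace.contains("mode.Use"))  // broken: "mode.Use"
    assert(withSpace.contains("mode. Use"))    // fixed: "mode. Use"
    println("ok")
  }
}
```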
What changes were proposed in this pull request?
This PR proposes to allow launching SparkConnectServer in YARN cluster mode.
Thanks to #46182, we can now easily find the location of SparkConnectServer on a cluster.
Why are the changes needed?
Service providers can utilize their Hadoop cluster for running SparkConnectServer for better resource utilization and high availability.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
I confirmed that SparkConnectServer launched on my Hadoop cluster with `sbin/start-connect-server.sh --master yarn --deploy-mode cluster` and stopped with `yarn application -kill <application id>`.

Was this patch authored or co-authored using generative AI tooling?
No.