6 changes: 6 additions & 0 deletions templates/common/_base/units/ovs-vswitchd.service.yaml
@@ -9,3 +9,9 @@ dropins:
ExecStartPre=-/bin/sh -c '/usr/bin/chown -R :$${OVS_USER_ID##*:} /run/openvswitch'
ExecStartPost=-/usr/bin/ovs-appctl vlog/set syslog:info
ExecReload=-/usr/bin/ovs-appctl vlog/set syslog:info
- name: 20-ovs-vswitchd-cpu-priority.conf
contents: |
[Service]
CPUSchedulingPolicy=fifo
CPUSchedulingPriority=10
Comment on lines +15 to +16

@coderabbitai coderabbitai bot Mar 24, 2026

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd "ovs-vswitchd.service" --type f

Repository: openshift/machine-config-operator

Length of output: 131


🏁 Script executed:

cat -n templates/common/_base/units/ovs-vswitchd.service.yaml

Repository: openshift/machine-config-operator

Length of output: 868


🏁 Script executed:

rg "CPUSchedulingPolicy" --type yaml

Repository: openshift/machine-config-operator

Length of output: 162


🏁 Script executed:

rg "LimitRTTIME|LimitRT" --type yaml

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

rg "ovs-vswitchd" -t yaml -A 5 -B 5

Repository: openshift/machine-config-operator

Length of output: 1684


Add runtime limit to prevent SCHED_FIFO starvation of node-critical services.

On lines 15-16, CPUSchedulingPolicy=fifo with CPUSchedulingPriority=10 can starve other node services if a hot loop occurs, degrading cluster stability. Add LimitRTTIME=200ms to bound real-time execution:

Suggested fix
      [Service]
      CPUSchedulingPolicy=fifo
      CPUSchedulingPriority=10
+     LimitRTTIME=200ms
      Nice=-10
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-     CPUSchedulingPolicy=fifo
-     CPUSchedulingPriority=10
+     CPUSchedulingPolicy=fifo
+     CPUSchedulingPriority=10
+     LimitRTTIME=200ms
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@templates/common/_base/units/ovs-vswitchd.service.yaml` around lines 15 - 16,
The unit file sets CPUSchedulingPolicy=fifo and CPUSchedulingPriority=10 which
can starve node-critical services; modify the ovs-vswitchd systemd unit
(referencing CPUSchedulingPolicy and CPUSchedulingPriority) to add a bounded
real-time runtime by adding LimitRTTIME=200ms in the service section so
real-time execution is capped and cannot monopolize CPU.

Contributor Author

No.

@mkowalski, understood. I'll drop that suggestion.

Contributor Author

Analyse the whole repository. Make sure my unit is running with the highest priority. Make no mistakes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all unit/service yaml files with CPU scheduling or Nice settings
echo "=== CPUSchedulingPriority occurrences ==="
rg -n "CPUSchedulingPriority" --type yaml -A2 -B5

echo ""
echo "=== CPUSchedulingPolicy occurrences ==="
rg -n "CPUSchedulingPolicy" --type yaml -A2 -B5

echo ""
echo "=== Nice= occurrences ==="
rg -n "Nice=" --type yaml -A2 -B5

echo ""
echo "=== All unit yaml files ==="
fd -e yaml -e service . templates/ --type f | sort

echo ""
echo "=== Check for any .service files with RT scheduling ==="
# rg recurses by default; "-rn" would be parsed as --replace with the text "n"
rg -n "CPUSchedulingPriority" templates/
rg -n "CPUSchedulingPolicy" templates/

echo ""
echo "=== Check vendor or test data for any priority references ==="
rg -n "CPUSchedulingPriority" --type go

Length of output: 15330


@mkowalski, here are the results of the full repository scan:

No other service in the entire repository uses CPUSchedulingPolicy or CPUSchedulingPriority, so there is zero competition from any other MCO-managed unit at the real-time scheduler level. That part is fine.

However, CPUSchedulingPriority=10 is not the highest priority. For SCHED_FIFO on Linux, the valid range is 1–99, and 99 is the maximum. systemd maps CPUSchedulingPriority directly to sched_param.sched_priority. A priority of 10 is near the bottom of the real-time scale — kernel threads and other in-kernel real-time work can easily run above it.

Additionally, Nice=-10 has no effect on scheduling while CPUSchedulingPolicy=fifo is active; the nice value only influences SCHED_OTHER and SCHED_BATCH tasks.
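
To confirm what a running process actually received, rather than what the unit file requests, the kernel's view can be read from /proc. The following is a minimal sketch, not part of this PR: `sched_info` is a hypothetical helper name, and it relies on proc(5) documenting field 40 of /proc/<pid>/stat as rt_priority and field 41 as policy.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): print the scheduling policy and
# real-time priority of a PID as the kernel reports them.
# Per proc(5), /proc/<pid>/stat field 40 is rt_priority and field 41 is policy
# (0 = SCHED_OTHER, 1 = SCHED_FIFO, 2 = SCHED_RR).
sched_info() {
    pid="$1"
    stat=$(cat "/proc/${pid}/stat") || return 1
    # Drop the leading "pid (comm) "; comm may itself contain spaces or
    # parentheses, so strip up to the last ") ".
    rest="${stat##*) }"
    # rest now starts at field 3, so fields 40 and 41 are words 38 and 39.
    set -- $rest
    echo "policy=${39} rt_priority=${38}"
}
```

For example, `sched_info "$(pidof ovs-vswitchd)"` on a node would be expected to report policy=1 with the drop-in applied, assuming the daemon does not change its own scheduling after start. Individual threads can be checked the same way through /proc/<pid>/task/<tid>/stat.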

To run at the true highest SCHED_FIFO priority:

      [Service]
      CPUSchedulingPolicy=fifo
-     CPUSchedulingPriority=10
-     Nice=-10
+     CPUSchedulingPriority=99

Nice=-10