Trim console lines that are longer than limit#2466
Trim console lines that are longer than limit#2466trancexpress wants to merge 1 commit intoeclipse-platform:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR attempts to fix issue #2465, where the console character limit doesn't apply to lines that are longer than the limit. The issue occurs when a single line without newlines exceeds the console's high water mark - the existing trimming logic tries to trim to the start of the line (offset 0), resulting in no content being removed.
Changes:
- Modified the
trim()method inIOConsolePartitionerto handle the case where a single line exceeds the character limit
Comments suppressed due to low confidence (1)
debug/org.eclipse.ui.console/src/org/eclipse/ui/internal/console/IOConsolePartitioner.java:1207
- This fix lacks test coverage for the specific issue it's attempting to address. The existing testTrim() method only tests trimming with multiple lines (each line is "0123456789\n"). There should be a test that reproduces the scenario from issue #2465: a single very long line (e.g., 100,000 characters without newlines) with the console limit set to a lower value (e.g., 80,000 characters), to verify that the console properly trims the content even when it's all on one line.
// deal with case of one long line
cutOffset = Math.max(cutOffset, truncateOffset);
cutOffset = document.getLineOffset(cutoffLine);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
debug/org.eclipse.ui.console/src/org/eclipse/ui/internal/console/IOConsolePartitioner.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (2)
debug/org.eclipse.ui.console/src/org/eclipse/ui/internal/console/IOConsolePartitioner.java:1206
- The comment "deal with case of one long line" is misleading. This logic applies to any line that is longer than the trim threshold, not just when there is only one line in the document. Consider updating the comment to clarify that this handles cases where individual lines exceed the buffer limit, for example: "trim within line if line is longer than buffer limit" or "ensure trimming happens even when line start would keep too much data".
// deal with case of one long line
debug/org.eclipse.ui.console/src/org/eclipse/ui/internal/console/IOConsolePartitioner.java:1207
- The fix addresses a specific edge case (very long single lines exceeding the console limit), but there is no test coverage added for this scenario. Consider adding a test similar to testTrim() but with a single very long line (e.g., 100,000 characters without newlines) to verify that trimming works correctly when the high water mark is exceeded. This would prevent regression of the bug described in issue #2465.
// deal with case of one long line
cutOffset = Math.max(cutOffset, truncateOffset);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Test fails to look into: |
|
The fail in
(i.e. the output of the snippet above) While the test expects: The expectations are due to the code we need to change: If IMO the test expectations should be adjusted. @iloveeclipse WDYT? |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
debug/org.eclipse.debug.tests/src/org/eclipse/debug/tests/console/IOConsoleTests.java:719
- Changing the expected offset to
1indicates trimming now leaves a leading newline (i.e., trims at an arbitrary offset instead of the start of a line). That seems like a regression from the original “trim to line start” behavior and doesn’t directly validate the long-line limit fix from #2465. Consider keeping the expectation at offset0for the normal multi-line case, and add a dedicated assertion/test scenario for a single line longer than the high-water mark (no newline) to ensure the buffer actually shrinks whentruncateToOffsetLineStartwould otherwise result incutOffset == 0.
c.getConsole().setWaterMarks(50, 100);
c.waitForScheduledJobs();
c.verifyContentByOffset("0123456789", 1);
assertTrue(c.getDocument().getNumberOfLines() < 15, "Document not trimmed.");
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Yes. |
|
This pull request changes some projects for the first time in this development cycle. An additional commit containing all the necessary changes was pushed to the top of this PR's branch. To obtain these changes (for example if you want to push more changes) either fetch from your fork or apply the git patch. Git patchFurther information are available in Common Build Issues - Missing version increments. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
debug/org.eclipse.debug.tests/src/org/eclipse/debug/tests/console/IOConsoleTests.java:718
- The updated expectation (
verifyContentByOffset(..., 1)) suggests trimming may now leave an extra leading character (often a lone\nfrom splitting a CRLF delimiter) rather than trimming cleanly at a line boundary. It would be better for the implementation to avoid producing a leading empty line / partial delimiter and keep the test asserting content begins at offset 0; also consider adding a dedicated test that reproduces the reported bug (single output line longer than the high-water mark) so the fix is validated directly.
c.verifyContentByOffset("0123456789", 1);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
No idea, but an extra test wouldn't hurt, also for |
I assume it will, yes. Considering the split is at the start of the console contents (oldest output), is that a concern? Even normal text in the oldest line will likely lose meaning when we cut it off. I assume the limit is meant to keep the latest important console output and drop the rest. |
I wonder if current code "properly" deals with Unicode surrogate pairs and |
|
I'm not so much concerned about the meaning of the text; obviously arbitrarily clipped text loses contextual meaning. It's the rendering that's a problem. A 1/2 surrogate is invalid and cannot be properly rendered. Yes, it's a corner case, so you might not care, but many languages (e.g., Chinese and Japanese) heavily use the surrogate pairs and they will likely notice such corner cases frequently whereas we might only notice if an emoji is corrupted. Anyway I didn't create a blocking review. Use your own good judgement. You might use What's actually sort of worse is that folks are very likely to expect that the cutoff number is the number of characters and if you only count unicode UTF-8 characters where a pair often represent a codepoint which is a character the user actually sees, the user might be surprised that the number does not represent visible characters. Again these corners you might ignore and are easier to ignore when your language has a simpler alphabet... |
Currently the code doesnt trim within a line. So I dont think there are problems with Windows line ends on |
I can try this tomorrow, it might not make things that more complicated.
The code on |
See e.g.: On So IMO if this is a concern, best to open a new ticket. |
This seems to be a big problem actually. I have this snippet: I lower the console limit to 1000 (the lowest possible value). With the change here, I see no symbols in the console. Splitting the smiley somehow breaks the output, the console becomes empty. Maybe the underlying widget text support on Linux breaks... Not only that, but for the smiley This particular smiley is composed of So apparently we must check not only the character at which we want to truncate, but also the one before (if any)... I think if either return And this is complicated enough to need a test... For the |
|
Yes, it's kind of horrible. Java characters are a UTF-16 encoding of Unicode which is a 32 bit encoding that in java is a codepoint,i.e., an int; the UTF-8 encoding is bytes. If a Java character is a isLowSurrogate, you should truncate before that character, i.e., the character should not be at the end of the string. If it is a isHighSurrogate, you should truncate after that character, i.e., the character should not be at the start of the string.. Keep in mind that these things come in pairs so it's mostly a matter increasing or decreasing the cutoff point by 1 to not be between the two characters of the pair. And yes, one concern is that if the encoding is invalid that some tools or views might have a problems processing the content, certainly all will have problems rendering the content given it's not a meaningful unicode codepoint. Anyway, thanks for investigating! 🏆 |
|
Ah, I see. I'll try using |
Seems to work as expected. I still need to add tests. |
Fixes: #2465