@dataroaring
Contributor

Summary

  • For S3 paths without wildcards (*, ?, [...]), use HEAD requests instead of ListObjectsV2 to avoid requiring s3:ListBucket permission
  • Brace patterns like {1..10} are expanded to concrete file paths and verified individually with HEAD requests
  • This enables loading data from S3 when only s3:GetObject permission is granted

Motivation

S3 ListBucket permission is often more restricted than GetObject in enterprise environments. When users specify exact file paths or deterministic patterns like file{1..3}.csv, listing is unnecessary since the file names can be determined from the input.

Changes

| File | Description |
|------|-------------|
| S3Util.java | Added isDeterministicPattern() to detect paths without wildcards, and expandBracePatterns() to expand brace patterns to concrete paths |
| S3ObjStorage.java | Modified globListInternal() to use HEAD requests for deterministic paths |
| S3UtilTest.java | Added unit tests for the new utility methods |
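A minimal sketch of the kind of check isDeterministicPattern() is described as performing, assuming a path counts as deterministic exactly when it contains none of the glob characters *, ? or [ (brace groups are allowed because they expand to a known, finite set of keys). The class name DeterministicPathSketch is hypothetical and this is not the patch's actual code; escape sequences are deliberately ignored.

```java
/** Hypothetical sketch; not the actual Doris S3Util.isDeterministicPattern(). */
final class DeterministicPathSketch {
    private DeterministicPathSketch() {}

    /**
     * Returns true when the path has no glob wildcard ('*', '?', or a '['
     * character class). Brace groups such as {1..3} are allowed because they
     * expand to a fixed list of keys. Assumption: backslash escapes are not handled.
     */
    public static boolean isDeterministicPattern(String path) {
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '*' || c == '?' || c == '[') {
                return false; // key names cannot be known without listing
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isDeterministicPattern("s3://bucket/data/file.csv"));       // true
        System.out.println(isDeterministicPattern("s3://bucket/data/file{1..3}.csv")); // true
        System.out.println(isDeterministicPattern("s3://bucket/data/*.csv"));          // false
    }
}
```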

Examples

| Path | Deterministic? | Behavior |
|------|----------------|----------|
| s3://bucket/data/file.csv | ✅ Yes | Single HEAD request |
| s3://bucket/data/file{1..3}.csv | ✅ Yes | 3 HEAD requests |
| s3://bucket/data/*.csv | ❌ No | Falls back to LIST |
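The brace expansion behind the second row can be sketched as follows, assuming comma lists ({a,b}) and ascending inclusive integer ranges ({1..3}), with no nested braces or escapes. BraceExpansionSketch is a hypothetical name; this is an illustration of the technique, not the PR's expandBracePatterns().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of brace expansion; not the actual Doris S3Util.expandBracePatterns(). */
final class BraceExpansionSketch {
    private BraceExpansionSketch() {}

    /**
     * Expands the first {...} group and recurses on each result, so several
     * groups produce their cartesian product. Assumptions: no nested braces,
     * no escapes, integer ranges only, and zero padding in ranges such as
     * {01..03} is not preserved.
     */
    public static List<String> expandBracePatterns(String path) {
        List<String> out = new ArrayList<>();
        int open = path.indexOf('{');
        int close = open < 0 ? -1 : path.indexOf('}', open);
        if (open < 0 || close < 0) {
            out.add(path); // no complete brace group left: the path is concrete
            return out;
        }
        String prefix = path.substring(0, open);
        String body = path.substring(open + 1, close);
        String suffix = path.substring(close + 1);
        for (String alt : alternatives(body)) {
            out.addAll(expandBracePatterns(prefix + alt + suffix));
        }
        return out;
    }

    private static List<String> alternatives(String body) {
        List<String> alts = new ArrayList<>();
        int dots = body.indexOf("..");
        if (dots > 0 && body.indexOf(',') < 0) {
            // Inclusive integer range, e.g. "1..3" -> "1", "2", "3"
            int start = Integer.parseInt(body.substring(0, dots).trim());
            int end = Integer.parseInt(body.substring(dots + 2).trim());
            for (int i = start; i <= end; i++) {
                alts.add(Integer.toString(i));
            }
        } else {
            // Comma-separated list, e.g. "2023,2024"
            for (String part : body.split(",", -1)) {
                alts.add(part);
            }
        }
        return alts;
    }

    public static void main(String[] args) {
        // Prints s3://bucket/data/file1.csv, file2.csv and file3.csv, one per line.
        for (String p : expandBracePatterns("s3://bucket/data/file{1..3}.csv")) {
            System.out.println(p);
        }
    }
}
```

Expanding one group at a time and recursing keeps the code short and yields the groups' cartesian product in left-to-right order, so data/year{2023,2024}/month{01,02}/file.csv produces four paths.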

Test Plan

  • Added unit tests for isDeterministicPattern()
  • Added unit tests for expandBracePatterns()
  • Manual testing with S3 TVF and Broker Load

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings February 1, 2026 20:44
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring force-pushed the optimize_s3_skip_list_for_deterministic_paths branch 4 times, most recently from b5f5124 to 82c3832 on February 1, 2026 20:54

Copilot AI left a comment

Pull request overview

This PR optimizes S3 file access by using HEAD requests instead of LIST operations for deterministic file paths (paths without wildcards like *, ?, [...]). This enables loading data from S3 when only s3:GetObject permission is granted, without requiring s3:ListBucket permission.

Changes:

  • Added utility methods to detect deterministic patterns and expand brace patterns
  • Modified S3 object listing logic to use HEAD requests for deterministic paths
  • Added comprehensive unit tests for the new utility methods

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
|------|-------------|
| fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java | Added isDeterministicPattern() to detect paths without wildcards, and expandBracePatterns() with helper methods to expand brace patterns to concrete paths |
| fe/fe-core/src/main/java/org/apache/doris/fs/obj/S3ObjStorage.java | Modified globListInternal() to use HEAD requests instead of LIST operations for deterministic paths when no limits or startFile are specified |
| fe/fe-core/src/test/java/org/apache/doris/common/util/S3UtilTest.java | Added comprehensive unit tests for the isDeterministicPattern() and expandBracePatterns() methods |
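The HEAD-versus-LIST choice described for globListInternal() can be sketched as below. Everything here is an assumption for illustration: the class HeadOrListSketch, the parameters maxFiles and startFile, and the three callbacks (expandBraces, headExists standing in for one S3 HeadObject call, listByGlob standing in for the existing ListObjectsV2 path) are hypothetical and are not the Doris or AWS SDK API. How the actual patch treats a key whose HEAD fails is not shown in this PR page; the sketch simply skips it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of the HEAD-vs-LIST decision; not the actual S3ObjStorage code. */
final class HeadOrListSketch {
    private HeadOrListSketch() {}

    /**
     * @param pattern      object key pattern, possibly containing brace groups
     * @param maxFiles     hypothetical listing limit; values <= 0 mean "no limit"
     * @param startFile    hypothetical resume marker; null or empty means "none"
     * @param expandBraces turns a brace pattern into its concrete keys
     * @param headExists   stand-in for one HeadObject call: true if the key exists
     * @param listByGlob   stand-in for the existing ListObjectsV2-based glob listing
     */
    public static List<String> resolveKeys(String pattern, long maxFiles, String startFile,
            Function<String, List<String>> expandBraces,
            Predicate<String> headExists,
            Function<String, List<String>> listByGlob) {
        boolean hasWildcard = pattern.indexOf('*') >= 0
                || pattern.indexOf('?') >= 0
                || pattern.indexOf('[') >= 0;
        boolean paged = maxFiles > 0 || (startFile != null && !startFile.isEmpty());
        if (hasWildcard || paged) {
            // Key names are not known up front, or paging was requested:
            // fall back to LIST, which requires s3:ListBucket.
            return listByGlob.apply(pattern);
        }
        List<String> found = new ArrayList<>();
        for (String key : expandBraces.apply(pattern)) {
            // One HEAD per concrete key; only s3:GetObject is required.
            if (headExists.test(key)) {
                found.add(key); // keys whose HEAD fails are skipped in this sketch
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> bucket = Set.of("data/file1.csv", "data/file3.csv"); // fake bucket contents
        List<String> keys = resolveKeys("data/file{1..3}.csv", 0, null,
                p -> List.of("data/file1.csv", "data/file2.csv", "data/file3.csv"), // stub expander
                bucket::contains,
                p -> { throw new IllegalStateException("LIST should not be needed"); });
        System.out.println(keys); // [data/file1.csv, data/file3.csv]
    }
}
```

One caveat when wiring headExists to a real HeadObject call: per the S3 HeadObject documentation, a request for a missing key returns 403 rather than 404 when the caller lacks s3:ListBucket, so in exactly the permission setup this PR targets, "not found" and "access denied" look the same.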
Comments suppressed due to low confidence (1)

fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java:62

  • Duplicate imports detected. Lines 57-58 import java.util.ArrayList and java.util.List, but lines 60-62 duplicate these imports. Remove the duplicate imports on lines 60 and 62.
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public void testExpandBracePatterns_withPath() {
// Full path with braces
List<String> result = S3Util.expandBracePatterns("data/year{2023,2024}/month{01,02}/file.csv");
Assert.assertEquals(8, result.size());

Copilot AI Feb 1, 2026

The expected result size is incorrect. The pattern "data/year{2023,2024}/month{01,02}/file.csv" should expand to 4 paths (2 years × 2 months), not 8. The correct expansion should produce:

  1. data/year2023/month01/file.csv
  2. data/year2023/month02/file.csv
  3. data/year2024/month01/file.csv
  4. data/year2024/month02/file.csv

Change the expected size from 8 to 4.

Suggested change
- Assert.assertEquals(8, result.size());
+ Assert.assertEquals(4, result.size());
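The count in this comment follows a general rule, assuming each brace group expands independently of the others: the number of concrete paths, and therefore of HEAD requests, is the product of the group sizes, where a comma list contributes one alternative per item and an inclusive range {a..b} contributes b - a + 1.

$$N = \prod_{i=1}^{k} n_i, \qquad \text{year}\{2023,2024\}/\text{month}\{01,02\}: \; N = 2 \times 2 = 4, \qquad \text{file}\{1..3\}: \; N = 3 - 1 + 1 = 3$$

An expected size of 8 would require three two-way groups, which this path does not have; by the same rule, a pattern such as file{1..10}.csv costs 10 HEAD requests.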

@dataroaring force-pushed the optimize_s3_skip_list_for_deterministic_paths branch 4 times, most recently from db904e5 to 77f8a42 on February 1, 2026 20:59
…uests

For S3 paths without wildcards (*, ?, [...]), use HEAD requests instead
of ListObjectsV2 to avoid requiring s3:ListBucket permission. This is
useful when only s3:GetObject permission is granted.

Brace patterns like {1..10} are expanded to concrete file paths and
verified individually with HEAD requests.
@dataroaring force-pushed the optimize_s3_skip_list_for_deterministic_paths branch from 77f8a42 to 8f528c8 on February 2, 2026 01:55

3 participants