-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[opt](s3) Skip S3 listing for deterministic file paths using HEAD requests #60414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
b5f5124 to
82c3832
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes S3 file access by using HEAD requests instead of LIST operations for deterministic file paths (paths without wildcards like *, ?, [...]). This enables loading data from S3 when only s3:GetObject permission is granted, without requiring s3:ListBucket permission.
Changes:
- Added utility methods to detect deterministic patterns and expand brace patterns
- Modified S3 object listing logic to use HEAD requests for deterministic paths
- Added comprehensive unit tests for the new utility methods
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java |
Added isDeterministicPattern() to detect paths without wildcards, and expandBracePatterns() with helper methods to expand brace patterns to concrete paths |
fe/fe-core/src/main/java/org/apache/doris/fs/obj/S3ObjStorage.java |
Modified globListInternal() to use HEAD requests instead of LIST operations for deterministic paths when no limits or startFile are specified |
fe/fe-core/src/test/java/org/apache/doris/common/util/S3UtilTest.java |
Added comprehensive unit tests for isDeterministicPattern() and expandBracePatterns() methods |
Comments suppressed due to low confidence (1)
fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java:62
- Duplicate imports detected. Lines 57-58 import
java.util.ArrayListandjava.util.List, but lines 60-62 duplicate these imports. Remove the duplicate imports on lines 60 and 62.
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| public void testExpandBracePatterns_withPath() { | ||
| // Full path with braces | ||
| List<String> result = S3Util.expandBracePatterns("data/year{2023,2024}/month{01,02}/file.csv"); | ||
| Assert.assertEquals(8, result.size()); |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expected result size is incorrect. The pattern "data/year{2023,2024}/month{01,02}/file.csv" should expand to 4 paths (2 years × 2 months), not 8. The correct expansion should produce:
data/year2023/month01/file.csvdata/year2023/month02/file.csvdata/year2024/month01/file.csvdata/year2024/month02/file.csv
Change the expected size from 8 to 4.
| Assert.assertEquals(8, result.size()); | |
| Assert.assertEquals(4, result.size()); |
db904e5 to
77f8a42
Compare
…uests
For S3 paths without wildcards (*, ?, [...]), use HEAD requests instead
of ListObjectsV2 to avoid requiring s3:ListBucket permission. This is
useful when only s3:GetObject permission is granted.
Brace patterns like {1..10} are expanded to concrete file paths and
verified individually with HEAD requests.
77f8a42 to
8f528c8
Compare
Summary
*,?,[...]), use HEAD requests instead of ListObjectsV2 to avoid requirings3:ListBucketpermission{1..10}are expanded to concrete file paths and verified individually with HEAD requestss3:GetObjectpermission is grantedMotivation
S3
ListBucketpermission is often more restricted thanGetObjectin enterprise environments. When users specify exact file paths or deterministic patterns likefile{1..3}.csv, listing is unnecessary since the file names can be determined from the input.Changes
S3Util.javaisDeterministicPattern()to detect paths without wildcards, andexpandBracePatterns()to expand brace patterns to concrete pathsS3ObjStorage.javaglobListInternal()to use HEAD requests for deterministic pathsS3UtilTest.javaExamples
s3://bucket/data/file.csvs3://bucket/data/file{1..3}.csvs3://bucket/data/*.csvTest Plan
isDeterministicPattern()expandBracePatterns()🤖 Generated with Claude Code