diff --git a/CLAUDE.md b/CLAUDE.md index ffe399f..7ac519e 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -250,6 +250,14 @@ Retry logic for GitHub API rate limiting: 10 attempts with linear additive backo `AuthHeaderCache` provides thread-safe per-origin auth header caching (modeled after `RateLimitCoordinator`). `get_origin()` normalizes URL variants (GitHub web/raw/API, GitLab web/API) to a single cache key. A single instance is created in `main()` and shared across both the validation phase (`FileValidator`) and all download functions, so auth discovered during validation is reused during downloads. The first request per origin probes unauthenticated; on auth resolution, subsequent requests for the same origin skip the probe. `resolve_and_cache()` uses double-checked locking for thread safety. `_fetch_url_core()` (DRY refactor of the former near-duplicate `fetch_url_with_auth`/`fetch_url_bytes_with_auth`) implements auth resolution priority: explicit `auth_headers` param > `auth_cache` hit > unauthenticated probe with cache population. GitHub API transport headers (`Accept: application/vnd.github.raw+json`, `X-GitHub-Api-Version`) are managed independently from cached auth headers via `github_api_headers` dict and `_build_request()` helper in `_fetch_url_core()`, applied to every `Request` construction in `_do_fetch()` when `'api.github.com' in url`. +**Lazy-auth invariant (validation + download parity):** Both `FileValidator.validate_remote_url` and `_fetch_url_core._do_fetch` MUST honor the "try unauthenticated first; escalate to authentication only on HTTP 401, 403, or 404" contract. Non-auth failures (SSL, DNS, 5xx, timeout, 416 Range Not Satisfiable) MUST NOT trigger auth prompts. Successful unauthenticated probes cache a `None` sentinel in `AuthHeaderCache` for downstream consumers. When `auth_cache` is available, route escalation through `AuthHeaderCache.resolve_and_cache(url)` (double-checked locking) to serialize prompts across parallel validation threads. Helper functions `_check_with_head` and `_check_with_range` return `tuple[bool, int | None]` where the second element is the HTTP status code (or `None` for non-HTTP failures) used to gate the escalation decision. + +**Authentication SRP:** `get_auth_headers(url, auth_param)` is a thin orchestrator that delegates to `resolve_credentials(url, auth_param)` (pure, non-interactive -- reads CLI param + env vars only) and then escalates to `prompt_for_credentials(url, tokens_checked=[...])` (interactive, guarded by `sys.stdin.isatty()`). The split is transparent to existing callers (`get_auth_headers` preserves its original signature and behavior), but enables non-interactive callers to opt into credential resolution without risking user prompts. The interactive prompt uses the wording `Authentication required for {url}` (both the `warning(...)` on interactive terminals and the `info(...)` fallback on non-interactive terminals emit this URL-qualified message); the earlier `Private {RepoType} repository detected but no authentication found` wording has been removed. + +**Repository type classification (`detect_repo_type`):** Uses hostname-based matching via `urllib.parse.urlparse`. GitHub Pages URLs (hostname ending in `.github.io`) are explicitly classified as `None` (not `'github'`) because GitHub Pages sites are static HTTP hosts with no repository auth model. `github.com`, `raw.githubusercontent.com`, and `api.github.com` are classified as `'github'`. GitLab classification uses both hostname substring (`'gitlab' in host`) and the API path marker (`/api/v4/projects/`). + +**GitHub 404 disambiguation (UX polish):** When a GitHub URL returns HTTP 404 during validation, before escalating to an auth prompt, the validator probes `GET https://api.github.com/repos/{owner}/{repo}` unauthenticated. If the repo endpoint returns 200, the original 404 is classified as a **genuine missing file** (typo) and the auth prompt is skipped (returns failure directly). If the repo endpoint returns 404 (private-hidden or nonexistent -- GitHub does not distinguish), the validator conservatively escalates to the auth prompt. Rate-limiting or network errors from the repo probe also trigger conservative escalation. The `/contents/{path}` endpoint intentionally cannot distinguish private-hidden from missing, which is why `/repos/{owner}/{repo}` is used instead. + ### Hooks Configuration Structure Hooks support four types matching the official Claude Code hooks specification: diff --git a/docs/environment-configuration-guide.md b/docs/environment-configuration-guide.md index 5246ec2..258577b 100644 --- a/docs/environment-configuration-guide.md +++ b/docs/environment-configuration-guide.md @@ -1533,6 +1533,8 @@ Authentication is resolved in this order (highest priority first): - GitLab web URLs (`/-/raw/`) are automatically converted to API format - GitHub raw URLs are automatically converted to API URLs - Public access is attempted first; authentication is applied only on 401/403/404 responses +- GitHub Pages URLs (`*.github.io`) are never treated as repository URLs -- no auth prompt is issued for them even on 404 responses +- For GitHub 404 responses, the setup script probes `api.github.com/repos/{owner}/{repo}` unauthenticated to distinguish missing files in public repositories (no auth prompt) from private or nonexistent repositories (auth prompt) ### Automatic Auto-Update Management diff --git a/scripts/setup_environment.py b/scripts/setup_environment.py index 7fffc83..dcd9af7 100644 --- a/scripts/setup_environment.py +++ b/scripts/setup_environment.py @@ -3036,19 +3036,45 @@ def __init__(self, auth_param: str | None = None, auth_cache: 'AuthHeaderCache | self._validation_results: list[tuple[str, str, bool, str]] = [] def validate_remote_url(self, url: str) -> tuple[bool, str]: - """Validate a remote URL with per-URL authentication. - - Generates authentication headers specific to this URL (GitHub vs GitLab) - and attempts validation using HEAD request, then Range request as fallback. - When an auth_cache is provided, uses cached headers to avoid redundant - resolution and populates the cache for downstream consumers. + """Validate a remote URL, escalating to authentication only on 401/403/404. + + Implements the "try unauthenticated first; escalate only on HTTP + 401/403/404" contract, mirroring the download path in _fetch_url_core. + Non-authentication failures (SSL, DNS, 5xx, 416 Range Not Satisfiable, + timeouts) do NOT trigger authentication prompts. + + Flow: + Phase 1: GitLab URL conversion (web -> API) for raw URLs. + Phase 2: Cache lookup -- reuse cached headers if the origin has + been probed before (None sentinel = known public origin, + dict = resolved authentication). + Phase 3: Initial probe using unauthenticated headers (or cached + headers when available). HEAD first, Range as fallback. + On success, populate the cache with the headers that + worked (None on a fresh unauthenticated success). + Phase 4: Auth escalation -- triggered ONLY when the last probe's + HTTP code is in (401, 403, 404). For 404 on GitHub URLs, + first probe api.github.com/repos/{owner}/{repo}: if the + repo is confirmed public, the 404 is a genuine missing + file and no auth prompt is needed. Otherwise (private, + nonexistent, or rate-limited), fall through to normal + escalation. Uses AuthHeaderCache.resolve_and_cache(url) + when a cache is available (double-checked locking + serializes prompts across parallel validation threads); + otherwise calls get_auth_headers(url, self.auth_param) + directly. resolve_and_cache returns {} (not None) when + no credentials are available -- treat this as terminal + and do not retry. Executes at most once per call. + Phase 5: Non-auth failure -- return (False, 'None') without + invoking get_auth_headers on 5xx, SSL, DNS, 416, or + None codes. Args: - url: Remote URL to validate + url: Remote URL to validate. Returns: - Tuple of (is_valid, method_used) - method_used is 'HEAD', 'Range', or 'None' + Tuple of (is_valid, method_used). + method_used is 'HEAD' or 'Range' on success, 'None' on failure. """ # ARCHITECTURAL NOTE: URL domain asymmetry between validation and download # @@ -3062,33 +3088,107 @@ def validate_remote_url(self, url: str) -> tuple[bool, str]: # cache key (github.com/owner/repo), so auth cached during validation # is correctly reused during download. # - # Convert GitLab web URLs to API format + # Phase 1: Convert GitLab web URLs to API format original_url = url if detect_repo_type(url) == 'gitlab' and '/-/raw/' in url: url = convert_gitlab_url_to_api(url) if url != original_url: info(f'Using API URL for validation: {url}') - # Generate auth headers for THIS specific URL, using cache when available + # Phase 2: Cache lookup. is_cached=True means the origin has already + # been probed; cached_headers may be None (public sentinel) or a dict + # (resolved authentication headers). + initial_headers: dict[str, str] | None = None + origin_cached = False if self.auth_cache is not None: is_cached, cached_headers = self.auth_cache.get_cached_headers(url) if is_cached: - auth_headers = cached_headers - else: - auth_headers = get_auth_headers(url, self.auth_param) - # Cache the result for the download phase - self.auth_cache.cache_headers(url, auth_headers or None) - else: - auth_headers = get_auth_headers(url, self.auth_param) - - # Try HEAD request first - if self._check_with_head(url, auth_headers): + initial_headers = cached_headers + origin_cached = True + + # Phase 3: Initial probe -- unauthenticated (or with cached headers). + head_valid, head_code = self._check_with_head(url, initial_headers) + if head_valid: + if self.auth_cache is not None and not origin_cached: + self.auth_cache.cache_headers(url, initial_headers) return (True, 'HEAD') - # Fallback to Range request - if self._check_with_range(url, auth_headers): + range_valid, range_code = self._check_with_range(url, initial_headers) + if range_valid: + if self.auth_cache is not None and not origin_cached: + self.auth_cache.cache_headers(url, initial_headers) return (True, 'Range') + # Phase 4: Auth escalation -- ONLY on 401/403/404 from the last probe. + # Prefer the Range code (last attempt); fall back to HEAD's code when + # Range returned None (e.g., a non-HTTP error). + last_http_code = range_code if range_code is not None else head_code + + if last_http_code in (401, 403, 404): + # If we already used cached auth headers (non-empty dict) and still + # got 401/403/404, the authentication failed -- re-prompting would + # loop. + if initial_headers: + return (False, 'None') + + # 404 disambiguation for GitHub URLs: probe api.github.com/repos/ + # {owner}/{repo} unauthenticated. If the repo is confirmed public, + # the original 404 is a genuine missing file (typo) and no auth + # prompt is needed. Other outcomes (private, nonexistent, + # rate-limited, network failure) fall through to normal escalation + # (conservative: prompt for auth). + if last_http_code == 404 and detect_repo_type(url) == 'github': + owner_repo = _extract_github_owner_repo(url) + if owner_repo is not None: + owner, repo = owner_repo + repo_public = _github_repo_is_public(owner, repo) + if repo_public is True: + info( + f'GitHub repo {owner}/{repo} is public; ' + f'404 indicates missing file (not auth issue): {url}', + ) + return (False, 'None') + if repo_public is None: + warning( + f'Could not verify public status of {owner}/{repo}; ' + f'proceeding with auth prompt', + ) + + # Resolve authentication. When a cache is available, route through + # resolve_and_cache for thread-safe double-checked locking across + # parallel validation threads. Note: resolve_and_cache returns {} + # (empty dict), not None, when no credentials are available from + # any source -- treat this as terminal. + if self.auth_cache is not None: + resolved_headers = self.auth_cache.resolve_and_cache(url) + if not resolved_headers: + return (False, 'None') + auth_headers: dict[str, str] | None = resolved_headers + else: + auth_headers = get_auth_headers(url, self.auth_param) + if not auth_headers: + return (False, 'None') + + # Retry probe with resolved auth headers (at most once). + head_valid, _head_code = self._check_with_head(url, auth_headers) + if head_valid: + if self.auth_cache is not None: + # resolve_and_cache already populated the cache; this + # defensive re-cache is a no-op in that case but keeps + # the non-cache path correct. + self.auth_cache.cache_headers(url, auth_headers) + return (True, 'HEAD') + + range_valid, _range_code = self._check_with_range(url, auth_headers) + if range_valid: + if self.auth_cache is not None: + self.auth_cache.cache_headers(url, auth_headers) + return (True, 'Range') + + # Authenticated retry also failed. + return (False, 'None') + + # Phase 5: Non-auth failure (5xx, SSL, DNS, 416, None). Do NOT prompt. return (False, 'None') def validate_local_path(self, path: str) -> tuple[bool, str]: @@ -3119,7 +3219,7 @@ def validate(self, url_or_path: str, is_remote: bool) -> tuple[bool, str]: return self.validate_remote_url(url_or_path) return self.validate_local_path(url_or_path) - def _check_with_head(self, url: str, auth_headers: dict[str, str] | None) -> bool: + def _check_with_head(self, url: str, auth_headers: dict[str, str] | None) -> tuple[bool, int | None]: """Check URL availability using HEAD request. Args: @@ -3127,7 +3227,15 @@ def _check_with_head(self, url: str, auth_headers: dict[str, str] | None) -> boo auth_headers: Optional authentication headers Returns: - True if file is accessible, False otherwise + Tuple of (is_valid, http_code). + is_valid=True with http_code=200 on success (including after SSL fallback). + is_valid=False with http_code= on HTTP errors (401/403/404/416/5xx). + is_valid=False with http_code=None on non-HTTP errors (SSL without fallback, + DNS, URL errors, or other exceptions). + + The http_code is used by callers to decide whether to escalate to + authentication (401/403/404) versus treating the failure as non-auth + (5xx, None, etc.). """ try: request = Request(url, method='HEAD') @@ -3137,20 +3245,29 @@ def _check_with_head(self, url: str, auth_headers: dict[str, str] | None) -> boo try: response = urlopen(request) - return bool(response.status == 200) + status = response.status + return (bool(status == 200), status) + except urllib.error.HTTPError as http_err: + return (False, http_err.code) except urllib.error.URLError as e: if 'SSL' in str(e) or 'certificate' in str(e).lower(): # Try with unverified SSL context ctx = ssl.create_default_context() ctx.check_hostname = False ctx.verify_mode = ssl.CERT_NONE - response = urlopen(request, context=ctx) - return bool(response.status == 200) - return False - except (urllib.error.HTTPError, Exception): - return False + try: + response = urlopen(request, context=ctx) + status = response.status + return (bool(status == 200), status) + except urllib.error.HTTPError as http_err: + return (False, http_err.code) + except Exception: + return (False, None) + return (False, None) + except Exception: + return (False, None) - def _check_with_range(self, url: str, auth_headers: dict[str, str] | None) -> bool: + def _check_with_range(self, url: str, auth_headers: dict[str, str] | None) -> tuple[bool, int | None]: """Check URL availability using Range request. Args: @@ -3158,7 +3275,16 @@ def _check_with_range(self, url: str, auth_headers: dict[str, str] | None) -> bo auth_headers: Optional authentication headers Returns: - True if file is accessible, False otherwise + Tuple of (is_valid, http_code). + is_valid=True with http_code=200 (full content) or 206 (partial content) + on success (including after SSL fallback). + is_valid=False with http_code= on HTTP errors (including 416 + Range Not Satisfiable, which is a non-auth-related failure). + is_valid=False with http_code=None on non-HTTP errors. + + The http_code is used by callers to decide whether to escalate to + authentication (401/403/404) versus treating the failure as non-auth + (5xx, 416, None, etc.). """ try: request = Request(url) @@ -3169,19 +3295,29 @@ def _check_with_range(self, url: str, auth_headers: dict[str, str] | None) -> bo try: response = urlopen(request) - # Accept both 200 (full content) and 206 (partial content) - return response.status in (200, 206) + status = response.status + is_valid = status in (200, 206) + return (is_valid, status) + except urllib.error.HTTPError as http_err: + return (False, http_err.code) except urllib.error.URLError as e: if 'SSL' in str(e) or 'certificate' in str(e).lower(): # Try with unverified SSL context ctx = ssl.create_default_context() ctx.check_hostname = False ctx.verify_mode = ssl.CERT_NONE - response = urlopen(request, context=ctx) - return response.status in (200, 206) - return False - except (urllib.error.HTTPError, Exception): - return False + try: + response = urlopen(request, context=ctx) + status = response.status + is_valid = status in (200, 206) + return (is_valid, status) + except urllib.error.HTTPError as http_err: + return (False, http_err.code) + except Exception: + return (False, None) + return (False, None) + except Exception: + return (False, None) @property def results(self) -> list[tuple[str, str, bool, str]]: @@ -3450,25 +3586,51 @@ def is_binary_file(file_path: str | Path) -> bool: def detect_repo_type(url: str) -> str | None: - """Detect the repository type from URL. + """Detect the repository type from URL using hostname-based classification. + + Uses urllib.parse.urlparse to extract the hostname and classify the URL by + exact host match, host suffix, or URL path marker. GitHub Pages URLs + (hostname ending in .github.io) are explicitly excluded from the 'github' + classification because Pages are static HTTP hosts with no repository + auth model. + + Args: + url: URL to classify. Returns: - 'gitlab' for GitLab URLs - 'github' for GitHub URLs - None for other URLs + 'gitlab' for gitlab.com, self-hosted *.gitlab.* hosts, or URLs using + the /api/v4/projects/ path marker. + 'github' for github.com, *.github.com (including api.github.com), and + raw.githubusercontent.com. Does NOT include *.github.io (Pages). + 'bitbucket' for hosts containing 'bitbucket'. + None for all other URLs, including GitHub Pages URLs, gist.githubusercontent.com, + and unparseable URLs. """ - url_lower = url.lower() + from urllib.parse import urlparse - # GitLab detection (including self-hosted) - if 'gitlab' in url_lower or '/api/v4/projects/' in url: - return 'gitlab' + try: + parsed = urlparse(url) + except ValueError: + return None + + host = (parsed.hostname or '').lower() - # GitHub detection - includes raw.githubusercontent.com - if 'github.com' in url_lower or 'api.github.com' in url_lower or 'raw.githubusercontent.com' in url_lower: + # GitHub Pages: .github.io or bare github.io. + # Excluded from 'github' because Pages sites use no repository auth model. + if host == 'github.io' or host.endswith('.github.io'): + return None + + # GitHub source hosts: github.com, api.github.com, raw.githubusercontent.com. + if host == 'github.com' or host.endswith('.github.com') or host == 'raw.githubusercontent.com': return 'github' - # Bitbucket detection (future expansion) - if 'bitbucket' in url_lower: + # GitLab: hostname substring match (covers gitlab.com and self-hosted) OR + # API path marker (covers any host exposing the GitLab REST API). + if 'gitlab' in host or '/api/v4/projects/' in url: + return 'gitlab' + + # Bitbucket: hostname substring match (future expansion; preserves prior behavior). + if 'bitbucket' in host: return 'bitbucket' return None @@ -3684,79 +3846,233 @@ def convert_github_raw_to_api(url: str) -> str: return url -def get_auth_headers(url: str, auth_param: str | None = None) -> dict[str, str]: - """Get authentication headers using multiple fallback methods. +def _extract_github_owner_repo(url: str) -> tuple[str, str] | None: + """Extract (owner, repo) from any GitHub URL variant. - Precedence order: - 1. Command-line --auth parameter - 2. Environment variables (GITLAB_TOKEN, GITHUB_TOKEN, REPO_TOKEN) - 3. Auth config file (~/.claude/auth.yaml) - future expansion - 4. Interactive prompt (if terminal is interactive) + Supports github.com web URLs, raw.githubusercontent.com URLs, and + api.github.com Contents API URLs. Returns None for unparseable URLs, + non-GitHub hosts, GitHub Pages URLs, or URLs lacking owner/repo path + components. Args: - url: The URL to authenticate for - auth_param: Optional auth parameter in format "header:value" or "header=value" + url: Candidate GitHub URL. Returns: - Dictionary of headers to use for authentication + Tuple (owner, repo) for recognized GitHub URLs with at least + owner/repo path segments; None otherwise. + + Examples: + >>> _extract_github_owner_repo('https://github.com/owner/repo') + ('owner', 'repo') + >>> _extract_github_owner_repo('https://raw.githubusercontent.com/owner/repo/main/file.md') + ('owner', 'repo') + >>> _extract_github_owner_repo('https://api.github.com/repos/owner/repo/contents/x.md?ref=main') + ('owner', 'repo') + >>> _extract_github_owner_repo('https://github.io/x') is None + True + """ + try: + parsed = urllib.parse.urlparse(url) + except ValueError: + return None + + host = (parsed.hostname or '').lower() + parts = [p for p in parsed.path.split('/') if p] + + # api.github.com/repos/{owner}/{repo}/... + if host == 'api.github.com': + if len(parts) >= 3 and parts[0] == 'repos': + return (parts[1], parts[2]) + return None + + # raw.githubusercontent.com/{owner}/{repo}/{ref}/... + if host == 'raw.githubusercontent.com': + if len(parts) >= 2: + return (parts[0], parts[1]) + return None + + # github.com/{owner}/{repo}/... (web URL, supports tree/blob/etc subpaths) + if host == 'github.com' or host.endswith('.github.com'): + if len(parts) >= 2: + return (parts[0], parts[1]) + return None + + # GitHub Pages and all other hosts: not a source repo URL. + return None + + +def _github_repo_is_public(owner: str, repo: str, *, timeout: float = 5.0) -> bool | None: + """Probe api.github.com/repos/{owner}/{repo} unauthenticated to check repo visibility. + + Used to disambiguate a 404 response on a GitHub file URL: when the bare-repo + endpoint returns 200, the original 404 is a genuine missing file (typo) and + no auth prompt is needed. When the bare-repo endpoint returns 404, the + repo is either private (privacy-hidden by GitHub) or genuinely missing -- + in either case the caller should conservatively escalate to the auth + prompt (legitimate for the private case, an unavoidable papercut otherwise). + + Endpoint choice (api.github.com/repos/{owner}/{repo}): + GitHub intentionally returns 404 for both "private repo hidden" and + "missing file" on the /contents/{path} endpoint, making them + indistinguishable. The bare repo endpoint distinguishes: + 200 -> public repo exists -> a 404 on a file path is a genuine + typo/missing file + 404 -> ambiguous (private or nonexistent); the caller should + escalate to auth (legitimate for the private case) + + Args: + owner: GitHub repository owner (user or org). + repo: GitHub repository name. + timeout: Maximum seconds to wait for the probe HTTP call. + + Returns: + True if the repo is confirmed public-existing (HTTP 200). + False if the repo is private or does not exist (HTTP 404). + None on any other condition (rate-limit 403/429, network failure, + timeout, malformed response). Caller should treat None as + "unable to determine -- conservatively escalate". + + Note: + This helper deliberately bypasses _fetch_url_core and AuthHeaderCache + to avoid recursion (the validator may call this DURING auth resolution + for a different URL on the same origin). It uses a raw urllib Request + + urlopen with the documented GitHub REST API headers. + """ + api_url = f'https://api.github.com/repos/{owner}/{repo}' + request = Request(api_url) + request.add_header('Accept', 'application/vnd.github+json') + request.add_header('X-GitHub-Api-Version', '2022-11-28') + try: + with urlopen(request, timeout=timeout) as response: + return getattr(response, 'status', None) == 200 + except urllib.error.HTTPError as e: + if e.code == 404: + return False + return None + except (urllib.error.URLError, TimeoutError, OSError): + return None + + +def _env_tokens_checked_for_repo_type(repo_type: str | None) -> list[str]: + """Return the ordered list of environment variable names to check for a repo type. + + Used by resolve_credentials (to determine env var lookup order) and by + prompt_for_credentials (to inform the user which env vars were checked). + + Args: + repo_type: Value returned by detect_repo_type (e.g., 'github', 'gitlab', 'bitbucket', None). + + Returns: + Ordered list of env var names to check, repo-specific name first then REPO_TOKEN fallback. + Unknown or None repo types return just ['REPO_TOKEN']. + """ + if repo_type == 'gitlab': + return ['GITLAB_TOKEN', 'REPO_TOKEN'] + if repo_type == 'github': + return ['GITHUB_TOKEN', 'REPO_TOKEN'] + return ['REPO_TOKEN'] + + +def resolve_credentials(url: str, auth_param: str | None = None) -> dict[str, str]: + """Resolve authentication headers for a URL from non-interactive sources. + + Checks two sources in order: + 1. Command-line auth_param (format "header:value", "header=value", or bare token) + 2. Environment variables (GITLAB_TOKEN, GITHUB_TOKEN, REPO_TOKEN) + + This function is pure and non-interactive: it never prompts the user, never + emits 'Authentication required' warnings, and never waits for terminal input. + Callers that want interactive fallback should invoke prompt_for_credentials() afterward. + + Args: + url: URL being authenticated (used only for repo type detection). + auth_param: Optional auth parameter from command-line (format "header:value", + "header=value", or a bare token). + + Returns: + Dictionary of HTTP headers for authentication. Empty dict {} if no credentials + are resolvable from the non-interactive sources. """ repo_type = detect_repo_type(url) - # Helper function to build GitHub headers with Accept and API version + # Helper: build GitHub Authorization header with Bearer prefix. def build_github_headers(token: str) -> dict[str, str]: # Handle Bearer prefix - avoid duplication if already present auth_value = token if token.startswith('Bearer ') else f'Bearer {token}' return {'Authorization': auth_value} - # Method 1: Command-line parameter (highest priority) + # Method 1: Command-line parameter (highest priority). if auth_param: # Support both : and = as separators if ':' in auth_param: header_name, token = auth_param.split(':', 1) - elif '=' in auth_param: + info('Using authentication from command-line parameter') + return {header_name: token} + if '=' in auth_param: header_name, token = auth_param.split('=', 1) - else: - # Assume it's just a token, use default header based on repo type - token = auth_param - if repo_type == 'gitlab': - header_name = 'PRIVATE-TOKEN' - elif repo_type == 'github': - info('Using authentication from command-line parameter') - return build_github_headers(token) - else: - error('Cannot determine auth header type. Use format: --auth "header:value"') - return {} - - info('Using authentication from command-line parameter') - return {header_name: token} - - # Method 2: Environment variables - tokens_checked: list[str] = [] + info('Using authentication from command-line parameter') + return {header_name: token} + # Bare token: use default header based on repo type. + token = auth_param + if repo_type == 'gitlab': + info('Using authentication from command-line parameter') + return {'PRIVATE-TOKEN': token} + if repo_type == 'github': + info('Using authentication from command-line parameter') + return build_github_headers(token) + error('Cannot determine auth header type. Use format: --auth "header:value"') + return {} - # Check repo-specific tokens first + # Method 2: Environment variables. if repo_type == 'gitlab': env_token = os.environ.get('GITLAB_TOKEN') - tokens_checked.append('GITLAB_TOKEN') if env_token: return {'PRIVATE-TOKEN': env_token} elif repo_type == 'github': env_token = os.environ.get('GITHUB_TOKEN') - tokens_checked.append('GITHUB_TOKEN') if env_token: return build_github_headers(env_token) - # Check generic REPO_TOKEN as fallback + # REPO_TOKEN fallback (only applied for known repo types). env_token = os.environ.get('REPO_TOKEN') - tokens_checked.append('REPO_TOKEN') if env_token: if repo_type == 'gitlab': return {'PRIVATE-TOKEN': env_token} if repo_type == 'github': return build_github_headers(env_token) - # Method 3: Interactive prompt (only if repo type detected and terminal is interactive) + return {} + + +def prompt_for_credentials(url: str, *, tokens_checked: list[str]) -> dict[str, str]: + """Prompt the user interactively for authentication credentials. + + Guarded by sys.stdin.isatty(). In non-interactive terminals, emits an + informational message (if a repo type is detected) and returns {} without + prompting. In interactive terminals with a detected repo type, warns the + user that authentication was not resolved, asks whether the user wants to + enter a token, and collects the token via getpass. + + Args: + url: URL being authenticated (used for repo type detection + header shape). + tokens_checked: Ordered list of env var names already checked by + resolve_credentials, used to inform the user of the fallback chain. + + Returns: + Dictionary of HTTP headers for authentication. Empty dict {} if the + terminal is non-interactive, the repo type is unknown, the user + declines, or the user cancels with Ctrl+C / EOF. + """ + repo_type = detect_repo_type(url) + + # Helper: build GitHub Authorization header with Bearer prefix. + def build_github_headers(token: str) -> dict[str, str]: + # Handle Bearer prefix - avoid duplication if already present + auth_value = token if token.startswith('Bearer ') else f'Bearer {token}' + return {'Authorization': auth_value} + if repo_type and sys.stdin.isatty(): - warning(f'Private {repo_type.title()} repository detected but no authentication found') + warning(f'Authentication required for {url}') info(f"Checked environment variables: {', '.join(tokens_checked)}") info('You can provide authentication by:') info(f' 1. Setting environment variable: {tokens_checked[0]}') @@ -3778,12 +4094,43 @@ def build_github_headers(token: str) -> dict[str, str]: print() # New line after Ctrl+C elif repo_type: # Non-interactive terminal but auth might be needed - info(f'Private {repo_type.title()} repository detected') + info(f'Authentication required for {url}') info(f"If authentication is required, set one of: {', '.join(tokens_checked)}") return {} +def get_auth_headers(url: str, auth_param: str | None = None) -> dict[str, str]: + """Resolve authentication headers for a URL, with interactive fallback. + + Orchestrates two-stage credential resolution: + 1. resolve_credentials() -- pure, non-interactive (CLI param, env vars) + 2. prompt_for_credentials() -- interactive, guarded by sys.stdin.isatty() + + Preserves the public-API signature used by callers and test mocks. Callers + that want non-interactive-only resolution can call resolve_credentials() + directly without risking user prompts. + + Args: + url: The URL to authenticate for. + auth_param: Optional auth parameter in format "header:value" or + "header=value" or a bare token. + + Returns: + Dictionary of HTTP headers for authentication, or {} if no credentials + could be resolved from any source (non-interactive and interactive). + """ + headers = resolve_credentials(url, auth_param) + if headers: + return headers + + # No creds resolved via non-interactive sources; attempt interactive prompt. + # The prompt is itself guarded by sys.stdin.isatty() and detects repo_type internally. + repo_type = detect_repo_type(url) + tokens_checked = _env_tokens_checked_for_repo_type(repo_type) + return prompt_for_credentials(url, tokens_checked=tokens_checked) + + def derive_base_url(config_source: str) -> str: """Derive base URL from a configuration source URL. diff --git a/tests/e2e/test_public_skills_no_prompt.py b/tests/e2e/test_public_skills_no_prompt.py new file mode 100644 index 0000000..25d768f --- /dev/null +++ b/tests/e2e/test_public_skills_no_prompt.py @@ -0,0 +1,255 @@ +"""E2E regression guard: public GitHub skill URLs must not trigger an auth prompt. + +Replays the user's exact failing scenario from the original bug report -- a mini +Playwright CLI skills configuration consumed via setup_environment.validate_all_config_files. +Mocks HTTP responses so the test is deterministic and network-independent. +""" + +from __future__ import annotations + +import sys +import urllib.error +from email.message import Message +from pathlib import Path +from typing import Any +from unittest.mock import MagicMock + +import pytest +import yaml + +from scripts import setup_environment + +PLAYWRIGHT_SKILLS_BASE = 'https://github.com/microsoft/playwright-cli/tree/main/skills/playwright-cli' + +PLAYWRIGHT_SKILL_FILES = [ + 'SKILL.md', + 'references/element-attributes.md', + 'references/playwright-tests.md', + 'references/request-mocking.md', + 'references/running-code.md', + 'references/session-management.md', + 'references/storage-state.md', + 'references/test-generation.md', + 'references/tracing.md', + 'references/video-recording.md', +] + + +@pytest.fixture +def public_github_skill_config_yaml() -> str: + """Return the YAML text from the user's exact failing scenario.""" + files_yaml = '\n'.join(f' - {name}' for name in PLAYWRIGHT_SKILL_FILES) + return ( + 'name: Playwright CLI with Skills\n' + '\n' + 'install-nodejs: true\n' + '\n' + 'dependencies:\n' + ' common:\n' + ' - npm install -g @playwright/cli@latest\n' + '\n' + 'skills:\n' + ' - name: playwright-cli\n' + f' base: {PLAYWRIGHT_SKILLS_BASE}\n' + ' files:\n' + f'{files_yaml}\n' + ) + + +@pytest.fixture +def public_github_skill_config( + tmp_path: Path, + public_github_skill_config_yaml: str, +) -> tuple[dict[str, Any], Path]: + """Write the failing-scenario YAML to tmp_path and return (parsed_config, path).""" + config_path = tmp_path / 'add-playwright-cli.yaml' + config_path.write_text(public_github_skill_config_yaml, encoding='utf-8') + config = yaml.safe_load(public_github_skill_config_yaml) + assert isinstance(config, dict) + return (config, config_path) + + +class TestPublicSkillsNoPrompt: + """Replay the user's exact failing scenario from the original bug report.""" + + def test_public_skill_urls_validate_without_prompt( + self, + public_github_skill_config: tuple[dict[str, Any], Path], + e2e_isolated_home: dict[str, Path], + monkeypatch: pytest.MonkeyPatch, + capsys: pytest.CaptureFixture[str], + ) -> None: + """Public microsoft/playwright-cli skill URLs must validate without any auth prompt. + + The lazy-auth contract, 404 disambiguation, and wording changes must together + make the scenario succeed silently. Assertions: + + 1. validate_all_config_files returns all_valid=True with no FAIL results. + 2. get_auth_headers is never called (no credential resolution attempted). + 3. input() is never called (no interactive prompt). + 4. Neither the old warning wording ('Private Github repository detected...') + nor the new warning wording ('Authentication required for ...') appears + in stdout/stderr -- public URLs must be fully silent on auth matters. + 5. The y/N prompt string is absent from output. + + The test mocks all HTTP I/O so it is deterministic and offline-capable. + """ + del e2e_isolated_home # Fixture used only for isolation side effects. + + config, config_path = public_github_skill_config + + # Mock validation probes (HEAD + Range) to return 200 (public reachable). + mock_response = MagicMock() + mock_response.status = 200 + mock_response.__enter__ = MagicMock(return_value=mock_response) + mock_response.__exit__ = MagicMock(return_value=False) + + monkeypatch.setattr( + setup_environment, + 'urlopen', + MagicMock(return_value=mock_response), + ) + + # Simulate interactive terminal -- the user's failure happened in an + # interactive PowerShell session where the prompt appeared. + monkeypatch.setattr(sys.stdin, 'isatty', MagicMock(return_value=True)) + + # Guard: input() must NEVER be called for public URLs. + def _fail_on_input(*_args: Any, **_kwargs: Any) -> str: + raise AssertionError( + 'input() was called during public-URL validation -- ' + 'auth prompt must not fire for public GitHub repositories.', + ) + + monkeypatch.setattr('builtins.input', _fail_on_input) + + # Guard: get_auth_headers must NEVER be called for public URLs. + mock_get_auth = MagicMock( + side_effect=AssertionError( + 'get_auth_headers() was called during public-URL validation -- ' + 'credential resolution must not start for public GitHub repositories.', + ), + ) + monkeypatch.setattr(setup_environment, 'get_auth_headers', mock_get_auth) + + # Run the user's failing scenario verbatim. + all_valid, results = setup_environment.validate_all_config_files( + config, + str(config_path), + ) + + # Assertion 1: Validation succeeds for all skill files. + assert all_valid is True, f'Validation FAILED for public URLs: {results}' + skill_results = [r for r in results if r[0] == 'skill'] + assert len(skill_results) == len(PLAYWRIGHT_SKILL_FILES), ( + f'Expected {len(PLAYWRIGHT_SKILL_FILES)} skill results, got {len(skill_results)}: ' + f'{skill_results}' + ) + for file_type, _path, is_valid, method in skill_results: + assert file_type == 'skill' + assert is_valid is True + assert method in ('HEAD', 'Range') + + # Assertion 2: get_auth_headers was never called. + mock_get_auth.assert_not_called() + + # Assertions 3-5: No prompt wording, old or new. + captured = capsys.readouterr() + combined_output = captured.out + captured.err + forbidden_phrases = [ + 'Private Github repository detected', + 'Private GitHub repository detected', + 'Authentication required for', + 'Would you like to enter the token now', + 'Enter GitHub token', + 'Enter Github token', + ] + for phrase in forbidden_phrases: + assert phrase not in combined_output, ( + f'Forbidden phrase found in output for public URLs: {phrase!r}\n' + f'Captured output:\n{combined_output}' + ) + + def test_public_skill_urls_with_404_disambiguation( + self, + public_github_skill_config: tuple[dict[str, Any], Path], + e2e_isolated_home: dict[str, Path], + monkeypatch: pytest.MonkeyPatch, + capsys: pytest.CaptureFixture[str], + ) -> None: + """404 on public skill files must skip auth prompt via api.github.com probe. + + Complements the primary test: simulates a 404 on the file URLs (e.g., missing + reference file) AND a confirmed-public repository. The 404 disambiguation must + classify the 404 as a genuine missing file and return failure directly without + any auth prompt. + + This exercises the full disambiguation code path end-to-end. + """ + del e2e_isolated_home + + config, config_path = public_github_skill_config + + # Mock validation HEAD/Range probes to return 404 (files missing). + mock_404_error = urllib.error.HTTPError( + url=PLAYWRIGHT_SKILLS_BASE, + code=404, + msg='Not Found', + hdrs=Message(), + fp=None, + ) + + # Mock api.github.com/repos/microsoft/playwright-cli disambiguation probe as 200 + # (repo confirmed public). validate_remote_url must return (False, ...) after + # disambiguation, WITHOUT calling get_auth_headers. + monkeypatch.setattr( + setup_environment, + '_github_repo_is_public', + MagicMock(return_value=True), + ) + + # HEAD + Range both return 404 via urlopen raising HTTPError. + monkeypatch.setattr( + setup_environment, + 'urlopen', + MagicMock(side_effect=mock_404_error), + ) + + monkeypatch.setattr(sys.stdin, 'isatty', MagicMock(return_value=True)) + + def _fail_on_input(*_args: Any, **_kwargs: Any) -> str: + raise AssertionError('input() was called despite confirmed public repo') + + monkeypatch.setattr('builtins.input', _fail_on_input) + + mock_get_auth = MagicMock( + side_effect=AssertionError( + 'get_auth_headers() was called despite confirmed public repo', + ), + ) + monkeypatch.setattr(setup_environment, 'get_auth_headers', mock_get_auth) + + all_valid, _results = setup_environment.validate_all_config_files( + config, + str(config_path), + ) + + # Expected behavior: validation reports FAIL (files 404), but NO auth prompt. + assert all_valid is False, ( + 'Expected validation to report FAIL for 404 file URLs ' + '(the test focus is absence of auth prompt, not validity).' + ) + mock_get_auth.assert_not_called() + + # No auth prompt wording should appear. + captured = capsys.readouterr() + combined_output = captured.out + captured.err + forbidden_prompt_phrases = [ + 'Authentication required for', + 'Would you like to enter the token now', + ] + for phrase in forbidden_prompt_phrases: + assert phrase not in combined_output, ( + f'Auth prompt phrase {phrase!r} appeared despite public-repo disambiguation.\n' + f'Captured output:\n{combined_output}' + ) diff --git a/tests/test_setup_environment_additional.py b/tests/test_setup_environment_additional.py index 3c81f39..d851ffd 100644 --- a/tests/test_setup_environment_additional.py +++ b/tests/test_setup_environment_additional.py @@ -214,6 +214,254 @@ def test_get_auth_headers_bearer_prefix_github(self): assert headers == {'Authorization': 'Bearer ghp_token'} +class TestCredentialsSRPSplit: + """Single-responsibility split of get_auth_headers into resolve_credentials + prompt_for_credentials. + + Verifies that resolve_credentials never prompts and prompt_for_credentials only + prompts when the terminal is interactive, and that the two together produce the + same results as the former monolithic get_auth_headers. + """ + + # --- resolve_credentials (6 tests) --- + + def test_resolve_credentials_auth_param_colon_gitlab(self): + """Colon-separated auth_param explicitly sets header and value for GitLab.""" + with patch.dict('os.environ', {}, clear=True): + headers = setup_environment.resolve_credentials( + 'https://gitlab.com/owner/repo', + 'PRIVATE-TOKEN:tok', + ) + assert headers == {'PRIVATE-TOKEN': 'tok'} + + def test_resolve_credentials_auth_param_equals_github(self): + """Equals-separated auth_param explicitly sets header and value for GitHub.""" + with patch.dict('os.environ', {}, clear=True): + headers = setup_environment.resolve_credentials( + 'https://github.com/owner/repo', + 'Authorization=Bearer tok', + ) + assert headers == {'Authorization': 'Bearer tok'} + + def test_resolve_credentials_bare_token_github(self): + """Bare token for GitHub URL is wrapped as Authorization: Bearer .""" + with patch.dict('os.environ', {}, clear=True): + headers = setup_environment.resolve_credentials( + 'https://github.com/owner/repo', + 'ghp_tok', + ) + assert headers == {'Authorization': 'Bearer ghp_tok'} + + @patch.dict('os.environ', {'GITHUB_TOKEN': 'env_tok'}, clear=True) + def test_resolve_credentials_env_github_token(self): + """GITHUB_TOKEN env var resolves to GitHub Authorization: Bearer header.""" + headers = setup_environment.resolve_credentials('https://github.com/owner/repo', None) + assert headers == {'Authorization': 'Bearer env_tok'} + + @patch.dict('os.environ', {'REPO_TOKEN': 'generic_tok'}, clear=True) + def test_resolve_credentials_env_repo_token_fallback_gitlab(self): + """REPO_TOKEN is used as a fallback when no repo-specific env var is set.""" + headers = setup_environment.resolve_credentials('https://gitlab.com/owner/repo', None) + assert headers == {'PRIVATE-TOKEN': 'generic_tok'} + + def test_resolve_credentials_never_prompts(self, monkeypatch): + """resolve_credentials NEVER calls input(), getpass.getpass(), or emits warnings.""" + + def fail_if_called(*_args, **_kwargs): + raise AssertionError('resolve_credentials triggered interactive I/O') + + monkeypatch.setattr('builtins.input', fail_if_called) + monkeypatch.setattr('getpass.getpass', fail_if_called) + monkeypatch.setattr('sys.stdin.isatty', lambda: True) + + with patch.dict('os.environ', {}, clear=True): + headers = setup_environment.resolve_credentials( + 'https://github.com/owner/repo', + None, + ) + assert headers == {} + + # --- prompt_for_credentials (5 tests) --- + + @patch('sys.stdin.isatty', return_value=False) + def test_prompt_for_credentials_non_interactive_returns_empty(self, mock_isatty): + """In non-interactive terminals, returns {} without calling input().""" + with patch('builtins.input') as mock_input: + headers = setup_environment.prompt_for_credentials( + 'https://github.com/owner/repo', + tokens_checked=['GITHUB_TOKEN', 'REPO_TOKEN'], + ) + assert headers == {} + mock_input.assert_not_called() + mock_isatty.assert_called() + + @patch('sys.stdin.isatty', return_value=True) + @patch('builtins.input', return_value='n') + def test_prompt_for_credentials_interactive_user_declines(self, mock_input, mock_isatty): + """When user answers 'n' to prompt, returns {}.""" + headers = setup_environment.prompt_for_credentials( + 'https://github.com/owner/repo', + tokens_checked=['GITHUB_TOKEN', 'REPO_TOKEN'], + ) + assert headers == {} + mock_input.assert_called_once() + assert mock_isatty.return_value is True + + @patch('sys.stdin.isatty', return_value=True) + @patch('builtins.input', return_value='y') + @patch('getpass.getpass', return_value='ghp_entered') + def test_prompt_for_credentials_interactive_accepts_github( + self, + mock_getpass, + mock_input, + mock_isatty, + ): + """Accepting the prompt and entering a GitHub token returns the Authorization header.""" + headers = setup_environment.prompt_for_credentials( + 'https://github.com/owner/repo', + tokens_checked=['GITHUB_TOKEN', 'REPO_TOKEN'], + ) + assert headers == {'Authorization': 'Bearer ghp_entered'} + mock_input.assert_called_once() + mock_getpass.assert_called_once() + assert mock_isatty.return_value is True + + @patch('sys.stdin.isatty', return_value=True) + @patch('builtins.input', return_value='y') + @patch('getpass.getpass', return_value='gl_entered') + def test_prompt_for_credentials_interactive_accepts_gitlab( + self, + mock_getpass, + mock_input, + mock_isatty, + ): + """Accepting the prompt and entering a GitLab token returns the PRIVATE-TOKEN header.""" + headers = setup_environment.prompt_for_credentials( + 'https://gitlab.com/owner/repo', + tokens_checked=['GITLAB_TOKEN', 'REPO_TOKEN'], + ) + assert headers == {'PRIVATE-TOKEN': 'gl_entered'} + mock_input.assert_called_once() + mock_getpass.assert_called_once() + assert mock_isatty.return_value is True + + @patch('sys.stdin.isatty', return_value=True) + @patch('builtins.input', side_effect=KeyboardInterrupt) + def test_prompt_for_credentials_ctrl_c_graceful(self, mock_input, mock_isatty): + """Ctrl+C during prompt returns {} without propagating KeyboardInterrupt.""" + headers = setup_environment.prompt_for_credentials( + 'https://github.com/owner/repo', + tokens_checked=['GITHUB_TOKEN', 'REPO_TOKEN'], + ) + assert headers == {} + mock_input.assert_called_once() + assert mock_isatty.return_value is True + + @patch('sys.stdin.isatty', return_value=True) + @patch('builtins.input', return_value='n') + def test_prompt_for_credentials_interactive_emits_url_bearing_warning( + self, + mock_input, + mock_isatty, + ): + """Interactive prompt emits 'Authentication required for {url}' warning with the URL.""" + with patch('setup_environment.warning') as mock_warning: + setup_environment.prompt_for_credentials( + 'https://github.com/owner/repo', + tokens_checked=['GITHUB_TOKEN', 'REPO_TOKEN'], + ) + + # Verify the new URL-bearing warning was emitted at least once with the full URL. + emitted_messages = [call.args[0] for call in mock_warning.call_args_list if call.args] + assert any(msg == 'Authentication required for https://github.com/owner/repo' for msg in emitted_messages), ( + f'Expected new URL-bearing wording; got: {emitted_messages}' + ) + # Verify the old wording is NOT emitted. + assert not any('Private' in msg and 'repository detected' in msg for msg in emitted_messages), ( + f'Old wording leaked into output: {emitted_messages}' + ) + mock_input.assert_called_once() + assert mock_isatty.return_value is True + + @patch('sys.stdin.isatty', return_value=False) + def test_prompt_for_credentials_non_interactive_emits_url_bearing_info(self, mock_isatty): + """Non-interactive prompt emits 'Authentication required for {url}' info with the URL.""" + with patch('setup_environment.info') as mock_info: + setup_environment.prompt_for_credentials( + 'https://gitlab.com/ns/proj', + tokens_checked=['GITLAB_TOKEN', 'REPO_TOKEN'], + ) + + emitted_messages = [call.args[0] for call in mock_info.call_args_list if call.args] + assert any(msg == 'Authentication required for https://gitlab.com/ns/proj' for msg in emitted_messages), ( + f'Expected new URL-bearing wording; got: {emitted_messages}' + ) + # Verify the old wording is NOT emitted. + assert not any('Private' in msg and 'repository detected' in msg for msg in emitted_messages), ( + f'Old wording leaked into output: {emitted_messages}' + ) + mock_isatty.assert_called() + + def test_prompt_for_credentials_no_repo_type_emits_nothing(self): + """Unknown repo type (detect_repo_type returns None): no warning, no info, no prompt.""" + with ( + patch('sys.stdin.isatty', return_value=True), + patch('builtins.input') as mock_input, + patch('setup_environment.warning') as mock_warning, + patch('setup_environment.info') as mock_info, + ): + headers = setup_environment.prompt_for_credentials( + 'https://example.com/some/path', + tokens_checked=['REPO_TOKEN'], + ) + + assert headers == {} + mock_input.assert_not_called() + mock_warning.assert_not_called() + mock_info.assert_not_called() + + # --- _env_tokens_checked_for_repo_type (2 tests) --- + + def test_env_tokens_checked_github(self): + """GitHub repo type lists GITHUB_TOKEN then REPO_TOKEN.""" + assert setup_environment._env_tokens_checked_for_repo_type('github') == [ + 'GITHUB_TOKEN', + 'REPO_TOKEN', + ] + + def test_env_tokens_checked_unknown(self): + """Unknown or None repo type lists only REPO_TOKEN.""" + assert setup_environment._env_tokens_checked_for_repo_type('bitbucket') == ['REPO_TOKEN'] + assert setup_environment._env_tokens_checked_for_repo_type(None) == ['REPO_TOKEN'] + + # --- Orchestrator delegation (2 tests) --- + + def test_get_auth_headers_delegates_resolve_first(self): + """When resolve_credentials returns non-empty, prompt_for_credentials is NOT called.""" + with ( + patch.object(setup_environment, 'resolve_credentials', return_value={'X': 'y'}) as mock_resolve, + patch.object(setup_environment, 'prompt_for_credentials') as mock_prompt, + ): + result = setup_environment.get_auth_headers('https://github.com/owner/repo', None) + assert result == {'X': 'y'} + mock_resolve.assert_called_once_with('https://github.com/owner/repo', None) + mock_prompt.assert_not_called() + + def test_get_auth_headers_escalates_to_prompt_when_resolve_empty(self): + """When resolve_credentials returns {}, prompt_for_credentials is invoked.""" + with ( + patch.object(setup_environment, 'resolve_credentials', return_value={}) as mock_resolve, + patch.object( + setup_environment, + 'prompt_for_credentials', + return_value={'X': 'y'}, + ) as mock_prompt, + ): + result = setup_environment.get_auth_headers('https://github.com/owner/repo', None) + assert result == {'X': 'y'} + mock_resolve.assert_called_once_with('https://github.com/owner/repo', None) + mock_prompt.assert_called_once() + + class TestBitbucketDetection: """Test Bitbucket repository detection.""" @@ -223,6 +471,67 @@ def test_detect_bitbucket_repo(self): assert setup_environment.detect_repo_type('https://api.bitbucket.org/2.0/repositories/') == 'bitbucket' +class TestDetectRepoTypeHostname: + """Hostname-based classification edge cases for detect_repo_type(). + + Verifies that detect_repo_type() uses hostname parsing (urllib.parse.urlparse) + rather than raw substring matching, with explicit exclusion of GitHub Pages + (*.github.io) hosts from 'github' classification. + """ + + def test_detect_repo_type_github_pages_excluded(self): + """GitHub Pages project URL with subpath should return None.""" + assert setup_environment.detect_repo_type( + 'https://microsoft.github.io/playwright-cli/skills/x.md', + ) is None + + def test_detect_repo_type_github_pages_root(self): + """GitHub Pages root URL should return None.""" + assert setup_environment.detect_repo_type('https://microsoft.github.io/') is None + + def test_detect_repo_type_apex_github_io(self): + """Apex github.io hostname (edge case) should return None.""" + assert setup_environment.detect_repo_type('https://github.io/x') is None + + def test_detect_repo_type_github_com_still_github(self): + """github.com URLs are still classified as 'github' after hardening.""" + assert setup_environment.detect_repo_type('https://github.com/owner/repo') == 'github' + + def test_detect_repo_type_raw_github_still_github(self): + """raw.githubusercontent.com URLs are still classified as 'github' after hardening.""" + assert setup_environment.detect_repo_type( + 'https://raw.githubusercontent.com/owner/repo/main/x.md', + ) == 'github' + + def test_detect_repo_type_api_github_still_github(self): + """api.github.com URLs are still classified as 'github' after hardening.""" + assert setup_environment.detect_repo_type( + 'https://api.github.com/repos/owner/repo/contents/x.md', + ) == 'github' + + def test_detect_repo_type_gitlab_still_gitlab(self): + """gitlab.com URLs are still classified as 'gitlab' after hardening.""" + assert setup_environment.detect_repo_type( + 'https://gitlab.com/ns/proj/-/raw/main/x.yaml', + ) == 'gitlab' + + def test_detect_repo_type_self_hosted_gitlab(self): + """Self-hosted GitLab hostname (gitlab.example.com) is classified as 'gitlab'.""" + assert setup_environment.detect_repo_type('https://gitlab.example.com/ns/p') == 'gitlab' + + def test_detect_repo_type_bitbucket(self): + """Bitbucket URLs are classified as 'bitbucket' after hardening.""" + assert setup_environment.detect_repo_type('https://bitbucket.org/owner/repo') == 'bitbucket' + + def test_detect_repo_type_other(self): + """Unknown hosts return None.""" + assert setup_environment.detect_repo_type('https://example.com/x') is None + + def test_detect_repo_type_malformed_url(self): + """Malformed input (non-URL string) returns None gracefully.""" + assert setup_environment.detect_repo_type('not a url') is None + + class TestConfigLoadingErrorPaths: """Test configuration loading error paths.""" diff --git a/tests/test_setup_environment_auth_cache.py b/tests/test_setup_environment_auth_cache.py index ef17251..9312096 100644 --- a/tests/test_setup_environment_auth_cache.py +++ b/tests/test_setup_environment_auth_cache.py @@ -260,13 +260,26 @@ class TestValidationPopulatesAuthCache: def test_validation_populates_cache_for_downloads( self, mock_get_auth: MagicMock, mock_urlopen: MagicMock, ) -> None: - """FileValidator with auth_cache populates it during validation.""" + """FileValidator with auth_cache populates it during validation. + + Drives the auth-escalation path: unauth probe returns 404, triggering + get_auth_headers via resolve_and_cache. The retry probe with auth + succeeds, and the cache is populated with the resolved auth headers. + """ + import urllib.error as _url_err + test_headers = {'Authorization': 'Bearer test-token'} mock_get_auth.return_value = test_headers - mock_response = MagicMock() - mock_response.status = 200 - mock_urlopen.return_value = mock_response + # First probe (HEAD unauth): 404; second probe (Range unauth): 404; + # third probe (HEAD auth): 200. + success_response = MagicMock() + success_response.status = 200 + mock_urlopen.side_effect = [ + _url_err.HTTPError('https://x', 404, 'Not Found', {}, None), + _url_err.HTTPError('https://x', 404, 'Not Found', {}, None), + success_response, + ] cache = AuthHeaderCache() validator = FileValidator(auth_param='test-token', auth_cache=cache) @@ -274,17 +287,27 @@ def test_validation_populates_cache_for_downloads( url = 'https://github.com/org/repo/blob/main/agent.md' validator.validate_remote_url(url) - # Cache should now contain the headers for this origin + # Cache should now contain the resolved auth headers for this origin. is_cached, cached_headers = cache.get_cached_headers(url) assert is_cached is True assert cached_headers == test_headers + @patch('setup_environment._github_repo_is_public', return_value=False) @patch('setup_environment.urlopen') @patch('setup_environment.get_auth_headers') def test_mixed_origins_cached_independently( - self, mock_get_auth: MagicMock, mock_urlopen: MagicMock, + self, mock_get_auth: MagicMock, mock_urlopen: MagicMock, mock_repo_public: MagicMock, ) -> None: - """Two URLs from different origins have independent cache entries.""" + """Two URLs from different origins have independent cache entries. + + Each origin drives the auth-escalation path: unauth probes return 404, + and the retry with auth succeeds. Both origins end up cached with + their respective auth headers. The GitHub visibility probe is mocked + to return False (ambiguous) so the auth-escalation path is exercised + rather than short-circuited by the 404 disambiguation. + """ + import urllib.error as _url_err + github_headers = {'Authorization': 'Bearer github-token'} gitlab_headers = {'PRIVATE-TOKEN': 'gitlab-token'} @@ -295,9 +318,17 @@ def mock_auth(url: str, _auth_param: str | None = None) -> dict[str, str]: mock_get_auth.side_effect = mock_auth - mock_response = MagicMock() - mock_response.status = 200 - mock_urlopen.return_value = mock_response + # For each URL: HEAD 404, Range 404, HEAD 200 (with auth). + success_response = MagicMock() + success_response.status = 200 + mock_urlopen.side_effect = [ + _url_err.HTTPError('https://x', 404, 'Not Found', {}, None), + _url_err.HTTPError('https://x', 404, 'Not Found', {}, None), + success_response, + _url_err.HTTPError('https://x', 404, 'Not Found', {}, None), + _url_err.HTTPError('https://x', 404, 'Not Found', {}, None), + success_response, + ] cache = AuthHeaderCache() validator = FileValidator(auth_param='token', auth_cache=cache) @@ -309,7 +340,7 @@ def mock_auth(url: str, _auth_param: str | None = None) -> dict[str, str]: validator.validate_remote_url(github_url) validator.validate_remote_url(gitlab_url) - # Both origins should be cached independently + # Both origins should be cached independently with their auth headers. is_cached_gh, gh_headers = cache.get_cached_headers(github_url) assert is_cached_gh is True assert gh_headers == github_headers @@ -318,6 +349,38 @@ def mock_auth(url: str, _auth_param: str | None = None) -> dict[str, str]: assert is_cached_gl is True assert gl_headers == gitlab_headers + # Visibility probe was invoked exactly once (for the GitHub URL only). + assert mock_repo_public.call_count == 1 + + @patch('setup_environment.urlopen') + @patch('setup_environment.get_auth_headers') + def test_validation_caches_none_on_public_success( + self, mock_get_auth: MagicMock, mock_urlopen: MagicMock, + ) -> None: + """Public URL (unauth 200) caches the None sentinel, bypassing auth. + + Verifies the lazy-auth contract: a successful unauthenticated probe + (HTTP 200) populates the cache with None (public-origin marker). + get_auth_headers MUST NOT be invoked for public URLs. + """ + success_response = MagicMock() + success_response.status = 200 + mock_urlopen.return_value = success_response + + cache = AuthHeaderCache() + validator = FileValidator(auth_param='unused-token', auth_cache=cache) + + url = 'https://github.com/org/public-repo/blob/main/agent.md' + validator.validate_remote_url(url) + + # get_auth_headers MUST NOT be called -- public URL validated unauth. + mock_get_auth.assert_not_called() + + # Cache MUST contain the None sentinel for this public origin. + is_cached, cached_headers = cache.get_cached_headers(url) + assert is_cached is True + assert cached_headers is None + class TestGitHubApiHeaders: """Test GitHub API transport headers on Request objects.""" diff --git a/tests/test_setup_environment_validation.py b/tests/test_setup_environment_validation.py index d7ca121..216a10d 100644 --- a/tests/test_setup_environment_validation.py +++ b/tests/test_setup_environment_validation.py @@ -3,6 +3,7 @@ import os import sys import tempfile +import threading import urllib.error from pathlib import Path from unittest.mock import MagicMock @@ -12,6 +13,7 @@ sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts')) import setup_environment +from setup_environment import AuthHeaderCache from setup_environment import FileValidator @@ -30,7 +32,7 @@ def test_init_without_auth_param(self) -> None: @patch('setup_environment.urlopen') def test_check_with_head_success(self, mock_urlopen: MagicMock) -> None: - """Test HEAD request success.""" + """Test HEAD request success returns (True, 200).""" mock_response = MagicMock() mock_response.status = 200 mock_urlopen.return_value = mock_response @@ -38,7 +40,7 @@ def test_check_with_head_success(self, mock_urlopen: MagicMock) -> None: validator = FileValidator() result = validator._check_with_head('https://example.com/file.md', None) - assert result is True + assert result == (True, 200) mock_urlopen.assert_called_once() request = mock_urlopen.call_args[0][0] assert request.get_method() == 'HEAD' @@ -46,7 +48,7 @@ def test_check_with_head_success(self, mock_urlopen: MagicMock) -> None: @patch('setup_environment.urlopen') def test_check_with_head_with_auth(self, mock_urlopen: MagicMock) -> None: - """Test HEAD request with authentication headers.""" + """Test HEAD request with authentication headers returns (True, 200).""" mock_response = MagicMock() mock_response.status = 200 mock_urlopen.return_value = mock_response @@ -55,13 +57,13 @@ def test_check_with_head_with_auth(self, mock_urlopen: MagicMock) -> None: auth_headers = {'Authorization': 'Bearer token'} result = validator._check_with_head('https://example.com/file.md', auth_headers) - assert result is True + assert result == (True, 200) request = mock_urlopen.call_args[0][0] assert request.headers.get('Authorization') == 'Bearer token' @patch('setup_environment.urlopen') def test_check_with_head_not_found(self, mock_urlopen: MagicMock) -> None: - """Test HEAD request with 404.""" + """Test HEAD request with 404 returns (False, 404).""" mock_urlopen.side_effect = urllib.error.HTTPError( 'https://example.com/file.md', 404, @@ -73,11 +75,11 @@ def test_check_with_head_not_found(self, mock_urlopen: MagicMock) -> None: validator = FileValidator() result = validator._check_with_head('https://example.com/file.md', None) - assert result is False + assert result == (False, 404) @patch('setup_environment.urlopen') def test_check_with_head_ssl_error_retry(self, mock_urlopen: MagicMock) -> None: - """Test HEAD request with SSL error and successful retry.""" + """Test HEAD request with SSL error and successful retry returns (True, 200).""" mock_urlopen.side_effect = [ urllib.error.URLError('SSL: CERTIFICATE_VERIFY_FAILED'), MagicMock(status=200), @@ -86,25 +88,25 @@ def test_check_with_head_ssl_error_retry(self, mock_urlopen: MagicMock) -> None: validator = FileValidator() result = validator._check_with_head('https://example.com/file.md', None) - assert result is True + assert result == (True, 200) assert mock_urlopen.call_count == 2 second_call_context = mock_urlopen.call_args_list[1][1].get('context') assert second_call_context is not None @patch('setup_environment.urlopen') def test_check_with_head_non_ssl_error(self, mock_urlopen: MagicMock) -> None: - """Test HEAD request with non-SSL URLError.""" + """Test HEAD request with non-SSL URLError returns (False, None).""" mock_urlopen.side_effect = urllib.error.URLError('Connection refused') validator = FileValidator() result = validator._check_with_head('https://example.com/file.md', None) - assert result is False + assert result == (False, None) assert mock_urlopen.call_count == 1 @patch('setup_environment.urlopen') def test_check_with_range_success_200(self, mock_urlopen: MagicMock) -> None: - """Test successful Range request with 200 response.""" + """Test successful Range request with 200 response returns (True, 200).""" mock_response = MagicMock() mock_response.status = 200 mock_urlopen.return_value = mock_response @@ -112,13 +114,13 @@ def test_check_with_range_success_200(self, mock_urlopen: MagicMock) -> None: validator = FileValidator() result = validator._check_with_range('https://example.com/file.md', None) - assert result is True + assert result == (True, 200) request = mock_urlopen.call_args[0][0] assert request.headers.get('Range') == 'bytes=0-0' @patch('setup_environment.urlopen') def test_check_with_range_success_206(self, mock_urlopen: MagicMock) -> None: - """Test successful Range request with 206 partial content response.""" + """Test successful Range request with 206 partial content returns (True, 206).""" mock_response = MagicMock() mock_response.status = 206 mock_urlopen.return_value = mock_response @@ -126,11 +128,11 @@ def test_check_with_range_success_206(self, mock_urlopen: MagicMock) -> None: validator = FileValidator() result = validator._check_with_range('https://example.com/file.md', None) - assert result is True + assert result == (True, 206) @patch('setup_environment.urlopen') def test_check_with_range_with_auth(self, mock_urlopen: MagicMock) -> None: - """Test Range request with authentication headers.""" + """Test Range request with authentication headers returns (True, 206).""" mock_response = MagicMock() mock_response.status = 206 mock_urlopen.return_value = mock_response @@ -139,7 +141,7 @@ def test_check_with_range_with_auth(self, mock_urlopen: MagicMock) -> None: auth_headers = {'Authorization': 'Token abc123', 'X-Custom': 'value'} result = validator._check_with_range('https://example.com/file.md', auth_headers) - assert result is True + assert result == (True, 206) request = mock_urlopen.call_args[0][0] assert request.headers.get('Authorization') == 'Token abc123' assert request.headers.get('X-custom') == 'value' @@ -147,7 +149,7 @@ def test_check_with_range_with_auth(self, mock_urlopen: MagicMock) -> None: @patch('setup_environment.urlopen') def test_check_with_range_ssl_error_retry(self, mock_urlopen: MagicMock) -> None: - """Test Range request with SSL error and successful retry.""" + """Test Range request with SSL error and successful retry returns (True, 206).""" mock_urlopen.side_effect = [ urllib.error.URLError('certificate verify failed'), MagicMock(status=206), @@ -156,12 +158,12 @@ def test_check_with_range_ssl_error_retry(self, mock_urlopen: MagicMock) -> None validator = FileValidator() result = validator._check_with_range('https://example.com/file.md', None) - assert result is True + assert result == (True, 206) assert mock_urlopen.call_count == 2 @patch('setup_environment.urlopen') def test_check_with_range_http_error(self, mock_urlopen: MagicMock) -> None: - """Test Range request with HTTP error.""" + """Test Range request with HTTP error returns (False, 416).""" mock_urlopen.side_effect = urllib.error.HTTPError( 'https://example.com/file.md', 416, @@ -173,38 +175,46 @@ def test_check_with_range_http_error(self, mock_urlopen: MagicMock) -> None: validator = FileValidator() result = validator._check_with_range('https://example.com/file.md', None) - assert result is False + assert result == (False, 416) @patch('setup_environment.urlopen') def test_check_with_range_generic_exception(self, mock_urlopen: MagicMock) -> None: - """Test Range request with generic exception.""" + """Test Range request with generic exception returns (False, None).""" mock_urlopen.side_effect = Exception('Network error') validator = FileValidator() result = validator._check_with_range('https://example.com/file.md', None) - assert result is False + assert result == (False, None) @patch('setup_environment.get_auth_headers') + @patch.object(FileValidator, '_check_with_range') @patch.object(FileValidator, '_check_with_head') def test_validate_remote_url_generates_auth_per_url( self, mock_head: MagicMock, + mock_range: MagicMock, mock_auth: MagicMock, ) -> None: - """Test that validate_remote_url generates auth for the specific URL. + """Test that validate_remote_url generates auth for the specific URL on escalation. - CRITICAL: This is the core bug fix test - verify auth is generated for the - FILE URL, not the config source. + Verifies the lazy-auth contract: auth is invoked only after the initial + unauthenticated probe returns a 401/403/404 status. Mirrors the contract + honored by _fetch_url_core._do_fetch (lazy-auth gate on HTTP 401/403/404). + + CRITICAL: auth is generated for the FILE URL, not the config source. """ mock_auth.return_value = {'Authorization': 'Bearer token'} - mock_head.return_value = True + # Force escalation: initial unauth HEAD returns 404, Range returns 404; + # retry HEAD with auth succeeds. + mock_head.side_effect = [(False, 404), (True, 200)] + mock_range.return_value = (False, 404) validator = FileValidator(auth_param='my_token') result = validator.validate_remote_url('https://github.com/user/repo/file.md') assert result == (True, 'HEAD') - # CRITICAL: verify auth was generated for the FILE URL, not config source + # CRITICAL: auth must be generated for the FILE URL, not config source. mock_auth.assert_called_once_with( 'https://github.com/user/repo/file.md', 'my_token', @@ -218,8 +228,8 @@ def test_validate_remote_url_head_success( mock_head: MagicMock, ) -> None: """Test validation succeeds with HEAD request.""" - mock_head.return_value = True - mock_range.return_value = False + mock_head.return_value = (True, 200) + mock_range.return_value = (False, 500) validator = FileValidator() with patch('setup_environment.get_auth_headers', return_value=None): @@ -237,9 +247,9 @@ def test_validate_remote_url_fallback_to_range( mock_range: MagicMock, mock_head: MagicMock, ) -> None: - """Test validation falls back to Range when HEAD fails.""" - mock_head.return_value = False - mock_range.return_value = True + """Test validation falls back to Range when HEAD fails with non-auth code.""" + mock_head.return_value = (False, 500) + mock_range.return_value = (True, 206) validator = FileValidator() with patch('setup_environment.get_auth_headers', return_value=None): @@ -250,25 +260,28 @@ def test_validate_remote_url_fallback_to_range( mock_head.assert_called_once() mock_range.assert_called_once() + @patch('setup_environment.get_auth_headers') @patch.object(FileValidator, '_check_with_head') @patch.object(FileValidator, '_check_with_range') def test_validate_remote_url_both_fail( self, mock_range: MagicMock, mock_head: MagicMock, + mock_auth: MagicMock, ) -> None: - """Test validation fails when both methods fail.""" - mock_head.return_value = False - mock_range.return_value = False + """Test validation fails when both methods fail with non-auth codes.""" + mock_head.return_value = (False, 500) + mock_range.return_value = (False, 500) validator = FileValidator() - with patch('setup_environment.get_auth_headers', return_value=None): - is_valid, method = validator.validate_remote_url('https://example.com/file.md') + is_valid, method = validator.validate_remote_url('https://example.com/file.md') assert is_valid is False assert method == 'None' mock_head.assert_called_once() mock_range.assert_called_once() + # Non-auth failure codes MUST NOT trigger auth resolution. + mock_auth.assert_not_called() @patch('setup_environment.info') @patch('setup_environment.convert_gitlab_url_to_api') @@ -287,7 +300,7 @@ def test_validate_remote_url_gitlab_url_conversion( mock_detect.return_value = 'gitlab' mock_convert.return_value = gitlab_api_url - mock_head.return_value = True + mock_head.return_value = (True, 200) validator = FileValidator() with patch('setup_environment.get_auth_headers', return_value=None): @@ -361,6 +374,324 @@ def test_clear_results(self) -> None: assert len(validator.results) == 0 + @patch('setup_environment.get_auth_headers') + @patch.object(FileValidator, '_check_with_range') + @patch.object(FileValidator, '_check_with_head') + def test_public_url_succeeds_without_calling_get_auth_headers( + self, + mock_head: MagicMock, + mock_range: MagicMock, + mock_auth: MagicMock, + ) -> None: + """Public URL returning 200 unauth must NOT trigger auth resolution. + + Verifies the lazy-auth contract: successful unauthenticated probes + bypass get_auth_headers entirely. The auth_cache receives a None + sentinel (public-origin marker) for downstream consumers. + """ + + mock_head.return_value = (True, 200) + mock_range.return_value = (True, 200) # Would also succeed, but HEAD returns first. + + cache = AuthHeaderCache() + validator = FileValidator(auth_cache=cache) + result = validator.validate_remote_url('https://example.com/public/file.md') + + assert result == (True, 'HEAD') + # Auth MUST NOT be called for a public URL. + mock_auth.assert_not_called() + # Cache MUST contain the None sentinel for the public origin. + is_cached, cached_headers = cache.get_cached_headers('https://example.com/public/file.md') + assert is_cached is True + assert cached_headers is None + mock_range.assert_not_called() + + @patch('setup_environment.get_auth_headers') + @patch.object(FileValidator, '_check_with_range') + @patch.object(FileValidator, '_check_with_head') + def test_404_triggers_auth_escalation( + self, + mock_head: MagicMock, + mock_range: MagicMock, + mock_auth: MagicMock, + ) -> None: + """Unauth probe returning 404 must trigger auth resolution exactly once.""" + + resolved = {'Authorization': 'Bearer resolved-token'} + mock_auth.return_value = resolved + # Initial unauth: HEAD=404, Range=404. After resolution: HEAD=200. + mock_head.side_effect = [(False, 404), (True, 200)] + mock_range.return_value = (False, 404) + + cache = AuthHeaderCache(auth_param='test-token') + validator = FileValidator(auth_param='test-token', auth_cache=cache) + result = validator.validate_remote_url('https://github.com/org/repo/file.md') + + assert result == (True, 'HEAD') + # get_auth_headers called exactly once via resolve_and_cache. + mock_auth.assert_called_once_with('https://github.com/org/repo/file.md', 'test-token') + # Cache now contains the resolved headers for the origin. + is_cached, cached_headers = cache.get_cached_headers( + 'https://github.com/org/repo/file.md', + ) + assert is_cached is True + assert cached_headers == resolved + + @patch('setup_environment.get_auth_headers') + @patch.object(FileValidator, '_check_with_range') + @patch.object(FileValidator, '_check_with_head') + def test_401_triggers_auth_escalation( + self, + mock_head: MagicMock, + mock_range: MagicMock, + mock_auth: MagicMock, + ) -> None: + """Unauth probe returning 401 must trigger auth resolution.""" + mock_auth.return_value = {'Authorization': 'Bearer token'} + mock_head.side_effect = [(False, 401), (True, 200)] + mock_range.return_value = (False, 401) + + validator = FileValidator(auth_param='token') + result = validator.validate_remote_url('https://github.com/org/repo/file.md') + + assert result == (True, 'HEAD') + mock_auth.assert_called_once() + + @patch('setup_environment.get_auth_headers') + @patch.object(FileValidator, '_check_with_range') + @patch.object(FileValidator, '_check_with_head') + def test_403_triggers_auth_escalation( + self, + mock_head: MagicMock, + mock_range: MagicMock, + mock_auth: MagicMock, + ) -> None: + """Unauth probe returning 403 must trigger auth resolution.""" + mock_auth.return_value = {'Authorization': 'Bearer token'} + mock_head.side_effect = [(False, 403), (True, 200)] + mock_range.return_value = (False, 403) + + validator = FileValidator(auth_param='token') + result = validator.validate_remote_url('https://github.com/org/repo/file.md') + + assert result == (True, 'HEAD') + mock_auth.assert_called_once() + + @patch('setup_environment.get_auth_headers') + @patch.object(FileValidator, '_check_with_range') + @patch.object(FileValidator, '_check_with_head') + def test_500_does_not_prompt( + self, + mock_head: MagicMock, + mock_range: MagicMock, + mock_auth: MagicMock, + ) -> None: + """HTTP 5xx failures must NOT trigger auth resolution (no spurious prompts). + + Non-authentication failures such as 500 Internal Server Error are + transient network conditions, not authentication problems. The lazy-auth + contract escalates to authentication ONLY on 401/403/404. + """ + mock_head.return_value = (False, 500) + mock_range.return_value = (False, 500) + + validator = FileValidator(auth_param='token') + result = validator.validate_remote_url('https://github.com/org/repo/file.md') + + assert result == (False, 'None') + mock_auth.assert_not_called() + + @patch('setup_environment.get_auth_headers') + @patch.object(FileValidator, '_check_with_range') + @patch.object(FileValidator, '_check_with_head') + def test_ssl_failure_does_not_prompt( + self, + mock_head: MagicMock, + mock_range: MagicMock, + mock_auth: MagicMock, + ) -> None: + """SSL/DNS/URL errors (http_code=None) must NOT trigger auth resolution.""" + mock_head.return_value = (False, None) + mock_range.return_value = (False, None) + + validator = FileValidator(auth_param='token') + result = validator.validate_remote_url('https://nonexistent.example.com/file.md') + + assert result == (False, 'None') + mock_auth.assert_not_called() + + @patch('setup_environment.get_auth_headers') + @patch.object(FileValidator, '_check_with_range') + @patch.object(FileValidator, '_check_with_head') + def test_parallel_validation_does_not_double_prompt( + self, + mock_head: MagicMock, + mock_range: MagicMock, + mock_auth: MagicMock, + ) -> None: + """Two threads validating URLs from same origin must resolve auth only once. + + Verifies that AuthHeaderCache.resolve_and_cache double-checked locking + serializes authentication resolution across parallel validation threads, + preventing duplicate prompts observed in the regression. + """ + mock_auth.return_value = {'Authorization': 'Bearer shared-token'} + # Both threads see HEAD=404 first, then 200 after auth. + mock_head.side_effect = lambda _url, headers: ( + (True, 200) if headers else (False, 404) + ) + mock_range.side_effect = lambda _url, headers: ( + (True, 206) if headers else (False, 404) + ) + + cache = AuthHeaderCache(auth_param='shared-token') + validator = FileValidator(auth_param='shared-token', auth_cache=cache) + + results: list[tuple[bool, str]] = [] + results_lock = threading.Lock() + + def worker(url: str) -> None: + result = validator.validate_remote_url(url) + with results_lock: + results.append(result) + + urls = [ + 'https://github.com/org/repo/file1.md', + 'https://github.com/org/repo/file2.md', + ] + threads = [threading.Thread(target=worker, args=(u,)) for u in urls] + for t in threads: + t.start() + for t in threads: + t.join() + + assert len(results) == 2 + for r in results: + assert r[0] is True + # Auth MUST be resolved exactly ONCE via double-checked locking. + mock_auth.assert_called_once() + + @patch('setup_environment.get_auth_headers') + @patch.object(FileValidator, '_check_with_range') + @patch.object(FileValidator, '_check_with_head') + def test_auth_cache_empty_result_does_not_retry( + self, + mock_head: MagicMock, + mock_range: MagicMock, + mock_auth: MagicMock, + ) -> None: + """When resolve_and_cache returns {}, validation must terminate without retry. + + AuthHeaderCache.resolve_and_cache returns {} (empty dict, not None) when + no credentials are available from any source. Treat this as terminal -- + the retry probe must NOT be attempted. + """ + + mock_auth.return_value = {} # No credentials available from any source. + mock_head.return_value = (False, 404) + mock_range.return_value = (False, 404) + + cache = AuthHeaderCache(auth_param=None) + validator = FileValidator(auth_cache=cache) + result = validator.validate_remote_url('https://github.com/org/repo/file.md') + + assert result == (False, 'None') + # Initial unauth probes: HEAD + Range (2 probes total, no retry). + assert mock_head.call_count == 1 + assert mock_range.call_count == 1 + mock_auth.assert_called_once() + + @patch('setup_environment._github_repo_is_public', return_value=True) + @patch('setup_environment.get_auth_headers') + @patch.object(FileValidator, '_check_with_range') + @patch.object(FileValidator, '_check_with_head') + def test_404_on_public_github_repo_does_not_prompt( + self, + mock_head: MagicMock, + mock_range: MagicMock, + mock_auth: MagicMock, + mock_repo_public: MagicMock, + ) -> None: + """404 on a confirmed-public GitHub repo must skip the auth prompt. + + When the repo is verified public via api.github.com/repos/{owner}/{repo}, + the original 404 indicates a genuine missing file and validation must + return (False, 'None') without invoking get_auth_headers. + """ + mock_head.return_value = (False, 404) + mock_range.return_value = (False, 404) + + validator = FileValidator(auth_param='token') + result = validator.validate_remote_url( + 'https://raw.githubusercontent.com/owner/repo/main/missing.md', + ) + + assert result == (False, 'None') + mock_repo_public.assert_called_once_with('owner', 'repo') + mock_auth.assert_not_called() + + @patch('setup_environment._github_repo_is_public', return_value=False) + @patch('setup_environment.get_auth_headers') + @patch.object(FileValidator, '_check_with_range') + @patch.object(FileValidator, '_check_with_head') + def test_404_on_private_github_repo_prompts_as_normal( + self, + mock_head: MagicMock, + mock_range: MagicMock, + mock_auth: MagicMock, + mock_repo_public: MagicMock, + ) -> None: + """404 on a private/nonexistent GitHub repo must escalate to auth prompt. + + When _github_repo_is_public returns False (ambiguous: private or + nonexistent), the validator must fall through to the normal auth + escalation path (legitimate for the private case). + """ + resolved = {'Authorization': 'Bearer t'} + mock_auth.return_value = resolved + mock_head.side_effect = [(False, 404), (True, 200)] + mock_range.return_value = (False, 404) + + validator = FileValidator(auth_param='token') + result = validator.validate_remote_url( + 'https://github.com/owner/repo/blob/main/private.md', + ) + + assert result == (True, 'HEAD') + mock_repo_public.assert_called_once_with('owner', 'repo') + mock_auth.assert_called_once() + + @patch('setup_environment._github_repo_is_public', return_value=None) + @patch('setup_environment.get_auth_headers') + @patch.object(FileValidator, '_check_with_range') + @patch.object(FileValidator, '_check_with_head') + def test_404_on_unknown_repo_visibility_prompts_conservative( + self, + mock_head: MagicMock, + mock_range: MagicMock, + mock_auth: MagicMock, + mock_repo_public: MagicMock, + ) -> None: + """404 with rate-limited / network-failed visibility probe escalates. + + When _github_repo_is_public returns None (rate-limit, timeout, network + error), the validator must conservatively escalate to the auth prompt + (fail-safe behavior: auth prompt is legitimate when uncertain). + """ + resolved = {'Authorization': 'Bearer t'} + mock_auth.return_value = resolved + mock_head.side_effect = [(False, 404), (True, 200)] + mock_range.return_value = (False, 404) + + validator = FileValidator(auth_param='token') + result = validator.validate_remote_url( + 'https://raw.githubusercontent.com/owner/repo/main/file.md', + ) + + assert result == (True, 'HEAD') + mock_repo_public.assert_called_once_with('owner', 'repo') + mock_auth.assert_called_once() + class TestLocalConfigWithRemoteFiles: """Tests for the specific bug: local config with remote files requiring auth. @@ -370,10 +701,12 @@ class TestLocalConfigWithRemoteFiles: @patch('setup_environment.get_auth_headers') @patch('setup_environment.resolve_resource_path') + @patch.object(FileValidator, '_check_with_range') @patch.object(FileValidator, '_check_with_head') def test_local_config_with_github_files_uses_file_url_for_auth( self, mock_head: MagicMock, + mock_range: MagicMock, mock_resolve: MagicMock, mock_auth: MagicMock, ) -> None: @@ -384,13 +717,19 @@ def test_local_config_with_github_files_uses_file_url_for_auth( - File to validate: https://raw.githubusercontent.com/user/repo/main/agent.md - Expected: get_auth_headers called with GitHub URL - Bug behavior: get_auth_headers not called (config_source is local) + + Auth escalation is triggered by the lazy-auth contract: unauth probe + returns 404, which causes resolve_and_cache -> get_auth_headers to be + invoked for the FILE URL. """ mock_auth.return_value = {'Authorization': 'Bearer github_token'} mock_resolve.return_value = ( 'https://raw.githubusercontent.com/user/repo/main/agent.md', True, # is_remote ) - mock_head.return_value = True + # Force escalation: initial HEAD=404, then HEAD=200 after auth. + mock_head.side_effect = [(False, 404), (True, 200)] + mock_range.return_value = (False, 404) config = {'agents': ['agent.md']} @@ -409,10 +748,12 @@ def test_local_config_with_github_files_uses_file_url_for_auth( @patch('setup_environment.get_auth_headers') @patch('setup_environment.resolve_resource_path') + @patch.object(FileValidator, '_check_with_range') @patch.object(FileValidator, '_check_with_head') def test_local_config_with_gitlab_files_uses_file_url_for_auth( self, mock_head: MagicMock, + mock_range: MagicMock, mock_resolve: MagicMock, mock_auth: MagicMock, ) -> None: @@ -422,7 +763,9 @@ def test_local_config_with_gitlab_files_uses_file_url_for_auth( 'https://gitlab.com/user/repo/-/raw/main/agent.md', True, ) - mock_head.return_value = True + # Force escalation: initial HEAD=404, then HEAD=200 after auth. + mock_head.side_effect = [(False, 404), (True, 200)] + mock_range.return_value = (False, 404) config = {'agents': ['agent.md']} @@ -445,15 +788,22 @@ class TestMixedAuthScenarios: @patch('setup_environment.detect_repo_type') @patch('setup_environment.get_auth_headers') @patch('setup_environment.resolve_resource_path') + @patch.object(FileValidator, '_check_with_range') @patch.object(FileValidator, '_check_with_head') def test_mixed_github_and_gitlab_files( self, mock_head: MagicMock, + mock_range: MagicMock, mock_resolve: MagicMock, mock_auth: MagicMock, mock_detect: MagicMock, ) -> None: - """Test that different files get appropriate auth headers.""" + """Test that different files get appropriate auth headers. + + Two URLs from different origins both return 404 unauth, triggering + escalation to get_auth_headers for each origin. The lazy-auth + contract ensures per-URL auth resolution. + """ # Simulate different repo types def detect_side_effect(url: str) -> str | None: @@ -474,7 +824,16 @@ def auth_side_effect(url: str, _param: str | None) -> dict[str, str]: return {} mock_auth.side_effect = auth_side_effect - mock_head.return_value = True + + # Force escalation per URL: unauth 404, retry with auth succeeds. + def head_side_effect(_url: str, headers: dict[str, str] | None) -> tuple[bool, int | None]: + return (True, 200) if headers else (False, 404) + + def range_side_effect(_url: str, headers: dict[str, str] | None) -> tuple[bool, int | None]: + return (True, 206) if headers else (False, 404) + + mock_head.side_effect = head_side_effect + mock_range.side_effect = range_side_effect mock_resolve.side_effect = [ ('https://raw.githubusercontent.com/user/repo/main/agent1.md', True), @@ -491,7 +850,7 @@ def auth_side_effect(url: str, _param: str | None) -> dict[str, str]: assert all_valid is True assert len(results) == 2 - # Verify get_auth_headers was called twice with different URLs + # Verify get_auth_headers was called for each distinct origin. assert mock_auth.call_count == 2 @@ -507,10 +866,15 @@ def test_public_file_with_auth_token_provided( mock_resolve: MagicMock, mock_auth: MagicMock, ) -> None: - """Test that public files work even when auth token is provided.""" + """Test that public files work even when auth token is provided. + + With the lazy-auth contract, public URLs return 200 from the initial + unauthenticated probe and never escalate to get_auth_headers, even + when a token is provided by the user. + """ mock_auth.return_value = {'Authorization': 'Bearer token'} mock_resolve.return_value = ('https://example.com/public.md', True) - mock_head.return_value = True + mock_head.return_value = (True, 200) config = {'agents': ['public.md']} @@ -524,20 +888,25 @@ def test_public_file_with_auth_token_provided( @patch('setup_environment.get_auth_headers') @patch('setup_environment.resolve_resource_path') - @patch.object(FileValidator, '_check_with_head') @patch.object(FileValidator, '_check_with_range') + @patch.object(FileValidator, '_check_with_head') def test_private_file_without_auth_token( self, - mock_range: MagicMock, mock_head: MagicMock, + mock_range: MagicMock, mock_resolve: MagicMock, mock_auth: MagicMock, ) -> None: - """Test that private files fail gracefully without auth.""" - mock_auth.return_value = {} # No auth headers + """Test that private files fail gracefully without auth. + + Unauth probe returns 404; escalation attempts get_auth_headers which + returns {} (no credentials available). Validation terminates with + (False, 'None'). + """ + mock_auth.return_value = {} # No auth headers available mock_resolve.return_value = ('https://example.com/private.md', True) - mock_head.return_value = False - mock_range.return_value = False + mock_head.return_value = (False, 404) + mock_range.return_value = (False, 404) config = {'agents': ['private.md']} @@ -1071,3 +1440,78 @@ def test_resolve_resource_path_local_variations(self) -> None: ) assert resolved == str((tmpdir_path / 'file.md').resolve()) assert is_remote is False + + +class TestExtractGithubOwnerRepo: + """Tests for the _extract_github_owner_repo URL parsing helper.""" + + def test_extract_from_raw_url(self) -> None: + """raw.githubusercontent.com URL: extract from path[0:2].""" + result = setup_environment._extract_github_owner_repo( + 'https://raw.githubusercontent.com/owner/repo/main/file.md', + ) + assert result == ('owner', 'repo') + + def test_extract_from_api_url(self) -> None: + """api.github.com Contents API URL: extract path[1:3] after 'repos'.""" + result = setup_environment._extract_github_owner_repo( + 'https://api.github.com/repos/owner/repo/contents/file.md?ref=main', + ) + assert result == ('owner', 'repo') + + def test_extract_from_web_url(self) -> None: + """github.com web URL (with tree/blob subpath): extract path[0:2].""" + result = setup_environment._extract_github_owner_repo( + 'https://github.com/owner/repo/tree/main', + ) + assert result == ('owner', 'repo') + + +class TestGithubRepoIsPublic: + """Tests for the _github_repo_is_public visibility probe.""" + + @patch('setup_environment.urlopen') + def test_returns_true_on_200(self, mock_urlopen: MagicMock) -> None: + """HTTP 200 means repo is confirmed public.""" + mock_response = MagicMock() + mock_response.status = 200 + mock_response.__enter__ = MagicMock(return_value=mock_response) + mock_response.__exit__ = MagicMock(return_value=False) + mock_urlopen.return_value = mock_response + + assert setup_environment._github_repo_is_public('owner', 'repo') is True + request = mock_urlopen.call_args[0][0] + assert request.full_url == 'https://api.github.com/repos/owner/repo' + + @patch('setup_environment.urlopen') + def test_returns_false_on_404(self, mock_urlopen: MagicMock) -> None: + """HTTP 404 means repo is private or does not exist (ambiguous).""" + mock_urlopen.side_effect = urllib.error.HTTPError( + 'https://api.github.com/repos/owner/repo', + 404, + 'Not Found', + {}, + None, + ) + + assert setup_environment._github_repo_is_public('owner', 'repo') is False + + @patch('setup_environment.urlopen') + def test_returns_none_on_rate_limit(self, mock_urlopen: MagicMock) -> None: + """HTTP 403 (rate-limit) returns None for conservative escalation.""" + mock_urlopen.side_effect = urllib.error.HTTPError( + 'https://api.github.com/repos/owner/repo', + 403, + 'rate limit exceeded', + {}, + None, + ) + + assert setup_environment._github_repo_is_public('owner', 'repo') is None + + @patch('setup_environment.urlopen') + def test_returns_none_on_network_error(self, mock_urlopen: MagicMock) -> None: + """URLError / TimeoutError / OSError return None for conservative escalation.""" + mock_urlopen.side_effect = urllib.error.URLError('connection refused') + + assert setup_environment._github_repo_is_public('owner', 'repo') is None