Fix GitHub API endpoint selection based on host parsing#1850
Fix GitHub API endpoint selection based on host parsing#1850bc-lee wants to merge 5 commits intomicrosoft:mainfrom
Conversation
src/vcpkg/base/downloads.cpp
Outdated
| } | ||
|
|
||
| // Extracts the host part from a URL string. | ||
| std::string extract_host(StringView url) |
There was a problem hiding this comment.
This thing needs unit tests.
There was a problem hiding this comment.
URL parsing is an insanely complicated problem; a parsing function that just returns whether we get github.com or not may be easier. Otherwise we may want to look at using trurl.
There was a problem hiding this comment.
Thanks for the review. I’ll revisit this once the curl library integration has been re-landed.
src/vcpkg/base/downloads.cpp
Outdated
| const char* at_sign = Strings::find_first_of(url, "@"); | ||
| size_t at_pos = at_sign[0] == '\0' ? std::string::npos : static_cast<size_t>(at_sign - url.data()); |
There was a problem hiding this comment.
This is undefined behavior: you can't dereference the returned iterator if there was no match. And there is no requirement that there is a \0 after a StringView
src/vcpkg/base/downloads.cpp
Outdated
| } | ||
|
|
||
| // Remove userinfo if present (e.g., user:pass@host) | ||
| const char* at_sign = Strings::find_first_of(url, "@"); |
There was a problem hiding this comment.
Seems little reason to use find_first_of when there is only one char you're looking for?
|
Ah crap it looks like the URL APIs were added in 7.62.0 which is more recent than we currently assume. So we might be stuck using a handwritten thing like this or possibly copy/pasta-ing the relevant bits from libcurl's urlapi.c (with attribution) |
|
(I guess that most systems GitHub cares about would have greater than 7.62 so that may be fine) |
|
I reviewed your suggestion to use libcurl’s URL API for URL parsing.
While reading https://learn.microsoft.com/en-us/vcpkg/concepts/supported-hosts and https://github.com/microsoft/vcpkg-tool/blob/792f2dba1715f180412eef13b4de55f8085c82ca/cmake/FindLibCURL.cmake, I noticed that vcpkg-tool is currently too permissive about the minimum libcurl version it accepts. The libcurl URL API was introduced in 7.62.0 (released on 2018-10-31), and the lowest supported libcurl version on Linux is 7.74.0, which is already newer than 7.62.0. That means we should be able to use the libcurl URL API within vcpkg-tool’s supported host baseline. If you’re OK with that direction, I’d like to split this work into two PRs:
|
|
@ras0219-msft suggests (and @MahmoudGSaleh agrees): rather than doing the whole URI parsing, can we just check "if it's https://github.com" (just a string match) do the special github thing, otherwise do the existing behavior? That way we don't need to make the curl situation more complicated. (And I'm sorry for not noticing this before when I first reviewed this 😅) |
Previously, any provided GitHub URL was treated as a GitHub Enterprise Server (GHES) instance when selecting the API endpoint. This assumption was incorrect because the URL can also be for GitHub.com (e.g., https://github.com), which led to using the wrong API endpoint. This commit adds a function to extract the host from a URL. This is used to differentiate between GitHub.com and GHES instances to select the correct API endpoint.
79462ea to
da9910c
Compare
|
I simplified the logic significantly to align with your suggestion. Could you please take another look? |
src/vcpkg/base/downloads.cpp
Outdated
| { | ||
| if (*github_server_url == github_com_url) | ||
| { | ||
| uri = api_github_com_url.data(); |
There was a problem hiding this comment.
| uri = api_github_com_url.data(); | |
| uri.assign(api_github_com_url.data(), api_github_com_url.size()); |
src/vcpkg-test/downloads.cpp
Outdated
| REQUIRE(github_dependency_graph_snapshots_uri(Optional<std::string>{"https://github.com/"}, "owner/repo") == | ||
| "https://github.com//api/v3/repos/owner/repo/dependency-graph/snapshots"); |
There was a problem hiding this comment.
Why doesn't this one become api.? (Should it use starts with rather than ==?)
src/vcpkg/base/downloads.cpp
Outdated
| uri = *github_server_url; | ||
| uri.append("/api/v3"); |
There was a problem hiding this comment.
Should this defend against github_server_url ending with /?
src/vcpkg/base/downloads.cpp
Outdated
| } | ||
| else | ||
| { | ||
| uri = api_github_com_url.data(); |
There was a problem hiding this comment.
| uri = api_github_com_url.data(); | |
| uri.assign(api_github_com_url.data(), api_github_com_url.size()); |
|
As you commented earlier, I intentionally kept the change narrow because URL parsing is hard, so I initially avoided URL normalization. |
I guess I should have clarified: I wasn't asking you to normalize the URL. This is about a value that comes from GitHub. I was kind of expecting a 'well I tested and GitHub sets it to this'. (I'm mostly nervous about replacing one block of code that is apparently broken with a different block that may be broken...) |
|
Ah, I may have misunderstood your intent. You were asking for evidence of what GitHub actually sets, not for URL normalization. GitHub documents the Actions environment variables here: I verified this recently in a GitHub.com Actions run:
So for GitHub.com, I no longer have access to a GHES instance, although I did when I started working on this PR in 2025-11, so I cannot confidently confirm the current GHES values. |
Previously, when
GITHUB_SERVER_URLwas set, vcpkg always treated it as a GitHub Enterprise Server(GHES) endpoint and appended/api/v3forGITHUB_SERVER_URL=https://github.com, this produced an incorrect endpoint (https://github.com/api/v3/...).This change special-cases the exact value
https://github.comand useshttps://api.github.comfor dependency graph submission. All other server URLs keep the existing behavior (<server>/api/v3/...).Added tests for endpoint selection behavior in
src/vcpkg-test/downloads.cpp.Fixes microsoft/vcpkg#48347