Skip to content

Fix GitHub API endpoint selection based on host parsing#1850

Open
bc-lee wants to merge 5 commits intomicrosoft:mainfrom
bc-lee:fix-github-api-url
Open

Fix GitHub API endpoint selection based on host parsing#1850
bc-lee wants to merge 5 commits intomicrosoft:mainfrom
bc-lee:fix-github-api-url

Conversation

@bc-lee
Copy link
Copy Markdown

@bc-lee bc-lee commented Nov 17, 2025

Previously, when GITHUB_SERVER_URL was set, vcpkg always treated it as a GitHub Enterprise Server(GHES) endpoint and appended /api/v3 for GITHUB_SERVER_URL=https://github.com, this produced an incorrect endpoint (https://github.com/api/v3/...).

This change special-cases the exact value https://github.com and uses https://api.github.com for dependency graph submission. All other server URLs keep the existing behavior (<server>/api/v3/...).

Added tests for endpoint selection behavior in src/vcpkg-test/downloads.cpp.

Fixes microsoft/vcpkg#48347

}

// Extracts the host part from a URL string.
std::string extract_host(StringView url)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thing needs unit tests.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URL parsing is an insanely complicated problem; a parsing function that just returns whether we get github.com or not may be easier. Otherwise we may want to look at using trurl.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I’ll revisit this once the curl library integration has been re-landed.

Comment on lines +54 to +55
const char* at_sign = Strings::find_first_of(url, "@");
size_t at_pos = at_sign[0] == '\0' ? std::string::npos : static_cast<size_t>(at_sign - url.data());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is undefined behavior: you can't dereference the returned iterator if there was no match. And there is no requirement that there is a \0 after a StringView

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto several times below

}

// Remove userinfo if present (e.g., user:pass@host)
const char* at_sign = Strings::find_first_of(url, "@");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems little reason to use find_first_of when there is only one char you're looking for?

@bc-lee bc-lee marked this pull request as draft November 25, 2025 02:37
@BillyONeal
Copy link
Copy Markdown
Member

Ah crap it looks like the URL APIs were added in 7.62.0 which is more recent than we currently assume. So we might be stuck using a handwritten thing like this or possibly copy/pasta-ing the relevant bits from libcurl's urlapi.c (with attribution)

@BillyONeal
Copy link
Copy Markdown
Member

(I guess that most systems GitHub cares about would have greater than 7.62 so that may be fine)

@bc-lee
Copy link
Copy Markdown
Author

bc-lee commented Feb 16, 2026

I reviewed your suggestion to use libcurl’s URL API for URL parsing.

  • curl’s urlapi.c is fairly complex and tightly coupled to other parts of curl. If the goal is to minimize drift from upstream, embedding that code in vcpkg-tool is still hard to read and maintain. If the goal is to minimize code size, we effectively end up rewriting URL parsing logic ourselves, which is risky given how subtle URL parsing can be.

While reading https://learn.microsoft.com/en-us/vcpkg/concepts/supported-hosts and https://github.com/microsoft/vcpkg-tool/blob/792f2dba1715f180412eef13b4de55f8085c82ca/cmake/FindLibCURL.cmake, I noticed that vcpkg-tool is currently too permissive about the minimum libcurl version it accepts. The libcurl URL API was introduced in 7.62.0 (released on 2018-10-31), and the lowest supported libcurl version on Linux is 7.74.0, which is already newer than 7.62.0. That means we should be able to use the libcurl URL API within vcpkg-tool’s supported host baseline.

If you’re OK with that direction, I’d like to split this work into two PRs:

  • Bump the minimum required libcurl version to 7.74.0 in FindLibCURL.cmake and update the documentation accordingly.
  • Use libcurl’s URL parsing functions in src/vcpkg/base/downloads.cpp instead of a custom implementation.

@BillyONeal
Copy link
Copy Markdown
Member

@ras0219-msft suggests (and @MahmoudGSaleh agrees): rather than doing the whole URI parsing, can we just check "if it's https://github.com" (just a string match) do the special github thing, otherwise do the existing behavior? That way we don't need to make the curl situation more complicated.

(And I'm sorry for not noticing this before when I first reviewed this 😅)

Previously, any provided GitHub URL was treated as a GitHub Enterprise
Server (GHES) instance when selecting the API endpoint. This assumption was
incorrect because the URL can also be for GitHub.com (e.g., https://github.com),
which led to using the wrong API endpoint.

This commit adds a function to extract the host from a URL. This is used to
differentiate between GitHub.com and GHES instances to select the correct
API endpoint.
@bc-lee bc-lee force-pushed the fix-github-api-url branch from 79462ea to da9910c Compare February 18, 2026 23:43
@bc-lee bc-lee marked this pull request as ready for review February 18, 2026 23:44
@bc-lee bc-lee requested a review from BillyONeal February 18, 2026 23:44
@bc-lee
Copy link
Copy Markdown
Author

bc-lee commented Feb 18, 2026

I simplified the logic significantly to align with your suggestion. Could you please take another look?

{
if (*github_server_url == github_com_url)
{
uri = api_github_com_url.data();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uri = api_github_com_url.data();
uri.assign(api_github_com_url.data(), api_github_com_url.size());

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +64 to +65
REQUIRE(github_dependency_graph_snapshots_uri(Optional<std::string>{"https://github.com/"}, "owner/repo") ==
"https://github.com//api/v3/repos/owner/repo/dependency-graph/snapshots");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this one become api.? (Should it use starts with rather than ==?)

Comment on lines +339 to +340
uri = *github_server_url;
uri.append("/api/v3");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this defend against github_server_url ending with /?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I applied.

}
else
{
uri = api_github_com_url.data();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uri = api_github_com_url.data();
uri.assign(api_github_com_url.data(), api_github_com_url.size());

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@bc-lee
Copy link
Copy Markdown
Author

bc-lee commented Mar 19, 2026

As you commented earlier, I intentionally kept the change narrow because URL parsing is hard, so I initially avoided URL normalization.
As requested, I updated it to normalize URL by trimming a trailing slash. Please take another look.

@BillyONeal
Copy link
Copy Markdown
Member

As you commented earlier, I intentionally kept the change narrow because URL parsing is hard, so I initially avoided URL normalization. As requested, I updated it to normalize URL by trimming a trailing slash. Please take another look.

I guess I should have clarified: I wasn't asking you to normalize the URL. This is about a value that comes from GitHub. I was kind of expecting a 'well I tested and GitHub sets it to this'. (I'm mostly nervous about replacing one block of code that is apparently broken with a different block that may be broken...)

@bc-lee
Copy link
Copy Markdown
Author

bc-lee commented Mar 24, 2026

Ah, I may have misunderstood your intent.

You were asking for evidence of what GitHub actually sets, not for URL normalization.

GitHub documents the Actions environment variables here:
https://docs.github.com/en/actions/reference/workflows-and-actions/variables

I verified this recently in a GitHub.com Actions run:
https://github.com/bc-lee/dump-github-env-vars/actions/runs/23513789186/job/68441175977

  • GITHUB_SERVER_URL=https://github.com
  • GITHUB_API_URL=https://api.github.com

So for GitHub.com, GITHUB_SERVER_URL does not include a trailing slash.
As a follow-up, it may be better to use GITHUB_API_URL directly instead.

I no longer have access to a GHES instance, although I did when I started working on this PR in 2025-11, so I cannot confidently confirm the current GHES values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dependency graph submission failed

2 participants