_format_link_display() was truncating the URL path to only the first segment, causing all scraped sources to show an identical, unhelpful display name. For example, every FASRC docs page appeared as "docs.rc.fas.harvard.edu/kb" regardless of the actual page, because the code split the path on "/" and took only element [0].

Before: docs.rc.fas.harvard.edu/kb (for /kb/abaqus/)
Before: docs.rc.fas.harvard.edu/kb (for /kb/cluster-storage/)
After: docs.rc.fas.harvard.edu/kb/abaqus
After: docs.rc.fas.harvard.edu/kb/cluster-storage

This made it impossible for users to identify which documentation page a response was sourced from, since every citation looked identical.

Note: the full URL was always stored correctly in the document metadata (extra["url"]), so no data was lost. Only the display_name used for citations in chat responses was affected. Existing documents will need to be re-ingested to pick up corrected display names.

Tested on our FASRC deployment only. Other deployments with multi-segment URL paths will also benefit from this fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Configure the marked.js renderer to add target="_blank" and rel="noopener noreferrer" to all links in chat responses. This ensures source citations open in a new browser tab instead of navigating away from the chat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- `ScrapedResource._format_link_display()` truncated URL paths to only the first segment, making all source citations from the same domain look identical
- Display names now show the full path (e.g. `docs.rc.fas.harvard.edu/kb/abaqus`) instead of a generic prefix (`docs.rc.fas.harvard.edu/kb`)

Root Cause
In `src/data_manager/collectors/scrapers/scraped_resource.py:66`, the display name was built by splitting the URL path on `/` and taking only element `[0]`.

For a URL like `https://docs.rc.fas.harvard.edu/kb/cluster-storage/`, the path `/kb/cluster-storage/` was reduced to just `kb`, producing the display name `docs.rc.fas.harvard.edu/kb` for every FASRC docs page.

Before/After
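The difference can be illustrated with a minimal sketch. This is not the code from `scraped_resource.py`; the function names `format_link_display_before` and `format_link_display_after` are hypothetical, and the sketch only assumes the display name is the host followed by some portion of the URL path.

```python
from urllib.parse import urlparse


def format_link_display_before(url: str) -> str:
    """Hypothetical reconstruction of the old behavior: keep only the first path segment."""
    parsed = urlparse(url)
    # Splitting on "/" and taking element [0] discards everything after the first segment.
    first_segment = parsed.path.strip("/").split("/")[0]
    return f"{parsed.netloc}/{first_segment}" if first_segment else parsed.netloc


def format_link_display_after(url: str) -> str:
    """Hypothetical reconstruction of the fixed behavior: keep the whole path."""
    parsed = urlparse(url)
    # Only the leading and trailing slashes are trimmed; all segments are kept.
    path = parsed.path.strip("/")
    return f"{parsed.netloc}/{path}" if path else parsed.netloc


if __name__ == "__main__":
    for page in ("abaqus", "cluster-storage", "running-jobs"):
        url = f"https://docs.rc.fas.harvard.edu/kb/{page}/"
        print(format_link_display_before(url), "->", format_link_display_after(url))
```

With the old logic, all three pages collapse to the same display name; with the full path they stay distinguishable.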
| URL | Before | After |
| --- | --- | --- |
| `.../kb/abaqus/` | `docs.rc.fas.harvard.edu/kb` | `docs.rc.fas.harvard.edu/kb/abaqus` |
| `.../kb/cluster-storage/` | `docs.rc.fas.harvard.edu/kb` | `docs.rc.fas.harvard.edu/kb/cluster-storage` |
| `.../kb/running-jobs/` | `docs.rc.fas.harvard.edu/kb` | `docs.rc.fas.harvard.edu/kb/running-jobs` |

Notes
- The full URL was always stored correctly in `extra["url"]` metadata, so no data was lost; only the `display_name` used for chat citations was affected

Test plan
🤖 Generated with Claude Code