Add markdown alternate links for LLM training data discovery #665
base: main
- Add <link rel="alternate" type="text/markdown"> to page headers pointing to .md version
- Improve MDX-to-markdown compilation to produce clean markdown output
- Preserve code blocks and frontmatter while stripping JSX components

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pages that only contain React components (like the landing page) now return a helpful markdown response with the title, description, and a link to the full interactive page. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
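A minimal sketch of such a fallback, assuming the page's `title` and `description` come from frontmatter and that a page counts as "component-only" when nothing but import/export lines and JSX tags remain. `isComponentOnly`, `fallbackMarkdown`, `PageMeta`, and the fallback wording are hypothetical, not the code in this PR:

```typescript
// Hypothetical shape of the frontmatter fields used by the fallback.
interface PageMeta {
  title?: string;
  description?: string;
}

// Rough heuristic: true when nothing but imports, exports, and JSX tags remain.
function isComponentOnly(body: string): boolean {
  const prose = body
    .replace(/^\s*(import|export)\s.*$/gm, "") // drop single-line import/export statements
    .replace(/<[^>]*>/g, "") // drop JSX tags
    .trim();
  return prose.length === 0;
}

// Builds a small markdown stand-in that points readers at the interactive page.
function fallbackMarkdown(meta: PageMeta, pageUrl: string): string {
  const lines = [`# ${meta.title ?? "Untitled page"}`, ""];
  if (meta.description) {
    lines.push(meta.description, "");
  }
  lines.push(
    "This page is built from interactive components.",
    `View the full page: ${pageUrl}`,
  );
  return lines.join("\n") + "\n";
}
```

Multi-line imports would defeat this heuristic, so a real implementation would more likely work from a parsed MDX tree.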
- Add dedent function to normalize indentation when extracting content from JSX components
- Add normalizeIndentation function to clean up stray whitespace while preserving meaningful markdown indentation (nested lists, blockquotes)
- Move list detection regex patterns to module top level for performance
- Ensures code block markers (```) start at column 0

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
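A rough sketch of the two helpers, assuming `dedent` removes the smallest common indentation of non-blank lines and `normalizeIndentation` trims stray leading whitespace everywhere except list items, blockquotes, and code-block contents. These are illustrative implementations, not the ones in the PR; the module-level patterns mirror the commit's note about hoisting the list-detection regexes:

```typescript
// Module-level patterns, compiled once rather than per call.
const LIST_ITEM = /^\s*([-*+]|\d+[.)])\s/;
const BLOCKQUOTE = /^\s*>/;
const FENCE = /^\s*`{3}/;

// Remove the common leading whitespace shared by all non-blank lines.
function dedent(text: string): string {
  const lines = text.split("\n");
  const indents = lines
    .filter((line) => line.trim() !== "")
    .map((line) => line.length - line.trimStart().length);
  const minIndent = indents.length > 0 ? Math.min(...indents) : 0;
  return lines.map((line) => line.slice(minIndent)).join("\n");
}

// Trim stray indentation but keep it where markdown gives it meaning.
function normalizeIndentation(markdown: string): string {
  let inFence = false;
  return markdown
    .split("\n")
    .map((line) => {
      if (FENCE.test(line)) {
        inFence = !inFence;
        return line.trimStart(); // fence markers go to column 0
      }
      if (inFence) return line; // code content is left untouched
      if (LIST_ITEM.test(line) || BLOCKQUOTE.test(line)) return line;
      return line.trimStart();
    })
    .join("\n");
}
```

Running `dedent` first and `normalizeIndentation` second keeps nested-list offsets relative to each other while still pulling extracted JSX content back to the left margin.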
The previous regex patterns `["']?([^"'\n]+)["']?` would truncate text at the first apostrophe (e.g., "Arcade's" became "Arcade"). This fix:

- Uses separate patterns for double-quoted, single-quoted, and unquoted values
- Requires closing quotes to be at end of line to prevent apostrophes from being misinterpreted as closing delimiters
- Adds stripSurroundingQuotes helper for fallback cases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
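A sketch of the three-pattern approach, assuming values sit on single `key: value` lines such as frontmatter. `extractField` and `escapeRegExp` are hypothetical names, and the PR's stripSurroundingQuotes fallback is not reproduced here:

```typescript
// Escape regex metacharacters so a key can be embedded in a pattern.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: read the value of a `key: value` line.
function extractField(source: string, key: string): string | undefined {
  const k = escapeRegExp(key);
  // Double-quoted: the closing " must end the line, so inner apostrophes survive.
  const doubleQuoted = source.match(new RegExp(`^${k}:[ \\t]*"(.*)"[ \\t]*$`, "m"));
  if (doubleQuoted) return doubleQuoted[1];
  // Single-quoted: likewise, only a ' at the end of the line closes the value.
  const singleQuoted = source.match(new RegExp(`^${k}:[ \\t]*'(.*)'[ \\t]*$`, "m"));
  if (singleQuoted) return singleQuoted[1];
  // Unquoted: take the rest of the line, minus trailing spaces or tabs.
  const bare = source.match(new RegExp(`^${k}:[ \\t]*(.+?)[ \\t]*$`, "m"));
  return bare ? bare[1] : undefined;
}
```

With the old single pattern, `title: "Arcade's tools"` stopped at the apostrophe. Here the double-quoted pattern only accepts a `"` that ends the line, so the apostrophe stays in the value.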
When x-pathname header is not set, pathname defaults to "/" which would produce an invalid alternate link "https://docs.arcade.dev/.md". Only render the alternate link when we have a real page path. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
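A sketch of that guard, assuming the pathname may be missing or `"/"` and that trailing slashes should be dropped before appending `.md`. `markdownAlternateHref` and `markdownAlternateLink` are hypothetical helpers, and the tag is built as a string here for brevity rather than as JSX in the layout:

```typescript
const DOCS_ORIGIN = "https://docs.arcade.dev";

// Returns the markdown alternate URL for a page path, or null when there is
// no real page path (e.g. a missing x-pathname header that defaulted to "/"),
// which would otherwise produce "https://docs.arcade.dev/.md".
function markdownAlternateHref(pathname: string | null | undefined): string | null {
  const path = (pathname ?? "").replace(/\/+$/, ""); // drop trailing slashes
  if (path === "") return null;
  return `${DOCS_ORIGIN}${path}.md`;
}

// Renders the <link> tag, or nothing at all for the root or an unknown path.
function markdownAlternateLink(pathname: string | null | undefined): string {
  const href = markdownAlternateHref(pathname);
  return href ? `<link rel="alternate" type="text/markdown" href="${href}">` : "";
}
```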
@evantahler Got a minute to have a second look?
I like the … If the goal is to keep HTML fragments but zero out React, I'd suggest an alternative approach:
@evantahler Dammit! So when agents parse markdown, HTML and MDX getting mixed in there makes it hard on them. IIRC from the friend who did this, the entire thing needs to be rendered down to markdown. What approach do you recommend in light of this?
Actually, let me just try parsing the HTML back into Markdown. We have some complex MDX.
- Add scripts/generate-markdown.ts to pre-render MDX to markdown
- Update proxy.ts to serve static .md files from public/
- Delete API route in favor of static file serving
- Add link rewriting to add /en/ prefix and .md extension
- Add markdown-friendly component implementations
- Fix localhost URL in gmail integration page

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
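A sketch of the link rewriting, assuming targets are root-relative Markdown links (`](/path)`), the locale is `en`, and anything whose last path segment already has a file extension (images, existing `.md` links) should be left alone. `INTERNAL_LINK` and `rewriteInternalLinks` are hypothetical names:

```typescript
// Root-relative Markdown link targets: "](/path" plus an optional ?query or #fragment.
const INTERNAL_LINK = /\]\((\/[^)\s#?]*)([?#][^)\s]*)?\)/g;

function rewriteInternalLinks(markdown: string, locale = "en"): string {
  return markdown.replace(
    INTERNAL_LINK,
    (match: string, path: string, suffix: string | undefined) => {
      const lastSegment = path.split("/").pop() ?? "";
      // Leave assets (e.g. /images/x.png) and existing .md links untouched.
      if (lastSegment.includes(".")) return match;
      let page = path.replace(/\/+$/, ""); // "/a/b/" -> "/a/b", "/" -> ""
      if (page === "") return match; // skip the bare site root
      if (page !== `/${locale}` && !page.startsWith(`/${locale}/`)) {
        page = `/${locale}${page}`; // add the locale prefix
      }
      // .md goes on the path, before any query string or #fragment.
      return `](${page}.md${suffix ?? ""})`;
    },
  );
}
```

Capturing the path and the `?`/`#` suffix separately is what keeps `.md` from landing after a fragment, e.g. `/en/get-started/quickstarts.md#install` rather than `/en/get-started/quickstarts#install.md`.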
This reverts commit d7b7c71.
@nearestnabors I looked at the deploy preview and am not seeing the … I visited the .md version of the page, and I see components and imports, but they are rendering as text, not as whatever the .mdx would spit out, so maybe that's the intention here.
Replace simple regex pattern that failed on nested braces with
existing BRACE_PATTERN that properly handles up to 3 levels of
nesting. This fixes issues with expressions like {obj || {default: true}}
and {items.map(x => ({a: x}))} where stray closing braces would remain
in the output.
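JavaScript regexes cannot match arbitrarily deep nesting, so a pattern like BRACE_PATTERN has to unroll a fixed depth. A sketch of how such a pattern can be built for the three levels the commit mentions; `bracePattern` and `stripExpressions` are hypothetical names, and this is not necessarily how the PR defines BRACE_PATTERN:

```typescript
// Build a regex source matching {...} with up to `depth` levels of nesting.
function bracePattern(depth: number): string {
  let inner = "\\{[^{}]*\\}"; // innermost level: no braces inside
  for (let i = 1; i < depth; i++) {
    inner = `\\{(?:[^{}]|${inner})*\\}`; // wrap one more level around it
  }
  return inner;
}

const BRACE_PATTERN = new RegExp(bracePattern(3), "g");

// Remove whole JSX expressions, including nested object literals.
function stripExpressions(text: string): string {
  return text.replace(BRACE_PATTERN, "");
}
```

A naive `/\{[^}]*\}/g` stops at the first `}`, which is how `{obj || {default: true}}` left a stray `}` behind. Each alternative in the group starts with a different character (`{` versus anything else), so the unrolled pattern does not backtrack catastrophically.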
Bugbot Autofix resolved the bug found in the latest run.
- Convert GuideOverview.Outcomes/Prerequisites/YouWillLearn to ## headers
- Convert <Image> components to markdown image syntax
- Add .md extension to internal links for LLM crawlers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
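A sketch of the GuideOverview conversion, assuming each sub-component wraps its content in a matching open and close tag. The heading text for each sub-component is a hypothetical choice. Only the blank lines just inside the tags are removed, so indentation is left to a separate dedent step:

```typescript
// Hypothetical heading text for each GuideOverview sub-component.
const GUIDE_SECTION_TITLES: Record<string, string> = {
  Outcomes: "Outcomes",
  Prerequisites: "Prerequisites",
  YouWillLearn: "You will learn",
};

// <GuideOverview.X ...>content</GuideOverview.X>; \1 requires a matching close tag.
const GUIDE_SECTION =
  /<GuideOverview\.(Outcomes|Prerequisites|YouWillLearn)\b[^>]*>([\s\S]*?)<\/GuideOverview\.\1>/g;

function convertGuideSections(mdx: string): string {
  return mdx.replace(GUIDE_SECTION, (_match: string, name: string, body: string) => {
    // Drop the leading and trailing newlines inside the tags only.
    const content = body.replace(/^\s*\n|\n\s*$/g, "");
    return `## ${GUIDE_SECTION_TITLES[name]}\n\n${content}\n`;
  });
}
```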
- Fix IMAGE_COMPONENT_REGEX to use JSX_ATTRS_PATTERN for handling > in attributes
- Fix INTERNAL_LINK_REGEX to correctly place .md before URL fragments
- Fix IMAGE_ALT_REGEX and IMAGE_SRC_REGEX to handle apostrophes in quoted strings
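A sketch of attribute-aware image conversion, assuming attribute values are either quoted strings or `{...}` expressions without nested braces. The constant names follow the commit, but these definitions and the `convertImages` helper are illustrative, not the PR's code:

```typescript
// Attributes: quoted strings, {expressions}, or any character except quotes, braces, ">".
const JSX_ATTRS_PATTERN = String.raw`(?:"[^"]*"|'[^']*'|\{[^{}]*\}|[^"'{}>])*`;

const IMAGE_COMPONENT_REGEX = new RegExp(`<Image\\b(${JSX_ATTRS_PATTERN})\\/?>`, "g");
// Each quote style is matched separately, so an apostrophe inside "..." is kept.
const IMAGE_ALT_REGEX = /\balt=(?:"([^"]*)"|'([^']*)')/;
const IMAGE_SRC_REGEX = /\bsrc=(?:"([^"]*)"|'([^']*)')/;

function convertImages(mdx: string): string {
  return mdx.replace(IMAGE_COMPONENT_REGEX, (match: string, attrs: string) => {
    const src = attrs.match(IMAGE_SRC_REGEX);
    if (!src) return match; // e.g. src={importedImage}: leave the component as-is
    const alt = attrs.match(IMAGE_ALT_REGEX);
    const altText = alt ? alt[1] ?? alt[2] : "";
    return `![${altText}](${src[1] ?? src[2]})`;
  });
}
```

Because quoted strings are consumed as a unit, a `>` inside `alt="..."` no longer ends the tag early.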
Bugbot Autofix resolved all 3 bugs found in the markdown API route regex patterns.
Track the indentation level of opening code fences and only treat … appears inside code blocks (e.g., in Python strings containing markdown examples).
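A sketch of indentation-aware fence tracking, assuming a fence only closes a block when it has the same indentation as the opening fence, at least as many backticks, and no info string. This heuristic is stricter than CommonMark's closing-fence rules. `codeBlockMask` and `stripImportsOutsideCode` are hypothetical names:

```typescript
const FENCE_LINE = /^(\s*)(`{3,})(.*)$/;
const IMPORT_OR_EXPORT = /^\s*(import|export)\s/;

// For each line, whether it lies inside a fenced code block (fence lines included).
function codeBlockMask(lines: string[]): boolean[] {
  const mask: boolean[] = [];
  let openFence: { indent: number; ticks: number } | null = null;
  for (const line of lines) {
    const m = FENCE_LINE.exec(line);
    if (openFence === null) {
      if (m) openFence = { indent: m[1].length, ticks: m[2].length };
      mask.push(openFence !== null);
    } else {
      mask.push(true);
      // Closing fence: same indentation, enough backticks, no info string.
      if (m && m[1].length === openFence.indent && m[2].length >= openFence.ticks && m[3].trim() === "") {
        openFence = null;
      }
    }
  }
  return mask;
}

// Drop import/export lines, but only outside code blocks.
function stripImportsOutsideCode(markdown: string): string {
  const lines = markdown.split("\n");
  const mask = codeBlockMask(lines);
  return lines.filter((line, i) => mask[i] || !IMPORT_OR_EXPORT.test(line)).join("\n");
}
```

An indented fence inside a Python string (for example a JSON example in a prompt) therefore no longer ends the outer block, so code after it is not mistaken for MDX.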
Bugbot Autofix resolved the bug found in the latest run.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issue.
The bug was caused by calling trim() before dedent(), which removed leading whitespace from the first line and caused dedent to calculate minIndent as 0. This left subsequent lines with incorrect indentation. Fixed by swapping the order to dedent(content).trim() at all four occurrences (lines 256, 260, 264, and 300).
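A small reproduction of the ordering bug, using a compact dedent (a hypothetical implementation, not the PR's) on content as it might be captured from inside a JSX component:

```typescript
// Remove the common leading whitespace shared by all non-blank lines.
function dedent(text: string): string {
  const lines = text.split("\n");
  const indents = lines
    .filter((line) => line.trim() !== "")
    .map((line) => line.length - line.trimStart().length);
  const minIndent = indents.length > 0 ? Math.min(...indents) : 0;
  return lines.map((line) => line.slice(minIndent)).join("\n");
}

// Content as it might be captured between a component's open and close tags.
const captured = "\n    First paragraph.\n\n    - item\n      - nested\n";

// trim() first strips the first line's indent, so dedent sees minIndent = 0.
const buggy = dedent(captured.trim());
// dedent first (minIndent = 4), then trim the surrounding blank lines.
const fixed = dedent(captured).trim();
```

In `buggy`, the list keeps four spaces of indentation, which Markdown would render as an indented code block rather than a list.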
Bugbot Autofix resolved the bug found in the latest run.
Summary
- Add `<link rel="alternate" type="text/markdown">` to all page headers, pointing to the `.md` version of each page
- `.md` URLs return clean, readable markdown instead of raw MDX

This enables LLM crawlers and training pipelines to discover and consume the markdown versions of our documentation, similar to what Vercel does with their docs.
Test plan
- `<link rel="alternate" type="text/markdown" href="...">` is present in page headers
- https://docs.arcade.dev/en/get-started/quickstarts/call-tool-agent.md should return clean markdown with code blocks preserved
- https://docs.arcade.dev/en/home.md should return fallback content with title/description and a link to the full page

🤖 Generated with Claude Code
Note
Enables discovery and consumption of Markdown-rendered docs alongside MDX pages.
- `app/api/markdown/[[...slug]]/route.ts` compiles MDX to clean Markdown: preserves frontmatter, strips imports/exports and JSX, converts `Image` to Markdown image syntax, maps internal links to `.md`, normalizes indentation, and provides fallback title/description for component-only pages
- `GET` handler now reads `page.mdx`, compiles via `compileMdxToMarkdown`, and serves `text/markdown`
- `app/layout.tsx`: injects `<link rel="alternate" type="text/markdown" href="https://docs.arcade.dev{pathname}.md">` on all non-root pages

Written by Cursor Bugbot for commit c43e842. This will update automatically on new commits.