169 changes: 158 additions & 11 deletions scripts/vale-style-review.ts
Contributor

docs/.vale.ini

Line 18 in 35be995

BlockIgnores = (?s)```[\s\S]*?```
should be telling Vale to ignore code blocks already.

I think maybe it's missing parentheses around the rule:

(?s)([\s\S]*?)

or

(?s)*([\s\S]*?)

if there may be leading whitespace (I don't think that would be true for a code block)

Contributor Author

Claude Code:

The issue: Vale's BlockIgnores regex doesn't work for fenced code blocks because:

  1. For .md files: Vale parses Markdown natively and creates code scopes. IgnoredScopes = code
    works for most content, BUT lines starting with # inside code blocks get misidentified as
    headings.
  2. For .mdx files: Vale doesn't have native MDX support - it treats MDX as plain text, so:
    - No code scopes are created → IgnoredScopes = code doesn't help
    - BlockIgnores regex is supposed to work but has known issues
    (see errata-ai/vale#115, "BlockIgnores not working for .mdx files", and errata-ai/vale#387, "Vale is picking up errors for stuff in code blocks in GitHub markdown")

The bottom line: The commenter's suggestion about parentheses won't fix it - it's a Vale
limitation for MDX files. That's exactly why this PR's vale-style-review.ts script adds its own
getLinesInCodeBlocks() function to detect code blocks independently.
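
To illustrate why extra parentheses alone would not change the outcome: a capture group changes what a regex captures, not what it matches. The sketch below uses JavaScript's regex engine with the `s` flag as a stand-in for the `(?s)` inline flag in `.vale.ini` (an assumption, since Vale uses a different regex engine), and the sample text is invented for illustration.

```typescript
// Hypothetical MDX-like sample containing one fenced block.
// The fence is built with repeat() so this example never contains a literal fence.
const fence = "`".repeat(3);
const sample = [
  "Intro text.",
  fence + "bash",
  "# install deps",
  "npm install",
  fence,
  "Outro text.",
].join("\n");

// Same pattern with and without a surrounding capture group.
const plainPattern = new RegExp(fence + "[\\s\\S]*?" + fence, "s");
const groupedPattern = new RegExp("(" + fence + "[\\s\\S]*?" + fence + ")", "s");

const plainMatch = sample.match(plainPattern)?.[0] ?? "";
const groupedMatch = sample.match(groupedPattern)?.[0] ?? "";

// Both patterns match the identical span of text.
console.log(plainMatch === groupedMatch);
```

If the two matches are identical here, any difference in Vale's behavior would have to come from how Vale applies `BlockIgnores` to a given file type, not from the grouping.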

@@ -45,6 +45,9 @@ const DIFF_HUNK_REGEX = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/;
const OWNER = "ArcadeAI";
const REPO = "docs";
const MAX_LENGTH_CHANGE_RATIO = 0.5;
const MIN_SUGGESTED_LENGTH_RATIO = 0.7; // Suggested must be at least 70% of original length
const MAX_FENCE_INDENT = 3; // CommonMark allows 0-3 spaces before fence markers
const TAB_STOP_WIDTH = 4; // CommonMark tab stops are at multiples of 4

// Load style guide
const STYLE_GUIDE_PATH = join(__dirname, "..", "STYLEGUIDE.md");
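
The `DIFF_HUNK_REGEX` constant named in this hunk's header captures the starting line number on the new-file side of a unified-diff hunk header. A minimal sketch of that behavior follows; the `newFileStart` helper is hypothetical and is not part of the script.

```typescript
const DIFF_HUNK_REGEX = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/;

// Hypothetical helper: returns the first new-file line number of a hunk,
// or null when the text is not a hunk header.
function newFileStart(header: string): number | null {
  const match = DIFF_HUNK_REGEX.exec(header);
  return match ? Number(match[1]) : null;
}

console.log(newFileStart("@@ -45,6 +45,9 @@ const OWNER")); // 45
console.log(newFileStart("@@ -1 +1 @@")); // 1 (line counts are optional)
console.log(newFileStart("not a hunk header")); // null
```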
@@ -256,16 +259,22 @@ ${STYLE_GUIDE}

TASK: Fix the Vale style issues listed below for this file. Return JSON with your suggestions.

CRITICAL - NEVER MODIFY CODE:
- NEVER suggest changes to lines inside code blocks (lines between \`\`\` or ~~~ markers)
- NEVER suggest changes to code examples, import statements, function calls, or variable names
- SKIP any Vale issue that appears inside a code block or affects code

RULES:
1. ONLY fix the specific issues listed - do not make any other changes
2. Make MINIMAL changes - only change the specific word or phrase mentioned in the issue
3. NEVER delete content, rewrite sentences, or change anything beyond the flagged issue
4. If a message says "Use 'X' instead of 'Y'", find ONLY Y and replace with X - nothing else
5. Preserve technical accuracy - never change code or technical details
6. For passive voice - only fix if active voice is clearer
-7. If an issue should NOT be fixed (e.g., passive voice is appropriate), omit it
+7. If an issue should NOT be fixed (e.g., passive voice is appropriate, or it's inside code), OMIT IT
8. The "original" field must contain the EXACT full line from the file
9. The "suggested" field must be identical to "original" EXCEPT for the specific fix
10. The "suggested" field should not be significantly shorter than the "original" - you are fixing style, not removing content

FILE: ${filename}

@@ -364,6 +373,135 @@ async function getSuggestions(
}
}

// Convert leading whitespace (including tabs) to equivalent space count
// Per CommonMark spec: tabs behave as if replaced by spaces with tab stop of 4
// Tab stops are at positions 0, 4, 8, 12, etc. (multiples of 4)
function countLeadingWhitespace(line: string): number {
  let position = 0;
  for (const char of line) {
    if (char === " ") {
      position += 1;
    } else if (char === "\t") {
      // Advance to next tab stop (multiple of TAB_STOP_WIDTH)
      position = Math.ceil((position + 1) / TAB_STOP_WIDTH) * TAB_STOP_WIDTH;
    } else {
      // Non-whitespace character - stop counting
      break;
    }
  }
  return position;
}
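
To make the tab-stop arithmetic concrete, here is a standalone sketch that repeats the same column calculation under the hypothetical name `leadingColumns` and runs it on a few inputs. The inputs are invented for illustration.

```typescript
const TAB_STOP_WIDTH = 4;

// Standalone copy of the column arithmetic, for illustration only.
function leadingColumns(line: string): number {
  let position = 0;
  for (const char of line) {
    if (char === " ") {
      position += 1;
    } else if (char === "\t") {
      // Jump to the next multiple of TAB_STOP_WIDTH strictly after `position`.
      position = Math.ceil((position + 1) / TAB_STOP_WIDTH) * TAB_STOP_WIDTH;
    } else {
      break;
    }
  }
  return position;
}

console.log(leadingColumns("  x")); // 2: two spaces
console.log(leadingColumns("\tx")); // 4: a tab at column 0 reaches column 4
console.log(leadingColumns("  \tx")); // 4: two spaces, then the tab fills up to column 4
console.log(leadingColumns("    \tx")); // 8: already at a tab stop, so the tab advances a full stop
```

Because `MAX_FENCE_INDENT` is 3, a fence marker preceded by a single tab (column 4) is not treated as a fence candidate.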

// Count consecutive fence characters (backticks or tildes) at the start of a trimmed line
// Returns both the count and the character type
function countLeadingFenceChars(trimmedLine: string): { count: number; char: string } {
  const firstChar = trimmedLine[0];
  if (firstChar !== "`" && firstChar !== "~") {
    return { count: 0, char: "" };
  }

  let count = 0;
  for (const char of trimmedLine) {
    if (char === firstChar) {
      count += 1;
    } else {
      break;
    }
  }
  return { count, char: firstChar };
}

// Check if a line could be a fence marker (has valid indentation and starts with ``` or ~~~)
function isFenceCandidate(line: string): boolean {
  const leadingWhitespace = countLeadingWhitespace(line);
  const trimmedLine = line.trim();
  return (
    leadingWhitespace <= MAX_FENCE_INDENT &&
    (trimmedLine.startsWith("```") || trimmedLine.startsWith("~~~"))
  );
}

// Check if a line is a valid closing fence
function isValidClosingFence(
  trimmedLine: string,
  fenceInfo: { count: number; char: string },
  openingFenceInfo: { count: number; char: string }
): boolean {
  // For closing fence: must use same character, have at least as many chars as opening,
  // AND only whitespace after the fence chars (per CommonMark spec)
  const afterFenceChars = trimmedLine.slice(fenceInfo.count);
  return (
    fenceInfo.char === openingFenceInfo.char &&
    fenceInfo.count >= openingFenceInfo.count &&
    afterFenceChars.trim() === ""
  );
}

// Check which lines are inside code blocks (fenced with ``` or ~~~)
// Returns a Set of line numbers (1-indexed) that are inside code blocks
function getLinesInCodeBlocks(content: string): Set<number> {
  const linesInCodeBlocks = new Set<number>();
  const lines = content.split("\n");
  let inCodeBlock = false;
  let openingFenceInfo = { count: 0, char: "" };

  for (let i = 0; i < lines.length; i += 1) {
    const line = lines[i];
    const lineNum = i + 1; // 1-indexed
    const trimmedLine = line.trim();

    if (isFenceCandidate(line)) {
      const fenceInfo = countLeadingFenceChars(trimmedLine);

      if (inCodeBlock) {
        linesInCodeBlocks.add(lineNum);
        if (
          isValidClosingFence(trimmedLine, fenceInfo, openingFenceInfo)
        ) {
          inCodeBlock = false;
          openingFenceInfo = { count: 0, char: "" };
        }
      } else {
        // Opening a code block
        linesInCodeBlocks.add(lineNum);
        inCodeBlock = true;
        openingFenceInfo = fenceInfo;
      }
    } else if (inCodeBlock) {
      linesInCodeBlocks.add(lineNum);
    }
  }

  return linesInCodeBlocks;
}
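
The closing-fence rule matters most for nested or mixed fences. The following compact sketch restates the open/close logic on its own and runs it on two invented inputs. It is not the script's code: the names `fenceRun` and `codeLineNumbers` are hypothetical, and indentation handling is simplified to leading spaces.

```typescript
const tick = "`";

// Count the run of identical fence characters at the start of a trimmed line.
function fenceRun(trimmed: string): { count: number; char: string } {
  const first = trimmed.charAt(0);
  if (first !== tick && first !== "~") return { count: 0, char: "" };
  let count = 0;
  while (trimmed.charAt(count) === first) count += 1;
  return { count, char: first };
}

// Illustrative re-statement of the open/close rule; indentation handling is
// simplified to leading spaces only (tabs are not expanded here).
function codeLineNumbers(content: string): number[] {
  const inside: number[] = [];
  let open: { count: number; char: string } | null = null;
  const lines = content.split("\n");
  for (let i = 0; i < lines.length; i += 1) {
    const line = lines[i] ?? "";
    const trimmed = line.trim();
    const run = fenceRun(trimmed);
    const indent = line.length - line.replace(/^ +/, "").length;
    const isFence = run.count >= 3 && indent <= 3;
    if (open === null) {
      if (isFence) {
        open = run;
        inside.push(i + 1);
      }
      continue;
    }
    inside.push(i + 1);
    const closes =
      isFence &&
      run.char === open.char &&
      run.count >= open.count &&
      trimmed.slice(run.count).trim() === "";
    if (closes) open = null;
  }
  return inside;
}

// A four-backtick block containing a three-backtick block.
const nested = [
  "Text",
  tick.repeat(4) + "md",
  tick.repeat(3) + "ts",
  "const x = 1;",
  tick.repeat(3),
  tick.repeat(4),
  "After",
].join("\n");

// A backtick block containing a tilde line.
const mixed = [tick.repeat(3), "~~~", "code", tick.repeat(3), "out"].join("\n");

console.log(codeLineNumbers(nested)); // [2, 3, 4, 5, 6]
console.log(codeLineNumbers(mixed)); // [1, 2, 3, 4]
```

In the first input the inner three-backtick lines do not close the four-backtick block, and in the second the tilde line does not close a backtick block, so every line up to the matching fence is reported as code.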

// Validate that a suggestion has the required fields with correct types
function hasValidFields(s: Suggestion): boolean {
  return (
    typeof s.line === "number" &&
    typeof s.original === "string" &&
    typeof s.suggested === "string" &&
    typeof s.rule === "string"
  );
}
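
Since the suggestions originate from parsed model output, a runtime check of this kind could also be written as a type guard over `unknown`, so the compiler narrows the type after the check. This is only a sketch: it assumes a `Suggestion` shape with exactly the four fields checked above (the real interface is not shown in this diff), and `isSuggestion` is a hypothetical name.

```typescript
// Assumed shape; the script's actual Suggestion interface is not shown in the diff.
interface Suggestion {
  line: number;
  original: string;
  suggested: string;
  rule: string;
}

// Hypothetical type guard: same four field checks, but usable on untyped JSON.
function isSuggestion(value: unknown): value is Suggestion {
  if (typeof value !== "object" || value === null) return false;
  const s = value as Record<string, unknown>;
  return (
    typeof s["line"] === "number" &&
    typeof s["original"] === "string" &&
    typeof s["suggested"] === "string" &&
    typeof s["rule"] === "string"
  );
}

// Invented sample payload: one well-formed entry, one with a string line number, one null.
const parsed: unknown[] = JSON.parse(
  '[{"line":3,"original":"a","suggested":"b","rule":"R"},{"line":"3"},null]'
);
const valid: Suggestion[] = parsed.filter(isSuggestion);
console.log(valid.length); // 1
```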

// Check if a suggestion would remove too much content
function wouldRemoveContent(s: Suggestion): boolean {
  const originalLen = s.original.trim().length;
  const suggestedLen = s.suggested.trim().length;
  return (
    originalLen > 0 && suggestedLen < originalLen * MIN_SUGGESTED_LENGTH_RATIO
  );
}

// Check if a suggestion makes destructive length changes
function isDestructiveChange(s: Suggestion): boolean {
  const lengthDiff = Math.abs(
    s.suggested.trim().length - s.original.trim().length
  );
  return lengthDiff > s.original.trim().length * MAX_LENGTH_CHANGE_RATIO;
}
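
A worked example of how the two thresholds interact, using raw trimmed lengths instead of `Suggestion` objects. The `classify` helper is hypothetical and only mirrors the arithmetic of the two checks in the order the filter applies them.

```typescript
const MAX_LENGTH_CHANGE_RATIO = 0.5;
const MIN_SUGGESTED_LENGTH_RATIO = 0.7;

// Hypothetical helper: same arithmetic as the two checks above, on raw lengths.
function classify(originalLen: number, suggestedLen: number): string {
  if (originalLen > 0 && suggestedLen < originalLen * MIN_SUGGESTED_LENGTH_RATIO) {
    return "removes-content";
  }
  if (Math.abs(suggestedLen - originalLen) > originalLen * MAX_LENGTH_CHANGE_RATIO) {
    return "destructive";
  }
  return "ok";
}

console.log(classify(40, 38)); // "ok": a two-character shrink
console.log(classify(40, 27)); // "removes-content": 27 is below 70% of 40
console.log(classify(40, 30)); // "ok": 30 is above 70% of 40, and the change is under 50%
console.log(classify(40, 65)); // "destructive": growth of 25 exceeds 50% of 40
```

With these constants, any shrink of more than 30% is already rejected by the first check, so for non-empty originals the 50% check appears to act mainly as a cap on growth.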

// Format suggestions as GitHub review comments
// Only includes suggestions for lines that are in the PR diff and don't already have Vale comments
function formatReviewComments(options: {
@@ -382,6 +520,9 @@ function formatReviewComments(options: {
  } = options;
  const lines = fileContent.split("\n");

  // Get lines that are inside code blocks - NEVER modify these
  const linesInCodeBlocks = getLinesInCodeBlocks(fileContent);

  return suggestions
    .filter((s) => {
      // Skip if there's already a Vale comment on this line
@@ -391,30 +532,36 @@ function formatReviewComments(options: {
        return false;
      }
      // Validate required fields exist and have correct types
-     if (
-       typeof s.line !== "number" ||
-       typeof s.original !== "string" ||
-       typeof s.suggested !== "string" ||
-       typeof s.rule !== "string"
-     ) {
+     if (!hasValidFields(s)) {
        return false;
      }
      // Validate line number is in range
      if (s.line < 1 || s.line > lines.length) {
        return false;
      }
      // CRITICAL: Never modify lines inside code blocks
      if (linesInCodeBlocks.has(s.line)) {
        console.log(
          ` Skipping line ${s.line} (inside code block - code must not be modified)`
        );
        return false;
      }
      // Validate line is in the PR diff (GitHub API requirement)
      if (!commentableLines.has(s.line)) {
        return false;
      }
-     // Reject destructive suggestions (length change > 50% of original)
-     const lengthDiff = Math.abs(s.suggested.length - s.original.length);
-     if (lengthDiff > s.original.length * MAX_LENGTH_CHANGE_RATIO) {
+     // Reject suggestions that remove content (suggested is too short)
+     if (wouldRemoveContent(s)) {
        console.log(
-         ` Skipping destructive suggestion on line ${s.line} (length change: ${lengthDiff} chars)`
+         ` Skipping line ${s.line} (suggested text too short - would remove content)`
        );
        return false;
      }
+     // Reject destructive suggestions (length change > 50% of original)
+     if (isDestructiveChange(s)) {
+       console.log(` Skipping destructive suggestion on line ${s.line}`);
+       return false;
+     }
      // Validate original content matches (loosely)
      const actualLine = lines[s.line - 1];
      return actualLine.includes(