
@puranikyashaswin

Description

TinyStories models were trained with a sequence length of 512, but the HuggingFace config incorrectly claims n_ctx=2048. This causes severe performance degradation for sequences longer than 512 tokens.

This fix adds a warning when loading any TinyStories model, alerting users that the advertised context length exceeds the length the model was trained with.

Fixes #492

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings

TinyStories models were trained with sequence length 512, but HuggingFace
config claims n_ctx=2048. This causes performance degradation for sequences
>512 tokens. Added warning to alert users of this limitation.

Note: We cannot change n_ctx in the config because the pretrained weights
have positional embeddings for 2048 positions. Changing n_ctx would break
weight loading.

Fixes TransformerLensOrg#492
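A minimal sketch of what such a load-time warning might look like. The 512-token threshold comes from the PR text; the helper name, the substring check, and the wording of the message are illustrative assumptions, not TransformerLens's actual implementation:

```python
import warnings

# Actual training context for TinyStories models, per this PR: the HF config
# reports n_ctx=2048, but the models were trained on 512-token sequences.
TINYSTORIES_TRAINED_CTX = 512


def warn_if_untrained_context(model_name: str, n_ctx: int) -> None:
    """Warn when a TinyStories model advertises a longer context than trained.

    Hypothetical helper: matches any model name containing "tinystories"
    (case/hyphen-insensitive) and compares the configured n_ctx against the
    known training length.
    """
    normalized = model_name.lower().replace("-", "")
    if "tinystories" in normalized and n_ctx > TINYSTORIES_TRAINED_CTX:
        warnings.warn(
            f"{model_name} reports n_ctx={n_ctx}, but TinyStories models were "
            f"trained with sequence length {TINYSTORIES_TRAINED_CTX}; expect "
            "degraded performance beyond that length.",
            UserWarning,
        )
```

Called from the model-loading path, this leaves n_ctx (and therefore the 2048-position positional-embedding weights) untouched while still surfacing the limitation to users.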
@puranikyashaswin force-pushed the fix/tinystories-n_ctx-492 branch from aeca4d1 to 4d74ac0 on January 25, 2026 at 07:13


Linked issue: [Bug Report] Tiny stories models have longer n_ctx than they were trained with
