ClipLoop — Product Specification (Hypothesis-Driven Video Studio)

Version: 1.0
Last Updated: 2026-01-23
Status: Implementation-ready (final)


1. Overview

ClipLoop is a simple web app that:

  1. Ingests one or more Google Sheets produced by ClipPulse (upstream product), including the linked video artifacts referenced in those sheets.
  2. Accepts human intent (goals/constraints) and then runs a ChatGPT-style dialogue.
  3. Iteratively proposes next hypotheses and success criteria worth validating, grounded in the ingested data and media.
  4. When a human selects a hypothesis, generates a short vertical "AI influencer talking-head" style video aligned to that hypothesis via an external, official video-generation API (default: HeyGen). (HeyGen API Documentation)
  5. Logs everything to Google Sheets + saves all generated videos to Google Drive so the workflow can be resumed later.

2. Purpose

Enable continuous, hypothesis-driven video creation by combining:

  • Structured performance data (Sheets from ClipPulse),
  • Real media assets (videos referenced in those sheets),
  • Human judgment (intent and selections),
  • AI-assisted iterative validation (chat + hypothesis/success-criteria loops).

3. Goals and Non-Goals

3.1 Goals

  • A single web app with:

    • Sheet selection (multi-select),
    • Chat-room UI as the main experience,
    • AI proposing hypotheses + success criteria,
    • Button-driven video generation from the chat,
    • Full logging (Sheets) + video storage (Drive),
    • Resume/restart from past sessions with correct context.

3.2 Non-Goals (v1)

  • Automated posting to Instagram/TikTok.
  • Automated re-ingestion of new ClipPulse runs on a schedule (manual selection is sufficient).
  • Deep video understanding (full frame-by-frame analysis). v1 uses transcripts/metadata/artifacts available via ClipPulse and Drive (details below).

4. Platform and Technical Stack

4.1 Platform (deployment target)

  • Google Apps Script Web App (V8 runtime) as the primary platform, including:

    • Backend logic (server-side Apps Script),
    • Frontend served via Apps Script HTML Service. (Google for Developers)

4.2 Backend stack

  • Google Apps Script (JavaScript, V8)

  • Core Google services:

    • SpreadsheetApp (read ClipPulse sheets + write logs),
    • DriveApp (store session artifacts + generated videos),
    • PropertiesService (store secrets/config),
    • CacheService (short-lived caches),
    • LockService (concurrency safety),
    • UrlFetchApp (OpenAI + HeyGen API calls).
  • Design constraints:

    • Apps Script execution time limits apply (use async polling patterns + background tasks where needed). (Google for Developers)

4.3 Frontend stack

  • Apps Script HTML Service + vanilla HTML/CSS/JS (no build step required).
  • Use google.script.run for client ↔ server RPC. (Google for Developers)
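
For illustration, a minimal client-side sketch of that RPC pattern (sendUserMessage is the server function from §13; startAssistantPolling and renderError are hypothetical UI helpers):

// Client-side (app.js.html): call a server function and handle the async result.
function onSendClicked(sessionId, text) {
  google.script.run
    .withSuccessHandler(function (res) {
      // res = { messageId, openaiResponseId } per the §13 contract
      startAssistantPolling(sessionId, res.openaiResponseId); // hypothetical poller; see §7.2
    })
    .withFailureHandler(function (err) {
      renderError(err.message); // hypothetical error display
    })
    .sendUserMessage({ sessionId: sessionId, text: text });
}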

4.4 External APIs

  • OpenAI API — Responses API for chat, hypothesis generation, and video briefs; optional speech-to-text transcription (§9). (OpenAI Platform)
  • HeyGen API — avatar/talking-head video generation and render-status polling (§10). (HeyGen API Documentation)

5. Inputs and Data Contracts

5.1 Reference sheets (ClipPulse output)

ClipLoop only guarantees compatibility with Google Sheets produced by ClipPulse.

Required worksheet tabs

  • Instagram
  • TikTok

Sheet-level assumptions

  • Row 1 is the header row (column names).
  • Each subsequent row is one post (one record).

Drive artifacts referenced by the sheet

ClipPulse stores a Drive artifact per post and writes its link into drive_url. That artifact is either:

  • video.mp4 (downloaded file), or
  • watch.html (a lightweight HTML file with the original platform URL).

ClipLoop must treat drive_url as the canonical "real media asset pointer" and attempt to enrich context from it.
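
A sketch of how that pointer could be resolved (the file-ID regex is an assumption about standard Drive URL shapes):

// Server-side: classify a drive_url as video.mp4 or watch.html via its MIME type.
function resolveDriveArtifact(driveUrl) {
  var match = driveUrl.match(/[-\w]{25,}/); // assumption: the file ID is the long token in the URL
  if (!match) return { kind: 'unknown', url: driveUrl };
  var file = DriveApp.getFileById(match[0]);
  var mime = file.getMimeType();
  if (mime === 'video/mp4') return { kind: 'video', file: file, url: driveUrl };
  if (mime === MimeType.HTML) return { kind: 'watch_page', file: file, url: driveUrl };
  return { kind: 'other', file: file, url: driveUrl };
}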


5.2 Columns ClipLoop must support

ClipLoop must parse all columns as raw strings, but it must specifically recognize and use the following columns for downstream logic.

TikTok tab — required columns (ClipLoop usage)

From ClipPulse schema:

Column                                             Used for
platform_post_id                                   stable post identifier + dedup key
username, create_username                          creator/account metadata
create_time, posted_at                             chronology
description                                        caption/context
voice_to_text                                      transcript-like context (preferred)
view, like, comments, share_count, collect_count   performance metrics
video_duration                                     content constraints
region_code                                        segmenting patterns
hashtag_names                                      topic extraction
drive_url                                          link to media artifact in Drive

Instagram tab — required columns (ClipLoop usage)

From ClipPulse schema:

Column                           Used for
platform_post_id                 stable post identifier + dedup key
account, create_username         creator/account metadata
timestamp, posted_at             chronology
caption                          caption/context
like_count, comments_count       performance metrics
post_url                         reference URL
media_type, media_product_type   content constraints
drive_url                        link to media artifact in Drive

Note: Even if additional columns exist (ClipPulse has more), ClipLoop must store them for context but only requires the above for core logic.
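
A sketch of the parse-everything-as-raw-strings rule (readTabAsRows is a placeholder name, not a fixed API):

// Read a worksheet tab and return one object per post row, all values as raw strings.
function readTabAsRows(spreadsheetUrl, tabName) {
  var sheet = SpreadsheetApp.openByUrl(spreadsheetUrl).getSheetByName(tabName);
  if (!sheet) throw new Error('Missing required tab: ' + tabName);
  var values = sheet.getDataRange().getValues();
  var headers = values[0]; // row 1 is the header row (§5.1)
  return values.slice(1).map(function (row) {
    var record = {};
    headers.forEach(function (h, i) { record[String(h)] = String(row[i]); });
    return record;
  });
}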


5.3 Human intent input

At session creation time, ClipLoop must accept:

  • Intent / Goal (free text; required)

  • Optional constraints:

    • target platform(s) (IG, TikTok, both),
    • topic/domain,
    • tone/persona,
    • language,
    • posting cadence,
    • constraints about what not to do.

This intent becomes part of the session's "Context Pack" and is always available to the AI.


5.4 "AI can read videos" definition (v1)

Because Apps Script is constrained, ClipLoop defines "AI can read videos" as:

For each post row, ClipLoop tries to provide the AI with:

  1. Caption/description text,

  2. Transcript-like text:

    • Prefer voice_to_text (TikTok), else
    • Optional OpenAI transcription if the Drive artifact is a small video.mp4 and can be sent to OpenAI within file-size constraints (see §9.8).
  3. The drive_url itself (as a human-auditable pointer),

  4. Key metrics.


6. User Experience and UI/UX

6.1 Screens

A) Home / Session Launcher

Components:

  • Reference Sheets selector

    • Add by pasting Google Sheet URL(s) (multi-add)
    • Display selected sheets list with remove buttons
  • Intent / Goal text area

  • Optional settings:

    • default platform focus (IG/TikTok/Both),
    • default video settings (length range, avatar/voice)
  • Primary CTA: Start AI

B) Session (Chat Room) — main UI

Layout:

  • Left/top: Chat timeline (ChatGPT-style)

  • Right/bottom side panel (or collapsible drawer):

    • "Current hypotheses" list (AI-generated candidates)

    • Controls:

      • "Regenerate hypotheses"
      • "Select hypothesis"
      • "Create video from selected hypothesis"
  • Below timeline: message composer + send button

  • Status area: "Loading context…", "AI thinking…", "Rendering video…", etc.

6.2 Required interaction flow

  1. User selects reference sheet(s) + enters intent → clicks Start AI.

  2. App builds a Context Pack (data summary + exemplars) and saves it.

  3. App requests the AI to produce initial hypotheses/success criteria.

  4. Chat becomes active once the initial AI output is ready.

  5. User iterates with AI; hypotheses panel updates each turn (or on demand).

  6. User selects a hypothesis → clicks Create video.

  7. App calls:

    • OpenAI to produce a Video Brief aligned with conversation + chosen hypothesis,
    • HeyGen to generate the video.
  8. When done, the video appears in chat with:

    • embedded player (if possible) + Drive link,
    • metadata (hypothesis ID, timestamps).
  9. User can repeat steps 5–8 indefinitely.

6.3 Resume flow

From Home:

  • "Resume session" list (most recent first)

  • Selecting a session loads:

    • prior conversation,
    • the Context Pack snapshot used at that time,
    • prior hypotheses and videos,
    • and restarts chat with full context.

7. System Architecture

7.1 High-level architecture

Apps Script Web App

  • Serves frontend (HTML Service)

  • Implements backend endpoints (server functions callable via google.script.run)

  • Owns logging + storage in:

    • a dedicated log spreadsheet,
    • a dedicated Drive folder tree.

External providers

  • OpenAI — LLM reasoning, hypothesis/brief generation, optional transcription.
  • HeyGen — avatar video rendering.

7.2 Asynchronous pattern (required)

Because:

  • HeyGen video rendering can take minutes to hours depending on load and plan, and
  • OpenAI reasoning can take longer than a typical UI wait,

ClipLoop must use:

  • OpenAI background mode (for LLM calls) + polling, (OpenAI Platform)
  • HeyGen status polling (video_status.get) until completed/failed. (HeyGen API Documentation)
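
Because a single Apps Script execution cannot block for the full render time, the poll* server functions in §13 should check status a bounded number of times per client call and hand control back to the frontend. A sketch of that pattern (checkStatusOnce is a placeholder for the provider-specific status call):

// One bounded server-side poll step; the client re-invokes via google.script.run until terminal.
function boundedPoll(checkStatusOnce, attempts, intervalMs) {
  for (var i = 0; i < attempts; i++) {
    var status = checkStatusOnce(); // e.g., OpenAI response retrieve, or HeyGen video_status.get
    if (status.terminal) return status;
    Utilities.sleep(intervalMs); // keep total wait well under Apps Script execution limits
  }
  return { terminal: false, status: 'in_progress' }; // client should call again
}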

8. Data Storage Design

8.1 Drive folder structure (mandatory)

On first run, ClipLoop creates (or requests) a root folder:

Drive/ClipLoop/

Inside:

ClipLoop/
  sessions/
    session_<SESSION_ID>/
      context/
        reference_sheets.json
        context_pack.json
      chat/
        messages.jsonl
        openai_calls/
          <OPENAI_RESPONSE_ID>.request.json
          <OPENAI_RESPONSE_ID>.response.json
      hypotheses/
        hypotheses_<TIMESTAMP>.json
      videos/
        video_<ITERATION>_<HEYGEN_VIDEO_ID>/
          brief.json
          heygen.request.json
          heygen.status.json
          output.mp4
          drive_link.txt
      errors/
        <TIMESTAMP>_<KIND>.json
  exports/

Notes:

  • The log spreadsheet stores indexes and links; Drive stores full JSON payloads to avoid huge cells.
  • SESSION_ID is globally unique (see §11.1).

8.2 Log spreadsheet (mandatory)

Create one dedicated spreadsheet: ClipLoop Logs.

Tabs and columns:

Sessions

Column                      Type            Notes
session_id                  string          primary key
created_at                  ISO string
updated_at                  ISO string
title                       string          derived from intent or user input
intent                      string          original user intent
reference_sheet_urls_json   string (JSON)   list of selected sheets
session_drive_folder_url    string          Drive folder
context_pack_drive_url      string          Drive link to context_pack.json
status                      enum            active, archived, error

Messages

Column               Type            Notes
session_id           string
message_id           string          unique within session
ts                   ISO string
role                 enum            user, assistant, system
content_text         string          rendered content
openai_response_id   string          nullable
attachments_json     string (JSON)   e.g., video drive links
raw_drive_url        string          optional pointer to JSONL chunk

Hypotheses

Column                      Type         Notes
session_id                  string
ts                          ISO string
hypotheses_json_drive_url   string       Drive link to hypotheses JSON
selected_hypothesis_id      string       nullable

Videos

Column                  Type         Notes
session_id              string
iteration               number       1, 2, 3, ...
ts_requested            ISO string
ts_completed            ISO string   nullable
hypothesis_id           string
heygen_video_id         string
heygen_status           string       pending, waiting, processing, completed, failed
drive_video_url         string       Drive link to mp4
video_brief_drive_url   string       Drive link to brief.json

Errors

Column                 Type         Notes
session_id             string
ts                     ISO string
stage                  string       context_build, llm, heygen, storage, etc.
error_summary          string       short
error_json_drive_url   string       full payload
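
A sketch of a shared append helper the tab writers could use (LockService serializes concurrent writers; tab names follow the schemas above):

// Append one row to a named tab of the ClipLoop Logs spreadsheet.
function appendLogRow(tabName, rowValues) {
  var lock = LockService.getScriptLock();
  lock.waitLock(10000); // avoid interleaved appends from concurrent RPC calls
  try {
    var spreadsheetId = PropertiesService.getScriptProperties()
      .getProperty('CLIPLOOP_LOG_SPREADSHEET_ID');
    SpreadsheetApp.openById(spreadsheetId).getSheetByName(tabName).appendRow(rowValues);
  } finally {
    lock.releaseLock();
  }
}

// Example: log an error per the Errors schema.
// appendLogRow('Errors', [sessionId, new Date().toISOString(), 'heygen', summary, errorJsonUrl]);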

9. AI / LLM Design (OpenAI)

9.1 Model choice

  • Default reasoning model: gpt-5.2-pro, configurable via OPENAI_MODEL_REASONING (see §16). (OpenAI Platform)

9.2 Background mode (mandatory)

All LLM calls must be created with background mode to avoid request timeouts and support polling. (OpenAI Platform)

Implementation requirement:

  • Create response → get response_id
  • Poll retrieve endpoint until terminal state (completed, failed, cancelled)
  • Store response_id on the assistant message row for traceability (OpenAI Platform)
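
A sketch of the create-then-poll pair against the public /v1/responses routes (the store setting is governed by §9.3 and is omitted here):

var OPENAI_RESPONSES_URL = 'https://api.openai.com/v1/responses';

function openaiAuthHeader() {
  var key = PropertiesService.getScriptProperties().getProperty('OPENAI_API_KEY');
  return { Authorization: 'Bearer ' + key };
}

// Create a background-mode response and return its response_id for polling.
function createBackgroundResponse(payload) {
  payload.background = true; // §9.2: never block the request on reasoning time
  var res = UrlFetchApp.fetch(OPENAI_RESPONSES_URL, {
    method: 'post',
    contentType: 'application/json',
    headers: openaiAuthHeader(),
    payload: JSON.stringify(payload),
    muteHttpExceptions: true
  });
  return JSON.parse(res.getContentText()).id; // stored on the assistant message row
}

// Retrieve current status; terminal states per §9.2: completed, failed, cancelled.
function retrieveResponse(responseId) {
  var res = UrlFetchApp.fetch(OPENAI_RESPONSES_URL + '/' + responseId, {
    headers: openaiAuthHeader(),
    muteHttpExceptions: true
  });
  var body = JSON.parse(res.getContentText());
  return { status: body.status, body: body };
}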

9.3 Data privacy settings

  • Set store: false in OpenAI requests (ClipLoop will persist everything itself). (OpenAI Platform)
  • Note: Background mode responses are stored briefly for retrieval; background mode is not compatible with some "zero data retention" constraints. (OpenAI Platform)

9.4 Structured outputs strategy

Because gpt-5.2-pro does not support "Structured outputs" directly, ClipLoop must use function calling to produce machine-readable artifacts (hypothesis candidates and video briefs). (OpenAI Platform)

Required function calls

ClipLoop must define two functions in the OpenAI request tools:

  1. propose_hypotheses
  2. compose_video_brief

The assistant's natural-language chat response should remain human-readable, but hypotheses and video briefs must come from tool-call arguments.


9.5 Hypothesis output schema (function arguments)

propose_hypotheses(args) must conform to:

  • mode: "initial" or "iteration"
  • hypotheses: array of 3–7 candidates

Each hypothesis candidate must include:

  • hypothesis_id (string, unique within session)
  • title (string)
  • hypothesis_statement (string; "If we do X, then Y because Z…")
  • rationale_from_data (string; reference observed patterns)
  • what_to_make_next (string; content concept guidance)
  • success_criteria (array of metric rules)
  • test_plan (string; what to post, how many, timeframe)
  • risks_and_unknowns (array of strings)
  • questions_for_user (array of strings)

Metric rule object:

  • metric (enum: views, likes, comments, shares, engagement_rate, watch_time, saves)
  • operator (enum: >=, >, <=, <)
  • target_value (number)
  • window (string; e.g., 24h, 7d)
  • notes (string)
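
A sketch of the propose_hypotheses tool definition in Responses API function-tool format, transcribing the schema above (compose_video_brief is defined analogously from §9.6; verify the tool shape against the API version in use):

var PROPOSE_HYPOTHESES_TOOL = {
  type: 'function',
  name: 'propose_hypotheses',
  description: 'Propose 3-7 hypothesis candidates grounded in the Context Pack.',
  parameters: {
    type: 'object',
    properties: {
      mode: { type: 'string', enum: ['initial', 'iteration'] },
      hypotheses: {
        type: 'array', minItems: 3, maxItems: 7,
        items: {
          type: 'object',
          properties: {
            hypothesis_id: { type: 'string' },
            title: { type: 'string' },
            hypothesis_statement: { type: 'string' },
            rationale_from_data: { type: 'string' },
            what_to_make_next: { type: 'string' },
            success_criteria: {
              type: 'array',
              items: {
                type: 'object',
                properties: {
                  metric: { type: 'string', enum: ['views', 'likes', 'comments', 'shares', 'engagement_rate', 'watch_time', 'saves'] },
                  operator: { type: 'string', enum: ['>=', '>', '<=', '<'] },
                  target_value: { type: 'number' },
                  window: { type: 'string' },
                  notes: { type: 'string' }
                }
              }
            },
            test_plan: { type: 'string' },
            risks_and_unknowns: { type: 'array', items: { type: 'string' } },
            questions_for_user: { type: 'array', items: { type: 'string' } }
          }
        }
      }
    },
    required: ['mode', 'hypotheses']
  }
};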

9.6 Video brief schema (function arguments)

compose_video_brief(args) must conform to:

  • brief_id (string)

  • hypothesis_id (string)

  • platform_target (instagram_reels | tiktok | both)

  • video_format

    • aspect_ratio default "9:16"
    • resolution default { "width": 720, "height": 1280 }
    • duration_seconds integer (default 20–45)
  • script

    • hook (string, 1–2 sentences)
    • body (string)
    • cta (string)
    • on_screen_text (array of short strings)
    • captions (boolean)
  • avatar

    • heygen_avatar_id (string; can be default from settings)
    • heygen_voice_id (string; can be default from settings)
    • voice_speed (number; default 1.0)
  • safety

    • requires_moderation (boolean; default true)
    • blocked_topics (array; default empty)

Hard constraint: script.body must be sized so that the final HeyGen input_text is < 5000 characters. (HeyGen API Documentation)
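
A sketch of enforcing that constraint before submission (buildInputText and its concatenation rule are assumptions about how the brief maps to HeyGen input_text):

// Compose the HeyGen input_text from the brief's script and enforce the 5000-character cap.
function buildInputText(brief) {
  var text = [brief.script.hook, brief.script.body, brief.script.cta].join('\n\n');
  if (text.length >= 5000) {
    throw new Error('Script too long for HeyGen input_text: ' + text.length + ' chars');
  }
  return text;
}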


9.7 Prompting rules (system behavior)

ClipLoop's system prompt must enforce:

  • Always ground hypotheses in the provided Context Pack.
  • Always produce success criteria that are measurable.
  • Ask clarifying questions when uncertainty is high.
  • Never fabricate metrics; if unknown, label as assumption.
  • For compose_video_brief, prioritize short, punchy, vertical "AI influencer" style.

9.8 Optional transcription (OpenAI Audio)

If drive_url resolves to a Drive video.mp4 and it is ≤ OpenAI file size limits, ClipLoop may transcribe it using OpenAI speech-to-text. File uploads are limited (documented as 25MB for speech-to-text).

Recommended model for transcription:

  • gpt-4o-mini-transcribe

If the file is too large or cannot be fetched, skip transcription and rely on caption/voice_to_text.
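
A sketch of this optional step (UrlFetchApp sends multipart/form-data when the payload contains a blob; the 25MB guard mirrors the documented limit):

// Transcribe a Drive-hosted video.mp4 with OpenAI speech-to-text, or return null if ineligible.
function transcribeDriveVideo(fileId) {
  var file = DriveApp.getFileById(fileId);
  if (file.getSize() > 25 * 1024 * 1024) return null; // over the documented upload limit
  var key = PropertiesService.getScriptProperties().getProperty('OPENAI_API_KEY');
  var res = UrlFetchApp.fetch('https://api.openai.com/v1/audio/transcriptions', {
    method: 'post',
    headers: { Authorization: 'Bearer ' + key },
    payload: { model: 'gpt-4o-mini-transcribe', file: file.getBlob() }, // blob => multipart
    muteHttpExceptions: true
  });
  if (res.getResponseCode() !== 200) return null; // per this section: skip and fall back
  return JSON.parse(res.getContentText()).text;
}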


10. Video Generation (HeyGen)

10.1 Provider choice (v1)

Default video provider: HeyGen API (avatar/talking-head generation). (HeyGen API Documentation)

10.2 Required HeyGen endpoints

10.3 HeyGen status handling

ClipLoop must treat status values as:

  • pending, waiting, processing → still rendering (keep polling),
  • completed → success (download and store the video),
  • failed → terminal error (log to the Errors tab).

Polling must continue until completed or failed.

10.4 Video URL expiration (mandatory handling)

HeyGen's returned video_url expires after 7 days; ClipLoop must download and store the video in Drive promptly on completion. (HeyGen API Documentation)
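
A sketch of the prompt-download step (note that UrlFetchApp's own response-size quota, 50MB per call, also bounds what can be saved this way):

// Download a completed HeyGen render into the session's videos/ folder before the URL expires.
function saveHeygenVideo(videoUrl, videoFolder) {
  var res = UrlFetchApp.fetch(videoUrl, { muteHttpExceptions: true });
  if (res.getResponseCode() !== 200) {
    throw new Error('Video download failed: HTTP ' + res.getResponseCode());
  }
  var file = videoFolder.createFile(res.getBlob().setName('output.mp4'));
  return file.getUrl(); // recorded as drive_video_url in the Videos tab
}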


11. Core Data Models (internal)

11.1 IDs

  • SESSION_ID: sess_<YYYYMMDD>_<randomBase36(10)>
  • MESSAGE_ID: msg_<incrementingInt>
  • HYPOTHESIS_ID: hyp_<incrementingInt>
  • BRIEF_ID: brief_<incrementingInt>
  • ITERATION: integer starting at 1
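
A sketch of the SESSION_ID generator under those conventions:

// sess_<YYYYMMDD>_<randomBase36(10)>
function newSessionId() {
  var day = Utilities.formatDate(new Date(), 'UTC', 'yyyyMMdd');
  var chars = '0123456789abcdefghijklmnopqrstuvwxyz';
  var suffix = '';
  for (var i = 0; i < 10; i++) {
    suffix += chars.charAt(Math.floor(Math.random() * chars.length));
  }
  return 'sess_' + day + '_' + suffix;
}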

11.2 Context Pack (saved snapshot)

context_pack.json must include:

  • session_id

  • intent

  • reference_sheets (array of {url, spreadsheet_id, title, ingested_at})

  • ingested_posts (array of "ContentCard"; see below) OR (recommended) an exemplars subset + summary stats

  • summary_stats

    • counts, top posts, median/mean views where available
    • hashtag frequency
    • platform breakdown
  • exemplars

    • top N by views (TikTok)
    • top N by engagement proxy
    • N random samples

11.3 ContentCard (per post)

Minimal fields:

  • platform (tiktok|instagram)
  • platform_post_id
  • posted_at
  • caption_or_description
  • transcript_text (from voice_to_text or transcription)
  • drive_url
  • metrics (object; includes raw numeric strings + parsed numbers where safe)
  • raw_row (key/value map for all columns)

12. Logging Requirements (what must be recorded)

For every session:

  • The user's intent.
  • The exact reference sheet URLs used.
  • The Context Pack snapshot link.
  • All chat messages (user + assistant).
  • Every hypothesis set produced (stored as JSON in Drive + indexed in Hypotheses tab).
  • Every video brief and HeyGen job (request/response/status snapshots).
  • The Drive link to every created video.

13. Backend API (Apps Script server functions)

ClipLoop must implement these server functions callable from the frontend:

Session lifecycle

  • createSession({ intent, referenceSheetUrls[] }) -> { sessionId }
  • listSessions() -> { sessions[] }
  • loadSession({ sessionId }) -> { session, contextPack, messages, latestHypotheses, videos }

Context ingestion

  • startContextBuild({ sessionId }) -> { jobId }
  • pollContextBuild({ sessionId, jobId }) -> { status, progress, ready }

Chat / LLM

  • sendUserMessage({ sessionId, text }) -> { messageId, openaiResponseId }
  • pollAssistantMessage({ sessionId, openaiResponseId }) -> { status, assistantText?, hypothesesUpdate? }

Hypothesis selection

  • selectHypothesis({ sessionId, hypothesisId }) -> { ok }

Video generation

  • startVideoGeneration({ sessionId, hypothesisId }) -> { iteration, briefOpenaiResponseId }
  • pollVideoBrief({ sessionId, iteration, briefOpenaiResponseId }) -> { status, brief? }
  • submitHeygenJob({ sessionId, iteration, brief }) -> { heygenVideoId }
  • pollHeygenJob({ sessionId, iteration, heygenVideoId }) -> { status, videoUrl?, completedDriveUrl? }
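
As an illustration of the RPC shape, a skeleton of createSession (newSessionId is from §11.1; ensureSessionFolders and appendLogRow are placeholder names for Storage.gs helpers):

// Session lifecycle entry point, callable via google.script.run (§4.3).
function createSession(args) {
  var sessionId = newSessionId();
  var folders = ensureSessionFolders(sessionId); // hypothetical: creates the §8.1 tree
  appendLogRow('Sessions', [
    sessionId,
    new Date().toISOString(),      // created_at
    new Date().toISOString(),      // updated_at
    args.intent.slice(0, 80),      // title derived from intent
    args.intent,
    JSON.stringify(args.referenceSheetUrls),
    folders.sessionFolderUrl,
    '',                            // context_pack_drive_url, filled after context build
    'active'
  ]);
  return { sessionId: sessionId };
}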

14. Concrete Implementation Plan (step-by-step)

Step 1 — Create required Google resources

  1. Create a Drive folder named ClipLoop.

  2. Create a Google Spreadsheet named ClipLoop Logs with tabs:

    • Sessions, Messages, Hypotheses, Videos, Errors (exact schemas in §8.2).

Step 2 — Create and configure Apps Script Web App

  1. Create a new Apps Script project named ClipLoop.

  2. Add HTML Service frontend files:

    • index.html (single-page UI)
    • app.js.html (client JS)
    • styles.css.html (inline CSS or separate template)
  3. Add server-side .gs files (recommended split):

    • Main.gs (doGet + routing)
    • Config.gs (script properties + constants)
    • Storage.gs (Drive + log sheet helpers)
    • ClipPulseIngest.gs (sheet parsing)
    • ContextPack.gs (summary + exemplars)
    • OpenAI.gs (Responses API wrapper + polling)
    • HeyGen.gs (API wrapper + polling + download)
    • SessionApi.gs (RPC functions)
  4. Set OAuth scopes in manifest (minimum; a sample manifest follows this list):

    • Drive read/write
    • Sheets read/write
    • External requests
  5. Deploy as Web App:

    • Execute as: Me (owner) (recommended for consistent Drive/log ownership)
    • Access: restrict to allowed users (single-user MVP: only you). (Google for Developers)
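
A sketch of the corresponding appsscript.json for steps 4–5 (scope list is the minimum named above; widen webapp.access beyond MYSELF only if multiple users are allowed):

{
  "timeZone": "Etc/UTC",
  "runtimeVersion": "V8",
  "webapp": {
    "executeAs": "USER_DEPLOYING",
    "access": "MYSELF"
  },
  "oauthScopes": [
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/script.external_request"
  ]
}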

Step 3 — Implement storage helpers (Drive + Logs)

  1. Implement "ensure folder" logic for ClipLoop/ and per-session subfolders (§8.1).

  2. Implement log-sheet append/update helpers that append new rows and update existing rows in place (e.g., session status, video completion timestamps), keyed by session_id and related IDs.

  3. Implement JSON persistence:

    • write/read JSON files in Drive by path.
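
A sketch of the folder-ensure and JSON persistence helpers for this step:

// Walk/create a nested folder path under the ClipLoop root (§8.1).
function ensureFolderPath(rootFolder, pathParts) {
  var folder = rootFolder;
  pathParts.forEach(function (name) {
    var existing = folder.getFoldersByName(name);
    folder = existing.hasNext() ? existing.next() : folder.createFolder(name);
  });
  return folder;
}

// Persist a JSON payload as a Drive file (full payloads live in Drive, not in sheet cells).
function writeJsonFile(folder, fileName, payload) {
  var blob = Utilities.newBlob(JSON.stringify(payload, null, 2), 'application/json', fileName);
  return folder.createFile(blob);
}

// Example:
// writeJsonFile(ensureFolderPath(root, ['sessions', 'session_' + id, 'context']),
//               'context_pack.json', contextPack);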

Step 4 — Implement ClipPulse sheet ingestion

  1. For each selected reference sheet URL:

    • open spreadsheet,
    • read Instagram and TikTok tabs (if missing → error).
  2. Build ContentCards:

    • parse all columns into raw_row,
    • normalize required fields (post_id, caption, metrics, drive_url).
  3. Deduplicate by (platform, platform_post_id) across all selected sheets.

  4. Store a snapshot of ingested data (or exemplars) into context_pack.json.
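
A sketch of the dedup step over ContentCards:

// Deduplicate ContentCards by (platform, platform_post_id) across all selected sheets.
function dedupeContentCards(cards) {
  var seen = {};
  return cards.filter(function (card) {
    var key = card.platform + ':' + card.platform_post_id;
    if (seen[key]) return false;
    seen[key] = true;
    return true;
  });
}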

Step 5 — Build Context Pack summarization (deterministic)

  1. Compute:

    • per-platform counts,
    • top posts (by views when available),
    • hashtag frequencies,
    • simple engagement proxy metrics where possible.
  2. Select exemplars:

    • top N by views / engagement,
    • N random.
  3. Save context_pack.json and link it in Sessions tab.
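
A sketch of the deterministic summarization (the comma-separated parse of hashtag_names is an assumption about the ClipPulse format):

// Compute counts, top posts by views, and hashtag frequency from ContentCards.
function summarizeCards(cards, topN) {
  var hashtagFreq = {};
  cards.forEach(function (card) {
    String(card.raw_row.hashtag_names || '').split(',').forEach(function (tag) {
      tag = tag.trim();
      if (tag) hashtagFreq[tag] = (hashtagFreq[tag] || 0) + 1;
    });
  });
  var byViews = cards.slice().sort(function (a, b) {
    return (Number(b.metrics.view) || 0) - (Number(a.metrics.view) || 0);
  });
  return {
    count: cards.length,
    top_by_views: byViews.slice(0, topN).map(function (c) { return c.platform_post_id; }),
    hashtag_frequency: hashtagFreq
  };
}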

Step 6 — Implement OpenAI integration (chat + hypotheses + brief)

  1. Implement OpenAI Responses API wrapper:

    • create responses in background mode and poll by response_id until a terminal state (§9.2),
    • apply the privacy settings from §9.3.

  2. Implement function calling tool definitions:

    • propose_hypotheses and compose_video_brief (schemas in §9.5–9.6).

  3. Implement message pipeline:

    • write user message → kick off OpenAI → write assistant placeholder → poll updates → finalize message + store hypotheses JSON.

Step 7 — Implement HeyGen integration (video generation)

  1. Implement HeyGen API wrapper:

    • submit the generation job, poll video_status.get, download the result (§10.2–10.4).

  2. Enforce input constraints:

    • input_text < 5000 characters; vertical 9:16 defaults from the video brief (§9.6).

  3. On completion:

    • download the video before the 7-day URL expiry and save output.mp4 into the session's videos/ folder (§10.4).

  4. Write back:

    • Videos tab row + chat message attachment with Drive link.

Step 8 — Implement "Resume session"

  1. Build listSessions() from Sessions tab.

  2. loadSession() loads:

    • messages from Messages,
    • latest hypotheses from Hypotheses,
    • videos from Videos,
    • context from Drive context_pack.json.
  3. UI renders and allows continued conversation.

Step 9 — QA / Acceptance tests (must pass)

  • Load a known ClipPulse sheet and confirm:

    • Context build completes.
    • Initial hypotheses appear.
  • Chat loop:

    • Send a message → receive assistant response and hypothesis refresh.
  • Select hypothesis → create video:

    • Video brief created.
    • HeyGen job submitted.
    • Status transitions to completed or failed.
    • On completed: MP4 saved to Drive + shown in chat.
  • Resume:

    • Reload session and verify context + history match prior state.

15. Acceptance Criteria (functional)

ClipLoop is "done" when:

  1. Reference sheet selection
  • User can add and remove multiple ClipPulse sheet URLs.
  2. Context gating
  • Chat cannot begin until the Context Pack snapshot exists and the initial AI message is generated.
  3. Hypothesis generation
  • AI provides 3–7 hypotheses with explicit success criteria, and the UI presents them as selectable items.
  4. Video generation
  • After selecting a hypothesis, the user can generate a HeyGen video and receive the output in chat + Drive.
  5. Logging
  • Every session, message, hypothesis set, and video is logged to the log spreadsheet and Drive.
  6. Resume
  • User can reopen a previous session and continue with correct prior context.

16. Appendix — Required configuration keys (Script Properties)

Store in PropertiesService:

  • CLIPLOOP_LOG_SPREADSHEET_ID (string)
  • CLIPLOOP_ROOT_DRIVE_FOLDER_ID (string)

OpenAI:

  • OPENAI_API_KEY (string)
  • OPENAI_MODEL_REASONING default gpt-5.2-pro (OpenAI Platform)
  • OPENAI_MODEL_TRANSCRIBE default gpt-4o-mini-transcribe

HeyGen:

  • HEYGEN_API_KEY (string)
  • HEYGEN_DEFAULT_AVATAR_ID (string)
  • HEYGEN_DEFAULT_VOICE_ID (string)

App behavior:

  • MAX_EXEMPLARS_PER_PLATFORM (number; default 20)
  • MAX_RANDOM_SAMPLES_PER_PLATFORM (number; default 20)
  • HYPOTHESIS_CANDIDATE_COUNT (number; default 5)
