ClipLoop — Product Specification (Hypothesis-Driven Video Studio)

Version: 1.0
Last Updated: 2026-01-23
Status: Implementation-ready (final)


1. Overview

ClipLoop is a simple web app that:

  1. Ingests one or more Google Sheets produced by ClipPulse (upstream product), including the linked video artifacts referenced in those sheets.
  2. Accepts human intent (goals/constraints) and then runs a ChatGPT-style dialogue.
  3. Iteratively proposes next hypotheses and success criteria worth validating, grounded in the ingested data and media.
  4. When a human selects a hypothesis, generates a short vertical "AI influencer talking-head" style video aligned to that hypothesis via an external, official video-generation API (default: HeyGen). (HeyGen API Documentation)
  5. Logs everything to Google Sheets + saves all generated videos to Google Drive so the workflow can be resumed later.

2. Purpose

Enable continuous, hypothesis-driven video creation by combining:

  • Structured performance data (Sheets from ClipPulse),
  • Real media assets (videos referenced in those sheets),
  • Human judgment (intent and selections),
  • AI-assisted iterative validation (chat + hypothesis/success-criteria loops).

3. Goals and Non-Goals

3.1 Goals

  • A single web app with:

    • Sheet selection (multi-select),
    • Chat-room UI as the main experience,
    • AI proposing hypotheses + success criteria,
    • Button-driven video generation from the chat,
    • Full logging (Sheets) + video storage (Drive),
    • Resume/restart from past sessions with correct context.

3.2 Non-Goals (v1)

  • Automated posting to Instagram/TikTok.
  • Automated re-ingestion of new ClipPulse runs on a schedule (manual selection is sufficient).
  • Deep video understanding (full frame-by-frame analysis). v1 uses transcripts/metadata/artifacts available via ClipPulse and Drive (details below).

4. Platform and Technical Stack

4.1 Platform (deployment target)

  • Google Apps Script Web App (V8 runtime) as the primary platform, including:

    • Backend logic (server-side Apps Script),
    • Frontend served via Apps Script HTML Service. (Google for Developers)

4.2 Backend stack

  • Google Apps Script (JavaScript, V8)

  • Core Google services:

    • SpreadsheetApp (read ClipPulse sheets + write logs),
    • DriveApp (store session artifacts + generated videos),
    • PropertiesService (store secrets/config),
    • CacheService (short-lived caches),
    • LockService (concurrency safety),
    • UrlFetchApp (OpenAI + HeyGen API calls).
  • Design constraints:

    • Apps Script execution time limits apply (use async polling patterns + background tasks where needed). (Google for Developers)

4.3 Frontend stack

  • Apps Script HTML Service + vanilla HTML/CSS/JS (no build step required).
  • Use google.script.run for client ↔ server RPC. (Google for Developers)
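
For illustration, a minimal client-side sketch of that RPC pattern (sendUserMessage is the server function from §13; startAssistantPolling and renderError are hypothetical UI helpers):

// Client-side (app.js.html): call a server function and handle the async result.
function onSendClicked(sessionId, text) {
  google.script.run
    .withSuccessHandler(function (res) {
      // res = { messageId, openaiResponseId } per the §13 contract
      startAssistantPolling(sessionId, res.openaiResponseId); // hypothetical poller; see §7.2
    })
    .withFailureHandler(function (err) {
      renderError(err.message); // hypothetical error display
    })
    .sendUserMessage({ sessionId: sessionId, text: text });
}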

4.4 External APIs

  • OpenAI API — Responses API for chat, hypothesis generation, and video briefs; optional speech-to-text transcription (§9). (OpenAI Platform)
  • HeyGen API — avatar/talking-head video generation and render-status polling (§10). (HeyGen API Documentation)

5. Inputs and Data Contracts

5.1 Reference sheets (ClipPulse output)

ClipLoop only guarantees compatibility with Google Sheets produced by ClipPulse.

Required worksheet tabs

  • Instagram
  • TikTok

Sheet-level assumptions

  • Row 1 is the header row (column names).
  • Each subsequent row is one post (one record).

Drive artifacts referenced by the sheet

ClipPulse stores a Drive artifact per post and writes its link into drive_url. That artifact is either:

  • video.mp4 (downloaded file), or
  • watch.html (a lightweight HTML file with the original platform URL).

ClipLoop must treat drive_url as the canonical "real media asset pointer" and attempt to enrich context from it.
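
A sketch of how that pointer could be resolved (the file-ID regex is an assumption about standard Drive URL shapes):

// Server-side: classify a drive_url as video.mp4 or watch.html via its MIME type.
function resolveDriveArtifact(driveUrl) {
  var match = driveUrl.match(/[-\w]{25,}/); // assumption: the file ID is the long token in the URL
  if (!match) return { kind: 'unknown', url: driveUrl };
  var file = DriveApp.getFileById(match[0]);
  var mime = file.getMimeType();
  if (mime === 'video/mp4') return { kind: 'video', file: file, url: driveUrl };
  if (mime === MimeType.HTML) return { kind: 'watch_page', file: file, url: driveUrl };
  return { kind: 'other', file: file, url: driveUrl };
}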


5.2 Columns ClipLoop must support

ClipLoop must parse all columns as raw strings, but it must specifically recognize and use the following columns for downstream logic.

TikTok tab — required columns (ClipLoop usage)

From ClipPulse schema:

Column                                             Used for
platform_post_id                                   stable post identifier + dedup key
username, create_username                          creator/account metadata
create_time, posted_at                             chronology
description                                        caption/context
voice_to_text                                      transcript-like context (preferred)
view, like, comments, share_count, collect_count   performance metrics
video_duration                                     content constraints
region_code                                        segmenting patterns
hashtag_names                                      topic extraction
drive_url                                          link to media artifact in Drive

Instagram tab — required columns (ClipLoop usage)

From ClipPulse schema:

Column                           Used for
platform_post_id                 stable post identifier + dedup key
account, create_username         creator/account metadata
timestamp, posted_at             chronology
caption                          caption/context
like_count, comments_count       performance metrics
post_url                         reference URL
media_type, media_product_type   content constraints
drive_url                        link to media artifact in Drive

Note: Even if additional columns exist (ClipPulse has more), ClipLoop must store them for context but only requires the above for core logic.
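
A sketch of the parse-everything-as-raw-strings rule (readTabAsRows is a placeholder name, not a fixed API):

// Read a worksheet tab and return one object per post row, all values as raw strings.
function readTabAsRows(spreadsheetUrl, tabName) {
  var sheet = SpreadsheetApp.openByUrl(spreadsheetUrl).getSheetByName(tabName);
  if (!sheet) throw new Error('Missing required tab: ' + tabName);
  var values = sheet.getDataRange().getValues();
  var headers = values[0]; // row 1 is the header row (§5.1)
  return values.slice(1).map(function (row) {
    var record = {};
    headers.forEach(function (h, i) { record[String(h)] = String(row[i]); });
    return record;
  });
}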


5.3 Human intent input

At session creation time, ClipLoop must accept:

  • Intent / Goal (free text; required)

  • Optional constraints:

    • target platform(s) (IG, TikTok, both),
    • topic/domain,
    • tone/persona,
    • language,
    • posting cadence,
    • constraints about what not to do.

This intent becomes part of the session's "Context Pack" and is always available to the AI.


5.4 "AI can read videos" definition (v1)

Because Apps Script is constrained, ClipLoop defines "AI can read videos" as:

For each post row, ClipLoop tries to provide the AI with:

  1. Caption/description text,

  2. Transcript-like text:

    • Prefer voice_to_text (TikTok), else
    • Optional OpenAI transcription if the Drive artifact is a small video.mp4 and can be sent to OpenAI within file-size constraints (see §9.8).
  3. The drive_url itself (as a human-auditable pointer),

  4. Key metrics.


6. User Experience and UI/UX

6.1 Screens

A) Home / Session Launcher

Components:

  • Reference Sheets selector

    • Add by pasting Google Sheet URL(s) (multi-add)
    • Display selected sheets list with remove buttons
  • Intent / Goal text area

  • Optional settings:

    • default platform focus (IG/TikTok/Both),
    • default video settings (length range, avatar/voice)
  • Primary CTA: Start AI

B) Session (Chat Room) — main UI

Layout:

  • Left/top: Chat timeline (ChatGPT-style)

  • Right/bottom side panel (or collapsible drawer):

    • "Current hypotheses" list (AI-generated candidates)

    • Controls:

      • "Regenerate hypotheses"
      • "Select hypothesis"
      • "Create video from selected hypothesis"
  • Below timeline: message composer + send button

  • Status area: "Loading context…", "AI thinking…", "Rendering video…", etc.

6.2 Required interaction flow

  1. User selects reference sheet(s) + enters intent → clicks Start AI.

  2. App builds a Context Pack (data summary + exemplars) and saves it.

  3. App requests the AI to produce initial hypotheses/success criteria.

  4. Chat becomes active once the initial AI output is ready.

  5. User iterates with AI; hypotheses panel updates each turn (or on demand).

  6. User selects a hypothesis → clicks Create video.

  7. App calls:

    • OpenAI to produce a Video Brief aligned with conversation + chosen hypothesis,
    • HeyGen to generate the video.
  8. When done, the video appears in chat with:

    • embedded player (if possible) + Drive link,
    • metadata (hypothesis ID, timestamps).
  9. User can repeat steps 5–8 indefinitely.

6.3 Resume flow

From Home:

  • "Resume session" list (most recent first)

  • Selecting a session loads:

    • prior conversation,
    • the Context Pack snapshot used at that time,
    • prior hypotheses and videos,
    • and restarts chat with full context.

7. System Architecture

7.1 High-level architecture

Apps Script Web App

  • Serves frontend (HTML Service)

  • Implements backend endpoints (server functions callable via google.script.run)

  • Owns logging + storage in:

    • a dedicated log spreadsheet,
    • a dedicated Drive folder tree.

External providers

  • OpenAI — LLM reasoning, hypothesis/brief generation, optional transcription.
  • HeyGen — avatar video rendering.

7.2 Asynchronous pattern (required)

Because:

  • HeyGen video rendering can take minutes to hours depending on load and plan, and
  • OpenAI reasoning can take longer than a typical UI wait,

ClipLoop must use:

  • OpenAI background mode (for LLM calls) + polling, (OpenAI Platform)
  • HeyGen status polling (video_status.get) until completed/failed. (HeyGen API Documentation)
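
Because a single Apps Script execution cannot block for the full render time, the poll* server functions in §13 should check status a bounded number of times per client call and hand control back to the frontend. A sketch of that pattern (checkStatusOnce is a placeholder for the provider-specific status call):

// One bounded server-side poll step; the client re-invokes via google.script.run until terminal.
function boundedPoll(checkStatusOnce, attempts, intervalMs) {
  for (var i = 0; i < attempts; i++) {
    var status = checkStatusOnce(); // e.g., OpenAI response retrieve, or HeyGen video_status.get
    if (status.terminal) return status;
    Utilities.sleep(intervalMs); // keep total wait well under Apps Script execution limits
  }
  return { terminal: false, status: 'in_progress' }; // client should call again
}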

8. Data Storage Design

8.1 Drive folder structure (mandatory)

On first run, ClipLoop creates (or requests) a root folder:

Drive/ClipLoop/

Inside:

ClipLoop/
  sessions/
    session_<SESSION_ID>/
      context/
        reference_sheets.json
        context_pack.json
      chat/
        messages.jsonl
        openai_calls/
          <OPENAI_RESPONSE_ID>.request.json
          <OPENAI_RESPONSE_ID>.response.json
      hypotheses/
        hypotheses_<TIMESTAMP>.json
      videos/
        video_<ITERATION>_<HEYGEN_VIDEO_ID>/
          brief.json
          heygen.request.json
          heygen.status.json
          output.mp4
          drive_link.txt
      errors/
        <TIMESTAMP>_<KIND>.json
  exports/

Notes:

  • The log spreadsheet stores indexes and links; Drive stores full JSON payloads to avoid huge cells.
  • SESSION_ID is globally unique (see §11.1).

8.2 Log spreadsheet (mandatory)

Create one dedicated spreadsheet: ClipLoop Logs.

Tabs and columns:

Sessions

Column                      Type            Notes
session_id                  string          primary key
created_at                  ISO string
updated_at                  ISO string
title                       string          derived from intent or user input
intent                      string          original user intent
reference_sheet_urls_json   string (JSON)   list of selected sheets
session_drive_folder_url    string          Drive folder
context_pack_drive_url      string          Drive link to context_pack.json
status                      enum            active, archived, error

Messages

Column               Type            Notes
session_id           string
message_id           string          unique within session
ts                   ISO string
role                 enum            user, assistant, system
content_text         string          rendered content
openai_response_id   string          nullable
attachments_json     string (JSON)   e.g., video drive links
raw_drive_url        string          optional pointer to JSONL chunk

Hypotheses

Column                      Type         Notes
session_id                  string
ts                          ISO string
hypotheses_json_drive_url   string       Drive link to hypotheses JSON
selected_hypothesis_id      string       nullable

Videos

Column                  Type         Notes
session_id              string
iteration               number       1, 2, 3, ...
ts_requested            ISO string
ts_completed            ISO string   nullable
hypothesis_id           string
heygen_video_id         string
heygen_status           string       pending, waiting, processing, completed, failed
drive_video_url         string       Drive link to mp4
video_brief_drive_url   string       Drive link to brief.json

Errors

Column                 Type         Notes
session_id             string
ts                     ISO string
stage                  string       context_build, llm, heygen, storage, etc.
error_summary          string       short
error_json_drive_url   string       full payload
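
A sketch of a shared append helper the tab writers could use (LockService serializes concurrent writers; tab names follow the schemas above):

// Append one row to a named tab of the ClipLoop Logs spreadsheet.
function appendLogRow(tabName, rowValues) {
  var lock = LockService.getScriptLock();
  lock.waitLock(10000); // avoid interleaved appends from concurrent RPC calls
  try {
    var spreadsheetId = PropertiesService.getScriptProperties()
      .getProperty('CLIPLOOP_LOG_SPREADSHEET_ID');
    SpreadsheetApp.openById(spreadsheetId).getSheetByName(tabName).appendRow(rowValues);
  } finally {
    lock.releaseLock();
  }
}

// Example: log an error per the Errors schema.
// appendLogRow('Errors', [sessionId, new Date().toISOString(), 'heygen', summary, errorJsonUrl]);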

9. AI / LLM Design (OpenAI)

9.1 Model choice

  • Default reasoning model: gpt-5.2-pro, configurable via OPENAI_MODEL_REASONING (see §16). (OpenAI Platform)

9.2 Background mode (mandatory)

All LLM calls must be created with background mode to avoid request timeouts and support polling. (OpenAI Platform)

Implementation requirement:

  • Create response → get response_id
  • Poll retrieve endpoint until terminal state (completed, failed, cancelled)
  • Store response_id on the assistant message row for traceability (OpenAI Platform)
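
A sketch of the create-then-poll pair against the public /v1/responses routes (the store setting is governed by §9.3 and is omitted here):

var OPENAI_RESPONSES_URL = 'https://api.openai.com/v1/responses';

function openaiAuthHeader() {
  var key = PropertiesService.getScriptProperties().getProperty('OPENAI_API_KEY');
  return { Authorization: 'Bearer ' + key };
}

// Create a background-mode response and return its response_id for polling.
function createBackgroundResponse(payload) {
  payload.background = true; // §9.2: never block the request on reasoning time
  var res = UrlFetchApp.fetch(OPENAI_RESPONSES_URL, {
    method: 'post',
    contentType: 'application/json',
    headers: openaiAuthHeader(),
    payload: JSON.stringify(payload),
    muteHttpExceptions: true
  });
  return JSON.parse(res.getContentText()).id; // stored on the assistant message row
}

// Retrieve current status; terminal states per §9.2: completed, failed, cancelled.
function retrieveResponse(responseId) {
  var res = UrlFetchApp.fetch(OPENAI_RESPONSES_URL + '/' + responseId, {
    headers: openaiAuthHeader(),
    muteHttpExceptions: true
  });
  var body = JSON.parse(res.getContentText());
  return { status: body.status, body: body };
}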

9.3 Data privacy settings

  • Set store: false in OpenAI requests (ClipLoop will persist everything itself). (OpenAI Platform)
  • Note: Background mode responses are stored briefly for retrieval; background mode is not compatible with some "zero data retention" constraints. (OpenAI Platform)

9.4 Structured outputs strategy

Because gpt-5.2-pro does not support "Structured outputs" directly, ClipLoop must use function calling to produce machine-readable artifacts (hypothesis candidates and video briefs). (OpenAI Platform)

Required function calls

ClipLoop must define two functions in the OpenAI request tools:

  1. propose_hypotheses
  2. compose_video_brief

The assistant's natural-language chat response should remain human-readable, but hypotheses and video briefs must come from tool-call arguments.


9.5 Hypothesis output schema (function arguments)

propose_hypotheses(args) must conform to:

  • mode: "initial" or "iteration"
  • hypotheses: array of 3–7 candidates

Each hypothesis candidate must include:

  • hypothesis_id (string, unique within session)
  • title (string)
  • hypothesis_statement (string; "If we do X, then Y because Z…")
  • rationale_from_data (string; reference observed patterns)
  • what_to_make_next (string; content concept guidance)
  • success_criteria (array of metric rules)
  • test_plan (string; what to post, how many, timeframe)
  • risks_and_unknowns (array of strings)
  • questions_for_user (array of strings)

Metric rule object:

  • metric (enum: views, likes, comments, shares, engagement_rate, watch_time, saves)
  • operator (enum: >=, >, <=, <)
  • target_value (number)
  • window (string; e.g., 24h, 7d)
  • notes (string)
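
A sketch of the propose_hypotheses tool definition in Responses API function-tool format, transcribing the schema above (compose_video_brief is defined analogously from §9.6; verify the tool shape against the API version in use):

var PROPOSE_HYPOTHESES_TOOL = {
  type: 'function',
  name: 'propose_hypotheses',
  description: 'Propose 3-7 hypothesis candidates grounded in the Context Pack.',
  parameters: {
    type: 'object',
    properties: {
      mode: { type: 'string', enum: ['initial', 'iteration'] },
      hypotheses: {
        type: 'array', minItems: 3, maxItems: 7,
        items: {
          type: 'object',
          properties: {
            hypothesis_id: { type: 'string' },
            title: { type: 'string' },
            hypothesis_statement: { type: 'string' },
            rationale_from_data: { type: 'string' },
            what_to_make_next: { type: 'string' },
            success_criteria: {
              type: 'array',
              items: {
                type: 'object',
                properties: {
                  metric: { type: 'string', enum: ['views', 'likes', 'comments', 'shares', 'engagement_rate', 'watch_time', 'saves'] },
                  operator: { type: 'string', enum: ['>=', '>', '<=', '<'] },
                  target_value: { type: 'number' },
                  window: { type: 'string' },
                  notes: { type: 'string' }
                }
              }
            },
            test_plan: { type: 'string' },
            risks_and_unknowns: { type: 'array', items: { type: 'string' } },
            questions_for_user: { type: 'array', items: { type: 'string' } }
          }
        }
      }
    },
    required: ['mode', 'hypotheses']
  }
};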

9.6 Video brief schema (function arguments)

compose_video_brief(args) must conform to:

  • brief_id (string)

  • hypothesis_id (string)

  • platform_target (instagram_reels | tiktok | both)

  • video_format

    • aspect_ratio default "9:16"
    • resolution default { "width": 720, "height": 1280 }
    • duration_seconds integer (default 20–45)
  • script

    • hook (string, 1–2 sentences)
    • body (string)
    • cta (string)
    • on_screen_text (array of short strings)
    • captions (boolean)
  • avatar

    • heygen_avatar_id (string; can be default from settings)
    • heygen_voice_id (string; can be default from settings)
    • voice_speed (number; default 1.0)
  • safety

    • requires_moderation (boolean; default true)
    • blocked_topics (array; default empty)

Hard constraint: script.body must be sized so that the final HeyGen input_text is < 5000 characters. (HeyGen API Documentation)
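
A sketch of enforcing that constraint before submission (buildInputText and its concatenation rule are assumptions about how the brief maps to HeyGen input_text):

// Compose the HeyGen input_text from the brief's script and enforce the 5000-character cap.
function buildInputText(brief) {
  var text = [brief.script.hook, brief.script.body, brief.script.cta].join('\n\n');
  if (text.length >= 5000) {
    throw new Error('Script too long for HeyGen input_text: ' + text.length + ' chars');
  }
  return text;
}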


9.7 Prompting rules (system behavior)

ClipLoop's system prompt must enforce:

  • Always ground hypotheses in the provided Context Pack.
  • Always produce success criteria that are measurable.
  • Ask clarifying questions when uncertainty is high.
  • Never fabricate metrics; if unknown, label as assumption.
  • For compose_video_brief, prioritize short, punchy, vertical "AI influencer" style.

9.8 Optional transcription (OpenAI Audio)

If drive_url resolves to a Drive video.mp4 and it is ≤ OpenAI file size limits, ClipLoop may transcribe it using OpenAI speech-to-text. File uploads are limited (documented as 25MB for speech-to-text).

Recommended model for transcription:

  • gpt-4o-mini-transcribe

If the file is too large or cannot be fetched, skip transcription and rely on caption/voice_to_text.
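
A sketch of this optional step (UrlFetchApp sends multipart/form-data when the payload contains a blob; the 25MB guard mirrors the documented limit):

// Transcribe a Drive-hosted video.mp4 with OpenAI speech-to-text, or return null if ineligible.
function transcribeDriveVideo(fileId) {
  var file = DriveApp.getFileById(fileId);
  if (file.getSize() > 25 * 1024 * 1024) return null; // over the documented upload limit
  var key = PropertiesService.getScriptProperties().getProperty('OPENAI_API_KEY');
  var res = UrlFetchApp.fetch('https://api.openai.com/v1/audio/transcriptions', {
    method: 'post',
    headers: { Authorization: 'Bearer ' + key },
    payload: { model: 'gpt-4o-mini-transcribe', file: file.getBlob() }, // blob => multipart
    muteHttpExceptions: true
  });
  if (res.getResponseCode() !== 200) return null; // per this section: skip and fall back
  return JSON.parse(res.getContentText()).text;
}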


10. Video Generation (HeyGen)

10.1 Provider choice (v1)

Default video provider: HeyGen API (avatar/talking-head generation). (HeyGen API Documentation)

10.2 Required HeyGen endpoints

10.3 HeyGen status handling

ClipLoop must treat status values as:

  • pending, waiting, processing → still rendering (keep polling),
  • completed → success (download and store the video),
  • failed → terminal error (log to the Errors tab).

Polling must continue until completed or failed.

10.4 Video URL expiration (mandatory handling)

HeyGen's returned video_url expires after 7 days; ClipLoop must download and store the video in Drive promptly on completion. (HeyGen API Documentation)
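
A sketch of the prompt-download step (note that UrlFetchApp's own response-size quota, 50MB per call, also bounds what can be saved this way):

// Download a completed HeyGen render into the session's videos/ folder before the URL expires.
function saveHeygenVideo(videoUrl, videoFolder) {
  var res = UrlFetchApp.fetch(videoUrl, { muteHttpExceptions: true });
  if (res.getResponseCode() !== 200) {
    throw new Error('Video download failed: HTTP ' + res.getResponseCode());
  }
  var file = videoFolder.createFile(res.getBlob().setName('output.mp4'));
  return file.getUrl(); // recorded as drive_video_url in the Videos tab
}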


11. Core Data Models (internal)

11.1 IDs

  • SESSION_ID: sess_<YYYYMMDD>_<randomBase36(10)>
  • MESSAGE_ID: msg_<incrementingInt>
  • HYPOTHESIS_ID: hyp_<incrementingInt>
  • BRIEF_ID: brief_<incrementingInt>
  • ITERATION: integer starting at 1
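
A sketch of the SESSION_ID generator under those conventions:

// sess_<YYYYMMDD>_<randomBase36(10)>
function newSessionId() {
  var day = Utilities.formatDate(new Date(), 'UTC', 'yyyyMMdd');
  var chars = '0123456789abcdefghijklmnopqrstuvwxyz';
  var suffix = '';
  for (var i = 0; i < 10; i++) {
    suffix += chars.charAt(Math.floor(Math.random() * chars.length));
  }
  return 'sess_' + day + '_' + suffix;
}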

11.2 Context Pack (saved snapshot)

context_pack.json must include:

  • session_id

  • intent

  • reference_sheets (array of {url, spreadsheet_id, title, ingested_at})

  • ingested_posts (array of "ContentCard"; see below) OR (recommended) an exemplars subset + summary stats

  • summary_stats

    • counts, top posts, median/mean views where available
    • hashtag frequency
    • platform breakdown
  • exemplars

    • top N by views (TikTok)
    • top N by engagement proxy
    • N random samples

11.3 ContentCard (per post)

Minimal fields:

  • platform (tiktok|instagram)
  • platform_post_id
  • posted_at
  • caption_or_description
  • transcript_text (from voice_to_text or transcription)
  • drive_url
  • metrics (object; includes raw numeric strings + parsed numbers where safe)
  • raw_row (key/value map for all columns)

12. Logging Requirements (what must be recorded)

For every session:

  • The user's intent.
  • The exact reference sheet URLs used.
  • The Context Pack snapshot link.
  • All chat messages (user + assistant).
  • Every hypothesis set produced (stored as JSON in Drive + indexed in Hypotheses tab).
  • Every video brief and HeyGen job (request/response/status snapshots).
  • The Drive link to every created video.

13. Backend API (Apps Script server functions)

ClipLoop must implement these server functions callable from the frontend:

Session lifecycle

  • createSession({ intent, referenceSheetUrls[] }) -> { sessionId }
  • listSessions() -> { sessions[] }
  • loadSession({ sessionId }) -> { session, contextPack, messages, latestHypotheses, videos }

Context ingestion

  • startContextBuild({ sessionId }) -> { jobId }
  • pollContextBuild({ sessionId, jobId }) -> { status, progress, ready }

Chat / LLM

  • sendUserMessage({ sessionId, text }) -> { messageId, openaiResponseId }
  • pollAssistantMessage({ sessionId, openaiResponseId }) -> { status, assistantText?, hypothesesUpdate? }

Hypothesis selection

  • selectHypothesis({ sessionId, hypothesisId }) -> { ok }

Video generation

  • startVideoGeneration({ sessionId, hypothesisId }) -> { iteration, briefOpenaiResponseId }
  • pollVideoBrief({ sessionId, iteration, briefOpenaiResponseId }) -> { status, brief? }
  • submitHeygenJob({ sessionId, iteration, brief }) -> { heygenVideoId }
  • pollHeygenJob({ sessionId, iteration, heygenVideoId }) -> { status, videoUrl?, completedDriveUrl? }
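
As an illustration of the RPC shape, a skeleton of createSession (newSessionId is from §11.1; ensureSessionFolders and appendLogRow are placeholder names for Storage.gs helpers):

// Session lifecycle entry point, callable via google.script.run (§4.3).
function createSession(args) {
  var sessionId = newSessionId();
  var folders = ensureSessionFolders(sessionId); // hypothetical: creates the §8.1 tree
  appendLogRow('Sessions', [
    sessionId,
    new Date().toISOString(),      // created_at
    new Date().toISOString(),      // updated_at
    args.intent.slice(0, 80),      // title derived from intent
    args.intent,
    JSON.stringify(args.referenceSheetUrls),
    folders.sessionFolderUrl,
    '',                            // context_pack_drive_url, filled after context build
    'active'
  ]);
  return { sessionId: sessionId };
}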

14. Concrete Implementation Plan (step-by-step)

Step 1 — Create required Google resources

  1. Create a Drive folder named ClipLoop.

  2. Create a Google Spreadsheet named ClipLoop Logs with tabs:

    • Sessions, Messages, Hypotheses, Videos, Errors (exact schemas in §8.2).

Step 2 — Create and configure Apps Script Web App

  1. Create a new Apps Script project named ClipLoop.

  2. Add HTML Service frontend files:

    • index.html (single-page UI)
    • app.js.html (client JS)
    • styles.css.html (inline CSS or separate template)
  3. Add server-side .gs files (recommended split):

    • Main.gs (doGet + routing)
    • Config.gs (script properties + constants)
    • Storage.gs (Drive + log sheet helpers)
    • ClipPulseIngest.gs (sheet parsing)
    • ContextPack.gs (summary + exemplars)
    • OpenAI.gs (Responses API wrapper + polling)
    • HeyGen.gs (API wrapper + polling + download)
    • SessionApi.gs (RPC functions)
  4. Set OAuth scopes in manifest (minimum; a sample manifest follows this list):

    • Drive read/write
    • Sheets read/write
    • External requests
  5. Deploy as Web App:

    • Execute as: Me (owner) (recommended for consistent Drive/log ownership)
    • Access: restrict to allowed users (single-user MVP: only you). (Google for Developers)
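
A sketch of the corresponding appsscript.json for steps 4–5 (scope list is the minimum named above; widen webapp.access beyond MYSELF only if multiple users are allowed):

{
  "timeZone": "Etc/UTC",
  "runtimeVersion": "V8",
  "webapp": {
    "executeAs": "USER_DEPLOYING",
    "access": "MYSELF"
  },
  "oauthScopes": [
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/script.external_request"
  ]
}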

Step 3 — Implement storage helpers (Drive + Logs)

  1. Implement "ensure folder" logic for ClipLoop/ and per-session subfolders (§8.1).

  2. Implement log-sheet append/update helpers that append new rows and update existing rows in place (e.g., session status, video completion timestamps), keyed by session_id and related IDs.

  3. Implement JSON persistence:

    • write/read JSON files in Drive by path.
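
A sketch of the folder-ensure and JSON persistence helpers for this step:

// Walk/create a nested folder path under the ClipLoop root (§8.1).
function ensureFolderPath(rootFolder, pathParts) {
  var folder = rootFolder;
  pathParts.forEach(function (name) {
    var existing = folder.getFoldersByName(name);
    folder = existing.hasNext() ? existing.next() : folder.createFolder(name);
  });
  return folder;
}

// Persist a JSON payload as a Drive file (full payloads live in Drive, not in sheet cells).
function writeJsonFile(folder, fileName, payload) {
  var blob = Utilities.newBlob(JSON.stringify(payload, null, 2), 'application/json', fileName);
  return folder.createFile(blob);
}

// Example:
// writeJsonFile(ensureFolderPath(root, ['sessions', 'session_' + id, 'context']),
//               'context_pack.json', contextPack);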

Step 4 — Implement ClipPulse sheet ingestion

  1. For each selected reference sheet URL:

    • open spreadsheet,
    • read Instagram and TikTok tabs (if missing → error).
  2. Build ContentCards:

    • parse all columns into raw_row,
    • normalize required fields (post_id, caption, metrics, drive_url).
  3. Deduplicate by (platform, platform_post_id) across all selected sheets.

  4. Store a snapshot of ingested data (or exemplars) into context_pack.json.
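
A sketch of the dedup step over ContentCards:

// Deduplicate ContentCards by (platform, platform_post_id) across all selected sheets.
function dedupeContentCards(cards) {
  var seen = {};
  return cards.filter(function (card) {
    var key = card.platform + ':' + card.platform_post_id;
    if (seen[key]) return false;
    seen[key] = true;
    return true;
  });
}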

Step 5 — Build Context Pack summarization (deterministic)

  1. Compute:

    • per-platform counts,
    • top posts (by views when available),
    • hashtag frequencies,
    • simple engagement proxy metrics where possible.
  2. Select exemplars:

    • top N by views / engagement,
    • N random.
  3. Save context_pack.json and link it in Sessions tab.
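
A sketch of the deterministic summarization (the comma-separated parse of hashtag_names is an assumption about the ClipPulse format):

// Compute counts, top posts by views, and hashtag frequency from ContentCards.
function summarizeCards(cards, topN) {
  var hashtagFreq = {};
  cards.forEach(function (card) {
    String(card.raw_row.hashtag_names || '').split(',').forEach(function (tag) {
      tag = tag.trim();
      if (tag) hashtagFreq[tag] = (hashtagFreq[tag] || 0) + 1;
    });
  });
  var byViews = cards.slice().sort(function (a, b) {
    return (Number(b.metrics.view) || 0) - (Number(a.metrics.view) || 0);
  });
  return {
    count: cards.length,
    top_by_views: byViews.slice(0, topN).map(function (c) { return c.platform_post_id; }),
    hashtag_frequency: hashtagFreq
  };
}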

Step 6 — Implement OpenAI integration (chat + hypotheses + brief)

  1. Implement OpenAI Responses API wrapper:

    • create responses in background mode and poll by response_id until a terminal state (§9.2),
    • apply the privacy settings from §9.3.

  2. Implement function calling tool definitions:

    • propose_hypotheses and compose_video_brief (schemas in §9.5–9.6).

  3. Implement message pipeline:

    • write user message → kick off OpenAI → write assistant placeholder → poll updates → finalize message + store hypotheses JSON.

Step 7 — Implement HeyGen integration (video generation)

  1. Implement HeyGen API wrapper:

    • submit the generation job, poll video_status.get, download the result (§10.2–10.4).

  2. Enforce input constraints:

    • input_text < 5000 characters; vertical 9:16 defaults from the video brief (§9.6).

  3. On completion:

    • download the video before the 7-day URL expiry and save output.mp4 into the session's videos/ folder (§10.4).

  4. Write back:

    • Videos tab row + chat message attachment with Drive link.

Step 8 — Implement "Resume session"

  1. Build listSessions() from Sessions tab.

  2. loadSession() loads:

    • messages from Messages,
    • latest hypotheses from Hypotheses,
    • videos from Videos,
    • context from Drive context_pack.json.
  3. UI renders and allows continued conversation.

Step 9 — QA / Acceptance tests (must pass)

  • Load a known ClipPulse sheet and confirm:

    • Context build completes.
    • Initial hypotheses appear.
  • Chat loop:

    • Send a message → receive assistant response and hypothesis refresh.
  • Select hypothesis → create video:

    • Video brief created.
    • HeyGen job submitted.
    • Status transitions to completed or failed.
    • On completed: MP4 saved to Drive + shown in chat.
  • Resume:

    • Reload session and verify context + history match prior state.

15. Acceptance Criteria (functional)

ClipLoop is "done" when:

  1. Reference sheet selection
  • User can add and remove multiple ClipPulse sheet URLs.
  2. Context gating
  • Chat cannot begin until the Context Pack snapshot exists and the initial AI message is generated.
  3. Hypothesis generation
  • AI provides 3–7 hypotheses with explicit success criteria, and the UI presents them as selectable items.
  4. Video generation
  • After selecting a hypothesis, the user can generate a HeyGen video and receive the output in chat + Drive.
  5. Logging
  • Every session, message, hypothesis set, and video is logged to the log spreadsheet and Drive.
  6. Resume
  • User can reopen a previous session and continue with correct prior context.

16. Appendix — Required configuration keys (Script Properties)

Store in PropertiesService:

  • CLIPLOOP_LOG_SPREADSHEET_ID (string)
  • CLIPLOOP_ROOT_DRIVE_FOLDER_ID (string)

OpenAI:

  • OPENAI_API_KEY (string)
  • OPENAI_MODEL_REASONING default gpt-5.2-pro (OpenAI Platform)
  • OPENAI_MODEL_TRANSCRIBE default gpt-4o-mini-transcribe

HeyGen:

  • HEYGEN_API_KEY (string)
  • HEYGEN_DEFAULT_AVATAR_ID (string)
  • HEYGEN_DEFAULT_VOICE_ID (string)

App behavior:

  • MAX_EXEMPLARS_PER_PLATFORM (number; default 20)
  • MAX_RANDOM_SAMPLES_PER_PLATFORM (number; default 20)
  • HYPOTHESIS_CANDIDATE_COUNT (number; default 5)
