Document version: 1.0
Last updated: 2026-01-22
Status: Final (implementation-ready)
ClipPulse is an internal tool that collects and aggregates information from short vertical videos posted on Instagram, then outputs the results into a single Google Spreadsheet per run, with one row per post and one column per metric.
The tool is designed for on-demand usage: data is fetched each time a human runs a query, not on a schedule.
Note: TikTok collection is currently disabled. The TikTok code is preserved for potential future use.
Convert short-video market trends into structured, numeric, analyzable data so that the team can use the resulting dataset as reference material when forming hypotheses for later testing.
Manual browsing and manual metric collection are possible but inefficient; ClipPulse automates the workflow to improve speed and consistency.
Target data:
- Posts on Instagram
- Video-related data associated with those posts
- Metrics are retrieved only via official APIs.
A human gives instructions in natural language, e.g.:
- "Find 50 posts about skincare trends"
- "Collect TikTok only, 100 posts, US region"
- "Focus on fitness creators"
The system interprets instructions, decides API parameters, and retrieves data.
- No scheduled crawling.
- Runs are triggered by human interaction only.
- Each execution creates one brand-new Spreadsheet containing:
- Tab 1: Instagram
- Each post = one row
- Each metric = one column
- Include a Drive URL per row that points to the stored "video artifact" in Drive
- Include an additional memo column per row for exceptions/notes
Google Apps Script (GAS) + Web App is the primary platform.
Reasons:
- No separate server hosting required
- Native integration with Google Sheets and Drive
- Secure access control via Google accounts
- Fast iteration and simple operations for an internal tool
- Primary: TikTok Research API (best match for public-content research use cases)
- Fallback: TikTok Display API (only when Research API is unavailable and a connected user context is acceptable)
Rationale:
- Research API supports querying public content via structured query conditions.
- Display API is typically tied to authorized user data and may not satisfy "trend research" goals.
- Preferred: Store the actual video file in Drive when it is feasible via official API-returned URLs and Apps Script limits.
- Allowed fallback (approved): If downloading the video file is not feasible, store a Drive "watch artifact" that contains a link to watch the video (so a human can click through and watch).
This preserves the requirement: Drive URL exists and the video is watchable via that link.
- Google Apps Script (V8 runtime)
- Apps Script Web App deployment
- Frontend: HTML Service
- Backend: Apps Script server functions
Google Drive:
- Run folders
- Per-post artifact files (video or watch artifact)
- Per-post raw metadata JSON
- Run manifest JSON
Google Sheets:
- One spreadsheet per run, one tab (Instagram)
- Instagram Graph API (Instagram API with Facebook Login; professional accounts)
- TikTok Research API (disabled; code preserved for future use)
- TikTok Display API (disabled; code preserved for future use)
OpenAI GPT-5.2 Pro via Responses API
- Model: `gpt-5.2-pro`
Used for:
- Parsing the user's natural-language instruction into a structured plan
- Deciding query strategy and parameters within allowed official API capabilities
- Generating fallback strategies if insufficient results are retrieved
- Producing short, consistent "memo" notes when fields are missing or errors occur
Why this choice:
- Highest accuracy / strongest instruction-following among available options
- Supports structured outputs to reduce parsing errors
Store secrets in Apps Script Script Properties (never in client-side code):
- OpenAI API key
- TikTok client key/secret
- Meta app credentials
Use googleworkspace/apps-script-oauth2 library for 3-legged OAuth flows (Meta + TikTok Display, if enabled).
Apps Script has hard limits that affect reliability:
- Script runtime limit: 6 minutes per execution
- UrlFetch limits: request/response size limits and daily call quotas
- Triggers per user per script: limited (must keep active trigger count low)
Therefore, ClipPulse must implement batch processing + continuation:
- Process items in batches (e.g., 10–20 posts at a time)
- Persist run state
- If not finished, schedule a short-delay continuation trigger (not a periodic schedule; only a continuation mechanism)
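The batch-and-continue requirement above could be sketched roughly as follows. All names here (`runBatches`, the `deps` callbacks, `TIME_BUDGET_MS`) are hypothetical, and the 4.5-minute budget is an assumed safety margin rather than a value from this spec.

```javascript
// Hypothetical sketch: runBatches and the deps callbacks are illustrative names.
// Assumed budget: stop starting new batches after 4.5 minutes, leaving
// headroom under the 6-minute execution limit.
var TIME_BUDGET_MS = 4.5 * 60 * 1000;

function runBatches(state, deps) {
  var startedAt = deps.now();
  while (state.nextIndex < state.items.length) {
    if (deps.now() - startedAt > TIME_BUDGET_MS) {
      deps.saveState(state);        // persist progress before exiting
      deps.scheduleContinuation();  // one-off trigger, not a periodic schedule
      return 'CONTINUE';
    }
    var batch = state.items.slice(state.nextIndex, state.nextIndex + state.batchSize);
    deps.processBatch(batch);
    state.nextIndex += batch.length;
    deps.saveState(state);          // checkpoint after every batch
  }
  return 'DONE';
}
```

In Apps Script, `deps.now` could be `Date.now`, and `deps.scheduleContinuation` could wrap `ScriptApp.newTrigger('continueRun').timeBased().after(60 * 1000).create()`, where `continueRun` is a hypothetical handler name. The budget check runs before each batch, so the budget must leave room for one full batch to finish.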
Header:
- App title and subtitle
- Dark/Light mode toggle button (preference saved to localStorage)
Instruction input:
- Multiline text area
- Placeholder examples
Execute button:
- Starts a new run immediately
Data fields toggle:
- Collapsible section showing all 23 Instagram data fields that will be collected
- Displays field name and type for each column
Status / log view:
- Shows:
- Run status badge with spinner (PLANNING / RUNNING / COMPLETED / FAILED)
- Live status message describing current operation
- Progress counts with animated progress bar (Instagram collected)
- Error messages when applicable
Result link:
- Link to the generated spreadsheet once created (even if still running)
- User enters instruction and clicks Execute
- UI immediately shows:
- Run ID
- Link to spreadsheet (created at run start)
- UI polls run status (e.g., every 3–5 seconds)
- When completed, UI shows final counts and keeps the spreadsheet link
A single root folder is created once:
ClipPulse/
  runs/
    YYYY/
      MM/
        YYYYMMDD_HHMMSS_<runShortId>/
          spreadsheet/
          instagram/
          tiktok/
  manifests/
Run ID:
- Format: `YYYYMMDD_HHMMSS_<8charHash>`
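As an illustration of the run ID format, here is a minimal sketch. It assumes the timestamp is taken in UTC and that the 8-character suffix is the first 8 hex characters of a UUID; the spec does not state either choice, and `buildRunId` is a hypothetical name.

```javascript
// Hypothetical helpers; the UTC timestamp and UUID-derived suffix are assumptions.
function pad2(n) {
  return (n < 10 ? '0' : '') + n;
}

function formatRunTimestamp(date) {
  return String(date.getUTCFullYear()) + pad2(date.getUTCMonth() + 1) + pad2(date.getUTCDate()) +
    '_' + pad2(date.getUTCHours()) + pad2(date.getUTCMinutes()) + pad2(date.getUTCSeconds());
}

function buildRunId(date, uuid) {
  // Keep the first 8 hex characters of the UUID as the short hash.
  var shortHash = uuid.replace(/-/g, '').slice(0, 8).toLowerCase();
  return formatRunTimestamp(date) + '_' + shortHash;
}
```

In Apps Script the call could look like `buildRunId(new Date(), Utilities.getUuid())`.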
Post artifact folder:
- TikTok: `tiktok/<platform_post_id>/`
- Instagram: `instagram/<platform_post_id>/`
Files inside each post folder:
- `raw.json` (raw API response for that post)
- `video.mp4` (only if downloaded)
- `watch.html` (fallback "watch artifact" if video not downloaded)
- `thumbnail.jpg` (optional, if available and feasible)
For each row, `drive_url` must point to the primary artifact:
- If `video.mp4` exists → `drive_url` = URL to `video.mp4`
- Else → `drive_url` = URL to `watch.html`
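The primary-artifact rule could be expressed as a small helper like the one below. The `artifacts` object and its `videoUrl` / `watchUrl` fields are hypothetical names used only for this sketch.

```javascript
// Hypothetical helper: artifacts.videoUrl and artifacts.watchUrl are illustrative
// fields holding the Drive URLs of video.mp4 and watch.html when they exist.
function pickDriveUrl(artifacts) {
  if (artifacts.videoUrl) return artifacts.videoUrl;  // video.mp4 wins when present
  if (artifacts.watchUrl) return artifacts.watchUrl;  // otherwise fall back to watch.html
  throw new Error('Post has neither video.mp4 nor watch.html');
}
```

Throwing when both are absent is a design choice for the sketch: every row is required to carry a `drive_url`, so a missing artifact is treated as a bug rather than a blank cell.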
- Each execution creates one new spreadsheet
- It contains one tab: Instagram
- The tab has:
- Header row (row 1)
- Data rows starting at row 2
- Columns are fixed and must not change across runs
These columns appear first in both platform schemas (TikTok and Instagram), in this order:
- `platform_post_id`
- `create_username`
- `posted_at` (ISO 8601 UTC)
- `caption_or_description`
| Column | Name | Type | Notes |
|---|---|---|---|
| 1 | platform_post_id | string | TikTok video ID |
| 2 | create_username | string | TikTok username |
| 3 | posted_at | string | ISO 8601 UTC |
| 4 | caption_or_description | string | from video_description |
| 5 | region_code | string | |
| 6 | music_id | string/number | store as string to avoid precision issues |
| 7 | hashtag_names | string | JSON string array |
| 8 | effect_ids | string | JSON string array |
| 9 | favorites_count | number | normalize from API (handle spelling differences) |
| 10 | video_duration | number | seconds |
| 11 | is_stem_verified | boolean | |
| 12 | voice_to_text | string | subtitles/transcription if provided |
| 13 | view | number | normalize from view_count |
| 14 | like | number | normalize from like_count |
| 15 | comments | number | normalize from comment_count |
| 16 | share_count | number | |
| 17 | playlist_id | string | |
| 18 | hashtag_info_list | string | JSON string array/object |
| 19 | sticker_info_list | string | JSON string array/object |
| 20 | effect_info_list | string | JSON string array/object |
| 21 | video_mention_list | string | JSON string array/object |
| 22 | video_label | string | JSON or string depending on API |
| 23 | video_tag | string | JSON or string depending on API |
| 24 | drive_url | string | Drive URL to primary artifact |
| 25 | memo | string | short note on missing fields/errors |
| Column | Name | Type | Description |
|---|---|---|---|
| 1 | platform_post_id | string | Unique Instagram media ID |
| 2 | create_username | string | Username of the post creator |
| 3 | posted_at | string | Post timestamp in ISO 8601 UTC format |
| 4 | caption_or_description | string | Post caption text |
| 5 | post_url | string | Shareable permalink URL |
| 6 | like_count | number | Number of likes on the post |
| 7 | comments_count | number | Number of comments on the post |
| 8 | media_type | string | IMAGE / VIDEO / CAROUSEL_ALBUM |
| 9 | media_url | string | Direct URL to the media file (may be ephemeral) |
| 10 | thumbnail_url | string | Thumbnail image URL for videos |
| 11 | shortcode | string | Short code extracted from permalink URL |
| 12 | media_product_type | string | FEED / REELS / STORY / AD |
| 13 | is_comment_enabled | boolean | Whether comments are enabled on the post |
| 14 | is_shared_to_feed | boolean | Whether Reel is shared to feed |
| 15 | children | string | JSON string with carousel children info |
| 16 | edges_comments | string | JSON string with summary or first N comments |
| 17 | edges_insights | string | JSON string with post metrics and insights |
| 18 | edges_collaborators | string | JSON string list of collaborators |
| 19 | boost_ads_list | string | JSON string with boost/promotion ads info |
| 20 | boost_eligibility_info | string | JSON string with eligibility for boosting |
| 21 | copyright_check_information_status | string | Copyright check result status |
| 22 | drive_url | string | Google Drive URL to stored artifact |
| 23 | memo | string | Notes on missing fields or errors |
- Arrays/objects must be stored as JSON strings in a single cell.
- Timestamps: Always store `posted_at` as ISO 8601 in UTC.
- Empty / unavailable values: Leave blank, and write a short explanation in memo.
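These cell rules could be applied by a row builder along the following lines. It is a sketch under assumptions: `toCell`, `toIsoUtc`, and `buildRow` are hypothetical names, the memo wording is illustrative, and the offset rewrite assumes API timestamps may arrive with a `+0000`-style offset.

```javascript
// Hypothetical helpers illustrating the cell rules; names are not from the spec.
function toCell(value) {
  if (value === null || value === undefined) return '';       // unavailable: blank
  if (typeof value === 'object') return JSON.stringify(value); // arrays/objects: JSON string
  return value;
}

function toIsoUtc(timestamp) {
  if (!timestamp) return '';
  // Rewrite "+0900" as "+09:00" so the string parses consistently.
  var withColon = String(timestamp).replace(/([+-]\d{2})(\d{2})$/, '$1:$2');
  var parsed = new Date(withColon);
  return isNaN(parsed.getTime()) ? '' : parsed.toISOString();
}

function buildRow(headers, post) {
  var missing = [];
  var row = headers.map(function (name) {
    if (name === 'memo') return '';
    var cell = name === 'posted_at' ? toIsoUtc(post[name]) : toCell(post[name]);
    if (cell === '') missing.push(name);
    return cell;
  });
  var memoIndex = headers.indexOf('memo');
  if (memoIndex !== -1 && missing.length > 0) {
    row[memoIndex] = 'not returned: ' + missing.join(', ');
  }
  return row;
}
```

Because the row is built by mapping over the header list, column order always follows the fixed schema regardless of the key order in the API response.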
Each run transitions through these states:
- `CREATED`
- `PLANNING` (LLM parses instruction into a plan)
- `RUNNING_INSTAGRAM`
- `FINALIZING`
- `COMPLETED` or `FAILED`
Note: `RUNNING_TIKTOK` state is skipped (TikTok collection disabled).
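One way to keep these transitions honest is a small lookup table, sketched below. The assumption that `FAILED` is reachable from every non-terminal state is mine, and `nextRunState` / `isTerminalState` are hypothetical names.

```javascript
// Hypothetical transition table; FAILED from any non-terminal state is an assumption.
var RUN_TRANSITIONS = {
  CREATED: ['PLANNING', 'FAILED'],
  PLANNING: ['RUNNING_INSTAGRAM', 'FAILED'],
  RUNNING_INSTAGRAM: ['FINALIZING', 'FAILED'],
  FINALIZING: ['COMPLETED', 'FAILED'],
  COMPLETED: [],
  FAILED: []
};

function allowedNextStates(state) {
  if (!Object.prototype.hasOwnProperty.call(RUN_TRANSITIONS, state)) {
    throw new Error('Unknown run state: ' + state);
  }
  return RUN_TRANSITIONS[state];
}

function nextRunState(current, requested) {
  if (allowedNextStates(current).indexOf(requested) === -1) {
    throw new Error('Illegal transition: ' + current + ' -> ' + requested);
  }
  return requested;
}

function isTerminalState(state) {
  return allowedNextStates(state).length === 0;
}
```

Re-enabling TikTok later would mean inserting `RUNNING_TIKTOK` into the table rather than touching the callers.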
Input:
- User instruction text
Output:
- A structured plan object containing (minimum):
- target platforms: Instagram, TikTok, or both
- target counts per platform
- extracted keywords/hashtags/creator handles
- time window preference (if stated)
- region preferences (TikTok; if stated)
Important: The plan may influence how to query, but must not change the sheet column schema.
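A plan object of this kind might be validated before use, as in the sketch below. The field names (`platforms`, `targetCounts`, `timeWindowDays`, and so on) are hypothetical, and dropping a disabled platform with a note is an assumed behavior, not one the spec prescribes.

```javascript
// Hypothetical plan shape; field names are illustrative, not part of the spec.
var ENABLED_PLATFORMS = ['instagram']; // TikTok collection is currently disabled

function toStringArray(value) {
  if (!Array.isArray(value)) return [];
  return value
    .filter(function (x) { return typeof x === 'string' && x.trim() !== ''; })
    .map(function (x) { return x.trim(); });
}

function normalizePlan(rawPlan, defaultCount) {
  var raw = rawPlan || {};
  var requested = Array.isArray(raw.platforms) && raw.platforms.length > 0
    ? raw.platforms : ENABLED_PLATFORMS;
  var notes = [];
  var platforms = requested.filter(function (p) {
    var enabled = ENABLED_PLATFORMS.indexOf(p) !== -1;
    if (!enabled) notes.push(p + ' requested but collection is disabled');
    return enabled;
  });
  var targetCounts = {};
  platforms.forEach(function (p) {
    var n = Number(raw.targetCounts && raw.targetCounts[p]);
    targetCounts[p] = Number.isInteger(n) && n > 0 ? n : defaultCount;
  });
  var days = Number(raw.timeWindowDays);
  return {
    platforms: platforms,
    targetCounts: targetCounts,
    keywords: toStringArray(raw.keywords),
    hashtags: toStringArray(raw.hashtags).map(function (h) { return h.replace(/^#/, ''); }),
    creators: toStringArray(raw.creators),
    regionCodes: toStringArray(raw.regionCodes),
    timeWindowDays: days > 0 ? days : null,
    notes: notes
  };
}
```

The normalized plan only steers queries; nothing in it can add, remove, or reorder sheet columns.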
The system can flexibly decide query parameters, but only within official API capabilities.
Use Research API video query endpoint
Build query conditions using:
- `keyword` (from instruction)
- `hashtag_name` (from instruction)
- `region_code` (if specified)
- `create_date` range (default if not specified)
Pagination:
- use cursor/search_id per API rules until target count reached or no more results
If insufficient results:
- expand date range (e.g., last 7 → 30 days)
- add synonyms/related keywords (LLM-generated)
- switch `is_random` to true if appropriate to broaden sampling
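For when TikTok collection is re-enabled, the query conditions above might translate into a request body like the one sketched here. The body shape (`query.and`, `operation`, `field_name`, `field_values`, `start_date`, `end_date`, `max_count`, `cursor`, `search_id`, `is_random`) reflects my reading of the Research API video query documentation and should be verified against the current reference; `buildVideoQueryBody` and the `plan` / `paging` fields are hypothetical. The date window is expressed here through `start_date` / `end_date`, which is an assumption about how the `create_date` range is best supplied.

```javascript
// Hypothetical builder; verify field names against the current Research API reference.
function pad2(n) {
  return (n < 10 ? '0' : '') + n;
}

function toYyyymmdd(date) {
  return String(date.getUTCFullYear()) + pad2(date.getUTCMonth() + 1) + pad2(date.getUTCDate());
}

function buildVideoQueryBody(plan, paging) {
  var conditions = [];
  function addIn(fieldName, values) {
    if (values && values.length > 0) {
      conditions.push({ operation: 'IN', field_name: fieldName, field_values: values });
    }
  }
  addIn('keyword', plan.keywords);
  addIn('hashtag_name', plan.hashtags);
  addIn('region_code', plan.regionCodes);

  var body = {
    query: { and: conditions },
    start_date: toYyyymmdd(plan.startDate),
    end_date: toYyyymmdd(plan.endDate),
    max_count: Math.min(paging.pageSize || 100, 100) // assumed page-size cap of 100
  };
  if (paging.cursor !== undefined) body.cursor = paging.cursor;
  if (paging.searchId) body.search_id = paging.searchId;
  if (plan.isRandom) body.is_random = true;
  return body;
}
```

Broadening a search then becomes a matter of changing the plan (wider dates, more keywords, `isRandom`) and rebuilding the body, with no change to the collector loop.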
- Only if Research API is not configured/available
- Collect what is possible for the authorized user context
- Missing fields must be blank + memo
Use one of these retrieval strategies (chosen by AI):
- Hashtag-based retrieval (preferred for trend discovery)
- Owned-account media retrieval (fallback)
If insufficient results:
- try multiple hashtags
- broaden to recent media if top media is limited (if available)
- relax filtering constraints
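The hashtag-based strategy could start from two URL builders like these. The endpoint paths (`ig_hashtag_search`, `top_media`, `recent_media`) and the field list follow my understanding of the Instagram Graph API hashtag search feature and should be checked against the current reference; the `config` object and function names are hypothetical.

```javascript
// Hypothetical URL builders; confirm endpoints and fields in the Graph API reference.
var GRAPH_BASE = 'https://graph.facebook.com';
var HASHTAG_MEDIA_FIELDS = ['id', 'caption', 'comments_count', 'like_count',
  'media_type', 'media_url', 'permalink', 'timestamp'];

function toQueryString(params) {
  return Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  }).join('&');
}

// Step 1: resolve a hashtag name to a hashtag ID.
function buildHashtagSearchUrl(config, hashtag) {
  var name = hashtag.replace(/^#/, '').toLowerCase();
  return GRAPH_BASE + '/' + config.apiVersion + '/ig_hashtag_search?' + toQueryString({
    user_id: config.igUserId,
    q: name,
    access_token: config.accessToken
  });
}

// Step 2: list media for that hashtag ID via the top_media or recent_media edge.
function buildHashtagMediaUrl(config, hashtagId, edge) {
  if (edge !== 'top_media' && edge !== 'recent_media') {
    throw new Error('Unsupported edge: ' + edge);
  }
  return GRAPH_BASE + '/' + config.apiVersion + '/' + hashtagId + '/' + edge + '?' + toQueryString({
    user_id: config.igUserId,
    fields: HASHTAG_MEDIA_FIELDS.join(','),
    access_token: config.accessToken
  });
}
```

Switching from `top_media` to `recent_media` is one way to implement the "broaden to recent media" step without changing anything else in the collector.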
- Do not write duplicate `platform_post_id` rows within the same tab.
- If duplicates occur from pagination, skip and note in internal logs (not in row memo unless it affects output).
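Duplicate suppression across pages could be handled with a set of already-written IDs, as in this sketch. `filterNewPosts` is a hypothetical name; the set would need to be persisted in run state so that it survives continuation triggers.

```javascript
// Hypothetical helper: seenIds is a Set of platform_post_id strings already written.
function filterNewPosts(seenIds, posts) {
  var fresh = [];
  var skipped = 0;
  posts.forEach(function (post) {
    var id = String(post.platform_post_id); // compare as strings to avoid 1 vs '1' mismatches
    if (seenIds.has(id)) {
      skipped++;                            // count for the internal log, not the row memo
      return;
    }
    seenIds.add(id);
    fresh.push(post);
  });
  return { fresh: fresh, skipped: skipped };
}
```

The `skipped` count is returned so the caller can write it to internal logs, in line with the rule that duplicates are not reported in the row memo.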
Because Research API does not guarantee a direct downloadable video file:
- Always create a post folder in Drive
- Always store `raw.json`
- Create `watch.html` containing:
  - a clickable TikTok watch URL (constructed from username + id when share URL is not provided)
  - any additional links returned by fallback APIs (e.g., embed link)
- If video download becomes feasible via official means:
- Download only when it does not exceed Apps Script fetch limits
- Otherwise keep watch artifact only
- Always create a post folder in Drive
- Always store `raw.json`
- If `media_url` is a video URL and downloading is feasible within Apps Script limits:
  - download and store `video.mp4`
- Else:
  - store `watch.html` linking to `post_url` (permalink)
- If `thumbnail_url` exists and download is feasible:
  - store `thumbnail.jpg` (optional)
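The Instagram artifact decision and the `watch.html` fallback could look roughly like this. It is a sketch under assumptions: `chooseArtifactKind` and `buildWatchHtml` are hypothetical names, the size limit is passed in rather than hard-coded, and an unknown file size is treated as "not feasible".

```javascript
// Hypothetical helpers; the size threshold is supplied by the caller.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

function chooseArtifactKind(post, contentLengthBytes, maxDownloadBytes) {
  var isVideo = post.media_type === 'VIDEO' && Boolean(post.media_url);
  var sizeKnown = typeof contentLengthBytes === 'number' && contentLengthBytes > 0;
  // Assumption: when the size is unknown, prefer the watch artifact over a risky download.
  return isVideo && sizeKnown && contentLengthBytes <= maxDownloadBytes ? 'video' : 'watch';
}

function buildWatchHtml(post) {
  if (!/^https:\/\//.test(String(post.post_url))) {
    throw new Error('post_url must be an https URL');
  }
  var url = escapeHtml(post.post_url);
  return '<!DOCTYPE html>\n' +
    '<html><head><meta charset="utf-8"><title>Watch ' + escapeHtml(post.platform_post_id) +
    '</title></head>\n' +
    '<body><p><a href="' + url + '">' + url + '</a></p></body></html>';
}
```

The returned string could be written with `folder.createFile('watch.html', html, MimeType.HTML)`, and the resulting file URL used as `drive_url`.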
The memo column is always present in the schema, but a memo cell is populated only when there is something to report.
Allowed memo content (examples):
- "like_count not returned (missing permission or field unavailable)"
- "video not downloaded (URL too large); stored watch.html instead"
- "insights edge unavailable for this media; left blank"
- "TikTok Research API unavailable; used Display API fallback; many fields missing"
Rules:
- Must be short (target ≤ 300 characters).
- Must describe what happened and what the system did.
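A memo builder that enforces the length target might look like the following sketch. `buildMemo` is a hypothetical name, and joining notes with "; " plus truncating with an ellipsis are illustrative choices.

```javascript
// Hypothetical helper; the joining and truncation style are assumptions.
var MEMO_MAX_CHARS = 300;

function buildMemo(notes) {
  var text = notes.filter(Boolean).join('; ');
  if (text.length <= MEMO_MAX_CHARS) return text;
  // Reserve one character for the ellipsis so the result stays within the limit.
  return text.slice(0, MEMO_MAX_CHARS - 1) + '…';
}
```

An empty result means the memo cell stays blank, which matches the rule that the memo is populated only when needed.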
Minimum required keys:
- `CLIPPULSE_ROOT_FOLDER_ID`
- `OPENAI_API_KEY`
- `OPENAI_MODEL` (default: `gpt-5.2-pro`)
TikTok Research API:
- `TIKTOK_RESEARCH_CLIENT_KEY`
- `TIKTOK_RESEARCH_CLIENT_SECRET`
TikTok Display API (optional):
- `TIKTOK_DISPLAY_CLIENT_KEY`
- `TIKTOK_DISPLAY_CLIENT_SECRET`
Instagram / Meta:
- `META_APP_ID`
- `META_APP_SECRET`
- `META_GRAPH_API_VERSION` (stored to allow quick upgrades)
- `IG_DEFAULT_PAGE_ID` (or equivalent selection mechanism)
- `IG_DEFAULT_IG_USER_ID` (resolved during setup)
Operational:
- `MAX_POSTS_PER_PLATFORM_DEFAULT` (e.g., 30)
- `BATCH_SIZE` (e.g., 10–20)
- `MAX_RETRIES` (e.g., 3)
- `RETRY_BACKOFF_MS` (e.g., 1000 → exponential)
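Reading these properties with defaults could be sketched as below. `loadConfig` is a hypothetical name, the defaults are taken from the example values listed above (with 10 assumed for `BATCH_SIZE`), and `props` is expected to be a plain object of strings such as the one returned by `PropertiesService.getScriptProperties().getProperties()`.

```javascript
// Hypothetical loader; defaults mirror the example values in the list above.
var REQUIRED_KEYS = ['CLIPPULSE_ROOT_FOLDER_ID', 'OPENAI_API_KEY'];

function intOrDefault(value, fallback) {
  var n = parseInt(value, 10);
  return isNaN(n) || n <= 0 ? fallback : n;
}

function loadConfig(props) {
  var missing = REQUIRED_KEYS.filter(function (key) { return !props[key]; });
  if (missing.length > 0) {
    throw new Error('Missing Script Properties: ' + missing.join(', '));
  }
  return {
    rootFolderId: props.CLIPPULSE_ROOT_FOLDER_ID,
    openAiApiKey: props.OPENAI_API_KEY,
    openAiModel: props.OPENAI_MODEL || 'gpt-5.2-pro',
    maxPostsPerPlatform: intOrDefault(props.MAX_POSTS_PER_PLATFORM_DEFAULT, 30),
    batchSize: intOrDefault(props.BATCH_SIZE, 10), // assumed default within the 10-20 range
    maxRetries: intOrDefault(props.MAX_RETRIES, 3),
    retryBackoffMs: intOrDefault(props.RETRY_BACKOFF_MS, 1000)
  };
}
```

Failing fast on missing required keys keeps a misconfigured deployment from creating half-initialized runs.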
The codebase must be structured by responsibility. Example logical modules:
Web UI
- Serves HTML
- Starts run
- Polls run status
Run Orchestrator
- Creates run folders + spreadsheet
- Stores run state
- Schedules continuation triggers
- Coordinates platform collectors
Instagram Collector
- Retrieves posts via official API strategy
- Fetches per-media details/edges as needed
- Normalizes fields
- Creates Drive artifacts
- Writes rows to Instagram tab
TikTok Collector
- Uses Research API when configured
- Fallback to Display API when needed
- Normalizes fields
- Creates Drive artifacts
- Writes rows to TikTok tab
Sheet Writer
- Creates tabs
- Writes headers
- Appends rows in batches
Drive Manager
- Creates folder structure
- Writes `raw.json`
- Writes `watch.html`
- Saves media files when possible
LLM Planner
- Calls OpenAI Responses API
- Produces structured plan
- Produces concise memo messages when needed
State Store
- Persists run state (status, cursors, progress, created IDs)
- Supports resume after trigger continuation
- Create a new Apps Script project named `ClipPulse`
- Create the Drive root folder `ClipPulse/` and record its folder ID
- Deploy as a Web App (initial deployment)
- Add Script Properties:
  - `CLIPPULSE_ROOT_FOLDER_ID`
  - `OPENAI_API_KEY`
  - `OPENAI_MODEL` = `gpt-5.2-pro`
TikTok Research API:
- Create/apply for a Research project in TikTok developer portal
- Obtain `client_key` and `client_secret`
- Store in Script Properties
Instagram Graph API:
- Create Meta app
- Configure Instagram Graph API access
- Ensure you have an Instagram professional account connected to a Facebook Page
- Store Meta app credentials in Script Properties
TikTok Research token retrieval (client credentials):
- Store `access_token` and `expires_at` in Script Properties (or in run state cache)
- Refresh automatically when expired
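Token caching with automatic refresh could follow the pattern sketched here. The `store` object (with `get` / `set`), the property key names, the 60-second skew, and the `requestToken` callback are all hypothetical; the callback is assumed to return an object with `access_token` and `expires_in` in seconds, as is typical for a client-credentials response.

```javascript
// Hypothetical sketch: store wraps Script Properties; requestToken performs the HTTP call.
var EXPIRY_SKEW_MS = 60 * 1000; // assumed safety margin before the real expiry

function getValidAccessToken(store, requestToken, nowMs) {
  var token = store.get('TIKTOK_RESEARCH_ACCESS_TOKEN');
  var expiresAt = Number(store.get('TIKTOK_RESEARCH_EXPIRES_AT')) || 0;
  if (token && nowMs < expiresAt - EXPIRY_SKEW_MS) {
    return token; // cached token is still valid
  }
  var fresh = requestToken(); // expected shape: { access_token, expires_in }
  store.set('TIKTOK_RESEARCH_ACCESS_TOKEN', fresh.access_token);
  store.set('TIKTOK_RESEARCH_EXPIRES_AT', String(nowMs + fresh.expires_in * 1000));
  return fresh.access_token;
}
```

In Apps Script, `store.get` and `store.set` could wrap `getProperty` and `setProperty` on `PropertiesService.getScriptProperties()`, which stores values as strings; hence the `Number(...)` and `String(...)` conversions.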
Meta (Instagram) OAuth flow:
- Use Apps Script OAuth2 library
- Persist tokens securely
- Resolve and store default `ig_user_id` / page token during setup
- Implement "Create run":
- Generate run ID
- Create Drive run folder structure
- Create spreadsheet with the Instagram tab + headers
- Implement run state persistence
- Implement batch processing + continuation trigger
TikTok collector:
- Research API query + pagination
- Normalize field names into required column schema
Instagram collector:
- Hashtag strategy + fallback strategy
- Fetch media details and optional edges
- Normalize into required schema
For each post:
- Create Drive post folder
- Write `raw.json`
- Create `video.mp4` OR `watch.html`
- Set `drive_url` accordingly
- Append rows in batches (never per-cell loops)
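Batched row writing could be done with a single range write per batch, as sketched below. `appendRowsInBatch` is a hypothetical name; `getLastRow`, `getRange(row, column, numRows, numColumns)`, and `setValues` are the standard Sheet and Range methods.

```javascript
// Hypothetical helper: writes all rows of a batch with one setValues call.
function appendRowsInBatch(sheet, headers, rows) {
  if (rows.length === 0) return 0;
  rows.forEach(function (row, index) {
    if (row.length !== headers.length) {
      throw new Error('Row ' + index + ' has ' + row.length + ' cells, expected ' + headers.length);
    }
  });
  var firstRow = sheet.getLastRow() + 1; // the header occupies row 1, so data starts at row 2
  sheet.getRange(firstRow, 1, rows.length, headers.length).setValues(rows);
  return rows.length;
}
```

The width check guards the fixed-schema rule: a row that is too short or too long is rejected before anything is written.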
- Build HTML UI page
- Implement server endpoints:
- start run
- get run status
- Display:
- running progress
- spreadsheet link
- Add retries (429/5xx with exponential backoff)
- Add run failure recovery:
- If partial success, keep spreadsheet and mark run failed with reason
- Ensure memo column is populated for all missing critical fields
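The retry rule (429/5xx with exponential backoff) could be implemented along these lines. `fetchWithRetry` is a hypothetical name; `doFetch` is assumed to return an object with a numeric `status`, and `sleep` is injected so the logic can be exercised without real delays.

```javascript
// Hypothetical sketch: doFetch performs one request, sleep pauses for the given milliseconds.
function isRetryableStatus(status) {
  return status === 429 || (status >= 500 && status <= 599);
}

function fetchWithRetry(doFetch, options, sleep) {
  var attempt = 0;
  while (true) {
    var response = doFetch();
    if (!isRetryableStatus(response.status) || attempt >= options.maxRetries) {
      return response; // success, non-retryable error, or retries exhausted
    }
    sleep(options.backoffMs * Math.pow(2, attempt)); // 1x, 2x, 4x, ... the base delay
    attempt++;
  }
}
```

In Apps Script, `doFetch` could wrap `UrlFetchApp.fetch(url, { muteHttpExceptions: true })` and map `getResponseCode()` to `status`, with `Utilities.sleep` passed as `sleep`. Total sleep time counts against the 6-minute execution limit, so `maxRetries` and `backoffMs` should stay small.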
A run is considered correct when:
- A user can open the Web App, submit an instruction, and start a run.
- A new spreadsheet is created for each run and contains:
- Instagram tab
- Each collected post occupies exactly one row in the correct tab.
- Columns match the specified schema exactly (names + order).
- Each row contains a valid `drive_url` pointing to:
  - an mp4 file OR a watch artifact in Drive
- When metrics are missing/unavailable:
- fields are blank
- memo contains a short explanation
- The system completes or fails cleanly without exceeding Apps Script runtime limits (by using batching + continuation).
- TikTok Research API — Getting Started
- TikTok Research API — Video Query (fields list)
- TikTok — Client Access Token Management (client_credentials)
- TikTok Display API — Video Query overview
- Apps Script Quotas (runtime, triggers, urlfetch limits)
- Apps Script OAuth2 Library (googleworkspace/apps-script-oauth2)
- OpenAI — Responses API Reference
- OpenAI — Using GPT-5.2 (model names)
- OpenAI — Structured Outputs guide