Ad Creative Dataset — Bulk Export & API Access

8,226+ validated VSLs & ad creatives in the dataset

72+ niches you can scope access to

JSON / CSV / API: read-only export & feed formats

What the dataset contains

This page is the access layer for the VSL and ad creative dataset — the structured corpus described in the parent overview at /vsl-ad-creative-dataset. It is the same intelligence layer the AI Copy Agent queries internally, exposed to your team as data you can pull and process yourself.

At its core the dataset is a set of validated direct-response creatives: 8,226+ VSLs and ad creatives across 72+ niches, each captured as a transcript and then decomposed into structured records. You do not get a folder of screenshots — you get parsed, normalized rows you can load into a warehouse, a notebook, or your own model pipeline on day one.

Each creative carries a niche assignment, a source kind (vsl or ad), the full transcript, the extracted units pulled from that transcript, the retrieval chunks the transcript was split into, and the cluster membership that ties recurring patterns together across the corpus. That last field is what turns a pile of transcripts into a queryable map of what is recurring in-market.

The field schema

Records are stable and typed so you can build against them without guessing. The primary entities are transcripts, extracted units, chunks, and clusters, joined by id.

A transcript record carries its id, niche (canonical slug), source_kind (vsl or ad), and the full transcript text. Extracted units are the typed building blocks pulled from each transcript — hooks, pains, tactics, authorities, promises, social proof, urgency, CTAs, vocabulary, villains, avatars, and mechanisms — each tagged with its unit_kind and the transcript it came from. Chunks are the transcript split into section-typed segments (opening, big_idea, problem, agitation, mechanism, social_proof, offer, urgency_scarcity, close, and ad-specific kinds), which is the granularity our own retrieval runs on.
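As a rough illustration of the entities described above, the records could be modeled like this in Python. Only id, niche, source_kind, and unit_kind are named on this page; transcript_id, section_kind, text, and the parse_transcript helper are assumed names for this sketch, so check them against the schema you are actually provisioned.

```python
from dataclasses import dataclass

# The two source kinds named on this page.
SOURCE_KINDS = ("vsl", "ad")


@dataclass(frozen=True)
class Transcript:
    id: str
    niche: str        # canonical niche slug
    source_kind: str  # "vsl" or "ad"
    text: str         # assumed field name for the full transcript text


@dataclass(frozen=True)
class Unit:
    id: str
    transcript_id: str  # assumed name for the link to the parent transcript
    unit_kind: str      # e.g. "hook", "pain", "mechanism", "cta"
    text: str


@dataclass(frozen=True)
class Chunk:
    id: str
    transcript_id: str  # assumed name for the link to the parent transcript
    section_kind: str   # assumed name; e.g. "opening", "big_idea", "offer"
    text: str


def parse_transcript(row: dict) -> Transcript:
    """Build a Transcript from one exported row, rejecting unknown source kinds."""
    if row["source_kind"] not in SOURCE_KINDS:
        raise ValueError(f"unexpected source_kind: {row['source_kind']!r}")
    return Transcript(
        id=row["id"],
        niche=row["niche"],
        source_kind=row["source_kind"],
        text=row["text"],
    )
```

Validating source_kind at parse time is one way to catch a mismatched file early, before rows reach a warehouse or model pipeline.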

Cluster membership links units that express the same recurring pattern, with a weighted score that blends recency, how many creatives use the pattern, and how many niches it spans. With those four entity types you can reproduce most of the analysis the agent does — most common hooks per niche, cross-niche patterns, or section-level prose study — entirely inside your own stack.
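For example, a "most common hooks per niche" view might be sketched as below. This is a minimal illustration, not the agent's own method: it assumes each unit row carries a transcript_id and a cluster_id (both hypothetical column names) and simply counts how many units of a given kind fall into each cluster within a niche.

```python
from collections import Counter, defaultdict


def top_clusters_by_niche(transcripts, units, unit_kind="hook", k=3):
    """Count units of one kind per cluster, grouped by the parent transcript's niche."""
    niche_of = {t["id"]: t["niche"] for t in transcripts}
    counts = defaultdict(Counter)
    for unit in units:
        if unit["unit_kind"] != unit_kind:
            continue
        niche = niche_of.get(unit["transcript_id"])
        if niche is None:
            continue  # parent transcript is outside the loaded slice
        counts[niche][unit["cluster_id"]] += 1
    return {niche: counter.most_common(k) for niche, counter in counts.items()}


# Made-up sample rows; the niche slugs and ids are placeholders.
sample_transcripts = [
    {"id": "t1", "niche": "skincare"},
    {"id": "t2", "niche": "skincare"},
    {"id": "t3", "niche": "fitness"},
]
sample_units = [
    {"transcript_id": "t1", "unit_kind": "hook", "cluster_id": "c1"},
    {"transcript_id": "t2", "unit_kind": "hook", "cluster_id": "c1"},
    {"transcript_id": "t2", "unit_kind": "hook", "cluster_id": "c2"},
    {"transcript_id": "t3", "unit_kind": "hook", "cluster_id": "c2"},
    {"transcript_id": "t3", "unit_kind": "cta", "cluster_id": "c9"},
]

if __name__ == "__main__":
    print(top_clusters_by_niche(sample_transcripts, sample_units))
```

Swapping unit_kind for another kind, such as "mechanism", reuses the same grouping for a different question.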

Two ways to access it: bulk export and API

Bulk export is the simplest path. We deliver the scoped records as JSON or CSV — one file set per entity type (transcripts, units, chunks, clusters) so the joins are obvious — sized to the niches and volume you ask for. This suits a one-time corpus load, a training or evaluation set, or any workflow where you want the whole slice in your warehouse and will refresh on your own schedule.
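To show what "one file set per entity type" could look like in practice, here is a small sketch that reads two CSV files and joins units to their parent transcripts by id. The inline CSV text stands in for delivered files, and the column names transcript_id and text, as well as the sample niche slugs, are assumptions for illustration.

```python
import csv
import io

# Inline stand-ins for the delivered transcripts and units files.
TRANSCRIPTS_CSV = """id,niche,source_kind,text
t1,weight-loss,vsl,"Full transcript text here"
t2,skincare,ad,"Another transcript"
"""

UNITS_CSV = """id,transcript_id,unit_kind,text
u1,t1,hook,"Hook text"
u2,t1,cta,"CTA text"
u3,t2,hook,"Other hook"
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def join_units_to_transcripts(transcripts, units):
    """Attach niche and source_kind from the parent transcript to each unit."""
    by_id = {t["id"]: t for t in transcripts}
    joined = []
    for unit in units:
        parent = by_id.get(unit["transcript_id"])
        if parent is None:
            continue  # skip units whose transcript is not in this slice
        joined.append(
            {**unit, "niche": parent["niche"], "source_kind": parent["source_kind"]}
        )
    return joined


if __name__ == "__main__":
    rows = join_units_to_transcripts(read_rows(TRANSCRIPTS_CSV), read_rows(UNITS_CSV))
    for row in rows:
        print(row["id"], row["unit_kind"], row["niche"], row["source_kind"])
```

With real files, read_rows would be replaced by opening each file with newline="" and passing the handle to csv.DictReader.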

The API is for teams that want programmatic, ongoing pulls: query by niche, by source kind, by unit kind, or incrementally since a timestamp, and page through results rather than moving one large file. It mirrors the same field schema as the export, so you can prototype against a bulk sample and then switch to the API for the live feed without rewriting your parsers.
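A paged pull along those lines might look like the sketch below. The endpoint contract is not documented on this page, so everything specific here is an assumption: the query parameter names (niche, source_kind, unit_kind, since, cursor), the response shape with records and next_cursor, and the fake_fetch stand-in that replaces a real HTTP call so the example runs offline.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import parse_qs, urlencode


def build_query(
    niche: Optional[str] = None,
    source_kind: Optional[str] = None,
    unit_kind: Optional[str] = None,
    since: Optional[str] = None,
    cursor: Optional[str] = None,
) -> str:
    """Encode only the filters that were actually supplied (hypothetical names)."""
    params = {
        "niche": niche,
        "source_kind": source_kind,
        "unit_kind": unit_kind,
        "since": since,
        "cursor": cursor,
    }
    return urlencode({k: v for k, v in params.items() if v is not None})


def iter_records(fetch_page: Callable[[str], dict], **filters) -> Iterator[dict]:
    """Yield records page by page until the response carries no next cursor."""
    cursor = None
    while True:
        page = fetch_page(build_query(cursor=cursor, **filters))
        yield from page["records"]
        cursor = page.get("next_cursor")
        if not cursor:
            break


def fake_fetch(query: str) -> dict:
    """Offline stand-in for an HTTP GET; returns two canned pages."""
    cursor = parse_qs(query).get("cursor", ["0"])[0]
    pages = {
        "0": {"records": [{"id": "u1"}, {"id": "u2"}], "next_cursor": "1"},
        "1": {"records": [{"id": "u3"}], "next_cursor": None},
    }
    return pages[cursor]


if __name__ == "__main__":
    for record in iter_records(fake_fetch, niche="skincare", unit_kind="hook"):
        print(record["id"])
```

Passing the fetch function in as an argument keeps the paging logic testable and lets the same loop run against whatever client and credentials you are provisioned.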

Both modes are read-only. They return copies of the dataset; nothing you send back mutates the corpus, and there is no write surface to secure on your side. That keeps the integration narrow and the dataset a clean, shared source of truth.

License the dataset on demand

Tell us your niches and use case — we scope a dataset export or API feed for your team.

Request dataset access

Per-niche scoping and nightly freshness

You rarely need all 72+ niches. Most teams scope to the verticals they operate in, so access is provisioned to the niches you name: you receive the creatives, units, chunks, and clusters for those slugs and nothing else. Scoping keeps export sizes manageable, keeps API responses fast, and keeps your cost tied to what you actually use.

Freshness is nightly. New and updated creatives flow into the corpus on a nightly cadence, so a bulk export reflects the corpus as of its build, and the API serves the latest records on each pull. If your use case needs a different cadence — a weekly snapshot, or a tighter refresh on a hot niche — we set that during scoping rather than forcing one fixed rhythm.
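If you refresh on your own schedule, one common pattern is to keep a local copy keyed by id and apply each nightly pull as an upsert, tracking the latest timestamp seen so the next pull can ask only for newer records. The sketch below assumes every record has an updated_at field in ISO 8601 format with an explicit UTC offset; that field name is hypothetical and not confirmed by this page.

```python
from datetime import datetime


def apply_incremental(store: dict, batch: list, watermark: datetime) -> datetime:
    """Upsert each record by id and return the newest updated_at seen so far."""
    for record in batch:
        store[record["id"]] = record  # insert new ids, overwrite updated ones
        seen = datetime.fromisoformat(record["updated_at"])
        if seen > watermark:
            watermark = seen
    return watermark


if __name__ == "__main__":
    # Made-up local state and one made-up nightly batch.
    local = {
        "t1": {"id": "t1", "text": "old", "updated_at": "2024-01-01T00:00:00+00:00"}
    }
    last_seen = datetime.fromisoformat("2024-01-01T00:00:00+00:00")
    nightly = [
        {"id": "t1", "text": "new", "updated_at": "2024-01-02T02:00:00+00:00"},
        {"id": "t2", "text": "added", "updated_at": "2024-01-03T02:00:00+00:00"},
    ]
    last_seen = apply_incremental(local, nightly, last_seen)
    print(sorted(local), last_seen.isoformat())
```

Persisting the returned watermark between runs is what lets a weekly or tighter cadence resume where the previous pull stopped.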

What teams build on it

The dataset is a substrate, not a finished product, which is the point: it lets your team build the thing the off-the-shelf agent does not. Analytics teams load it to track which hooks, mechanisms, and offers are recurring in their niches over time. Data-science teams use the structured units and cluster labels as a grounded training or evaluation set for their own copy or classification models. Agencies and tooling vendors wire the API into internal swipe-file search, briefing generators, or QA checks that score drafts against real winning structure.

Because every record is traceable back to a real creative and its niche, the analysis stays grounded — you are reasoning over validated in-market copy, not synthetic examples. What you build on top is yours; we provide the read-only feed and keep it fresh.

How provisioning works

Access is provisioned on demand rather than self-serve, because the right shape depends on your use case. We scope three things with you: which niches you need, the volume and delivery shape (a full export or an incremental feed), and the refresh cadence. From that we provision either a delivered export or scoped API credentials.

The fastest way to start is to tell us your niches and what you intend to build. We will confirm coverage against the live corpus, propose an export or API feed, and stand it up. If you want the conceptual overview of the corpus before you scope, the parent page at /vsl-ad-creative-dataset walks through how the dataset is built and validated.

The bottom line

The ad creative dataset API and bulk export give your team direct, read-only access to the same corpus of 8,226+ creatives the AI Copy Agent runs on: structured transcripts, extracted units, section chunks, and cluster membership across 72+ niches, scoped to your verticals and refreshed nightly. You bring the use case; we scope the feed and keep it current.

Frequently asked questions

What formats does the bulk export come in?

JSON or CSV, your choice, delivered as one file set per entity type (transcripts, extracted units, chunks, and clusters) so the joins are explicit. Files are scoped to the niches and volume you request.

What fields are in each record?

Transcripts carry id, niche, source_kind (vsl or ad), and full text. Units carry unit_kind and a reference to their parent transcript. Chunks are section-typed transcript segments. Clusters carry membership plus a weighted score that blends recency, prevalence, and niche diversity.

Can I scope the dataset to specific niches?

Yes. Access is provisioned to the niche slugs you name, so you receive only the creatives, units, chunks, and clusters for those verticals. Scoping keeps export sizes and API responses lean and ties cost to what you use.

How fresh is the data?

Nightly by default. New and updated creatives flow into the corpus each night, so the API serves the latest records on each pull and an export reflects the corpus at build time. A different cadence can be set during scoping.

Is the API read-only?

Yes. Both the API and the bulk export return copies of the dataset. Nothing you send mutates the corpus, and there is no write surface on your side, which keeps the integration narrow and the dataset a clean shared source of truth.

How do I get access?

Provisioning is on demand. Tell us your niches, volume, whether you want a full export or an incremental feed, and your cadence via /contact, and we scope and stand up the export or API for your team.