Structured data extraction

Pull screenshots, page content, and structured JSON from any URL in a single API call

Turn any URL into a structured bundle — screenshot, main-content markdown, and selector-driven JSON — without stitching a browser, an HTML parser, and a Markdown converter together yourself.

The problem

Most "fetch data about a URL" workflows end up looking the same:

Run a headless browser to capture a screenshot
Run a separate fetch to download the HTML
Parse the HTML to pull meta tags, favicons, and OG images
Run a Markdown converter to feed an LLM
Hope all four of those see the same version of the page

Four round trips, two browsers, drifting page state, and four billing surfaces.

How Allscreenshots helps

The outputs array on POST /v1/screenshots returns the screenshot, raw HTML, main-content markdown, and CSS-selector-driven JSON from a single browser session. The request counts as one screenshot against your quota no matter how many output types you request.

One call, one billing unit — five outputs cost the same as one screenshot
Same DOM for every output — the screenshot and the scrape are consistent by construction
No glue infrastructure — no Puppeteer fleet, no parser cache, no Markdown converter to maintain
LLM-ready — pair the screenshot with the markdown for industry/tone classification

Multi-output works on the sync, async, and bulk endpoints. For batching prospect lists or nightly refreshes, see bulk processing.

Scenarios

Onboarding auto-fill

Users paste their company URL; you pre-fill name, description, logo, and brand colors.

Sales / CRM enrichment

Paste a prospect URL into the contact card and auto-derive company info from one request.

Link previews

Generate rich URL cards with title, description, image, and favicon for chat or comments.

Competitor & SEO audits

Snapshot titles, metas, canonicals, and headings for any list of URLs on a schedule.

Onboarding auto-fill

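Pull the title, description, Open Graph image (stored here as logoImage), and theme color to seed a brand profile. If you also need a favicon, add a link[rel~=icon] field like the one in the quick example. Pair the screenshot and markdown with an LLM to classify industry and write a tagline.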

{
  "title":      { "selector": "meta[property=\"og:title\"]", "type": "attribute", "attribute": "content" },
  "description":{ "selector": "meta[name=description]",     "type": "attribute", "attribute": "content" },
  "logoImage":  { "selector": "meta[property=\"og:image\"]", "type": "attribute", "attribute": "content" },
  "themeColor": { "selector": "meta[name=theme-color]",     "type": "attribute", "attribute": "content" }
}
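The extracted fields still have to be mapped onto your own profile model. The following is a minimal Python sketch of that step, not the service's client code. It assumes the JSON output arrives as a flat object keyed by the schema's field names, with string or null values; that response shape is an assumption, so check it against the Outputs API reference. BrandProfile and to_brand_profile are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin


@dataclass
class BrandProfile:
    name: Optional[str]
    description: Optional[str]
    logo_url: Optional[str]
    theme_color: Optional[str]


def _clean(value: object) -> Optional[str]:
    """Return a stripped string, or None for missing or blank values."""
    if not isinstance(value, str):
        return None
    value = value.strip()
    return value or None


def to_brand_profile(extracted: dict, page_url: str) -> BrandProfile:
    """Map extracted fields (keyed like the schema above) to a profile.

    Assumes `extracted` is a flat dict of field name -> string or None.
    """
    logo = _clean(extracted.get("logoImage"))
    return BrandProfile(
        name=_clean(extracted.get("title")),
        description=_clean(extracted.get("description")),
        # og:image values can be relative; resolve them against the page URL.
        logo_url=urljoin(page_url, logo) if logo else None,
        theme_color=_clean(extracted.get("themeColor")),
    )
```

Treating blank strings as missing lets the onboarding form fall back to manual entry for any field the page does not declare.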

Sales / CRM enrichment

Add a prospect's URL to a contact card and auto-fill the company name, a tagline taken from the homepage hero heading, and the meta description. The markdown gives the LLM enough to write a one-line industry summary.

{
  "company":     { "selector": "meta[property=\"og:site_name\"]", "type": "attribute", "attribute": "content" },
  "tagline":     { "selector": "h1", "type": "text" },
  "description": { "selector": "meta[name=description]", "type": "attribute", "attribute": "content" }
}
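One way to hand the extracted fields and the markdown to an LLM is to fold both into a single prompt. The sketch below only assembles the prompt text and does not call any model API. The function name build_summary_prompt, the 4000-character budget, and the flat field dict are all assumptions made for illustration.

```python
def build_summary_prompt(fields: dict, markdown: str, max_chars: int = 4000) -> str:
    """Assemble an LLM prompt from extracted fields and page markdown."""
    # Truncate the markdown so long pages stay within a prompt budget.
    excerpt = markdown[:max_chars]
    # Skip fields the page did not provide (None or empty string).
    known = "\n".join(
        f"- {key}: {value}" for key, value in fields.items() if value
    )
    return (
        "Write a one-line industry summary for this company.\n\n"
        f"Known fields:\n{known}\n\n"
        f"Page content (markdown, truncated):\n{excerpt}"
    )
```

Listing the known fields first gives the model anchors such as the company name before it reads the longer page text.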

Competitor & SEO audits

Run the same schema against every competitor URL on a schedule. Diff the results to catch title rewrites, meta-description changes, or canonical drift.

{
  "title":     { "selector": "title", "type": "text" },
  "metaDesc":  { "selector": "meta[name=description]", "type": "attribute", "attribute": "content" },
  "canonical": { "selector": "link[rel=canonical]",    "type": "attribute", "attribute": "href" },
  "h1":        { "selector": "h1", "type": "text", "multiple": true }
}
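The diff step can be as small as a field-by-field comparison of two stored results. The following Python sketch assumes each run's JSON output has been saved as a flat dict keyed by the schema's field names, with h1 as a list because of "multiple": true. Both diff_snapshots and that storage shape are assumptions for illustration.

```python
def diff_snapshots(previous: dict, current: dict) -> dict:
    """Return {field: (old, new)} for every field whose value changed."""
    changes = {}
    # Walk the union of keys so added and removed fields are reported too.
    for field in sorted(set(previous) | set(current)):
        old, new = previous.get(field), current.get(field)
        if old != new:
            changes[field] = (old, new)
    return changes


yesterday = {"title": "Acme", "canonical": "https://acme.com/", "h1": ["Welcome"]}
today = {"title": "Acme Rockets", "canonical": "https://acme.com/", "h1": ["Welcome"]}
changes = diff_snapshots(yesterday, today)
# changes == {"title": ("Acme", "Acme Rockets")}
```

An empty result means nothing changed, so a scheduled job can alert only when the dict is non-empty.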

Quick example

One request that covers most workflows above:

curl -X POST 'https://api.allscreenshots.com/v1/screenshots' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://acme.com",
    "blockCookieBanners": true,
    "responseType": "url",
    "outputs": [
      { "type": "screenshot", "format": "png" },
      { "type": "markdown", "mainContentOnly": true },
      { "type": "json", "schema": {
        "title":      { "selector": "title", "type": "text" },
        "description":{ "selector": "meta[name=description]",     "type": "attribute", "attribute": "content" },
        "ogImage":    { "selector": "meta[property=\"og:image\"]", "type": "attribute", "attribute": "content" },
        "themeColor": { "selector": "meta[name=theme-color]",     "type": "attribute", "attribute": "content" },
        "favicon":    { "selector": "link[rel~=icon]",            "type": "attribute", "attribute": "href" }
      }}
    ]
  }'
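For application code, the same request can be built with Python's standard library. This is a sketch that mirrors the curl call above. build_request and fetch_bundle are hypothetical helper names, and decoding the response as JSON is an assumption based on "responseType": "url"; the Outputs API reference has the actual response shapes.

```python
import json
import urllib.request

API_URL = "https://api.allscreenshots.com/v1/screenshots"


def build_request(api_key: str, page_url: str, schema: dict) -> urllib.request.Request:
    """Build the multi-output POST request without sending it."""
    body = {
        "url": page_url,
        "blockCookieBanners": True,
        "responseType": "url",
        "outputs": [
            {"type": "screenshot", "format": "png"},
            {"type": "markdown", "mainContentOnly": True},
            {"type": "json", "schema": schema},
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def fetch_bundle(request: urllib.request.Request) -> dict:
    """Send the request and decode the response body (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a network call.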

Next steps

Multi-output extraction guide — the step-by-step build, including LLM post-processing with Claude or OpenAI.
Outputs API reference — the canonical option list and response shapes.
Async jobs and bulk — for batches and webhooks.
