ADSX
JULY 1, 2026 // UPDATED JUL 1, 2026

Fetch Your Entire Shopify Catalog with GraphQL

Pull every product and variant from Shopify at scale: cursor pagination vs the Bulk Operations API, rate-limit math, JSONL parsing, and retry-safe code.

AUTHOR
AdsX Engineering
SHOPIFY API & COMMERCE ENGINEERING

To fetch your entire Shopify product catalog you have two tools: cursor-based pagination on the products query for small or filtered pulls, and the Bulk Operations API for exporting everything at once, asynchronously, without paging against the rate limit. Use pagination when you need a few thousand products synchronously; use a bulk operation when you need the whole catalog (variants, media, and all) as a downloadable JSONL file. This guide shows both, with retry-safe code and the rate-limit math you need to avoid being throttled.

This is the extraction deep-dive that the Shopify Product Catalog API guide points to. For OAuth, tokens, and rate-limit fundamentals, see the Shopify Admin API guide. Everything here uses the GraphQL Admin API — REST is legacy as of 2024–2025 and new catalog features ship to GraphQL first (Shopify: API versioning).

Which approach: a quick decision table

Situation | Use | Why
< ~2,000 products, or a filtered subset | Cursor pagination | Synchronous, simple, fits in one job
Full catalog (all products + variants + media) | Bulk Operations API | Async, no per-page throttling, one JSONL result
Live feed that must reflect edits instantly | Webhooks + targeted reads | Bulk is a snapshot; webhooks keep it current
One-off audit or migration | Bulk Operations API | Cheapest way to get everything once

The mistake teams make is paginating a 50,000-product catalog synchronously, hitting the throttle every few pages, and building fragile sleep-and-retry loops. That is exactly what the Bulk Operations API exists to replace.
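The decision table can also be written as a small function, which is handy when a job has to pick a strategy at runtime. This is only a sketch: chooseExtraction and its option names are hypothetical, and the 2,000-product threshold is this guide's rule of thumb, not a Shopify limit.

```javascript
// Hypothetical helper that encodes the decision table above.
// The 2,000-product cutoff is a rule of thumb from this guide, not an API limit.
function chooseExtraction({ productCount, filtered = false, needsLiveUpdates = false }) {
  if (needsLiveUpdates) return "webhooks";   // bulk is a snapshot, webhooks stay current
  if (filtered || productCount < 2000) return "pagination";
  return "bulk";                             // full catalog, audit, or migration
}

console.log(chooseExtraction({ productCount: 50000 })); // "bulk"
console.log(chooseExtraction({ productCount: 800 }));   // "pagination"
```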

Approach 1: cursor pagination

The products connection returns a pageInfo with hasNextPage and endCursor. You loop, passing endCursor back as after, until hasNextPage is false. Request only the fields your feed needs — GraphQL bills by query cost, so over-fetching burns your rate-limit budget.

query CatalogPage($cursor: String) {
  products(first: 50, after: $cursor) {
    pageInfo { hasNextPage endCursor }
    nodes {
      id
      title
      descriptionHtml
      productType
      vendor
      status
      featuredMedia { ... on MediaImage { image { url altText } } }
      variants(first: 100) {
        nodes { id sku barcode price inventoryQuantity }
      }
    }
  }
}

The driver loop reads the cost object on every response and backs off before the bucket empties:

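// CATALOG_PAGE is the CatalogPage query above, stored as a string.
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

async function fetchAllProducts(shop, token) {
  const products = [];
  let cursor = null;
  while (true) {
    const res = await fetch(`https://${shop}/admin/api/2026-01/graphql.json`, {
      method: "POST",
      headers: {
        "X-Shopify-Access-Token": token,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ query: CATALOG_PAGE, variables: { cursor } }),
    });
    if (!res.ok) throw new Error(`Shopify returned HTTP ${res.status}`);
    const json = await res.json();

    // A throttled query arrives as HTTP 200 with an errors array, so check it
    // before touching json.data.
    if (json.errors?.length) {
      const throttled = json.errors.some((e) => e.extensions?.code === "THROTTLED");
      if (!throttled) throw new Error(json.errors.map((e) => e.message).join("; "));
      // The cursor has not advanced, so retrying cannot skip or duplicate a page.
      await sleep(2000);
      continue;
    }

    // Cost-based throttling: restoreRate is points per second. Wait if we're low.
    const cost = json.extensions?.cost?.throttleStatus;
    if (cost && cost.currentlyAvailable < 200) {
      const deficit = 200 - cost.currentlyAvailable;
      await sleep((deficit / cost.restoreRate) * 1000);
    }

    const page = json.data.products;
    products.push(...page.nodes);
    if (!page.pageInfo.hasNextPage) break;
    cursor = page.pageInfo.endCursor;
  }

  return products;
}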

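Two things make this production-safe: reading extensions.cost.throttleStatus to pace requests against the real bucket (Shopify: GraphQL rate limits), and capping variants(first: 100). A product with more than 100 variants needs its own nested pagination, which is a strong signal you should switch to a bulk operation instead. Requested cost also grows with the product page size multiplied by the nested variant page size, so if Shopify rejects the query for exceeding the single-query cost limit, lower one of the two first values.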
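When a request is throttled or fails outright, the wait before retrying can be derived from the same cost data instead of a fixed sleep. The sketch below assumes the response carried extensions.cost with requestedQueryCost and throttleStatus; retryDelayMs is a hypothetical helper, and the 30-second cap is an arbitrary choice.

```javascript
// Sketch: how long to wait before retrying a query.
// throttleStatus is extensions.cost.throttleStatus from the last response (may be missing);
// requestedCost is extensions.cost.requestedQueryCost from that same response.
function retryDelayMs(attempt, throttleStatus, requestedCost) {
  if (throttleStatus && throttleStatus.restoreRate > 0) {
    const deficit = requestedCost - throttleStatus.currentlyAvailable;
    if (deficit <= 0) return 0; // enough points are already available
    // Wait exactly long enough for the bucket to refill the missing points.
    return Math.ceil((deficit / throttleStatus.restoreRate) * 1000);
  }
  // No cost data (for example a network failure): exponential backoff, capped at 30 s.
  return Math.min(30000, 1000 * 2 ** attempt);
}

console.log(retryDelayMs(0, { currentlyAvailable: 100, restoreRate: 50 }, 400)); // 6000
```

Because the cursor only advances after a successful page, retrying the same request after this delay cannot skip or duplicate products.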

Approach 2: the Bulk Operations API

For the whole catalog, submit one bulkOperationRunQuery. Shopify runs it asynchronously with no per-page throttling and hands you a single JSONL file with every node — including nested variants and media — flattened into lines.

mutation {
  bulkOperationRunQuery(
    query: """
      {
        products {
          edges {
            node {
              id
              title
              status
              variants { edges { node { id sku barcode price } } }
            }
          }
        }
      }
    """
  ) {
    bulkOperation { id status }
    userErrors { field message }
  }
}

Then poll currentBulkOperation until it completes and exposes a url:

query {
  currentBulkOperation {
    id
    status
    objectCount
    url
  }
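}

// gql(shop, token, query) is a small helper (not shown) that POSTs a query to the
// GraphQL Admin API and returns the parsed JSON response.
async function runBulkExport(shop, token) {
  const started = await gql(shop, token, START_BULK_EXPORT); // the mutation above
  const { userErrors } = started.data.bulkOperationRunQuery;
  if (userErrors.length) throw new Error(userErrors.map((e) => e.message).join("; "));

  // Poll while the operation is still in progress; prefer the
  // bulk_operations/finish webhook in production.
  let op;
  do {
    await new Promise((r) => setTimeout(r, 5000));
    op = (await gql(shop, token, POLL_BULK)).data.currentBulkOperation;
  } while (op && (op.status === "RUNNING" || op.status === "CREATED"));
  if (op?.status !== "COMPLETED") throw new Error(`Bulk op ${op?.status ?? "not found"}`);
  return op.url; // signed URL to the JSONL result (null if the query matched nothing)
}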

In production, don't busy-poll — subscribe to the bulk_operations/finish webhook and start the download when it fires. See webhooks for catalog changes for the pattern.
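Whether the status arrives from a poll or after a webhook, the handling of each terminal state is the same. The following is a minimal sketch of that mapping; nextBulkAction and its return shape are hypothetical, and it assumes the status query also selects errorCode and partialDataUrl, which the polling query above does not request.

```javascript
// Sketch: turn a BulkOperation object (or null) into the next step to take.
function nextBulkAction(op) {
  if (!op) return { action: "fail", reason: "no bulk operation found" };
  switch (op.status) {
    case "CREATED":
    case "RUNNING":
    case "CANCELING":
      return { action: "wait" }; // still in progress, check again later
    case "COMPLETED":
      // Completed with no url is treated here as an export that matched nothing.
      return op.url ? { action: "download", url: op.url } : { action: "empty" };
    default:
      // FAILED, CANCELED, EXPIRED: partialDataUrl may hold what was exported so far.
      return {
        action: "fail",
        reason: op.errorCode ?? op.status,
        partialUrl: op.partialDataUrl ?? null,
      };
  }
}

console.log(nextBulkAction({ status: "RUNNING" }).action); // "wait"
```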

Parsing the JSONL result (the part people get wrong)

The result is not a nested JSON document — it's one object per line, with children carrying a __parentId back to their parent. Stream it line by line and reassemble; never JSON.parse the whole file.

import readline from "node:readline";

async function assembleCatalog(stream) {
  const products = new Map();
  const orphanVariants = [];

  const rl = readline.createInterface({ input: stream });
  for await (const line of rl) {
    if (!line.trim()) continue;
    const obj = JSON.parse(line);
    if (obj.id.includes("/Product/")) {
      products.set(obj.id, { ...obj, variants: [] });
    } else if (obj.__parentId) {
      const parent = products.get(obj.__parentId);
      if (parent) parent.variants.push(obj);
      else orphanVariants.push(obj); // child seen before parent — reattach after
    }
  }
  for (const v of orphanVariants) products.get(v.__parentId)?.variants.push(v);
  return [...products.values()];
}

Because a bulk export can be hundreds of megabytes, streaming is not optional — it's the difference between a job that runs in constant memory and one that OOM-kills on a large store.
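assembleCatalog above pushes every child line into variants, which is fine for the variants-only query shown earlier. If the bulk query also selects media, the children need to be told apart, and the resource type inside each GID is one way to do it. This is a sketch under that assumption: gidType and attachLine are hypothetical helpers, the sample IDs are made up, and only direct children of a product are handled.

```javascript
// Extract the resource type from a Shopify GID such as "gid://shopify/ProductVariant/42".
function gidType(gid) {
  const match = /^gid:\/\/shopify\/([^/]+)\//.exec(gid ?? "");
  return match ? match[1] : null;
}

// Attach one parsed JSONL object to the products map, bucketed by its type.
// Returns false when the parent has not been seen, so the caller can defer the line.
function attachLine(products, obj) {
  const type = gidType(obj.id);
  if (type === "Product") {
    products.set(obj.id, { ...obj, variants: [], media: [], other: [] });
    return true;
  }
  const parent = products.get(obj.__parentId);
  if (!parent) return false;
  if (type === "ProductVariant") parent.variants.push(obj);
  else if (type === "MediaImage") parent.media.push(obj);
  else parent.other.push(obj);
  return true;
}

// Sample lines in the shape a bulk export produces (IDs are invented).
const sample = [
  '{"id":"gid://shopify/Product/1","title":"Tee"}',
  '{"id":"gid://shopify/ProductVariant/11","sku":"TEE-S","__parentId":"gid://shopify/Product/1"}',
  '{"id":"gid://shopify/MediaImage/21","__parentId":"gid://shopify/Product/1"}',
];
const products = new Map();
for (const line of sample) attachLine(products, JSON.parse(line));
console.log(products.get("gid://shopify/Product/1").variants.length); // 1
```

In the streaming loop, attachLine would replace the if/else body, with any line that returns false collected for a second pass.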

Rate-limit math, briefly

The GraphQL Admin API bills by cost, not request count. The figures used here are a 1,000-point bucket that refills at 50 points/second on a standard plan; the real values depend on the store's plan, so read maximumAvailable and restoreRate from throttleStatus rather than hard-coding them. A 50-product page with variants can cost 200–500 points, so you get a handful of pages before you wait. At 300 points per page, that is about one page every six seconds once the bucket is drained. The bulk API sidesteps this: the query runs server-side and only the mutation and poll calls count against your bucket. That, more than convenience, is the reason to prefer bulk for full exports (Shopify: rate limits).
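To put a number on this for a given catalog, the waiting time can be estimated from the bucket size, the restore rate, and the per-page cost. The sketch below uses the figures quoted above plus an assumed 300 points per page; estimatePaginationSeconds is a hypothetical helper, and it ignores network latency and assumes every page costs the same.

```javascript
// Sketch: estimate the seconds spent waiting on the cost bucket while paginating.
function estimatePaginationSeconds({ productCount, pageSize, costPerPage, bucketSize, restoreRate }) {
  const pages = Math.ceil(productCount / pageSize);
  const totalCost = pages * costPerPage;
  // The full bucket is available up front; everything beyond it must be refilled.
  const pointsToWaitFor = Math.max(0, totalCost - bucketSize);
  return pointsToWaitFor / restoreRate;
}

const seconds = estimatePaginationSeconds({
  productCount: 50000,
  pageSize: 50,
  costPerPage: 300, // assumed; read actualQueryCost from a real response
  bucketSize: 1000,
  restoreRate: 50,
});
console.log(Math.round(seconds / 60)); // 100 (minutes)
```

Swapping in the maximumAvailable and restoreRate values a store actually reports gives a quick check on whether pagination is viable before committing to it.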

From export to feed

A clean full-catalog export is the raw material for every channel: Google Merchant Center sync, Meta product catalog sync, and AI shopping feeds for ChatGPT and Perplexity. The completeness of what you extract here — titles, GTINs (barcode), images, structured attributes — directly caps how those feeds perform.

That last point is where extraction meets revenue: catalog data quality is the ceiling on ad performance and AI shopping visibility, no matter how good the campaigns are. If your feeds are underperforming, the fix usually starts in the catalog, not the ad account — which is exactly the work AdsX does for Shopify brands. To pressure-test your own data, run a product through the feed-readiness checker.

ABOUT THE AUTHOR

The AdsX engineering team builds the data pipelines that turn a Shopify product catalog into high-performing ad feeds across Google, Meta, and AI shopping agents. We work hands-on with the Shopify Admin GraphQL API, the Product Feed and Catalog APIs, metafields, and bulk operations every day, and these guides document the patterns we use in production.
