robots.txt, DMCA & the June 2026 Scraping Rulings Explained

TL;DR

Two 2026 developments matter for anyone scraping the web. First, in the OpenAI copyright litigation, Judge Sidney Stein rejected the theory that ignoring a robots.txt directive "circumvents" a technological protection measure under the DMCA, because robots.txt is a request, not a barrier. Second, DMCA-based scraping suits are multiplying: YouTube creators have sued Snap, and a class action has been filed against Meta. The throughline: courts are sorting out which legal theory (CFAA, DMCA, breach of contract, copyright) actually applies to scraping, and the answers depend heavily on what you collect and how. ScrapeMaster is a neutral, local tool that extracts what you can already see and never bypasses access controls, which helps keep you on the right side of the lines these cases are drawing. (Not legal advice.)

The short answer: the legal theory depends on what and how you scrape

There is no single "is scraping legal" answer in 2026 — there's a menu of legal theories, and which one bites depends on the facts. The recent rulings are clarifying that menu:

robots.txt ≠ a technical barrier. Judge Stein found that ignoring robots.txt doesn't "circumvent" a protection measure under the DMCA, because robots.txt only requests that crawlers stay out; it doesn't control access. So merely disregarding robots.txt isn't, by itself, a DMCA violation. (That's not a license to ignore it — see below.)
DMCA is the new front. Plaintiffs are increasingly framing scraping as circumvention, or as removal of copyright management information (CMI), under the DMCA. YouTube creators sued Snap, and a separate class action hit Meta, on DMCA theories. These claims are still unsettled.
CFAA and contract still matter. As the LinkedIn/hiQ saga shows, scraping public data may survive a CFAA challenge while still breaching a site's contract. Different theory, different outcome.

The practical upshot: how you scrape (do you bypass anything? do you copy wholesale copyrighted works? do you violate a contract you agreed to?) matters more than the abstract question of whether "scraping" is legal.

A map of the 2026 theories

Legal theory	What triggers it	What the 2026 cases suggest
CFAA ("unauthorized access")	Bypassing a technical access control	Public data likely safe (hiQ); bypassing logins/paywalls is the danger zone
DMCA circumvention	Defeating a technological protection measure	robots.txt isn't one (Stein); active anti-bot defeat might be
DMCA / copyright-management info	Stripping attribution; copying protected works	Active front — Snap, Meta suits; unsettled
Breach of contract	Violating Terms of Service you accepted	Real and enforced (LinkedIn); independent of CFAA
Data protection (GDPR/CCPA)	Collecting personal data	GDPR applies even to public personal data; the CCPA exempts some "publicly available" information, so check scope

This is a map, not legal advice — your facts and jurisdiction decide the outcome, and you should talk to counsel for anything consequential.

What this means for your data pipeline

You don't need to be a lawyer to keep your collection on safer ground. A few principles fall straight out of the cases:

Don't bypass access controls. Logins, paywalls, CAPTCHAs, and aggressive anti-bot systems are exactly the "technical barriers" that turn scraping from gray into risky. Extract what's already visible to you.
Respect robots.txt as a courtesy and a signal. The ruling says ignoring it isn't automatically a DMCA violation — but it's still the site's stated wish, it's relevant to other claims, and honoring it is good citizenship. "Not automatically illegal" is a low bar to clear.
Don't wholesale-copy copyrighted works. Extracting facts and structured data (prices, listings, public records) is different from copying entire articles, images, or videos. The DMCA suits cluster around the latter.
Read the Terms before you scrape a site you have an account on. Contract claims don't need a CFAA hook.
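Treat personal data carefully. GDPR applies to personal data even when it is publicly posted, and the CCPA's exemption for "publicly available" information has limits, so don't assume public means unregulated.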
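
For readers who also run their own scripts, here is a minimal sketch of the "respect robots.txt" principle using Python's standard-library urllib.robotparser. It is not part of ScrapeMaster. The robots.txt content, the bot name "example-bot", and the example.com URLs are made up for illustration, and a real script would first download the site's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration; a real one would be
# downloaded from the target site's /robots.txt path.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)
for url in ("https://example.com/listings", "https://example.com/private/report"):
    verdict = "allowed" if is_allowed(parser, "example-bot", url) else "disallowed"
    print(url, verdict)

# Crawl-delay is a non-standard but common directive; None means not specified.
print("requested crawl delay (seconds):", parser.crawl_delay("example-bot"))
```

A check like this only tells you what the site has asked for. It says nothing about the site's Terms, copyright, or data-protection obligations, which need their own review.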

How ScrapeMaster fits the safer pattern

ScrapeMaster is built around the "extract what you can already see" principle, which happens to line up with where the law is friendliest:

It doesn't bypass anything. No paywall defeat, no login circumvention, no CAPTCHA solving. As our FAQ puts it: if you can see it while logged in, ScrapeMaster can extract it, and if you can't see it, neither can the tool. That helps keep you clear of the CFAA/DMCA-circumvention danger zone.
It runs in your own session. It uses your normal browser, paces requests naturally, and doesn't rotate proxies or fingerprints — so it's not an evasion tool dressed up as a scraper.
Your data stays local. Extracted records live in your browser's IndexedDB and are never uploaded; only page structure is analyzed during auto-detect.
It's neutral. It auto-detects tables, handles pagination and detail pages, and exports to CSV/XLSX/JSON. What you point it at, and what you do with the data, is your call and your responsibility.

That neutrality is the honest framing: a scraper is like a camera — lawful to own and use, and capable of being misused. The 2026 rulings are about misuse patterns (bypassing controls, copying protected works, breaching contracts), not about the existence of extraction tools.

Keep a record of what you collected and from where

As these theories get litigated, provenance matters. For anything sensitive, keep a dated snapshot of the source page and its terms as they read when you collected. Our companion tool, Convert: Web to PDF, captures a page (including its robots.txt or Terms) as a dated, selectable-text PDF, which is useful if you ever need to show that you scraped public data, honored the rules in place, and didn't bypass anything. We keep both tools free and local by design; our manifesto explains the stance.
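
If you prefer a machine-readable log alongside the PDF snapshot, the sketch below shows one possible shape for a provenance record: the URL, a UTC timestamp, and SHA-256 hashes of the page and of the robots.txt in force at collection time. This is a hypothetical helper, not a feature of ScrapeMaster or Convert: Web to PDF; the function name, field names, and example values are all assumptions made for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def provenance_record(
    url: str,
    page_bytes: bytes,
    terms_url: Optional[str] = None,
    robots_txt: Optional[str] = None,
) -> dict:
    """Build a dated record of what was collected and under which rules."""
    robots_hash = None
    if robots_txt is not None:
        robots_hash = hashlib.sha256(robots_txt.encode("utf-8")).hexdigest()
    return {
        "url": url,
        # Timezone-aware UTC timestamp in ISO 8601 form.
        "collected_at": datetime.now(timezone.utc).isoformat(),
        # Hash of the exact bytes saved, so the snapshot can be verified later.
        "page_sha256": hashlib.sha256(page_bytes).hexdigest(),
        "terms_url": terms_url,
        "robots_txt_sha256": robots_hash,
    }


record = provenance_record(
    url="https://example.com/listings",
    page_bytes=b"<html>example page</html>",
    terms_url="https://example.com/terms",
    robots_txt="User-agent: *\nDisallow: /private/\n",
)
print(json.dumps(record, indent=2))
```

Storing the hash rather than only the timestamp lets you later demonstrate that a saved snapshot is the same content you actually collected.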

Frequently asked questions

Does ignoring robots.txt break the law now?

Not under the DMCA's anti-circumvention provision, according to one 2026 ruling. Judge Stein, in the OpenAI litigation, found that ignoring robots.txt doesn't "circumvent" a technological protection measure, because robots.txt is a request, not an access barrier. The holding is narrow: it doesn't make ignoring robots.txt advisable, and it doesn't address breach-of-contract or other claims. Treat robots.txt as a signal to respect.

Why are scrapers being sued under the DMCA?

Plaintiffs are framing certain scraping as circumvention of protection measures or removal of copyright-management information. YouTube creators sued Snap, and a class action targeted Meta, on DMCA theories. These cases are unsettled, and they generally involve copying protected content rather than extracting public facts.

Is scraping public data legal?

Often, but it depends on the theory. Public-data scraping has survived CFAA challenges (hiQ), but can still breach a site's Terms of Service, and copying copyrighted works or bypassing technical controls raises separate risks. This isn't legal advice — consult counsel for your situation.

Does ScrapeMaster bypass paywalls, logins, or CAPTCHAs?

No. It extracts only what's already visible to you in your browser and doesn't defeat access controls. That design is meant to keep it clear of the circumvention theories at the center of these cases.

Does ScrapeMaster rotate proxies to evade detection?

No. It uses your normal browser session and lets you set extraction delays, but it doesn't rotate proxies or fingerprints. It's an extraction tool, not an evasion tool.

Do data-protection laws still apply to public web data?

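Often, yes. GDPR covers personal data even when it is publicly posted, and although the CCPA exempts some "publicly available" information, that exemption has limits. Where these laws apply, purpose limitation, retention, and deletion obligations don't disappear because data is public.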
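
As a rough illustration of data minimization and retention in a post-processing script, the sketch below keeps only an allow-list of non-personal fields and flags records older than a retention window. The field names, the 30-day window, and the sample record are assumptions for the example, not legal thresholds and not ScrapeMaster behavior.

```python
from datetime import date, timedelta

# Hypothetical allow-list: only the fields the project actually needs.
ALLOWED_FIELDS = frozenset({"title", "price", "listing_url"})


def minimize(record: dict, allowed: frozenset = ALLOWED_FIELDS) -> dict:
    """Drop every field that is not explicitly allow-listed."""
    return {key: value for key, value in record.items() if key in allowed}


def is_expired(collected_on: date, retention_days: int, today: date) -> bool:
    """Return True once a record is older than the retention window."""
    return today - collected_on > timedelta(days=retention_days)


raw = {
    "title": "Standing desk",
    "price": "120.00",
    "listing_url": "https://example.com/listings/1",
    "seller_email": "seller@example.com",  # personal data, not needed
}
print(minimize(raw))
print(is_expired(date(2026, 1, 1), retention_days=30, today=date(2026, 2, 15)))
```

An allow-list is used instead of a block-list so that any new, unexpected field is dropped by default rather than kept by accident.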

Which browsers does ScrapeMaster work on?

Chrome, Edge, Brave, Arc, and any other Chromium-based browser. It does not run on Firefox or Safari.

Bottom line

The 2026 rulings don't give a yes/no answer on scraping; they sharpen the question. robots.txt isn't a DMCA barrier, but the DMCA, CFAA, contract, and data-protection theories all still apply depending on what and how you collect. Stay on the safer side: don't bypass access controls, don't wholesale-copy protected works, respect terms and robots.txt, and keep personal data minimal. ScrapeMaster is built to extract what you can already see (local, neutral, no circumvention), which is the posture these cases favor.