Kadrey v. Meta
2023: Authors alleged LLaMA was trained on Books3. The fight turned on whether plaintiffs could demonstrate per-work memorization, not just inclusion in a dataset.
IPLYR runs a forensic extraction test and returns a binary verdict — EXTRACTION CONFIRMED or NO EXTRACTION DETECTED — with a signed, hash-anchored evidence bundle you can put in front of a judge.
Method based on Ahmed, Cooper, Koyejo & Liang, Extracting books from production language models (arXiv:2601.02671). Designed for Daubert / FRE 702 admissibility — not a claim of prior court acceptance.
Method: nv-recall block matching (Ahmed, Cooper, Koyejo & Liang — arXiv:2601.02671). A verdict is the test's output at a chosen sensitivity threshold, not a legal conclusion.
Sources: Authors Guild & Susman Godfrey on the Anthropic settlement; ChatGPT Is Eating the World AI Copyright Case Tracker (June 1, 2026); EU Commission AI Act Article 53. IPLYR is not counsel of record in these matters.
Pick the door that fits the work you're doing today. The forensic record underneath is the same — what changes is the deliverable.
Run the Litmus Test on a specific work and a specific model. Receive a binary verdict, a Daubert-compliant evidence bundle, hash-anchored and signed — sized for motion practice and case-portfolio underwriting.
Audit a training corpus for copyrighted exposure before regulators or plaintiffs do. Quote a deal against 270+ tracked comparators. The same forensic engine — applied upstream.
Competitors produce PDFs. IPLYR produces a cryptographic evidence bundle that survives a motion to exclude and a hostile cross-examination.
Every artifact in the report is hashed into an RFC 6962 / 9162 Merkle tree. Any later alteration to a single line of text, a single prompt, or a single response is mathematically detectable.
The evidence root is anchored to an external OpenTimestamps proof on Bitcoin's blockchain — establishing when the record existed without trusting IPLYR's clock or servers.
Each package carries a C2PA signed seal disclosing the signer, the signer's trust level, and the method version. No anonymous, unverifiable claims.
Honesty note: We disclose signer trust level and never assert a timestamp or signature we cannot produce. A signed seal is a verifiable record of provenance — not a substitute for chain-of-custody discipline by counsel.
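The tamper-evidence property of the Merkle tree can be sketched in a few lines. This is a minimal illustration of the Merkle Tree Hash from RFC 6962, section 2.1, not IPLYR's packaging code; the artifact strings are hypothetical, and "detectable" here rests on SHA-256 remaining collision-resistant.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Merkle Tree Hash per RFC 6962, section 2.1."""
    n = len(leaves)
    if n == 0:
        return _sha256(b"")
    if n == 1:
        # Leaf hashes are domain-separated with a 0x00 prefix.
        return _sha256(b"\x00" + leaves[0])
    # k is the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    # Interior nodes are domain-separated with a 0x01 prefix.
    return _sha256(b"\x01" + merkle_root(leaves[:k]) + merkle_root(leaves[k:]))


# Hypothetical artifacts: a prompt, a response, and one report line.
artifacts = [b"prompt-001", b"response-001", b"report line 1"]
root = merkle_root(artifacts)

# Changing a single artifact yields a different root.
tampered = merkle_root([b"prompt-001", b"response-001", b"report line 2"])
print(root.hex())
print(root != tampered)  # True
```

Anyone holding the original artifacts can recompute the root and compare it with the anchored value, without relying on the party that produced the bundle.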
IPLYR's fair-use scoring and jurisdiction model update as rulings land. Sophisticated buyers test for currency; here it is, on the homepage.
Pirated copies aren't transformative; ~$3K/work settlement floor across ~500K titles.
Fair use found where plaintiffs failed to develop an output-infringement record.
Memorization is permanent infringement under EU law — not transient copying.
Couldn't prove where training happened, nor that the model itself contained the works.
No fair use for a market-substitute AI built on a competitor's copyrighted database.
Holdings summarized for marketing — accurate but compressed. This is product capability, not legal advice.
Discovery surfaces probabilities. Trials demand proof. Without reproducible methodology and a defensible chain of custody, even strong matters collapse at Daubert.
Authors alleged LLaMA was trained on Books3. The fight turned on whether plaintiffs could demonstrate per-work memorization, not just inclusion in a dataset.
Stylistic and watermark-residual claims required cross-modal attribution scoring that withstood adversarial-robustness challenges.
Class settlements have repeatedly collapsed back to bespoke expert work. Standardized, reproducible attribution would have changed the negotiating posture.
From multi-modal similarity to court-ready packaging — every stage is research-backed and adversarially tested.
Multi-modal Transformation Quotient across text, image, audio, video. 11-dim statistical + neural + adversarial-robust. Defensible similarity score with bias-corrected confidence intervals.
nv-recall, Cooper et al. PDE, Min-K%++, DE-COP, MIA ensemble, distillation/merge detection, LoRA probes, audio MIA, cross-modal MIA.
Four-factor fair use + Bibas + post-Warhol analyzers. 111-case validated dataset. Jurisdiction-aware US circuit-by-circuit, EU DSM, UK, DE, JP, KR. DMCA + GDPR + EU AI Act Annex III workflows. Settlement pricing engine.
Daubert-tier evidence classification, SHA-256 chain of custody, blockchain anchoring, court-ready PDF export, deposition-ready exhibits.
Five stages, end-to-end, with a signed evidence chain at each step.
Upload your copyrighted content and select the target LLM, image, audio, or video model. We never retain plaintext beyond the engagement window.
$ iplyr submit \
    --work ./novel.epub \
    --target anthropic/claude-3.7 \
    --jurisdiction US-9 \
    --retention 30d
✔ work hashed sha256:b2f0…d41a
✔ AES-256 sealed envelope_id env_01J…
→ matter_id mat_01J9X8K2QV

IPLYR runs multi-turn extraction probes, MIA ensembles, and cross-modal alignment against the target model with adversarial-robustness controls.
[probe 01] divergence-attack seed=… budget=2048t
[probe 02] prefix-completion k=64 temp=0.0
[probe 03] paraphrase-canary n=128
[probe 04] style-fingerprint (CSD) d=512
[probe 05] watermark-residual FFT-band=π/4
→ 16 probes complete · 14 positive · 0 inconclusive

TQ + nv-recall + concept memorization + RAG two-stage analysis produce per-block scores with confidence intervals.
Results are mapped to SaTML 2025 evidence tiers with Daubert compliance flags and jurisdiction-specific thresholds.
Signed PDF with chain of custody, expert-declaration template, exhibit list, and blockchain-anchored hash. Ready for filing.
{
  "package_id": "pkg_01J9X8K2QV",
  "tier": "A",
  "daubert_compliant": true,
  "chain_of_custody": "sha256:9c3a…f201",
  "btc_anchor": { "block": 832914, "txid": "5fa1…" },
  "exhibits": ["A-Findings.pdf", "B-PerBlock.pdf", "C-Methods.pdf"],
  "expert_declaration_template": "decl-rev-2025.03.docx",
  "reproducible_from_hash": true
}

Discover, RFQ, clear, and settle training-data licenses with verified provenance.
Daily reference prices for training-data classes — text, image, audio, video, code — segmented by domain and rights regime.
The forensic methods change with the medium. The chain of custody does not.
Search, filter and inspect every detector in the IPLYR forensic stack. Each method links to its paper and to a /methods slug page with implementation, test count, and adversarial-robustness notes.
Calibrated membership-inference attack against pretraining data. Uses per-token log-likelihood vs. neighborhood entropy to achieve state-of-the-art AUROC across LLaMA, Pythia, GPT-NeoX families. Decouples token informativeness from membership signal, producing a clean p-value under permutation null.
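For intuition, here is a minimal sketch of a Min-K%++-style score, one published way to normalize per-token log-likelihood against the model's own next-token distribution. It is an assumption that this card refers to that family; the function name, the toy three-token vocabulary, and the inputs are hypothetical, and the permutation-null p-value step is not shown.

```python
import math


def min_k_plus_plus(step_logprobs, token_ids, k_percent=20.0):
    """Average of the lowest k% normalized token scores.

    step_logprobs: for each position, the full next-token log-prob vector.
    token_ids: the index of the token that actually appears at each position.
    """
    scores = []
    for logprobs, tok in zip(step_logprobs, token_ids):
        probs = [math.exp(lp) for lp in logprobs]
        # Mean and variance of log-prob under the model's own distribution.
        mu = sum(p * lp for p, lp in zip(probs, logprobs))
        var = sum(p * (lp - mu) ** 2 for p, lp in zip(probs, logprobs))
        # A flat distribution carries no signal; score it as zero.
        scores.append((logprobs[tok] - mu) / math.sqrt(var) if var > 0 else 0.0)
    scores.sort()
    keep = max(1, int(len(scores) * k_percent / 100))
    return sum(scores[:keep]) / keep


# Toy distribution over a 3-token vocabulary, repeated for 10 positions.
toy_step = [math.log(0.7), math.log(0.2), math.log(0.1)]
steps = [toy_step] * 10
seen_like = min_k_plus_plus(steps, [0] * 10)    # text follows the model's mode
unseen_like = min_k_plus_plus(steps, [2] * 10)  # text follows unlikely tokens
print(seen_like > unseen_like)  # True
```

A higher score suggests the text sits near a local mode of the model's distribution. On its own that is a signal, not a verdict, and it needs calibration against known non-member text.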
Detection of copyrighted content via paraphrase-distinguishing probes. Forces the model to choose between original passages and high-quality paraphrases; preference for the original is statistically attributable to memorization.
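The statistical step behind a paraphrase-choice probe can be sketched as a one-sided binomial test against chance. The counts below are hypothetical, and the four-option format (one original passage plus three paraphrases) is an assumption for illustration.

```python
from math import comb


def binomial_pvalue(correct: int, trials: int, chance: float) -> float:
    """One-sided P(X >= correct) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, i) * chance**i * (1 - chance) ** (trials - i)
        for i in range(correct, trials + 1)
    )


# Hypothetical run: the model picks the original passage in 52 of 100 questions.
# With 4 shuffled options per question, guessing succeeds 25% of the time.
p = binomial_pvalue(correct=52, trials=100, chance=0.25)
print(p < 1e-6)  # True
```

A small p-value only rules out guessing. Attributing the preference to memorization also requires shuffling answer positions and comparing against works the model could not have seen.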
Non-verbatim recall measurement. Replaces exact-match with semantic-equivalence scoring under a calibrated neighborhood, recovering memorization signal that survives stylistic rewriting and translation.
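As a rough stand-in for recall that tolerates gaps, the sketch below counts reference words covered by long matching blocks and maps the score to the two verdict labels at a chosen threshold. It uses Python's `difflib` for word-level block matching; it is not the scoring from the cited paper and does not measure semantic equivalence. The `min_block` and `threshold` values are hypothetical.

```python
from difflib import SequenceMatcher


def block_recall(reference: str, generation: str, min_block: int = 5) -> float:
    """Fraction of reference words covered by matching blocks of >= min_block words."""
    ref = reference.lower().split()
    gen = generation.lower().split()
    if not ref:
        return 0.0
    matcher = SequenceMatcher(a=ref, b=gen, autojunk=False)
    covered = sum(
        m.size for m in matcher.get_matching_blocks() if m.size >= min_block
    )
    return covered / len(ref)


def verdict(score: float, threshold: float) -> str:
    # The verdict is the score read at a chosen sensitivity threshold.
    return "EXTRACTION CONFIRMED" if score >= threshold else "NO EXTRACTION DETECTED"


reference = (
    "it was the best of times it was the worst of times "
    "it was the age of wisdom"
)
generation = (
    "the model wrote it was the best of times "
    "it was the worst of times and then stopped"
)
score = block_recall(reference, generation)
print(round(score, 3), verdict(score, threshold=0.5))
```

Raising `min_block` makes the test stricter about coincidental overlap; lowering `threshold` makes the verdict more sensitive. Both choices belong in the methods exhibit.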
Probabilistic discoverable extraction with confidence bounds. Estimates the probability that a target string is recoverable from a model under a budgeted prompt distribution, with formal lower-bound guarantees.
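The core arithmetic can be sketched as follows, assuming the usual formulation: if a single sample reproduces the target with probability p_z, then at least one of n independent samples does so with probability 1 - (1 - p_z)^n. The token log-probabilities are hypothetical, and the formal lower-bound machinery is not shown.

```python
import math


def sequence_probability(token_logprobs):
    """Probability that one sample reproduces the target, from per-token log-probs."""
    return math.exp(sum(token_logprobs))


def extraction_probability(p_z: float, n: int) -> float:
    """Probability that at least one of n independent samples matches the target."""
    return 1.0 - (1.0 - p_z) ** n


def queries_needed(p_z: float, p: float):
    """Smallest n such that extraction_probability(p_z, n) >= p."""
    if p_z <= 0.0:
        return math.inf
    if p_z >= 1.0:
        return 1
    return math.ceil(math.log1p(-p) / math.log1p(-p_z))


# Hypothetical target: 50 tokens, each with log-prob -0.05 under the sampling scheme.
p_z = sequence_probability([-0.05] * 50)
n = queries_needed(p_z, p=0.9)
print(round(p_z, 4), n, round(extraction_probability(p_z, n), 4))
```

The query budget is part of the claim: a string that is rare in one sample can still be reliably recoverable at a larger n, so a report should state both n and p.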
Training-data attribution at scale via random-projection of gradients. Returns a per-training-example influence score on a target query.
Scalable TRAK variant for billion-parameter diffusion models. Uses distilled gradient surrogates and locality-sensitive hashing for tractable attribution across catalog-scale training indices.
Low-rank adapter probes that fingerprint memorization-prone parameter subspaces. Surface circuits where training-data residue concentrates.
Speech and music MIA combining acoustic loss-curvature, mel-spectrogram divergence, and waveform-level NN-distance.
Contrastive style descriptors capturing artist-specific visual fingerprints invariant to subject matter. Validated for Getty / artist-style cases.
Melodic similarity across pitch-class profiles and rhythmic envelopes for music memorization claims.
Spatiotemporal action-level latent localization for short-form video memorization detection.
Forensic detection of model merges and lineage via centered kernel alignment between candidate parent models and a target.
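Centered kernel alignment itself is compact enough to sketch in pure Python. The version below is linear CKA over activation matrices (rows are probe inputs, columns are features); comparing activations on a shared probe set is an assumption about how the comparison is set up, and the toy matrices are hypothetical.

```python
import math


def gram(X):
    """Linear Gram matrix K = X X^T."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in X] for r1 in X]


def center(K):
    """Double-center a Gram matrix: H K H with H = I - 11^T / n."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]


def frob_inner(A, B):
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def linear_cka(X, Y):
    """Linear CKA; undefined (division by zero) if either input is constant."""
    Kc, Lc = center(gram(X)), center(gram(Y))
    return frob_inner(Kc, Lc) / math.sqrt(frob_inner(Kc, Kc) * frob_inner(Lc, Lc))


# Hypothetical activations for 4 probe inputs.
parent = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
# Same representation, rotated by 90 degrees and scaled by 3.
child = [[-3.0 * y, 3.0 * x] for x, y in parent]
unrelated = [[1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]

print(round(linear_cka(parent, child), 6))
print(round(linear_cka(parent, unrelated), 6))
```

Because linear CKA is invariant to rotation and uniform scaling of the feature space, a derived model that merely re-bases its hidden units still scores close to 1 against its parent.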
Independent novelty analysis identified 12 patentable inventions across attribution, forensics, and evidence packaging. The top three rated 5/5 on novelty.
Statistical aggregation of independent membership-inference detectors using Fisher's, Stouffer's, Simes', harmonic-mean, and Cauchy combination — calibrated against dependence structure.
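The five combination rules named above can be written with the standard library alone. This is a textbook sketch, not the calibrated ensemble: the detector p-values are hypothetical, every input must lie strictly between 0 and 1, and the dependence calibration is not shown.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def fisher(pvals):
    """Fisher's method; assumes independent tests."""
    half_stat = -sum(math.log(p) for p in pvals)  # half of -2 * sum(ln p)
    # Closed-form chi-square survival function for even degrees of freedom 2n.
    term, total = 1.0, 1.0
    for i in range(1, len(pvals)):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def stouffer(pvals):
    """Stouffer's Z; assumes independent tests."""
    z = sum(_N.inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1.0 - _N.cdf(z)


def simes(pvals):
    n = len(pvals)
    ranked = enumerate(sorted(pvals), start=1)
    return min(1.0, min(n * p / rank for rank, p in ranked))


def harmonic_mean(pvals):
    """Raw harmonic mean; an uncalibrated statistic, not itself a p-value."""
    return len(pvals) / sum(1.0 / p for p in pvals)


def cauchy(pvals):
    """Cauchy combination test with equal weights."""
    t = sum(math.tan((0.5 - p) * math.pi) for p in pvals) / len(pvals)
    return 0.5 - math.atan(t) / math.pi


detector_pvalues = [0.01, 0.04, 0.20]  # hypothetical outputs of three detectors
for rule in (fisher, stouffer, simes, harmonic_mean, cauchy):
    print(f"{rule.__name__:14s} {rule(detector_pvalues):.4f}")
```

Fisher and Stouffer are exact only for independent detectors, while Simes, the harmonic mean, and the Cauchy combination tolerate some forms of dependence. Detectors that query the same model on the same text are rarely independent, which is why the choice of rule matters.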
Diffusion-model attribution via inverted embedding probes. Recovers a pseudo-token best activating a target style, with significance scoring against a distractor distribution.
Automated evidentiary-tier classification mapping detector outputs to the four Daubert factors — testability, peer review, known error rates, and general acceptance.
Pre-filing diligence, expert-declaration templates, cross-jurisdiction memos, and a calibrated settlement-pricing model.
Independent landscape analysis of the AI training-data attribution space.
| Capability | IPLYR | ProRata | Patronus | Spawning | Story |
|---|---|---|---|---|---|
| Funding | Solo / pre-seed | $30M | $17M | $6M | $54M |
| Daubert-tier evidence packaging | | | | | |
| Memorization forensics | 40+ methods | | Hallucination focus | Opt-out registry | |
| Jurisdictions modeled | 9 + 10 US states | US | US | US | US |
| Pricing | $2K–$5K / report | Enterprise | Enterprise | Free opt-out | Token-based |
Forensic evidence is a category none of them serve. Plaintiffs in Kadrey, Getty, and Bartz-style matters are losing for lack of exactly this.
Methods reviewed with IP attorneys at AIPLA · Validated against the 111-case fair-use dataset (53 FU / 58 NFU) · Built on CourtListener copyright precedents (71M opinion clusters scanned) · Judge profiles · Landmark cases indexed · Distillation-moat tests passing · TDPI tests across 16 modules · 193/193 smart contracts compile (OZ v5) · 9 jurisdictions + 10 US state statutes modeled · 14 LLM provider cost models.
Forensic-grade evidence at a fraction of an expert engagement.
Single-engagement evidence package, Daubert-classified, 7-day turnaround.
Multi-work, multi-model, expert-declaration drafting, deposition support, settlement-pricing memo.
Catalog-scale monitoring across 50+ models, weekly reports, alert on new memorization events.
White-label, on-prem, SSO, SOC 2, audit logs, contractual SLAs.
Drag the sliders to model a matter. Baseline: $2K per work per model. Expert engagements anchor at $50K–$500K (American Lawyer expert rate surveys, 2024).
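The baseline arithmetic can be sketched as below, assuming cost scales linearly with works and models. That linearity is an assumption; the calculator may apply tiers or discounts that this page does not describe. The matter size is hypothetical.

```python
BASELINE_PER_WORK_PER_MODEL = 2_000  # USD, the baseline stated above
EXPERT_RANGE = (50_000, 500_000)     # USD, the expert-engagement anchor stated above


def matter_cost(works: int, models: int) -> int:
    """Baseline cost, assuming linear scaling in works and models."""
    return BASELINE_PER_WORK_PER_MODEL * works * models


# Hypothetical matter: 5 works tested against 3 models.
cost = matter_cost(works=5, models=3)
print(cost)                    # 30000
print(cost < EXPERT_RANGE[0])  # True
```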
Every detector is published. Every score is reproducible from the chain-of-custody hash.
Pick an endpoint, inspect the request, watch a realistic response stream in. Outputs below are canned-but-realistic — actual calls require an API key.
{
"model": "anthropic/claude-3.7",
"work_blocks": ["sha256:b2f0…d41a", "sha256:9c3a…f201"],
"k_percent": 20
}
…
40+ attribution & forensic endpoints · 8 TDPI marketplace endpoints · 6 memorization-risk endpoints. Full OpenAPI 3.1 spec at /docs/api.
Circuit-by-circuit US analysis plus the major international copyright and AI regimes. Hover the map to drill in.
"I built IPLYR because I sat through too many AI copyright matters where plaintiffs had a real case and no forensic vocabulary to prove it.
The methods exist in the literature — Min-K%++, nv-recall, Cooper PDE, TRAK, MIA ensembles — but they live in conference papers, not in litigation workflows. IPLYR translates them into evidentiary-grade packages with chain of custody, Daubert classification, and jurisdiction-aware analysis.
Methods reviewed with IP attorneys at AIPLA. Open to expert-witness collaborations and academic partnerships."
IPLYR is a platform. Each lane has its own audience, narrative, pricing, and CTA.
IPLYR is a forensic evidence platform. We do not provide legal advice or represent parties in litigation.