Turn websites into LLM-ready data

Power your AI apps with clean web data from any website.
It's also open source and Firecrawl-compatible.

Example scrape result for https://www.thordata.com:
{
  "url": "https://www.thordata.com",
  "markdown": "# Thordata\n\nThordata provides AI-native web data infrastructure...",
  "json": {
    "title": "Thordata",
    "description": "AI-native web data infrastructure"
  }
}
🔓 100% Open Source
🚀 Self-hostable Deploy Anywhere
Firecrawl Compatible


Main Features

Scrape

Get LLM-ready data from websites. Markdown, JSON, HTML, screenshots, and more.

from thordata_firecrawl import ThordataCrawl

client = ThordataCrawl(api_key="td-YOUR_API_KEY")

result = client.scrape(
    url="https://www.thordata.com",
    formats=["markdown", "json"]
)

print(result["data"]["markdown"])

curl -X POST "https://thordata-firecrawl-api.onrender.com/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.thordata.com",
    "formats": ["markdown", "json"]
  }'

const response = await fetch('https://thordata-firecrawl-api.onrender.com/v1/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://www.thordata.com',
    formats: ['markdown', 'json']
  })
});

const data = await response.json();
console.log(data.data.markdown);
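
For projects that cannot install the SDK, the same scrape call can be sketched with only the Python standard library. The endpoint, headers, and body fields come from the cURL example above; the helper names `build_scrape_request` and `send_request` are illustrative and not part of any Thordata package.

```python
import json
import urllib.request

# Base URL taken from the cURL example; swap it for a self-hosted instance.
BASE_URL = "https://thordata-firecrawl-api.onrender.com"


def build_scrape_request(url, formats, api_key, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /v1/scrape endpoint."""
    payload = json.dumps({"url": url, "formats": list(formats)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/scrape",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send_request(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Typical use would be `send_request(build_scrape_request("https://www.thordata.com", ["markdown", "json"], "YOUR_API_KEY"))`, then reading `["data"]["markdown"]` from the result as in the SDK example.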

Map

Discover all URLs on a website. Build sitemaps and understand site structure.

result = client.map(
    url="https://www.thordata.com"
)

for url in result["data"]:
    print(url)

curl -X POST "https://thordata-firecrawl-api.onrender.com/v1/map" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.thordata.com"
  }'

const response = await fetch('https://thordata-firecrawl-api.onrender.com/v1/map', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://www.thordata.com'
  })
});

const data = await response.json();
console.log(data.data);
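
To get a quick view of site structure from a map result, the discovered URLs can be grouped by their first path segment. This is a minimal sketch that assumes `result["data"]` is a flat list of URL strings, as the Python loop above suggests; `group_by_section` is an illustrative helper, not an SDK function.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(urls):
    """Group URLs by their first path segment ("/" stands for the site root)."""
    sections = defaultdict(list)
    for url in urls:
        path = urlparse(url).path.strip("/")
        # "/blog/post-1" -> "/blog"; an empty path means the root page.
        section = "/" + path.split("/")[0] if path else "/"
        sections[section].append(url)
    return dict(sections)
```

Calling `group_by_section(result["data"])` returns a dictionary such as `{"/": [...], "/blog": [...]}`, which is often enough to decide which sections are worth crawling.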

Crawl

Crawl entire websites with BFS traversal. Async jobs with webhook callbacks.

job = client.crawl(
    url="https://www.thordata.com",
    limit=10,
    formats=["markdown"]
)

# Check job status
status = client.get_crawl_status(job["jobId"])
print(status)

curl -X POST "https://thordata-firecrawl-api.onrender.com/v1/crawl" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.thordata.com",
    "limit": 10,
    "formats": ["markdown"]
  }'

const response = await fetch('https://thordata-firecrawl-api.onrender.com/v1/crawl', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://www.thordata.com',
    limit: 10,
    formats: ['markdown']
  })
});

const job = await response.json();
console.log(job.jobId);
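
Because crawls run as asynchronous jobs, a caller without a webhook usually polls the status until the job finishes. The sketch below shows one way to do that. It assumes the status payload is a dictionary with a `"status"` key whose terminal values are `"completed"` and `"failed"`; those field names are not confirmed by this page, so adjust them to the real response.

```python
import time


def wait_for_crawl(get_status, job_id, poll_seconds=2.0, timeout_seconds=300.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(job_id) until the crawl job reaches a terminal state.

    ASSUMPTION: the payload looks like {"status": "completed" | "failed" | ...}.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status(job_id)
        state = status.get("status")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError(f"crawl {job_id} failed: {status}")
        if clock() >= deadline:
            raise TimeoutError(f"crawl {job_id} still {state!r} after {timeout_seconds}s")
        sleep(poll_seconds)
```

With the SDK client from the example above, this would be called as `wait_for_crawl(client.get_crawl_status, job["jobId"])`.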

Agent

Extract structured data using LLM prompts. Schema-based extraction with AI.

result = client.agent(
    prompt="Extract company name and description",
    urls=["https://www.thordata.com"],
    schema={
        "type": "object",
        "properties": {
            "company_name": {"type": "string"},
            "description": {"type": "string"}
        }
    }
)

print(result["data"])

curl -X POST "https://thordata-firecrawl-api.onrender.com/v1/agent" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Extract company name and description",
    "urls": ["https://www.thordata.com"],
    "schema": {
      "type": "object",
      "properties": {
        "company_name": {"type": "string"},
        "description": {"type": "string"}
      }
    }
  }'

const response = await fetch('https://thordata-firecrawl-api.onrender.com/v1/agent', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    prompt: 'Extract company name and description',
    urls: ['https://www.thordata.com'],
    schema: {
      type: 'object',
      properties: {
        company_name: { type: 'string' },
        description: { type: 'string' }
      }
    }
  })
});

const data = await response.json();
console.log(data.data);
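
LLM-based extraction can return incomplete or mistyped fields, so it may be worth checking the result against the schema before using it. The following is a minimal sketch for flat object schemas like the one above. It assumes `result["data"]` is a single dictionary, and it treats every listed property as required, which is stricter than JSON Schema itself. `check_against_schema` is an illustrative helper, not an SDK function.

```python
# Maps JSON Schema primitive type names to Python types.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_against_schema(data, schema):
    """Return a list of problems found when comparing data to a flat object schema."""
    if not isinstance(data, dict):
        return [f"expected an object, got {type(data).__name__}"]
    problems = []
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            problems.append(f"missing property: {name}")
            continue
        type_name = spec.get("type")
        expected = _JSON_TYPES.get(type_name)
        value = data[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if expected is not None and (not isinstance(value, expected) or bool_as_number):
            problems.append(f"{name}: expected {type_name}, got {type(value).__name__}")
    return problems
```

An empty list means the extracted data has every property with the expected primitive type; anything else is a list of human-readable problems to log or retry on.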

Built to outperform

Core principles, proven performance

🚀

Speed that feels invisible

Blazingly fast. Delivers results in less than 1 second, fast enough for real-time agents and dynamic apps.

🛡️

No proxy headaches

Reliable. Covers 96% of the web, including JS-heavy pages. No proxies, no puppets, just clean data.

Zero configuration

We handle the hard stuff: rotating proxies, orchestration, rate limits, JS-blocked content, and more.

Quick Start

Install

pip install thordata-firecrawl

Scrape Example (Python)

from thordata_firecrawl import ThordataCrawl

client = ThordataCrawl(api_key="td-YOUR_API_KEY")

result = client.scrape(
    url="https://www.thordata.com",
    formats=["markdown"]
)

print(result["data"]["markdown"])

Scrape Example (cURL)

curl -X POST "https://thordata-firecrawl-api.onrender.com/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.thordata.com",
    "formats": ["markdown"]
  }'

Crawl Example (cURL)

curl -X POST "https://thordata-firecrawl-api.onrender.com/v1/crawl" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.thordata.com",
    "limit": 10,
    "formats": ["markdown"]
  }'

💡 For local development, replace the base URL with http://localhost:3002

Scrape Example (JavaScript)

const response = await fetch('https://thordata-firecrawl-api.onrender.com/v1/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://www.thordata.com',
    formats: ['markdown']
  })
});

const data = await response.json();
console.log(data.data.markdown);

💡 For local development, replace the base URL with http://localhost:3002
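
One way to switch between the cloud API and a local instance without editing code is to read the base URL from an environment variable. This is a small sketch: the variable name `THORDATA_FIRECRAWL_URL` is hypothetical, and it assumes a local server exposes the same `/v1/...` paths as the cloud API.

```python
import os

CLOUD_BASE_URL = "https://thordata-firecrawl-api.onrender.com"
LOCAL_BASE_URL = "http://localhost:3002"


def endpoint(path, env=os.environ):
    """Return the full URL for an API path such as "/v1/scrape".

    THORDATA_FIRECRAWL_URL is a hypothetical variable name used for this sketch.
    """
    base = env.get("THORDATA_FIRECRAWL_URL", CLOUD_BASE_URL).rstrip("/")
    return f"{base}/{path.lstrip('/')}"
```

Exporting `THORDATA_FIRECRAWL_URL=http://localhost:3002` before running a script would then point every call at the local server.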

Interactive Playground

⚠️ Use your own Thordata API Key. Get your key at dashboard.thordata.com. Your key is sent directly to the API server (via HTTPS) and never stored on this page.

Quick Error Guide:
401/403: API key invalid or missing
429: Rate limited, retry later
5xx: Server error or cold start, retry in 5-10s
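
The error guide translates naturally into a retry policy: retry on 429 and 5xx, and stop immediately on 401/403 because a retry will not fix a bad key. The sketch below is one possible implementation, not the service's documented behavior. `send` stands for any zero-argument callable returning `(status_code, body)`, and the doubling delay starting at 5 seconds is an assumption based on the 5-10s hint.

```python
import time


def is_retryable(status_code):
    """429 and 5xx are worth retrying; 401/403 need a corrected API key."""
    return status_code == 429 or 500 <= status_code <= 599


def call_with_retry(send, max_attempts=4, base_delay=5.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status_code, body = send()
        if not is_retryable(status_code) or attempt == max_attempts:
            return status_code, body
        # Waits 5s, 10s, 20s, ... so the first retries fall in the suggested window.
        sleep(base_delay * 2 ** (attempt - 1))
```

The caller still has to inspect the returned status code, since the last attempt may itself have been a 429 or 5xx.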


Use Cases

🤖 AI Platforms

Power your AI agents with real-time web data. Perfect for RAG systems, chatbots, and AI assistants.

🔍 SEO Teams

Analyze competitor content, track rankings, and monitor website changes at scale.

📊 Competitive Intelligence

Monitor competitor websites, track pricing changes, and gather market intelligence.

🔬 Deep Research

Collect and analyze data from multiple sources for research projects and analysis.

📈 Lead Enrichment

Enrich lead data with company information, contact details, and social profiles.

🔄 Data Sync

Keep your database in sync with external websites and data sources.

Integrations

Use well-known tools

🐍

Python SDK

Native Python client with async support

📦

CLI Tool

Command-line interface for quick scraping

🔗

REST API

Standard HTTP API for any language

LangChain

Official LangChain integration

Frequently Asked Questions

Thordata Firecrawl is a Firecrawl-compatible web scraping API that turns websites into LLM-ready data. It provides clean Markdown, JSON, HTML, and screenshots from any website, making it perfect for AI applications, RAG systems, and data pipelines.

Thordata Firecrawl is designed to be API-compatible with Firecrawl, making migration easy. It's powered by Thordata's web data infrastructure and is fully open-source (MIT license) and self-hostable. You can deploy it anywhere without vendor lock-in.

Thordata Firecrawl is open-source and free to self-host. You only need a Thordata API key for the underlying scraping infrastructure. The code itself is MIT-licensed and can be used in commercial projects.

Self-hosting is supported. We provide Docker support and a Render Blueprint (`render.yaml`) for one-click cloud deployment. You can also deploy to any platform that supports Python/Docker, including AWS, GCP, Azure, Fly.io, and more.

Thordata Firecrawl supports Markdown (LLM-ready), JSON (structured data), HTML (raw), and screenshots. You can request multiple formats in a single API call.

Thordata's infrastructure handles JavaScript-rendered content automatically, so you don't need to configure anything.