Power your AI apps with clean web data from any website.
It's also open source and Firecrawl-compatible.
{
  "url": "https://www.thordata.com",
  "markdown": "# Thordata\n\nThordata provides AI-native web data infrastructure...",
  "json": {
    "title": "Thordata",
    "description": "AI-native web data infrastructure"
  }
}
Open source • Self-hostable • Firecrawl-compatible
Get LLM-ready data from websites. Markdown, JSON, HTML, screenshots, and more.
from thordata_firecrawl import ThordataCrawl

client = ThordataCrawl(api_key="td-YOUR_API_KEY")
result = client.scrape(
    url="https://www.thordata.com",
    formats=["markdown", "json"]
)
print(result["data"]["markdown"])
curl -X POST "https://thordata-firecrawl-api.onrender.com/v1/scrape" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.thordata.com",
"formats": ["markdown", "json"]
}'
const response = await fetch('https://thordata-firecrawl-api.onrender.com/v1/scrape', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
url: 'https://www.thordata.com',
formats: ['markdown', 'json']
})
});
const data = await response.json();
console.log(data.data.markdown);
Search the web and get full content from results. Powered by Thordata SERP API.
result = client.search(
    query="Thordata web scraping API",
    limit=5,
    engine="google"
)
for item in result["data"]:
    print(item["title"], item["url"])
curl -X POST "https://thordata-firecrawl-api.onrender.com/v1/search" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "Thordata web scraping API",
"limit": 5,
"engine": "google"
}'
const response = await fetch('https://thordata-firecrawl-api.onrender.com/v1/search', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
query: 'Thordata web scraping API',
limit: 5,
engine: 'google'
})
});
const data = await response.json();
console.log(data.data);
Discover all URLs on a website. Build sitemaps and understand site structure.
result = client.map(
    url="https://www.thordata.com"
)
for url in result["data"]:
    print(url)
curl -X POST "https://thordata-firecrawl-api.onrender.com/v1/map" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.thordata.com"
}'
const response = await fetch('https://thordata-firecrawl-api.onrender.com/v1/map', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
url: 'https://www.thordata.com'
})
});
const data = await response.json();
console.log(data.data);
Crawl entire websites with BFS traversal. Async jobs with webhook callbacks.
job = client.crawl(
    url="https://www.thordata.com",
    limit=10,
    formats=["markdown"]
)
# Check job status
status = client.get_crawl_status(job["jobId"])
print(status)
curl -X POST "https://thordata-firecrawl-api.onrender.com/v1/crawl" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.thordata.com",
"limit": 10,
"formats": ["markdown"]
}'
const response = await fetch('https://thordata-firecrawl-api.onrender.com/v1/crawl', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
url: 'https://www.thordata.com',
limit: 10,
formats: ['markdown']
})
});
const job = await response.json();
console.log(job.jobId);
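Because crawls run as asynchronous jobs, a caller usually polls the job status until the crawl finishes. The following is a minimal sketch of such a polling loop, not the SDK's own implementation. It assumes the status response is a dict with a `status` field whose terminal values are `"completed"` or `"failed"`; those names, the default interval and timeout, and the helper `wait_for_crawl` itself are hypothetical and should be checked against the real response.

```python
import time
from typing import Any, Callable, Dict, Sequence


def wait_for_crawl(
    get_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    done_states: Sequence[str] = ("completed", "failed"),  # assumed terminal values
    interval: float = 2.0,   # seconds between polls (illustrative)
    timeout: float = 120.0,  # overall time budget in seconds (illustrative)
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll get_status(job_id) until the job reaches a terminal state."""
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)
        # Assumption: the response carries its state in a "status" key.
        if status.get("status") in done_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"crawl job {job_id} did not finish in {timeout}s")
        sleep(interval)
```

With the client from the Python example above, this could be called as `wait_for_crawl(client.get_crawl_status, job["jobId"])`. Passing `sleep` and `clock` as parameters keeps the loop easy to test without real waiting.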
Extract structured data using LLM prompts. Schema-based extraction with AI.
result = client.agent(
    prompt="Extract company name and description",
    urls=["https://www.thordata.com"],
    schema={
        "type": "object",
        "properties": {
            "company_name": {"type": "string"},
            "description": {"type": "string"}
        }
    }
)
print(result["data"])
curl -X POST "https://thordata-firecrawl-api.onrender.com/v1/agent" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Extract company name and description",
"urls": ["https://www.thordata.com"],
"schema": {
"type": "object",
"properties": {
"company_name": {"type": "string"},
"description": {"type": "string"}
}
}
}'
const response = await fetch('https://thordata-firecrawl-api.onrender.com/v1/agent', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
prompt: 'Extract company name and description',
urls: ['https://www.thordata.com'],
schema: {
type: 'object',
properties: {
company_name: { type: 'string' },
description: { type: 'string' }
}
}
})
});
const data = await response.json();
console.log(data.data);
Core principles, proven performance
Blazingly fast. Delivers results in under 1 second, fast enough for real-time agents and dynamic apps.
Reliable. Covers 96% of the web, including JS-heavy pages. No proxies or headless browsers to manage, just clean data.
We handle the hard stuff. Rotating proxies, orchestration, rate limits, JS-blocked content, and more.
pip install thordata-firecrawl
from thordata_firecrawl import ThordataCrawl

client = ThordataCrawl(api_key="td-YOUR_API_KEY")
result = client.scrape(
    url="https://www.thordata.com",
    formats=["markdown"]
)
print(result["data"]["markdown"])
curl -X POST "https://thordata-firecrawl-api.onrender.com/v1/scrape" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.thordata.com",
"formats": ["markdown"]
}'
curl -X POST "https://thordata-firecrawl-api.onrender.com/v1/crawl" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.thordata.com",
"limit": 10,
"formats": ["markdown"]
}'
💡 For local development, replace the URL with http://localhost:3002
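One way to switch between the hosted API and a local server without editing every request is to build endpoint URLs from a single base URL. This is a minimal sketch under stated assumptions: the environment variable name `THORDATA_FIRECRAWL_BASE_URL` and the helper `endpoint` are hypothetical and not part of the SDK; only the two base URLs and the `/v1/...` paths come from the examples above.

```python
import os
from typing import Optional

# Hosted base URL as used in the examples above.
DEFAULT_BASE_URL = "https://thordata-firecrawl-api.onrender.com"


def endpoint(path: str, base_url: Optional[str] = None) -> str:
    """Join a base URL and an API path such as "/v1/scrape"."""
    # Hypothetical environment variable; an explicit argument takes precedence.
    base = base_url or os.environ.get("THORDATA_FIRECRAWL_BASE_URL", DEFAULT_BASE_URL)
    return base.rstrip("/") + "/" + path.lstrip("/")


# Local development: point requests at the self-hosted server.
print(endpoint("/v1/scrape", "http://localhost:3002"))
```

The same call with no second argument falls back to the environment variable and then to the hosted URL.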
const response = await fetch('https://thordata-firecrawl-api.onrender.com/v1/scrape', {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
},
body: JSON.stringify({
url: 'https://www.thordata.com',
formats: ['markdown']
})
});
const data = await response.json();
console.log(data.data.markdown);
⚠️ Use your own Thordata API Key. Get your key at dashboard.thordata.com. Your key is sent directly to the API server (via HTTPS) and never stored on this page.
401/403: API key invalid or missing
429: Rate limited, retry later
5xx: Server error or cold start, retry in 5-10s
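The status codes above suggest a simple client-side policy: stop on 401/403, and wait before retrying on 429 or 5xx. The following is a rough sketch of that policy rather than a documented behavior of the API. The helper names `classify_status` and `call_with_retry` are hypothetical, the `send` callable stands in for whatever performs the HTTP request, and the 5-second delay is an assumption taken from the low end of the 5-10s hint.

```python
import time
from typing import Any, Callable, Dict, Tuple


def classify_status(status: int) -> str:
    """Map an HTTP status code to the categories listed above."""
    if status in (401, 403):
        return "auth"          # API key invalid or missing: retrying will not help
    if status == 429:
        return "rate_limited"  # retry later
    if 500 <= status <= 599:
        return "server"        # server error or cold start
    if 200 <= status <= 299:
        return "ok"
    return "other"


def call_with_retry(
    send: Callable[[], Tuple[int, Dict[str, Any]]],
    max_attempts: int = 3,
    delay: float = 5.0,  # assumed wait between attempts, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Dict[str, Any]]:
    """Call send() again only for rate-limit and server errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        retryable = classify_status(status) in ("rate_limited", "server")
        if not retryable or attempt == max_attempts:
            return status, body
        sleep(delay)
    raise AssertionError("unreachable")
```

Here `send` is expected to return a `(status_code, parsed_body)` pair, so the same helper can wrap any HTTP library.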
Power your AI agents with real-time web data. Perfect for RAG systems, chatbots, and AI assistants.
Analyze competitor content, track rankings, and monitor website changes at scale.
Monitor competitor websites, track pricing changes, and gather market intelligence.
Collect and analyze data from multiple sources for research projects and analysis.
Enrich lead data with company information, contact details, and social profiles.
Keep your database in sync with external websites and data sources.
Use well-known tools
Native Python client with async support
Command-line interface for quick scraping
Standard HTTP API for any language
Official LangChain integration
Thordata Firecrawl is a Firecrawl-compatible web scraping API that turns websites into LLM-ready data. It provides clean Markdown, JSON, HTML, and screenshots from any website, making it perfect for AI applications, RAG systems, and data pipelines.
Thordata Firecrawl is designed to be API-compatible with Firecrawl, making migration easy. It's powered by Thordata's web data infrastructure and is fully open-source (MIT license) and self-hostable. You can deploy it anywhere without vendor lock-in.
Yes! Thordata Firecrawl is open-source and free to self-host. You only need a Thordata API key for the underlying scraping infrastructure. The code itself is MIT-licensed and can be used in commercial projects.
Absolutely! We provide Docker support and a Render Blueprint (`render.yaml`) for one-click cloud deployment. You can also deploy to any platform that supports Python/Docker, including AWS, GCP, Azure, Fly.io, and more.
Thordata Firecrawl supports Markdown (LLM-ready), JSON (structured data), HTML (raw), and screenshots. You can request multiple formats in a single API call.
Yes! Thordata's infrastructure handles JavaScript-rendered content automatically. You don't need to configure anything; it just works.