Website Intel
Company Research
Scrape any website and extract structured data using a custom schema.
Overview
Website Intel is an MCP server that scrapes any public website and returns structured JSON data based on a schema you define. It handles JavaScript-heavy SPAs and dynamic content, and in crawl mode it can follow links across multiple pages. Under the hood, it uses a headless browser with crawl4ai for extraction: you describe what you want, and it maps the page content to your schema. Whether you need pricing tables, team directories, product feature lists, or blog metadata, Website Intel turns unstructured web pages into typed data ready for your sales workflows.
Currently macOS only. Windows and Linux support coming soon.
Use Cases
Extract Pricing Page Data for Competitive Analysis
Your sales team needs to understand how a competitor structures their pricing tiers. Instead of manually copying pricing details from their website, you point Website Intel at their pricing page with a schema that defines fields like tier name, price, features included, and limits. The MCP renders the JavaScript-heavy pricing page, extracts every tier into structured JSON, and returns it in seconds.
Expected outcome: A clean JSON object with each pricing tier, its monthly and annual cost, feature list, and usage limits — ready to paste into a competitive battle card or feed into a comparison spreadsheet.
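As a rough sketch, the schema for this use case might look like the following. The field names (`tiers`, `monthly_price`, `annual_price`, `features`, `limits`), the example URL, and the prompt wording are illustrative assumptions, not names Website Intel requires; only the `url`, `schema`, `prompt`, and `mode` keys come from the parameter table.

```python
import json

# Hypothetical JSON Schema for a pricing page. Every field name below is an
# illustrative choice, not something Website Intel prescribes.
pricing_schema = {
    "type": "object",
    "properties": {
        "tiers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    # Prices may be absent ("Contact sales"), so allow null.
                    "monthly_price": {"type": ["number", "null"]},
                    "annual_price": {"type": ["number", "null"]},
                    "features": {"type": "array", "items": {"type": "string"}},
                    "limits": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name"],
            },
        }
    },
    "required": ["tiers"],
}

# Arguments for the tool call. The URL is a placeholder.
pricing_request = {
    "url": "https://example.com/pricing",
    "schema": pricing_schema,
    "prompt": (
        "Extract every pricing tier with its name, monthly and annual "
        "price, included features, and usage limits."
    ),
    "mode": "scrape",
}

print(json.dumps(pricing_request, indent=2))
```

Allowing `null` for the price fields is one way to keep enterprise tiers without a public price from being dropped or filled with a guessed number.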
Build a Prospect List from a Conference Speaker Page
A major industry conference publishes its speaker lineup on a dynamic webpage. You want to build a prospect list from the speakers — names, titles, companies. You define a schema with those fields and set Website Intel to crawl the speaker directory, following pagination links across multiple pages.
Expected outcome: A structured list of every speaker with their name, job title, and company — typically 50 to 200 contacts from a single conference page, ready for outreach sequencing.
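A minimal sketch of this workflow, assuming a schema with a `speakers` array and a result shaped like that schema. The schema field names, the URL, the helper `speakers_to_csv`, and the sample record are all hypothetical placeholders; `mode` and `limit` are the documented crawl parameters.

```python
import csv
import io

# Hypothetical schema: field names are illustrative assumptions.
speaker_schema = {
    "type": "object",
    "properties": {
        "speakers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "title": {"type": "string"},
                    "company": {"type": "string"},
                },
                "required": ["name"],
            },
        }
    },
    "required": ["speakers"],
}

# Crawl mode follows pagination links; 10 is the documented maximum limit.
speaker_request = {
    "url": "https://example.com/speakers",  # placeholder URL
    "schema": speaker_schema,
    "prompt": "Extract each speaker's name, job title, and company.",
    "mode": "crawl",
    "limit": 10,
}

FIELDS = ["name", "title", "company"]


def speakers_to_csv(result: dict) -> str:
    """Flatten a result shaped like speaker_schema into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for speaker in result.get("speakers", []):
        # Missing optional fields become empty cells.
        writer.writerow({field: speaker.get(field, "") for field in FIELDS})
    return buffer.getvalue()


# Made-up placeholder record standing in for a real tool response.
sample_result = {
    "speakers": [
        {"name": "Jane Doe", "title": "VP Sales", "company": "Example Corp"}
    ]
}
print(speakers_to_csv(sample_result))
```

Because the page limit tops out at 10, a very long speaker directory may need more than one crawl, each starting from a different page.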
Scrape Product Feature Lists for Qualification
Before reaching out to a prospect, you need to understand what their product actually does. You point Website Intel at their product or features page and ask it to extract the feature categories, individual features, and any integration mentions. This tells you whether they are a fit for your solution.
Expected outcome: A categorized list of the prospect's product features and integrations, enabling your rep to write a personalized first email that references specific capabilities.
Capabilities
- Scrapes any public URL including JavaScript-rendered single-page applications
- Crawls multiple pages by following links with configurable page limits (1–10 pages)
- Accepts user-defined JSON schemas to extract exactly the data you need
- Renders dynamic content using a headless browser (Playwright under the hood)
- Uses LLM-powered extraction to intelligently map page content to your schema
- Handles pagination, tabs, accordions, and other interactive UI elements
- Returns clean, typed JSON output ready for downstream processing
Data Sources
Any Website — Scrapes and extracts structured data from any public URL
Tools
Scrapes a webpage or crawls multiple pages and extracts structured data as JSON using a custom schema. Supports single-page scraping with full JS rendering and multi-page crawling that follows links.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Full URL to process (must include the http:// or https:// scheme) |
| schema | object | Yes | JSON Schema defining the desired output data structure |
| prompt | string | Yes | Natural language extraction instructions describing what data to pull |
| mode | string | No | "scrape" for single page with JS rendering, or "crawl" for multi-page link following. Default: "scrape" |
| limit | integer | No | Maximum pages to crawl in crawl mode. Range: 1–10. Default: 5 |
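The constraints in the table can be checked on the client side before a call is made. The sketch below is one way to do that. The helper name `build_arguments` is hypothetical and not part of Website Intel, and the server may apply its own validation differently.

```python
from urllib.parse import urlparse

VALID_MODES = {"scrape", "crawl"}


def build_arguments(url, schema, prompt, mode="scrape", limit=5):
    """Assemble tool arguments, enforcing the documented constraints.

    Hypothetical helper: it mirrors the parameter table, not the
    server's actual validation code.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must be a full http:// or https:// URL")
    if not isinstance(schema, dict):
        raise ValueError("schema must be a JSON Schema object")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if mode not in VALID_MODES:
        raise ValueError('mode must be "scrape" or "crawl"')

    arguments = {"url": url, "schema": schema, "prompt": prompt, "mode": mode}
    if mode == "crawl":
        # limit only applies in crawl mode; documented range is 1 to 10.
        if not isinstance(limit, int) or not 1 <= limit <= 10:
            raise ValueError("limit must be an integer between 1 and 10")
        arguments["limit"] = limit
    return arguments


example = build_arguments(
    "https://example.com/team",  # placeholder URL
    {"type": "object", "properties": {"members": {"type": "array"}}},
    "List every team member with name and role.",
    mode="crawl",
    limit=3,
)
print(example)
```

Omitting `limit` in scrape mode keeps the request minimal, since the table describes it as a crawl-mode setting.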
Response Fields
JSON structured according to the user-provided schema
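Since extraction is LLM-powered, it can be worth confirming that a response actually contains the fields your schema marks as required. The function below is a deliberately simplified sketch: it only walks `object` and `array` schemas and checks `required` keys, so it is not a full JSON Schema validator. The name `missing_required`, the schema, and the sample response are all hypothetical.

```python
def missing_required(data, schema, path="$"):
    """Return paths of required keys that are absent from data.

    Simplified check: handles only "object" and "array" schemas whose
    "type" is a plain string, and ignores every other schema keyword.
    """
    missing = []
    if schema.get("type") == "object" and isinstance(data, dict):
        for key in schema.get("required", []):
            if key not in data:
                missing.append(f"{path}.{key}")
        for key, subschema in schema.get("properties", {}).items():
            if key in data:
                missing.extend(
                    missing_required(data[key], subschema, f"{path}.{key}")
                )
    elif schema.get("type") == "array" and isinstance(data, list):
        item_schema = schema.get("items", {})
        for index, item in enumerate(data):
            missing.extend(
                missing_required(item, item_schema, f"{path}[{index}]")
            )
    return missing


# Hypothetical schema and made-up response, for illustration only.
schema = {
    "type": "object",
    "properties": {
        "tiers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string"}},
                "required": ["name", "price"],
            },
        }
    },
    "required": ["tiers"],
}
response = {"tiers": [{"name": "Starter", "price": 0}, {"name": "Pro"}]}

print(missing_required(response, schema))  # ['$.tiers[1].price']
```

For production use, a complete validator library would cover types, formats, and the remaining keywords that this sketch skips.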
Dependencies
- macOS (Windows and Linux support coming soon)
- Python 3.10+
- Node.js 20+
- LLM API key (OpenAI, Anthropic, or compatible provider)
- crawl4ai (installed automatically during setup)
Works With
Run Website Intel first to extract company details from their website — pricing pages, team pages, product features — then chain Techstack Intel to detect their tools, and Social Intel to research key contacts on LinkedIn.
Quick Setup
git clone https://github.com/ekas-io/open-sales-stack.git
cd open-sales-stack
./scripts/setup.sh
./scripts/add-to-claude.sh --website-intel
Frequently Asked Questions
Does Website Intel work on websites that require JavaScript to render?
What happens if the website blocks scraping or requires login?
How accurate is the data extraction compared to manual copy-paste?
Can I crawl an entire website?
Do I need to pay for an API key to use Website Intel?
What is the difference between scrape mode and crawl mode?
Related
Techstack Intel
Company Research: Detect a company's technology stack from their website, no API keys needed.
Social Intel
People Research: Scrape LinkedIn profiles, company pages, and posts for prospect research.
Review Intel
Market Intelligence: Extract ratings, reviews, and sentiment from G2, Capterra, and Glassdoor.