Make Websites Conversational

NLWeb transforms natural language queries into structured Schema.org responses


Data Sources

Schema.org · RSS Feeds · JSON-LD · Sitemaps

Existing structured web data
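
Ingestion starts from structured data a site already publishes. As an illustration only (not NLWeb's actual loader), a few lines of standard-library Python can pull Schema.org JSON-LD out of a page:

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.items.append(json.loads(data))

page = """
<html><head>
<script type="application/ld+json">
{"@type": "PodcastEpisode", "name": "Episode 42"}
</script>
</head></html>
"""
parser = JSONLDExtractor()
parser.feed(page)
print(parser.items[0]["name"])  # Episode 42
```

RSS feeds and sitemaps get equivalent treatment: each source format yields items that are normalized to Schema.org objects before chunking.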

Data Loading

db_load · Embedding · Chunking

Python scripts ingest & embed content

Vector Store

Qdrant · Postgres/pgvector · Azure AI Search · Milvus · Elasticsearch

Semantic index for retrieval
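
Whichever store is plugged in, retrieval reduces to nearest-neighbor search over embedding vectors. A minimal sketch with hand-made toy vectors (real embeddings come from a model, and real stores index at scale):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical pre-computed embeddings, keyed by document chunk.
index = {
    "episode-42-intro": [0.9, 0.1, 0.2],
    "episode-42-ai":    [0.2, 0.9, 0.4],
    "episode-07-tools": [0.1, 0.2, 0.9],
}

def retrieve(query_vec, k=2):
    """Return the k chunks whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda doc: cosine(query_vec, index[doc]),
                    reverse=True)
    return ranked[:k]

print(retrieve([0.2, 0.8, 0.5]))  # ['episode-42-ai', 'episode-07-tools']
```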

NLWeb Core

Query Understanding · Retrieval · Reranking · Response Gen

Python service orchestrating the pipeline
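
The four stages can be sketched end to end. Every function here is illustrative and deliberately simplified; NLWeb's internal APIs differ:

```python
def understand(query: str) -> dict:
    # Normalize the query and attach a coarse intent.
    return {"text": query.lower().strip(), "intent": "search"}

def retrieve(parsed: dict, index: list[dict]) -> list[dict]:
    # Toy retrieval: keyword overlap stands in for vector search.
    terms = set(parsed["text"].split())
    return [doc for doc in index if terms & set(doc["name"].lower().split())]

def rerank(candidates: list[dict]) -> list[dict]:
    # Order candidates by relevance score, best first.
    return sorted(candidates, key=lambda d: d.get("score", 0), reverse=True)

def respond(ranked: list[dict]) -> dict:
    # "list" mode: wrap the ranked results in a response envelope.
    return {"results": ranked}

index = [
    {"name": "Behind the Tech Episode 42", "score": 0.95},
    {"name": "Gardening Weekly", "score": 0.40},
]
answer = respond(rerank(retrieve(understand("latest Tech podcasts"), index)))
print(answer["results"][0]["name"])  # Behind the Tech Episode 42
```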

LLM Layer

OpenAI · Anthropic · Gemini · DeepSeek · Local Models

Pluggable model providers

API Layer

REST API · MCP Server · A2A (soon)

Returns Schema.org JSON responses

Clients

Web UI · AI Agents · MCP Hosts · Chatbots

Humans & agents consume the API

Key Insight

NLWeb : MCP/A2A :: HTML : HTTP
Every NLWeb instance is also an MCP server, and responses use Schema.org vocabulary. MIT licensed; runs on everything from laptops to clusters.


Key Features

🤖 MCP Integration

Acts as a Model Context Protocol server, allowing AI assistants to query websites using natural language.

🌐 Schema.org Support

Uses structured web data formats already deployed on over 100 million websites.

⚡ Multi-Platform

Compatible with Windows, macOS, and Linux. Supports multiple vector databases and LLMs.

🔄 Flexible Infrastructure

Works with Qdrant, Snowflake, Milvus, Azure AI Search, Elasticsearch, Postgres, and Cloudflare AutoRAG.

🎯 Simple REST API

Clean protocol for natural language queries returning JSON responses with Schema.org vocabulary.

📦 Multiple Modes

List, summarize, or generate responses, depending on your use case.
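
The mode travels as a plain request parameter. A hypothetical helper (not part of NLWeb) that builds the request body for each of the three modes:

```python
import json

MODES = {"list", "summarize", "generate"}

def ask_payload(query, mode="list"):
    """Serialize an /ask request body, rejecting undocumented modes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return json.dumps({"query": query, "mode": mode})

for mode in sorted(MODES):
    print(ask_payload("What are the latest tech podcasts?", mode))
```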

Quick Start

1. Clone and Setup

git clone https://github.com/microsoft/NLWeb
cd NLWeb
python -m venv myenv
source myenv/bin/activate  # On Windows: myenv\Scripts\activate

2. Install Dependencies

cd code/python
pip install -r requirements.txt

3. Configure API Keys

cp .env.template .env
# Edit .env with your LLM provider credentials

4. Load Sample Data

# Load a podcast RSS feed
python -m data_loading.db_load https://feeds.libsyn.com/121695/rss behind-the-tech

5. Start the Server

python app-aiohttp.py
# Access at http://localhost:8000/

Full Documentation →

API Reference

Endpoints

  • POST /ask - Standard response format
  • POST /mcp - MCP client-compatible format

Request Parameters

Parameter   Type     Required  Description
query       string   Yes       Natural language query
mode        string   No        list, summarize, or generate (default: list)
site        string   No        Backend subset/site token
prev        string   No        Comma-separated previous queries for context
streaming   boolean  No        Enable streaming (default: true)
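
Putting the parameters together, a standard-library Python client might build the POST like this. The URL assumes the Quick Start server, and `build_ask_request` is an illustrative helper, not part of NLWeb:

```python
import json
import urllib.request

NLWEB_URL = "http://localhost:8000/ask"  # assumes the Quick Start server

def build_ask_request(query, mode="list", site=None, prev=None,
                      streaming=False):
    """Assemble a POST /ask request with the documented parameters."""
    body = {"query": query, "mode": mode, "streaming": streaming}
    if site:
        body["site"] = site
    if prev:
        body["prev"] = ",".join(prev)  # previous queries, comma-separated
    return urllib.request.Request(
        NLWEB_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_ask_request("What are the latest tech podcasts?",
                        site="behind-the-tech",
                        prev=["any good AI shows?"])
# To actually send it (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["results"])
```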

Example Request

curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the latest tech podcasts?",
    "mode": "list"
  }'

Example Response

{
  "query_id": "abc123",
  "results": [
    {
      "name": "Behind the Tech Episode 42",
      "url": "https://example.com/episode-42",
      "site": "behind-the-tech",
      "score": 0.95,
      "description": "Discussion about AI and the future of technology",
      "schema_object": {
        "@type": "PodcastEpisode",
        "name": "Behind the Tech Episode 42",
        "datePublished": "2024-01-15"
      }
    }
  ]
}
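
Because each result carries a `schema_object`, clients can work with typed Schema.org data directly. A small sketch that picks the best-scoring result out of the example response above:

```python
import json

# The example response from the API Reference, verbatim.
response_json = """
{
  "query_id": "abc123",
  "results": [
    {
      "name": "Behind the Tech Episode 42",
      "url": "https://example.com/episode-42",
      "site": "behind-the-tech",
      "score": 0.95,
      "description": "Discussion about AI and the future of technology",
      "schema_object": {
        "@type": "PodcastEpisode",
        "name": "Behind the Tech Episode 42",
        "datePublished": "2024-01-15"
      }
    }
  ]
}
"""

response = json.loads(response_json)
top = max(response["results"], key=lambda r: r["score"])
episode = top["schema_object"]
print(f'{episode["@type"]}: {episode["name"]} ({episode["datePublished"]})')
# PodcastEpisode: Behind the Tech Episode 42 (2024-01-15)
```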