Make Websites Conversational

NLWeb transforms natural language queries into structured Schema.org responses


Data Sources

Schema.org · RSS Feeds · JSON-LD · Sitemaps

Existing structured web data
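
Ingestion starts from structured data a site already publishes. As an illustration only (not NLWeb's actual loader), a few lines of standard-library Python can pull Schema.org JSON-LD out of a page:

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.items.append(json.loads(data))

page = """
<html><head>
<script type="application/ld+json">
{"@type": "PodcastEpisode", "name": "Episode 42"}
</script>
</head></html>
"""
parser = JSONLDExtractor()
parser.feed(page)
print(parser.items[0]["name"])  # Episode 42
```

RSS feeds and sitemaps get equivalent treatment: each source format yields items that are normalized to Schema.org objects before chunking.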

Data Loading

db_load · Embedding · Chunking

Python scripts ingest & embed content

Vector Store

Qdrant · Postgres/pgvector · Azure AI Search · Milvus · Elasticsearch

Semantic index for retrieval
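
Whichever store is plugged in, retrieval reduces to nearest-neighbor search over embedding vectors. A minimal sketch with hand-made toy vectors (real embeddings come from a model, and real stores index at scale):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical pre-computed embeddings, keyed by document chunk.
index = {
    "episode-42-intro": [0.9, 0.1, 0.2],
    "episode-42-ai":    [0.2, 0.9, 0.4],
    "episode-07-tools": [0.1, 0.2, 0.9],
}

def retrieve(query_vec, k=2):
    """Return the k chunks whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda doc: cosine(query_vec, index[doc]),
                    reverse=True)
    return ranked[:k]

print(retrieve([0.2, 0.8, 0.5]))  # ['episode-42-ai', 'episode-07-tools']
```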

NLWeb Core

Query Understanding · Retrieval · Reranking · Response Gen

Python service orchestrating the pipeline
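
The four stages can be sketched end to end. Every function here is illustrative and deliberately simplified; NLWeb's internal APIs differ:

```python
def understand(query: str) -> dict:
    # Normalize the query and attach a coarse intent.
    return {"text": query.lower().strip(), "intent": "search"}

def retrieve(parsed: dict, index: list[dict]) -> list[dict]:
    # Toy retrieval: keyword overlap stands in for vector search.
    terms = set(parsed["text"].split())
    return [doc for doc in index if terms & set(doc["name"].lower().split())]

def rerank(candidates: list[dict]) -> list[dict]:
    # Order candidates by relevance score, best first.
    return sorted(candidates, key=lambda d: d.get("score", 0), reverse=True)

def respond(ranked: list[dict]) -> dict:
    # "list" mode: wrap the ranked results in a response envelope.
    return {"results": ranked}

index = [
    {"name": "Behind the Tech Episode 42", "score": 0.95},
    {"name": "Gardening Weekly", "score": 0.40},
]
answer = respond(rerank(retrieve(understand("latest Tech podcasts"), index)))
print(answer["results"][0]["name"])  # Behind the Tech Episode 42
```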

LLM Layer

OpenAI · Anthropic · Gemini · DeepSeek · Local Models

Pluggable model providers

API Layer

REST API · MCP Server · A2A (soon)

Returns Schema.org JSON responses

Clients

Web UI · AI Agents · MCP Hosts · Chatbots

Humans & agents consume the API

Key Insight

NLWeb : MCP/A2A :: HTML : HTTP
Every NLWeb instance is also an MCP server, and responses use Schema.org vocabulary. MIT licensed; runs on everything from laptops to clusters.


Key Features

🤖 MCP Integration

Acts as a Model Context Protocol server, allowing AI assistants to query websites using natural language.

🌐 Schema.org Support

Uses structured web data formats already deployed on over 100 million websites.

⚡ Multi-Platform

Compatible with Windows, macOS, and Linux. Supports multiple vector databases and LLMs.

🔄 Flexible Infrastructure

Works with Qdrant, Snowflake, Milvus, Azure AI Search, Elasticsearch, Postgres, and Cloudflare AutoRAG.

🎯 Simple REST API

Clean protocol for natural language queries returning JSON responses with Schema.org vocabulary.

📦 Multiple Modes

List, summarize, or generate responses, depending on your use case.
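
The mode travels as a plain request parameter. A hypothetical helper (not part of NLWeb) that builds the request body for each of the three modes:

```python
import json

MODES = {"list", "summarize", "generate"}

def ask_payload(query, mode="list"):
    """Serialize an /ask request body, rejecting undocumented modes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return json.dumps({"query": query, "mode": mode})

for mode in sorted(MODES):
    print(ask_payload("What are the latest tech podcasts?", mode))
```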

Quick Start

1. Clone and Setup

git clone https://github.com/microsoft/NLWeb
cd NLWeb
python -m venv myenv
source myenv/bin/activate  # On Windows: myenv\Scripts\activate

2. Install Dependencies

cd code/python
pip install -r requirements.txt

3. Configure API Keys

cp .env.template .env
# Edit .env with your LLM provider credentials

4. Load Sample Data

# Load a podcast RSS feed
python -m data_loading.db_load https://feeds.libsyn.com/121695/rss behind-the-tech

5. Start the Server

python app-aiohttp.py
# Access at http://localhost:8000/

Full Documentation →

API Reference

Endpoints

  • POST /ask - Standard response format
  • POST /mcp - MCP client-compatible format

Request Parameters

Parameter   Type     Required  Description
query       string   Yes       Natural language query
mode        string   No        list, summarize, or generate (default: list)
site        string   No        Backend subset/site token
prev        string   No        Comma-separated previous queries for context
streaming   boolean  No        Enable streaming (default: true)
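
Putting the parameters together, a standard-library Python client might build the POST like this. The URL assumes the Quick Start server, and `build_ask_request` is an illustrative helper, not part of NLWeb:

```python
import json
import urllib.request

NLWEB_URL = "http://localhost:8000/ask"  # assumes the Quick Start server

def build_ask_request(query, mode="list", site=None, prev=None,
                      streaming=False):
    """Assemble a POST /ask request with the documented parameters."""
    body = {"query": query, "mode": mode, "streaming": streaming}
    if site:
        body["site"] = site
    if prev:
        body["prev"] = ",".join(prev)  # previous queries, comma-separated
    return urllib.request.Request(
        NLWEB_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_ask_request("What are the latest tech podcasts?",
                        site="behind-the-tech",
                        prev=["any good AI shows?"])
# To actually send it (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["results"])
```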

Example Request

curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the latest tech podcasts?",
    "mode": "list"
  }'

Example Response

{
  "query_id": "abc123",
  "results": [
    {
      "name": "Behind the Tech Episode 42",
      "url": "https://example.com/episode-42",
      "site": "behind-the-tech",
      "score": 0.95,
      "description": "Discussion about AI and the future of technology",
      "schema_object": {
        "@type": "PodcastEpisode",
        "name": "Behind the Tech Episode 42",
        "datePublished": "2024-01-15"
      }
    }
  ]
}
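
Because each result carries a `schema_object`, clients can work with typed Schema.org data directly. A small sketch that picks the best-scoring result out of the example response above:

```python
import json

# The example response from the API Reference, verbatim.
response_json = """
{
  "query_id": "abc123",
  "results": [
    {
      "name": "Behind the Tech Episode 42",
      "url": "https://example.com/episode-42",
      "site": "behind-the-tech",
      "score": 0.95,
      "description": "Discussion about AI and the future of technology",
      "schema_object": {
        "@type": "PodcastEpisode",
        "name": "Behind the Tech Episode 42",
        "datePublished": "2024-01-15"
      }
    }
  ]
}
"""

response = json.loads(response_json)
top = max(response["results"], key=lambda r: r["score"])
episode = top["schema_object"]
print(f'{episode["@type"]}: {episode["name"]} ({episode["datePublished"]})')
# PodcastEpisode: Behind the Tech Episode 42 (2024-01-15)
```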