Create initial dev doc for Kagi: kagi.rst

2025-01-21 03:44:36 +00:00 · 2025-01-21 03:44:36 +00:00 · 2a421825be
commit 2a421825be
parent 44f5c299be
1 changed files with 105 additions and 0 deletions
--- a/docs/dev/engines/online/kagi.rst
+++ b/docs/dev/engines/online/kagi.rst
@ -0,0 +1,105 @@
+.. _kagi engine:
+
+Kagi
+====
+
+The Kagi engine scrapes search results from Kagi's HTML search interface.
+
+Example
+-------
+
+Configuration
+~~~~~~~~~~~~
+
+.. code:: yaml
+
+   - name: kagi
+     engine: kagi
+     shortcut: kg
+     categories: [general, web]
+     timeout: 4.0
+     api_key: "YOUR-KAGI-TOKEN"  # required
+     about:
+       website: https://kagi.com
+       use_official_api: false
+       require_api_key: true
+       results: HTML
+
+
+Parameters
+~~~~~~~~~~
+
+``api_key`` : required
+  The Kagi API token used for authentication. Can be obtained from your Kagi account settings.
+
+``pageno`` : optional
+  The page number for paginated results. Defaults to 1.
+
+Example Request
+~~~~~~~~~~~~~~
+
+.. code:: python
+
+   params = {
+       'api_key': 'YOUR-KAGI-TOKEN',
+       'pageno': 1,
+       'headers': {
+           'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
+           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
+           'Accept-Language': 'en-US,en;q=0.5',
+           'DNT': '1'
+       }
+   }
+   query = 'test query'
+   request_params = kagi.request(query, params)
+
+Example Response
+~~~~~~~~~~~~~~
+
+.. code:: python
+
+   [
+       # Search result
+       {
+           'url': 'https://example.com/',
+           'title': 'Example Title',
+           'content': 'Example content snippet...',
+           'domain': 'example.com'
+       }
+   ]
+
+Implementation
+-------------
+
+The engine performs the following steps:
+
+1. Constructs a GET request to ``https://kagi.com/html/search`` with:
+ - ``q`` parameter for the search query
+ - ``token`` parameter for authentication
+ - ``batch`` parameter for pagination
+
+2. Parses the HTML response using XPath to extract:
+ - Result titles
+ - URLs
+ - Content snippets
+ - Domain information
+
+3. Handles various error cases:
+ - 401: Invalid API token
+ - 429: Rate limit exceeded
+ - Other non-200 status codes
+
+Dependencies
+-----------
+
+- lxml: For HTML parsing and XPath evaluation
+- urllib.parse: For URL handling and encoding
+- searx.utils: For text extraction and XPath helpers
+
+Notes
+-----
+
+- The engine requires a valid Kagi API token to function
+- Results are scraped from Kagi's HTML interface rather than using an official API
+- Rate limiting may apply based on your Kagi subscription level
+- The engine sets specific browser-like headers to ensure reliable scraping