Create initial dev doc for Kagi: kagi.rst

This commit is contained in:
SentientTapeDrive 2025-01-21 03:44:36 +00:00 committed by GitHub
parent 44f5c299be
commit 2a421825be
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -0,0 +1,105 @@
.. _kagi engine:
Kagi
====
The Kagi engine scrapes search results from Kagi's HTML search interface.
Example
-------
Configuration
~~~~~~~~~~~~
.. code:: yaml
- name: kagi
engine: kagi
shortcut: kg
categories: [general, web]
timeout: 4.0
api_key: "YOUR-KAGI-TOKEN" # required
about:
website: https://kagi.com
use_official_api: false
require_api_key: true
results: HTML
Parameters
~~~~~~~~~~
``api_key`` : required
The Kagi API token used for authentication. Can be obtained from your Kagi account settings.
``pageno`` : optional
The page number for paginated results. Defaults to 1.
Example Request
~~~~~~~~~~~~~~
.. code:: python
params = {
'api_key': 'YOUR-KAGI-TOKEN',
'pageno': 1,
'headers': {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'DNT': '1'
}
}
query = 'test query'
request_params = kagi.request(query, params)
Example Response
~~~~~~~~~~~~~~
.. code:: python
[
# Search result
{
'url': 'https://example.com/',
'title': 'Example Title',
'content': 'Example content snippet...',
'domain': 'example.com'
}
]
Implementation
-------------
The engine performs the following steps:
1. Constructs a GET request to ``https://kagi.com/html/search`` with:
- ``q`` parameter for the search query
- ``token`` parameter for authentication
- ``batch`` parameter for pagination
2. Parses the HTML response using XPath to extract:
- Result titles
- URLs
- Content snippets
- Domain information
3. Handles various error cases:
- 401: Invalid API token
- 429: Rate limit exceeded
- Other non-200 status codes
Dependencies
-----------
- lxml: For HTML parsing and XPath evaluation
- urllib.parse: For URL handling and encoding
- searx.utils: For text extraction and XPath helpers
Notes
-----
- The engine requires a valid Kagi API token to function
- Results are scraped from Kagi's HTML interface rather than using an official API
- Rate limiting may apply based on your Kagi subscription level
- The engine sets specific browser-like headers to ensure reliable scraping