[fix] engine: re-implement mullvad leta integration

Re-writes the Mullvad Leta integration to work with the recent breaking changes
in the Leta service.

Mullvad Leta is a search engine proxy.  Currently Leta only offers text search
results, not image, news or any other type of search result.  Leta acts as a
proxy to Google and Brave search results.

- Remove the docstring comments about requiring the use of Mullvad VPN, which
  is no longer a hard requirement.

- Configure two engines: ``mullvadleta`` (uses Google) and ``mullvadleta brave``
  (uses Brave); a short sketch of the request URLs they produce follows this
  list.

- Since Leta may not provide up-to-date search results, both engines are
  disabled by default.
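
As a rough, standalone sketch (not code shipped by this commit; the
``leta_url`` helper is only for illustration and mirrors the ``request()``
implementation in the diff below), the two engines differ only in the
``engine`` parameter of the ``__data.json`` URL they request:

.. code:: python

   from urllib.parse import urlencode

   search_url = "https://leta.mullvad.net"

   def leta_url(query: str, leta_engine: str) -> str:
       # same essential arguments that request() assembles; country, language,
       # time range and paging are omitted here for brevity
       args = {"q": query, "engine": leta_engine, "x-sveltekit-invalidated": "001"}
       return f"{search_url}/search/__data.json?{urlencode(args)}"

   print(leta_url("test", "google"))  # backend used by the "mullvadleta" engine
   print(leta_url("test", "brave"))   # backend used by the "mullvadleta brave" engine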

.. hint::

   Leta caches each search for up to 30 days.  A search for a term like
   ``news`` may therefore return results that are weeks old.

Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Signed-off-by: Grant Lanham <contact@grantlanham.com>
Commit 851c0e5cc0 (parent 07a94d4d2e)
Authored by Grant Lanham on 2025-04-12 22:17:09 -04:00; committed by Markus Heiser
3 changed files with 216 additions and 148 deletions

--- a/docs/dev/engines/online/mullvad_leta.rst
+++ b/docs/dev/engines/online/mullvad_leta.rst

@@ -4,10 +4,5 @@
 Mullvad-Leta
 ============
 
-.. contents:: Contents
-   :depth: 2
-   :local:
-   :backlinks: entry
-
 .. automodule:: searx.engines.mullvad_leta
    :members:

--- a/searx/engines/mullvad_leta.py
+++ b/searx/engines/mullvad_leta.py

@@ -1,46 +1,61 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
-"""This is the implementation of the Mullvad-Leta meta-search engine.
-
-This engine **REQUIRES** that searxng operate within a Mullvad VPN
-
-If using docker, consider using gluetun for easily connecting to the Mullvad
-
-- https://github.com/qdm12/gluetun
-
-Otherwise, follow instructions provided by Mullvad for enabling the VPN on Linux
-
-- https://mullvad.net/en/help/install-mullvad-app-linux
+"""Mullvad Leta is a search engine proxy.  Currently Leta only offers text
+search results not image, news or any other types of search result.  Leta acts
+as a proxy to Google and Brave search results.  You can select which backend
+search engine you wish to use, see (:py:obj:`leta_engine`).
 
 .. hint::
 
-   The :py:obj:`EngineTraits` is empty by default.  Maintainers have to run
-   ``make data.traits`` (in the Mullvad VPN / :py:obj:`fetch_traits`) and rebase
-   the modified JSON file ``searx/data/engine_traits.json`` on every single
-   update of SearXNG!
+   Leta caches each search for up to 30 days.  For example, if you use search
+   terms like ``news``, contrary to your intention you'll get very old results!
+
+
+Configuration
+=============
+
+The engine has the following additional settings:
+
+- :py:obj:`leta_engine` (:py:obj:`LetaEnginesType`)
+
+You can configure one Leta engine for Google and one for Brave:
+
+.. code:: yaml
+
+  - name: mullvadleta
+    engine: mullvad_leta
+    leta_engine: google
+    shortcut: ml
+
+  - name: mullvadleta brave
+    engine: mullvad_leta
+    network: mullvadleta  # use network from engine "mullvadleta" configured above
+    leta_engine: brave
+    shortcut: mlb
+
+
+Implementations
+===============
+
 """
 
 from __future__ import annotations
 
-from typing import TYPE_CHECKING
+import typing
+from urllib.parse import urlencode
 
+import babel
 from httpx import Response
 from lxml import html
 
 from searx.enginelib.traits import EngineTraits
-from searx.locales import region_tag, get_official_locales
-from searx.utils import eval_xpath, extract_text, eval_xpath_list
-from searx.exceptions import SearxEngineResponseException
+from searx.locales import get_official_locales, language_tag, region_tag
+from searx.utils import eval_xpath_list
+from searx.result_types import EngineResults, MainResult
 
-if TYPE_CHECKING:
+if typing.TYPE_CHECKING:
     import logging
 
     logger = logging.getLogger()
 
 traits: EngineTraits
 
-use_cache: bool = True  # non-cache use only has 100 searches per day!
-leta_engine: str = 'google'
-
 search_url = "https://leta.mullvad.net"
 
 # about
@@ -54,154 +69,205 @@ about = {
 }
 
 # engine dependent config
-categories = ['general', 'web']
+categories = ["general", "web"]
 paging = True
-max_page = 50
+max_page = 10
 time_range_support = True
 time_range_dict = {
-    "day": "d1",
-    "week": "w1",
-    "month": "m1",
-    "year": "y1",
+    "day": "d",
+    "week": "w",
+    "month": "m",
+    "year": "y",
 }
 
-available_leta_engines = [
-    'google',  # first will be default if provided engine is invalid
-    'brave',
-]
-
-
-def is_vpn_connected(dom: html.HtmlElement) -> bool:
-    """Returns true if the VPN is connected, False otherwise"""
-    connected_text = extract_text(eval_xpath(dom, '//main/div/p[1]'))
-    return connected_text != 'You are not connected to Mullvad VPN.'
-
-
-def assign_headers(headers: dict) -> dict:
-    """Assigns the headers to make a request to Mullvad Leta"""
-    headers['Accept'] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8"
-    headers['Content-Type'] = "application/x-www-form-urlencoded"
-    headers['Host'] = "leta.mullvad.net"
-    headers['Origin'] = "https://leta.mullvad.net"
-    return headers
+LetaEnginesType = typing.Literal["google", "brave"]
+"""Engine types supported by mullvadleta."""
+
+leta_engine: LetaEnginesType = "google"
+"""Select Leta's engine type from :py:obj:`LetaEnginesType`."""
+
+
+def init(_):
+    l = typing.get_args(LetaEnginesType)
+    if leta_engine not in l:
+        raise ValueError(f"leta_engine '{leta_engine}' is invalid, use one of {', '.join(l)}")
+
+
+class DataNodeQueryMetaDataIndices(typing.TypedDict):
+    """Indices into query metadata."""
+
+    success: int
+    q: int  # pylint: disable=invalid-name
+    country: int
+    language: int
+    lastUpdated: int
+    engine: int
+    items: int
+    infobox: int
+    news: int
+    timestamp: int
+    altered: int
+    page: int
+    next: int  # if -1, there no more results are available
+    previous: int
+
+
+class DataNodeResultIndices(typing.TypedDict):
+    """Indices into query results data."""
+
+    link: int
+    snippet: int
+    title: int
+    favicon: int
 
 
 def request(query: str, params: dict):
-    country = traits.get_region(params.get('searxng_locale', 'all'), traits.all_locale)  # type: ignore
-
-    result_engine = leta_engine
-    if leta_engine not in available_leta_engines:
-        result_engine = available_leta_engines[0]
-        logger.warning(
-            'Configured engine "%s" not one of the available engines %s, defaulting to "%s"',
-            leta_engine,
-            available_leta_engines,
-            result_engine,
-        )
-
-    params['url'] = search_url
-    params['method'] = 'POST'
-    params['data'] = {
+    params["method"] = "GET"
+
+    args = {
         "q": query,
-        "gl": country if country is str else '',
-        'engine': result_engine,
+        "engine": leta_engine,
+        "x-sveltekit-invalidated": "001",  # hardcoded from all requests seen
     }
-    # pylint: disable=undefined-variable
-    if use_cache:
-        params['data']['oc'] = "on"
-    # pylint: enable=undefined-variable
-
-    if params['time_range'] in time_range_dict:
-        params['dateRestrict'] = time_range_dict[params['time_range']]
-    else:
-        params['dateRestrict'] = ''
-
-    if params['pageno'] > 1:
-        # Page 1 is n/a, Page 2 is 11, page 3 is 21, ...
-        params['data']['start'] = ''.join([str(params['pageno'] - 1), "1"])
-
-    if params['headers'] is None:
-        params['headers'] = {}
-
-    assign_headers(params['headers'])
+
+    country = traits.get_region(params.get("searxng_locale"), traits.all_locale)  # type: ignore
+    if country:
+        args["country"] = country
+
+    language = traits.get_language(params.get("searxng_locale"), traits.all_locale)  # type: ignore
+    if language:
+        args["language"] = language
+
+    if params["time_range"] in time_range_dict:
+        args["lastUpdated"] = time_range_dict[params["time_range"]]
+
+    if params["pageno"] > 1:
+        args["page"] = params["pageno"]
+
+    params["url"] = f"{search_url}/search/__data.json?{urlencode(args)}"
+
     return params
 
 
-def extract_result(dom_result: list[html.HtmlElement]):
-    # Infoboxes sometimes appear in the beginning and will have a length of 0
-    if len(dom_result) == 3:
-        [a_elem, h3_elem, p_elem] = dom_result
-    elif len(dom_result) == 4:
-        [_, a_elem, h3_elem, p_elem] = dom_result
-    else:
-        return None
-
-    return {
-        'url': extract_text(a_elem.text),
-        'title': extract_text(h3_elem),
-        'content': extract_text(p_elem),
-    }
-
-
-def extract_results(search_results: html.HtmlElement):
-    for search_result in search_results:
-        dom_result = eval_xpath_list(search_result, 'div/div/*')
-        result = extract_result(dom_result)
-        if result is not None:
-            yield result
-
-
-def response(resp: Response):
-    """Checks if connected to Mullvad VPN, then extracts the search results from
-    the DOM resp: requests response object"""
-    dom = html.fromstring(resp.text)
-    if not is_vpn_connected(dom):
-        raise SearxEngineResponseException('Not connected to Mullvad VPN')
-    search_results = eval_xpath(dom.body, '//main/div[2]/div')
-    return list(extract_results(search_results))
-
-
-def fetch_traits(engine_traits: EngineTraits):
-    """Fetch languages and regions from Mullvad-Leta
-
-    .. warning::
-
-        Fetching the engine traits also requires a Mullvad VPN connection. If
-        not connected, then an error message will print and no traits will be
-        updated.
-    """
+def response(resp: Response) -> EngineResults:
+    json_response = resp.json()
+
+    nodes = json_response["nodes"]
+    # 0: is None
+    # 1: has "connected=True", not useful
+    # 2: query results within "data"
+
+    data_nodes = nodes[2]["data"]
+    # Instead of nested object structure, all objects are flattened into a
+    # list. Rather, the first object in data_node provides indices into the
+    # "data_nodes" to access each search result (which is an object of more
+    # indices)
+    #
+    # Read the relative TypedDict definitions for details
+
+    query_meta_data: DataNodeQueryMetaDataIndices = data_nodes[0]
+    query_items_indices = query_meta_data["items"]
+
+    results = EngineResults()
+    for idx in data_nodes[query_items_indices]:
+        query_item_indices: DataNodeResultIndices = data_nodes[idx]
+        results.add(
+            MainResult(
+                url=data_nodes[query_item_indices["link"]],
+                title=data_nodes[query_item_indices["title"]],
+                content=data_nodes[query_item_indices["snippet"]],
+            )
+        )
+
+    return results
+
+
+def fetch_traits(engine_traits: EngineTraits) -> None:
+    """Fetch languages and regions from Mullvad-Leta"""
+
+    def extract_table_data(table):
+        for row in table.xpath(".//tr")[2:]:
+            cells = row.xpath(".//td | .//th")  # includes headers and data
+            if len(cells) > 1:  # ensure the column exists
+                cell0 = cells[0].text_content().strip()
+                cell1 = cells[1].text_content().strip()
+                yield [cell0, cell1]
+
     # pylint: disable=import-outside-toplevel
     # see https://github.com/searxng/searxng/issues/762
-    from searx.network import post as http_post
+    from searx.network import get as http_get
 
     # pylint: enable=import-outside-toplevel
 
-    resp = http_post(search_url, headers=assign_headers({}))
+    resp = http_get(f"{search_url}/documentation")
     if not isinstance(resp, Response):
         print("ERROR: failed to get response from mullvad-leta. Are you connected to the VPN?")
         return
     if not resp.ok:
         print("ERROR: response from mullvad-leta is not OK. Are you connected to the VPN?")
         return
 
     dom = html.fromstring(resp.text)
-    if not is_vpn_connected(dom):
-        print('ERROR: Not connected to Mullvad VPN')
-        return
-
-    # supported region codes
-    options = eval_xpath_list(dom.body, '//main/div/form/div[2]/div/select[1]/option')
-    if options is None or len(options) <= 0:
-        print('ERROR: could not find any results. Are you connected to the VPN?')
-
-    for x in options:
-        eng_country = x.get("value")
-
-        sxng_locales = get_official_locales(eng_country, engine_traits.languages.keys(), regional=True)
-
-        if not sxng_locales:
-            print(
-                "ERROR: can't map from Mullvad-Leta country %s (%s) to a babel region."
-                % (x.get('data-name'), eng_country)
-            )
-            continue
-
-        for sxng_locale in sxng_locales:
-            engine_traits.regions[region_tag(sxng_locale)] = eng_country
+
+    # There are 4 HTML tables on the documentation page for extracting information:
+    # 0. Keyboard Shortcuts
+    # 1. Query Parameters (shoutout to Mullvad for accessible docs for integration)
+    # 2. Country Codes [Country, Code]
+    # 3. Language Codes [Language, Code]
+    tables = eval_xpath_list(dom.body, "//table")
+    if tables is None or len(tables) <= 0:
+        print("ERROR: could not find any tables. Was the page updated?")
+
+    language_table = tables[3]
+    lang_map = {
+        "zh-hant": "zh_Hans",
+        "zh-hans": "zh_Hant",
+        "jp": "ja",
+    }
+    for language, code in extract_table_data(language_table):
+        locale_tag = lang_map.get(code, code).replace("-", "_")  # type: ignore
+        try:
+            locale = babel.Locale.parse(locale_tag)
+        except babel.UnknownLocaleError:
+            print(f"ERROR: Mullvad-Leta language {language} ({code}) is unknown by babel")
+            continue
+        sxng_tag = language_tag(locale)
+        engine_traits.languages[sxng_tag] = code
+
+    country_table = tables[2]
+    country_map = {
+        "cn": "zh-CN",
+        "hk": "zh-HK",
+        "jp": "ja-JP",
+        "my": "ms-MY",
+        "tw": "zh-TW",
+        "uk": "en-GB",
+        "us": "en-US",
+    }
+    for country, code in extract_table_data(country_table):
+        sxng_tag = country_map.get(code)
+        if sxng_tag:
+            engine_traits.regions[sxng_tag] = code
+            continue
+        try:
+            locale = babel.Locale.parse(f"{code.lower()}_{code.upper()}")
+        except babel.UnknownLocaleError:
+            locale = None
+        if locale:
+            engine_traits.regions[region_tag(locale)] = code
+            continue
+        official_locales = get_official_locales(code, engine_traits.languages.keys(), regional=True)
+        if not official_locales:
+            print(f"ERROR: Mullvad-Leta country '{code}' ({country}) could not be mapped as expected.")
+            continue
+        for locale in official_locales:
+            engine_traits.regions[region_tag(locale)] = code
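
For reviewers who have not seen SvelteKit's ``__data.json`` format before, the
following self-contained sketch illustrates the index indirection the new
``response()`` decodes.  The payload is hypothetical and heavily trimmed
(invented for this illustration; a real Leta response contains many more
entries), only its shape matters:

.. code:: python

   payload = {
       "nodes": [
           None,                     # node 0: unused
           {"data": ["connected"]},  # node 1: connection info, not useful
           {                         # node 2: the query data read by response()
               "data": [
                   {"q": 1, "items": 2},                   # 0: metadata, values are indices
                   "example query",                        # 1: the query string
                   [3],                                    # 2: list of result indices
                   {"link": 4, "title": 5, "snippet": 6},  # 3: one result, again indices
                   "https://example.org/",                 # 4: url
                   "Example title",                        # 5: title
                   "Example snippet",                      # 6: content
               ]
           },
       ]
   }

   data_nodes = payload["nodes"][2]["data"]
   meta = data_nodes[0]  # cf. DataNodeQueryMetaDataIndices
   for result_idx in data_nodes[meta["items"]]:
       item = data_nodes[result_idx]  # cf. DataNodeResultIndices
       print(data_nodes[item["link"]], data_nodes[item["title"]], data_nodes[item["snippet"]])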

--- a/searx/settings.yml
+++ b/searx/settings.yml

@@ -1420,14 +1420,21 @@ engines:
     require_api_key: false
     results: JSON
 
-  # read https://docs.searxng.org/dev/engines/online/mullvad_leta.html
-  # - name: mullvadleta
-  #   engine: mullvad_leta
-  #   leta_engine: google  # choose one of the following: google, brave
-  #   use_cache: true  # Only 100 non-cache searches per day, suggested only for private instances
-  #   search_url: https://leta.mullvad.net
-  #   categories: [general, web]
-  #   shortcut: ml
+  # https://docs.searxng.org/dev/engines/online/mullvad_leta.html
+  - name: mullvadleta
+    engine: mullvad_leta
+    disabled: true
+    leta_engine: google
+    categories: [general, web]
+    shortcut: ml
+
+  - name: mullvadleta brave
+    engine: mullvad_leta
+    network: mullvadleta
+    disabled: true
+    leta_engine: brave
+    categories: [general, web]
+    shortcut: mlb
 
   - name: odysee
     engine: odysee