1829 Commits

Author SHA1 Message Date
Markus Heiser
9006866019
[fix] engine archlinux: avoid Anubis challenge by User-Agent "SearXNG" (#4779)
Of the archlinux wikis, only wiki.archlinux.org has an Anubis challenge.

About Anubis[1]:

> Anubis decides to present a challenge using this logic:
>
> - User-Agent contains "Mozilla"
> ...
> This should ensure that git clients, RSS readers, and other low-harm clients
> can get through without issue ..

[1] 6c0ff3f4d5/docs/docs/design/how-anubis-works.mdx (challenge-presentation)
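
The quoted rule can be sketched as follows. This is a simplified illustration that models only the single condition quoted above, not Anubis's full decision logic; the function name ``would_challenge`` and the browser User-Agent string are hypothetical.

```python
def would_challenge(user_agent: str) -> bool:
    """Return True if the quoted rule would present a challenge.

    Assumption: only the 'User-Agent contains "Mozilla"' condition is
    modelled; the conditions elided in the quote are not.
    """
    return "Mozilla" in user_agent


# A browser-like agent (hypothetical example string) matches the rule ...
browser_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
challenged_browser = would_challenge(browser_ua)

# ... while the agent used by this fix does not.
challenged_searxng = would_challenge("SearXNG")
```

Under this rule, a request that identifies itself as ``SearXNG`` is treated like the low-harm clients mentioned in the quote.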


Suggested-by: @unixfox https://github.com/searxng/searxng/issues/4646#issuecomment-2855322406
Closes: https://github.com/searxng/searxng/issues/4646

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-05-13 10:18:28 +02:00
Markus Heiser
bdfe1c2a15 [mod] engines: migration of the individual cache solutions to EngineCache
The EngineCache class replaces all previously individual solutions for caches in
the context of the engines.

- demo_offline.py
- duckduckgo.py
- radio_browser.py
- soundcloud.py
- startpage.py
- wolframalpha_api.py
- wolframalpha_noapi.py

Search term to test most of the modified engines::

    !ddg !rb !sc !sp !wa test

    !ddg !rb !sc !sp !wa foo

For introspection of the DB, jump into the developer environment and run the
command that shows the cache state::

    $ ./manage pyenv.cmd bash --norc --noprofile
    (py3) python -m searx.enginelib cache state

    cache tables and key/values
    ===========================
    [demo_offline        ] 2025-04-22 11:32:50 count        --> (int) 4
    [startpage           ] 2025-04-22 12:32:30 SC_CODE      --> (str) fSOBnhEMlDfE20
    [duckduckgo          ] 2025-04-22 12:32:31 4dff493e.... --> (str) 4-128634958369380006627592672385352473325
    [duckduckgo          ] 2025-04-22 12:40:06 3e2583e2.... --> (str) 4-263126175288871260472289814259666848451
    [radio_browser       ] 2025-04-23 11:33:08 servers      --> (list) ['https://de2.api.radio-browser.info',  ...]
    [soundcloud          ] 2025-04-29 11:40:06 guest_client_id --> (str) EjkRJG0BLNEZquRiPZYdNtJdyGtTuHdp
    [wolframalpha        ] 2025-04-22 12:40:06 code         --> (str) 5aa79f86205ad26188e0e26e28fb7ae7
    number of tables: 6
    number of key/value pairs: 7

In the "cache tables and key/values" section, the first column is the table
name (engine name), the second is the calculated expiry date, and the third and
fourth show the key and its value.

About duckduckgo: The *vqd code* of ddg depends on the query term, so the key
is a hash value of the query term (to avoid storing the raw query term).
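
A minimal sketch of deriving such a key, assuming SHA-256 purely for illustration (the hash function actually used by the engine is not stated here, and ``cache_key`` is a hypothetical helper):

```python
import hashlib


def cache_key(query: str) -> str:
    """Derive a cache key from the query term without keeping the raw term.

    Assumption: SHA-256 is used for illustration only; the engine may use a
    different hash.
    """
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


# The same query always maps to the same key, so a cached value can be found
# again, while the stored key does not reveal the query text.
key_test = cache_key("test")
key_foo = cache_key("foo")
```

Because the key is deterministic, repeated searches for the same term hit the same cache entry.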

In the "properties of ENGINES_CACHE" section all properties of the SQLiteAppl /
ExpireCache and their last modification date are shown::

    properties of ENGINES_CACHE
    ===========================
    [last modified: 2025-04-22 11:32:27] DB_SCHEMA           : 1
    [last modified: 2025-04-22 11:32:27] LAST_MAINTENANCE    :
    [last modified: 2025-04-22 11:32:27] crypt_hash          : ca612e3566fdfd7cf7efe2b1c9349f461158d07cb78a3750e5c5be686aa8ebdc
    [last modified: 2025-04-22 11:32:30] CACHE-TABLE--demo_offline: demo_offline
    [last modified: 2025-04-22 11:32:30] CACHE-TABLE--startpage: startpage
    [last modified: 2025-04-22 11:32:31] CACHE-TABLE--duckduckgo: duckduckgo
    [last modified: 2025-04-22 11:33:08] CACHE-TABLE--radio_browser: radio_browser
    [last modified: 2025-04-22 11:40:06] CACHE-TABLE--soundcloud: soundcloud
    [last modified: 2025-04-22 11:40:06] CACHE-TABLE--wolframalpha: wolframalpha

These properties provide information about the state of the ExpireCache and
control the behavior.  For example, the maintenance intervals are controlled by
the last modification date of the LAST_MAINTENANCE property and the hash value
of the password can be used to detect whether the password has been changed (in
this case the DB entries can no longer be decrypted and the entire cache must be
discarded).
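
The password check described above can be sketched like this. It is an illustration under assumptions, not the ExpireCache implementation: the helper names are hypothetical and SHA-256 stands in for whatever hash is stored in ``crypt_hash``.

```python
import hashlib


def password_hash(password: str) -> str:
    """Hash of the password, as it might be stored in a crypt_hash property.

    Assumption: SHA-256 is an illustrative stand-in.
    """
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def cache_is_usable(stored_hash: str, current_password: str) -> bool:
    """True if the stored hash still matches the current password.

    If it does not match, existing entries can no longer be decrypted and
    the whole cache has to be discarded.
    """
    return stored_hash == password_hash(current_password)


stored = password_hash("old-secret")              # value saved with the cache
unchanged = cache_is_usable(stored, "old-secret")  # password not changed
changed = cache_is_usable(stored, "new-secret")    # password was changed
```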

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-05-03 08:39:12 +02:00
Bnyro
590b211652 [fix] semantic scholar: method not allowed / engine doesn't work
Fixes the semantic scholar engine by extracting a ui version token.

BTW: remove html tags from the content.

Author's checklist:

- they rate-limit very quickly: with more than approx. 2 requests per
  minute, you have to wait some time before trying again...

- they also have an official API at api.semanticscholar.org, but its rate
  limits are even stricter

Closes: https://github.com/searxng/searxng/issues/4685
2025-05-02 16:46:38 +02:00
BrandonStudio
d47cf9db24 [feat] engine ChinaSo: support source filter for ChinaSo-News
* filtering ChinaSo-News results by source, option ``chinaso_news_source``
* add ChinaSo engine to the online docs https://docs.searxng.org/dev/engines/online/chinaso.html
* fix SearXNG categories in the settings.yml
* deactivate ChinaSo engines ``inactive: true`` until [1] is fixed
* configure network of the ChinaSo engines

[1] https://github.com/searxng/searxng/issues/4694

Signed-off-by: @BrandonStudio
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
2025-05-02 14:22:51 +02:00
Bnyro
fd33559cfb [fix] brave: fix images and videos engines 2025-04-30 08:28:04 +02:00
Denperidge
60e31eacfc [fix] pdia: dynamically fetch API key config file location
As suggested by @Bnyro at
https://github.com/searxng/searxng/pull/4652#discussion_r2055760390 !
2025-04-29 20:45:08 +02:00
Markus Heiser
c20038e7c3 [fix] engine yahoo: replace fetch_traits by a list of languages
The Yahoo engine's fetch_traits function has been encountering an error in CI
jobs for several months [1], thus aborting the process for all other engines as
well.

The language selection dialog (which fetch_traits calls) requires an `EuConsent`
cookie. Strangely, the cookie is not needed for searching, which is why the
engine itself still works.

Since Yahoo won't be conquering any new marketplaces in the foreseeable future,
it should be sufficient to hard-code the list of currently available
languages (`yahoo_languages`).

[1] https://github.com/searxng/searxng/actions/runs/14720458830/job/41313149268

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-04-29 08:48:56 +02:00
Zhijie He
8595e467ce [fix] fix Quark engine calling 2025-04-24 16:17:34 +02:00
Markus Heiser
f45d4145e6 [fix] typo in soundcloud engine
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-04-23 18:42:40 +02:00
Grant Lanham
851c0e5cc0 [fix] engine: re-implement mullvad leta integration
Re-writes the Mullvad Leta integration to work with the new breaking changes.

Mullvad Leta is a search engine proxy.  Currently Leta only offers text search
results, not image, news or any other type of search result.  Leta acts as a
proxy to Google and Brave search results.

- Remove docstring comments regarding requiring the use of Mullvad VPN, which is
  no longer a hard requirement.

- configured two engines: ``mullvadleta`` (uses google) and
  ``mullvadleta brave`` (uses brave)

- since leta may not provide up-to-date search results, both search engines are
  disabled by default.

.. hint::

   Leta caches each search for up to 30 days.  For example, if you use search
   terms like ``news``, contrary to your intention you'll get very old results!

Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Signed-off-by: Grant Lanham <contact@grantlanham.com>
2025-04-23 14:06:32 +02:00
Zhijie He
808dcaf1e2 [feat] engine: add Steam engine 2025-04-18 09:30:17 +02:00
Zhijie He
f94802f2d2 [feat] engines: add Hugging Face engine 2025-04-17 16:43:32 +02:00
Tommaso Colella
d1c584b961 [feat] engine: add engine for italian press agency ansa 2025-04-17 15:33:57 +02:00
RobinFrcd
087da66565 [feat] add SensCritique (FR) engine
Closes: https://github.com/searxng/searxng/issues/4623
2025-04-17 10:19:22 +02:00
Tommaso Colella
391bb1268d [feat] engine: add microsoft learn engine 2025-04-12 11:14:13 +02:00
grasdk
8ee51cc0f3 [fix] engine dokuwiki: basedir duplication
Dokuwiki searches behind a reverse proxy had a duplicated base path in the URL,
producing a wrong URL.

This patch replaces string concatenation of URLs with urljoin [1] from
urllib.parse.  This eliminates the duplication problem, while retaining the old
functionality designed to concatenate protocol, hostname and port (as base_url)
with path.

[1] https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urljoin
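
The difference between the two approaches can be sketched as follows. The base URL and path are hypothetical values, chosen on the assumption that the path returned for a result already contains the base path.

```python
from urllib.parse import urljoin

# Hypothetical values: a wiki served under a sub-path behind a reverse proxy.
base_url = "http://localhost:8090/wiki/"
path = "/wiki/doku.php?id=start"

# Plain string concatenation repeats the base path.
concatenated = base_url.rstrip("/") + path

# urljoin treats the absolute path as relative to the host, so the base path
# appears only once.
joined = urljoin(base_url, path)

# With a base_url that has no sub-path, urljoin behaves like the old
# concatenation of protocol, hostname and port with the path.
plain = urljoin("http://localhost:8090", "/doku.php?id=start")
```

Here ``concatenated`` is ``http://localhost:8090/wiki/wiki/doku.php?id=start``, while ``joined`` is ``http://localhost:8090/wiki/doku.php?id=start``.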

Closes: https://github.com/searxng/searxng/issues/4598
2025-04-11 09:47:25 +02:00
Markus Heiser
15384e8fc5 [fix] make docs - ERROR: Unknown target name: "auth_key"
BTW: fix a bug with sys.path: repo-root (not util) needs to be added to generate
autodoc from scripts in ./searxng_extra

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-04-09 17:28:18 +02:00
Markus Heiser
b146b745a7 [fix] Meilisearch engine: Authorization Token When Integrating Meilisearch
`X-Meili-API-Key` has been changed to `Authorization` [1]

[1] https://www.meilisearch.com/docs/reference/api/overview#authorization
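
A minimal sketch of building the request headers with the new scheme, assuming the ``Bearer`` form described in [1]; the helper name ``meili_headers`` and the key value are hypothetical and not taken from the engine code.

```python
def meili_headers(auth_key: str) -> dict:
    """Build request headers for a Meilisearch query.

    Assumption: the key is sent as "Authorization: Bearer <key>"; when no key
    is configured, no Authorization header is added.
    """
    headers = {"Content-Type": "application/json"}
    if auth_key:
        headers["Authorization"] = f"Bearer {auth_key}"
    return headers


with_key = meili_headers("example-key")  # hypothetical key value
without_key = meili_headers("")
```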

Suggested-by: https://github.com/searxng/searxng/issues/4416#issuecomment-2781254841
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-04-07 08:44:00 +02:00
Markus Heiser
8c8aba8cf5 [fix] engine radio browser: get servers from DNS api.radio-browser.info
Do a DNS lookup of 'all.api.radio-browser.info', add a reverse lookup and
randomly select a URL from the available servers
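
The lookup steps can be sketched with the standard library as below. This is an illustration, not the engine's code: the function names are hypothetical, and the resolver functions are passed in as parameters so the logic can be exercised without network access.

```python
import random
import socket


def lookup_servers(
    hostname="all.api.radio-browser.info",
    getaddrinfo=socket.getaddrinfo,
    gethostbyaddr=socket.gethostbyaddr,
):
    """Resolve hostname to IPs, reverse-resolve each IP, return server URLs."""
    # Forward lookup: collect the distinct IP addresses behind the name.
    ips = {info[4][0] for info in getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)}
    servers = set()
    for ip in ips:
        try:
            # Reverse lookup: the first tuple element is the primary host name.
            name = gethostbyaddr(ip)[0]
        except OSError:
            continue  # skip addresses without a reverse entry
        servers.add(f"https://{name}")
    return sorted(servers)


def pick_server(servers):
    """Select one of the available servers at random."""
    return random.choice(servers)
```

Calling ``pick_server(lookup_servers())`` performs real DNS queries; caching the resulting list avoids repeating them on every search.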

Closes: https://github.com/searxng/searxng/issues/4576
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-04-06 18:59:10 +02:00
Markus Heiser
e6308b8167 [fix] hardening against arguments of type None, where str or dict is expected
On a long-running server, the tracebacks below can be found (albeit rarely),
which indicate problems with NoneType where a string or another data type is
expected.

result.img_src::

    File "/usr/local/searxng/searxng-src/searx/templates/simple/result_templates/images.html", line 13, in top-level template code
      <img src="" data-src="{{ image_proxify(result.img_src) }}" alt="{{ result.title|striptags }}">{{- "" -}}
      ^
    File "/usr/local/searxng/searxng-src/searx/webapp.py", line 284, in image_proxify
      if url.startswith('//'):
         ^^^^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'startswith'

result.content::

    File "/usr/local/searxng/searxng-src/searx/result_types/_base.py", line 105, in _normalize_text_fields
      result.content = WHITESPACE_REGEX.sub(" ", result.content).strip()
                       ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
    TypeError: expected string or bytes-like object, got 'NoneType'

html_to_text, when html_str is a NoneType::

    File "/usr/local/searxng/searxng-src/searx/engines/wikipedia.py", line 190, in response
      title = utils.html_to_text(api_result.get('titles', {}).get('display') or api_result.get('title'))
    File "/usr/local/searxng/searxng-src/searx/utils.py", line 158, in html_to_text
      html_str = html_str.replace('\n', ' ').replace('\r', ' ')
                 ^^^^^^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'replace'

presearch engine, when json_resp is a NoneType::

    File "/usr/local/searxng/searxng-src/searx/engines/presearch.py", line 221, in response
      results = parse_search_query(json_resp.get('results'))
    File "/usr/local/searxng/searxng-src/searx/engines/presearch.py", line 161, in parse_search_query
      for item in json_results.get('specialSections', {}).get('topStoriesCompact', {}).get('data', []):
                  ^^^^^^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'get'
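
The kind of guard that avoids these tracebacks can be sketched as follows. The function names are hypothetical and this is not the actual patch. Note that ``dict.get(key, {})`` still returns ``None`` when the key exists with a ``None`` value, which is why the sketch uses ``or`` instead of a default argument.

```python
def html_to_text_safe(html_str):
    """Hypothetical guard: treat None like an empty string."""
    if html_str is None:
        return ""
    return html_str.replace("\n", " ").replace("\r", " ")


def top_stories(json_results):
    """Hypothetical guard for nested lookups where any level may be None."""
    json_results = json_results or {}
    sections = json_results.get("specialSections") or {}
    compact = sections.get("topStoriesCompact") or {}
    return compact.get("data") or []


empty_text = html_to_text_safe(None)
no_response = top_stories(None)
null_section = top_stories({"specialSections": None})
```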

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-04-01 11:13:47 +02:00
Zhijie He
7b4612e862 [feat] engines: add Ollama engine 2025-03-30 14:25:58 +02:00
Bnyro
9ffa9fb730 [feat] engines: add reuters news engine 2025-03-30 13:56:09 +02:00
Tommaso Colella
5daa4f0460 [feat] engine: add engine for italian online newspaper "il post" 2025-03-30 13:45:06 +02:00
Zhijie He
33661cc5c3 [feat] engines: add Quark engine
Co-authored-by: Bnyro <bnyro@tutanota.com>
2025-03-30 13:20:35 +02:00
Zhijie He
b231cb4b59 [feat] engines: add Niconico videos engine
Co-authored-by: Bnyro <bnyro@tutanota.com>
2025-03-30 12:42:31 +02:00
naughtymommy42069
c8b419fcbb [feat] engine: add bitchute 2025-03-30 12:41:43 +02:00
Aadniz
ecee73eafd [fix] presearch engine: Unexpected crash if duration not in videos 2025-03-28 16:26:39 +01:00
Markus Heiser
150b2e21fd [fix] make docs -> ERROR: Unknown target name: "google: max 50 pages".
Fix the issues reported by sphinx build::

    docstring of searx.engines.google.max_page:1: ERROR: Unknown target name: "google: max 50 pages".
    docstring of searx.engines.google_images.max_page:1: ERROR: Unknown target name: "google: max 50 pages".
    docstring of searx.engines.google_scholar.max_page:1: ERROR: Unknown target name: "google: max 50 pages".

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-03-27 06:57:28 +01:00
Aadniz
02f5002a5f [fix] baidu engine: properly decoding HTML escape codes 2025-03-27 06:11:39 +01:00
Bnyro
4dfc47584d [refactor] duration strings: move parsing logic to utils.py 2025-03-25 16:48:44 +01:00
Bnyro
c28d35c7fc [fix] duckduckgo news: unescaped html sequences in description 2025-03-25 16:14:36 +01:00
Ikko Eltociear Ashimine
2482646323 [fix] typo in doc-str: offical -> official 2025-03-21 11:05:54 +01:00
Bnyro
b75e56afe6 [fix] duckduckgo: answer sometimes contains faulty (duplicated) url 2025-03-21 07:48:30 +01:00
Bnyro
3668c7012e [fix] presearch videos: item description and duration are located in metadata field 2025-03-20 20:55:09 +01:00
Aadniz
556db857aa [fix] presearch engine: News and Videos formatted incorrectly 2025-03-20 20:44:43 +01:00
Tan Yong Sheng
40feede51e [fix] engine: core.ac.uk implement API v3 / v2 is no longer supported 2025-03-19 17:51:00 +01:00
Bnyro
babbe9e1ae [fix] duckduckgo: show proper source url of answers 2025-03-18 05:31:28 +01:00
Bnyro
885d02c8c3 [feat] engine: add selfh.st/icons for logos of common self-hosted programs 2025-03-17 20:23:54 +01:00
Bnyro
bbb2894b04 [engine] elasticsearch: add pagination support 2025-03-16 22:10:05 +01:00
Markus Heiser
a1d5add718 fixup! [fix] fix invalid escape error in Baidu Images & default config typo 2025-03-15 17:14:54 +01:00
Zhijie He
38caa49540 [fix] fix invalid escape error in Baidu Images & default config typo 2025-03-15 17:14:54 +01:00
Zhijie He
4ce7f1accc [feat]: engines add images & kaifa from baidu.com 2025-03-15 17:14:54 +01:00
Markus Heiser
f49b2c94a9 [mod] migrate all key-value.html templates to KeyValue type
The engines now all use KeyValue results and return the results in an
EngineResults object.

The sqlite engine can return MainResult results in addition to KeyValue
results (based on the engine's config in settings.yml).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-03-15 10:36:33 +01:00
Aadniz
a88b4d7036 [fix] presearch engine: domain sometimes included in beginning of titles 2025-03-08 12:39:16 +01:00
Austin-Olacsi
73d50f5748 [feat] add bilibili support to get_embeded_stream_url 2025-03-08 10:47:30 +01:00
Aadniz
4884747508 [fix] presearch engine: Title showing <em> html code 2025-03-07 21:24:35 +01:00
Markus Heiser
eb3633629a [fix] set language for engines from chinese market (no i18n index nor UI)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-03-07 19:59:13 +01:00
Loris
02b76c8389 [fix] engine qwant: add tgp and llm arguments to avoid CAPTCHA 2025-03-07 18:58:45 +01:00
Markus Heiser
08a90d46d6 [doc] add missing docs for the search.max_page setting
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-03-07 10:07:41 +01:00
Bubu
b8671c7a4a [feat] engines: add baidu (general) 2025-03-07 06:59:28 +01:00