12 Commits

Author SHA1 Message Date
Tommaso Colella
99ad69450d [fix] Result type: remove rstrip "/" form url normalization 2025-04-17 10:24:05 +02:00
Markus Heiser
e6308b8167 [fix] hardening against arguments of type None, where str or dict is expected
On a long-running server, the tracebacks below can be found (albeit rarely),
which indicate problems with NoneType where a string or another data type is
expected.

result.img_src::

    File "/usr/local/searxng/searxng-src/searx/templates/simple/result_templates/images.html", line 13, in top-level template code
      <img src="" data-src="{{ image_proxify(result.img_src) }}" alt="{{ result.title|striptags }}">{{- "" -}}
      ^
    File "/usr/local/searxng/searxng-src/searx/webapp.py", line 284, in image_proxify
      if url.startswith('//'):
         ^^^^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'startswith'

result.content::

    File "/usr/local/searxng/searxng-src/searx/result_types/_base.py", line 105, in _normalize_text_fields
      result.content = WHITESPACE_REGEX.sub(" ", result.content).strip()
                       ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
    TypeError: expected string or bytes-like object, got 'NoneType'

html_to_text, when html_str is a NoneType::

    File "/usr/local/searxng/searxng-src/searx/engines/wikipedia.py", line 190, in response
      title = utils.html_to_text(api_result.get('titles', {}).get('display') or api_result.get('title'))
    File "/usr/local/searxng/searxng-src/searx/utils.py", line 158, in html_to_text
      html_str = html_str.replace('\n', ' ').replace('\r', ' ')
                 ^^^^^^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'replace'

presearch engine, when json_resp is a NoneType::

    File "/usr/local/searxng/searxng-src/searx/engines/presearch.py", line 221, in response
      results = parse_search_query(json_resp.get('results'))
    File "/usr/local/searxng/searxng-src/searx/engines/presearch.py", line 161, in parse_search_query
      for item in json_results.get('specialSections', {}).get('topStoriesCompact', {}).get('data', []):
                  ^^^^^^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'get'

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-04-01 11:13:47 +02:00
Bnyro
9ffa9fb730 [feat] engines: add reuters news engine 2025-03-30 13:56:09 +02:00
Markus Heiser
50f92779bd [refactor] migrate plugins from "module" to class SXNGPlugin
This patch brings two major changes:

- ``Result.filter_urls(..)`` to pass a filter function for URL fields
- The ``enabled_plugins:`` section in SearXNG's settings do no longer exists.

To understand plugin development compile documentation:

    $ make docs.clean docs.live

and read http://0.0.0.0:8000/dev/plugins/development.html

There is no longer a distinction between built-in and external plugin, all
plugins are registered via the settings in the ``plugins:`` section.

In SearXNG, plugins can be registered via a fully qualified class name.  A
configuration (`PluginCfg`) can be transferred to the plugin, e.g. to activate
it by default / *opt-in* or *opt-out* from user's point of view.

built-in plugins
================

The built-in plugins are all located in the namespace `searx.plugins`.

.. code:: yaml

    plugins:

      searx.plugins.calculator.SXNGPlugin:
        active: true

      searx.plugins.hash_plugin.SXNGPlugin:
        active: true

      searx.plugins.self_info.SXNGPlugin:
        active: true

      searx.plugins.tracker_url_remover.SXNGPlugin:
        active: true

      searx.plugins.unit_converter.SXNGPlugin:
        active: true

      searx.plugins.ahmia_filter.SXNGPlugin:
        active: true

      searx.plugins.hostnames.SXNGPlugin:
        active: true

      searx.plugins.oa_doi_rewrite.SXNGPlugin:
        active: false

      searx.plugins.tor_check.SXNGPlugin:
        active: false

external plugins
================

SearXNG supports *external plugins* / there is no need to install one, SearXNG
runs out of the box.

- Only show green hosted results: https://github.com/return42/tgwf-searx-plugins/

To get a developer installation in a SearXNG developer environment:

.. code:: sh

   $ git clone git@github.com:return42/tgwf-searx-plugins.git
   $ ./manage pyenv.cmd python -m \
         pip install -e tgwf-searx-plugins

To register the plugin in SearXNG add ``only_show_green_results.SXNGPlugin`` to
the ``plugins:``:

.. code:: yaml

    plugins:
      # ...
      only_show_green_results.SXNGPlugin:
        active: false

Result.filter_urls(..)
======================

The ``Result.filter_urls(..)`` can be used to filter and/or modify URL fields.
In the following example, the filter function ``my_url_filter``:

.. code:: python

   def my_url_filter(result, field_name, url_src) -> bool | str:
       if "google" in url_src:
           return False              # remove URL field from result
       if "facebook" in url_src:
           new_url = url_src.replace("facebook", "fb-dummy")
           return new_url            # return modified URL
       return True                   # leave URL in field unchanged

is applied to all URL fields in the :py:obj:`Plugin.on_result` hook:

.. code:: python

   class MyUrlFilter(Plugin):
       ...
       def on_result(self, request, search, result) -> bool:
           result.filter_urls(my_url_filter)
           return True

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-03-29 10:16:43 +01:00
Markus Heiser
96a6e3dcb2 [fix] Results.url: don't normalize www.example.com to example.com
Hostname "www" in URL results can't be normalized to an empty string:

- https://www.tu-darmstadt.de/
- https://tu-darmstadt.de/

Reported-By: @Bnyro <bnyro@tutanota.com>
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-03-21 10:28:34 +01:00
Markus Heiser
af5dbdf768 [mod] typification of SearXNG: add new result type KeyValue
This patch adds a new result type: KeyValue

- Python class:   searx/result_types/keyvalue.py
- Jinja template: searx/templates/simple/result_templates/keyvalue.html
- CSS (less)      client/simple/src/less/result_types/keyvalue.less

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-03-15 10:36:33 +01:00
Markus Heiser
8769b7c6d6 [refactor] typification of SearXNG (MainResult) / result items (part 2)
The class ReslutContainer has been revised, it can now handle the typed Result
items of classes:

- MainResult
- LegacyResult (a dict wrapper for backward compatibility)

Due to the now complete typing of theses three clases, instead of the *getitem*
accesses, the fields can now be accessed directly via attributes (which is also
supported by the IDE).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-03-15 10:36:33 +01:00
Markus Heiser
a235c54f8c [mod] rudimentary implementation of a MainResult type
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-01-29 05:04:41 +01:00
Markus Heiser
df3344e5d5 [fix] JSON & CSV format return error 500
For CSV and JSON output, the LegacyResult and the Result objects needs to be
converted to a python dictionary.

Closes: https://github.com/searxng/searxng/issues/4244
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-01-29 05:04:41 +01:00
Markus Heiser
906b9e7d4c [fix] hostnames plugin: AttributeError: 'NoneType' object has no attribute 'netloc'
Closes: https://github.com/searxng/searxng/issues/4245
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-01-28 16:28:12 +01:00
Markus Heiser
36a1ef1239 [refactor] typification of SearXNG / EngineResults
In [1] and [2] we discussed the need of a Result.results property and how we can
avoid unclear code.  This patch implements a class for the reslut-lists of
engines::

    searx.result_types.EngineResults

A simple example for the usage in engine development::

    from searx.result_types import EngineResults
    ...
    def response(resp) -> EngineResults:
        res = EngineResults()
        ...
        res.add( res.types.Answer(answer="lorem ipsum ..", url="https://example.org") )
        ...
        return res

[1] https://github.com/searxng/searxng/pull/4183#pullrequestreview-257400034
[2] https://github.com/searxng/searxng/pull/4183#issuecomment-2614301580
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-01-28 07:07:08 +01:00
Markus Heiser
edfbf1e118 [refactor] typification of SearXNG (initial) / result items (part 1)
Typification of SearXNG
=======================

This patch introduces the typing of the results.  The why and how is described
in the documentation, please generate the documentation ..

    $ make docs.clean docs.live

and read the following articles in the "Developer documentation":

- result types --> http://0.0.0.0:8000/dev/result_types/index.html

The result types are available from the `searx.result_types` module.  The
following have been implemented so far:

- base result type: `searx.result_type.Result`
  --> http://0.0.0.0:8000/dev/result_types/base_result.html

- answer results
  --> http://0.0.0.0:8000/dev/result_types/answer.html

including the type for translations (inspired by #3925).  For all other
types (which still need to be set up in subsequent PRs), template documentation
has been created for the transition period.

Doc of the fields used in Templates
===================================

The template documentation is the basis for the typing and is the first complete
documentation of the results (needed for engine development).  It is the
"working paper" (the plan) with which further typifications can be implemented
in subsequent PRs.

- https://github.com/searxng/searxng/issues/357

Answer Templates
================

With the new (sub) types for `Answer`, the templates for the answers have also
been revised, `Translation` are now displayed with collapsible entries (inspired
by #3925).

    !en-de dog

Plugins & Answerer
==================

The implementation for `Plugin` and `Answer` has been revised, see
documentation:

- Plugin: http://0.0.0.0:8000/dev/plugins/index.html
- Answerer: http://0.0.0.0:8000/dev/answerers/index.html

With `AnswerStorage` and `AnswerStorage` to manage those items (in follow up
PRs, `ArticleStorage`, `InfoStorage` and .. will be implemented)

Autocomplete
============

The autocompletion had a bug where the results from `Answer` had not been shown
in the past.  To test activate autocompletion and try search terms for which we
have answerers

- statistics: type `min 1 2 3` .. in the completion list you should find an
  entry like `[de] min(1, 2, 3) = 1`

- random: type `random uuid` .. in the completion list, the first item is a
  random UUID

Extended Types
==============

SearXNG extends e.g. the request and response types of flask and httpx, a module
has been set up for type extensions:

- Extended Types
  --> http://0.0.0.0:8000/dev/extended_types.html

Unit-Tests
==========

The unit tests have been completely revised.  In the previous implementation,
the runtime (the global variables such as `searx.settings`) was not initialized
before each test, so the runtime environment with which a test ran was always
determined by the tests that ran before it.  This was also the reason why we
sometimes had to observe non-deterministic errors in the tests in the past:

- https://github.com/searxng/searxng/issues/2988 is one example for the Runtime
  issues, with non-deterministic behavior ..

- https://github.com/searxng/searxng/pull/3650
- https://github.com/searxng/searxng/pull/3654
- https://github.com/searxng/searxng/pull/3642#issuecomment-2226884469
- https://github.com/searxng/searxng/pull/3746#issuecomment-2300965005

Why msgspec.Struct
==================

We have already discussed typing based on e.g. `TypeDict` or `dataclass` in the past:

- https://github.com/searxng/searxng/pull/1562/files
- https://gist.github.com/dalf/972eb05e7a9bee161487132a7de244d2
- https://github.com/searxng/searxng/pull/1412/files
- https://github.com/searxng/searxng/pull/1356

In my opinion, TypeDict is unsuitable because the objects are still dictionaries
and not instances of classes / the `dataclass` are classes but ...

The `msgspec.Struct` combine the advantages of typing, runtime behaviour and
also offer the option of (fast) serializing (incl. type check) the objects.

Currently not possible but conceivable with `msgspec`: Outsourcing the engines
into separate processes, what possibilities this opens up in the future is left
to the imagination!

Internally, we have already defined that it is desirable to decouple the
development of the engines from the development of the SearXNG core / The
serialization of the `Result` objects is a prerequisite for this.

HINT: The threads listed above were the template for this PR, even though the
implementation here is based on msgspec.  They should also be an inspiration for
the following PRs of typification, as the models and implementations can provide
a good direction.

Why just one commit?
====================

I tried to create several (thematically separated) commits, but gave up at some
point ... there are too many things to tackle at once / The comprehensibility of
the commits would not be improved by a thematic separation. On the contrary, we
would have to make multiple changes at the same places and the goal of a change
would be vaguely recognizable in the fog of the commits.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2025-01-28 07:07:08 +01:00