..  include:: /Includes.rst.txt

..  _crawlers:

========
Crawlers
========

The extension uses crawlers to visit all URLs configured for cache
warmup. Visiting a URL warms up the page cache of the respective page.
This page describes the crawlers that are available by default and
explains how to implement a custom crawler.

..  php:namespace:: EliasHaeussler\CacheWarmup\Crawler

..  php:interface:: Crawler

    Interface for crawlers used to crawl and warm up URLs.

    ..  php:method:: crawl($urls)

        Crawl a given list of URLs.

        :param array $urls: List of URLs to be crawled.
        :returntype: :php:`\EliasHaeussler\CacheWarmup\Result\CacheWarmupResult`

..  _default-crawlers:

Default crawlers
================

The extension ships with two default crawlers:

-   :php:`\EliasHaeussler\Typo3Warming\Crawler\ConcurrentUserAgentCrawler`:
    Used for cache warmup triggered within the **TYPO3 backend**
-   :php:`\EliasHaeussler\Typo3Warming\Crawler\OutputtingUserAgentCrawler`:
    Used for cache warmup executed from the **command-line**

Both crawlers use a custom `User-Agent` header for all warmup
requests. By using this custom header, it is possible to exclude
warmup requests from the statistics of analysis tools, for example.
The header value is generated from an HMAC hash of the string
`TYPO3/tx_warming_crawler`.

The generated header value can be copied from the cache warmup modal
in the TYPO3 backend. Alternatively, the `warming:showuseragent`
command prints the current `User-Agent` header.
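
As a rough illustration of such an exclusion on the application side, the
following sketch compares the `User-Agent` header of an incoming request
with the value printed by `warming:showuseragent`. The function
:php:`isWarmupRequest()`, the placeholder value and the commented-out
tracking call are hypothetical and not part of the extension.

..  code-block:: php

    <?php

    declare(strict_types=1);

    /**
     * Check whether a request was sent by the cache warmup crawler.
     *
     * @param array<string, mixed> $server Server parameters, e.g. $_SERVER
     * @param string $warmupUserAgent Value printed by warming:showuseragent
     */
    function isWarmupRequest(array $server, string $warmupUserAgent): bool
    {
        $userAgent = $server['HTTP_USER_AGENT'] ?? '';

        // Compare in constant time, since the header value contains a hash.
        return is_string($userAgent) && hash_equals($warmupUserAgent, $userAgent);
    }

    // Placeholder only: paste the real value from the backend modal or the command.
    $warmupUserAgent = '<value printed by warming:showuseragent>';

    if (!isWarmupRequest($_SERVER, $warmupUserAgent)) {
        // Hypothetical hook: only regular visitors would reach the analytics code.
        // trackPageView();
    }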

..  _implement-a-custom-crawler:

Implement a custom crawler
==========================

..  _available-interfaces:

Available interfaces
--------------------

The actual cache warmup is done via the library :composer:`eliashaeussler/cache-warmup`.
It provides the :php:interface:`\EliasHaeussler\CacheWarmup\Crawler\Crawler`
interface, which must be implemented when developing your own crawler.
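
A minimal sketch of such an implementation might look as follows. It
requests all URLs one after another using Guzzle. The class name and
namespace are hypothetical. The sketch also assumes that results are
collected via :php:`CacheWarmupResult::addResult()` together with
:php:`CrawlingResult::createSuccessful()` and
:php:`CrawlingResult::createFailed()`, which may differ between versions
of the library, so check the installed sources before relying on it.

..  code-block:: php

    <?php

    declare(strict_types=1);

    namespace Vendor\Extension\Crawler;

    use EliasHaeussler\CacheWarmup\Crawler\Crawler;
    use EliasHaeussler\CacheWarmup\Result\CacheWarmupResult;
    use EliasHaeussler\CacheWarmup\Result\CrawlingResult;
    use GuzzleHttp\Client;
    use GuzzleHttp\Exception\GuzzleException;

    // Hypothetical example class, not shipped with the extension.
    final class SequentialCrawler implements Crawler
    {
        public function crawl(array $urls): CacheWarmupResult
        {
            $client = new Client();
            $result = new CacheWarmupResult();

            foreach ($urls as $url) {
                try {
                    // Guzzle throws on connection errors and on 4xx/5xx responses.
                    $client->request('GET', $url);
                    // Assumption: result objects are created by these factories.
                    $result->addResult(CrawlingResult::createSuccessful($url));
                } catch (GuzzleException $exception) {
                    $result->addResult(CrawlingResult::createFailed($url));
                }
            }

            return $result;
        }
    }

Requesting URLs sequentially keeps the sketch short. The default crawlers
send requests concurrently, which is usually preferable for larger sitemaps.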

..  _verbose-crawlers:

Verbose crawlers
----------------

There is also a :php:interface:`\EliasHaeussler\CacheWarmup\Crawler\VerboseCrawler`
interface that redirects user-oriented output to an instance of
:php:`\Symfony\Component\Console\Output\OutputInterface`.

..  php:namespace:: EliasHaeussler\CacheWarmup\Crawler

..  php:interface:: VerboseCrawler

    Interface that redirects user-oriented output to a given output.

    ..  php:method:: setOutput($output)

        Set output where to redirect user-oriented output.

        :param \Symfony\Component\Console\Output\OutputInterface $output: Output where to redirect user-oriented output.
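
The following sketch shows one possible way to implement this interface.
The class name is hypothetical, and the result handling
(:php:`addResult()`, :php:`createSuccessful()`, :php:`createFailed()`) is
an assumption about the library that should be verified against the
installed version. Until an output is injected, the crawler writes to a
:php:`NullOutput` and therefore stays silent.

..  code-block:: php

    <?php

    declare(strict_types=1);

    namespace Vendor\Extension\Crawler;

    use EliasHaeussler\CacheWarmup\Crawler\VerboseCrawler;
    use EliasHaeussler\CacheWarmup\Result\CacheWarmupResult;
    use EliasHaeussler\CacheWarmup\Result\CrawlingResult;
    use GuzzleHttp\Client;
    use GuzzleHttp\Exception\GuzzleException;
    use Symfony\Component\Console\Output\NullOutput;
    use Symfony\Component\Console\Output\OutputInterface;

    // Hypothetical example class, not shipped with the extension.
    final class VerboseSequentialCrawler implements VerboseCrawler
    {
        private OutputInterface $output;

        public function __construct()
        {
            // Discard all output until setOutput() is called.
            $this->output = new NullOutput();
        }

        public function setOutput(OutputInterface $output): void
        {
            $this->output = $output;
        }

        public function crawl(array $urls): CacheWarmupResult
        {
            $client = new Client();
            $result = new CacheWarmupResult();

            foreach ($urls as $url) {
                try {
                    $client->request('GET', $url);
                    $result->addResult(CrawlingResult::createSuccessful($url));
                    $this->output->writeln(sprintf('<info>OK</info>   %s', $url));
                } catch (GuzzleException $exception) {
                    $result->addResult(CrawlingResult::createFailed($url));
                    $this->output->writeln(sprintf('<error>FAIL</error> %s', $url));
                }
            }

            return $result;
        }
    }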

..  _configurable-crawlers:

Configurable crawlers
---------------------

Custom crawlers can also implement the
:php:interface:`\EliasHaeussler\CacheWarmup\Crawler\ConfigurableCrawler`
interface, allowing users to configure warmup requests themselves.

..  php:namespace:: EliasHaeussler\CacheWarmup\Crawler

..  php:interface:: ConfigurableCrawler

    Interface allowing users to configure warmup requests themselves.

    ..  php:method:: setOptions($options)

        Set custom crawler options.

        :param array $options: Associative array of custom crawler options.
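
As a sketch of how crawler options could be handled, the following
example merges user-provided options into a set of defaults. The class
name and the `timeout` option are hypothetical, the option is simply
passed on to Guzzle's `timeout` request option. The result handling is
again an assumption about the library and may differ between versions.

..  code-block:: php

    <?php

    declare(strict_types=1);

    namespace Vendor\Extension\Crawler;

    use EliasHaeussler\CacheWarmup\Crawler\ConfigurableCrawler;
    use EliasHaeussler\CacheWarmup\Result\CacheWarmupResult;
    use EliasHaeussler\CacheWarmup\Result\CrawlingResult;
    use GuzzleHttp\Client;
    use GuzzleHttp\Exception\GuzzleException;

    // Hypothetical example class, not shipped with the extension.
    final class ConfigurableSequentialCrawler implements ConfigurableCrawler
    {
        /**
         * @var array<string, mixed> Defaults, "timeout" is a made-up option name
         */
        private array $options = ['timeout' => 10.0];

        public function setOptions(array $options): void
        {
            // Keep the defaults for every option the user did not override.
            $this->options = array_merge($this->options, $options);
        }

        public function crawl(array $urls): CacheWarmupResult
        {
            // Request timeout in seconds, applied to every warmup request.
            $client = new Client(['timeout' => (float) $this->options['timeout']]);
            $result = new CacheWarmupResult();

            foreach ($urls as $url) {
                try {
                    $client->request('GET', $url);
                    $result->addResult(CrawlingResult::createSuccessful($url));
                } catch (GuzzleException $exception) {
                    $result->addResult(CrawlingResult::createFailed($url));
                }
            }

            return $result;
        }
    }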

..  seealso::

    `Feature #59 - Introduce configurable crawlers <https://github.com/eliashaeussler/cache-warmup/pull/59>`__
    of :composer:`eliashaeussler/cache-warmup` library

..  _logging-crawlers:

Logging crawlers
----------------

Crawling results can be logged using a dedicated PSR-3 logger. For this, crawlers
must implement the :php:interface:`\EliasHaeussler\CacheWarmup\Crawler\LoggingCrawler` interface
and inject an appropriate PSR-3 logger. In a TYPO3 context, this is usually done using
TYPO3's log manager. Read more about logging in the :ref:`official documentation <t3coreapi:logging>`.

..  php:namespace:: EliasHaeussler\CacheWarmup\Crawler

..  php:interface:: LoggingCrawler

    Interface that allows crawling results to be logged using a dedicated PSR-3 logger.

    ..  php:method:: setLogger($logger)

        Inject PSR-3 compatible logger.

        :param \Psr\Log\LoggerInterface $logger: PSR-3 compatible logger.

    ..  php:method:: setLogLevel($logLevel)

        Set minimum log level.

        :param string $logLevel: The minimum log level.
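
One way to honor the minimum log level is to compare PSR-3 severities
before writing a log entry, as in the following sketch. The class name,
the severity map and the log messages are hypothetical. The result
handling is an assumption about the library and should be checked against
the installed version.

..  code-block:: php

    <?php

    declare(strict_types=1);

    namespace Vendor\Extension\Crawler;

    use EliasHaeussler\CacheWarmup\Crawler\LoggingCrawler;
    use EliasHaeussler\CacheWarmup\Result\CacheWarmupResult;
    use EliasHaeussler\CacheWarmup\Result\CrawlingResult;
    use GuzzleHttp\Client;
    use GuzzleHttp\Exception\GuzzleException;
    use Psr\Log\LoggerInterface;
    use Psr\Log\LogLevel;
    use Psr\Log\NullLogger;

    // Hypothetical example class, not shipped with the extension.
    final class LoggingSequentialCrawler implements LoggingCrawler
    {
        // PSR-3 log levels, ordered from least to most severe.
        private const SEVERITIES = [
            LogLevel::DEBUG => 0,
            LogLevel::INFO => 1,
            LogLevel::NOTICE => 2,
            LogLevel::WARNING => 3,
            LogLevel::ERROR => 4,
            LogLevel::CRITICAL => 5,
            LogLevel::ALERT => 6,
            LogLevel::EMERGENCY => 7,
        ];

        private LoggerInterface $logger;
        private string $logLevel = LogLevel::ERROR;

        public function __construct()
        {
            // Log nothing until a logger is injected.
            $this->logger = new NullLogger();
        }

        public function setLogger(LoggerInterface $logger): void
        {
            $this->logger = $logger;
        }

        public function setLogLevel(string $logLevel): void
        {
            $this->logLevel = $logLevel;
        }

        public function crawl(array $urls): CacheWarmupResult
        {
            $client = new Client();
            $result = new CacheWarmupResult();

            foreach ($urls as $url) {
                try {
                    $client->request('GET', $url);
                    $result->addResult(CrawlingResult::createSuccessful($url));
                    $this->log(LogLevel::INFO, 'Crawled {url}.', ['url' => (string) $url]);
                } catch (GuzzleException $exception) {
                    $result->addResult(CrawlingResult::createFailed($url));
                    $this->log(
                        LogLevel::ERROR,
                        'Failed to crawl {url}.',
                        ['url' => (string) $url, 'exception' => $exception],
                    );
                }
            }

            return $result;
        }

        /**
         * @param array<string, mixed> $context
         */
        private function log(string $level, string $message, array $context): void
        {
            // Unknown minimum levels fall back to the lowest severity.
            $minimum = self::SEVERITIES[$this->logLevel] ?? 0;

            if (self::SEVERITIES[$level] >= $minimum) {
                $this->logger->log($level, $message, $context);
            }
        }
    }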

..  seealso::

    `Feature #271 - Introduce support for PSR-3 loggers <https://github.com/eliashaeussler/cache-warmup/pull/271>`__
    of :composer:`eliashaeussler/cache-warmup` library

..  _stoppable-crawlers:

Stoppable crawlers
------------------

Crawlers implementing the :php:interface:`\EliasHaeussler\CacheWarmup\Crawler\StoppableCrawler`
interface may cancel a cache warmup prematurely if any crawling failure occurs. This
can be especially useful for validation purposes to check whether any page within an
XML sitemap is inaccessible or failing.

..  php:namespace:: EliasHaeussler\CacheWarmup\Crawler

..  php:interface:: StoppableCrawler

    Interface that may cancel a cache warmup prematurely if any crawling failure occurs.

    ..  php:method:: stopOnFailure($stopOnFailure)

        Configure crawler to cancel cache warmup on failure.

        :param bool $stopOnFailure: Cancel cache warmup on failure.
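
The following sketch illustrates the idea with a sequential crawler that
leaves its loop after the first failed request. The class name is
hypothetical and the result handling is an assumption about the library.
The default value of the :php:`$stopOnFailure` parameter is added
defensively and is not taken from the interface description above.

..  code-block:: php

    <?php

    declare(strict_types=1);

    namespace Vendor\Extension\Crawler;

    use EliasHaeussler\CacheWarmup\Crawler\StoppableCrawler;
    use EliasHaeussler\CacheWarmup\Result\CacheWarmupResult;
    use EliasHaeussler\CacheWarmup\Result\CrawlingResult;
    use GuzzleHttp\Client;
    use GuzzleHttp\Exception\GuzzleException;

    // Hypothetical example class, not shipped with the extension.
    final class StoppableSequentialCrawler implements StoppableCrawler
    {
        private bool $stopOnFailure = false;

        public function stopOnFailure(bool $stopOnFailure = true): void
        {
            $this->stopOnFailure = $stopOnFailure;
        }

        public function crawl(array $urls): CacheWarmupResult
        {
            $client = new Client();
            $result = new CacheWarmupResult();

            foreach ($urls as $url) {
                try {
                    $client->request('GET', $url);
                    $result->addResult(CrawlingResult::createSuccessful($url));
                } catch (GuzzleException $exception) {
                    $result->addResult(CrawlingResult::createFailed($url));

                    if ($this->stopOnFailure) {
                        // Cancel cache warmup, remaining URLs are not requested.
                        break;
                    }
                }
            }

            return $result;
        }
    }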

..  seealso::

    `Feature #302 - Introduce stoppable crawler and --stop-on-failure option <https://github.com/eliashaeussler/cache-warmup/pull/302>`__
    of :composer:`eliashaeussler/cache-warmup` library

..  _streamable-crawlers:

Streamable crawlers
-------------------

When running cache warmup from the TYPO3 backend, the current crawling progress is
streamed to the cache warmup progress modal. However, this is only supported for
crawlers implementing the :php:interface:`\EliasHaeussler\Typo3Warming\Crawler\StreamableCrawler`
interface.

Those crawlers will then get an :php:`\EliasHaeussler\SSE\Stream\EventStream`
injected. It can be used to send events to the current event stream. The following
events are currently available:

-   :php:`\EliasHaeussler\Typo3Warming\Http\Message\Event\WarmupFinishedEvent`
-   :php:`\EliasHaeussler\Typo3Warming\Http\Message\Event\WarmupProgressEvent`

When implementing a streamable crawler, there is usually no need to trigger these
events yourself. Instead, it is better to use the provided
:php:`\EliasHaeussler\Typo3Warming\Http\Message\Handler\StreamResponseHandler`,
which takes care of sending the appropriate events.

..  php:namespace:: EliasHaeussler\Typo3Warming\Crawler

..  php:interface:: StreamableCrawler

    Interface that allows streaming of cache warmup events using an EventStream.

    ..  php:method:: setStream($stream)

        Set event stream used to send cache warmup events.

        :param \EliasHaeussler\SSE\Stream\EventStream $stream: Event stream used to send cache warmup events.

..  _steps-to-implement-a-new-crawler:

Steps to implement a new crawler
--------------------------------

..  rst-class:: bignums

1.  Create a new crawler

    The new crawler must implement at least one of the following interfaces:

    -   :php:interface:`\EliasHaeussler\CacheWarmup\Crawler\Crawler`
    -   :php:interface:`\EliasHaeussler\CacheWarmup\Crawler\ConfigurableCrawler`
    -   :php:interface:`\EliasHaeussler\CacheWarmup\Crawler\LoggingCrawler`
    -   :php:interface:`\EliasHaeussler\CacheWarmup\Crawler\StoppableCrawler`
    -   :php:interface:`\EliasHaeussler\CacheWarmup\Crawler\VerboseCrawler`
    -   :php:interface:`\EliasHaeussler\Typo3Warming\Crawler\StreamableCrawler`

2.  Configure the new crawler

    Add the new crawler to the :ref:`extension configuration <extension-configuration>`.
    Note that you should configure either the `crawler` or the
    `verboseCrawler` option, depending on which interface you have
    implemented.

3.  Flush system caches

    Finally, flush all system caches to ensure the correct crawler
    class is used for further cache warmup requests.
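
If the extension configuration is maintained in code instead of the
backend, step 2 could be sketched as shown below. This assumes that the
extension key is `warming` and that the options are stored in TYPO3's
regular extension configuration array. Both class names are hypothetical
placeholders for your own crawlers, and the file location depends on the
TYPO3 version in use.

..  code-block:: php

    <?php

    // For example config/system/additional.php (depends on the TYPO3 version)

    // Hypothetical class implementing the Crawler interface
    $GLOBALS['TYPO3_CONF_VARS']['EXTENSIONS']['warming']['crawler']
        = \Vendor\Extension\Crawler\SequentialCrawler::class;

    // Hypothetical class implementing the VerboseCrawler interface
    $GLOBALS['TYPO3_CONF_VARS']['EXTENSIONS']['warming']['verboseCrawler']
        = \Vendor\Extension\Crawler\VerboseSequentialCrawler::class;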

..  seealso::

    View the sources on GitHub:

    -   `Crawler <https://github.com/eliashaeussler/cache-warmup/blob/main/src/Crawler/Crawler.php>`__
    -   `ConfigurableCrawler <https://github.com/eliashaeussler/cache-warmup/blob/main/src/Crawler/ConfigurableCrawler.php>`__
    -   `LoggingCrawler <https://github.com/eliashaeussler/cache-warmup/blob/main/src/Crawler/LoggingCrawler.php>`__
    -   `StoppableCrawler <https://github.com/eliashaeussler/cache-warmup/blob/main/src/Crawler/StoppableCrawler.php>`__
    -   `StreamableCrawler <https://github.com/eliashaeussler/typo3-warming/blob/main/Classes/Crawler/StreamableCrawler.php>`__
    -   `VerboseCrawler <https://github.com/eliashaeussler/cache-warmup/blob/main/src/Crawler/VerboseCrawler.php>`__
    -   `ConcurrentUserAgentCrawler <https://github.com/eliashaeussler/typo3-warming/blob/main/Classes/Crawler/ConcurrentUserAgentCrawler.php>`__
    -   `OutputtingUserAgentCrawler <https://github.com/eliashaeussler/typo3-warming/blob/main/Classes/Crawler/OutputtingUserAgentCrawler.php>`__
    -   `StreamResponseHandler <https://github.com/eliashaeussler/typo3-warming/blob/main/Classes/Http/Message/Handler/StreamResponseHandler.php>`__
    -   `WarmupFinishedEvent <https://github.com/eliashaeussler/typo3-warming/blob/main/Classes/Http/Message/Event/WarmupFinishedEvent.php>`__
    -   `WarmupProgressEvent <https://github.com/eliashaeussler/typo3-warming/blob/main/Classes/Http/Message/Event/WarmupProgressEvent.php>`__
