Rate Limiting
=============

* :ref:`Overview`
* :ref:`Quick Start`
* :ref:`Configuration Reference`
* :ref:`Per-Endpoint Limits`
* :ref:`Excluding Paths`
* :ref:`Proxy and Load Balancer Deployments`
* :ref:`Multi-Instance Deployments`

Overview
--------

AppKernel ships with built-in rate limiting to protect against brute-force
login attempts, credential stuffing, and API enumeration. The throttle runs as
a Starlette middleware that sits in front of the security middleware, stopping
excessive traffic before JWT validation is even attempted.

The implementation uses a **fixed-window counter** per client IP and endpoint
group. All state is held in-process — no external dependency is required. See
:ref:`Multi-Instance Deployments` if you run more than one instance behind a
load balancer.
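
To make the fixed-window behaviour concrete, here is a minimal in-process
sketch. It is not AppKernel's code. The class name ``FixedWindowCounter``, its
``hit()`` method and the injectable ``clock`` are illustrative assumptions, and
each window here starts at a bucket's first request, which may differ from how
AppKernel anchors its windows. The delay it returns plays the role of the
``Retry-After`` value described under :ref:`Quick Start`::

    import math
    import time
    from typing import Callable, Dict, Tuple


    class FixedWindowCounter:
        """Hypothetical in-process fixed-window limiter (illustration only)."""

        def __init__(self, limit: int, window_seconds: int,
                     clock: Callable[[], float] = time.monotonic):
            self.limit = limit
            self.window = window_seconds
            self.clock = clock
            # (client_ip, endpoint_group) -> (window_start, request_count)
            self._buckets: Dict[Tuple[str, str], Tuple[float, int]] = {}

        def hit(self, ip: str, group: str) -> Tuple[bool, int]:
            """Record one request; return (allowed, retry_after_seconds)."""
            now = self.clock()
            key = (ip, group)
            start, count = self._buckets.get(key, (now, 0))
            if now - start >= self.window:
                # The window has expired: open a fresh one with a zero count.
                start, count = now, 0
            count += 1
            self._buckets[key] = (start, count)
            if count <= self.limit:
                return True, 0
            # Seconds left until this bucket's window resets, rounded up.
            return False, max(math.ceil(start + self.window - now), 1)

With ``limit=2`` and a window opened at t=1000, a third request at t=1017 is
rejected with a 43-second delay, the figure used in the sample 429 response
under Quick Start. A production limiter would also evict expired buckets so
that memory does not grow with the number of distinct clients.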

Quick Start
-----------

Enable rate limiting after registering security::

    from appkernel import AppKernelEngine

    kernel = AppKernelEngine('my-app', cfg_dir='./config')
    kernel.enable_security()       # add JWT/RBAC middleware
    kernel.enable_rate_limiting()  # add rate-limit middleware (runs first)
    kernel.register(User, methods=['GET', 'POST', 'PUT', 'DELETE'])
    kernel.run()

With the defaults, each client IP is allowed **100 requests per 60-second
window** across the entire API surface. Requests that exceed the limit receive::

    HTTP 429 Too Many Requests
    Retry-After: 43

    {
        "_type": "ErrorMessage",
        "code": 429,
        "message": "Too many requests. Please slow down and retry after the indicated delay."
    }

The ``Retry-After`` value is the number of seconds remaining in the current
window.

.. important::

   Always call ``enable_rate_limiting()`` **after** ``enable_security()``.
   Starlette applies middlewares in reverse registration order (last added =
   outermost = first to execute). Adding rate limiting last ensures it runs
   before authentication, so brute-force attempts are stopped without incurring
   the cost of JWT validation.
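
The ordering rule can be checked with a simplified model of how Starlette
builds its middleware stack: ``add_middleware()`` inserts each class at the
front of a list, and the stack is then assembled by wrapping the app from the
end of that list inward. The sketch below is a plain-Python model with
hypothetical helper names (``build_stack``, ``_wrap``), not Starlette's actual
code::

    from typing import Callable, List

    Handler = Callable[[List[str]], None]


    def _wrap(name: str, inner: Handler) -> Handler:
        def handler(trace: List[str]) -> None:
            trace.append(name)           # record when this layer runs
            inner(trace)
        return handler


    def build_stack(registered: List[str], endpoint: Handler) -> Handler:
        """Model of Starlette's ordering; `registered` is call order."""
        stack: List[str] = []
        for name in registered:
            stack.insert(0, name)        # mirrors add_middleware's insert(0, ...)
        app = endpoint
        for name in reversed(stack):     # wrap innermost first
            app = _wrap(name, app)
        return app                       # stack[0] (last added) is outermost

Registering ``security`` first and ``rate_limit`` second yields the trace
``['rate_limit', 'security', 'endpoint']``: the rate limiter sees each request
before authentication does.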

Configuration Reference
-----------------------

Pass a :class:`~appkernel.RateLimitConfig` instance to customise behaviour::

    from appkernel import AppKernelEngine, RateLimitConfig

    kernel.enable_rate_limiting(
        RateLimitConfig(
            requests_per_window=100,     # global limit per client IP
            window_seconds=60,           # window length in seconds
            endpoint_limits={},          # per-prefix overrides (see below)
            exclude_paths=[],            # paths that bypass limiting
            trust_proxy_headers=False,   # honour X-Forwarded-For
        )
    )

.. list-table::
   :header-rows: 1
   :widths: 25 10 65

   * - Parameter
     - Default
     - Description
   * - ``requests_per_window``
     - 100
     - Maximum requests a single IP may make within ``window_seconds``.
       Requests beyond the limit receive HTTP 429.
   * - ``window_seconds``
     - 60
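     - Duration of the counting window. The counter resets when the window
       expires; it does **not** slide continuously. As a result, a client can
       send a full quota at the end of one window and another at the start of
       the next, briefly reaching up to twice the limit.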
   * - ``endpoint_limits``
     - ``{}``
     - Per-path-prefix overrides. Prefixes are tested in dict insertion order
       and the first match wins. See :ref:`Per-Endpoint Limits`.
   * - ``exclude_paths``
     - ``[]``
     - Path prefixes that bypass rate limiting entirely. See
       :ref:`Excluding Paths`.
   * - ``trust_proxy_headers``
     - ``False``
     - When ``True``, the client IP is read from ``X-Forwarded-For`` rather
       than the TCP peer address. Enable only behind a trusted reverse proxy.
       See :ref:`Proxy and Load Balancer Deployments`.

Recommended profiles:

================  =======================  ==================
Traffic profile   ``requests_per_window``  ``window_seconds``
================  =======================  ==================
Low / auth        10–20                    60
Medium (default)  100                      60
High              500                      60
================  =======================  ==================

Per-Endpoint Limits
-------------------

Authentication endpoints typically need tighter limits than the general API.
Use ``endpoint_limits`` to override the global limit for specific path prefixes::

    kernel.enable_rate_limiting(
        RateLimitConfig(
            requests_per_window=200,          # generous global limit
            endpoint_limits={
                '/auth': 10,                  # brute-force protection
                '/users/change_password': 5,  # password-change protection
                '/admin': 20,                 # admin surface
            }
        )
    )

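Prefixes are tested in dict insertion order and the first match wins, so list
more specific prefixes (such as ``/users/change_password``) before broader ones
(such as ``/users``). Requests whose path does not match any prefix fall back to
``requests_per_window``.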
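
The lookup rule can be sketched as a small helper. The function name
``resolve_limit`` and the ``'*'`` label for the global bucket are hypothetical;
the sketch simply assumes, as described above, that prefixes are tested in
order with ``str.startswith`` and the first hit wins::

    from typing import Dict, Tuple


    def resolve_limit(path: str, endpoint_limits: Dict[str, int],
                      default: int) -> Tuple[str, int]:
        """Return (bucket_group, limit) for a request path.

        Hypothetical helper: prefixes are tested in dict insertion order and
        the first one the path starts with wins; unmatched paths use the
        global limit under the made-up group name '*'.
        """
        for prefix, limit in endpoint_limits.items():
            if path.startswith(prefix):
                return prefix, limit
        return '*', default

Each resolved group then gets its own counter per client IP, matching the
per-IP, per-endpoint-group buckets described in the Overview.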

Excluding Paths
---------------

Health checks, readiness probes, and metrics endpoints should not be rate
limited as they are called by infrastructure at high frequency::

    kernel.enable_rate_limiting(
        RateLimitConfig(
            exclude_paths=['/health', '/ready', '/metrics']
        )
    )

Matching is a plain string-prefix test, so ``'/health'`` excludes ``/health``,
``/health/live``, and also ``/healthz``. Pick prefixes that cannot collide with
routes you do want limited.
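
The difference between a plain prefix test and a segment-aware one is easiest
to see side by side. Both helpers are hypothetical: ``is_excluded`` models the
documented behaviour, while ``is_excluded_by_segment`` is an alternative you
could use in a custom limiter (see :ref:`Multi-Instance Deployments`), not an
AppKernel option::

    from typing import Iterable


    def is_excluded(path: str, exclude_paths: Iterable[str]) -> bool:
        """Plain string-prefix test, as documented (hypothetical helper)."""
        return any(path.startswith(prefix) for prefix in exclude_paths)


    def is_excluded_by_segment(path: str, exclude_paths: Iterable[str]) -> bool:
        """Alternative: match only on a whole path-segment boundary."""
        return any(path == p or path.startswith(p.rstrip('/') + '/')
                   for p in exclude_paths)

``is_excluded('/healthz', ...)`` is ``True`` whereas
``is_excluded_by_segment('/healthz', ...)`` is ``False``; both exclude
``/health`` and ``/health/live``.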

Proxy and Load Balancer Deployments
-------------------------------------

When AppKernel runs behind a reverse proxy (nginx, AWS ALB, Cloudflare), the
TCP peer address seen by the application is the proxy IP, not the real client
IP. All requests would share a single rate-limit bucket, making the throttle
ineffective.

Set ``trust_proxy_headers=True`` to read the client IP from the first
(leftmost) address in the ``X-Forwarded-For`` header::

    kernel.enable_rate_limiting(
        RateLimitConfig(trust_proxy_headers=True)
    )

.. warning::

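   Only enable ``trust_proxy_headers`` when AppKernel sits behind a proxy that
   you control and that strips or overwrites ``X-Forwarded-For``. If the header
   can be set by end users, an attacker can forge any IP and trivially bypass
   per-IP limits. Many proxies append to an incoming header instead of
   replacing it (nginx's ``$proxy_add_x_forwarded_for`` does this), in which
   case the first address is still client-controlled.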
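
The documented IP selection rule can be sketched as follows. ``client_ip`` is a
hypothetical helper, not part of AppKernel's public API; it assumes a header
mapping with lower-case keys (Starlette's ``request.headers`` is
case-insensitive, a plain ``dict`` is not) and falls back to the TCP peer when
the header is absent or empty::

    from typing import Mapping, Optional


    def client_ip(peer_host: Optional[str], headers: Mapping[str, str],
                  trust_proxy_headers: bool) -> str:
        """Pick the address used as the rate-limit key (hypothetical helper)."""
        if trust_proxy_headers:
            xff = headers.get('x-forwarded-for', '')
            first = xff.split(',')[0].strip()   # leftmost entry only
            if first:
                return first
        # Fall back to the socket peer; 'unknown' lumps together requests
        # that carry no peer information at all.
        return peer_host or 'unknown'

The sketch also shows why the warning matters: a client that sends
``X-Forwarded-For: 1.1.1.1`` through a proxy that appends its peer address
produces ``1.1.1.1, 198.51.100.9``, and the first entry, ``1.1.1.1``, is the
attacker's choice.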

Multi-Instance Deployments
---------------------------

The default limiter stores all counters in the memory of the running process.
If you run multiple AppKernel instances behind a load balancer, each instance
tracks its own counters independently, so a client whose requests are spread
across instances can reach the configured limit on every one of them. The
effective throughput is multiplied by the number of instances: with three
instances and ``requests_per_window=100``, up to 300 requests per window.

For multi-instance deployments, replace the in-process limiter with a
Redis-backed implementation. The middleware accepts any object that implements
the same ``check(request) -> (allowed, retry_after)`` interface as
:class:`~appkernel.rate_limit.RateLimiter`::

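    import redis  # redis-py client
    from appkernel.rate_limit import RateLimitConfig, RateLimitMiddleware

    class RedisRateLimiter:
        def __init__(self, redis_client, cfg: RateLimitConfig):
            self._redis = redis_client
            self._cfg = cfg

        def check(self, request) -> tuple[bool, int]:
            # Return (allowed, retry_after_seconds). INCR + EXPIRE on a
            # per-IP key gives a *fixed* window shared by all instances.
            ...

    redis_client = redis.Redis(host='localhost', port=6379)
    limiter = RedisRateLimiter(redis_client, RateLimitConfig(requests_per_window=100))
    # Use instead of enable_rate_limiting(); register after enable_security()
    # so the limiter remains the outermost middleware.
    kernel.app.add_middleware(RateLimitMiddleware, limiter=limiter)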
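
One way to fill in ``check()`` is a fixed window per client IP and endpoint
group, stored in Redis so every instance shares the same counters. The sketch
below is hypothetical: the class name, the ``ratelimit:<ip>:<group>`` key format
and the ``'unknown'`` fallback are assumptions; it reads ``request.client.host``
and ``request.url.path`` from the Starlette ``Request``; and it ignores
``trust_proxy_headers`` and ``exclude_paths`` for brevity. It relies only on
redis-py's ``incr()``, ``ttl()`` and ``expire()``::

    from typing import Any, Tuple


    class RedisFixedWindowLimiter:
        """Shared fixed-window limiter sketch (hypothetical, not AppKernel's).

        cfg is any object with the documented RateLimitConfig fields
        requests_per_window, window_seconds and endpoint_limits.
        """

        def __init__(self, redis_client: Any, cfg: Any,
                     key_prefix: str = 'ratelimit'):
            self._redis = redis_client
            self._cfg = cfg
            self._prefix = key_prefix

        def _limit_for(self, path: str) -> Tuple[str, int]:
            # First matching prefix wins, in dict insertion order.
            for prefix, limit in self._cfg.endpoint_limits.items():
                if path.startswith(prefix):
                    return prefix, limit
            return '*', self._cfg.requests_per_window

        def check(self, request: Any) -> Tuple[bool, int]:
            # Simplification: always keys on the TCP peer address.
            ip = request.client.host if request.client else 'unknown'
            group, limit = self._limit_for(request.url.path)
            key = f'{self._prefix}:{ip}:{group}'

            count = self._redis.incr(key)      # atomic across all instances
            ttl = self._redis.ttl(key)
            if ttl < 0:
                # First hit in this window, or a key left without an expiry:
                # start the window now.
                self._redis.expire(key, self._cfg.window_seconds)
                ttl = self._cfg.window_seconds
            if count <= limit:
                return True, 0
            return False, max(ttl, 1)          # seconds until the key expires

``INCR`` is atomic, so concurrent instances never lose a count, but the
follow-up ``TTL`` and ``EXPIRE`` calls are separate round trips. Checking for a
missing expiry on every request repairs a key whose ``EXPIRE`` was never sent;
running the three commands in a Lua script would close the gap entirely. An
instance can be passed as ``limiter=`` to ``RateLimitMiddleware`` in the same
way as the stub above.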