update: docs

nate 2026-04-08 13:14:01 +04:00
parent 1fab19e03a
commit b8f502fdb3
1 changed files with 169 additions and 2 deletions

@ -57,6 +57,9 @@
<div class="nav-section">API Reference</div>
<a href="#account" class="nav-link">Account</a>
<a href="#monitors" class="nav-link">Monitors</a>
<a href="#reliability" class="nav-link">Reliability</a>
<a href="#notifications" class="nav-link">Notifications</a>
<a href="#webhook-payload" class="nav-link">Webhook payload</a>
<div class="nav-section">Query Language</div>
<a href="#ql-fields" class="nav-link">Fields</a>
@ -135,6 +138,11 @@
<span class="k">"request_body"</span>: <span class="s">"{\"ping\": true}"</span>, <span class="c">// optional — Content-Type defaults to application/json</span>
<span class="k">"regions"</span>: [<span class="s">"eu-central"</span>, <span class="s">"us-west"</span>], <span class="c">// optional — default: all regions</span>
<span class="k">"timeout_ms"</span>: <span class="n">10000</span>, <span class="c">// optional — default: 10000</span>
<span class="k">"max_retries"</span>: <span class="n">2</span>, <span class="c">// optional — retry N times before declaring DOWN. Default: 0</span>
<span class="k">"retry_interval_s"</span>: <span class="n">30</span>, <span class="c">// optional — seconds between retries. Default: 30</span>
<span class="k">"resend_interval"</span>: <span class="n">10</span>, <span class="c">// optional — re-alert every Nth consecutive DOWN beat. 0 = never. Default: 0</span>
<span class="k">"cert_alert_days"</span>: <span class="n">14</span>, <span class="c">// optional — alert when TLS cert is within N days of expiry. 0 disables. Default: 14</span>
<span class="k">"channel_ids"</span>: [<span class="s">"&lt;uuid&gt;"</span>], <span class="c">// optional — notification channels to attach</span>
<span class="k">"query"</span>: { ... } <span class="c">// optional — see Query Language below</span>
}</pre>
</div>
@ -149,6 +157,11 @@
<tr><td>request_body</td><td>string</td><td>Request body (Content-Type defaults to application/json)</td></tr>
<tr><td>regions</td><td>string[]</td><td>Regions to ping from: <code>eu-central</code>, <code>us-west</code>. Default: all</td></tr>
<tr><td>timeout_ms</td><td>number</td><td>Request timeout in milliseconds (default: 10000)</td></tr>
<tr><td>max_retries</td><td>number</td><td>Retry a failing check this many times before posting a DOWN result. Default: 0. Max: 10. See <a href="#reliability">Reliability</a>.</td></tr>
<tr><td>retry_interval_s</td><td>number</td><td>Seconds between retries. Default: 30.</td></tr>
<tr><td>resend_interval</td><td>number</td><td>If a monitor stays DOWN, re-fire a notification every Nth consecutive down beat. 0 disables resend. Default: 0.</td></tr>
<tr><td>cert_alert_days</td><td>number</td><td>Fire a separate <code>cert</code> notification when the TLS certificate is within N days of expiring. 0 disables. Default: 14.</td></tr>
<tr><td>channel_ids</td><td>string[]</td><td>Notification channel IDs to attach. See <a href="#notifications">Notifications</a>.</td></tr>
<tr><td>query</td><td>object</td><td>Query conditions — see below</td></tr>
</tbody>
</table>
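<p>As a rough client-side sketch, the documented defaults and the <code>max_retries</code> cap can be applied before a monitor body is sent. The helper name <code>withReliabilityDefaults</code> is hypothetical, and the requirement that each value is a non-negative integer is an assumption; the table only says <code>number</code>.</p>

```javascript
// Documented defaults for the reliability fields of a monitor body.
const RELIABILITY_DEFAULTS = {
  max_retries: 0,       // documented max is 10
  retry_interval_s: 30,
  resend_interval: 0,   // 0 = never resend
  cert_alert_days: 14,  // 0 = cert alerts disabled
};

// Hypothetical helper: fills in defaults and rejects out-of-range values.
function withReliabilityDefaults(monitor) {
  const out = { ...RELIABILITY_DEFAULTS, ...monitor };
  for (const key of Object.keys(RELIABILITY_DEFAULTS)) {
    // Assumption: values must be non-negative integers.
    if (!Number.isInteger(out[key]) || out[key] < 0) {
      throw new RangeError(`${key} must be a non-negative integer`);
    }
  }
  if (out.max_retries > 10) {
    throw new RangeError("max_retries must be at most 10");
  }
  return out;
}
```

<p>Calling <code>withReliabilityDefaults({ name: "My API", max_retries: 2 })</code> keeps the supplied value and fills in the other three fields.</p>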
@ -170,13 +183,167 @@
<h3>Ping History</h3>
<div class="endpoint"><span class="method get">GET</span><span class="path">/monitors/:id/pings?limit=100</span></div>
<p class="endpoint-desc">Returns recent ping results for a monitor. Max 1000. Each ping carries an <code>important</code> boolean — true on status transitions and resend ticks (the beats that triggered notifications).</p>
</div>
<!-- Reliability -->
<div id="reliability" class="section">
<h2>Reliability &amp; alert noise</h2>
<p>PingQL doesn't have to alert on a single failed check. The settings below control how reactive or how stable the alerting is:</p>
<h3>Retries before DOWN</h3>
<p>If a check fails and <code>max_retries</code> is greater than zero, the runner waits <code>retry_interval_s</code> seconds and retries, up to <code>max_retries</code> times, <em>before</em> recording a DOWN result. A successful retry posts a single UP ping with <code>meta.retries</code> noting how many attempts it took. This suppresses most flapping caused by transient TCP resets, brief 5xx blips, or network jitter.</p>
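<p>The retry behaviour can be sketched roughly as follows. This is an illustration, not the runner's actual code: <code>check</code> and <code>sleep</code> are hypothetical injected functions, and reporting <code>meta.retries</code> on DOWN results as well is an assumption.</p>

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `check` is a hypothetical async function resolving to { ok: boolean }.
async function checkWithRetries(
  check,
  { max_retries = 0, retry_interval_s = 30 } = {},
  sleep = defaultSleep
) {
  let result = await check();
  let retries = 0;
  // Retry only while the check keeps failing and retries remain.
  while (!result.ok && retries < max_retries) {
    await sleep(retry_interval_s * 1000);
    retries += 1;
    result = await check();
  }
  // A single result is reported, however many attempts it took.
  return { status: result.ok ? "up" : "down", meta: { retries } };
}
```

<p>With <code>max_retries: 2</code>, a check that fails twice and then succeeds yields one UP result with <code>meta.retries</code> equal to 2.</p>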
<h3>Important beats &amp; transitions</h3>
<p>Every check is recorded, but the <code>important</code> flag on a ping is set only when the monitor's state changes (UP↔DOWN) <em>for that region</em>, or on a resend tick during an ongoing outage. Notifications fire on important beats only, never on routine checks. State is tracked independently per region: if <code>us-west</code> goes DOWN, only a subsequent <code>us-west</code> UP clears it. A healthy <code>eu-central</code> will not silence a <code>us-west</code> outage.</p>
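<p>Per-region transition tracking might look like the sketch below. The class name is hypothetical, and treating a region with no history as previously UP is an assumption; the text does not say how the first beat is handled.</p>

```javascript
// Hypothetical in-memory tracker: one last-known state per region.
class RegionStateTracker {
  constructor() {
    this.lastState = new Map();
  }

  // Returns true when this beat changes the region's state (a transition).
  record(region, state) {
    // Assumption: a region with no history is treated as previously "up".
    const previous = this.lastState.get(region) ?? "up";
    this.lastState.set(region, state);
    return previous !== state;
  }
}
```

<p>Because each region has its own entry, an UP beat from <code>eu-central</code> never clears a DOWN recorded for <code>us-west</code>.</p>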
<h3>Resend interval</h3>
<p>For long outages, set <code>resend_interval</code> to re-fire the notification every Nth consecutive DOWN beat. With <code>resend_interval: 10</code>, a still-broken monitor produces an extra alert every 10 down checks. <code>0</code> (the default) means: alert once on the transition, then stay quiet until recovery.</p>
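<p>One way to read the resend rule is sketched below. The function name is hypothetical, and the counting convention (the transition beat is number 1, resends land on multiples of <code>resend_interval</code>) is an assumption; the exact offset is not stated here.</p>

```javascript
// consecutiveDown is 1 on the transition beat, 2 on the next DOWN beat, etc.
function shouldNotifyDown(consecutiveDown, resendInterval = 0) {
  if (consecutiveDown === 1) return true;        // the UP -> DOWN transition
  if (resendInterval <= 0) return false;         // 0 disables resends
  // Assumed convention: resend on every multiple of the interval.
  return consecutiveDown % resendInterval === 0;
}
```

<p>Under this reading, <code>resend_interval: 10</code> alerts on beats 1, 10, 20, and so on, and stays silent on every other DOWN beat.</p>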
<h3>Cert expiry alerting</h3>
<p>For HTTPS monitors, PingQL extracts the TLS leaf certificate's days-until-expiry on every check. The first time that value drops to or below <code>cert_alert_days</code>, a separate <code>cert</code> notification fires (one per region). The flag clears when the cert is renewed, so each renewal cycle gets exactly one alert. Set <code>cert_alert_days: 0</code> to disable.</p>
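<p>The once-per-renewal-cycle behaviour can be modelled with a per-region flag, as in this sketch. The names are hypothetical, and detecting a renewal by the days-until-expiry rising back above the threshold is an assumption.</p>

```javascript
// Hypothetical helper: returns a function that decides whether a check
// should fire a cert notification for the given region.
function createCertAlerter(certAlertDays = 14) {
  const alerted = new Set(); // regions that already alerted this cycle
  return function onCheck(region, daysUntilExpiry) {
    if (certAlertDays === 0) return false;   // cert alerts disabled
    if (daysUntilExpiry > certAlertDays) {
      alerted.delete(region);                // assumed renewal: re-arm the flag
      return false;
    }
    if (alerted.has(region)) return false;   // already alerted this cycle
    alerted.add(region);
    return true;
  };
}
```

<p>Each region keeps its own flag, so two regions observing the same expiring certificate each produce one alert.</p>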
<h3>Default empty query</h3>
<p>If you don't supply a <code>query</code>, the monitor is considered up only on a <strong style="color:#4ade80">2xx</strong> response. Redirects (3xx), client errors (4xx) and server errors (5xx) all count as DOWN. Use the QL if you want different behaviour.</p>
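<p>Expressed as code, the default rule is a simple range check. The function name is hypothetical, and treating a missing status code (for example after a connection error) as DOWN is an assumption.</p>

```javascript
// Default verdict when a monitor has no query: only 2xx counts as up.
function isUpByDefault(statusCode) {
  return Number.isInteger(statusCode) && statusCode >= 200 && statusCode <= 299;
}
```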
</div>
<!-- Notifications -->
<div id="notifications" class="section">
<h2>Notifications</h2>
<p>Notification channels are reusable destinations attached to monitors. When an important beat fires (DOWN, recovery, or cert), each attached channel is dispatched. PingQL ships with a <strong>webhook</strong> provider; more (Discord, Slack, Email) are designed as drop-ins.</p>
<h3>List channels</h3>
<div class="endpoint"><span class="method get">GET</span><span class="path">/notifications/channels</span></div>
<p class="endpoint-desc">Returns all channels for the authenticated account.</p>
<h3>Create channel</h3>
<div class="endpoint"><span class="method post">POST</span><span class="path">/notifications/channels</span></div>
<div class="cb">
<div class="cb-header"><span class="cb-lang">json — request body</span></div>
<pre>{
<span class="k">"name"</span>: <span class="s">"On-call webhook"</span>,
<span class="k">"kind"</span>: <span class="s">"webhook"</span>,
<span class="k">"config"</span>: {
<span class="k">"url"</span>: <span class="s">"https://hooks.example.com/pingql"</span>,
<span class="k">"headers"</span>: { <span class="s">"X-Team"</span>: <span class="s">"infra"</span> }, <span class="c">// optional</span>
<span class="k">"secret"</span>: <span class="s">"shared-hmac-secret"</span> <span class="c">// optional — signs payloads</span>
},
<span class="k">"enabled"</span>: <span class="n">true</span> <span class="c">// optional — default true</span>
}</pre>
</div>
<table>
<thead><tr><th>Field</th><th>Type</th><th>Description</th></tr></thead>
<tbody>
<tr><td>name</td><td>string</td><td>Display name (up to 200 chars)</td></tr>
<tr><td>kind</td><td>string</td><td>Provider type. Currently only <code>webhook</code>.</td></tr>
<tr><td>config</td><td>object</td><td>Provider-specific config. For <code>webhook</code>, requires <code>url</code> (http/https). Optional <code>headers</code> object and <code>secret</code> for HMAC signing.</td></tr>
<tr><td>enabled</td><td>boolean</td><td>Disabled channels are skipped during dispatch but remain attached.</td></tr>
</tbody>
</table>
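<p>A request for this endpoint could be assembled as in the sketch below. <code>buildCreateChannelRequest</code> and the <code>baseUrl</code> value are hypothetical, and the Bearer <code>Authorization</code> header is assumed to work the same way as for the monitor endpoints.</p>

```javascript
// Builds the URL and fetch() options for POST /notifications/channels.
function buildCreateChannelRequest(baseUrl, apiKey, channel) {
  const { name, url, headers, secret, enabled = true } = channel;
  if (typeof name !== "string" || name.length === 0 || name.length > 200) {
    throw new RangeError("name must be 1 to 200 characters");
  }
  const protocol = new URL(url).protocol; // throws on a malformed URL
  if (protocol !== "http:" && protocol !== "https:") {
    throw new RangeError("webhook url must be http or https");
  }
  const config = { url };
  if (headers) config.headers = headers;  // optional custom headers
  if (secret) config.secret = secret;     // optional HMAC signing secret
  return {
    url: `${baseUrl}/notifications/channels`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ name, kind: "webhook", config, enabled }),
    },
  };
}
```

<p>The result can be passed straight to <code>fetch(request.url, request.init)</code>.</p>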
<h3>Update channel</h3>
<div class="endpoint"><span class="method patch">PATCH</span><span class="path">/notifications/channels/:id</span></div>
<p class="endpoint-desc">All fields optional. Provide a partial body to change name, config, or enabled state.</p>
<h3>Delete channel</h3>
<div class="endpoint"><span class="method delete">DELETE</span><span class="path">/notifications/channels/:id</span></div>
<p class="endpoint-desc">Removes the channel and all monitor attachments to it.</p>
<h3>Test channel</h3>
<div class="endpoint"><span class="method post">POST</span><span class="path">/notifications/channels/:id/test</span></div>
<p class="endpoint-desc">Sends a synthetic <code>test</code> event through the channel and returns whether the provider accepted it. Useful to verify the URL and HMAC are wired correctly without waiting for a real outage.</p>
<h3>Attaching channels to monitors</h3>
<p>Pass <code>channel_ids</code> as an array of channel UUIDs when creating or patching a monitor. A PATCH replaces the full set, so pass an empty array to detach all channels. Channels can only be attached to monitors in the same account.</p>
<div class="cb">
<div class="cb-header"><span class="cb-lang">http</span></div>
<pre>PATCH /monitors/abc123def456
Authorization: Bearer &lt;key&gt;
Content-Type: application/json

{ <span class="k">"channel_ids"</span>: [<span class="s">"5fb1c0bf-…"</span>, <span class="s">"a72e0d91-…"</span>] }</pre>
</div>
</div>
<!-- Webhook payload -->
<div id="webhook-payload" class="section">
<h2>Webhook payload</h2>
<p>Webhook channels send a JSON body to the configured URL with an HTTP <code>POST</code> on every event. The request times out after 5 seconds and PingQL does not retry it; the next important beat serves as the retry. Failures are logged but never block ingest.</p>
<h3>Headers</h3>
<table>
<thead><tr><th>Header</th><th>Description</th></tr></thead>
<tbody>
<tr><td>content-type</td><td><code>application/json</code></td></tr>
<tr><td>user-agent</td><td><code>PingQL-Notifier/1</code></td></tr>
<tr><td>x-pingql-signature</td><td>Hex-encoded HMAC-SHA256 of the raw request body, keyed by <code>config.secret</code>. Only present when a secret is configured. Verify it server-side to confirm the request came from PingQL.</td></tr>
<tr><td><em>custom</em></td><td>Any headers from <code>config.headers</code> are forwarded as-is.</td></tr>
</tbody>
</table>
<h3>Body shape</h3>
<p>Every payload has the same envelope:</p>
<div class="cb">
<div class="cb-header"><span class="cb-lang">json</span></div>
<pre>{
<span class="k">"channel"</span>: { <span class="k">"id"</span>: <span class="s">"&lt;uuid&gt;"</span>, <span class="k">"name"</span>: <span class="s">"On-call webhook"</span> },
<span class="k">"event"</span>: { <span class="k">"kind"</span>: <span class="s">"down"</span> | <span class="s">"up"</span> | <span class="s">"cert"</span> | <span class="s">"test"</span>, ... }
}</pre>
</div>
<h3>Event types</h3>
<p><code>down</code> — fired on the first DOWN important beat for a region, and again every <code>resend_interval</code>th consecutive down if configured.</p>
<div class="cb"><div class="cb-header"><span class="cb-lang">json — event</span></div>
<pre>{
<span class="k">"kind"</span>: <span class="s">"down"</span>,
<span class="k">"monitor"</span>: {
<span class="k">"id"</span>: <span class="s">"abc123def456"</span>,
<span class="k">"name"</span>: <span class="s">"My API"</span>,
<span class="k">"url"</span>: <span class="s">"https://api.example.com/health"</span>,
<span class="k">"region"</span>: <span class="s">"us-west"</span> <span class="c">// "" for unspecified/single-region monitors</span>
},
<span class="k">"ping"</span>: {
<span class="k">"status_code"</span>: <span class="n">503</span>,
<span class="k">"latency_ms"</span>: <span class="n">412</span>,
<span class="k">"error"</span>: <span class="n">null</span>,
<span class="k">"checked_at"</span>: <span class="s">"2026-04-08T14:23:00.000Z"</span>
}
}</pre></div>
<p><code>up</code> — fired on recovery, only when <em>that same region</em> transitions back from DOWN. Same shape as <code>down</code>.</p>
<p><code>cert</code> — fired once per renewal cycle when the TLS leaf cert drops to or below <code>cert_alert_days</code> for a region.</p>
<div class="cb"><div class="cb-header"><span class="cb-lang">json — event</span></div>
<pre>{
<span class="k">"kind"</span>: <span class="s">"cert"</span>,
<span class="k">"monitor"</span>: { <span class="k">"id"</span>: <span class="s">"…"</span>, <span class="k">"name"</span>: <span class="s">"…"</span>, <span class="k">"url"</span>: <span class="s">"…"</span>, <span class="k">"region"</span>: <span class="s">"us-west"</span> },
<span class="k">"days"</span>: <span class="n">9</span> <span class="c">// days until certificate expires</span>
}</pre></div>
<p><code>test</code> — synthetic event from <code>POST /notifications/channels/:id/test</code>. The <code>monitor</code> object is a placeholder.</p>
<div class="cb"><div class="cb-header"><span class="cb-lang">json — event</span></div>
<pre>{
<span class="k">"kind"</span>: <span class="s">"test"</span>,
<span class="k">"monitor"</span>: { <span class="k">"id"</span>: <span class="s">"test"</span>, <span class="k">"name"</span>: <span class="s">"Test event"</span>, <span class="k">"url"</span>: <span class="s">"https://example.com"</span>, <span class="k">"region"</span>: <span class="s">""</span> }
}</pre></div>
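<p>A receiver will typically branch on <code>event.kind</code>. The sketch below turns a payload into a one-line summary. The function name and message wording are hypothetical; only the field names come from the shapes above.</p>

```javascript
// Turns a webhook payload into a short human-readable line.
function describeEvent(payload) {
  const { event } = payload;
  const { monitor } = event;
  // region is "" for unspecified/single-region monitors.
  const where = monitor.region ? ` (${monitor.region})` : "";
  switch (event.kind) {
    case "down": {
      const reason = event.ping.error ?? `HTTP ${event.ping.status_code}`;
      return `DOWN: ${monitor.name}${where}: ${reason}`;
    }
    case "up":
      return `UP: ${monitor.name}${where} recovered`;
    case "cert":
      return `CERT: ${monitor.name}${where} expires in ${event.days} days`;
    case "test":
      return "TEST: channel is wired up";
    default:
      return `Unknown event kind: ${event.kind}`;
  }
}
```

<p>Keeping a <code>default</code> branch means an unrecognised kind is reported rather than silently dropped.</p>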
<h3>Verifying the signature</h3>
<div class="cb">
<div class="cb-header"><span class="cb-lang">node</span></div>
<pre><span class="k">import</span> { createHmac, timingSafeEqual } <span class="k">from</span> <span class="s">"crypto"</span>;
<span class="k">function</span> verify(rawBody, headerSig, secret) {
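<span class="k">const</span> expected = Buffer.from(createHmac(<span class="s">"sha256"</span>, secret).update(rawBody).digest(<span class="s">"hex"</span>));
<span class="k">const</span> received = Buffer.from(headerSig ?? <span class="s">""</span>);
<span class="c">// timingSafeEqual throws if the lengths differ, so check them first</span>
<span class="k">return</span> expected.length === received.length &amp;&amp; timingSafeEqual(expected, received);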
}</pre>
</div>
<p>Always verify against the <em>raw</em> request body before parsing JSON.</p>
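<p>To make the raw-body rule concrete, here is a minimal sketch of a handler that buffers the request, checks the signature, and only then parses JSON. <code>readRawBody</code>, <code>handleWebhook</code>, and the <code>verifySignature</code> callback are hypothetical names; the callback could wrap a function like <code>verify</code> with the channel secret already bound. The returned status codes are a suggestion, not something PingQL requires.</p>

```javascript
// Collects the request stream into one Buffer without decoding or parsing it.
async function readRawBody(stream) {
  const chunks = [];
  for await (const chunk of stream) chunks.push(Buffer.from(chunk));
  return Buffer.concat(chunks);
}

// `req` is any async-iterable request with a lowercase `headers` object,
// such as Node's http.IncomingMessage.
async function handleWebhook(req, verifySignature) {
  const rawBody = await readRawBody(req);
  const headerSig = req.headers["x-pingql-signature"];
  if (typeof headerSig !== "string" || !verifySignature(rawBody, headerSig)) {
    return { status: 401 }; // missing or invalid signature
  }
  try {
    // Parse only after the signature over the raw bytes has been checked.
    return { status: 200, payload: JSON.parse(rawBody.toString("utf8")) };
  } catch {
    return { status: 400 }; // signed, but not valid JSON
  }
}
```

<p>Re-serialising a parsed body can change whitespace or key order, which is why the signature is checked against the bytes exactly as received.</p>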
</div>
<!-- QL Fields -->
<div id="ql-fields" class="section">
<h2>Query Language — Fields</h2>
<p>A PingQL query is a JSON object evaluated against each ping. If it returns <code style="color:#4ade80;background:#052e16;padding:0.1em 0.35em;border-radius:0.2rem;font-size:0.78rem">true</code>, the monitor is <strong style="color:#4ade80">up</strong>. Default (no query): up only on a <strong>2xx</strong> status. Redirects and errors all count as DOWN.</p>
<table>
<thead><tr><th>Field</th><th>Type</th><th>Description</th></tr></thead>
<tbody>