{"id":1460,"date":"2026-04-17T11:43:11","date_gmt":"2026-04-17T11:43:11","guid":{"rendered":"https:\/\/dww.abilit.eu\/?page_id=1460"},"modified":"2026-04-17T11:43:12","modified_gmt":"2026-04-17T11:43:12","slug":"watchdog-jvm-level-auto-restart-health-supervisor","status":"publish","type":"page","link":"https:\/\/abilit.eu\/index.php\/watchdog-jvm-level-auto-restart-health-supervisor\/","title":{"rendered":""},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Watchdog \u2014 JVM-level Auto-Restart &amp; Health Supervisor<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">A lightweight, production-hardened daemon that supervises SCADA\u2011LTS JVM processes, performs health checks, and executes controlled restarts or failover actions when required.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-c7ebd8d6 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<h3 class=\"wp-block-heading\">Key capabilities<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Periodic JVM responsiveness checks (HTTP, JMX, custom probes).<\/li>\n\n\n\n<li>Latency &amp; thread\u2011stall detection with configurable thresholds.<\/li>\n\n\n\n<li>Graceful restart sequence with pre\/post hooks and safe rollback.<\/li>\n\n\n\n<li>Integration with HA\/failover orchestrators and load\u2011balancers.<\/li>\n\n\n\n<li>Alerting to Prometheus\/Grafana, syslog and SMS gateway.<\/li>\n<\/ul>\n<\/div>\n\n\n\n<div class=\"wp-block-column has-background is-layout-flow wp-block-column-is-layout-flow\" style=\"border-top-left-radius:42px;border-top-right-radius:42px;border-bottom-left-radius:42px;border-bottom-right-radius:42px;background-color:#f8fbff;padding-top:0;padding-bottom:0;flex-basis:33.33%\">\n<div class=\"wp-block-group has-global-padding is-layout-constrained wp-container-core-group-is-layout-094d544d wp-block-group-is-layout-constrained\" 
style=\"border-top-left-radius:27px;border-top-right-radius:27px;border-bottom-left-radius:27px;border-bottom-right-radius:27px;padding-top:var(--wp--preset--spacing--x-small);padding-right:var(--wp--preset--spacing--x-small);padding-bottom:var(--wp--preset--spacing--x-small);padding-left:var(--wp--preset--spacing--x-small)\">\n<h4 class=\"wp-block-heading\">Quick facts<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Package:<\/strong> Watchdog vX.Y<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Compatibility:<\/strong> SCADA\u2011LTS approved binaries only<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Download:<\/strong> <a href=\"https:\/\/dww.abilit.eu\/&lt;!-- TODO: DOWNLOAD LINK --&gt;\">Get package<\/a><\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading\">How it works<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Watchdog runs as an OS service and executes a configurable set of probes (HTTP health endpoints, JMX checks, thread\u2011dump heuristics). When a probe fails or a latency rule is breached, the daemon escalates through a controlled recovery sequence:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Attempt soft recovery steps (clear caches, trigger GC, restart a worker process).<\/li>\n\n\n\n<li>If the problem persists, trigger a coordinated JVM restart with graceful shutdown hooks.<\/li>\n\n\n\n<li>If the restart fails, mark the instance unhealthy and notify the load balancer \/ HA orchestrator.<\/li>\n\n\n\n<li>Run post\u2011restart validation and, on failure, roll back to the previous known\u2011good binary if one is available.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Configuration highlights<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Probe types: HTTP, TCP, JMX, script hook, custom plugin.<\/li>\n\n\n\n<li>Threshold profiles: staging \/ production \/ critical (separate cooldown times).<\/li>\n\n\n\n<li>Retry and backoff policies configurable per probe.<\/li>\n\n\n\n<li>Pre\/post hooks: run maintenance scripts, notify teams, trigger 
snapshots.<\/li>\n\n\n\n<li>Secure credentials store for JMX and API probes (integrates with secret managers).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Alerts &amp; Integrations<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Watchdog sends metrics and events to Prometheus, pushes alerts to Alertmanager, supports syslog\/rsyslog, and can call our SMS gateway for critical notifications. Webhook integrations for ticketing (Jira, ServiceNow) are available.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Deployment &amp; Support<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We provide packaged installers for supported Linux distributions, containerized images, and configuration templates. Installation, tuning and runbooks are part of our service plans.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/dww.abilit.eu\/tools\/watchdog\/install-guide\">Install Guide<\/a> <a href=\"https:\/\/dww.abilit.eu\/service\">Service Plans<\/a> <a href=\"mailto:ops@abilit.eu?subject=Watchdog%20Inquiry\">Contact Ops<\/a><\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/dww.abilit.eu\/tools\/watchdog\/install-guide\">Install Guide<\/a><\/div>\n\n\n\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/dww.abilit.eu\/service\">Service Plans<\/a><\/div>\n\n\n\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"mailto:info@abilit.eu?subject=Watchdog%20Inquiry\">Contact Ops<\/a><\/div>\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Abil\u2019I.T. 
\u2014 Watchdog<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Contact: <a href=\"mailto:info@abilit.eu\">info@abilit.eu<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Watchdog \u2014 JVM-level Auto-Restart &amp; Health Supervisor A lightweight, production-hardened daemon that supervises SCADA\u2011LTS JVM processes, performs health checks, and executes controlled restarts or failover actions when required. Key capabilities Quick facts Package: Watchdog vX.Y Compatibility: SCADA\u2011LTS approved binaries only Download:Get package How it works The Watchdog runs as an OS service and executes a [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1460","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/pages\/1460","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/comments?post=1460"}],"version-history":[{"count":6,"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/pages\/1460\/revisions"}],"predecessor-version":[{"id":1898,"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/pages\/1460\/revisions\/1898"}],"wp:attachment":[{"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/media?parent=1460"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}