{"id":1625,"date":"2026-04-11T15:06:04","date_gmt":"2026-04-11T15:06:04","guid":{"rendered":"https:\/\/abilit.eu\/?page_id=1625"},"modified":"2026-04-12T08:42:02","modified_gmt":"2026-04-12T08:42:02","slug":"datapoint-cleaner-data-hygiene-normalization-retention","status":"publish","type":"page","link":"https:\/\/abilit.eu\/index.php\/offer\/concept-area\/datapoint-cleaner-data-hygiene-normalization-retention\/","title":{"rendered":"Datapoint Cleaner \u2014 Data Hygiene, Normalization &#038; Retention"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n<h2 class=\"wp-block-post-title\">Datapoint Cleaner \u2014 Data Hygiene, Normalization &#038; Retention<\/h2>\n\n\n<p class=\"wp-block-paragraph\">Service that cleans, normalizes and retires time\u2011series datapoints before they reach long\u2011term storage or analytics. Designed to reduce noise, storage costs and false alerts while preserving auditability and compliance.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-c7ebd8d6 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<h3 class=\"wp-block-heading\">Purpose &amp; scope<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Filter out obvious noise (zero\u2011spikes, duplicate writes, format errors) and tag doubtful datapoints for review.<\/li>\n\n\n\n<li>Normalize timestamps, units and labels to canonical schemas before ingestion (Prometheus, Influx, OpenTelemetry metrics).<\/li>\n\n\n\n<li>Apply retention and TTL rules (hot storage vs. 
cold archive), plus aggregation rollups to reduce long\u2011term footprint.<\/li>\n\n\n\n<li>Respect audit and compliance needs: keep tamper\u2011evident logs of cleaning decisions and enable replay to restore raw data if required.<\/li>\n<\/ul>\n<\/div>\n\n\n\n<div class=\"wp-block-column has-background is-layout-flow wp-block-column-is-layout-flow\" style=\"border-top-left-radius:42px;border-top-right-radius:42px;border-bottom-left-radius:42px;border-bottom-right-radius:42px;background-color:#f8fbff;padding-top:0;padding-bottom:0;flex-basis:33.33%\">\n<div class=\"wp-block-group has-global-padding is-layout-constrained wp-container-core-group-is-layout-094d544d wp-block-group-is-layout-constrained\" style=\"border-top-left-radius:27px;border-top-right-radius:27px;border-bottom-left-radius:27px;border-bottom-right-radius:27px;padding-top:var(--wp--preset--spacing--x-small);padding-right:var(--wp--preset--spacing--x-small);padding-bottom:var(--wp--preset--spacing--x-small);padding-left:var(--wp--preset--spacing--x-small)\">\n<h4 class=\"wp-block-heading\">Quick facts<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Package:<\/strong> Datapoint Cleaner v<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Modes:<\/strong> realtime stream, batch cleanup, review queue<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Download \/ Repo:<\/strong><\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Core features<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Validation rules engine: JSON\/YAML rule sets for range checks, allowed value lists, unit checks and heartbeat detection.<\/li>\n\n\n\n<li>Adaptive noise filters: automatic detection and suppression of single\u2011sample spikes, short runtime flaps and duplicated series.<\/li>\n\n\n\n<li>Schema normalization: label canonicalization, unit conversion, timestamp rounding (configurable windows).<\/li>\n\n\n\n<li>Review queue &amp; human-in-the-loop: flag borderline datapoints to a dashboard for 
operator review with contextual logs and sample history.<\/li>\n\n\n\n<li>Retention &amp; rollup: configure hot window (raw), aggregated medium window (1m\/5m rollups), and long\u2011term cold store (hourly\/daily summaries).<\/li>\n\n\n\n<li>Auditability: append immutable cleaning decisions to an append\u2011only log (WORM or signed ledger) for compliance and forensics.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Recommended rules &amp; examples<\/h3>\n\n\n\n<pre class=\"wp-block-preformatted has-contrast-color has-text-color has-background has-link-color wp-elements-bdac0696542cd84d7254fbffb76bfcf5\" style=\"background-color:#f6f9ff\"># Example rule: temperature sensors\n- metric: env.temperature\n  unit: C\n  min: -40\n  max: 85\n  spike_threshold: 20  # suppress changes &gt; 20\u00b0C within 1 sample unless repeated\n  duplicate_window: 10s\n  retention:\n    hot: 7d\n    aggregated: 90d\n    cold: 365d\n\n# Example: probe heartbeat\n- metric: probe.heartbeat\n  expect_interval: 60s\n  alert_on_missing: 180s\n  action: mark_probe_stale\n<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Operational patterns<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Run Cleaner at the gateway edge (low latency) for early filtering; perform a second pass centrally for enrichment and rollups.<\/li>\n\n\n\n<li>Use deterministic, idempotent cleaning operations so that replaying raw logs yields the same cleaned output (important for audits).<\/li>\n\n\n\n<li>Store decisions with correlation IDs so incident reviews can trace how a datapoint was transformed or dropped.<\/li>\n\n\n\n<li>Provide a &#8220;restore raw&#8221; path: keep raw payloads for the hot window with cryptographic checksums to enable recovery if cleaning rules were too aggressive.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security, privacy &amp; compliance<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Mask or remove PII fields before logs leave the local network. 
Avoid storing raw PII in long\u2011term archives.<\/li>\n\n\n\n<li>Use signing and checksums on logs; maintain WORM storage or write\u2011once append\u2011only logs for audit trails where required.<\/li>\n\n\n\n<li>Define retention policies aligned with GDPR: specify how long raw payloads and cleaned outputs are kept and who can request deletion.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Deployment &amp; scaling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Edge deployments: lightweight agents written in Go\/Rust for minimal overhead and predictable memory footprint.<\/li>\n\n\n\n<li>Central processors: horizontally scalable stream processors (Kafka\/Pulsar + Flink\/Beam) for enrichment, rollups and audit logging.<\/li>\n\n\n\n<li>Backpressure handling: when downstream is slow, buffer with bounded queues and a prioritized review queue for flagged datapoints.<\/li>\n\n\n\n<li>Monitoring: expose metrics (cleaned_count, dropped_count, flagged_count, avg_processing_ms) to Prometheus and include Grafana dashboards.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Runbook &amp; CLI examples<\/h3>\n\n\n\n<pre class=\"wp-block-preformatted has-contrast-color has-text-color has-background has-link-color wp-elements-6e9deb52e7feef51b1bc458c929f7b71\" style=\"background-color:#f6f9ff\"># Run a dry\u2011run batch cleanup\ndatacleanerctl run --mode dryrun --input \/data\/raw\/2026-01-01.log --rules \/etc\/datacleaner\/rules.yaml\n\n# Push changes to ruleset and reload (graceful)\ndatacleanerctl rules push --file rules.yaml\ndatacleanerctl service reload\n\n# Inspect flagged datapoints\ndatacleanerctl review list --state flagged --limit 50\n<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Abil\u2019I.T. 
\u2014 Datapoint Cleaner<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Contact: <a href=\"mailto:ops@abilit.eu\">ops@abilit.eu<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Service that cleans, normalizes and retires time\u2011series datapoints before they reach long\u2011term storage or analytics. Designed to reduce noise, storage costs and false alerts while preserving auditability and compliance. Purpose &amp; scope Quick facts Package: Datapoint Cleaner v Modes: realtime stream, batch cleanup, review queue Download \/ Repo: Core features Recommended rules &amp; examples # [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"parent":1547,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1625","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/pages\/1625","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/comments?post=1625"}],"version-history":[{"count":2,"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/pages\/1625\/revisions"}],"predecessor-version":[{"id":1651,"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/pages\/1625\/revisions\/1651"}],"up":[{"embeddable":true,"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/pages\/1547"}],"wp:attachment":[{"href":"https:\/\/abilit.eu\/index.php\/wp-json\/wp\/v2\/media?parent=1625"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}