The Protocol Baseline Problem: Why Most OT IDS Tools Get the Foundation Wrong

You're called to a substation after a strange Modbus write was logged. The control engineer wants to know: has this PLC ever sent that command before? Was it normal three months ago? Six months ago? Did the behavior change after the last firmware update?

Your IDS - the one that cost the organization seven figures - can't answer any of those questions. It's not that it doesn't see Modbus. It does. It's not that it didn't log the event. It did. But it can't tell you whether this is anomalous because it doesn't have a real baseline to compare against. It has signatures, statistical thresholds, and a thirty-day rolling window. That's not a baseline. That's a heuristic.

This article is about why the protocol baseline is the most under-appreciated capability in OT cybersecurity, why most tools fall short on it, and what becomes possible when you actually do it right.

What “baseline” should mean

In vendor pitches, “baseline” gets used loosely. Sometimes it means “the list of devices we discovered in the first week of deployment.” Sometimes it means “the average packet rate over the last seven days.” Sometimes it means “the set of source-destination pairs we've seen.”

None of those are baselines in the way the word should be used. A protocol baseline is the complete, faithful record of what protocol behavior looks like over time - every field of every message, on every flow, with full history. That sounds expensive. It is. It's also the foundation on which every interesting OT cybersecurity capability is built. And most tools, even the expensive ones, don't do it.

Where most IDS tools fall short

After years of evaluating OT IDS products in real engagements, the gaps cluster in five places.

1. Statistical baselines, not exhaustive ones

Most tools observe traffic for some window - seven, thirty, ninety days - and reduce it to summary statistics. Mean packet rate per flow. Top-N protocols. Source-destination matrix. Then they alert on deviations from those summaries.

That works for coarse anomalies - a host that suddenly talks to a new destination, a tenfold traffic spike. It does not work for the questions that actually matter in incident response. “Has this specific Modbus function code ever been issued against this specific holding register on this specific slave?” is not answerable from summary statistics. The data was thrown away when the summary was built.
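To make the difference concrete, here is a minimal sketch in Python. The `ModbusRecord` schema, the addresses, and the helper names are all hypothetical, invented for illustration; no particular product stores data this way. The summary view can only say how busy a flow was. The exhaustive view can say whether a specific write ever happened.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ModbusRecord:
    """One decoded Modbus request (hypothetical schema, for illustration only)."""
    ts: datetime
    src: str
    dst: str
    unit_id: int
    function_code: int  # 3 = Read Holding Registers, 6 = Write Single Register
    address: int


# Made-up sample history: two routine reads and one write months later.
records = [
    ModbusRecord(datetime(2024, 1, 5, 8, 0), "10.0.0.5", "10.0.0.20", 1, 3, 100),
    ModbusRecord(datetime(2024, 1, 5, 8, 1), "10.0.0.5", "10.0.0.20", 1, 3, 100),
    ModbusRecord(datetime(2024, 3, 9, 2, 14), "10.0.0.5", "10.0.0.20", 1, 6, 40),
]


def summarize(recs):
    """Summary-statistics view: message count per (src, dst) flow."""
    return Counter((r.src, r.dst) for r in recs)


def ever_seen(recs, dst, unit_id, function_code, address):
    """Exhaustive view: every record matching this exact operation."""
    wanted = (dst, unit_id, function_code, address)
    return [
        r for r in recs
        if (r.dst, r.unit_id, r.function_code, r.address) == wanted
    ]


summary = summarize(records)                       # only knows the flow carried 3 messages
hits = ever_seen(records, "10.0.0.20", 1, 6, 40)   # the one write, with its timestamp
never = ever_seen(records, "10.0.0.20", 1, 6, 41)  # empty: never issued against register 41
```

Once `records` has been reduced to `summary`, the second and third questions cannot be asked at all. That is the whole point.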

2. Shallow protocol parsing

Open any commercial OT IDS product and look at the protocols it claims to support. The brochure says “50+ protocols.” Look more carefully and the support is uneven. Modbus, DNP3, IEC 61850, BACnet - these usually get good parsing. EtherNet/IP and PROFINET often get adequate parsing. The long tail - vendor-specific protocols, less common standards, proprietary engineering tools - gets partial parsing at best, or none at all.

This matters because attackers don't restrict themselves to well-supported protocols. If your IDS decodes Modbus deeply but only sees Siemens S7 as “unknown TCP/102,” your baseline is built on the parts of the truth you happen to be able to see. The standard for what should be parsed is set by Wireshark, the tool every practitioner already trusts. If your IDS doesn't parse every protocol at Wireshark-level fidelity, you have a gap - and you should know exactly where the gap is.
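One way to know exactly where the gap is: weigh the traffic you actually see against the depth at which each protocol is decoded. The sketch below assumes a hypothetical `PARSE_DEPTH` matrix and made-up byte counts; the protocol labels and depth categories are illustrative, not taken from any vendor's documentation.

```python
from collections import defaultdict

# Hypothetical parse-depth matrix for an imaginary tool (an assumption for this sketch).
PARSE_DEPTH = {
    "modbus": "full",    # every field decoded
    "dnp3": "full",
    "enip": "header",    # header fields only
}


def coverage_by_depth(flows):
    """Share of observed bytes at each parse depth.

    `flows` is an iterable of (protocol_name, byte_count) pairs.
    Protocols missing from PARSE_DEPTH count as "undecoded".
    """
    totals = defaultdict(int)
    for proto, nbytes in flows:
        totals[PARSE_DEPTH.get(proto, "undecoded")] += nbytes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {depth: n / grand_total for depth, n in totals.items()}


# Made-up observation: S7 traffic is present but not in the matrix at all.
observed = [("modbus", 600), ("enip", 300), ("s7comm", 100)]
shares = coverage_by_depth(observed)
```

With these numbers, a tenth of the bytes on the wire are invisible to the baseline. A real assessment would need per-site traffic and an honest matrix from the vendor; the arithmetic is the easy part.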

3. Rolling windows discard history

Even tools that record packet-level detail typically keep only a rolling window of full data. Thirty days is common. Some keep ninety. After that, only the summaries remain.

This is fine for storage economics and terrible for cybersecurity. Most interesting questions in OT cybersecurity are time-displaced:

“What did this substation's traffic look like before the last patching cycle?” The patching cycle was six months ago. Your data went away five months ago.
“Did we have this same alarm pattern before the malware incident at the sister site?” The sister site got hit a year ago.
“The vendor says they updated controller firmware in February. Did the protocol behavior actually change?” It's December. February is gone.
“Was this baseline normal, or a slow-rolling compromise that started a year ago?” You'll never know.

A baseline that ages out isn't really a baseline. It's a sliding average. The difference is the difference between “we keep records” and “we keep statistics about records.”
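The February-in-December case can be sketched in a few lines. Everything here is illustrative: the record layout, the dates, and the `retained` helper are assumptions, and real tools prune on their own schedules.

```python
from datetime import datetime, timedelta


def retained(records, now, retention_days=None):
    """Records a store would still hold at `now`.

    retention_days=None models indefinite retention (an assumption of this sketch).
    """
    if retention_days is None:
        return list(records)
    cutoff = now - timedelta(days=retention_days)
    return [r for r in records if r["ts"] >= cutoff]


def in_window(records, start, end):
    """Records with start <= ts < end."""
    return [r for r in records if start <= r["ts"] < end]


# Made-up history around a February firmware update.
records = [
    {"ts": datetime(2024, 2, 10), "event": "poll before firmware update"},
    {"ts": datetime(2024, 2, 20), "event": "poll after firmware update"},
    {"ts": datetime(2024, 12, 1), "event": "current poll"},
]
now = datetime(2024, 12, 15)
february = (datetime(2024, 2, 1), datetime(2024, 3, 1))

rolling = in_window(retained(records, now, retention_days=30), *february)  # nothing left
full = in_window(retained(records, now), *february)                        # both records
```

The thirty-day store answers the February question with an empty list, which looks exactly like "nothing happened."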

4. Coverage gaps create blind spots

Even tools that store everything and parse deeply almost always have coverage gaps in two dimensions: protocols they don't parse, and traffic they don't see.

The first is the protocol-coverage problem above. The second is more subtle. Many OT environments have engineering networks, vendor-specific control buses, safety-instrumented system protocols, and legacy serial-over-IP that don't get monitored at all. The IDS does its best on the level-2 OT LAN and is silent everywhere else. That's a baseline of part of the truth, which is sometimes worse than no baseline - you build confidence in coverage you don't actually have.

5. The “behavioral baseline” sleight of hand

A common pitch is “machine-learning behavioral baseline.” This sounds impressive and is sometimes useful, but it's not a substitute for an actual protocol baseline. ML behavioral baselining typically learns flow timing, packet-size distributions, and connection patterns. It does not learn - and cannot, because it was never given - the actual protocol-message-level history needed to answer “has this exact command sequence been observed before?”

Behavioral baseline and exhaustive protocol baseline are complementary, not substitutes. Most tools offer the first because it's cheaper to build. Few offer the second.
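For a sense of what "has this exact command sequence been observed before?" looks like as a query, here is a minimal sketch. It assumes the per-flow history is available as an ordered list of Modbus function codes; the function names are hypothetical, and a production system would index this rather than rebuild a set per query.

```python
def command_ngrams(codes, n):
    """Every contiguous run of n function codes in the recorded sequence."""
    return {tuple(codes[i:i + n]) for i in range(len(codes) - n + 1)}


def seen_before(history, candidate):
    """True if `candidate` occurs contiguously anywhere in `history`."""
    return tuple(candidate) in command_ngrams(history, len(candidate))


# Made-up history for one flow: reads (3) punctuated by a multi-register write (16).
history = [3, 3, 3, 16, 3, 3, 3, 16]

routine = seen_before(history, [3, 3, 16])  # True: part of the normal cycle
novel = seen_before(history, [3, 5, 6])     # False: coil write then register write, never seen
```

A model trained only on timing and packet sizes has no `history` list to run this against. That is the gap, not the quality of the model.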

What a real baseline lets you do

Once you have full-fidelity, full-history, all-protocol coverage, the use cases open up - and they're not the ones the IDS vendor demos.

Forensic answers in minutes, not days. Investigator asks “has this PLC ever issued this exact write?” The answer is a query, not a months-long packet-capture archaeology project. Compliance auditors love this; so does anyone who's ever been on the wrong side of an incident response.

Engineering-change auditing. The vendor pushed a firmware update last month. Has the controller's network behavior actually changed? A diff between a baseline window after the change date and any window before it shows you, protocol by protocol, what's different. You'd be surprised how often the vendor's “no impact” claim doesn't survive the diff.

Compliance evidence that's actually evidence. IEC 62443 foundational requirements expect you to know what your normal operating state looks like. A summary-statistics document is not evidence; a queryable historical record is. The same goes for NERC CIP in North America and NIS2 in the EU.

Pen-test before/after deltas. Run a pen test, restore the system to its claimed pre-test state, then diff the protocol baseline. If anything was missed in the restore - a leftover account, a residual setting, an open port from a tool - the diff catches it.

Slow-rolling threat detection. The most worrying OT compromises don't make noise. They make small, slow changes - a single extra GOOSE message every other day, a new function code seen against one device, a slight shift in DNP3 polling pattern. None of these trip a statistical alert. All are obvious if you can query the history.

Vendor lock-in detection. You're considering replacing a control system vendor. They've told you the current system is interoperable. A protocol baseline lets you verify that claim with data, not vendor PowerPoint.
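Several of the use cases above (change auditing, pen-test deltas, slow-rolling changes) come down to the same operation: diff the set of behaviors seen in one window against another. A minimal sketch, assuming decoded records carry a timestamp plus hypothetical `src`, `dst`, `proto`, and `op` fields; the host names and operation labels are made up.

```python
from datetime import datetime


def behavior_set(records, start, end):
    """Distinct (src, dst, proto, op) tuples seen in [start, end)."""
    return {
        (r["src"], r["dst"], r["proto"], r["op"])
        for r in records
        if start <= r["ts"] < end
    }


def diff_windows(records, before, after):
    """Behaviors that appeared or disappeared between two (start, end) windows."""
    b = behavior_set(records, *before)
    a = behavior_set(records, *after)
    return {"added": a - b, "removed": b - a}


# Made-up history on either side of a February change.
records = [
    {"ts": datetime(2024, 1, 10), "src": "hmi", "dst": "plc1", "proto": "modbus", "op": "fc3"},
    {"ts": datetime(2024, 1, 11), "src": "hmi", "dst": "plc1", "proto": "modbus", "op": "fc16"},
    {"ts": datetime(2024, 3, 10), "src": "hmi", "dst": "plc1", "proto": "modbus", "op": "fc3"},
    {"ts": datetime(2024, 3, 11), "src": "eng-ws", "dst": "plc1", "proto": "modbus", "op": "fc6"},
]
january = (datetime(2024, 1, 1), datetime(2024, 2, 1))
march = (datetime(2024, 3, 1), datetime(2024, 4, 1))

delta = diff_windows(records, january, march)
```

Here the diff shows a new writer (`eng-ws`) and a write the HMI stopped issuing. Neither would move an average packet rate. The key tuple is a design choice: add register addresses or payload fields to it and the diff gets sharper and noisier at the same time.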

And the big one: feeding a Digital Twin

This is the use case that pulled me into the topic, and the one I think about most.

I've been building a digital twin initiative - combining OT network traffic with machine learning - that aims to do what conventional digital twins can't. Traditional digital twins are fed from ICS/SCADA telemetry: tags, setpoints, alarms, the operational data your control system exposes. That's useful. It's also a sliver of the total ground truth.

A more interesting digital twin is fed from the network protocol level - every message, every field, every flow, over time. That's where the unfiltered behavior lives. The OT-tag data tells you what the operator was told about the process; the protocol data tells you what the controller and devices were actually doing.

A digital twin built on a real protocol baseline can do things the SCADA-fed kind can't:

Predict failures from anomalies in protocol pacing, not just from setpoint deviations.
Identify silent firmware drift across a fleet without anyone telling it which firmware version is on which device.
Provide forensic context the operator HMI never captured.
Cross-correlate behaviors across protocols and across sites at a level no single SCADA system can.

But - and this is the punchline - a digital twin is only as good as the baseline that feeds it. If your protocol baseline is statistical, the twin produces statistical insights. If your baseline is sampled, the twin guesses. If your baseline ages out, the twin's historical reasoning is fiction.

To get a digital twin that's worth its compute, you need the foundation: every protocol, every field, every flow, full history.
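As a toy illustration of "protocol pacing," the sketch below compares the median gap between messages in two windows. This is not how my twin works, and the function names and timestamps are invented; it only shows the kind of signal that exists in per-message history and is absent from tag data.

```python
from statistics import median


def intervals(timestamps):
    """Gaps between consecutive message timestamps, in seconds."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def pacing_shift(before, after):
    """Ratio of the median inter-message gap in `after` to that in `before`.

    Each window needs at least two timestamps.
    """
    return median(intervals(after)) / median(intervals(before))


# Made-up poll times: one-second polling that later stretches to 1.5 seconds.
before = [0.0, 1.0, 2.0, 3.0, 4.0]
after = [100.0, 101.5, 103.0, 104.5, 106.0]

shift = pacing_shift(before, after)
```

A ratio well away from 1.0 says the device's rhythm changed. Whether that means a failing controller, new firmware, or something worse is a question the rest of the history has to answer.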

What to look for in tools

If you're evaluating OT IDS or visibility tools, three questions cut through the marketing:

What's your full protocol coverage list, at what depth? Push past “we support fifty protocols.” Ask which are parsed at Wireshark-level fidelity - every field decoded - versus header-only versus “we see the packets but don't decode them.” The honest tools give you a matrix. The dishonest ones get defensive. The best of the current crop go past 250 protocols at full depth, and yes, that number is achievable - just rare.
How long do you keep packet-level history? Not summary statistics - actual decoded protocol data. If the answer is less than the lifecycle of the equipment you're trying to protect, the tool is solving a smaller problem than you have. Real baseline tools keep history indefinitely, or until you decide to stop.
Can I query any window against any other? A baseline that lets you compare today to last month is good. One that lets you compare any chosen window in the past against any other - for any protocol, on any flow - is what you actually want.

Most tools will fail at least one of those questions. The ones that don't are rare, and they're the ones worth your time.

The blunt summary. OT cybersecurity is starved for foundations. We've spent a decade buying detection without doing the boring underlying work of actually knowing what our networks look like. Every layer of capability above that - anomaly detection, threat hunting, compliance, digital twins - is built on a foundation most of us haven't actually laid.

Protocol baselining at full fidelity, with full history, across every protocol your environment carries, is that foundation. It's not glamorous. It's not the keynote demo at vendor events. It's the thing that, once you have it, makes everything else work.

If you're scoping a security program for a plant, a fleet, or a national grid, this is the question to ask first - not last.

I cover protocol analysis, asset visibility, and the foundations of an OT cybersecurity program in Chapters 6, 8, 13, and 14 of the OT Cybersecurity Professional course. See the curriculum. The digital-twin initiative referenced here is described in more detail on the research page.
