Supervisor meeting · University of Amsterdam

Feedback, a generalization, and what I built

Jonathan van den Heuvel · with dr. Cyril Hsu & dr. Chrysa Papagianni
Stable-Edge Filtering for Passive OT Device Classification · 8 June 2026

My framing: three things today, in order. Feedback I processed, a generalization I want a steer on, and the live sites. I keep part 1 brisk; the conversation I most want is part 2.

Today

Three things

1. How I processed your last feedback, the 8.0 bar
2. A brainstorm, generalizing the problem, I want your read
3. The websites I built

The thesis as it stands is the clean post-feedback baseline. The generalization in part 2 is a proposal I have deliberately not written into the thesis yet, so you can steer whether it belongs there.

Set expectations: part 2 is the one that could move the grade, so I leave time for it.

Part one

Processing your feedback

From a negative result, to a negative result with a fix.

One sentence: the shape of the contribution changed; it now ends with a working solution. I will show the first three asks as actual before/after diffs of the thesis.

The 3 June feedback

Your asks, in order

Demonstrate a solution, not just a negative result → diff
Isolate the cause with controls → diff
A falsifiable hypothesis; RQ1 as a sanity check → diff
Hourglass problem statement + an abstract, general problem
Deepen the writing: GNN §2.3, scenario table, metrics, why random forest
Fix the figures and citations; add the code link

The next three slides show the first three as before → after changes to the thesis text itself.

Walk the list once. Say that I will show the top three literally, as diffs; the rest are the coherence and writing asks, summarised afterwards.

Feedback → revision

The central claim

Your ask: sharpen the over-general claim, and demonstrate a fix.

− before: "…temporal persistence is the wrong abstraction for hardening passive OT classification."
+ after: "The failure is a property of content-agnostic edge filtering: a controller's class-defining polls are low-volume and event-sensitive, so any structural rule strips them."
+ after · the fix: "A content-aware filter that keeps control-protocol edges removes the failure mode entirely (Δ = +0.000)."

Read the before line, then advance to reveal the after. Same evidence, sharper claim, and now a demonstrated fix.
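
A minimal sketch of the difference between a content-blind and a content-aware filter, for illustration only. It assumes that "control-protocol edge" can be approximated by a well-known destination port (502 for Modbus/TCP, 20000 for DNP3, 44818 for EtherNet/IP); the thesis may identify control traffic differently. The `Edge` type, the device names and the threshold are hypothetical.

```python
from dataclasses import dataclass

# Assumption: destination ports stand in for control-protocol detection.
# 502 = Modbus/TCP, 20000 = DNP3, 44818 = EtherNet/IP.
CONTROL_PORTS = {502, 20000, 44818}

@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    dst_port: int
    persistence: float  # fraction of observation windows in which the edge appears

def content_blind_filter(edges, min_persistence):
    """Keep only edges that clear the persistence threshold."""
    return [e for e in edges if e.persistence >= min_persistence]

def content_aware_filter(edges, min_persistence):
    """Same structural rule, but control-protocol edges are always kept."""
    return [e for e in edges
            if e.dst_port in CONTROL_PORTS or e.persistence >= min_persistence]

# Hypothetical edges, not thesis data.
edges = [
    Edge("plc-1", "drive-3", 502, 0.20),    # low-persistence control poll
    Edge("hmi-1", "historian", 443, 0.95),  # persistent bulk traffic
    Edge("laptop", "hmi-1", 3389, 0.10),    # transient non-control edge
]

blind = content_blind_filter(edges, 0.5)
aware = content_aware_filter(edges, 0.5)
print([e.dst_port for e in blind], [e.dst_port for e in aware])
```

In this toy, the content-blind rule drops the Modbus poll together with the transient edge, while the content-aware rule drops only the transient edge.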

A correction I found

How the filter is defined

Working through the pipeline you asked me to learn, I found the method described one thing and the code did another.

− before: "…a simple phase-local filter based on temporal persistence." (presence within a single phase)
+ after: "Persistence is measured over the observation window a passive monitor actually captures." The realistic case: at one tap you cannot segment phases.
+ after · bonus: The phase-local filter now removes zero edges (Δ = 0), so it became a clean counterfactual that proves the cause is the observation window.

This is the credibility moment. I found it myself, I disclosed it, and it makes the mechanism cleaner rather than weaker.
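
A small sketch of why the two definitions diverge, under loud assumptions: persistence is taken to be the fraction of time windows in which an edge is seen, the timeline has two hypothetical phases of four windows each, and the cut-off of 0.75 is invented for the example. None of these values come from the thesis code.

```python
def persistence(seen):
    """Fraction of time windows in which an edge was observed."""
    return sum(seen) / len(seen)

# Hypothetical timeline: four windows of normal operation, then four of maintenance.
phases = ["normal"] * 4 + ["maintenance"] * 4
presence = {
    "plc-1->drive-3 (poll)": [1, 1, 1, 1, 0, 0, 0, 0],  # polls pause in maintenance
    "hmi-1->historian":      [1, 1, 1, 1, 1, 1, 1, 1],
}
THRESHOLD = 0.75  # assumed cut-off

def window_removed(presence, threshold):
    """Edges removed when persistence is measured over the whole observation window."""
    return [e for e, seen in presence.items() if persistence(seen) < threshold]

def phase_local_removed(presence, phases, threshold):
    """(phase, edge) pairs removed when persistence is measured inside each phase."""
    removed = []
    for phase in dict.fromkeys(phases):  # unique phases, in order of appearance
        idx = [i for i, p in enumerate(phases) if p == phase]
        for edge, seen in presence.items():
            in_phase = [seen[i] for i in idx]
            # Only edges that occur in the phase are candidates for removal.
            if any(in_phase) and persistence(in_phase) < threshold:
                removed.append((phase, edge))
    return removed

print(window_removed(presence, THRESHOLD))
print(phase_local_removed(presence, phases, THRESHOLD))
```

Here the poll edge is fully persistent inside the phase where it occurs, so the phase-local rule removes nothing, while the same edge falls below the cut-off once both phases share one observation window.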

Feedback → revision

From open question to falsified hypothesis

Your ask: state a falsifiable hypothesis, and make RQ1 a sanity check.

− before: "This thesis asks whether removing transient edges… improves classification and makes it more robust." (an open question)
+ after: "This thesis tests the hypothesis that removing non-persistent edges improves classification… The hypothesis is falsified."
+ after · controls: "A battery of controls localises the cause: random removal is harmless, a byte-volume filter is equally harmful, and the penalty is classifier-independent."

The claim is now testable and tested. RQ1 (steady state) is the no-op baseline: the filter removes zero edges there.

The controls you asked for

Isolating the cause

Random, same count removed → harmless. So it is not that removing edges hurts.
Byte-volume → also harmful. A second content-blind proxy strips the low-volume polls.
Phase-local (the idealised filter) → removes nothing. The penalty is the observation window.

Maintenance Δ macro-F1, same count removed per window, four selection rules

The shape is the argument: two amber bars down (persistence, byte-volume), one green up (random, harmless), one at zero (phase-local). It is which edges, not how many.
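
A sketch of what "same count removed" means across selection rules, assuming each rule is given an identical removal budget `k` per window and differs only in how it picks the edges. The edge list, attribute names and values are hypothetical and only illustrate the mechanics, not the thesis pipeline.

```python
import random

# Hypothetical edges for one window; values are illustrative, not thesis data.
edges = [
    {"id": "poll-a", "persistence": 0.20, "bytes": 1_200},
    {"id": "poll-b", "persistence": 0.30, "bytes": 900},
    {"id": "hist",   "persistence": 0.95, "bytes": 4_000_000},
    {"id": "backup", "persistence": 0.80, "bytes": 9_000_000},
    {"id": "hmi",    "persistence": 0.90, "bytes": 250_000},
    {"id": "ntp",    "persistence": 1.00, "bytes": 40_000},
]

def lowest_k(edges, key, k):
    """Ids of the k edges with the smallest value of `key`."""
    return {e["id"] for e in sorted(edges, key=lambda e: e[key])[:k]}

def random_k(edges, k, seed=0):
    """Ids of k edges drawn uniformly at random, ignoring every attribute."""
    rng = random.Random(seed)
    return {e["id"] for e in rng.sample(edges, k)}

k = 2  # the same removal budget for every rule
removed = {
    "persistence": lowest_k(edges, "persistence", k),
    "byte-volume": lowest_k(edges, "bytes", k),
    "random": random_k(edges, k),
}
for rule, ids in removed.items():
    print(rule, sorted(ids))
```

In this toy both content-blind proxies select the two low-volume polls, while the random rule spends the same budget without regard to any attribute, which is the sense in which the comparison isolates which edges are removed rather than how many.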

The writing and structure asks

And the rest, done

✓ Hourglass problem statement + an abstract, general problem
✓ §2.3 extended: GNN message-passing equation + architecture figure
✓ §3.3 scenario table with edge-level ground truth
✓ §3.9 metrics: why macro-F1, higher = better, chance = 0.20
✓ Why random forest is competitive, justified in prose (p = 0.037)
✓ Figures and citations fixed; GitHub link added

Don't read every line. Point at the list and say all the coherence asks are addressed; happy to open any of them in the thesis.
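
For the macro-F1 item in the list above, a minimal pure-Python sketch of the metric as the unweighted mean of per-class F1 scores. The chance level of 0.20 is reproduced here under the assumption of five balanced classes and a guesser that spreads its predictions evenly over them; the number of classes is inferred from the chance level, not stated in the source. Classes with no true or predicted samples score 0 in this sketch.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Assumption: five balanced classes, five samples each.
n_classes = 5
y_true = [c for c in range(n_classes) for _ in range(n_classes)]
# A guesser that predicts every class equally often within each true class.
y_guess = [g for _ in range(n_classes) for g in range(n_classes)]

chance = macro_f1(y_true, y_guess)
perfect = macro_f1(y_true, y_true)
print(round(chance, 3), perfect)
```

Because every class contributes equally to the mean, a rare class that is misclassified pulls the score down as much as a common one.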

Part two · a brainstorm

Generalizing the problem

Not in the thesis yet. I want your read before I write it in.

Flag clearly: this is a proposal. The question is whether it belongs in the thesis and how heavy to go.

The idea

My finding is a known failure mode

A label-agnostic filter whose criterion is correlated with the class performs informative deletion. The harm comes from that correlation, not from removing edges as such.

Direction. The filter never reads the label, so Y → X → X' is a Markov chain and, by the Data Processing Inequality, filtering cannot add label information: I(X';Y) ≤ I(X;Y).
When it bites. The loss becomes strict when deletion is class-correlated (the MNAR condition): a controller's polls stop because it is a paused controller.

Plain English first. DPI = filtering can only lose label info. MNAR = it bites when the deletion depends on the class. Both names are canonical (Cover & Thomas; Rubin).
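
A toy check of the inequality, as a sketch rather than a proof. It assumes a binary label Y (1 = controller) and two binary edge features: a poll edge whose presence depends on the class, and a bulk edge that does not. All probabilities are hypothetical. The filter is any label-blind map X' = f(X); the code computes I(X;Y) and I(X';Y) exactly from the joint distribution.

```python
from collections import defaultdict
from itertools import product
from math import log2

def mutual_information(joint):
    """I(X;Y) in bits from a dict {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def apply_filter(joint, f):
    """Push the joint distribution of (X, Y) through a label-blind map X' = f(X)."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(f(x), y)] += p
    return dict(out)

# Hypothetical model: P(poll edge present | Y); the bulk edge is a fair coin.
P_POLL = {1: 0.9, 0: 0.1}
joint = {}
for y, poll, bulk in product((0, 1), repeat=3):
    p_poll = P_POLL[y] if poll else 1 - P_POLL[y]
    joint[((poll, bulk), y)] = 0.5 * p_poll * 0.5  # P(Y) * P(poll | Y) * P(bulk)

full = mutual_information(joint)
drop_bulk = mutual_information(apply_filter(joint, lambda x: x[0]))  # keeps the poll edge
drop_poll = mutual_information(apply_filter(joint, lambda x: x[1]))  # keeps the bulk edge
print(full, drop_bulk, drop_poll)
```

Both filters satisfy the inequality, but only the one that deletes the class-dependent poll edge loses information; deleting the class-independent bulk edge leaves I(X';Y) unchanged. The inequality fixes the direction, and which edges are deleted decides the size of the loss.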

Why it is compelling

The principle predicts my three controls

Random, count-matched · harmless, ≈ 0 · deletion independent of the class = MCAR → no informative loss
Byte-volume filter · harmful, −0.060 · a different class-correlated criterion = MNAR → same loss
Graph-free random forest · same loss · information destroyed in the representation, upstream of any model (DPI)

Random = MCAR (safe), persistence and volume = two MNAR selectors (harmful), classifier-independence = the loss is in the data. One principle, all three results.

This is the strongest slide of the brainstorm. The framing is not just a label, it retro-predicts every control I ran.

What it does not claim

The DPI gives the direction of the loss, not the severity; that stays empirical
Needs the premise that deleted edges carry non-redundant class signal
I would not lean on graph-sparsification work; recent results there cut against me

Questions for you

Does this strengthen the thesis, or is it formalism the committee won't reward?
If worth it, light touch (a paragraph + the vocabulary) or a full lemma in intro and discussion?
Is "informative deletion / MNAR" a clean analogy, or a stretch for deterministic, global deletion?

End part 2 on the questions. I genuinely want the steer here before committing it to the thesis.

Part three

What I built

Five live sites, one domain.

Quick tour, all reachable now. I can open any of them live.

Live now

Five sites, one project

jvdhthesis.tech — project home, the public landing page
progress.jvdhthesis.tech — the supervisor progress deck
defense.jvdhthesis.tech — the MSc defence deck, 7 July 2026, newly re-skinned blue
lab.jvdhthesis.tech — a live Grafana dashboard of the running OT lab, read-only
notes.jvdhthesis.tech — a public speaker-notes companion

Self-hosted, Caddy with automatic TLS, the lab reachable over a private overlay network. This deck runs on the same engine.

Optional: open lab.jvdhthesis.tech live to show the factory dashboard moving.

What I'd like from today

A steer on the generalization, and a check that your feedback is fully addressed.

Thank you. · Jonathan van den Heuvel · 8 June 2026

Close on the two asks: go / no-go (or go lightly) on the framing, and anything still missing from the feedback.