Reinforcement learning models for adaptive and automated optimization of firewall placement, rule configuration, and network performance

Zahrasadat Torabi

doi:10.82348/our-archive.00099

Doctoral Thesis

Doctor of Philosophy - PhD, University of Otago

13/04/2026

DOI: https://doi.org/10.82348/our-archive.00099

Handle: https://hdl.handle.net/10523/50419

Keywords: Reinforcement Learning (RL); Software-Defined Networking (SDN); Firewall Placement Optimization; Firewall Rule Configuration; Network Security Requirements (NSRs); Q-learning; Network Performance Optimization; Dynamic Network Topology; Adaptive Network Security; Scalability; QoS-aware Routing; Incremental Learning; Traffic Engineering

Abstract

Ensuring robust security in virtual networks, especially software-defined networks (SDNs), without compromising performance remains a key challenge, as manual firewall configuration is error-prone and poorly suited to the dynamic nature of virtual network topologies.

Traditional graph-based path computation methods, such as Dijkstra’s algorithm, are not integrated with real-time performance metrics and do not adapt to dynamic network conditions. In parallel, tools that automate firewall configuration and deployment provide correctness guarantees and verification but often lack adaptability and performance-driven optimization. More recent reinforcement learning (RL)-based approaches address traffic performance and routing efficiency, but firewall placement and policy enforcement fall outside their scope. This gap motivates a unified framework, and this thesis presents two RL-based models that bring these tasks together.

The first model, RL_OFPC (Optimal Firewall Placement and Configuration), employs a dual-phase Q-learning algorithm incorporating Atomic Predicate (AP) and Maximal Flow (MF) techniques to achieve high empirical coverage of Network Security Requirements (NSRs) while reducing firewall count and rule complexity. Evaluations conducted on Internet2 and GEANT topologies demonstrate that RL_OFPC achieves significant runtime and memory usage improvements over VEREFOO in scenarios with a high number of NSRs and a relatively small number of Allocation Points (APs), with reductions of up to 36%, 49%, and 27.9%. These results indicate improved scalability and efficiency, subject to successful learning convergence.
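For readers unfamiliar with Q-learning, the standard tabular update rule can be sketched as below. This is a generic textbook illustration only, not the dual-phase algorithm of RL_OFPC, whose details are not given in this abstract. The state and action labels, the reward value, and the `alpha` and `gamma` settings are all hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    alpha (learning rate) and gamma (discount factor) are illustrative
    defaults, not values taken from the thesis.
    """
    # Value of the best action available in the next state (0.0 if none).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    # Temporal-difference target: immediate reward plus discounted lookahead.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical labels: a state "s0" in which the agent places a firewall.
q_table = defaultdict(float)
new_value = q_update(q_table, "s0", "place_firewall", reward=1.0,
                     next_state="s1", next_actions=["keep", "move"])
```

Starting from an all-zero table, a single update with these settings moves the estimate to `alpha * reward`.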

The second model, RL_ORFD (Optimised Routing and Firewall Deployment), extends RL_OFPC by integrating network performance metrics such as delay, bandwidth, and packet loss into routing decisions. In experimental evaluations, RL_ORFD was benchmarked against (i) a static baseline using shortest-path routing computed via Dijkstra’s algorithm, and (ii) RSIR, an adaptive RL-based model that also utilises link-state information and Q-learning for optimal routing. Compared to the static Dijkstra-based baseline, RL_ORFD reduced delay and packet loss by up to 35% and 28%, respectively, during peak traffic conditions. While it achieved over 89% similarity with RSIR in traffic optimization metrics, key differences emerged under tie-breaking conditions: RL_ORFD computes a composite QoS score when multiple paths share equal weights, enabling it to outperform RSIR by up to 8% in delay, 11% in packet loss, and 6% in bandwidth utilisation. These performance improvements make routing more efficient and make security-aware policy deployment more reliable by reducing congestion and instability along policy-enforced paths; they do not directly strengthen the underlying security guarantees.
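The tie-breaking idea described above can be illustrated with a minimal sketch: when several candidate paths share the lowest routing weight, a composite score over delay, packet loss, and bandwidth decides between them. The scoring formula, the equal metric weights, the normalisation, and all names here (`PathMetrics`, `composite_qos`, `select_path`) are assumptions made for illustration. The abstract does not specify how RL_ORFD computes its score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathMetrics:
    name: str
    weight: float          # primary routing weight (lower is better)
    delay_ms: float
    loss_pct: float        # packet loss as a percentage, 0 to 100
    bandwidth_mbps: float


def composite_qos(p, max_delay, max_bw, w_delay=1/3, w_loss=1/3, w_bw=1/3):
    """Hypothetical score in [0, 1]; higher is better."""
    delay_term = 1 - p.delay_ms / max_delay if max_delay else 1.0
    loss_term = 1 - p.loss_pct / 100
    bw_term = p.bandwidth_mbps / max_bw if max_bw else 0.0
    return w_delay * delay_term + w_loss * loss_term + w_bw * bw_term


def select_path(paths):
    """Pick the lowest-weight path; break ties with the composite score."""
    best_weight = min(p.weight for p in paths)
    tied = [p for p in paths if p.weight == best_weight]
    if len(tied) == 1:
        return tied[0]
    # Normalise against the worst delay and best bandwidth among tied paths.
    max_delay = max(p.delay_ms for p in tied)
    max_bw = max(p.bandwidth_mbps for p in tied)
    return max(tied, key=lambda p: composite_qos(p, max_delay, max_bw))


# Made-up candidates: A and B tie on weight, C has a higher weight.
candidates = [
    PathMetrics("A", weight=3, delay_ms=20, loss_pct=1.0, bandwidth_mbps=100),
    PathMetrics("B", weight=3, delay_ms=10, loss_pct=0.5, bandwidth_mbps=100),
    PathMetrics("C", weight=5, delay_ms=1, loss_pct=0.0, bandwidth_mbps=1000),
]
chosen = select_path(candidates)
```

In this made-up example, paths A and B tie on weight, and B is chosen because its delay and loss are lower at equal bandwidth. Path C is never considered because its weight is higher.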

Files and links (1)

pdf

Final-Zahra-thesis-560617714.97 MB

Embargoed Access, Embargo ends: 01/05/2027 2: Abstract Only

Details

Record Identifier: 9926854267401891
Title: Reinforcement learning models for adaptive and automated optimization of firewall placement, rule configuration, and network performance
Creators: Zahrasadat Torabi
Contributors: David Eyers (Advisor / Supervisor) - University of Otago, School of Computing
Veronica Joachim (Advisor / Supervisor) - University of Otago, School of Computing
Academic Unit: School of Computing
Degree Awarded: Doctor of Philosophy - PhD
Project Type: Thesis - Doctoral
Awarding Institution: University of Otago
Language: English
Resource Type ; Subtype: Doctoral Thesis