Reinforcement learning models for adaptive and automated optimization of firewall placement, rule configuration, and network performance
Doctoral Thesis

Zahrasadat Torabi
Doctor of Philosophy - PhD, University of Otago
13/04/2026
DOI: https://doi.org/10.82348/our-archive.00099
Handle: https://hdl.handle.net/10523/50419

Abstract

Keywords: Reinforcement Learning (RL); Software-Defined Networking (SDN); Firewall Placement Optimization; Firewall Rule Configuration; Network Security Requirements (NSRs); Q-learning; Network Performance Optimization; Dynamic Network Topology; Adaptive Network Security; Scalability; QoS-aware Routing; Incremental Learning; Traffic Engineering

Ensuring robust security in virtual networks, especially software-defined networks (SDNs), without compromising performance remains a key challenge, as manual firewall configuration is error-prone and poorly suited to the dynamic nature of virtual network topologies.

Traditional graph-based path computation methods such as Dijkstra’s algorithm are not integrated with real-time performance metrics and do not adapt to dynamic network conditions. In parallel, tools that automate firewall configuration and deployment provide correctness guarantees and verification but often lack adaptability and performance-driven optimization. More recent reinforcement learning (RL)-based approaches address traffic performance and routing efficiency, but firewall placement and policy enforcement fall outside their scope. This gap highlights the need for a unified framework. This thesis presents two RL-based models that unify these tasks.

The first model, RL_OFPC (Optimal Firewall Placement and Configuration), employs a dual-phase Q-learning algorithm incorporating Atomic Predicate (AP) and Maximal Flow (MF) techniques to achieve high empirical coverage of Network Security Requirements (NSRs) while reducing firewall count and rule complexity. Evaluations conducted on Internet2 and GEANT topologies demonstrate that RL_OFPC achieves significant runtime and memory usage improvements over VEREFOO in scenarios with a high number of NSRs and a relatively small number of Allocation Points (APs), with reductions of up to 36%, 49%, and 27.9%. These results indicate improved scalability and efficiency, subject to successful learning convergence.
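For readers unfamiliar with Q-learning, the standard tabular update rule can be sketched as follows. This is a generic illustration only, not the dual-phase algorithm of RL_OFPC: the function name `q_update`, the state and action labels, the reward value, and the learning rate and discount factor are all assumptions made for the example.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    q is a mapping from (state, action) pairs to estimated returns.
    alpha (learning rate) and gamma (discount factor) are illustrative
    defaults, not values taken from the thesis.
    """
    # Best estimated value reachable from the next state (0.0 if terminal).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: states are network nodes, actions are next hops.
q_table = defaultdict(float)
q_update(q_table, "n1", "n2", reward=1.0, next_state="n2", next_actions=["n3"])
```

In a placement setting, the reward would have to encode whatever the agent is meant to optimise (for example NSR coverage against firewall count); how RL_OFPC defines it is not stated in this abstract.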

The second model, RL_ORFD (Optimised Routing and Firewall Deployment), extends RL_OFPC by integrating network performance metrics such as delay, bandwidth, and packet loss into routing decisions. In experimental evaluations, RL_ORFD was benchmarked against (i) a static baseline using shortest-path routing computed via Dijkstra’s algorithm, and (ii) RSIR, an adaptive RL-based model that also utilises link-state information and Q-learning for optimal routing. Compared to the static Dijkstra-based baseline, RL_ORFD reduced delay and packet loss by up to 35% and 28%, respectively, during peak traffic conditions. While it achieved over 89% similarity with RSIR in traffic optimization metrics, key differences emerged under tie-breaking conditions: RL_ORFD computes a composite QoS score when multiple paths share equal weights, enabling it to outperform RSIR by up to 8% in delay, 11% in packet loss, and 6% in bandwidth utilisation. These performance improvements contribute to more efficient routing and facilitate more reliable security-aware policy deployment by reducing congestion and instability along policy-enforced paths, rather than directly strengthening the underlying security guarantees.
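The tie-breaking idea described above can be illustrated with a minimal sketch. The abstract does not give the composite QoS formula, so the scoring function below, its weights, and its normalisation constants are assumptions, as are the names `PathMetrics`, `composite_qos`, and `select_path`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathMetrics:
    name: str
    weight: float          # path cost used for the primary choice (lower is better)
    delay_ms: float
    loss_pct: float
    bandwidth_mbps: float


def composite_qos(p, w_delay=1.0, w_loss=1.0, w_bw=1.0):
    """Hypothetical composite score; higher is better.

    Bandwidth counts in favour of a path, delay and loss count against it.
    The division by 100 is an arbitrary normalisation for this example.
    """
    return (w_bw * p.bandwidth_mbps / 100.0
            - w_delay * p.delay_ms / 100.0
            - w_loss * p.loss_pct / 100.0)


def select_path(paths, tol=1e-9):
    """Pick the lowest-weight path, breaking ties by composite QoS score."""
    best_weight = min(p.weight for p in paths)
    tied = [p for p in paths if abs(p.weight - best_weight) <= tol]
    return max(tied, key=composite_qos)


# Hypothetical candidates: "a" and "b" tie on weight, "c" is costlier.
candidates = [
    PathMetrics("a", weight=3.0, delay_ms=20.0, loss_pct=1.0, bandwidth_mbps=100.0),
    PathMetrics("b", weight=3.0, delay_ms=10.0, loss_pct=1.0, bandwidth_mbps=100.0),
    PathMetrics("c", weight=5.0, delay_ms=1.0, loss_pct=0.0, bandwidth_mbps=1000.0),
]
chosen = select_path(candidates)
```

Here the score is consulted only among paths whose weights are equal, which mirrors the condition under which the abstract reports RL_ORFD diverging from RSIR.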

pdf
Final-Zahra-thesis-560617714.97 MB
Embargoed Access, Embargo ends: 01/05/2027 2: Abstract Only
