Reinforcement Learning-Based Optimal Firewall Placement and Configuration (RL_OFPC)
Conference proceeding   Open access   Peer reviewed

Zahra S. Torabi, David Eyers and Veronica Liesaputra
ICT Systems Security and Privacy Protection: 41st IFIP TC11 International Conference, SEC 2026, Proceedings, pp.449-462
IFIP International Conference on ICT Systems Security and Privacy Protection (SEC 2026), 41st (Perth, Australia, 09/06/2026–11/06/2026)
IFIP Advances in Information and Communication Technology, 787
03/06/2026
Handle:
https://hdl.handle.net/10523/51273

Abstract

Keywords: adaptability; firewall placement; network security requirement; reinforcement learning; scalability; virtual firewalls
Firewalls remain foundational to cybersecurity, yet their traditional perimeter-based role is challenged by the dynamic nature of modern zero-trust and virtualised networks. In these environments, virtual firewalls (software-defined security functions deployed within service graphs) provide flexible, fine-grained control over traffic flows. However, their scalability and performance are often constrained by sub-optimal placement and rule configuration, especially in large or rapidly evolving topologies. This research introduces the Reinforcement Learning-based Optimised Firewall Placement and Configuration (RL_OFPC) model, which addresses these challenges through two cooperating reinforcement learning agents. The FRC-Agent manages path computation and rule enforcement to satisfy hard security constraints, while the FPO-Agent determines optimal firewall locations that minimise the number of deployed firewalls and rule instances while maintaining proximity to critical network components. The model is evaluated against the state-of-the-art VEREFOO framework using both the Maximum Flow (MF) and Atomic Predicate (AP) algorithms across 120 synthetic topologies. Results demonstrate that RL_OFPC achieves up to 97.6% accuracy in Network Security Requirement (NSR) satisfaction and improves runtime efficiency by up to 27% in high-NSR environments compared with VEREFOO. However, as the number of Allocation Points (APs) increases, the model's exploration overhead grows, occasionally surpassing VEREFOO's scalability performance. Despite this, RL_OFPC consistently adapts better to topology modifications through localised Q-learning updates rather than full recomputation, confirming its suitability for dynamic, high-assurance network environments.
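The abstract refers to localised Q-learning updates. As a rough illustration only, the standard tabular Q-learning rule changes a single state-action entry per step, which is why a local topology change need not trigger full recomputation. The sketch below is not the authors' implementation: the node names, the "place"/"skip" actions, the reward values, and the `q_update` helper are all hypothetical.

```python
def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update in place.

    q is a plain dict keyed by (state, action); missing entries count as 0.0.
    Only the (state, action) entry is modified, so the update is local.
    """
    # Greedy estimate of the best value reachable from the next state.
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    td_target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (td_target - old)
    return q[(state, action)]


# Hypothetical toy run: states are candidate nodes, actions are place/skip.
q = {}
actions = ["place", "skip"]
first = q_update(q, "n1", "place", reward=1.0,
                 next_state="n2", next_actions=actions)
second = q_update(q, "n0", "place", reward=0.0,
                  next_state="n1", next_actions=actions)
```

After the two calls, only the entries for ("n1", "place") and ("n0", "place") exist in `q`; every other state-action pair is untouched.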
url
https://rdcu.be/fnSpE
Published (Version of record) Free to read via Springer Nature SharedIt Initiative Open All Rights Reserved
