TY - GEN
T1 - Modeling, monitoring and scheduling techniques for network recovery from massive failures
AU - Tootaghaj, Diman Zad
AU - La Porta, Thomas
AU - He, Ting
N1 - Funding Information:
In this paper, we presented the contributions of the thesis [4] and provided comprehensive solutions to recover a network after massive disruption. We proposed novel schemes to monitor and recover a network under uncertain knowledge of failure while targeting four main goals: (1) minimizing the number of necessary repaired elements, (2) minimizing the amount of demand loss, (3) minimizing the recovery time and (4) minimizing the cost of monitoring probes. These critical goals conflict with each other, and we studied the trade-off among them. The recovery approach and failure detection mechanism with incomplete information are among the first steps towards understanding disruption management techniques under uncertainty, and they open up the area of designing reliable systems under incomplete or noisy information. We also studied the disruption caused by updating flow rules in software defined networks and proposed two randomized rounding algorithms with bounded approximation on congestion and demand loss. ACKNOWLEDGEMENT This research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. Special thanks to Prof. Novella Barolini and Dr. Hana Khamfroush for the co-supervision of this work and to all coauthors of the papers written during this work.
Publisher Copyright:
© 2019 IFIP.
Copyright:
Copyright 2019 Elsevier B.V., All rights reserved.
PY - 2019/5/16
Y1 - 2019/5/16
N2 - Large-scale failures in communication networks due to natural disasters or malicious attacks can severely affect critical communications and threaten the lives of people in the affected area. In the absence of a proper communication infrastructure, rescue operations become extremely difficult. Progressive and timely network recovery is, therefore, key to minimizing losses and facilitating rescue missions. To this end, we focused on network recovery assuming partial and uncertain knowledge of the failure locations. We proposed a progressive multi-stage recovery approach that uses the incomplete knowledge of failure to find a feasible recovery schedule. Next, we focused on failure recovery of multiple interconnected networks, in particular on the interaction between a power grid and a communication network. Then, we focused on network monitoring techniques that can be used to diagnose the performance of individual links and localize soft failures (e.g., highly congested links) in a communication network. We studied the optimal selection of monitoring paths to balance identifiability and probing cost. Finally, we addressed a minimally disruptive routing framework in software defined networks. Extensive experimental and simulation results show that our proposed recovery approaches have a lower disruption cost than the state of the art, while allowing the trade-off among identifiability, execution time, repair/probing cost, congestion, and demand loss to be configured.
UR - http://www.scopus.com/inward/record.url?scp=85067060130&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85067060130&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85067060130
T3 - 2019 IFIP/IEEE Symposium on Integrated Network and Service Management, IM 2019
SP - 695
EP - 700
BT - 2019 IFIP/IEEE Symposium on Integrated Network and Service Management, IM 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IFIP/IEEE Symposium on Integrated Network and Service Management, IM 2019
Y2 - 8 April 2019 through 12 April 2019
ER -