Multi-policy optimization in decentralized autonomic systems
Citation:
Ivana Dusparic, 'Multi-policy optimization in decentralized autonomic systems', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 2010, pp 174.
Abstract:
Autonomic computing systems are those that are capable of managing themselves based only on high-level
objectives given by humans. In such systems the details of how to meet their objectives, even in
the face of changing operating conditions, are left to the systems themselves. Therefore, autonomic
systems are required to be able to self-optimize, self-heal, self-protect, and self-configure. Enabling
autonomic behaviour is particularly challenging in decentralized autonomic systems, where central
control is not tractable or even possible, due to the large number and geographical dispersion of the
entities involved. In such systems, entities only have local views of their immediate environments and
no global view of the system exists. Decentralized autonomic systems can be implemented as multi-agent
systems, in which each entity is modelled as an intelligent agent. These agents can self-organize
based only on local actions and interactions, so that the global behaviour of the system, required to
meet its objectives, emerges from the agents’ local behaviours.
This thesis addresses self-optimization in decentralized autonomic systems. Examples of techniques
used to self-optimize autonomic systems include ant-colony optimization, evolutionary algorithms,
neural networks, and reinforcement learning (RL). RL is considered particularly suitable for use in
large-scale autonomic systems, as it does not require a predefined model of the environment. However,
most applications of RL in decentralized autonomic computing address systems that optimize
their behaviours towards only a single policy, while in reality management of most autonomic systems
requires optimization towards multiple, often conflicting policies. These policies can be heterogeneous
(i.e., implemented on different sets of agents, be active at different times and have different levels of
priority), leading to the heterogeneity of the agents of which the system is composed. The cooperation
required for self-optimization is particularly challenging in such heterogeneous multi-agent environments,
as agents might not be aware of other agents’ policies and their relative priority for the system.
Additionally, since agents operate in the same shared environment, dependencies can arise between
their performance and therefore between policy implementations as well.
To address self-optimization in such decentralized autonomic systems in the presence of agent heterogeneity, policy dependency and lack of global knowledge, this thesis proposes Distributed W-Learning
(DWL). DWL is an RL-based algorithm for agent-based self-optimization that enables collaboration
between heterogeneous agents in order to simultaneously satisfy multiple heterogeneous
system policies. DWL learns and exploits the dependencies between neighbouring agents and between
policies to improve performance while respecting the relative priorities of the policies. Instead
of always executing the locally-best action, DWL agents take into account how their actions affect
their immediate neighbours; if a neighbouring agent has a very strong preference on an agent’s local
action, that agent will defer to it and execute the action nominated by that neighbour. In particular,
suggestions by neighbouring agents will be executed if their importance exceeds the importance
of the local action when scaled using a cooperation coefficient, which can be predefined, or can be
learnt to maximize the reward received in the immediate neighbourhood. By selecting actions in this
manner, DWL does not require central control or a global view, as it relies solely on local actions and
interactions with one-hop neighbours.
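The action-selection rule described above can be sketched as follows. This is a minimal illustration, not the thesis's actual interface: the function name, the (action, W-value) pairs, and the way policies nominate actions are all simplifying assumptions made here for clarity.

```python
# Illustrative sketch of DWL-style action selection: an agent weighs its own
# policies' nominations against neighbours' suggestions, with remote
# suggestions scaled by a cooperation coefficient. Names are hypothetical.

def select_action(local_nominations, remote_suggestions, coop_coefficient):
    """Return the action whose (scaled) importance is highest.

    local_nominations:  list of (action, w_value) pairs from the agent's
                        own policies; w_value expresses how much the policy
                        stands to lose if its action is not executed
    remote_suggestions: list of (action, w_value) pairs received from
                        one-hop neighbours
    coop_coefficient:   weight in [0, 1] applied to remote W-values;
                        may be predefined or learnt
    """
    candidates = list(local_nominations)
    # A neighbour's suggestion competes only after scaling: it wins when its
    # scaled importance exceeds that of the locally-best action.
    candidates += [(action, coop_coefficient * w)
                   for action, w in remote_suggestions]
    return max(candidates, key=lambda pair: pair[1])[0]
```

With a cooperation coefficient of 0.5, a remote suggestion of importance 8.0 loses to a local nomination of importance 5.0; with a coefficient of 1.0 the same suggestion wins, matching the deference behaviour the abstract describes.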
We have evaluated the DWL algorithm in a simulation of an urban traffic control (UTC) system,
a canonical example of the class of decentralized autonomic systems that we are addressing. We show
that DWL is a suitable technique for optimization in UTC, as it outperforms the currently most widely
deployed fixed-time and simple adaptive controllers in our simulation. Collaborative DWL
scenarios can outperform non-collaborative scenarios, depending on the level of collaboration, when
the cooperation coefficient is fixed. When DWL agents learn the level of cooperation individually,
the learnt scenarios outperform all non-collaborative scenarios and either outperform or perform as
well as scenarios with predefined cooperation coefficients. We also show that addressing two policies simultaneously
using DWL can, based on policy dependencies, improve the performance of both policies
over corresponding single-policy implementations. These results hold for a variety of traffic loads and
patterns and therefore show DWL’s wide applicability in UTC, as well as suggesting that DWL might
be suitable for optimization in other large-scale autonomic systems with similar characteristics.
Author: Dusparic, Ivana
Advisor: Cahill, Vinny
Qualification name: Doctor of Philosophy (Ph.D.)
Publisher: Trinity College (Dublin, Ireland). School of Computer Science & Statistics
Type of material: thesis
Availability: Full text available