Multi-policy optimization in decentralized autonomic systems
Citation:
Ivana Dusparic, 'Multi-policy optimization in decentralized autonomic systems', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 2010, pp 174.
Abstract:
Autonomic computing systems are those that are capable of managing themselves based only on high-level
objectives given by humans. In such systems the details of how to meet their objectives, even in
the face of changing operating conditions, are left to the systems themselves. Therefore, autonomic
systems are required to be able to self-optimize, self-heal, self-protect, and self-configure. Enabling
autonomic behaviour is particularly challenging in decentralized autonomic systems, where central
control is not tractable or even possible, due to the large number and geographical dispersion of the
entities involved. In such systems, entities only have local views of their immediate environments and
no global view of the system exists. Decentralized autonomic systems can be implemented as multi-agent
systems, in which each entity is modelled as an intelligent agent. These agents can self-organize
based only on local actions and interactions, so that the global behaviour of the system, required to
meet its objectives, emerges from the agents’ local behaviours.
This thesis addresses self-optimization in decentralized autonomic systems. Examples of techniques
used to self-optimize autonomic systems include ant-colony optimization, evolutionary algorithms,
neural networks, and reinforcement learning (RL). RL is considered particularly suitable for use in
large-scale autonomic systems, as it does not require a predefined model of the environment. However,
most applications of RL in decentralized autonomic computing address systems that optimize
their behaviours towards only a single policy, while in reality management of most autonomic systems
requires optimization towards multiple, often conflicting policies. These policies can be heterogeneous
(i.e., implemented on different sets of agents, be active at different times and have different levels of
priority), leading to the heterogeneity of the agents of which the system is composed. The cooperation
required for self-optimization is particularly challenging in such heterogeneous multi-agent environments,
as agents might not be aware of other agents’ policies and their relative priority for the system.
Additionally, since agents operate in the same shared environment, dependencies can arise between
their performance and therefore between policy implementations as well.
To address self-optimization in such decentralized autonomic systems in the presence of agent heterogeneity, policy dependency and lack of global knowledge, this thesis proposes Distributed W-Learning
(DWL). DWL is an RL-based algorithm for agent-based self-optimization that enables collaboration
between heterogeneous agents in order to simultaneously satisfy multiple heterogeneous
system policies. DWL learns and exploits the dependencies between neighbouring agents and between
policies to improve performance while respecting the relative priorities of the policies. Instead
of always executing the locally-best action, DWL agents take into account how their actions affect
their immediate neighbours; if a neighbouring agent has a very strong preference on an agent’s local
action, that agent will defer to it and execute the action nominated by that neighbour. In particular,
suggestions by neighbouring agents will be executed if their importance exceeds the importance
of the local action when scaled using a cooperation coefficient, which can be predefined, or can be
learnt to maximize the reward received in the immediate neighbourhood. By selecting actions in this
manner, DWL does not require central control or a global view, as it relies solely on local actions and
interactions with one-hop neighbours.
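The action-selection rule described above can be sketched as follows. This is a minimal illustration, not the thesis's actual interface: the function name, the (action, W-value) pairs, and the way policies nominate actions are all simplifying assumptions made here for clarity.

```python
# Illustrative sketch of DWL-style action selection: an agent weighs its own
# policies' nominations against neighbours' suggestions, with remote
# suggestions scaled by a cooperation coefficient. Names are hypothetical.

def select_action(local_nominations, remote_suggestions, coop_coefficient):
    """Return the action whose (scaled) importance is highest.

    local_nominations:  list of (action, w_value) pairs from the agent's
                        own policies; w_value expresses how much the policy
                        stands to lose if its action is not executed
    remote_suggestions: list of (action, w_value) pairs received from
                        one-hop neighbours
    coop_coefficient:   weight in [0, 1] applied to remote W-values;
                        may be predefined or learnt
    """
    candidates = list(local_nominations)
    # A neighbour's suggestion competes only after scaling: it wins when its
    # scaled importance exceeds that of the locally-best action.
    candidates += [(action, coop_coefficient * w)
                   for action, w in remote_suggestions]
    return max(candidates, key=lambda pair: pair[1])[0]
```

With a cooperation coefficient of 0.5, a remote suggestion of importance 8.0 loses to a local nomination of importance 5.0; with a coefficient of 1.0 the same suggestion wins, matching the deference behaviour the abstract describes.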
We have evaluated the DWL algorithm in a simulation of an urban traffic control (UTC) system,
a canonical example of the class of decentralized autonomic systems that we are addressing. We show
that DWL is a suitable technique for optimization in UTC, as it outperforms the currently most widely
deployed fixed-time and simple adaptive controllers in our simulation. Collaborative DWL
scenarios can outperform non-collaborative scenarios, depending on the level of collaboration, when
the cooperation coefficient is fixed. When DWL agents learn the level of cooperation individually,
the learnt scenarios outperform all non-collaborative scenarios and either outperform or perform as
well as scenarios with predefined cooperation coefficients. We also show that addressing two policies simultaneously
using DWL can, based on policy dependencies, improve the performance of both policies
over corresponding single-policy implementations. These results hold for a variety of traffic loads and
patterns and therefore show DWL’s wide applicability in UTC, as well as suggesting that DWL might
be suitable for optimization in other large-scale autonomic systems with similar characteristics.
Author: Dusparic, Ivana
Advisor: Cahill, Vinny
Qualification name: Doctor of Philosophy (Ph.D.)
Publisher: Trinity College (Dublin, Ireland). School of Computer Science & Statistics
Type of material: thesis
Availability: Full text available