Community Detection

In biology, sociology, engineering and beyond, many systems are represented and studied as networks (e.g. protein networks, social networks, power grids, transport networks, the internet, etc). In the past decade the field of community detection attracted a lot of interest considering community structures as important features of real-world networks. Given a network of any kind, looking for communities refers to finding groups of nodes that are more densely connected internally than with the rest of the network. The concept considers the inhomogeneity within the connections between nodes in order to derive a partitioning of the network.

As opposed to clustering methods which commonly involve a given number of clusters, communities within networks are usually unknown, can be of unequal size and density and often have hierarchies. Finding community sets covering a network can give information about its underlying structure and its functioning. It can also be used as a more compact representation of the network, for instance for visualisations. (Often the term partition is used to describe what is called here a community set. When considering crisp boundaries, a community set covering the whole network is a partition. However as communities can also overlap, the term partition is then incorrect. For consistency over all cases, the term community set is preferred and used here.)

Detecting community structure in networks can be split into two subtasks: how to explore a graph, and how to measure the quality of a community set. Partitioning graphs is an NP-hard task and hence finding the optimum community partition is at least as hard. (The number of community sets is at least as large as the number of partitions. It is the same with crisp communities and larger when overlapping.) Heuristics based algorithms have thus been devised to reduce the complexity while still providing acceptable solutions.

Considering the sizes of some real-world network (e.g. web, protein networks) much effort is put into finding efficient algorithms able to deal with larger and larger networks. These algorithm often use the quality measure called modularity (Newman, 2004). Yet this measure has drawbacks and alternative measures have been suggested. Also many community detection algorithms return community sets with crisp boundaries (i.e. no overlapping) whereas networks communities can also be overlapping. Finally communities may be uncovered at various scales (or resolutions) in a given network (e.g. micro and macro scales). Some quality measures offer the possibility to explore various scales, others do not. Therefore the problem of community analysis in networks can be tackled in various ways, addressing some of these criteria, hardly all at once.

My work investigated the use of the recently introduced partition quality stability (Delvenne et al., 2010) as an optimisation criterion to enable multi-scale community detection. Stability considers a network as a Markov chain and exploits random walks on the network to analyse community partitions. The length of the random walk acts as a scale parameter.

I also created and developed a method for the fast detection of communities across multiple scales on large networks. The method is applied to global criteria such as stability optimisation, Reichardt and Bornholdt's criterion, Arenas et al.'s criterion, Ronhovde and Nussinov's criterion, and also to local criteria such as Lancichinetti et al.’s criterion and Huang et al.'s criterion (see the publications from the authors for information about their respective criteria). This results in two algorithms, one designed for global criteria and the other for local criteria. The implementation has been done in C++ and has been carefully designed to provide efficiency and scalability to very large networks. Another implementation has also been done in Matlab.

The publications about this work are available on the publications page. All the C++ and Matlab code is available for download here.