# Similarity Based Networks

**SIMILARITY BASED NETWORKS SYLLABUS**

**Level:**Doctoral

**Course Status:**Elective

**Full description: **

**Background and overall aim of the course.**

Network theory describes complex systems by focusing on the structure and dynamics of networks of elements whose links are originated by direct relationships among the elements of a set. For example, a network of actors is built by connecting all pairs of actors having played together in at least one movie. In the present course we will present a parallel approach using networks as interpretative tools of complex systems of social and economic nature. This will be achieved by considering the properties and the tools used in similarity based networks. A similarity based network is a network obtained starting from a similarity measure characterizing a set of elements belonging to a complex systems. These elements can be as different as, for example, time series of asset returns traded in a financial market, answers of political candidates to a survey, crimes committed by a criminal suspect, etc . The basic similarity measure used is a correlation measure although other similarity measures are possible. Starting from the similarity measure it is possible to obtain classes of similarity networks that can be investigated by using tools and concepts of networks theory. Similarity based networks turns out to be informative about the considered complex systems providing unsupervised approaches to the filtering of relevant information present in them.

The course presents how to obtain, validate and interpret the simplest similarity based networks and discusses their ability in filtering relevant information present in social and economic complex systems. Along with the introduction of the concepts underlying similarity based networks, the course also presents the algorithms that students can actively use to apply the concepts to complex systems of their interest.

**The learning outcomes of the course.**

By successfully completing the course the students will be able to:

- Understand the concept of similarity based networks;

- Analyze and model the hierarchical organization of social and economic complex systems;

- Evaluate the statistical robustness of information present in the multivariate dynamics of social and economic complex systems;

- Use or write dedicated programs for obtaining and analyzing similarity based networks.

**More detailed presentation of course contents.**

Week by week breakdown.

Week Lecture Comments

1 Introduction

Standard multivariate approaches

2 Hierarchical clustering

Minimum spanning trees

3 Planar graphs

* ** *Software description and numerical tools First assignment

4 Reliability of empirical estimations

Information theory

5 Random matrix theory

Community detection

6 Bipartite graphs and validated networks

Attributes of a cluster Project assignment

**Breakdown by topic:**

Detailed description:

*Introduction*

Overview of the course. Similarity measures. The correlation matrix as a basic similarity measure. Similarity measure in the empirical modeling of complex systems.

*Standard multivariate approaches*

Classical multivariate approach. The principal component analysis. Explanatory factors. Empirical estimation of similarity measures.

*Hierarchical clustering*

Basic concepts of hierarchical clustering. Classic methods. Single linkage, Average linkage, Complete linkage. Hierarchical trees. The concept of filtering of a similarity matrix. Hierarchical trees in economic and social systems.

*Minimum spanning trees*

The most basic similarity network. Minimum spanning tree in graph theory. The information content of a minimum spanning tree. Minimum spanning trees of economic and social systems.

*Planar graphs*

Similarity networks with loops and cliques. The planar graph embedding the minimum spanning tree. Capturing layers of information present in a similarity matrix.

*Software description and numerical tools*

Description and discussion of available software, Numerical tools used to obtain similarity based networks and to visualize them.

*Reliability of empirical estimations*

Estimation errors. Role of the network topology and of the correlation strength in the estimation errors. Bootstrap. Bootstrapping evaluation of the reliability of empirical estimations.

*Random matrix theory*

Basic concepts of Random Matrix Theory. Spectral density of eigenvalues of a correlation matrix. The spectral density of a random Gaussian multivariate process. Informative eigenvalues. The estimation of the most statistically reliable part of the correlation matrix.

*Information theory*

Evaluating a distance between the empirical correlation matrix and the correlation matrix of an underlying model. Evaluating a distance between two distinct empirical estimation of a correlation matrix. The Kullback Leibler distance.

*Community detection*

Community (cluster) detection in similarity based networks. The meaning of different approaches in different settings. Aspects common to ordinary networks and aspects which are specific to similarity based networks.

*Bipartite graphs and validated networks*

Bipartite networks. Projected networks. Heterogeneity in the networks of complex systems. Co-occurrence networks, categorical variables, multiple hypothesis test correction.* *

*Attributes of a cluster *

Characterizing a cluster of elements belonging to a system. Over-expression and under-expression of attributes.

**Suggested reading:**

K.V. Mardia, J.T. Kent and J.M. Bibby, Multivariate analysis, Academic press, 1979.

A.K. Jain, M.N. Murty, and P.J. Flynn, Data clustering: a review, ACM computing surveys, 1999.

M. Tumminello, F. Lillo, R.N. Mantegna, Correlation, hierarchies, and networks in financial markets, Journal of Economic Behavior & Organization, 75, 40-58 (2010)

The pdf files of the lectures will be made available.

Teaching format: The bulk of the course will be provided in lectures.

There will be one assignment. An analysis of a chosen complex system has to be prepared. While the task has to be performed independently, students are encouraged to form study groups. There will be a final project.

**Assessment**:

- Assignment (40%)

- Final project (60%)

**Such further items as the course website (e-learning site), assessment deadlines, office hours, contact details etc.**

Contact: Prof. Rosario Nunzio Mantegna (MantegnaR@ceu.hu)

Consultation by arrangement

Further details will be given during the course.