# A Basic Introduction to Graph Neural Networks

Graph neural networks has gain much popularity recently. Many domains can be readily modeled as graphs, such as social networks, molecular graph structures, and recommender system. In this article, I will briefly introduce the graph neural networks.

**Outline**

- what is graph network
- the general architecture of graph neural networks

**1. What is graph network**

Graph network is a structure that composed of **vertices** and **edges**.

Usually vertices are instances. People in social networks, products in recommender system, and institutions in organizations can all be vertices. In order to embed more information in the graph, the vertices can include many attributes as well. For example, the vertex in social network not only represents a specific person, but also have attributes about the person, such as age, sex, and height.

Edges which connect different vertices represent the relations between the start vertex and the end vertex. In social networks, people who are friends are connected by the edge. A edge connecting a customer and a product in recommender system represents the user bought the product.

A graph is formally defined as

For each edge, it connects two vertices. A adjacent matrix A is used to represent the relation among vertices. If the v1 and v2 are connected in the graph, the value of A(1,2) is 1 otherwise 0. A graph of figure 1 can be represented by an adjacent matrix of figure 2. The example shown below is an undirected graph, therefore the corresponding adjacency matrix is symmetric diagonally. A graph can also be directed whose adjacency matrix will be asymmetric.

**2.The general architecture of graph neural networks**

The goal of graph neural networks is to learn the representation of nodes in the graph such that the representation of nodes can save original information as much as possible.

In [1], the authors decompose graph neural networks into four components. All the variants of graph neural networks can be described in this unified framework.

**A pairwise proximity function**is defined over the graph, which measures how closely connected two nodes are in the graph. For example, in recommender system, the similarity between a product and a user who bought the product will be 1, otherwise 0. The result of this pairwise proximity function is treated as the ground truth.**An encoder function**, ENC, that generates the node embeddings with d dimensions. This function contains a number of trainable parameters that are optimized during the training phase.**A decoder function**, DEC, which reconstructs pairwise proximity values from the generated embeddings, i.e., cosine similarity.**A loss function**, L, which determines how the quality of the pairwise reconstructions is evaluated to train the model, i.e., how DEC(zi, zj) is compared to the true sG(vi, vj) values.

The pairwise proximity function and decoder function are determined by ourselves. The pairwise proximity function measures the similarity of nodes in original graph. In practice, random walk is usually used to calculate the probability of two nodes showing in the same walk route. The higher the probability, the closer two nodes are.

The encoder function is the main part of this framework. In direct encoding methods, The encoder function accepts a node as input, the node can be present in index number or in the form of one-hot encoding, and outputs a vector with dimension d as the node’s embedding:

In contrast, graph neural networks incorporate graph structure into the encoder function: instead of generating a fixed size vector based on its own id, graph neural networks utilize the node’s local neighborhood and generate the node’s embedding by incorporating its neighbors’ information. The nodes can be initialized with its attributes if any, otherwise the graph statistics(e.g., node degrees) or the node id. These methods are often called convolutional because they represent a node as a function of its surrounding neighborhood, in a manner similar to the receptive field of a center-surround convolutional kernel in computer vision.

The basic algorithm for this encoding phase is shown below:

- the node embeddings are initialized to be equal to the input node attributes at the beginning
- Then at each iteration, nodes aggregate the embeddings of their neighbors, using an aggregation function that operates over sets of vectors.
- After the aggregation, every node is assigned a new embedding, equal to its aggregated neighborhood vector combined with its previous embedding from the last iteration.
- The process repeats K times. The more times the process iterates, more information from further reaches of the graph is contained in the nodes.

The decoder function evaluates how close two nodes are with their embeddings generated by the previous encoder function. The common adopted approaches for decoder function are *Multi Layer Perceptron(MLP) *and *cosine similarity*. A single number indicating the score of similarity is produced by the decoder function. The process of decoder function can be seen as the process of similarity reconstruction of two nodes in the original graph:

The objective of this process is to minimize the error of the reconstruction, which can be measured by the loss function L:

The parameters of the encoder are optimized by backpropagation to encode relevant information in the nodes’ embedding vectors for downstream tasks.

**References:**

[1] Representation Learning on Graphs: Methods and Applications