Schedule rides in advance

Meta-Graph: Few-Shot Link Prediction Using Meta-Learning

29 May 2020 / Global

Share
Facebook
Twitter
Linkedin
Envelope
This article is based on the paper “Meta-Graph: Few Shot Link Prediction via Meta Learning” by Joey Bose, Ankit Jain, Piero Molino, and William L. Hamilton
Many real-world data sets are structured as graphs, and as such, machine learning on graphs has been an active area of research in the academic community for many years. One popular machine learning task on graphs is link prediction, which involves the prediction of missing relationships/edges between the nodes in the graph. For instance, in a social network we may use link prediction to power a friendship recommendation system, or in the case of biological network data, we might use link prediction to infer possible relationships between drugs, proteins, and diseases. 2,10 However, despite its popularity, previous work on link prediction generally focuses only on one particular problem setting: it generally assumes that link prediction is to be performed on a single large graph and that this graph is relatively complete, i.e., that at least 50 percent of true edges are observed during training. 3,9
In this work, we consider the more challenging setting of few-shot link prediction, where the goal is to perform link prediction on multiple graphs that contain only a small fraction of their true, underlying edges. This task is inspired by applications where we have access to multiple graphs from a single domain but where each of these individual graphs contains only a small fraction of the true, underlying edges. For example, in the biological setting, high-throughput interactomics offers the possibility to estimate thousands of biological interaction networks from different tissues, cell types, and organisms; however, these estimated relationships can be noisy and sparse, and we need learning algorithms that can leverage information across these multiple graphs in order to overcome this sparsity.16 Similarly, in the e-commerce and social network settings, link prediction can often have a large impact in cases where we must quickly make predictions on sparsely-estimated graphs, such as when a service has been recently deployed to a new locale. In other words, link prediction for a new sparse graph can benefit from transferring knowledge from other, possibly more dense, graphs assuming there is exploitable shared structure.
We introduce both a new framework called Meta-Graph, used for few shot link prediction, and a corresponding series of benchmarks for this task. We adapt the classical gradient-based meta learning formulation for few-shot classification to the graph domain.5,6 Specifically, we consider a distribution over graphs as the distribution over tasks from which a global set of parameters are learnt, and we deploy this strategy to train graph neural networks (GNNs) that are capable of few-shot link prediction. To further bootstrap fast adaptation to new graphs we also introduce a graph signature function, which learns how to map the structure of an input graph to an effective initialization point for a GNN link prediction model. We experimentally validate our approach on three link prediction benchmarks. We find that our MetaGraph approach not only achieves fast adaptation but also converges to a better overall solution in many experimental settings, with an average improvement of 5.3 percent in AUC at convergence over non-meta learning baselines.
Few-shot link prediction setup
Given a distribution over graphs p(G) from which we can sample a training graph Gi = (Vi, Ei, Xi) where Vi is a set of nodes, Ei is a set of edges, and Xi is a matrix of real-valued node attributes. We assume that each of these sample graphs, Gi, is a simple graph, meaning it contains a single type of relation and no self loops. We further assume that for each graph Gi we have access to only small number of training edges Etrain ⊂ E (with |E train| << |E|) during training. Finally, we assume p(G) to be defined over a set of related graphs, be they drawn from a common domain or application setting.
Our goal is to learn a global or meta link prediction model from a set of sampled training graphs Gi ∼ p(G) where i = 1…n. With this meta model in place, we can subsequently quickly learn a local link prediction model from a small subset of edges within newly sampled graph G∗ ∼ p(G). More specifically, we want to find a global set of parameters θ, which can generate an effective parameter initialization, φ∗ , for a local link prediction model on graph G∗.
Note that this is quite different from the standard link prediction setting where the goal is to learn from a single graph and not a distribution of graphs. It also differs from standard meta-learning for few-shot classification which generally assumes tasks where individual predictions are Independent and identically distributed, as opposed to training edges in the graph which are dependent on each other.13 
Proposed approach: Meta-Graph
Our approach, Meta-Graph, leverages graph neural networks (GNNs). In principle, it can be combined with a wide variety of link prediction approaches based on GNNs, but we adopted a specific GNN, variational graph autoencoders (VGAEs), as our base link prediction framework9. 
The key idea behind Meta-Graph is that we use gradient-based meta-learning to optimize shared global parameters θ, used to initialize the parameters of the VGAE link prediction model. At the same time, the model also learns a graph signature function, a vector representation of a graph that we use to modulate the parameters of the VGAE model. If the model ever observed a similar graph to the one it’s currently examining, it will be able to modulate model parameters accordingly. This helps the model learn effective parameters utilizing just a few steps of gradient descent. 
If we’re given a sampled training graph Gi  , we initialize the parameters for a VGAE link prediction model using a combination of two learned components: 
Global parameters θ, that are used to initialize all the parameters φ of the VGAE model. The φ parameters are optimized with n steps of gradient descent, while the global parameters, θ, are optimized via second-order gradient descent to provide an effective initialization point for any graph sampled from the distribution p(G).
A graph signature si = ψ(Gi) that is used to modulate the activations of the VGAE model. The graph signature function is obtained through another GNN. Like the global parameters θ, the graph signature model ψ is optimized via second-order gradient descent to encode similar parameter initializations for the local link prediction model for graphs that are similar for the purposes of meta-learning. For more details about our graph signature function, please refer to our paper.
We detail the overall Meta-Graph architecture in Figure 1, below:
Figure 1: Our Meta-Graph architecture incorporates a graph signature which renders input to be processed by two GCN encoders. From there, the training graphs are decoded into a final product that provides parameter optimizations for future model training. 
Overall, the algorithm for training can be summarized as follows:
Sample a batch of training graphs
Initialize VGAE link prediction models for these training graphs using global parameters and signature function
Run k steps of gradient descent to optimize each of these VGAE models
Use second order gradient descent to update the global parameters and signature function based on a held-out validation set of edges
Our full paper details several additional variants of Meta-Graph, which differ in terms of how the output of the graph signature function is used to modulate the activations of the VGAE inference models.
Experiment setup
To test how Meta-Graph might work in a real-world setting, we designed three novel benchmarks for few-shot link prediction. All of these benchmarks contain a set of graphs drawn from a common domain. In all settings, we use 80 percent of these graphs for training and 10 percent as validation graphs, where these training and validation graphs are used to optimize the global model parameters (for Meta-Graph) or pre-train weights (for various baseline approaches). The remaining 10 percent served as test graphs, and our goal was to train a model on these test graphs to achieve high link prediction accuracy. 
In this few-shot link prediction setting, there are train/val/test splits at both the edge level and the graph level. We used the training edges to predict the likelihood of the test edges on a per-graph basis, but we also trained the model on multiple graphs simultaneously with the goal of facilitating fast adaptation to new graphs via the global model parameters.
Two of our benchmarks are derived from standard multi-graph data sets from protein-protein interaction (PPI) networks and 3D point cloud data (FirstMM-DB).10,11 The third is a novel multi-graph data set based upon the AMINER citation data, where each node corresponds to a paper and links represent citations12. For all data sets, we perform link prediction by training on a small subset (i.e., a percentage) of the edges and then attempting to predict the unseen edges (with 20percent of the held-out edges used for validation).
Several baselines correspond to modifications or ablations of Meta-Graph, including the straightforward adaptation of Model-Agnostic Meta-Learning (MAML), a fine-tuned baseline where we pre-train a VGAE on the training graphs observed in a sequential order and finetune on the test graphs (termed Finetune). We also consider a VGAE trained individually on each test graph (termed No Finetune). We use two other standard baselines for link prediction tasks, namely DeepWalk and Adamic-Adar to compare against to make sure Meta-Graph  provides substantial improvement. 14, 15
Results
We compared Meta-Graph and baseline models in two settings to understand how well and how fast models can adapt to new unseen test graphs. A convergence setting where we trained the models to convergence, and a fast adaptation setting where we adapted the models after performing only 5 gradient updates. In both settings we train using 10 percent, 20 percent, and 30 percent of the edges of the test graphs and test on the test edges of the test graph. We measure performance by computing the link prediction AUC.
For the convergence setting. we find that Meta-Graph achieves the highest average AUC with a relative improvement of 4.8 percent compared to the MAML approach and an improvement of 5.3 percent compared to the Finetune baseline. The performance of Meta-Graph is stronger when only using 10 percent of edges, demonstrating the approach’s potential when working with really limited data. 
In the fast adaptation setting, we again find that Meta-Graph outperforms all the baselines in all but one setting, with an average relative improvement of 9.4 percent compared to MAML and 8 percent compared to the fine tuned baseline. In other words, Meta-Graph can not only learn from limited data, but can also take in new data quickly, using only a small number of gradient steps.
The results are detailed in Figure 2 and 3, below:
Figure 2. The convergence AUC results for the different fractions of training edges across data sets. With only 10 percent of edges, Meta-Graph outperforms all the other methods across all training datasets. As the percentage of training edges increases, performance of Meta-Graph is better in all but one setting of 30 percent training edges in FirstMM DB. 
Figure 3. The AUC results with only five gradient updates for different fractions of training edges across differing data sets. Meta-Graph outperforms all the other methods across all training datasets except in one setting of using 20 percent of training edges in Ego-Aminer. 
Moving forward
In this research, we introduce Meta-Graph, a solution to the problem of few-shot link prediction, in which the goal is to accurately train ML models that quickly adapt to new sparse graph data.  Empirically, we observed substantial gains using Meta-Graph compared to strong baselines on three distinct few-shot link prediction benchmarks. 
Overall, this work is applicable wherever researchers have access to multiple graphs from a single domain but where each of these individual graphs contains only a small fraction of the true, underlying edges. For example, in the biological setting, high-throughput interactomics offers the possibility to estimate thousands of biological interaction networks from different tissues, cell types, and organisms; however, these estimated relationships can be noisy and sparse, and we need learning algorithms that can leverage information across these multiple graphs in order to overcome this sparsity. Similarly, as mentioned previously, in the e-commerce and social network settings, link prediction can often have a large impact in cases where we must quickly make predictions on sparsely-estimated graphs, such as when a service has been recently deployed to a new locale. In other words, link prediction for a new sparse graph can benefit from transferring knowledge from other, possibly more dense, graphs assuming there is exploitable shared structure.
Acknowledgments
We are grateful for the contributions of Jan Pedersen, Jason Yosinski, Sumanth Dathathri, Andrea Madotto,Thang Bui, Maxime Wabartha, Nadeem Ward, Sebastien Lachapelle, and Zhaocheng Zhu to this research.
The icons in the header are obtained from icons8.com.
References
David Liben-Nowell and Jon Kleinberg. The link prediction problem for social networks. In Proceedings of the Twelfth International Conference on Information and Knowledge Management, CIKM ’03, pp. 556–559, New York, NY, USA, 2003. ACM. ISBN 1-58113-723-0. doi: 10.1145/956863.956972. 
Luca Maria Aiello, Alain Barrat, Rossano Schifanella, Ciro Cattuto, Benjamin Markines, and Filippo Menczer. Friendship prediction and homophily in social media. ACM Trans. Web, 6 (2):9:1–9:33, June 2012. ISSN 1559-1131. doi: 10.1145/2180861.2180866.
Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 855–864. ACM, 2016
Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016a
Erik G Miller, Nicholas E Matsakis, and Paul A Viola. Learning from one example through shared densities on transforms. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No. PR00662), volume 1, pp. 464–471. IEEE, 2000
Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. Siamese neural networks for one-shot image recognition. In ICML deep learning workshop, volume 2, 2015
Jurgen Schmidhuber. Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-… hook. PhD thesis, Technische Universitat München, 1987
Sebastian Thrun and Lorien Pratt. Learning to learn. Springer Science & Business Media, 2012
Thomas N Kipf and Max Welling. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016b
Marinka Zitnik and Jure Leskovec. Predicting multicellular function through multi-layer tissue networks. Bioinformatics, 33(14):i190–i198, 2017
Marion Neumann, Plinio Moreno, Laura Antanas, Roman Garnett, and Kristian Kersting. Graph kernels for object category prediction in task-dependent robot grasping. In Online Proceedings of the Eleventh Workshop on Mining and Learning with Graphs, pp. 0–6, 2013
Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. Arnetminer: extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 990–998. ACM, 2008
Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine LearningVolume 70, pp. 1126–1135. JMLR. org, 2017
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 701–710. ACM, 2014
Lada A Adamic and Eytan Adar. Friends and neighbors on the web. Social networks, 25(3):211–230, 2003.
Miriam Barrios-Rodiles, Kevin R Brown, Barish Ozdamar, Rohit Bose, Zhong Liu, Robert S Donovan, Fukiko Shinjo, Yongmei Liu, Joanna Dembowy, Ian W Taylor, et al. High-throughput mapping of a dynamic signaling network in mammalian cells. Science, 307(5715):1621–1625, 2005.

Ankit Jain

Ankit Jain is a former research scientist of Uber AI.

Piero Molino

Piero is a Staff Research Scientist in the Hazy research group at Stanford University. He is a former founding member of Uber AI where he created Ludwig, worked on applied projects (COTA, Graph Learning for Uber Eats, Uber’s Dialogue System) and published research on NLP, Dialogue, Visualization, Graph Learning, Reinforcement Learning and Computer Vision.

Joey Bose

Joey Bose is a PhD candidate at McGill University and the Montreal Institute for Learning algorithms working on broad problems involving graph representation learning, generative modeling of structured data and adversarial learning. He was a former intern at Uber AI.

William Hamilton

William L. Hamilton is an Assistant Professor of Computer Science at McGill University, a member of the Mila - Quebec Artificial Intelligence Institute, and a CIFAR Canada AI Chair. His research focuses on graph representation learning and natural language processing. Will’s research has received several awards—-including a best paper award from the National Academy of Sciences—and his work has been featured in outlets such as Wired, The New York Times, and the BBC.

Posted by Ankit Jain, Piero Molino, Joey Bose, William Hamilton

Category: