This article is based on the paper “Meta-Graph: Few Shot Link Prediction via Meta Learning” by Joey Bose, Ankit Jain, Piero Molino, and William L. Hamilton.
Many real-world data sets are structured as graphs, and as such, machine learning on graphs has been an active area of research in the academic community for many years. One popular machine learning task on graphs is link prediction, which involves predicting missing relationships/edges between the nodes in a graph. For instance, in a social network we may use link prediction to power a friendship recommendation system, or in the case of biological network data, we might use link prediction to infer possible relationships between drugs, proteins, and diseases.^{2,10} However, despite its popularity, previous work on link prediction generally focuses on one particular problem setting: it assumes that link prediction is to be performed on a single large graph and that this graph is relatively complete, i.e., that at least 50 percent of the true edges are observed during training.^{3,9}
In this work, we consider the more challenging setting of few-shot link prediction, where the goal is to perform link prediction on multiple graphs that contain only a small fraction of their true, underlying edges. This task is inspired by applications where we have access to multiple graphs from a single domain but where each of these individual graphs contains only a small fraction of the true, underlying edges. For example, in the biological setting, high-throughput interactomics offers the possibility to estimate thousands of biological interaction networks from different tissues, cell types, and organisms; however, these estimated relationships can be noisy and sparse, and we need learning algorithms that can leverage information across these multiple graphs in order to overcome this sparsity.^{16} Similarly, in the e-commerce and social network settings, link prediction can often have a large impact in cases where we must quickly make predictions on sparsely estimated graphs, such as when a service has been recently deployed to a new locale. In other words, link prediction for a new sparse graph can benefit from transferring knowledge from other, possibly more dense, graphs, assuming there is exploitable shared structure.
We introduce both a new framework called Meta-Graph for few-shot link prediction and a corresponding series of benchmarks for this task. We adapt the classical gradient-based meta-learning formulation for few-shot classification to the graph domain.^{5,6} Specifically, we consider a distribution over graphs as the distribution over tasks from which a global set of parameters is learned, and we deploy this strategy to train graph neural networks (GNNs) that are capable of few-shot link prediction. To further bootstrap fast adaptation to new graphs, we also introduce a graph signature function, which learns how to map the structure of an input graph to an effective initialization point for a GNN link prediction model. We experimentally validate our approach on three link prediction benchmarks. We find that our Meta-Graph approach not only achieves fast adaptation but also converges to a better overall solution in many experimental settings, with an average improvement of 5.3 percent in AUC at convergence over non-meta-learning baselines.
Few-shot link prediction setup
We assume a distribution over graphs p(G) from which we can sample a training graph G_{i} = (V_{i}, E_{i}, X_{i}), where V_{i} is a set of nodes, E_{i} is a set of edges, and X_{i} is a matrix of real-valued node attributes. We assume that each of these sampled graphs, G_{i}, is a simple graph, meaning it contains a single type of relation and no self-loops. We further assume that for each graph G_{i} we have access to only a small number of training edges E_{train} ⊂ E_{i} (with |E_{train}| << |E_{i}|) during training. Finally, we assume p(G) to be defined over a set of related graphs, be they drawn from a common domain or application setting.
Our goal is to learn a global, or meta, link prediction model from a set of sampled training graphs G_{i} ∼ p(G), i = 1…n. With this meta model in place, we can subsequently quickly learn a local link prediction model from a small subset of edges within a newly sampled graph G* ∼ p(G). More specifically, we want to find a global set of parameters θ that can generate an effective parameter initialization, φ*, for a local link prediction model on graph G*.
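To make this setup concrete, here is a minimal sketch of how the observed edges of one sampled graph might be organized into a training episode. The names (`Graph`, `sample_episode`) and the split fraction are our own illustrative choices, not details from the paper:

```python
import random
from dataclasses import dataclass

import numpy as np

@dataclass
class Graph:
    """A simple graph sampled from p(G): nodes, edges, and node features."""
    num_nodes: int
    edges: list           # (u, v) pairs; a single relation type, no self-loops
    features: np.ndarray  # |V| x d matrix of real-valued node attributes

def sample_episode(graph, train_frac=0.1, seed=0):
    """Split one graph's edges into a small observed training set E_train
    (with |E_train| << |E|) and a held-out set used for evaluation."""
    rng = random.Random(seed)
    edges = list(graph.edges)
    rng.shuffle(edges)
    n_train = int(train_frac * len(edges))
    return edges[:n_train], edges[n_train:]
```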
Note that this is quite different from the standard link prediction setting, where the goal is to learn from a single graph rather than a distribution of graphs. It also differs from standard meta-learning for few-shot classification, which generally assumes tasks whose individual predictions are independent and identically distributed, whereas the training edges within a graph are dependent on each other.^{13}
Proposed approach: Meta-Graph
Our approach, Meta-Graph, leverages graph neural networks (GNNs). In principle, it can be combined with a wide variety of GNN-based link prediction approaches, but we adopted a specific GNN, the variational graph autoencoder (VGAE), as our base link prediction framework.^{9}
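As a refresher, a VGAE encodes each node into a Gaussian latent variable using graph convolutions and scores candidate edges with an inner-product decoder.^{9} The minimal PyTorch sketch below follows that formulation; the class names and layer sizes are illustrative and are not the exact configuration used in Meta-Graph:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_hat @ H @ W), where A_hat is the
    symmetrically normalized adjacency matrix with self-loops."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, a_hat, h):
        return torch.relu(a_hat @ self.linear(h))

class VGAE(nn.Module):
    """Variational graph autoencoder with an inner-product decoder."""
    def __init__(self, in_dim, hidden_dim=32, latent_dim=16):
        super().__init__()
        self.base = GCNLayer(in_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim, bias=False)
        self.logvar = nn.Linear(hidden_dim, latent_dim, bias=False)

    def encode(self, a_hat, x):
        h = self.base(a_hat, x)
        mu = a_hat @ self.mu(h)          # mean of q(z | X, A)
        logvar = a_hat @ self.logvar(h)  # log-variance of q(z | X, A)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return z, mu, logvar

    def decode(self, z, u, v):
        # Probability of an edge (u, v) is sigmoid(z_u . z_v)
        return torch.sigmoid((z[u] * z[v]).sum(dim=-1))
```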
The key idea behind Meta-Graph is that we use gradient-based meta-learning to optimize a set of shared global parameters θ, which are used to initialize the parameters of the VGAE link prediction model. At the same time, the model also learns a graph signature function, which produces a vector representation of a graph that we use to modulate the parameters of the VGAE model. If the model has previously observed a graph similar to the one it is currently examining, it can modulate the model parameters accordingly. This helps the model learn effective parameters using just a few steps of gradient descent.
Given a sampled training graph G_{i}, we initialize the parameters for a VGAE link prediction model using a combination of two learned components:
Global parameters θ, which are used to initialize all the parameters φ of the VGAE model. The φ parameters are optimized with k steps of gradient descent, while the global parameters, θ, are optimized via second-order gradient descent to provide an effective initialization point for any graph sampled from the distribution p(G).
A graph signature s_{i} = ψ(G_{i}), which is used to modulate the activations of the VGAE model. The graph signature function is implemented as another GNN. Like the global parameters θ, the graph signature model ψ is optimized via second-order gradient descent, so that graphs that are similar for the purposes of meta-learning are mapped to similar parameter initializations for the local link prediction model. For more details about our graph signature function, please refer to our paper; one possible form of this modulation is sketched just after this list.
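Our full paper explores several variants of this modulation; the sketch below shows just one plausible realization under our own assumptions, a feature-wise gating in which the signature vector s_{i} is projected to a per-dimension gate that scales a GCN layer's activations:

```python
import torch
import torch.nn as nn

class SignatureModulatedGCN(nn.Module):
    """GCN layer whose activations are gated by a graph signature s_i.
    This gating form is an illustrative choice, not necessarily the
    variant used in the Meta-Graph paper."""
    def __init__(self, in_dim, out_dim, sig_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)
        self.gate = nn.Linear(sig_dim, out_dim)

    def forward(self, a_hat, h, signature):
        gamma = torch.sigmoid(self.gate(signature))  # per-dimension gate in (0, 1)
        return torch.relu(a_hat @ self.linear(h)) * gamma

def graph_signature(a_hat, x, encoder):
    """psi(G_i): encode the graph with another GNN (e.g., the GCNLayer
    above) and mean-pool node embeddings into a single signature vector."""
    node_embeddings = encoder(a_hat, x)
    return node_embeddings.mean(dim=0)
```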
We detail the overall Meta-Graph architecture in Figure 1, below:
Overall, the algorithm for training can be summarized as follows (a code sketch of the full loop appears after the list):

 Sample a batch of training graphs
 Initialize VGAE link prediction models for these training graphs using the global parameters and the signature function
 Run k steps of gradient descent to optimize each of these VGAE models
 Use second-order gradient descent to update the global parameters and the signature function based on a held-out validation set of edges
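In PyTorch, a deliberately tiny version of this loop might look like the following sketch. To keep it self-contained, we use a stand-in link prediction loss (a signature-modulated inner-product scorer over positive edges only) in place of the full VGAE objective; all names here are our own, and a real implementation would add negative sampling and the KL term:

```python
import torch
import torch.nn as nn

def link_prediction_loss(params, edges, features, signature):
    """Stand-in for the VGAE objective, kept deliberately tiny: embed nodes
    with a single weight matrix, modulate by the signature, and score edges
    by inner product. The real model adds negative samples and a KL term."""
    (w,) = params
    z = (features @ w) * torch.sigmoid(signature)  # signature-modulated embeddings
    u, v = edges
    scores = torch.sigmoid((z[u] * z[v]).sum(dim=-1))
    return -torch.log(scores + 1e-9).mean()

def adapt(theta, train_edges, features, signature, k, inner_lr):
    """Inner loop: initialize phi from theta, then take k gradient steps
    on the graph's few observed training edges."""
    phi = [p.clone() for p in theta]  # phi starts at the global init theta
    for _ in range(k):
        loss = link_prediction_loss(phi, train_edges, features, signature)
        grads = torch.autograd.grad(loss, phi, create_graph=True)
        phi = [p - inner_lr * g for p, g in zip(phi, grads)]
    return phi

def meta_train_step(theta, psi, graphs, k=5, inner_lr=0.1, outer_lr=0.01):
    """Outer loop: second-order update of theta and the signature net psi
    using each graph's held-out validation edges."""
    meta_loss = 0.0
    for features, train_edges, val_edges in graphs:   # 1. batch of graphs
        signature = psi(features.mean(dim=0))         # 2. crude psi(G_i)
        phi = adapt(theta, train_edges, features, signature, k, inner_lr)
        meta_loss = meta_loss + link_prediction_loss(phi, val_edges,
                                                     features, signature)
    outer_params = theta + list(psi.parameters())
    grads = torch.autograd.grad(meta_loss, outer_params)  # 2nd-order update
    with torch.no_grad():
        for p, g in zip(outer_params, grads):
            p -= outer_lr * g

# Toy usage: two random "graphs" with 16-dim features and a few edges.
d, latent = 16, 8
theta = [torch.randn(d, latent, requires_grad=True)]
psi = nn.Linear(d, latent)
graphs = []
for _ in range(2):
    features = torch.randn(10, d)
    train_edges = (torch.tensor([0, 1, 2]), torch.tensor([3, 4, 5]))
    val_edges = (torch.tensor([6, 7]), torch.tensor([8, 9]))
    graphs.append((features, train_edges, val_edges))
meta_train_step(theta, psi, graphs)
```

The important structural point is the `create_graph=True` in the inner loop: it keeps the adaptation steps differentiable, so the outer update on θ and ψ is genuinely second-order.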
Our full paper details several additional variants of Meta-Graph, which differ in terms of how the output of the graph signature function is used to modulate the activations of the VGAE inference models.
Experiment setup
To test how Meta-Graph might work in a real-world setting, we designed three novel benchmarks for few-shot link prediction. All of these benchmarks contain a set of graphs drawn from a common domain. In all settings, we use 80 percent of these graphs for training and 10 percent as validation graphs, where the training and validation graphs are used to optimize the global model parameters (for Meta-Graph) or to pre-train weights (for various baseline approaches). The remaining 10 percent serve as test graphs, and our goal is to train a model on these test graphs to achieve high link prediction accuracy.
In this few-shot link prediction setting, there are train/validation/test splits at both the edge level and the graph level: we use the training edges to predict the likelihood of the test edges on a per-graph basis, but we also train the model on multiple graphs simultaneously with the goal of facilitating fast adaptation to new graphs via the global model parameters.
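Concretely, the graph-level split described above might be implemented as follows; `split_graphs` is our own illustrative helper, with the 80/10/10 fractions taken from the setup described in the text:

```python
import random

def split_graphs(graphs, seed=0):
    """Graph-level split: 80% training / 10% validation / 10% test graphs.
    Edge-level splits (as in sample_episode above) are then applied
    independently within each graph."""
    rng = random.Random(seed)
    graphs = list(graphs)
    rng.shuffle(graphs)
    n = len(graphs)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (graphs[:n_train],                 # optimize global parameters
            graphs[n_train:n_train + n_val],  # tune meta-hyperparameters
            graphs[n_train + n_val:])         # few-shot evaluation
```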
Two of our benchmarks are derived from standard multi-graph data sets: protein-protein interaction (PPI) networks and 3D point cloud data (FirstMM-DB).^{10,11} The third is a novel multi-graph data set based upon the AMINER citation data, where each node corresponds to a paper and links represent citations.^{12} For all data sets, we perform link prediction by training on a small subset (i.e., a percentage) of the edges and then attempting to predict the unseen edges (with 20 percent of the held-out edges used for validation).
Several baselines correspond to modifications or ablations of Meta-Graph, including a straightforward adaptation of Model-Agnostic Meta-Learning (MAML),^{13} a fine-tuning baseline where we pre-train a VGAE on the training graphs observed in sequential order and fine-tune it on the test graphs (termed Finetune), and a VGAE trained individually on each test graph (termed No Finetune). We also compare against two other standard link prediction baselines, DeepWalk and Adamic-Adar, to make sure Meta-Graph provides a substantial improvement.^{14,15}
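For reference, Adamic-Adar^{15} is a purely structural heuristic: a candidate pair (u, v) is scored by summing 1/log(deg(w)) over their common neighbors w. A minimal sketch:

```python
import math
from collections import defaultdict

def adamic_adar(edges, u, v):
    """Score a candidate edge (u, v) as the sum over common neighbors w of
    1 / log(deg(w)); higher scores suggest a more likely link."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    common = neighbors[u] & neighbors[v]
    # Skip degree-1 neighbors, whose log-degree would be zero.
    return sum(1.0 / math.log(len(neighbors[w]))
               for w in common if len(neighbors[w]) > 1)

# Example: nodes 0 and 3 share neighbors 1 (degree 2) and 2 (degree 3).
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4)]
print(adamic_adar(edges, 0, 3))  # 1/log(2) + 1/log(3) ~= 2.35
```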
Results
We compared Meta-Graph and the baseline models in two settings designed to measure how well and how fast models can adapt to new, unseen test graphs: a convergence setting, where we trained the models to convergence, and a fast adaptation setting, where we evaluated the models after only five gradient updates. In both settings, we train using 10 percent, 20 percent, and 30 percent of the edges of each test graph and evaluate on that graph's held-out test edges, measuring performance by the link prediction AUC.
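Link prediction AUC is typically computed over the held-out positive edges together with an equal number of sampled negative pairs. A sketch of this evaluation (our own helper, using scikit-learn) is below; `score_fn` stands in for any edge scorer, such as the VGAE decoder or the Adamic-Adar heuristic:

```python
import random

import numpy as np
from sklearn.metrics import roc_auc_score

def link_prediction_auc(score_fn, test_edges, num_nodes, all_edges, seed=0):
    """AUC over held-out positive edges and an equal number of sampled
    negative pairs (node pairs that are not edges in the full graph)."""
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in all_edges}
    negatives = []
    while len(negatives) < len(test_edges):
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v and frozenset((u, v)) not in edge_set:
            negatives.append((u, v))
    pairs = list(test_edges) + negatives
    labels = np.array([1] * len(test_edges) + [0] * len(negatives))
    scores = np.array([score_fn(u, v) for u, v in pairs])
    return roc_auc_score(labels, scores)

# Toy usage with a trivial scorer (higher score for smaller id sums).
all_edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
print(link_prediction_auc(lambda u, v: -(u + v), [(0, 2)], 6, all_edges))
```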
For the convergence setting, we find that Meta-Graph achieves the highest average AUC, with a relative improvement of 4.8 percent over the MAML approach and of 5.3 percent over the Finetune baseline. The performance of Meta-Graph is strongest when only 10 percent of the edges are used, demonstrating the approach's potential when working with very limited data.
In the fast adaptation setting, we again find that Meta-Graph outperforms all the baselines in all but one setting, with an average relative improvement of 9.4 percent compared to MAML and 8 percent compared to the Finetune baseline. In other words, Meta-Graph can not only learn from limited data, but can also adapt to new data quickly, using only a small number of gradient steps.
The results are detailed in Figures 2 and 3, below:
Moving forward
In this research, we introduce Meta-Graph, a solution to the problem of few-shot link prediction, in which the goal is to train ML models that quickly and accurately adapt to new, sparse graph data. Empirically, we observed substantial gains using Meta-Graph compared to strong baselines on three distinct few-shot link prediction benchmarks.
Overall, this work is applicable wherever researchers have access to multiple graphs from a single domain, each containing only a small fraction of the true, underlying edges, such as the high-throughput interactomics, e-commerce, and social network settings discussed above. In all of these cases, link prediction for a new sparse graph can benefit from transferring knowledge from other, possibly more dense, graphs, assuming there is exploitable shared structure.
Acknowledgments
We are grateful for the contributions of Jan Pedersen, Jason Yosinski, Sumanth Dathathri, Andrea Madotto, Thang Bui, Maxime Wabartha, Nadeem Ward, Sebastien Lachapelle, and Zhaocheng Zhu to this research.
The icons in the header are obtained from icons8.com.
References

1. David Liben-Nowell and Jon Kleinberg. The link prediction problem for social networks. In Proceedings of the Twelfth International Conference on Information and Knowledge Management, CIKM '03, pp. 556–559, New York, NY, USA, 2003. ACM. doi: 10.1145/956863.956972.
2. Luca Maria Aiello, Alain Barrat, Rossano Schifanella, Ciro Cattuto, Benjamin Markines, and Filippo Menczer. Friendship prediction and homophily in social media. ACM Transactions on the Web, 6(2):9:1–9:33, June 2012. doi: 10.1145/2180861.2180866.
3. Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. ACM, 2016.
4. Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
5. Erik G. Miller, Nicholas E. Matsakis, and Paul A. Viola. Learning from one example through shared densities on transforms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000), volume 1, pp. 464–471. IEEE, 2000.
6. Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop, volume 2, 2015.
7. Jürgen Schmidhuber. Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-… hook. PhD thesis, Technische Universität München, 1987.
8. Sebastian Thrun and Lorien Pratt. Learning to Learn. Springer Science & Business Media, 2012.
9. Thomas N. Kipf and Max Welling. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016.
10. Marinka Zitnik and Jure Leskovec. Predicting multicellular function through multi-layer tissue networks. Bioinformatics, 33(14):i190–i198, 2017.
11. Marion Neumann, Plinio Moreno, Laura Antanas, Roman Garnett, and Kristian Kersting. Graph kernels for object category prediction in task-dependent robot grasping. In Online Proceedings of the Eleventh Workshop on Mining and Learning with Graphs, pp. 0–6, 2013.
12. Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 990–998. ACM, 2008.
13. Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 1126–1135. JMLR.org, 2017.
14. Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. ACM, 2014.
15. Lada A. Adamic and Eytan Adar. Friends and neighbors on the web. Social Networks, 25(3):211–230, 2003.
16. Miriam Barrios-Rodiles, Kevin R. Brown, Barish Ozdamar, Rohit Bose, Zhong Liu, Robert S. Donovan, Fukiko Shinjo, Yongmei Liu, Joanna Dembowy, Ian W. Taylor, et al. High-throughput mapping of a dynamic signaling network in mammalian cells. Science, 307(5715):1621–1625, 2005.
Ankit Jain
Ankit Jain is a former research scientist at Uber AI.
Piero Molino
Piero is a Staff Research Scientist in the Hazy research group at Stanford University. He is a former founding member of Uber AI where he created Ludwig, worked on applied projects (COTA, Graph Learning for Uber Eats, Uber’s Dialogue System) and published research on NLP, Dialogue, Visualization, Graph Learning, Reinforcement Learning and Computer Vision.
Joey Bose
Joey Bose is a PhD candidate at McGill University and the Montreal Institute for Learning Algorithms (Mila), working on broad problems involving graph representation learning, generative modeling of structured data, and adversarial learning. He is a former intern at Uber AI.
William Hamilton
William L. Hamilton is an Assistant Professor of Computer Science at McGill University, a member of Mila (the Quebec Artificial Intelligence Institute), and a Canada CIFAR AI Chair. His research focuses on graph representation learning and natural language processing. Will's research has received several awards, including a best paper award from the National Academy of Sciences, and his work has been featured in outlets such as Wired, The New York Times, and the BBC.
Posted by Ankit Jain, Piero Molino, Joey Bose, William Hamilton