Access edge data as numpy arrays

This tutorial shows how to access the datasets in tgb and their corresponding edge lists.

You can retrieve the edge data directly as numpy arrays; PyG and PyTorch are not required.

The logic is implemented in dataset.py under the tgb/linkproppred/ and tgb/nodeproppred/ folders, respectively.

In [1]:

from tgb.linkproppred.dataset import LinkPropPredDataset

Specify the name of the dataset:

In [2]:

name = "tgbl-wiki"

Processing and loading the dataset

If the dataset has already been processed, it is loaded from disk for fast access.

If the dataset has not been downloaded yet, it is downloaded and processed automatically.

In [3]:

dataset = LinkPropPredDataset(name=name, root="datasets", preprocess=True)
type(dataset)

Will you download the dataset(s) now? (y/N)
y
Download started, this might take a while . . . 
Dataset title: tgbl-wiki
Download completed 
Dataset directory is  /mnt/f/code/TGB/tgb/datasets/tgbl_wiki
file not processed, generating processed file

Out[3]:

tgb.linkproppred.dataset.LinkPropPredDataset

Accessing the edge data

The edge data can be accessed as numpy arrays through the dataset's full_data property.

In [4]:

data = dataset.full_data  # a dictionary that stores all the edge data
type(data)

Out[4]:

dict
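Since the result is a plain dictionary of arrays, a quick way to get an overview is to loop over its keys and report each array's shape and dtype. The sketch below uses a small synthetic dictionary as a stand-in for dataset.full_data so that it runs without downloading anything; the values are made up, and the helper name summarize is hypothetical.

```python
import numpy as np

# Synthetic stand-in for dataset.full_data (values are invented for illustration).
data = {
    "sources": np.array([0, 1, 2, 0], dtype=np.int64),
    "destinations": np.array([3, 4, 3, 4], dtype=np.int64),
    "timestamps": np.array([10, 20, 30, 40], dtype=np.int64),
    "edge_feat": np.zeros((4, 2), dtype=np.float64),
}


def summarize(arrays):
    """Return {key: (shape, dtype name)} for every array in the dictionary."""
    return {key: (value.shape, value.dtype.name) for key, value in arrays.items()}


summary = summarize(data)
for key, (shape, dtype) in summary.items():
    print(f"{key}: shape={shape}, dtype={dtype}")
```

With the real dataset, passing dataset.full_data to the same helper should work as long as every value in the dictionary is a numpy array.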

In [5]:

type(data['sources'])
type(data['destinations'])
type(data['timestamps'])
type(data['edge_feat'])
type(data['w'])
type(data['edge_label'])  # an array of all ones, since every edge in the dataset is a positive edge
type(data['edge_idxs'])  # edge indices, incrementing by 1 for each edge

Out[5]:

numpy.ndarray
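Because each field is stored as its own array with one entry per edge, an explicit edge list can be assembled by stacking the arrays column-wise. The following is a minimal sketch on synthetic arrays; with the real dataset, the three inputs would presumably be data['sources'], data['destinations'], and data['timestamps'].

```python
import numpy as np

# Synthetic per-edge arrays standing in for the entries of the data dictionary.
sources = np.array([0, 1, 2], dtype=np.int64)
destinations = np.array([3, 4, 3], dtype=np.int64)
timestamps = np.array([10, 20, 30], dtype=np.int64)

# Every per-edge array should have exactly one entry per edge.
if not (len(sources) == len(destinations) == len(timestamps)):
    raise ValueError("per-edge arrays must all have the same length")

# Stack into a (num_edges, 3) array: one (source, destination, timestamp) row per edge.
edgelist = np.column_stack((sources, destinations, timestamps))

print(edgelist.shape)
print(edgelist[0])
```

Note that np.column_stack produces a single dtype for the whole array, so if the timestamps are floats, the node ids in the stacked array become floats as well.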

Accessing the train, validation, and test splits

The masks for the training, validation, and test splits can also be accessed directly from the dataset.

In [6]:

train_mask = dataset.train_mask
val_mask = dataset.val_mask
test_mask = dataset.test_mask

type(train_mask)
type(val_mask)
type(test_mask)

Out[6]:

numpy.ndarray
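A mask is typically used to slice the per-edge arrays into the corresponding split. The sketch below assumes the masks are boolean arrays with one entry per edge, which this tutorial does not confirm; it builds synthetic chronological masks rather than using the real dataset.train_mask, dataset.val_mask, and dataset.test_mask.

```python
import numpy as np

# Synthetic edge data: 10 edges with increasing timestamps.
timestamps = np.arange(10)
sources = np.arange(10) % 3

# Hypothetical chronological split (assumed boolean masks, one entry per edge).
train_mask = timestamps < 6
val_mask = (timestamps >= 6) & (timestamps < 8)
test_mask = timestamps >= 8

# Boolean indexing keeps only the edges where the mask is True.
train_sources = sources[train_mask]
val_timestamps = timestamps[val_mask]
test_timestamps = timestamps[test_mask]

# Each edge should belong to exactly one split.
covered = train_mask.astype(int) + val_mask.astype(int) + test_mask.astype(int)
is_partition = bool((covered == 1).all())

print(train_sources.shape, val_timestamps, test_timestamps, is_partition)
```

The same indexing pattern applies to any array in the edge data dictionary, for example data['sources'][train_mask], provided the mask and the array have the same length.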
