Access edge data as numpy arrays
This tutorial shows how to access the datasets in TGB and their corresponding edge lists.
You can retrieve the edge data directly as numpy arrays; PyG and PyTorch are not required.
The logic is implemented in dataset.py under the tgb/linkproppred/ and tgb/nodeproppred/ folders, respectively.
In [1]:
from tgb.linkproppred.dataset import LinkPropPredDataset
Specify the name of the dataset.
In [2]:
name = "tgbl-wiki"
Processing and loading the dataset
If the dataset has already been processed, it is loaded from disk for fast access.
If the dataset has not been downloaded yet, it is downloaded and then processed automatically.
In [3]:
dataset = LinkPropPredDataset(name=name, root="datasets", preprocess=True)
type(dataset)
Will you download the dataset(s) now? (y/N) y
Download started, this might take a while . . .
Dataset title: tgbl-wiki
Download completed
Dataset directory is /mnt/f/code/TGB/tgb/datasets/tgbl_wiki
file not processed, generating processed file
Out[3]:
tgb.linkproppred.dataset.LinkPropPredDataset
Accessing the edge data
The edge data can be accessed as numpy arrays through the full_data property of the dataset object.
In [4]:
data = dataset.full_data  # a dictionary that stores all the edge data
type(data)
Out[4]:
dict
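Because a notebook cell only displays the value of its last expression, it can help to summarize every entry of the dictionary at once. The following is a minimal sketch: summarize_edge_data is a hypothetical helper, and the small dictionary is synthetic stand-in data, not the contents of tgbl-wiki. With the real dataset you would pass dataset.full_data instead.

```python
import numpy as np


def summarize_edge_data(data):
    """Return {key: (type name, shape, dtype)} for every entry in the dictionary.

    Hypothetical helper for illustration; it is not part of TGB.
    """
    summary = {}
    for key, value in data.items():
        arr = np.asarray(value)
        summary[key] = (type(value).__name__, arr.shape, str(arr.dtype))
    return summary


# Synthetic stand-in for dataset.full_data (assumed layout: one entry per edge).
toy_data = {
    "sources": np.array([0, 1, 2], dtype=np.int64),
    "destinations": np.array([3, 4, 3], dtype=np.int64),
    "timestamps": np.array([10, 20, 30], dtype=np.int64),
    "edge_feat": np.zeros((3, 2), dtype=np.float64),
}

summary = summarize_edge_data(toy_data)
for key, info in summary.items():
    print(key, info)
```

Checking shapes this way is a quick sanity test that all arrays have the same number of entries along the first axis.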
In [5]:
type(data['sources'])
type(data['destinations'])
type(data['timestamps'])
type(data['edge_feat'])
type(data['w'])
type(data['edge_label'])  # an array of all ones, since every edge in the dataset is a positive edge
type(data['edge_idxs'])  # the edge indices, incremented by 1 for each edge
Out[5]:
numpy.ndarray
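Since the arrays are plain numpy, an edge list can be assembled without any further dependencies. The sketch below assumes that sources, destinations, and timestamps are one-dimensional and aligned, so that position i in each array describes the same edge. build_edge_list is a hypothetical helper, and the arrays are synthetic stand-ins rather than real tgbl-wiki data.

```python
import numpy as np


def build_edge_list(data):
    """Stack (source, destination, timestamp) into an (n_edges, 3) array sorted by time.

    Hypothetical helper for illustration; it is not part of TGB.
    Note: np.column_stack upcasts to a common dtype if the inputs differ.
    """
    edges = np.column_stack(
        (data["sources"], data["destinations"], data["timestamps"])
    )
    # A stable sort keeps the original order of edges that share a timestamp.
    order = np.argsort(data["timestamps"], kind="stable")
    return edges[order]


# Synthetic stand-in for dataset.full_data, deliberately out of time order.
toy_data = {
    "sources": np.array([2, 0, 1], dtype=np.int64),
    "destinations": np.array([3, 3, 4], dtype=np.int64),
    "timestamps": np.array([30, 10, 20], dtype=np.int64),
}

edge_list = build_edge_list(toy_data)
print(edge_list)
```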
Accessing the train, validation, and test splits
The masks for the training, validation, and test splits can also be accessed directly from the dataset.
In [6]:
train_mask = dataset.train_mask
val_mask = dataset.val_mask
test_mask = dataset.test_mask
type(train_mask)
type(val_mask)
type(test_mask)
Out[6]:
numpy.ndarray
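A common next step is to apply a mask to the edge arrays to obtain the edges of one split. The sketch below assumes that each mask is a boolean array with one entry per edge, aligned with the arrays in full_data, and that every value in the dictionary is a per-edge array. apply_mask is a hypothetical helper, and the data and masks are synthetic stand-ins.

```python
import numpy as np


def apply_mask(data, mask):
    """Keep only the entries selected by a boolean mask in every per-edge array.

    Hypothetical helper for illustration; it is not part of TGB.
    """
    mask = np.asarray(mask, dtype=bool)
    return {key: np.asarray(value)[mask] for key, value in data.items()}


# Synthetic stand-in for dataset.full_data with five edges.
toy_data = {
    "sources": np.array([0, 1, 2, 0, 1], dtype=np.int64),
    "destinations": np.array([3, 4, 3, 4, 3], dtype=np.int64),
    "timestamps": np.array([10, 20, 30, 40, 50], dtype=np.int64),
}

# Synthetic stand-ins for dataset.train_mask, dataset.val_mask, dataset.test_mask.
train_mask = np.array([True, True, True, False, False])
val_mask = np.array([False, False, False, True, False])
test_mask = np.array([False, False, False, False, True])

train_data = apply_mask(toy_data, train_mask)
val_data = apply_mask(toy_data, val_mask)
test_data = apply_mask(toy_data, test_mask)
print(train_data["sources"], val_data["sources"], test_data["sources"])
```

Boolean indexing returns copies, so modifying a split does not change the original arrays.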