Access edge data in PyTorch Geometric¶
This tutorial shows how to access the datasets in TGB and their corresponding edge lists.
The logic for PyG data is stored in dataset_pyg.py in the tgb/linkproppred and tgb/nodeproppred folders.
This tutorial requires PyTorch and PyG; refer to README.md for installation instructions.
from tgb.linkproppred.dataset_pyg import PyGLinkPropPredDataset
Specify the name of the dataset:
name = "tgbl-wiki"
Process and load the dataset¶
If the dataset has already been processed, it is loaded from disk for fast access.
If the dataset has not been downloaded yet, it is downloaded and processed automatically.
dataset = PyGLinkPropPredDataset(name=name, root="datasets")
type(dataset)
file found, skipping download
Dataset directory is /mnt/f/code/TGB/tgb/datasets/tgbl_wiki
loading processed file
tgb.linkproppred.dataset_pyg.PyGLinkPropPredDataset
Access edge data from TemporalData object¶
You can retrieve a torch_geometric.data.temporal.TemporalData object directly from PyGLinkPropPredDataset:
data = dataset.get_TemporalData()
type(data)
torch_geometric.data.temporal.TemporalData
type(data.src)
type(data.dst)
type(data.t)
type(data.msg)
torch.Tensor
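To make the four fields more concrete, here is a minimal sketch that summarizes a temporal edge list given as tensors. The helper `describe_events` and the small stand-in tensors are hypothetical and are not part of TGB or PyG; the sketch also assumes that `src`, `dst`, and `t` are 1-D with one entry per edge and that `msg` has one row per edge.

```python
import torch


def describe_events(src, dst, t, msg):
    """Summarize a temporal edge list (hypothetical helper, not part of TGB)."""
    # Assumption: src, dst and t are 1-D tensors with one entry per edge.
    assert src.shape == dst.shape == t.shape
    # Assumption: msg holds one feature row per edge.
    assert msg.shape[0] == src.shape[0]
    return {
        "num_edges": src.shape[0],
        # Count distinct node ids appearing as source or destination.
        "num_nodes": torch.unique(torch.cat([src, dst])).numel(),
        "t_min": t.min().item(),
        "t_max": t.max().item(),
        "msg_dim": msg.shape[1] if msg.dim() > 1 else 1,
    }


# Stand-in data (made up for illustration, NOT taken from tgbl-wiki):
# 4 edges with 2-dimensional edge features.
src = torch.tensor([0, 1, 0, 2])
dst = torch.tensor([3, 3, 4, 4])
t = torch.tensor([10, 20, 30, 40])
msg = torch.zeros(4, 2)

summary = describe_events(src, dst, t, msg)
print(summary)
```

With the real dataset loaded, the same helper could be called as `describe_events(data.src, data.dst, data.t, data.msg)`.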
Directly access edge data as PyTorch tensors¶
The edge data can also be accessed directly through properties of the PyGLinkPropPredDataset object; these are returned as PyTorch tensors.
type(dataset.src) #same as src from above
type(dataset.dst) #same as dst
type(dataset.ts) #same as t
type(dataset.edge_feat) #same as msg
type(dataset.edge_label) #same as label used in tgn
torch.Tensor
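The comments above state which dataset property corresponds to which TemporalData field. One way to check that correspondence is sketched below. The helper `compare_fields` is hypothetical, and the example runs on stand-in objects built with `SimpleNamespace` rather than on a real TGB dataset, so it only illustrates the comparison itself.

```python
from types import SimpleNamespace

import torch

# Mapping taken from the comments above: dataset property -> TemporalData field.
FIELD_PAIRS = {"src": "src", "dst": "dst", "ts": "t", "edge_feat": "msg"}


def compare_fields(dataset, data):
    """Return, per dataset property, whether it equals the paired field.

    Hypothetical helper, not part of TGB or PyG.
    """
    result = {}
    for ds_name, td_name in FIELD_PAIRS.items():
        a = getattr(dataset, ds_name)
        b = getattr(data, td_name)
        # Compare shapes first so the element-wise check never broadcasts.
        result[ds_name] = a.shape == b.shape and bool((a == b).all())
    return result


# Stand-in objects (made up for illustration, NOT real TGB objects).
src = torch.tensor([0, 1, 2])
dst = torch.tensor([3, 4, 5])
ts = torch.tensor([10, 20, 30])
feat = torch.ones(3, 2)

fake_dataset = SimpleNamespace(src=src, dst=dst, ts=ts, edge_feat=feat)
fake_data = SimpleNamespace(
    src=src.clone(), dst=dst.clone(), t=ts.clone(), msg=feat.clone()
)

matches = compare_fields(fake_dataset, fake_data)
print(matches)
```

With the objects from this tutorial, the call would be `compare_fields(dataset, data)`.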
Accessing the train, test, val split¶
The masks for the training, validation, and test splits can also be accessed directly from the dataset:
train_mask = dataset.train_mask
val_mask = dataset.val_mask
test_mask = dataset.test_mask
type(train_mask)
type(val_mask)
type(test_mask)
torch.Tensor
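As a rough illustration of how such masks can be used, the sketch below selects the edges of each split by indexing the edge tensors with a mask. It assumes the masks are boolean tensors with one entry per edge, which the tutorial does not state explicitly. The helpers `split_edges` and `masks_partition` and the stand-in tensors are hypothetical.

```python
import torch


def split_edges(src, dst, t, masks):
    """Select the edges of each split via mask indexing (hypothetical helper)."""
    splits = {}
    for name, mask in masks.items():
        # Assumption: each mask has one entry per edge; cast to bool to be safe.
        mask = mask.bool()
        splits[name] = (src[mask], dst[mask], t[mask])
    return splits


def masks_partition(masks, num_edges):
    """Check that every edge belongs to exactly one split."""
    counts = torch.zeros(num_edges, dtype=torch.long)
    for mask in masks.values():
        counts += mask.long()
    return bool((counts == 1).all())


# Stand-in data (made up for illustration, NOT taken from tgbl-wiki).
src = torch.tensor([0, 1, 2, 3, 4])
dst = torch.tensor([5, 6, 7, 8, 9])
t = torch.tensor([10, 20, 30, 40, 50])
masks = {
    "train": torch.tensor([True, True, True, False, False]),
    "val": torch.tensor([False, False, False, True, False]),
    "test": torch.tensor([False, False, False, False, True]),
}

splits = split_edges(src, dst, t, masks)
for name, (s, d, ts) in splits.items():
    print(name, s.tolist(), d.tolist(), ts.tolist())
print("masks partition the edges:", masks_partition(masks, src.shape[0]))
```

With the real dataset, the inputs would be `dataset.src`, `dataset.dst`, `dataset.ts`, and a dictionary built from `dataset.train_mask`, `dataset.val_mask`, and `dataset.test_mask`.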