A while ago, I started contributing open source code to PyTorch.
PyTorch is a Python package that provides two high-level features:
- Tensor computation (like NumPy) with strong GPU acceleration
- Deep neural networks built on a tape-based autograd system
My task was related to torchvision, a PyTorch package that provides dataset loaders and models for common computer vision image and video datasets (MNIST, CIFAR, ImageNet, etc.). I was tasked with writing a data loader to add the Street View House Numbers (SVHN) dataset to torchvision.
The SVHN dataset is available in two formats. Format 1 consists of full street numbers, with variable resolutions and a variable number of digits in each image. Format 2 consists of cropped, MNIST-like digits, all at a fixed 32×32 resolution. We made use of format 2. The dataset files are in .mat format, which can be read using the scipy.io.loadmat function. More information about the SVHN dataset is available here.
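To illustrate the file layout, here is a small sketch that round-trips a synthetic .mat file mimicking SVHN's format 2 (the real dataset files are much larger; the file name, shapes, and label values below are made up for the demo):

```python
import numpy as np
from scipy.io import savemat, loadmat

# Build a tiny synthetic .mat file with the same layout as SVHN format 2:
# X is Height x Width x Channel x Batch, y is a vector of class labels.
rng = np.random.default_rng(0)
fake = {
    "X": rng.integers(0, 256, size=(32, 32, 3, 5), dtype=np.uint8),
    "y": np.array([[1], [2], [3], [4], [10]], dtype=np.uint8),
}
savemat("svhn_sample.mat", fake)

# loadmat returns a dict keyed by variable name (plus some metadata keys).
mat = loadmat("svhn_sample.mat")
print(mat["X"].shape)       # (32, 32, 3, 5)
print(mat["y"].ravel())     # [ 1  2  3  4 10]
```

Note that SVHN uses the label 10 for the digit 0, which the real loader remaps.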
As I looked into the other dataset loader classes, I noticed that there is a very useful, intuitive, pre-defined format for writing these data loaders. When instantiating the data loader class, you simply specify whether you want to load the training or the testing dataset.
Except for some SVHN-specific details, all the data loaders perform more or less the following functions:
- Checks which dataset is needed (training or testing).
- Downloads the relevant dataset: it first checks whether you have already downloaded it; if not, it downloads it into a default folder, or into a folder you specify.
- Loads the dataset as an ndarray or tensor.
  - The SVHN dataset is in .mat format, which can be read using the scipy.io.loadmat function. It contains two key-value pairs: X, a 4-D matrix containing the images, and y, a vector of class labels.
- Checks the integrity of the file using its MD5 checksum.
- PyTorch/Torch expects data of shape Batch x Channel x Height x Width. If the data comes in a different shape, it is re-shaped/re-arranged to make it compatible with Torch.
  - SVHN data comes in the shape Height x Width x Channel x Batch, so it is transposed/permuted along the axes (3, 2, 0, 1).
- Implements the __getitem__ method for accessing an image at any given index.
- Implements the __len__ method to return the length of the data.
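The steps above can be sketched as a minimal, NumPy-only dataset class. This is a simplification, not torchvision's actual SVHN loader: the file names, the MD5 handling, and the demo data are all placeholders for illustration.

```python
import hashlib
import numpy as np
from scipy.io import loadmat, savemat


class SVHNDataset:
    """Minimal sketch of an SVHN-style data loader (not the real torchvision class)."""

    def __init__(self, mat_path, expected_md5=None):
        # Optionally verify file integrity with an MD5 checksum.
        if expected_md5 is not None:
            with open(mat_path, "rb") as f:
                if hashlib.md5(f.read()).hexdigest() != expected_md5:
                    raise RuntimeError("MD5 mismatch: corrupted or wrong file")

        mat = loadmat(mat_path)
        # SVHN stores images as Height x Width x Channel x Batch;
        # transpose to Torch's Batch x Channel x Height x Width.
        self.data = mat["X"].transpose(3, 2, 0, 1)
        self.labels = mat["y"].ravel()

    def __getitem__(self, index):
        # Return one (image, label) pair.
        return self.data[index], self.labels[index]

    def __len__(self):
        return len(self.data)


# Demo with a tiny synthetic file standing in for the real dataset:
fake = {
    "X": np.zeros((32, 32, 3, 4), dtype=np.uint8),
    "y": np.array([[1], [2], [3], [4]]),
}
savemat("svhn_demo.mat", fake)
ds = SVHNDataset("svhn_demo.mat")
print(len(ds))            # 4
print(ds[0][0].shape)     # (3, 32, 32)
```

Because the class implements __getitem__ and __len__, it can be wrapped directly in a PyTorch DataLoader for batching and shuffling.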
If you are using PyTorch in your project, you can make use of torchvision and load your datasets through it. The major benefit is that it really simplifies loading the training and testing datasets for your own computer vision algorithms. It is even better if you are aware of some commonly used datasets and contribute them to the torchvision package directly on GitHub, because:
1) You’re helping people: points for philanthropy ;D
These datasets help people working on projects that need commonly used datasets for training and testing, saving them the hassle of finding and fetching that data. They can also be very beneficial for beginners who want to play around with computer vision for educational purposes.
2) They really teach you how to turn your bad code into good code: points for learning
During one of our meetings, my mentor Martijn Pieters said: “Working at Facebook is like making windmills for storms. At Facebook, we’re building something every day that has to reach a massive audience and handle humongous amounts of data, and it should be able to withstand that kind of storm.” Naturally, for things to be that efficient, the code has to be of excellent quality. When you create a PR to contribute to Facebook projects, they suggest about a dozen little things that you can slightly change to significantly improve your code.