Friday, September 24, 2021

Jupyter notebook tricks

 1. Merge multiple Jupyter Notebook cells into one

a. Let's imagine that you have written thousands of lines of code. And suddenly, you want to run all the cells over again with just one or two values changed.

Guess what? You will need to scroll up -> change the value -> run the notebook, time and time again. The bad part is that you have to wait for one run to finish before manually scrolling up and changing the value.

SUCKS!


So let's put all the code in a loop after merging all the cells into one with Shift + M, as in the sketch below.
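Here is a minimal sketch of what the merged cell can look like; the learning-rate values and the placeholder computation are made up, not code from a real project:

# One merged cell: wrap everything in a loop over the values you would
# otherwise edit by hand and re-run.
results = {}
for learning_rate in [1e-2, 1e-3, 1e-4]:    # the values you keep changing manually
    # ... the code from all the merged cells goes here ...
    loss = (learning_rate - 1e-3) ** 2      # placeholder standing in for the real training
    results[learning_rate] = loss

print(results)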

Saturday, September 11, 2021

python OOP with pytorch

 1. Display properties of an object in Python (e.g. DataLoader)

ipdb> dir(train_loader.dataset)
['__add__', '__class__', '__class_getitem__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__orig_bases__', '__parameters__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__weakref__', '_format_transform_repr', '_is_protocol', '_repr_indent', 'class_to_idx', 'classes', 'extensions', 'extra_repr', 'find_classes', 'imgs', 'loader', 'make_dataset', 'root', 'samples', 'target_transform', 'targets', 'transform', 'transforms']
ipdb> train_loader.dataset.class_to_idx

Here class_to_idx is a nice property if you want to check how ImageFolder assigned unique integer IDs to your human-readable class names.
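For context, here is a minimal sketch of where that class_to_idx comes from; the "data/train" path and the class folders are placeholders:

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# "data/train" is a placeholder folder with one sub-directory per class,
# e.g. data/train/cat/... and data/train/dog/...
dataset = datasets.ImageFolder("data/train", transform=transforms.ToTensor())
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)

# ImageFolder sorts the class folders alphabetically and numbers them from 0.
print(train_loader.dataset.classes)       # e.g. ['cat', 'dog']
print(train_loader.dataset.class_to_idx)  # e.g. {'cat': 0, 'dog': 1}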

 2. DataLoader is super slow

When I load the data manually instead of going through the DataLoader, it is roughly 10 times faster. WTH??
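A quick, toy way to see that overhead for yourself (random tensors here, not my real dataset; the gap depends on your Dataset, transforms, and num_workers):

import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data just to compare the two ways of pulling batches.
data = TensorDataset(torch.randn(2000, 3, 32, 32), torch.randint(0, 10, (2000,)))

# 1) Iterate through a DataLoader.
loader = DataLoader(data, batch_size=200, num_workers=0)
t0 = time.time()
for x, y in loader:
    pass
print("DataLoader:", time.time() - t0, "s")

# 2) "Manually": slice the underlying tensors in batches.
t0 = time.time()
for i in range(0, len(data), 200):
    x, y = data.tensors[0][i:i + 200], data.tensors[1][i:i + 200]
print("Manual slicing:", time.time() - t0, "s")

In this toy case most of the gap is the per-sample indexing and collation the DataLoader does; on a real image folder, tuning num_workers and pin_memory usually helps.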

Monday, September 6, 2021

pytorch frequent errors and solutions

 1. CUDA memory overflow (if you have multiple GPUs)

- If you have multiple GPUs, you can run pytorch operations (e.g. torch.argsort) on all of them, not just a single GPU.

- Let's take this snippet as an example:
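Something along these lines; a minimal sketch rather than the exact code, with big_input, batch_size = 800, and the round-robin device choice as my own placeholders:

import torch

big_input = torch.randn(8000, 1000)          # placeholder data
batch_size = 800                             # one chunk per GPU hop
num_gpus = torch.cuda.device_count()         # the whole point: more than one GPU is available

results = []
for batch_idx, start in enumerate(range(0, big_input.size(0), batch_size)):
    batch = big_input[start:start + batch_size]
    # Spread the work: pick the GPU from batch_idx instead of always using cuda:0.
    device = torch.device(f"cuda:{batch_idx % num_gpus}")
    sorted_idx = torch.argsort(batch.to(device), dim=-1)
    results.append(sorted_idx.cpu())         # bring the result back so no single GPU fills up

results = torch.cat(results)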


Ok, for every 800 input samples, I send the batch to a different GPU, with the device ID picked from batch_idx.

 2. Load a Python object saved on the GPU back to the CPU

- I did a stupid thing when I saved a dictionary containing some GPU tensors to a .npy file.

- Then, when I loaded the .npy file, my GPUs got filled right back up and ran out of memory.

- Hell yeah, I googled a lot but could not find any solution to load only part of the file or to load the dictionary directly onto the CPU.

- Suddenly I thought: oh, I can load the dictionary onto the GPU, move each tensor back to the CPU, and then overwrite the dictionary entry. Haha, how smart I am!
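Roughly like this; a minimal sketch assuming the dictionary was pickled into the .npy file with np.save (the file name and keys are placeholders):

import numpy as np
import torch

# np.save pickled the whole dictionary, so loading it restores the tensors
# on the GPU they were saved from.
data = np.load("features.npy", allow_pickle=True).item()   # "features.npy" is a placeholder

# Move every tensor back to the CPU and overwrite the GPU copy.
for key, value in data.items():
    if torch.is_tensor(value):
        data[key] = value.cpu()

torch.cuda.empty_cache()   # let the now-unused GPU memory go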