PyTorch releases version 0.2 with Distributed Training

By: TigerST

on 7 August 2017 - 04:15 Tags:

Topics:

PyTorch

Deep Learning

On Sunday, 6 August, the PyTorch page on Facebook announced the release of PyTorch version 0.2.

Before getting into the details, here is a short introduction to PyTorch.

PyTorch is a deep learning library developed by Facebook for Python (its predecessor, Torch, was built on Lua). Its main strengths are dynamic computation graphs and automatic differentiation. It is also a define-by-run library: the graph is built as the code executes, so there is no need to open and close a session in order to run it. At the time of writing it supports only Linux and macOS.

The key updates are as follows.

1. Tensor broadcasting, a feature many people will already know from NumPy arrays

In[1] : x=torch.FloatTensor(5,1,4,1)
In[2] : y=torch.FloatTensor(  3,1,1)
In[3] : (x+y).size()
Out[3] : torch.Size([5, 3, 4, 1])
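
To see where the size [5, 3, 4, 1] comes from, here is a minimal sketch of the NumPy-style broadcasting rule in plain Python. The helper name `broadcast_shape` is hypothetical and used only for illustration; it is not part of the PyTorch API. Shapes are aligned from the trailing dimension, missing leading dimensions are treated as 1, and each pair of sizes must either match or contain a 1.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast result shape of two shapes (illustrative helper)."""
    result = []
    # Walk both shapes from the last dimension; pad the shorter one with 1s.
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError("shapes %r and %r cannot be broadcast" % (a, b))
    return tuple(reversed(result))


print(broadcast_shape((5, 1, 4, 1), (3, 1, 1)))  # prints (5, 3, 4, 1)
```

The same shapes as in the session above give the same result: the 1s stretch to 3 and 4, and the missing leading dimension of y is filled in as 5.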

2. Higher order gradients

Version 0.2 lets users compute derivatives of order higher than one (that is, differentiate more than once). Earlier versions of PyTorch could return only the first-order gradient; now it is possible to compute Hessian-vector products (the Hessian matrix multiplied by a vector) and to apply regularization to the gradient itself. The concept is similar to Lasso regression, except that Lasso penalizes the parameters of the model, whereas here the penalty is placed on the gradient.

import torch
from torchvision.models import resnet18
from torch.autograd import Variable

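model = resnet18().cuda()
# optimizer used in the final step; SGD and the learning rate are illustrative choices
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)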

# dummy inputs for the example
input = Variable(torch.randn(2,3,224,224).cuda(), requires_grad=True)
target = Variable(torch.zeros(2).long().cuda())

# as usual
output = model(input)
loss = torch.nn.functional.nll_loss(output, target)

grad_params = torch.autograd.grad(loss, model.parameters(), create_graph=True)
# torch.autograd.grad does not accumulate the gradients into the .grad attributes
# It instead returns the gradients as Variable tuples.

# now compute the 2-norm of the grad_params
grad_norm = 0
for grad in grad_params:
    grad_norm += grad.pow(2).sum()
grad_norm = grad_norm.sqrt()

# take the gradients wrt grad_norm. backward() will accumulate
# the gradients into the .grad attributes
grad_norm.backward()

# do an optimization step
optimizer.step()

The code above shows the difference: normally we backpropagate from the computed loss, but in version 0.2 we can also backpropagate from a quantity built out of the gradient, here its 2-norm, which is what makes gradient regularization possible.
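
What does backpropagating through the gradient norm actually compute? For a loss with gradient g and Hessian H, the derivative of the 2-norm of g with respect to the parameters is H g / ||g||, which is a Hessian-vector product. The sketch below checks this on a toy quadratic loss in plain Python, without PyTorch. The matrix `A` and all function names are assumptions chosen for illustration only.

```python
import math

# Toy loss: f(theta) = 0.5 * theta^T A theta, so gradient = A theta and Hessian = A.
A = [[2.0, 1.0],
     [1.0, 3.0]]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def grad(theta):
    return matvec(A, theta)


def grad_norm(theta):
    return math.sqrt(sum(g * g for g in grad(theta)))


def grad_of_grad_norm_analytic(theta):
    """Hessian-vector product H g divided by ||g||."""
    g = grad(theta)
    n = grad_norm(theta)
    return [x / n for x in matvec(A, g)]


def grad_of_grad_norm_numeric(theta, eps=1e-6):
    """Central finite differences of grad_norm, used only as a check."""
    out = []
    for i in range(len(theta)):
        hi, lo = list(theta), list(theta)
        hi[i] += eps
        lo[i] -= eps
        out.append((grad_norm(hi) - grad_norm(lo)) / (2 * eps))
    return out


theta = [1.0, -2.0]
print(grad_of_grad_norm_analytic(theta))
print(grad_of_grad_norm_numeric(theta))
```

The two printed vectors should agree up to finite-difference error. In the PyTorch example, `create_graph=True` is what keeps the gradient computation differentiable so that `grad_norm.backward()` can produce this second-order quantity.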

3. Distributed Training

PyTorch has added the ability to train across multiple machines (similar to TensorFlow). When a single machine has more than one GPU, you can use

model = torch.nn.parallel.DistributedDataParallel(model.cuda())

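DistributedDataParallel works by sending the data to run on several GPUs in parallel, and it can be used in place of nn.DataParallel (previously DataParallel had to be wrapped around one layer at a time, whereas now the whole model can be wrapped). Note that the process group has to be initialized with torch.distributed.init_process_group before the model is wrapped.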
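
The idea behind data-parallel training can be sketched without any GPU: each worker computes the gradient on its own shard of the batch, and the shard gradients are averaged. The plain-Python example below is a conceptual illustration only, not how PyTorch implements it; the function names are hypothetical, and it assumes the batch size is divisible by the number of workers.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def split(seq, n):
    """Split seq into n equal shards (assumes len(seq) is divisible by n)."""
    size = len(seq) // n
    return [seq[i * size:(i + 1) * size] for i in range(n)]


def data_parallel_grad(w, xs, ys, num_workers):
    """Each simulated worker computes a local gradient; the results are averaged."""
    local_grads = [
        mse_grad(w, shard_x, shard_y)
        for shard_x, shard_y in zip(split(xs, num_workers), split(ys, num_workers))
    ]
    return sum(local_grads) / num_workers


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(mse_grad(0.5, xs, ys))               # full-batch gradient
print(data_parallel_grad(0.5, xs, ys, 2))  # averaged over two simulated workers
```

With equal-sized shards the averaged gradient matches the full-batch gradient, which is why splitting a batch across devices does not change the update.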

4. New layer types added

For the full details, including the other updates, see https://github.com/pytorch/pytorch/releases/tag/v0.2.0