
PyTorch Fundamentals for Deep Learning

Overview

PyTorch is the leading deep learning framework used by researchers and industry. This guide covers the fundamentals you need to build and train neural networks.

Tensors

```python
import torch

# Create tensors
x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
y = torch.zeros(3, 3)
z = torch.randn(2, 3)  # Random normal

# GPU support
if torch.cuda.is_available():
    x = x.cuda()
```

Autograd

```python
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2 + 3 * x + 1
y.backward()
print(x.grad)  # tensor([7.]) = 2*x + 3
```

Building a Neural Network

```python
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_size, output_size)
        )

    def forward(self, x):
        return self.layers(x)

model = MLP(784, 256, 10)
```

Training Loop

```python
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    for batch_x, batch_y in dataloader:
        optimizer.zero_grad()
        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch}, Loss: {loss.item():.4f}")
```

Saving and Loading Models

```python
# Save
torch.save(model.state_dict(), "model.pth")

# Load
model.load_state_dict(torch.load("model.pth"))
model.eval()
```

Key Resources

- PyTorch Documentation
- PyTorch Tutorials
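A note on the training loop: it iterates over a `dataloader` that the post never defines. As a minimal sketch only (the random tensors, batch size, and dataset size here are illustrative assumptions, not part of the original post), a `DataLoader` shaped to match `MLP(784, 256, 10)` could be built like this:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 1,000 samples with 784 features and 10 classes,
# matching the MLP above; replace with a real dataset in practice.
features = torch.randn(1000, 784)
labels = torch.randint(0, 10, (1000,))

dataset = TensorDataset(features, labels)
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)
```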
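After loading a checkpoint and calling `model.eval()`, inference is typically wrapped in `torch.no_grad()` so no gradients are tracked. A short sketch, assuming `model` is the MLP defined above and using a fake input tensor:

```python
import torch

model.eval()
with torch.no_grad():
    sample = torch.randn(1, 784)            # one illustrative 784-feature input
    logits = model(sample)                  # raw class scores from the MLP
    predicted_class = logits.argmax(dim=1)  # index of the highest score
    print(predicted_class)
```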

December 1, 2025 · 1 min · 168 words · BlogIA Team

Transformer Architecture Explained

Overview

The Transformer architecture, introduced in “Attention Is All You Need” (2017), revolutionized NLP and now powers all modern LLMs.

Key Components

```
Input → Embedding → Positional Encoding → Transformer Blocks → Output
                                                 ↓
                                [Multi-Head Attention + FFN] × N
```

Self-Attention

The core mechanism that allows tokens to attend to all other tokens.

```python
import torch
import torch.nn.functional as F

def attention(Q, K, V, mask=None):
    d_k = Q.size(-1)
    scores = torch.matmul(Q, K.transpose(-2, -1)) / (d_k ** 0.5)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)
    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, V)
```

Multi-Head Attention

Run attention in parallel with different learned projections: ...
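The excerpt is truncated at this point. Purely as an illustration (this is not the post's own continuation), a minimal multi-head attention module built on the `attention` function above might look like the following; the model dimension and head count are arbitrary assumptions:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Learned projections for queries, keys, values, and the output
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        batch, seq_len, d_model = x.shape

        # Project, then split the last dimension into (num_heads, d_head)
        def split(t):
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))

        # Scaled dot-product attention runs independently per head,
        # reusing the attention() function defined above
        out = attention(q, k, v, mask)

        # Concatenate the heads and apply the output projection
        out = out.transpose(1, 2).contiguous().view(batch, seq_len, d_model)
        return self.w_o(out)

# Example usage: a batch of 2 sequences of length 10
mha = MultiHeadAttention()
out = mha(torch.randn(2, 10, 512))  # → shape (2, 10, 512)
```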

December 1, 2025 · 2 min · 308 words · BlogIA Team