GPT Style Language Model

A transformer-based language model built from scratch in PyTorch with custom tokenization and training pipeline.

Project Overview

Built a complete GPT-style language model from scratch, implementing all core transformer components including multi-head attention, positional encoding, and causal masking. The model supports both text generation and classification tasks.

Key Features:

  • Custom BPE tokenizer implementation
  • Multi-head self-attention mechanism
  • Causal masking for autoregressive generation (sketched after this list)
  • Residual connections and layer normalization
  • Top-k and nucleus sampling strategies
  • Modular architecture for different tasks
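
For reference, the sketch below shows how multi-head self-attention and causal masking fit together. It is a minimal, self-contained example; the CausalSelfAttention class, the fused qkv projection, and the max_len default are illustrative assumptions rather than the project's actual code.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask (illustrative sketch)."""
    def __init__(self, d_model, n_heads, max_len=1024):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)    # fused query/key/value projection
        self.proj = nn.Linear(d_model, d_model)       # output projection
        # Lower-triangular mask: position i may only attend to positions <= i
        mask = torch.tril(torch.ones(max_len, max_len)).view(1, 1, max_len, max_len)
        self.register_buffer("mask", mask)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (batch, heads, seq, head_dim)
        q = q.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
        k = k.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
        v = v.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
        # Scaled dot-product attention with future positions masked out
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)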

Technical Implementation

Architecture

The model follows the standard transformer decoder architecture with 12 layers, a hidden dimension of 768, and 12 attention heads. It was trained on a diverse text corpus using the AdamW optimizer.
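
As a rough illustration, the configuration above maps onto the GPTModel class shown in the Code Example section roughly as follows; the vocabulary size, learning rate, and weight decay here are assumptions for the sketch, not values taken from the project.

from torch.optim import AdamW

# 12 layers, 768 hidden dimensions, 12 attention heads (from the description above);
# vocab_size and the optimizer hyperparameters are illustrative assumptions.
model = GPTModel(vocab_size=50257, d_model=768, n_heads=12, n_layers=12)
optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)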

Training Details

The training pipeline implements a custom training loop with gradient clipping, learning rate scheduling, and checkpointing, and uses cross-entropy loss with label smoothing for better generalization.
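
A minimal sketch of one optimization step under this setup is shown below; the train_step helper, the smoothing factor, and the clipping threshold are illustrative assumptions, and the scheduler is assumed to be any torch.optim.lr_scheduler instance.

import torch
import torch.nn.functional as F

def train_step(model, optimizer, scheduler, batch, clip_norm=1.0):
    """One training step with gradient clipping and LR scheduling (sketch, not the project's exact loop)."""
    inputs, targets = batch                                  # token ids, each (batch, seq_len)
    logits = model(inputs)                                   # (batch, seq_len, vocab_size)
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        targets.view(-1),
        label_smoothing=0.1,                                 # smoothing value is an assumption
    )
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    optimizer.step()
    scheduler.step()
    return loss.item()

# Checkpointing could periodically save model and optimizer state, e.g.:
# torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict()}, "checkpoint.pt")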

Technologies Used

  • PyTorch
  • NumPy
  • Matplotlib
  • AdamW
  • Transformers
  • CUDA
  • Python

Code Example

import torch
import torch.nn as nn

class GPTModel(nn.Module):
    """GPT-style decoder: token embedding, positional encoding, stacked transformer blocks, LM head."""
    def __init__(self, vocab_size, d_model, n_heads, n_layers):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)   # token embeddings
        self.pos_encoding = PositionalEncoding(d_model)      # defined elsewhere in the project
        self.transformer_blocks = nn.ModuleList([
            TransformerBlock(d_model, n_heads)               # attention + feed-forward, defined elsewhere
            for _ in range(n_layers)
        ])
        self.ln_f = nn.LayerNorm(d_model)                    # final layer norm
        self.head = nn.Linear(d_model, vocab_size)           # projection to vocabulary logits

    def forward(self, x):
        # x: (batch, seq_len) token ids
        x = self.embedding(x) + self.pos_encoding(x)         # add positional information to embeddings
        for block in self.transformer_blocks:
            x = block(x)
        x = self.ln_f(x)
        return self.head(x)                                  # (batch, seq_len, vocab_size) logits
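
Building on the model above, autoregressive generation with top-k sampling might look like the following sketch; the generate function and its defaults are assumptions for illustration, and nucleus (top-p) sampling would instead keep the smallest set of tokens whose cumulative probability exceeds p.

import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens=50, temperature=1.0, top_k=40):
    """Sample tokens one at a time from the model (illustrative sketch)."""
    model.eval()
    ids = prompt_ids                                         # (1, seq_len) tensor of token ids
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :] / temperature          # logits for the last position
        topk_vals, _ = torch.topk(logits, top_k)
        logits[logits < topk_vals[:, [-1]]] = float("-inf")  # keep only the top-k candidates
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)    # sample the next token
        ids = torch.cat([ids, next_id], dim=1)
    return ids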