A transformer-based language model built from scratch in PyTorch, with a custom tokenization and training pipeline.
Built a complete GPT-style language model from scratch, implementing the core transformer components: multi-head attention, positional encoding, and causal masking. The model supports both text generation and classification tasks.
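As a sketch of how generation works, autoregressive sampling repeatedly feeds the running sequence back through the model and samples the next token. The `generate` helper below is illustrative rather than part of the repository's API; it assumes the `GPTModel` class shown further down, and the `temperature` and `max_new_tokens` defaults are arbitrary choices.

```python
import torch

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens=50, temperature=1.0):
    """Hypothetical sampling loop; prompt_ids is a (1, seq_len) tensor of token ids."""
    model.eval()
    ids = prompt_ids
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :] / temperature        # logits for the last position
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)   # sample one token
        ids = torch.cat([ids, next_id], dim=1)              # append and continue
    return ids
```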
The model follows a standard transformer decoder architecture: 12 layers, a hidden dimension of 768, and 12 attention heads. It was trained on a diverse text corpus with the AdamW optimizer.
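A concrete (hypothetical) setup with those hyperparameters might look like the following, assuming the `GPTModel` class defined in the code below; the vocabulary size, batch size, and sequence length here are assumptions, not values from the repository.

```python
import torch

# 12 layers, 768 hidden dimensions, 12 attention heads as described above;
# the GPT-2 style vocab_size of 50257 and the input shape are illustrative.
model = GPTModel(vocab_size=50257, d_model=768, n_heads=12, n_layers=12)
tokens = torch.randint(0, 50257, (2, 128))   # (batch, seq_len) token ids
logits = model(tokens)                       # -> shape (2, 128, 50257)
```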
Implemented a custom training loop with gradient clipping, learning-rate scheduling, and checkpointing, and used cross-entropy loss with label smoothing for better generalization.
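A minimal sketch of such a loop, assuming the `GPTModel` shown below and a `train_loader` that yields `(inputs, targets)` batches of token ids; the learning rate, clipping value, smoothing factor, scheduler choice, and checkpoint interval are all illustrative.

```python
import torch
import torch.nn as nn

def train(model, train_loader, num_steps, device="cpu"):
    # model: the GPTModel defined below; train_loader yields (inputs, targets)
    # batches of token ids. All hyperparameter values here are assumptions.
    model.to(device)
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # cross-entropy with label smoothing

    for step, (inputs, targets) in enumerate(train_loader):
        inputs, targets = inputs.to(device), targets.to(device)
        logits = model(inputs)                                    # (batch, seq_len, vocab_size)
        loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
        optimizer.step()
        scheduler.step()                                          # learning-rate schedule
        if step % 1000 == 0:                                      # periodic checkpointing
            torch.save({"model": model.state_dict(),
                        "optimizer": optimizer.state_dict()},
                       f"checkpoint_{step}.pt")
```

The core model definition is shown next.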
```python
import torch
import torch.nn as nn

class GPTModel(nn.Module):
    def __init__(self, vocab_size, d_model, n_heads, n_layers):
        super().__init__()
        # Token embeddings plus positional encodings
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.pos_encoding = PositionalEncoding(d_model)
        # Stack of decoder blocks with causal self-attention
        self.transformer_blocks = nn.ModuleList([
            TransformerBlock(d_model, n_heads)
            for _ in range(n_layers)
        ])
        self.ln_f = nn.LayerNorm(d_model)            # final layer norm
        self.head = nn.Linear(d_model, vocab_size)   # projection to vocabulary logits

    def forward(self, x):
        # x: (batch, seq_len) token ids
        x = self.embedding(x) + self.pos_encoding(x)
        for block in self.transformer_blocks:
            x = block(x)
        x = self.ln_f(x)
        return self.head(x)                          # (batch, seq_len, vocab_size) logits
```
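The `TransformerBlock` and `PositionalEncoding` modules referenced above are not shown here; the following is a minimal sketch of one possible implementation, using `nn.MultiheadAttention` with a boolean causal mask and sinusoidal positional encodings. The pre-norm layout, 4x feed-forward expansion, dropout rate, and `max_len` are assumptions rather than details taken from the repository.

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sinusoidal encodings; returns a (1, seq_len, d_model) tensor that the
    model adds to its token embeddings. max_len is an assumed context limit."""
    def __init__(self, d_model, max_len=1024):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len) token ids; only the sequence length is used.
        return self.pe[:, :x.size(1)]

class TransformerBlock(nn.Module):
    """Pre-norm decoder block: causal multi-head self-attention + feed-forward."""
    def __init__(self, d_model, n_heads, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),   # 4x expansion, as in GPT-2
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # Causal mask: position i may only attend to positions <= i.
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                     device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out                  # residual connection
        x = x + self.ff(self.ln2(x))      # residual connection
        return x
```

The boolean mask marks future positions as disallowed, which is what gives the block its causal, left-to-right attention pattern.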