• Sold a fully automated RAG pipeline for mid-five figures after scaling it to $2.8K monthly net profit and 80K+ monthly users
• Decreased end-to-end latency by 87% using an async pipeline with OpenAI, Lambda, SQS, and Playwright on Fargate
• Reduced costs by 57% by offloading I/O-heavy scraping to Fargate and optimizing Lambda cold starts via pre-loaded containers
ML Engineer Intern
VJDS International Inc.
Jun 2025 - Dec 2025
Edmonton, AB
• Automated grading for 1,000+ students by deploying a FastAPI pipeline powered by AWS Textract and Comprehend
• Reduced inference latency by 82% via an async architecture and serverless orchestration on AWS (Lambda, Step Functions, API Gateway, RDS)
• Scaled the system to 40+ live classrooms alongside 3 senior engineers, replacing manual grading with real-time automation
ML Engineer Intern
Roam X
TBD - TBD
Location TBD
• Awarded $11,000 to build a multimodal restaurant ranking system using BERT, ResNet-18, Faiss, and PyTorch; deployed with FastAPI, Docker, and EC2 Spot instances, serving 2,500+ users with sub-1s latency
• Fine-tuned the model on user data with PyTorch, Faiss, and EC2, improving recommendation quality by 72%
• Deployed a SageMaker endpoint with Docker, ECR, and CloudWatch for low-latency inference with real-time monitoring
Applied ML Researcher (Co-op)
Western University
May 2025 - Sep 2025
London, ON
• Project 1: Awarded $10,000 to build a DistilBERT pipeline mapping key ideas from articles to 10+ peer-reviewed papers each
• Cut inference time by 55% by deploying on EC2 with Docker, and automated retraining with SageMaker and MLflow
• Project 2: Built a multimodal interruption detector using BERT, librosa, and MediaPipe; reached 91% F1 on 9,000+ samples
• Enabled real-time inference via ONNX quantization, reducing latency by 42% and memory usage by 60% in EC2 deployment
Data Engineer Intern (Co-op)
Juteq Inc.
May 2025 - Sep 2025
Mississauga, ON
• Re-architected a production ML pipeline by containerizing 12 real-time microservices and decoupling them with Kafka, enabling 4× higher throughput and cutting end-to-end latency by 50%
• Reduced cold-start time by 40% across 25+ Lambda functions and 15+ Step Functions workflows using shared Lambda Layers, saving over $1,000/month in compute costs and accelerating downstream inference
• Integrated monitoring and alerting with CloudWatch and Grafana, enabling real-time debugging and reducing incident resolution time by 65% during high-traffic events
ML Software Engineer Intern
Enabled Canada (Vector Institute-affiliated)
Jan 2025 - May 2025
Toronto, ON
• Built a voice assistant for seniors, achieving 92% accuracy on 100K+ real-world queries using Whisper and DistilBERT
• Scaled GPU-backed async inference to 1,000+ req/sec with FastAPI, Redis, and EC2; load tested with 500+ concurrent users
• Boosted peak-time request success rate from 91% to 99.9% by architecting a real-time inference queue with Redis Streams, AWS SQS, and async FastAPI workers to absorb burst traffic and eliminate GPU overload
ML Software Engineer Intern (Co-op)
Dibbly Inc.
Jan 2025 - May 2025
Oakville, ON
• Built a RAG chatbot to handle 3,000+ weekly support queries in real time (95% accuracy, 1–5s latency), replacing a multi-day manual process for 5,000+ active users
• Fine-tuned Mistral-7B on SageMaker using PyTorch and CUDA, boosting alignment by 35% and cutting low-quality responses by 33%
• Built a RAG pipeline with Bedrock, Kendra, and Lambda-based fallback routing, enabling low-latency semantic search over 10,000+ docs and returning non-empty responses for 99.2% of queries