• Sold a fully automated RAG pipeline for mid-five figures after scaling it to $2.8K monthly net profit and 80K+ monthly users
• Decreased end-to-end latency by 87% using an async pipeline with OpenAI, Lambda, SQS, and Playwright on Fargate
• Reduced costs by 57% by offloading I/O-heavy scraping to Fargate and optimizing Lambda cold starts via pre-loaded containers
ML Engineer Intern
VJDS International Inc.
Jun 2025 - Dec 2025
Edmonton, AB
• Automated grading for 1,000+ students by deploying a FastAPI pipeline powered by AWS Textract and Comprehend
• Reduced inference latency by 82% via an async architecture and serverless orchestration on AWS (Lambda, Step Functions, API Gateway, RDS)
• Scaled the system to 40+ live classrooms alongside 3 senior engineers, replacing manual grading with real-time automation
ML Engineer Intern
Roam X
TBD - TBD
Location TBD
• Awarded $11,000 to build a multimodal restaurant ranking system using BERT, ResNet-18, Faiss, and PyTorch; deployed with FastAPI, Docker, and EC2 Spot instances, serving 2,500+ users with sub-1s latency
• Fine-tuned the model on user data with PyTorch, Faiss, and EC2, improving recommendation quality by 72%
• Deployed a SageMaker endpoint with Docker, ECR, and CloudWatch for low-latency inference with real-time monitoring
Applied ML Researcher (Co-op)
Western University
May 2025 - Sep 2025
London, ON
• Project 1: Awarded $10,000 to build a DistilBERT pipeline mapping key ideas from articles to 10+ peer-reviewed papers each
• Cut inference time by 55% by deploying on EC2 with Docker, and automated retraining with SageMaker and MLflow
• Project 2: Built a multimodal interruption detector using BERT, librosa, and MediaPipe; reached 91% F1 on 9,000+ samples
• Enabled real-time inference via ONNX quantization, reducing latency by 42% and memory usage by 60% in EC2 deployment
Data Engineer Intern (Co-op)
Juteq Inc.
May 2025 - Sep 2025
Mississauga, ON
• Re-architected a production ML pipeline by containerizing 12 real-time microservices and decoupling them with Kafka, enabling 4× higher throughput and cutting end-to-end latency by 50%
• Reduced cold-start time by 40% across 25+ Lambda functions and 15+ Step Functions workflows using shared Lambda Layers, saving over $1,000/month in compute costs and accelerating downstream inference
• Integrated monitoring and alerting with CloudWatch and Grafana, enabling real-time debugging and reducing incident resolution time by 65% during high-traffic events
ML Software Engineer Intern
Enabled Canada (Vector Institute-affiliated)
Jan 2025 - May 2025
Toronto, ON
• Built a voice assistant for seniors, achieving 92% accuracy on 100K+ real-world queries using Whisper and DistilBERT
• Scaled GPU-backed async inference to 1,000+ req/sec with FastAPI, Redis, and EC2; load tested with 500+ concurrent users
• Boosted peak-time request success rate from 91% to 99.9% by architecting a real-time inference queue with Redis Streams, AWS SQS, and async FastAPI workers to absorb burst traffic and eliminate GPU overload
ML Software Engineer Intern (Co-op)
Dibbly Inc.
Jan 2025 - May 2025
Oakville, ON
• Built a RAG chatbot to handle 3,000+ weekly support queries in real time (95% accuracy, 1–5s latency), replacing a multi-day manual process for 5,000+ active users
• Fine-tuned Mistral-7B on SageMaker using PyTorch and CUDA, boosting alignment by 35% and cutting low-quality responses by 33%
• Built a RAG pipeline with Bedrock, Kendra, and Lambda-based fallback routing, enabling low-latency semantic search over 10,000+ docs and returning non-empty responses for 99.2% of queries