Akash Dubey

If you want to talk about research, robotics, startups, math, or a project worth building together, there are a few easy ways to reach me.

Email: akash.dubey@rutgers.edu
LinkedIn: Professional updates
GitHub: Code, experiments, repositories
# SLM Optimization & Post-Training (GSM8k)

URL: https://akashdubey.me/projects/llm-research
Date: August 22, 2025
Tags: AI, Research, Infrastructure

Summary: Fine-tuning and deploying small language models (SLMs) on the Rutgers High Performance Computing (Amarel) cluster for optimal math reasoning capabilities.

Sections: The Challenge: Making Small Models Smart | Infrastructure and Deployment | Results
Links: none listed

As part of my research into language model efficiency, this project examines how well Small Language Models (SLMs) handle multi-step math reasoning, evaluated on the GSM8k dataset. By adapting and fine-tuning minimal, nano-scale Transformer architectures, I explored the trade-offs between model size, inference speed, and reasoning accuracy.


The Challenge: Making Small Models Smart

While massive LLMs like GPT-4 dominate benchmarks, they require immense computational resources. My research asked how much reasoning capability can be packed into models with just a few billion parameters. Using the GSM8k (Grade School Math 8K) dataset, the goal was to apply post-training and supervised fine-tuning (SFT) techniques to improve the model's zero-shot math-solving ability.
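
To make the evaluation target concrete, here is a minimal sketch of how zero-shot GSM8k answers might be scored. It relies on the fact that GSM8k reference solutions end with a `#### <answer>` line. Everything else is an assumption for illustration rather than the project's actual harness: the function names are hypothetical, and treating the last number in the model's output as its final answer is one common heuristic among several.

```python
import re

# Matches integers and decimals, allowing thousands separators such as "1,000".
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def _to_float(text):
    """Drop thousands separators and convert to float."""
    return float(text.replace(",", ""))


def extract_reference(solution):
    """Return the number after the final '####' marker, or None if absent."""
    marker = solution.rfind("####")
    if marker == -1:
        return None
    match = _NUMBER.search(solution, marker)
    return _to_float(match.group()) if match else None


def extract_prediction(model_output):
    """Assumption: the last number in the model's output is its final answer."""
    matches = _NUMBER.findall(model_output)
    return _to_float(matches[-1]) if matches else None


def exact_match_accuracy(model_outputs, reference_solutions):
    """Fraction of problems where the predicted number equals the reference."""
    if not reference_solutions:
        return 0.0
    correct = 0
    for output, solution in zip(model_outputs, reference_solutions):
        predicted = extract_prediction(output)
        expected = extract_reference(solution)
        if predicted is not None and expected is not None:
            if abs(predicted - expected) < 1e-6:
                correct += 1
    return correct / len(reference_solutions)
```

Comparing the same scoring function on a base checkpoint and a fine-tuned checkpoint keeps the measurement consistent, although the choice of extraction heuristic can shift absolute scores.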


Infrastructure and Deployment

A significant portion of this project involved deploying the training and evaluation pipelines on the Rutgers Amarel High Performance Computing (HPC) cluster.

  • HPC Execution: Wrote SLURM batch scripts to distribute training jobs across multiple nodes.
  • Optimization: Implemented quantization and structured pruning techniques to maximize inference throughput on constrained compute environments.
  • Architecture: Built on Andrej Karpathy's nanochat framework, modifying the tokenization and training loops to support custom math-focused datasets.
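
As an illustration of the quantization idea mentioned in the list above, the following is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is a textbook scheme chosen for clarity, not necessarily the method used in this project, and the function names are hypothetical. A real pipeline would operate on tensors and usually quantize per channel or per group.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127].

    Returns the quantized values and the scale needed to reconstruct them.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(q, scale, worst)
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half the scale per value. That trade-off is what makes quantization attractive when memory bandwidth limits inference throughput.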

Results

By carefully managing the learning rate schedule and dataset mixture during mid-training and SFT, the models showed measurable improvements on the GSM8k benchmark compared to the base pre-trained checkpoints. The project demonstrated that targeted post-training can significantly elevate the domain-specific reasoning capabilities of highly constrained models.
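
The two levers named above, the learning rate schedule and the dataset mixture, can be sketched generically as follows. This is an illustration under stated assumptions, not the project's or nanochat's actual configuration: linear warmup followed by cosine decay is one common schedule shape, and the function names, step counts, learning rates, and source names (`gsm8k`, `general`) are all hypothetical.

```python
import math
import random


def lr_at(step, total_steps, peak_lr, warmup_steps, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def sample_source(mixture, rng):
    """Pick which dataset the next example comes from, using mixture weights."""
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampled mixture is reproducible
    mixture = {"gsm8k": 0.7, "general": 0.3}  # hypothetical weights
    for step in (0, 9, 10, 55, 100):
        print(step, lr_at(step, total_steps=100, peak_lr=1e-3, warmup_steps=10),
              sample_source(mixture, rng))
```

Shifting the mixture weights toward math data during fine-tuning concentrates the model on the target domain, while keeping some general data in the mix is a common guard against losing broader capabilities.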