Duration: Jul 2023 – Aug 2025
Cut LLM cloud hosting costs by 92% by redesigning AI agents to run on 8B-parameter models instead of a 235B-parameter counterpart (a 96% reduction in parameters).
Extended the effective context limit of Qwen3-8B by 7x using a hybrid Chain-of-Agents RAG architecture.
Implemented Deep Search with Qwen/Mistral to generate web-search reports of over 180K tokens.
Built AI agent orchestration using a Chain-of-Thought system.
Built and deployed RAG-based LLMs, agentic networks, and multi-agent AI with open-source and proprietary models.
Designed LLM-driven automation generating 20K+ data rows and training 18+ ML models.
Achieved 92% accuracy in PII masking by fine-tuning a BERT model on client-specific data.
Technologies & Tools Used: Python, SQL, LangChain, LlamaIndex, Hugging Face Transformers, OpenAI API, Qdrant, FAISS, Pandas, NumPy, Docker, RunPod, Git, REST APIs, Prompt Engineering, RAG Architectures
Skills Learned: LLM hosting in a local data center and in the cloud, team collaboration, training a team of new recruits in the basics of AI