
SIH - BhashaMitra - AI-Powered Language-Agnostic Chatbot
A multilingual AI chatbot for Smart India Hackathon 2025 supporting 128+ languages with offline queuing and multi-platform integration.
Timeline: 45 days (Hackathon)
Role: Full Stack Developer
Team: CTRL Freaks (5 members)
Status: Completed
Technology Stack
Key Challenges
- Multilingual accuracy across 128+ languages
- Handling code-mixed queries (Hindi typed in Latin script)
- Data privacy and GDPR compliance
- Regional dialect variations
- Model performance and latency optimization
- Offline query queuing mechanism
Key Learnings
- Pre-trained XLS-R and XLM-R multilingual models
- Speech-to-text with language detection
- Vector search for knowledge base matching
- Multi-platform chatbot integration
- AI4Bharat IndicASR models for Indian languages
- Kubernetes auto-scaling for high traffic
- End-to-end encryption for data security
Overview
BhashaMitra (meaning "Language Friend") is an AI-powered language-agnostic chatbot developed for Smart India Hackathon 2025 under the Smart Education theme. The project addresses the critical problem of language barriers in Indian educational institutions by enabling students to interact with campus systems in their preferred language.
The chatbot supports 128+ languages, including major Indian languages such as Hindi, Gujarati, Telugu, and Tamil. A distinguishing feature is its ability to understand code-mixed queries like "fees kitni hai" (Hindi typed in Latin script), a common pattern in Indian digital communication.
Problem Statement
Problem ID: 25104
Title: Language Agnostic Chatbot
Category: Software
Theme: Smart Education
Problems Addressed
- Language barriers preventing students from accessing campus information
- Students unable to get timely information in their preferred language
- Overload of repetitive queries slowing down administrative response times
- Existing chatbots lack multi-channel integration (WhatsApp, Telegram, Website)
- Information asymmetry between urban and rural student experiences
Key Features
True Multilingual Support
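- Multilingual text understanding using the pre-trained XLM-R model (pre-trained on 100 languages; the 128-language coverage comes from XLS-R's speech pre-training)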
- Automatic language detection from text and voice
- Understands code-mixed queries (Hindi typed in Latin script: "admission process kya hai")
- Regional dialect support using AI4Bharat's IndicASR models
- Response generation in user's preferred language
Voice & Text Input
- Text queries from website, WhatsApp, Telegram
- Voice message processing with XLS-R speech recognition
- Automatic speech-based language detection
- Converts voice to text and processes naturally
Offline Query Queuing
- Stores queries when connection is poor
- Auto-processes them when connection returns
- No query loss even in low connectivity areas
- Perfect for rural campus environments
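The queuing behaviour described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the project's actual implementation: the class name `OfflineQueryQueue`, the `send` callback, and the `online` flag are all hypothetical, and a real client would persist the queue (for example in browser storage) so that queries survive a restart.

```python
from collections import deque
from typing import Callable, Deque, List


class OfflineQueryQueue:
    """Buffers queries while offline and replays them in order once online."""

    def __init__(self, send: Callable[[str], str]) -> None:
        self._send = send                  # hypothetical function that calls the chatbot backend
        self._pending: Deque[str] = deque()
        self.online = True

    def submit(self, query: str) -> List[str]:
        """Queue the query, then try to deliver everything that is pending."""
        self._pending.append(query)
        return self.flush()

    def flush(self) -> List[str]:
        """Send pending queries oldest-first; stop at the first failure."""
        responses: List[str] = []
        while self.online and self._pending:
            query = self._pending[0]
            try:
                responses.append(self._send(query))
            except ConnectionError:
                self.online = False        # keep the query queued for the next attempt
                break
            self._pending.popleft()        # remove only after a successful send
        return responses

    def pending_count(self) -> int:
        """Number of queries still waiting to be delivered."""
        return len(self._pending)
```

A query is removed from the queue only after it has been sent successfully, so a connection drop in the middle of a flush does not lose it. Calling `flush()` again after setting `online = True` delivers whatever is still pending.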
Multi-Platform Integration
- Website: Embedded chat widget
- WhatsApp: WhatsApp Business API integration
- Telegram: Native bot integration
- Consistent experience across all platforms
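One way to get a consistent experience across channels is to convert every incoming payload into a single internal message type before it reaches the NLP pipeline. The sketch below assumes simplified versions of the Telegram Bot API update and the WhatsApp Cloud API webhook payload; the `IncomingMessage` type and the website fields `session_id` and `text` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class IncomingMessage:
    platform: str   # "telegram", "whatsapp", or "web"
    user_id: str
    text: str


def from_telegram(update: Dict[str, Any]) -> Optional[IncomingMessage]:
    """Extract a text message from a Telegram Bot API update."""
    message = update.get("message") or {}
    text = message.get("text")
    chat_id = (message.get("chat") or {}).get("id")
    if text is None or chat_id is None:
        return None  # not a plain text message (e.g. a voice note or an edit)
    return IncomingMessage("telegram", str(chat_id), text)


def from_whatsapp(payload: Dict[str, Any]) -> Optional[IncomingMessage]:
    """Extract the first text message from a WhatsApp webhook payload."""
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            for msg in change.get("value", {}).get("messages", []):
                if msg.get("type") == "text":
                    return IncomingMessage("whatsapp", msg["from"], msg["text"]["body"])
    return None


def from_web(body: Dict[str, Any]) -> IncomingMessage:
    """Website widget payload; the field names here are hypothetical."""
    return IncomingMessage("web", str(body["session_id"]), body["text"])
```

With this shape, the rest of the backend only ever sees `IncomingMessage`, so adding another channel means writing one more adapter function rather than touching the query pipeline.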
Vector Search Knowledge Base
- Structured FAQ/knowledge base of campus information
- Vector similarity matching for relevant responses
- Includes circulars, rules, deadlines, procedures
- Easy updating by student volunteers or staff
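To illustrate the idea of vector similarity matching, here is a small dependency-free sketch. It swaps in character n-gram counts for the real sentence embeddings and a linear scan for the FAISS index, so it only shows the ranking principle. The FAQ entries are invented for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str, n: int = 3) -> Counter:
    """Toy embedding: counts of character n-grams (stand-in for a sentence encoder)."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_search(query: str, faq: Dict[str, str], top_k: int = 3) -> List[Tuple[float, str, str]]:
    """Return the top_k (score, question, answer) entries most similar to the query."""
    q_vec = embed(query)
    scored = [(cosine(q_vec, embed(question)), question, answer) for question, answer in faq.items()]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]


# Hypothetical FAQ entries for illustration only
faq = {
    "What is the last date for fee payment?": "See the latest fee circular.",
    "What are the library opening hours?": "See the library notice board.",
    "How do I apply for a hostel room?": "Apply through the hostel office.",
}
best = vector_search("library hours", faq, top_k=1)[0]
```

In a production setup the `embed` function would be replaced by a multilingual encoder so that a Hindi query and an English FAQ entry land close together, which character n-grams cannot do.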
High Scalability
- Kubernetes (K8s) for auto-scaling
- Apache Kafka for message queue handling
- AWS Elastic Beanstalk / GCP Cloud Run deployment
- Handles sudden traffic spikes during admission season
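For intuition about how Kubernetes auto-scaling reacts to a traffic spike, the Horizontal Pod Autoscaler computes the desired replica count as ceil(currentReplicas x currentMetric / targetMetric) and skips scaling when the ratio is within a small tolerance of 1. The sketch below reproduces that rule. The replica bounds and metric values are hypothetical, not the project's actual configuration.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count following the Horizontal Pod Autoscaler scaling rule."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no scaling
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))
```

For example, 3 pods running at 90% average CPU against a 60% target would be scaled to 5 pods, while 4 pods at 62% would be left alone because the ratio is inside the tolerance band.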
Technical Architecture
Tech Stack
Frontend:
- React (TypeScript)
- TailwindCSS
- WebSocket for real-time chat
Backend:
- FastAPI (Python)
- XLM-R (Multilingual NLP)
- XLS-R (Speech Recognition)
- Vector Database (FAISS)
- Apache Kafka (Message Queue)
AI Models:
- XLM-R: Text language detection & understanding
- XLS-R: Speech-to-text with language detection
- AI4Bharat IndicASR: Indian language specialization
Infrastructure:
- Kubernetes (Auto-scaling)
- AWS / GCP (Cloud hosting)
- Docker (Containerization)
Integrations:
- WhatsApp Business API
- Telegram Bot API
- REST API for website
How It Works
# Simplified flow; the model objects and helper functions are placeholders
def process_query(query, input_type='text', user_language=None):
    # Step 1: Language detection
    if input_type == 'voice':
        text, language = xls_r_model.transcribe(query)
    else:
        text = query
        language = xlm_r_model.detect_language(text)

    # Step 2: Handle code-mixed queries
    if is_code_mixed(text):
        text = transliterate(text, target=language)

    # Step 3: Vector search for the best match
    knowledge_base = load_faq_database()
    matches = vector_search(text, knowledge_base, top_k=3)

    # Step 4: Generate response
    response = generate_answer(matches, language)

    # Step 5: Translate the response if the user prefers another language
    if user_language and user_language != language:
        response = translate(response, target=user_language)
    return response
Innovation & Uniqueness
Make in India
- Built using XLS-R (Meta AI) and XLM-R (Facebook AI Research)
- Leverages AI4Bharat IndicASR - government-backed initiative
- Supports Digital India and national language technology mission
- Proudly Indian innovation for Indian problems
Code-Mixed Query Understanding
- Designed to understand "Hinglish" and similar code-mixed patterns
- Handles queries like: "assignment submit karne ki last date kya hai?"
- Uses transliteration + hybrid language detection pipeline
- Real-world usage pattern recognition
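The rule-based half of such a hybrid detection pipeline might look like the sketch below: a regular expression checks for Devanagari characters, and a small word list flags romanized Hindi. The word list and the labels `hi-Latn` and `hi-en-mixed` are assumptions made for this example. In the full pipeline a trained model would handle the cases these rules cannot decide.

```python
import re

DEVANAGARI = re.compile(r"[\u0900-\u097F]")
LATIN_WORD = re.compile(r"[a-z]+")

# Tiny illustrative list of romanized Hindi function words; a real system
# would use a much larger lexicon or a trained classifier.
ROMANIZED_HINDI = {"kya", "hai", "kitni", "kab", "kaise", "ki", "ka", "ke", "karne", "kahan"}


def classify_query(text: str) -> str:
    """Label a query as 'hi', 'en', 'hi-Latn', 'hi-en-mixed', or 'unknown'."""
    lowered = text.lower()
    if DEVANAGARI.search(text):
        # Devanagari present (treated as Hindi here), possibly mixed with Latin-script words
        return "hi-en-mixed" if LATIN_WORD.search(lowered) else "hi"
    words = LATIN_WORD.findall(lowered)
    if not words:
        return "unknown"  # digits or punctuation only
    hindi_hits = sum(word in ROMANIZED_HINDI for word in words)
    if hindi_hits == 0:
        return "en"
    return "hi-Latn" if hindi_hits == len(words) else "hi-en-mixed"
```

Queries labelled `hi-Latn` or `hi-en-mixed` are the ones that would be passed on to the transliteration step. Note that the Devanagari range also covers other languages such as Marathi, so the `hi` label is a simplification.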
Offline Capability
- Unique offline query queuing system
- Stores queries locally when connection drops
- Auto-syncs when back online
- Critical for rural and remote campuses
Pan-India Scalability
- Can be deployed to 50,000+ colleges across India
- Minimal technical overhead for new institutions
- Easy to add new languages and dialects
- Self-service knowledge base updates
Use Cases
Campus Information
- Admission procedures and deadlines
- Fee structure and payment methods
- Exam schedules and results
- Course registration process
- Scholarship information
Administrative Queries
- Library hours and book availability
- Hostel room allocation
- ID card and documents
- Leave applications
- Transport schedules
Emergency Alerts
- Weather warnings
- Campus safety notifications
- Exam cancellations
- Event updates
- Health advisories
International Student Support
- Support for 128+ languages
- Cultural information
- Visa and documentation help
- Local area guidance
- Translation services
Impact & Benefits
For Students
- 24/7 instant answers in preferred language
- No language barrier for rural students
- Multi-platform access (WhatsApp, Web, Telegram)
- Voice support for accessibility
For Faculty
- 80% reduction in repetitive queries
- More time for meaningful student interaction
- Data-driven insights on common concerns
- Automated deadline reminders
For Institutions
- Increased enrollment from rural areas
- Cost savings on administrative staff
- Better student satisfaction scores
- Competitive advantage in education sector
For Society
- Digital inclusion of regional languages
- Cultural preservation through language
- Supports Digital India mission
- Reduced information asymmetry
Feasibility Analysis
Technical Feasibility
- Proven AI models: XLS-R & XLM-R are production-ready
- Government support: AI4Bharat provides Indian language foundation
- Established APIs: WhatsApp Business & Telegram APIs available
- Cloud infrastructure: AWS/GCP support auto-scaling
Economic Feasibility
- Massive market: 4.2 crore students across 50,000+ colleges
- Subscription model: ₹10,000-50,000 per institution/year
- Low operational cost: Cloud-based auto-scaling
- High ROI: 80% reduction in support staff costs
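As a rough sanity check on these figures, the snippet below multiplies the number of colleges by the subscription price range. The `adoption_rate` parameter is a hypothetical input, not a number from the proposal: a value of 1.0 gives the theoretical ceiling if every college subscribed.

```python
COLLEGES = 50_000                    # "50,000+ colleges" from the estimate above
PRICE_RANGE_INR = (10_000, 50_000)   # subscription price per institution per year
CRORE = 10_000_000                   # 1 crore = 10 million


def annual_revenue_crore(adoption_rate: float) -> tuple:
    """Low and high annual revenue, in crore rupees, for a given adoption rate."""
    subscribers = COLLEGES * adoption_rate
    return tuple(subscribers * price / CRORE for price in PRICE_RANGE_INR)
```

At full adoption this works out to between ₹50 crore and ₹250 crore per year. Any realistic projection would use a much smaller adoption rate.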
Market Demand
- Every Indian college faces language barrier issues
- Government push for regional language support
- International students need multilingual support
- Growing digital education infrastructure
Challenges & Solutions
Challenge 1: Multilingual Accuracy
Solution: Fine-tune XLS-R and XLM-R on Indian college datasets with region-specific vocabulary
Challenge 2: Mixed Language Queries
Solution: Implement transliteration with hybrid language detection pipeline using regex + AI
Challenge 3: Data Privacy
Solution: End-to-end encryption, GDPR-compliant storage, no conversation logging
Challenge 4: Regional Dialects
Solution: Leverage AI4Bharat's IndicASR models trained on regional data
Challenge 5: Performance & Latency
Solution: Model quantization, edge computing deployment, CDN for static assets
Challenge 6: High Traffic
Solution: Kubernetes auto-scaling, Apache Kafka message queues, LLM API rate limiting
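The rate limiting mentioned under Challenge 6 is commonly done with a token bucket. The source does not say which algorithm the team used, so the sketch below is one plausible option rather than a description of their implementation. The class name and parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `rate` requests per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock                # injectable so the limiter can be tested
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Return True and consume a token if a request may proceed now."""
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

Requests for which `allow()` returns False could be placed on the Kafka queue and retried later instead of being dropped, which fits the queue-based design described above.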
Future Enhancements
- AI-powered personalized responses based on student profile
- Analytics dashboard for institutions
- Custom branding for each institution
- Payment integration for fee queries
- Course recommendation engine
- Mobile app with offline mode
- Voice-only interface for visually impaired students
- Global expansion to multilingual universities worldwide
Awards & Recognition
- Smart India Hackathon 2025 - Finalist
- Problem Statement: Language Agnostic Chatbot (ID: 25104)
- Team: CTRL Freaks
- Theme: Smart Education
Research & References
- XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale (Meta AI, 2021)
- XLM-R: Unsupervised Cross-lingual Representation Learning at Scale (Facebook AI, 2019)
- AI4Bharat: IndicConformer - Open-Source ASR for 22 Indian Languages (IIT Madras, 2024)
- BHASHINI: National Language Translation Mission (Ministry of Electronics & IT, 2022)
Conclusion
BhashaMitra represents a significant step towards language-inclusive education in India. By breaking down language barriers, we're ensuring that every student - regardless of their linguistic background - has equal access to information and educational opportunities.
The project demonstrates how cutting-edge AI technology can be leveraged to solve real-world problems at scale. With support for 128+ languages, offline capability, and multi-platform integration, BhashaMitra is not just a chatbot - it's a bridge between languages, cultures, and opportunities.
Our vision: Every student in India should be able to interact with their educational institution in their mother tongue. With BhashaMitra, we're making this vision a reality.
"Breaking language barriers, building educational bridges"
