SIH - BhashaMitra - AI Powered Language Agnostic Chatbot

A multilingual AI chatbot for Smart India Hackathon 2025 supporting 128+ languages with offline queuing and multi-platform integration.

Timeline

45 days (Hackathon)

Role

Full Stack Developer

Team

CTRL Freaks (5 members)

Status

Completed

Technology Stack

Python
FastAPI
React
TypeScript
XLS-R
XLM-R
WhatsApp Business API
Telegram Bot API
Kubernetes
AWS
Apache Kafka

Key Challenges

  • Multilingual accuracy across 128+ languages
  • Handling mixed language queries (Hindi in English script)
  • Data privacy and GDPR compliance
  • Regional dialect variations
  • Model performance and latency optimization
  • Offline query queuing mechanism

Key Learnings

  • Pre-trained XLS-R and XLM-R multilingual models
  • Speech-to-text with language detection
  • Vector search for knowledge base matching
  • Multi-platform chatbot integration
  • AI4Bharat IndicASR models for Indian languages
  • Kubernetes auto-scaling for high traffic
  • End-to-end encryption for data security

Overview

BhashaMitra (meaning "Language Friend") is an AI-powered language-agnostic chatbot developed for Smart India Hackathon 2025 under the Smart Education theme. The project addresses the critical problem of language barriers in Indian educational institutions by enabling students to interact with campus systems in their preferred language.

The chatbot supports 128+ languages, including major Indian languages such as Hindi, Gujarati, Telugu, and Tamil. Its distinguishing feature is the ability to understand code-mixed queries such as "fees kitni hai" (Hindi typed in English script), a common pattern in Indian digital communication.

Problem Statement

Problem ID: 25104
Title: Language Agnostic Chatbot
Category: Software
Theme: Smart Education

Problems Addressed

  • Language barriers preventing students from accessing campus information
  • Students unable to get timely information in their preferred language
  • Overload of repetitive queries slowing down administrative response times
  • Existing chatbots lack multi-channel integration (WhatsApp, Telegram, Website)
  • Information asymmetry between urban and rural student experiences

Key Features

🌍 True Multilingual Support

  • Supports 128+ languages using pre-trained multilingual models (XLS-R covers 128 languages for speech; XLM-R covers 100 languages for text)
  • Automatic language detection from text and voice
  • Understands code-mixed queries (Hindi in English: "admission process kya hai")
  • Regional dialect support using AI4Bharat's IndicASR models
  • Response generation in user's preferred language

🎤 Voice & Text Input

  • Text queries from website, WhatsApp, Telegram
  • Voice message processing with XLS-R speech recognition
  • Automatic speech-based language detection
  • Converts voice to text and processes naturally

📶 Offline Query Queuing

  • Stores queries when connection is poor
  • Auto-processes them when connection returns
  • No query loss even in low connectivity areas
  • Perfect for rural campus environments
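
The queuing behaviour described above can be sketched as a small store-and-forward queue. This is a minimal illustration, not the project's actual implementation: the `OfflineQueryQueue` class, the `send` callback, and the `online` flag are hypothetical names, and a real client would persist the queue to local storage rather than keep it in memory.

```python
from collections import deque
from typing import Callable, Deque, List


class OfflineQueryQueue:
    """Holds queries while offline and replays them in order once online."""

    def __init__(self, send: Callable[[str], str]) -> None:
        self._send = send                  # hypothetical transport, e.g. an HTTP call
        self._pending: Deque[str] = deque()
        self.online = True

    def submit(self, query: str) -> List[str]:
        """Queue the query, then try to deliver everything that is pending."""
        self._pending.append(query)
        return self.flush()

    def flush(self) -> List[str]:
        """Send pending queries oldest-first; stop at the first failure."""
        responses: List[str] = []
        while self.online and self._pending:
            query = self._pending[0]
            try:
                responses.append(self._send(query))
            except ConnectionError:
                self.online = False        # keep the query for the next attempt
                break
            self._pending.popleft()        # drop only after a successful send
        return responses

    def __len__(self) -> int:
        return len(self._pending)
```

A query is removed from the queue only after it has been sent successfully, which is what prevents loss when the connection drops mid-delivery. Calling `flush()` again once `online` is set back to `True` delivers the backlog in the original order.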

🔗 Multi-Platform Integration

  • Website: Embedded chat widget
  • WhatsApp: WhatsApp Business API integration
  • Telegram: Native bot integration
  • Consistent experience across all platforms
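
One way to keep the experience consistent is to map every platform's incoming payload to a single internal message type before any language processing happens. The sketch below assumes simplified payload shapes modelled on the public Telegram and WhatsApp webhook formats; the `IncomingMessage` type, the `normalize` function, and the web widget's `session_id` field are hypothetical, and the exact field paths should be checked against each platform's current documentation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class IncomingMessage:
    platform: str   # "web", "telegram" or "whatsapp"
    user_id: str
    text: str


def normalize(platform: str, payload: Dict[str, Any]) -> Optional[IncomingMessage]:
    """Map a platform-specific payload to IncomingMessage; None if it has no text."""
    try:
        if platform == "telegram":
            msg = payload["message"]
            return IncomingMessage(platform, str(msg["chat"]["id"]), msg["text"])
        if platform == "whatsapp":
            msg = payload["entry"][0]["changes"][0]["value"]["messages"][0]
            return IncomingMessage(platform, str(msg["from"]), msg["text"]["body"])
        if platform == "web":
            # Hypothetical shape for the site's own chat widget
            return IncomingMessage(platform, str(payload["session_id"]), payload["text"])
    except (KeyError, IndexError, TypeError):
        return None   # e.g. a voice note, status update, or malformed payload
    raise ValueError(f"unknown platform: {platform}")
```

With this in place, the rest of the pipeline only ever sees `IncomingMessage`, so adding a new channel means adding one branch here rather than touching the query logic.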

🧠 Vector Search Knowledge Base

  • Structured FAQ/knowledge base of campus information
  • Vector similarity matching for relevant responses
  • Includes circulars, rules, deadlines, procedures
  • Easy updating by student volunteers or staff
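
To illustrate the matching step, here is a toy version of vector similarity search using only the standard library. It is a sketch under stated assumptions: character-trigram counts stand in for real sentence embeddings (the project names XLM-R and FAISS for this role), and the FAQ entries are made up for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: counts of character trigrams (stand-in for a real encoder)."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_search(query: str, faq: Dict[str, str], top_k: int = 3) -> List[Tuple[float, str, str]]:
    """Return the top_k (score, question, answer) entries most similar to the query."""
    q_vec = embed(query)
    scored = [(cosine(q_vec, embed(question)), question, answer) for question, answer in faq.items()]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]


# Hypothetical FAQ entries, for illustration only
FAQ = {
    "What is the fee structure?": "See the fee circular on the campus portal.",
    "When is the last date for admission?": "Admission deadlines are listed in the admission notice.",
    "What are the library hours?": "Library timings are posted at the library desk.",
}

best = vector_search("library hours", FAQ, top_k=1)[0]
print(best[1])
```

Swapping `embed` for a multilingual encoder and the linear scan for an approximate nearest-neighbour index changes the quality and speed, but not the shape of the lookup: embed the query, score it against stored entries, return the closest few.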

⚡ High Scalability

  • Kubernetes (K8s) for auto-scaling
  • Apache Kafka for message queue handling
  • AWS Elastic Beanstalk / GCP Cloud Run deployment
  • Handles sudden traffic spikes during admission season

Technical Architecture

Tech Stack

Frontend:
  - React (TypeScript)
  - TailwindCSS
  - WebSocket for real-time chat

Backend:
  - FastAPI (Python)
  - XLM-R (Multilingual NLP)
  - XLS-R (Speech Recognition)
  - Vector Database (FAISS)
  - Apache Kafka (Message Queue)

AI Models:
  - XLM-R: Text language detection & understanding
  - XLS-R: Speech-to-text with language detection
  - AI4Bharat IndicASR: Indian language specialization

Infrastructure:
  - Kubernetes (Auto-scaling)
  - AWS / GCP (Cloud hosting)
  - Docker (Containerization)

Integrations:
  - WhatsApp Business API
  - Telegram Bot API
  - REST API for website

How It Works

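# Simplified flow. The helpers used here (xls_r_model, xlm_r_model,
# is_code_mixed, transliterate, load_faq_database, vector_search,
# generate_answer, translate) stand in for the project's own components.
def process_query(query, input_type='text', user_language=None):
    # Step 1: Language Detection
    if input_type == 'voice':
        text, language = xls_r_model.transcribe(query)
    else:
        text = query
        language = xlm_r_model.detect_language(text)

    # Step 2: Handle Code-Mixed Queries (e.g. Hindi typed in English script)
    if is_code_mixed(text):
        text = transliterate(text, target=language)

    # Step 3: Vector Search for Best Match
    knowledge_base = load_faq_database()
    matches = vector_search(text, knowledge_base, top_k=3)

    # Step 4: Generate Response in the detected language
    response = generate_answer(matches, language)

    # Step 5: Translate Response if Needed
    # user_language falls back to the detected language when not supplied
    user_language = user_language or language
    if user_language != language:
        response = translate(response, target=user_language)

    return response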

Innovation & Uniqueness

🇮🇳 Make in India

  • Built using XLS-R (Meta AI) and XLM-R (Facebook AI Research)
  • Leverages AI4Bharat IndicASR - government-backed initiative
  • Supports Digital India and national language technology mission
  • Proudly Indian innovation for Indian problems

🎯 Code-Mixed Query Understanding

  • Designed to understand "Hinglish" and similar code-mixed patterns
  • Handles queries like: "assignment submit karne ki last date kya hai?"
  • Uses transliteration + hybrid language detection pipeline
  • Real-world usage pattern recognition

📴 Offline Capability

  • Unique offline query queuing system
  • Stores queries locally when connection drops
  • Auto-syncs when back online
  • Critical for rural and remote campuses

🚀 Pan-India Scalability

  • Can be deployed to 50,000+ colleges across India
  • Minimal technical overhead for new institutions
  • Easy to add new languages and dialects
  • Self-service knowledge base updates

Use Cases

📚 Campus Information

  • Admission procedures and deadlines
  • Fee structure and payment methods
  • Exam schedules and results
  • Course registration process
  • Scholarship information

🏢 Administrative Queries

  • Library hours and book availability
  • Hostel room allocation
  • ID card and documents
  • Leave applications
  • Transport schedules

🚨 Emergency Alerts

  • Weather warnings
  • Campus safety notifications
  • Exam cancellations
  • Event updates
  • Health advisories

🌍 International Student Support

  • Support for 128+ languages
  • Cultural information
  • Visa and documentation help
  • Local area guidance
  • Translation services

Impact & Benefits

For Students

  • ✅ 24/7 instant answers in preferred language
  • ✅ No language barrier for rural students
  • ✅ Multi-platform access (WhatsApp, Web, Telegram)
  • ✅ Voice support for accessibility

For Faculty

  • ✅ 80% reduction in repetitive queries
  • ✅ More time for meaningful student interaction
  • ✅ Data-driven insights on common concerns
  • ✅ Automated deadline reminders

For Institutions

  • ✅ Increased enrollment from rural areas
  • ✅ Cost savings on administrative staff
  • ✅ Better student satisfaction scores
  • ✅ Competitive advantage in education sector

For Society

  • ✅ Digital inclusion of regional languages
  • ✅ Cultural preservation through language
  • ✅ Supports Digital India mission
  • ✅ Reduced information asymmetry

Feasibility Analysis

✅ Technical Feasibility

  • Proven AI models: XLS-R & XLM-R are production-ready
  • Government support: AI4Bharat provides Indian language foundation
  • Established APIs: WhatsApp Business & Telegram APIs available
  • Cloud infrastructure: AWS/GCP support auto-scaling

💰 Economic Feasibility

  • Massive market: 4.2 crore students across 50,000+ colleges
  • Subscription model: ₹10,000 to ₹50,000 per institution per year
  • Low operational cost: Cloud-based auto-scaling
  • High ROI: 80% reduction in support staff costs

🎓 Market Demand

  • Every Indian college faces language barrier issues
  • Government push for regional language support
  • International students need multilingual support
  • Growing digital education infrastructure

Challenges & Solutions

Challenge 1: Multilingual Accuracy

Solution: Fine-tune XLS-R and XLM-R on Indian college datasets with region-specific vocabulary

Challenge 2: Mixed Language Queries

Solution: Implement transliteration with hybrid language detection pipeline using regex + AI
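
The regex half of such a pipeline might look like the sketch below. It is only a heuristic under stated assumptions: the word list is a tiny illustrative sample rather than the project's lexicon, it covers Hindi only, and the model-based detection step that would handle the cases the rules miss is left out.

```python
import re

DEVANAGARI = re.compile(r"[\u0900-\u097F]")
LATIN_WORD = re.compile(r"[A-Za-z]+")

# Tiny illustrative lexicon of romanized Hindi function words (an assumption,
# not the project's actual word list)
ROMANIZED_HINDI = {"kya", "hai", "kitni", "kitna", "ki", "ka", "ke",
                   "karne", "kab", "kaise", "nahi"}


def is_code_mixed(text: str) -> bool:
    """True for mixed-script text, or Latin-script text with romanized Hindi words."""
    words = [w.lower() for w in LATIN_WORD.findall(text)]
    if DEVANAGARI.search(text):
        return bool(words)               # Devanagari and Latin in one query
    return any(w in ROMANIZED_HINDI for w in words)


print(is_code_mixed("assignment submit karne ki last date kya hai?"))
print(is_code_mixed("What is the fee structure?"))
```

Queries flagged here would go on to transliteration, while everything else passes straight to language detection. A rule like this is cheap to run on every message, which is why it makes sense as a first filter ahead of a heavier model.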

Challenge 3: Data Privacy

Solution: End-to-end encryption, GDPR-compliant storage, no conversation logging

Challenge 4: Regional Dialects

Solution: Leverage AI4Bharat's IndicASR models trained on regional data

Challenge 5: Performance & Latency

Solution: Model quantization, edge computing deployment, CDN for static assets

Challenge 6: High Traffic

Solution: Kubernetes auto-scaling, Apache Kafka message queues, LLM API rate limiting
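
For the rate-limiting part, a token bucket is one common approach; the source does not say which algorithm the project uses, so the following is a sketch of the general technique. The `TokenBucket` class and its `clock` parameter (injectable to make testing easier) are hypothetical names.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)     # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the call."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never beyond capacity
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A caller would check `allow()` before each outbound model API request and either queue or reject the request when it returns `False`. The burst capacity absorbs short spikes, while the refill rate caps sustained throughput.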

Future Enhancements

  • 🤖 AI-powered personalized responses based on student profile
  • 📊 Analytics dashboard for institutions
  • 🎨 Custom branding for each institution
  • 💳 Payment integration for fee queries
  • 🎓 Course recommendation engine
  • 📱 Mobile app with offline mode
  • 🗣️ Voice-only interface for visually impaired students
  • 🌍 Global expansion to multilingual universities worldwide

Awards & Recognition

  • 🏆 Smart India Hackathon 2025 - Finalist
  • 🎯 Problem Statement: Language Agnostic Chatbot (ID: 25104)
  • 👥 Team: CTRL Freaks
  • 🏅 Theme: Smart Education

Research & References

  • XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale (Meta AI, 2021)
  • XLM-R: Unsupervised Cross-lingual Representation Learning at Scale (Facebook AI, 2019)
  • AI4Bharat: IndicConformer - Open-Source ASR for 22 Indian Languages (IIT Madras, 2024)
  • BHASHINI: National Language Technology Mission (Ministry of Electronics & IT, 2022)

Conclusion

BhashaMitra represents a significant step towards language-inclusive education in India. By breaking down language barriers, we're ensuring that every student - regardless of their linguistic background - has equal access to information and educational opportunities.

The project demonstrates how cutting-edge AI technology can be leveraged to solve real-world problems at scale. With support for 128+ languages, offline capability, and multi-platform integration, BhashaMitra is not just a chatbot - it's a bridge between languages, cultures, and opportunities.

Our vision: Every student in India should be able to interact with their educational institution in their mother tongue. With BhashaMitra, we're making this vision a reality.


"Breaking language barriers, building educational bridges" 🌉🗣️

Designed & Developed by Mitang Hindocha
© 2026. All rights reserved.