Go to file
Adolfo Delorenzo dc3e67e17b Add multi-speaker support for group conversations
Features:
- Speaker management system with unique IDs and colors
- Visual speaker selection with avatars and color coding
- Automatic language detection per speaker
- Real-time translation for all speakers' languages
- Conversation history with speaker attribution
- Export conversation as text file
- Persistent speaker data in localStorage

UI Components:
- Speaker toolbar with add/remove controls
- Active speaker indicators
- Conversation view with color-coded messages
- Settings toggle for multi-speaker mode
- Mobile-responsive speaker buttons

Technical Implementation:
- SpeakerManager class handles all speaker operations
- Automatic translation to all active languages
- Conversation entries with timestamps
- Translation caching per language
- Clean separation of original vs translated text
- Support for up to 8 concurrent speakers

User Experience:
- Click to switch active speaker
- Visual feedback for active speaker
- Conversation flows naturally with colors
- Export feature for meeting minutes
- Clear conversation history option
- Seamless single/multi speaker mode switching

This enables group conversations where each participant can speak
in their native language and see translations in real-time.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-02 23:39:15 -06:00
static Add multi-speaker support for group conversations 2025-06-02 23:39:15 -06:00
templates Add multi-speaker support for group conversations 2025-06-02 23:39:15 -06:00
venv quasi-final 2025-04-05 11:50:31 -06:00
.gitignore Major improvements: TypeScript, animations, notifications, compression, GPU optimization 2025-06-02 21:18:16 -06:00
app.py Fix temporary file accumulation to prevent disk space exhaustion 2025-06-02 23:27:59 -06:00
GPU_SUPPORT.md Major improvements: TypeScript, animations, notifications, compression, GPU optimization 2025-06-02 21:18:16 -06:00
health-monitor.py Add health check endpoints and automatic language detection 2025-06-02 22:37:38 -06:00
maintenance.sh Fix temporary file accumulation to prevent disk space exhaustion 2025-06-02 23:27:59 -06:00
package-lock.json Major improvements: TypeScript, animations, notifications, compression, GPU optimization 2025-06-02 21:18:16 -06:00
package.json Major improvements: TypeScript, animations, notifications, compression, GPU optimization 2025-06-02 21:18:16 -06:00
README_TYPESCRIPT.md Major improvements: TypeScript, animations, notifications, compression, GPU optimization 2025-06-02 21:18:16 -06:00
README.md quasi-final 2025-04-05 11:50:31 -06:00
requirements.txt Major improvements: TypeScript, animations, notifications, compression, GPU optimization 2025-06-02 21:18:16 -06:00
setup-script.sh quasi-final 2025-04-05 11:50:31 -06:00
tsconfig.json Major improvements: TypeScript, animations, notifications, compression, GPU optimization 2025-06-02 21:18:16 -06:00
tts_test_output.mp3 quasi-final 2025-04-05 11:50:31 -06:00
tts-debug-script.py quasi-final 2025-04-05 11:50:31 -06:00
validators.py Add comprehensive input validation and sanitization 2025-06-02 22:58:17 -06:00
whisper_config.py Major improvements: TypeScript, animations, notifications, compression, GPU optimization 2025-06-02 21:18:16 -06:00

Voice Language Translator

A mobile-friendly web application that translates spoken language between multiple languages using:

  • Gemma 3 open-source LLM via Ollama for translation
  • OpenAI Whisper for speech-to-text
  • OpenAI Edge TTS for text-to-speech

Supported Languages

  • Arabic
  • Armenian
  • Azerbaijani
  • English
  • French
  • Georgian
  • Kazakh
  • Mandarin
  • Farsi
  • Portuguese
  • Russian
  • Spanish
  • Turkish
  • Uzbek

Setup Instructions

  1. Install the required Python packages:

    pip install -r requirements.txt
    
  2. Make sure you have Ollama installed and the Gemma 3 model loaded:

    ollama pull gemma3
    
  3. Ensure your OpenAI Edge TTS server is running on port 5050.

  4. Run the application:

    python app.py
    
  5. Open your browser and navigate to:

    http://localhost:8000
    

Usage

  1. Select your source language from the dropdown menu
  2. Press the microphone button and speak
  3. Press the button again to stop recording
  4. Wait for the transcription to complete
  5. Select your target language
  6. Press the "Translate" button
  7. Use the play buttons to hear the original or translated text

Technical Details

  • The app uses Flask for the web server
  • Audio is processed client-side using the MediaRecorder API
  • Whisper for speech recognition with language hints
  • Ollama provides access to the Gemma 3 model for translation
  • OpenAI Edge TTS delivers natural-sounding speech output

Mobile Support

The interface is fully responsive and designed to work well on mobile devices.