## Background

Basically what the name says. My first IT job, right out of driving forklifts in 2008, was for a company that transcribed speech from doctors. With today's AI and ASR advancements, that job and business model, at least as they worked then, are all but obsolete. AI today is just plain incredible at most of these tasks. In the old spirit of that job, I wanted to write something that lets me speak into a microphone and auto-write markdown for Obsidian.

## Implementation

1. Uses a modern Python (`>=3.12`)
2. Uses `uv` for Python tooling
3. Leverages Ollama for local Whisper model inference
4. Integrates with OpenWeatherMap for contextual weather data
5. Generates structured Obsidian markdown notes

## Core Features

### Voice Recording

The application uses PyAudio to capture high-quality audio input with configurable settings:

- Adjustable sample rate and channels
- Configurable chunk size for optimal performance
- Maximum duration limits to prevent runaway recordings
- Automatic cleanup of old recordings based on retention policy

### Transcription

We leverage Ollama's local Whisper model implementation for transcription:

- Runs completely locally, ensuring privacy
- Supports multiple Whisper model sizes (tiny to large)
- Configurable language and temperature settings
- Initial prompt support for better context

### Text Enhancement

The transcribed text goes through several enhancement steps:

- Mood detection
- Topic identification
- Grammar correction
- Filler word removal
- Date formatting

### Weather Integration

Each diary entry includes contextual weather data:

- Current temperature
- "Feels like" temperature
- Weather description
- Humidity and wind speed
- Location information
- Timestamp of weather data

### Obsidian Integration

The generated notes are structured for optimal Obsidian usage:

- YAML frontmatter for metadata
- Automatic file organization in the specified vault
- Support for audio file attachments
- Multiple entries per day with clean
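As a rough sketch of how the recording settings map onto PyAudio, here is one way the capture loop and WAV writing might look. This is illustrative, not the project's actual `recorder.py`: `record` and `save_wav` are made-up names, the constants mirror the sample config values, and the PyAudio import is deferred so the file-writing half works on a machine without a microphone.

```python
import wave

# These mirror the audio settings in the sample config (assumed values)
SAMPLE_RATE = 16000
CHANNELS = 1
CHUNK_SIZE = 1024
SAMPLE_WIDTH = 2  # bytes per sample: 16-bit PCM


def save_wav(path: str, frames: list[bytes]) -> None:
    """Write captured PCM chunks out as a standard WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(b"".join(frames))


def record(seconds: int) -> list[bytes]:
    """Capture audio from the default input device in CHUNK_SIZE buffers.

    Requires a microphone and the third-party `pyaudio` package, so the
    import is deferred to keep the rest of the module importable anywhere.
    """
    import pyaudio

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=CHANNELS,
                     rate=SAMPLE_RATE, input=True,
                     frames_per_buffer=CHUNK_SIZE)
    frames = [stream.read(CHUNK_SIZE)
              for _ in range(SAMPLE_RATE // CHUNK_SIZE * seconds)]
    stream.stop_stream()
    stream.close()
    pa.terminate()
    return frames
```

Enforcing the max-duration setting then just means capping the `seconds` argument before the loop runs.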
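To make the weather integration concrete, here is a minimal sketch of fetching and flattening the fields listed above. The endpoint and response shape are OpenWeatherMap's standard current-weather API; `fetch_weather` and `summarize_weather` are illustrative names rather than the project's actual functions, and `units=imperial` is an assumption on my part.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_weather(city: str, api_key: str) -> dict:
    """Call OpenWeatherMap's current-weather endpoint and return the parsed JSON."""
    # /data/2.5/weather is the current-weather endpoint; city names with
    # spaces need URL-encoding, hence quote()
    url = ("https://api.openweathermap.org/data/2.5/weather"
           f"?q={quote(city)}&appid={api_key}&units=imperial")
    with urlopen(url) as resp:
        return json.load(resp)


def summarize_weather(payload: dict) -> dict:
    """Pull out just the fields a diary entry cares about."""
    return {
        "temperature": payload["main"]["temp"],
        "feels_like": payload["main"]["feels_like"],
        "description": payload["weather"][0]["description"],
        "humidity": payload["main"]["humidity"],
        "wind_speed": payload["wind"]["speed"],
        "location": payload["name"],
        "timestamp": payload["dt"],
    }
```

Keeping the flattening step separate from the HTTP call also makes it easy to unit-test against a canned API response.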
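The note generation itself can be sketched roughly like this: emit YAML frontmatter from a metadata dict, append the transcribed body, and use Obsidian's `![[...]]` embed syntax for the audio attachment. `build_note` is an illustrative name and the metadata keys are assumptions, not the project's actual implementation.

```python
def build_note(text: str, metadata: dict, audio_name: str | None = None) -> str:
    """Assemble YAML frontmatter plus the transcribed body into one markdown string."""
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            # Render lists (e.g. topics) as YAML sequences
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    lines.append("")
    lines.append(text)
    if audio_name:
        lines.append("")
        lines.append(f"![[{audio_name}]]")  # Obsidian embed syntax
    return "\n".join(lines)


# Example (hypothetical values):
note = build_note(
    "Transcribed content...",
    {"date": "2024-03-20", "mood": "Productive", "topics": ["Work"]},
    audio_name="audio_file.wav",
)
```

Writing the result under the vault path with a date-based filename is then all that's left for the Obsidian side.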
separation
- Related entries linking

## Technical Details

### Project Structure

```
asr_to_obsidian/
├── audio/
│   └── recorder.py          # Audio recording functionality
├── transcription/
│   └── whisper.py           # Whisper model integration
├── enhancement/
│   └── processor.py         # Text enhancement pipeline
├── weather/
│   └── weather.py           # OpenWeatherMap integration
├── obsidian/
│   └── note_generator.py    # Obsidian note creation
└── utils/
    └── helpers.py           # Utility functions
```

### Configuration

The application is highly configurable through `config.yaml`:

```yaml
# Audio settings
audio:
  sample_rate: 16000
  channels: 1
  chunk_size: 1024
  max_duration: 300

# Transcription settings
transcription:
  model: "whisper"
  language: "en"
  temperature: 0.0

# Weather settings
weather:
  city: "Your City"
  api_key: "your_openweathermap_api_key"

# Privacy settings
privacy:
  delete_audio_after_processing: true
  retention_days: 7
```

### Note Format

Generated notes follow this structure:

```markdown
---
date: 2024-03-20
time: 15:30
duration: 180
word_count: 200
mood: Productive
weather: 75
topics:
  - Work
  - Project updates
---

Transcribed content...

![[audio_file.wav]]
```

## Future Improvements

1. **Enhanced Text Processing**
   - Sentiment analysis
   - Entity recognition
   - Automatic tagging
2. **Additional Weather Features**
   - Historical weather data
   - Weather trends
   - Location-based suggestions
3. **Obsidian Integration**
   - Graph view integration
   - Custom templates
   - Tag management
4. **Privacy Enhancements**
   - End-to-end encryption
   - Local weather data
   - Secure API key management

## Getting Started

1. Install dependencies:

   ```bash
   uv venv
   uv pip install -r requirements.txt
   ```

2. Set up configuration:

   ```bash
   cp config.yaml.example config.yaml
   # Edit config.yaml with your settings
   ```

3. Set up environment variables:

   ```bash
   echo "OPENWEATHERMAP_API_KEY=your_key_here" >> .env
   ```

4. Start Ollama:

   ```bash
   ollama serve
   ```

5. Pull the Whisper model:

   ```bash
   ollama pull whisper
   ```

6.
Run the application:

   ```bash
   uv run python main.py record
   ```

## Conclusion

This project demonstrates how modern AI tools can be combined to create powerful, privacy-focused applications. By leveraging local models and open APIs, we've created a system that can transform spoken thoughts into well-structured, contextual notes in Obsidian. The modular design allows for easy extension and customization, making it a solid foundation for future enhancements.

The code is available on GitHub at [asr-to-obsidian-markdown](https://github.com/vaporeyes/asr-to-obsidian-markdown).