## Background
Basically what the name says. My first IT job, right out of driving forklifts in 2008, was at a company that transcribed speech from doctors. With today's advances in AI and ASR, that business model as it worked back then is all but obsolete; modern models handle most of those tasks remarkably well. In the spirit of that old job, I wanted to write something that lets me speak into a microphone and have the audio transcribed straight into markdown for Obsidian.
## Implementation
1. Uses modern Python (`>=3.12`)
2. Uses `uv` for Python tooling
3. Leverages Ollama for local Whisper model inference
4. Integrates with OpenWeatherMap for contextual weather data
5. Generates structured Obsidian markdown notes
## Core Features
### Voice Recording
The application uses PyAudio to capture high-quality audio input with configurable settings:
- Adjustable sample rate and channels
- Configurable chunk size for optimal performance
- Maximum duration limits to prevent runaway recordings
- Automatic cleanup of old recordings based on retention policy
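The retention policy can be sketched as a small cleanup helper. This is an illustrative sketch, not the project's actual code: the function name, the flat directory of `.wav` files, and the `retention_days` parameter are assumptions modeled on the config shown later.

```python
import time
from pathlib import Path

def cleanup_old_recordings(recordings_dir: str, retention_days: int = 7) -> list[str]:
    """Delete .wav files older than the retention window; return deleted names."""
    cutoff = time.time() - retention_days * 86400  # retention window in seconds
    deleted = []
    for wav in Path(recordings_dir).glob("*.wav"):
        if wav.stat().st_mtime < cutoff:  # modification time predates the cutoff
            wav.unlink()
            deleted.append(wav.name)
    return deleted
```

Keying off modification time keeps the helper stateless: no database of recordings is needed, just the filesystem.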
### Transcription
We leverage Ollama's local Whisper model implementation for transcription:
- Runs completely locally, ensuring privacy
- Supports multiple Whisper model sizes (tiny to large)
- Configurable language and temperature settings
- Initial prompt support for better context
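A transcription request might be assembled along these lines. To be clear, the endpoint URL, the JSON field names, and the base64 audio encoding below are all assumptions made for illustration, not a documented Ollama API; check your server's actual API before relying on this shape.

```python
import base64
import json
import urllib.request

def build_transcription_request(audio_path: str, model: str = "whisper",
                                language: str = "en", temperature: float = 0.0,
                                initial_prompt: str = "") -> bytes:
    # Encode the audio and assemble a JSON body. The field names here are
    # assumptions for illustration -- consult your server's API docs.
    with open(audio_path, "rb") as f:
        audio = base64.b64encode(f.read()).decode("ascii")
    body = {
        "model": model,
        "audio": audio,
        "language": language,
        "prompt": initial_prompt,
        "options": {"temperature": temperature},
    }
    return json.dumps(body).encode("utf-8")

def transcribe(audio_path: str,
               url: str = "http://localhost:11434/api/generate") -> str:
    # Hypothetical endpoint; adjust URL and response parsing to your setup.
    req = urllib.request.Request(url, data=build_transcription_request(audio_path),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Splitting payload construction from the network call keeps the request shape testable without a running server.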
### Text Enhancement
The transcribed text goes through several enhancement steps:
- Mood detection
- Topic identification
- Grammar correction
- Filler word removal
- Date formatting
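The filler-word step can be sketched with a regex pass. This is a minimal sketch with an assumed filler list; a production pipeline would be more careful (for example, "like" here would also strip legitimate uses of the verb, and orphaned punctuation would need a follow-up pass).

```python
import re

# Assumed starter list -- extend to taste
FILLERS = {"um", "uh", "like", "you know", "i mean"}

def remove_fillers(text: str) -> str:
    """Strip common filler words/phrases and tidy the leftover spacing."""
    # Longest alternatives first so "you know" isn't left half-removed
    pattern = r"\b(" + "|".join(sorted(FILLERS, key=len, reverse=True)) + r")\b,?\s*"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

Example: `remove_fillers("Um, I went to, uh, the store, you know, yesterday.")` yields `"I went to, the store, yesterday."`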
### Weather Integration
Each diary entry includes contextual weather data:
- Current temperature
- "Feels like" temperature
- Weather description
- Humidity and wind speed
- Location information
- Timestamp of weather data
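Extracting those fields from OpenWeatherMap's current-weather response can look like the sketch below. The input keys (`main.temp`, `weather[0].description`, `wind.speed`, `name`, `dt`) follow the API's documented response; the output key names and the assumption of `units=imperial` are my own choices for illustration.

```python
def summarize_weather(payload: dict) -> dict:
    """Pull the fields a diary entry needs from an OpenWeatherMap
    current-weather response (units depend on the API's `units` parameter)."""
    return {
        "temperature": payload["main"]["temp"],
        "feels_like": payload["main"]["feels_like"],
        "description": payload["weather"][0]["description"],
        "humidity": payload["main"]["humidity"],
        "wind_speed": payload["wind"]["speed"],
        "location": payload["name"],
        "timestamp": payload["dt"],  # Unix timestamp of the reading
    }
```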
### Obsidian Integration
The generated notes are structured for optimal Obsidian usage:
- YAML frontmatter for metadata
- Automatic file organization in specified vault
- Support for audio file attachments
- Multiple entries per day with clean separation
- Related entries linking
## Technical Details
### Project Structure
```
asr_to_obsidian/
├── audio/
│   └── recorder.py        # Audio recording functionality
├── transcription/
│   └── whisper.py         # Whisper model integration
├── enhancement/
│   └── processor.py       # Text enhancement pipeline
├── weather/
│   └── weather.py         # OpenWeatherMap integration
├── obsidian/
│   └── note_generator.py  # Obsidian note creation
└── utils/
    └── helpers.py         # Utility functions
```
### Configuration
The application is highly configurable through `config.yaml`:
```yaml
# Audio settings
audio:
  sample_rate: 16000
  channels: 1
  chunk_size: 1024
  max_duration: 300

# Transcription settings
transcription:
  model: "whisper"
  language: "en"
  temperature: 0.0

# Weather settings
weather:
  city: "Your City"
  api_key: "your_openweathermap_api_key"

# Privacy settings
privacy:
  delete_audio_after_processing: true
  retention_days: 7
```
### Note Format
Generated notes follow this structure:
```markdown
---
date: 2024-03-20
time: 15:30
duration: 180
word_count: 200
mood: Productive
weather: 75
topics:
- Work
- Project updates
---
Transcribed content...
![[audio_file.wav]]
```
## Future Improvements
1. **Enhanced Text Processing**
- Sentiment analysis
- Entity recognition
- Automatic tagging
2. **Additional Weather Features**
- Historical weather data
- Weather trends
- Location-based suggestions
3. **Obsidian Integration**
- Graph view integration
- Custom templates
- Tag management
4. **Privacy Enhancements**
- End-to-end encryption
- Local weather data
- Secure API key management
## Getting Started
1. Install dependencies:
```bash
uv venv
uv pip install -r requirements.txt
```
2. Set up configuration:
```bash
cp config.yaml.example config.yaml
# Edit config.yaml with your settings
```
3. Set up environment variables:
```bash
echo "OPENWEATHERMAP_API_KEY=your_key_here" >> .env
```
4. Start Ollama:
```bash
ollama serve
```
5. Pull Whisper model:
```bash
ollama pull whisper
```
6. Run the application:
```bash
uv run python main.py record
```
## Conclusion
This project demonstrates how modern AI tools can be combined to create powerful, privacy-focused applications. By leveraging local models and open APIs, we've created a system that can transform spoken thoughts into well-structured, contextual notes in Obsidian. The modular design allows for easy extension and customization, making it a solid foundation for future enhancements.
The code is available on GitHub at [asr-to-obsidian-markdown](https://github.com/vaporeyes/asr-to-obsidian-markdown).