clicky
Open-source macOS AI assistant that captures screen content and interacts by voice using the Anthropic, AssemblyAI, and ElevenLabs APIs.
About clicky
Clicky is an open-source AI companion that acts as a real-time teacher or assistant living directly on your desktop. It closes the gap between a chat-only LLM, which knows nothing about your screen, and interactive screen-aware assistance by combining vision, voice, and screen capture into a single buddy that sits next to your cursor.
The Non-Technical Lens
Imagine having a private tutor sitting right next to you, looking over your shoulder as you work or study. Instead of you having to explain what is on your screen, Clicky already sees it. You can talk to it naturally, and it can point to specific buttons, lines of code, or images to guide you. It turns the computer from a passive tool into an active, collaborative partner that helps you learn by doing rather than by being told.
The Technical Lens
Architecture: A native macOS application built with Swift, using ScreenCaptureKit for high-performance screen capture and the Accessibility APIs for system-wide interaction.
Backend Logic: It uses a Cloudflare Worker as a secure proxy. The app sends its requests to the Worker, which holds the API keys and forwards each request to the matching service. Sensitive keys (Anthropic, AssemblyAI, ElevenLabs) are therefore never shipped within the app binary, which protects the developer's credentials.
Multimodal Integration: The system integrates Anthropic (Claude) for vision and reasoning, AssemblyAI for real-time speech-to-text (STT), and ElevenLabs for high-quality text-to-speech (TTS).
License & Requirements: Released under the MIT License. It requires macOS 14.2+, Xcode 15+, and API keys for the aforementioned services.
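The proxy idea described under Backend Logic can be sketched in a few lines of TypeScript. This is a minimal illustration, not Clicky's actual Worker: the route paths (/chat, /tts/<voiceId>), the secret names, and the names resolveUpstream and worker are assumptions made for this example, and AssemblyAI is left out for brevity. The upstream URLs and header names follow the public Anthropic and ElevenLabs API documentation.

```typescript
// Hypothetical secret names; in a Cloudflare Worker these would be stored
// as secrets on the Worker, never inside the macOS app.
interface Env {
  ANTHROPIC_API_KEY: string;
  ELEVENLABS_API_KEY: string;
}

interface Upstream {
  url: string;
  headers: Record<string, string>;
}

// Map a proxy route (hypothetical paths) to the upstream URL and the auth
// headers that service expects. Returns null for unknown routes.
function resolveUpstream(pathname: string, env: Env): Upstream | null {
  if (pathname === "/chat") {
    return {
      url: "https://api.anthropic.com/v1/messages",
      headers: {
        "x-api-key": env.ANTHROPIC_API_KEY,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
    };
  }
  const tts = pathname.match(/^\/tts\/([A-Za-z0-9]+)$/);
  if (tts) {
    return {
      url: `https://api.elevenlabs.io/v1/text-to-speech/${tts[1]}`,
      headers: {
        "xi-api-key": env.ELEVENLABS_API_KEY,
        "content-type": "application/json",
      },
    };
  }
  return null;
}

// Request handler in the shape a Worker uses: fetch(request, env).
const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    const upstream = resolveUpstream(new URL(request.url).pathname, env);
    if (upstream === null) {
      return new Response("Not found", { status: 404 });
    }
    if (request.method !== "POST") {
      return new Response("Method not allowed", { status: 405 });
    }
    // Forward the client's body unchanged; only the Worker adds credentials.
    return fetch(upstream.url, {
      method: "POST",
      headers: upstream.headers,
      body: await request.text(),
    });
  },
};
```

In a deployed Worker the worker object would be the module's default export, and the keys would be set as secrets (for example with wrangler secret put). The client only ever sees the Worker's URL, so rotating a key does not require shipping a new app build.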
Why It Matters
Clicky is a step toward transparent, open-source AI agents. While many companies are building proprietary "screen-aware" assistants, Clicky provides a blueprint for how these systems work under the hood. Because its API keys sit behind a proxy rather than inside the client, it shows developers one way to build capable AI tools without exposing credentials in the shipped app. It shifts the focus from simple chatbots to context-aware agents that can see and point at what is on the screen.
Explore the repository here: https://github.com/farzaa/clicky