Introducing HumeChat: Voice AI with Emotional Intelligence
- Author: Rob@BitWise (@BitWise_0x)
Although voice-based AI interfaces have grown considerably in both capability and adoption, the predominant approach continues to treat speech as a transcription problem. The semantic content of what is said receives all the attention, while the rich emotional signals embedded in tone, rhythm, hesitation, and emphasis are largely discarded. This is a significant limitation: in human conversation, how something is said often carries as much meaning as the words themselves.

The question that motivated this project was straightforward: what becomes possible when an AI system can detect and respond to a speaker's emotional state in real time, rather than simply processing their words? This post introduces HumeChat, a voice AI platform I built to explore that question. It encompasses three distinct products (Archetype, Shaman, and Wooza), each applying emotionally aware voice interaction to a different context. In the sections that follow, I walk through the foundational emotion detection capability and the design thinking behind each product.
Why I Built HumeChat
The impetus for HumeChat came from a dissatisfaction with the state of voice AI conversations. Even the most capable voice assistants operate on a fundamentally incomplete picture of the interaction: they capture the transcript but lose the affect. The frustration in someone's voice, the excitement behind a question, the hesitation that signals uncertainty: none of this informs the AI's response. I wanted to build something where emotion functions as a first-class input, shaping the conversation in real time rather than being discarded at the point of transcription.
Hume AI's Empathic Voice Interface (EVI) technology provided the right foundation for this. Its prosody analysis detects 48 distinct emotions from voice alone by analyzing tone, pitch, rhythm, and pauses, and it runs continuously throughout a conversation rather than producing a single summary score after the fact. HumeChat is the platform I built on top of that capability. It explores three applications of emotionally aware voice interaction: Archetype, which pairs Jungian depth psychology with real-time emotional awareness; Shaman, which offers guided voice journeys rooted in shamanic tradition and intention-setting; and Wooza, a children's voice companion with educational conversations and voice-powered games.
Before examining each product in detail, it is worth understanding the foundational capability that underpins all three: real-time emotion detection.
Real-Time Emotion Detection
The distinguishing characteristic of HumeChat's approach lies in treating emotion detection not as an afterthought or summary metric, but as a continuous process that operates throughout every voice session. As a user speaks, Hume's prosody analysis tracks emotional shifts as they happen, producing a granular breakdown across 48 categories. These categories span a wide range of human affect, from Calmness, Contemplation, and Determination to Realization, Empathic Pain, and Awkwardness. Each is reported as a percentage score and organized into one of six broader groupings: Positive, Social, Cognitive, Calm, Negative, and Distress.
Every message in the conversation transcript receives its own emotion breakdown. The dashboard surfaces the top detected emotions inline alongside each message, with the option to expand and view all 48 categories in detail. This data is not merely informational. It feeds directly into session analysis, persona adaptation, and the clinical work within Archetype, making emotional awareness a persistent thread that runs through the entire platform.
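To make the per-message breakdown concrete, here is a minimal sketch of how the top emotions for one message might be selected and rolled up into groups. The function names, the sample scores, and the emotion-to-group mapping are assumptions for illustration; this post does not specify which emotions belong to which of the six groups.

```python
from collections import defaultdict

# Hypothetical partial mapping, for illustration only. The six group names
# come from the post; the assignment of emotions to groups is assumed.
EMOTION_GROUPS = {
    "Calmness": "Calm",
    "Contemplation": "Cognitive",
    "Realization": "Cognitive",
    "Determination": "Positive",
    "Awkwardness": "Social",
    "Empathic Pain": "Distress",
}


def top_emotions(scores, n=3):
    """Return the n highest-scoring (emotion, score) pairs, highest first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


def group_totals(scores, groups=EMOTION_GROUPS):
    """Sum scores per group; emotions without a mapping land in 'Ungrouped'."""
    totals = defaultdict(float)
    for emotion, score in scores.items():
        totals[groups.get(emotion, "Ungrouped")] += score
    return dict(totals)


# Made-up scores for a single transcript message (0.0 to 1.0).
message_scores = {
    "Calmness": 0.42,
    "Contemplation": 0.31,
    "Determination": 0.12,
    "Awkwardness": 0.08,
    "Realization": 0.05,
}

inline = top_emotions(message_scores, n=3)
print(", ".join(f"{name} {score:.0%}" for name, score in inline))
```

The inline view only needs the first few entries, while the expanded view could render the full sorted list.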
With this emotional awareness operating as a shared foundation, let us examine how each of the three products employs it in practice, beginning with Archetype.
Archetype
Archetype is the product I am most personally invested in. The premise that drew me to it was the convergence of two capabilities that had not been meaningfully combined before: depth psychology as a conversational framework, and real-time access to the speaker's emotional state. I wanted to explore what a Carl Jung persona could do when it perceives not only what someone says but how they feel as they say it.
For those less familiar with Jung's framework, a brief orientation may be useful. Archetypes, in the Jungian sense, are universal psychological patterns, recurring motifs of thought and behavior that shape how individuals engage with the world. The shadow refers to the repressed or unacknowledged aspects of one's personality, while the anima and animus represent the contrasexual inner figures that influence how we relate to others. Individuation is Jung's term for the lifelong process of integrating these elements into a more complete and self-aware whole. Archetype translates these concepts into a structured self-discovery experience: archetype profiling, shadow integration, anima and animus exploration, and a four-stage individuation journey, all conducted through emotionally aware voice conversation.
How It Works
The experience begins with a 12-question assessment, which generates a personalized dashboard reflecting the user's archetypal profile. From there, users progress through a structured individuation journey, engaging in voice sessions with a Carl Jung persona that has access to their emotional state in real time. Following each session, an AI clinical analysis is performed, and progress is tracked over time to surface patterns of growth and areas requiring further exploration.
Your Archetype Profile
The entry point is a 12-question assessment based on Carol Pearson's Pearson-Marr Archetype Indicator (PMAI), a well-established framework for identifying dominant psychological patterns. The assessment identifies dominant, secondary, tertiary, and shadow archetypes from a taxonomy of twelve types. Pearson's framework organizes these archetypes into three groups, each corresponding to a different dimension of psychological development:
- Ego archetypes: Innocent, Orphan, Hero, Caregiver
- Soul archetypes: Explorer, Destroyer, Lover, Creator
- Self archetypes: Ruler, Magician, Sage, Jester
The result is a full archetypal blend with percentage weightings for each archetype, along with an identification of the shadow archetype, the pattern most actively repressed or avoided.
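As a rough illustration of how a blend with percentage weightings could be derived, the sketch below normalizes raw per-archetype scores and ranks them. The scoring scheme, the sample values, and the rule that the lowest-weighted archetype stands in for the shadow are all assumptions; the actual assessment logic is not described here.

```python
ARCHETYPE_GROUPS = {
    "Ego": ["Innocent", "Orphan", "Hero", "Caregiver"],
    "Soul": ["Explorer", "Destroyer", "Lover", "Creator"],
    "Self": ["Ruler", "Magician", "Sage", "Jester"],
}
ALL_ARCHETYPES = {name for names in ARCHETYPE_GROUPS.values() for name in names}


def archetype_blend(raw_scores):
    """Convert raw scores into percentage weightings that sum to 100."""
    if set(raw_scores) != ALL_ARCHETYPES:
        raise ValueError("expected exactly one score per archetype")
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("raw scores must sum to a positive number")
    return {name: 100.0 * score / total for name, score in raw_scores.items()}


def profile(raw_scores):
    """Rank the blend into dominant, secondary, tertiary, and shadow."""
    blend = archetype_blend(raw_scores)
    ranked = sorted(blend, key=blend.get, reverse=True)
    return {
        "blend": blend,
        "dominant": ranked[0],
        "secondary": ranked[1],
        "tertiary": ranked[2],
        # Assumption: the lowest-weighted archetype is treated as the shadow.
        "shadow": ranked[-1],
    }


# Hypothetical raw scores, e.g. summed answers from the 12 questions.
raw = {
    "Innocent": 4, "Orphan": 3, "Hero": 9, "Caregiver": 6,
    "Explorer": 8, "Destroyer": 1, "Lover": 5, "Creator": 7,
    "Ruler": 2, "Magician": 5, "Sage": 10, "Jester": 4,
}
result = profile(raw)
print(result["dominant"], result["shadow"])
```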
Voice Sessions & Clinical Analysis
Voice sessions with the Carl Jung persona are personalized to the user's archetype profile and current individuation stage. Real-time emotion detection runs throughout, allowing Carl to perceive and adapt to the speaker's emotional state as the conversation unfolds, a dimension of awareness that traditional text-based interactions cannot provide.
After each session, Claude Opus 4 performs a clinical analysis grounded in Jungian methodology. Each assessment is tagged with therapeutic indicators (such as "Therapeutically Substantive," "Shadow Present," and "Integration Movement") and measures affective engagement across five clinically informed levels:
- Intellectual: pure head-level discussion without emotion
- Defended: actively avoiding feeling through humor or deflection
- Emerging: beginning to feel, with hints of vulnerability quickly covered
- Engaged: present with genuine emotion and surprise at self
- Deep: full therapeutic presence, tears, silence, profound recognition
These levels draw from assessment standards used by the International Association for Analytical Psychology and the Society of Analytical Psychology, adapted here for voice-based interaction, where prosody data provides an additional signal that text alone cannot.
Progress through the four individuation stages (Persona, Shadow, Anima/Animus, and Self) is quantified through a stage readiness score calculated from the last five sessions. Advancing to the next stage requires reaching 80% readiness, a threshold based on genuine therapeutic work rather than mere participation. The platform operates on a full transparency model, exposing everything to the user: Carl's private clinical notes, affective engagement levels, resistance patterns, and session-by-session insights. The intent is that self-discovery should not be opaque; the process itself should be visible.
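The readiness rule can be sketched in a few lines. The post only states that the score is calculated from the last five sessions and that 80% is required to advance; the per-session score on a 0 to 1 scale, the plain average, and the handling of fewer than five sessions are assumptions made for this example.

```python
STAGES = ["Persona", "Shadow", "Anima/Animus", "Self"]
WINDOW = 5        # number of most recent sessions considered
THRESHOLD = 0.80  # readiness required to advance


def stage_readiness(session_scores, window=WINDOW):
    """Average the most recent session scores (assumed to be 0.0 to 1.0)."""
    recent = session_scores[-window:]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)


def next_stage(current, session_scores):
    """Return the following stage once readiness reaches the threshold."""
    index = STAGES.index(current)
    ready = stage_readiness(session_scores) >= THRESHOLD
    if ready and index < len(STAGES) - 1:
        return STAGES[index + 1]
    return current


# Hypothetical per-session scores, oldest first.
scores = [0.20, 0.70, 0.80, 0.85, 0.90, 0.95]
print(f"{stage_readiness(scores):.0%}", next_stage("Persona", scores))
```

Averaging over a sliding window means one strong session cannot unlock the next stage on its own, which fits the stated intent of rewarding sustained work over mere participation.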
While Archetype approaches inner work through the analytical lens of depth psychology, HumeChat's second product takes a fundamentally different path: guided experience rather than guided analysis.
Shaman
Shaman began with a question about the nature of guided inner work itself. Archetype operates through analysis: it names patterns, tracks progress, and produces clinical assessments. But not all inner work benefits from that analytical frame. Sometimes what a person needs is not interpretation but immersion, not a diagnosis but a space held with intention. I wanted to explore whether emotionally aware voice AI could facilitate that kind of experiential practice, one rooted in shamanic tradition rather than clinical psychology.
The experience begins with two choices. First, the user selects a guide from five shamanic personas, each embodying a distinct quality of presence:
- Diego: grounded earth wisdom and ceremonial guidance
- Daniel Plainview: deep visionary insight and transformative depth
- Isabela: nurturing, heart-centered healing
- Kora: playful cosmic energy and authentic presence
- Luna: a mystical guide drawing on tarot and astrology
Second, the user sets an intention for the session by choosing from four sacred intention cards:
- Healing: restore balance and find peace
- Clarity: see through confusion to truth
- Growth: transform and evolve
- Release: let go of what no longer serves
With guide and intention selected, the user enters a voice session accompanied by sacred geometry visualizations that respond to the audio in real time. The same 48-emotion prosody analysis that powers the rest of the platform runs throughout, allowing the guide to perceive and respond to the user's emotional state as the journey unfolds. Session context carries the selected intention forward, so the guide's responses remain anchored to the user's purpose. Over time, the dashboard tracks intentions and journeys, surfacing patterns across sessions.
Where Archetype and Shaman serve adult users through different modalities of inner work, HumeChat's third product addresses a fundamentally different audience and purpose.
Wooza
Wooza is a children's voice companion built around two questions. Could emotionally aware voice AI serve younger users in an educational context? And could the same prosody analysis that detects nuance in adult conversation also sense when a child is confused, frustrated, or losing interest, and adjust accordingly?
At its center is Buster, a friendly AI troll designed specifically for younger users. Buster can assist with homework across subjects such as mathematics and science, guide children through interactive storytelling adventures, provide feedback during reading practice, and serve as a patient listener for any question a child might have. The design philosophy for Wooza prioritizes safety above all else. There is no persona customization. The experience is entirely controlled, with responses constrained to child-friendly content and the interface designed to maintain focus on learning and enjoyment.
Buster's Game Hub
Beyond conversation, Wooza includes a game hub where Buster leads children through voice-powered games that make direct use of the platform's emotion detection capabilities. Two games are currently available:
Emotion Charades: Buster presents a target emotion and the child expresses it through their voice. The same real-time prosody analysis that powers the rest of the platform detects whether the child's vocal expression matches the target, with streak scoring and particle visualizations that celebrate successful matches. The game turns emotion recognition into active practice rather than passive observation.
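A minimal sketch of the matching and streak logic described for Emotion Charades follows. The class name, the threshold-based match rule, and the 0.5 cutoff are assumptions; the post does not say how a match is decided.

```python
from dataclasses import dataclass


@dataclass
class CharadesGame:
    threshold: float = 0.5  # assumed minimum score for the target to count
    streak: int = 0
    best_streak: int = 0

    def submit(self, target, scores):
        """Record one attempt; return True if the target emotion was detected."""
        matched = scores.get(target, 0.0) >= self.threshold
        # A miss resets the current streak; a match extends it.
        self.streak = self.streak + 1 if matched else 0
        self.best_streak = max(self.best_streak, self.streak)
        return matched


game = CharadesGame()
first = game.submit("Determination", {"Determination": 0.7, "Calmness": 0.2})
second = game.submit("Calmness", {"Awkwardness": 0.6, "Calmness": 0.1})
print(first, second, game.streak, game.best_streak)
```

In this sketch the `scores` dictionary stands in for the per-utterance emotion breakdown, so the same detection output that annotates transcripts could drive the game.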
Math Creatures: Buster presents math puzzles using emoji creatures ("three stars plus two moons"), and the child answers through voice. Difficulty scales across five levels with adjustable operations, and streak tracking encourages sustained engagement. The particle visualizer transforms with each level, giving the math problems a visual dimension that holds attention.
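Puzzle generation for a game like Math Creatures might look something like the sketch below. The creature list, the way operand size grows with level, and the restriction to addition and subtraction are assumptions chosen to keep the example short.

```python
import random

CREATURES = ["stars", "moons", "frogs", "owls"]  # hypothetical creature names
OPS = {
    "plus": lambda a, b: a + b,
    "minus": lambda a, b: a - b,
}


def make_puzzle(level, operations=("plus",), rng=None):
    """Build one puzzle; operand size grows with the level (assumed rule)."""
    if not 1 <= level <= 5:
        raise ValueError("level must be between 1 and 5")
    rng = rng or random.Random()
    limit = 5 * level  # assumption: level 1 uses 2..5, level 5 uses 2..25
    op = rng.choice(list(operations))
    a = rng.randint(2, limit)
    b = rng.randint(2, limit)
    if op == "minus" and b > a:
        a, b = b, a  # keep answers non-negative for young players
    first, second = rng.sample(CREATURES, 2)
    return {"prompt": f"{a} {first} {op} {b} {second}", "answer": OPS[op](a, b)}


def check_answer(puzzle, spoken_number):
    """Compare the number recognized from the child's voice to the answer."""
    return puzzle["answer"] == spoken_number


puzzle = make_puzzle(level=1, operations=("plus", "minus"), rng=random.Random(0))
print(puzzle["prompt"])
```

Passing a seeded `random.Random` makes puzzles reproducible, which is convenient for testing; a live game would presumably leave the seed unset.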
Both games demonstrate a broader point about the platform's architecture: the emotion detection layer is not confined to therapeutic or conversational contexts. The same prosody analysis that measures affective engagement in Archetype sessions can also tell whether a child is genuinely having fun or starting to disengage.
Platform Overview
At the infrastructure level, HumeChat is built around WebSocket-powered real-time voice chat with sub-second latency and continuous 48-emotion prosody analysis during every session. Each product features GPU-accelerated WebGL particle visualizers with over 80 procedural shapes, from sacred geometry to animals, driven by the audio signal with bass, voice, and treble mapped to distinct visual axes. Persistent memory retains context across sessions, allowing guides and companions to build on previous conversations.
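The mapping from audio to visual axes can be illustrated with a small band-splitting sketch. It assumes a list of FFT magnitudes covering 0 Hz up to the Nyquist frequency; the band edges and the axis assignment are illustrative guesses, not the platform's actual values.

```python
# Assumed band edges in Hz; the real cutoffs are not given in this post.
BANDS = {
    "bass": (0, 250),
    "voice": (250, 4000),
    "treble": (4000, 20000),
}


def band_levels(magnitudes, sample_rate, bands=BANDS):
    """Average magnitude per band from evenly spaced bins over 0..Nyquist."""
    hz_per_bin = (sample_rate / 2) / len(magnitudes)
    levels = {}
    for name, (low, high) in bands.items():
        values = [m for i, m in enumerate(magnitudes) if low <= i * hz_per_bin < high]
        levels[name] = sum(values) / len(values) if values else 0.0
    return levels


def to_axes(levels):
    """Map each band to one visual axis (hypothetical assignment)."""
    return {"x": levels["bass"], "y": levels["voice"], "z": levels["treble"]}


# Eight made-up bins at a 16 kHz sample rate, so each bin spans 1000 Hz.
spectrum = [0.8, 0.2, 0.4, 0.6, 0.1, 0.1, 0.1, 0.1]
levels = band_levels(spectrum, sample_rate=16000)
axes = to_axes(levels)
print(axes)
```

Keeping the band split separate from the axis mapping would let each visualizer shape reuse the same three levels while interpreting them differently.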
Each of the three products maintains its own dashboard with session history, full transcripts, emotion analytics, and audio recordings. Beyond these shared capabilities, each product contributes its own layer: Archetype extends the dashboard with Jungian profiling, clinical analysis, and individuation tracking; Shaman tracks intentions and journey history across sessions; and Wooza offers voice-powered games that turn emotion detection into interactive play.
Conclusion
Building HumeChat has been an exercise in exploring what becomes possible when emotional awareness is treated not as a novelty feature but as a foundational input to voice AI interaction. The core premise has proven compelling: that voice AI which can perceive how someone feels, not merely what they say, opens a meaningfully different design space. Each of the three products tests that premise in a distinct context, from clinical depth psychology to experiential shamanic practice to children's education and play.
To what extent can prosody analysis deepen the therapeutic potential of AI-guided self-discovery? What happens when a shamanic guide can perceive the emotional terrain of a journey in real time? Can emotionally aware voice interfaces genuinely support psychological work, or will they remain constrained to surface-level adaptation? What does it mean for a children's educational companion to sense frustration or disengagement and adjust accordingly? These are questions I find worth pursuing, and HumeChat is the vehicle through which I am exploring them.
For those interested in trying the platform, HumeChat offers a free demo requiring no account, with ten minutes of daily voice AI conversation across all three products, including emotion detection and WebGL visualizers. Full access, encompassing persistent memory, session history, dashboards, and the complete Archetype individuation journey, is currently available on an invite basis and can be requested through the sign-in page.