Voice Technology
Core Technology

Voice API

Text-to-Speech · Speech-to-Text

Native-level speech synthesis and real-time streaming speech recognition.
The core technology powering AI Agents, chatbots, education, conversation, and more.

Core Technology

Why Core Technology?

Voice API is not just voice conversion. It is the foundational infrastructure that determines the user experience of AI services.

🏗

Service Infrastructure

AI Agents, chatbots, conversation training, educational content, browser extensions — every service that needs voice runs on this API. API quality equals service quality.

Real-time Streaming Required

In a 1:1 AI conversation, the dialogue feels unnatural once response latency exceeds 1 second. We target <500 ms latency with WebSocket-based streaming.

🔗

Internal + External API

Beyond internal service integration, this is an independent technology asset that can be monetized by providing APIs to external clients.

Architecture

Service Architecture

💬

talking.how

AI Conversation

🗣

native.how

TTS B2C

🤖

AI Agent

Voice Agent

📚

loa.bot etc.

Chatbot / Education

API Call

native.how / API

REST API + WebSocket Streaming

/api/v1/tts
/api/v1/tts/stream
/api/v1/stt/stream

Wrapping

☁️

Google Cloud TTS / STT API

Seoul Region (Minimal Latency)

Text-to-Speech

TTS Technical Specs

🎙

Neural TTS

Based on WaveNet / Neural2. Naturally reproduces human intonation, emotion, and rhythm.

📡

TTS Streaming

Real-time chunk-based delivery. Even long texts start playing immediately, minimizing wait time.
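A minimal sketch of why chunked delivery matters: the client feeds each audio chunk to the player as it arrives, so playback begins on the first chunk rather than after the full file downloads. The chunk source here is a stand-in for the `/api/v1/tts/stream` response, not the actual client library.

```python
import io

def play_stream(chunks, player):
    """Feed audio chunks to a player as they arrive,
    instead of waiting for the complete file."""
    total = 0
    for chunk in chunks:
        player.write(chunk)  # playback can start on the very first chunk
        total += len(chunk)
    return total

# Fake chunk source standing in for the streaming response:
fake_chunks = [b"\x00" * 320] * 5   # e.g. five short PCM frames
buffer = io.BytesIO()               # stands in for an audio output device
received = play_stream(fake_chunks, buffer)
```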

🌍

100+ Languages

Korean, English, Japanese, Chinese, and 100+ languages. Various voice styles per language.

🎭

Voice Customization

Speed, pitch, volume control. SSML support for emphasis, pauses, and precise pronunciation control.
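As a sketch of the SSML controls above, the helper below wraps plain text in W3C SSML elements (`prosody`, `break`); the helper itself and its defaults are illustrative, though the element names come from the SSML standard that neural TTS engines commonly support.

```python
def wrap_ssml(text, rate="100%", pitch="+0st", pause_ms=None):
    """Wrap plain text in SSML with basic prosody controls.
    Uses standard W3C SSML elements; per-voice support may vary."""
    body = f'<prosody rate="{rate}" pitch="{pitch}">{text}</prosody>'
    if pause_ms is not None:
        body += f'<break time="{pause_ms}ms"/>'  # insert a pause after the phrase
    return f"<speak>{body}</speak>"

# Slow the phrase slightly, raise pitch two semitones, pause 300 ms after:
ssml = wrap_ssml("Welcome back.", rate="90%", pitch="+2st", pause_ms=300)
```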

📄

Multiple Input Formats

Automatic parsing of text, PDF, and webpage URLs. SSML markup also supported.

🔊

Multiple Output Formats

MP3, WAV, OGG, FLAC, and more. Configurable bitrate and sample rate.

Speech-to-Text

STT Technical Specs

🎤

Real-time STT Streaming

WebSocket-based real-time speech recognition. Text appears instantly as you speak.

🔄

Interim Results

Real-time delivery of intermediate recognition results. Response preparation can begin before the user finishes speaking.
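One way a client can consume interim results: keep a committed (final) transcript plus the latest interim hypothesis, where each new interim result replaces the previous one and a final result is appended permanently. The message shape (`is_final`, `transcript`) is an assumption for illustration, not the published wire format.

```python
def apply_result(state, msg):
    """Merge one streaming STT message into (final_text, interim_text).
    Interim results overwrite each other; final results are committed."""
    final, interim = state
    if msg["is_final"]:
        return (final + msg["transcript"], "")
    return (final, msg["transcript"])  # latest interim replaces the previous one

state = ("", "")
messages = [
    {"is_final": False, "transcript": "hel"},
    {"is_final": False, "transcript": "hello wor"},
    {"is_final": True,  "transcript": "hello world "},
    {"is_final": False, "transcript": "how"},
]
for m in messages:
    state = apply_result(state, m)
```

Because the interim text is available before the utterance is final, a downstream LLM can start preparing a response early.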

🧠

AI Post-processing

Automatic punctuation, word correction, and speaker diarization support.

🔇

VAD (Voice Activity Detection)

Automatic voice segment detection. Maximizes efficiency by reducing unnecessary processing during silent periods.

📊

Confidence Score

A confidence score is provided with each recognition result, enabling re-confirmation logic for low-confidence segments.
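The re-confirmation idea can be sketched as a simple filter: collect the segments whose confidence falls below a threshold so the application can ask the user to repeat or confirm them. The result shape and the 0.80 threshold are assumptions for illustration.

```python
def needs_confirmation(results, threshold=0.80):
    """Return the transcripts whose confidence is below the threshold,
    i.e. the segments worth re-confirming with the user."""
    return [r["transcript"] for r in results if r["confidence"] < threshold]

results = [
    {"transcript": "book a table for two", "confidence": 0.96},
    {"transcript": "at seven pm",          "confidence": 0.62},
]
low = needs_confirmation(results)
```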

🎯

Context Hints

Pre-specify domain terminology and proper nouns to improve recognition accuracy.
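A sketch of how context hints might be attached to a streaming STT configuration. The field names (`speech_contexts`, `phrases`, `boost`) mirror the phrase-hint pattern found in speech-adaptation APIs such as Google Cloud Speech-to-Text, but are not a confirmed schema for this API.

```python
def stt_config(language, hints=(), boost=10.0):
    """Build a streaming STT config with optional context phrases
    to bias recognition toward domain terms and proper nouns."""
    cfg = {"language_code": language}
    if hints:
        cfg["speech_contexts"] = [{"phrases": list(hints), "boost": boost}]
    return cfg

# Bias recognition toward service names the user is likely to say:
cfg = stt_config("ko-KR", hints=["loa.bot", "native.how"])
```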

Streaming Pipeline

Real-time Voice Pipeline

The core of real-time voice interaction for AI conversation, voice agents, and more. Targeting total pipeline latency < 1 second.

🎤

User Voice

Mic Input

📡

STT Stream

Real-time Recognition

🧠

LLM Processing

Response Generation

🗣

TTS Stream

Speech Synthesis

🔊

Speaker Output

AI Response

Total Pipeline Target: < 1 second
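The <1 second target can be framed as a per-stage latency budget across the pipeline above. The stage names and millisecond figures below are illustrative assumptions, not measured numbers; the point is that the stages must sum within the end-to-end target.

```python
# Hypothetical per-stage budget (ms) for the STT → LLM → TTS pipeline.
BUDGET_MS = {
    "stt_stream": 200,       # speech recognized as the user finishes speaking
    "llm_first_token": 400,  # time to the LLM's first response token
    "tts_first_chunk": 300,  # time to the first synthesized audio chunk
    "network": 100,          # round trips and transport overhead
}

def within_target(budget, target_ms=1000):
    """Sum the stage budgets and check them against the end-to-end target."""
    total = sum(budget.values())
    return total, total <= target_ms

total, ok = within_target(BUDGET_MS)
```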
API Endpoints

REST API + WebSocket

POST /api/v1/tts
POST /api/v1/tts/stream
POST /api/v1/stt
WS /api/v1/stt/stream
GET /api/v1/voices
GET /api/v1/languages
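A minimal sketch of what a request body for `POST /api/v1/tts` could look like. The field names, defaults, and the voice identifier are assumptions for illustration, not the published request schema.

```python
import json

def tts_request(text, voice="ko-KR-Neural2-A", fmt="mp3", speed=1.0):
    """Build an illustrative JSON body for POST /api/v1/tts.
    Field names and the voice id are assumed, not the actual schema."""
    return json.dumps({
        "text": text,
        "voice": voice,
        "audio_format": fmt,     # e.g. mp3, wav, ogg, flac
        "speaking_rate": speed,  # 1.0 = normal speed
    })

body = tts_request("Hello, world")
parsed = json.loads(body)
```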

Want to learn more about Voice API?

We provide consultation on API integration, custom development, and technology partnerships.

Contact Us