SayBrain is now a speech generation API

Natural speech, shipped through one clean API.

This project has been rebuilt around text-to-speech. It now exposes a focused speech generation endpoint, optional app-key authentication for production, and a browser playground that generates ready-to-use request code.


Default model: gpt-4o-mini-tts
Built-in voices: marin (default), cedar, and others
Formats: mp3, wav, aac, opus, flac, pcm
Max input: 4096 characters

Natural by default

The API is built on OpenAI's expressive gpt-4o-mini-tts model and applies a default delivery prompt tuned for human-sounding reads rather than flat, mechanical ones.

One endpoint

Send JSON to /api/speech and get playable audio bytes back. No queueing layer, transcript schema, or post-processing step is required.

Easy integration

The playground generates request JSON, curl, and browser fetch snippets so frontend and backend teams can wire it in immediately.

Useful controls

Built-in voice selection, audio format control, speed tuning, style presets, optional auth, and CORS support for production deployments.

API Surface

Focused, deployable speech generation

The route is intentionally small: one POST endpoint that turns text into audio. It returns the audio bytes directly, so you can stream, download, or play the result in a browser without first unwrapping it from an intermediate format.
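A minimal Node sketch of the same call outside the browser, assuming Node 18 or later (global `fetch`). The function name `fetchSpeech`, its options object, and the injectable `fetchImpl` parameter are illustrative, not part of the project:

```javascript
// Minimal sketch: POST text to /api/speech and return the raw audio bytes.
// Assumes Node 18+ (global fetch). `fetchSpeech` and `fetchImpl` are hypothetical names;
// `fetchImpl` lets the function run against a stub instead of the network.
async function fetchSpeech(baseUrl, payload, { apiKey, fetchImpl = fetch } = {}) {
  const headers = { "Content-Type": "application/json" };
  if (apiKey) {
    // Only required when the server sets SPEECH_API_KEY.
    headers.Authorization = `Bearer ${apiKey}`;
  }
  const response = await fetchImpl(new URL("/api/speech", baseUrl), {
    method: "POST",
    headers,
    body: JSON.stringify(payload),
  });
  if (!response.ok) {
    // Surface the HTTP status and whatever error text the server returned.
    throw new Error(`Speech generation failed (${response.status}): ${await response.text()}`);
  }
  // The body is the audio itself; keep the bytes plus the reported content type.
  const bytes = new Uint8Array(await response.arrayBuffer());
  return { bytes, contentType: response.headers.get("content-type") };
}
```

From there, `require("node:fs").writeFileSync("speech.mp3", bytes)` would save the result to disk. Returning a `Uint8Array` rather than a Blob keeps the helper usable in both Node and browsers.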

Endpoint

POST /api/speech

Request fields

`text` (required): the input to speak, up to 4096 characters.
`voice`: a built-in voice such as `marin` or `cedar`, or a custom `voice_...` id.
`format`: `mp3`, `wav`, `aac`, `opus`, `flac`, or `pcm`.
`speed`: a number from `0.25` to `4`, where `1` is normal speed.
`style`: delivery preset, one of `natural`, `conversational`, `presenter`, `storyteller`, `support`, or `calm`.
`instructions`: optional extra delivery guidance layered on top of the style preset.
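Because these limits are spelled out, a client can reject bad input before it reaches the server. A minimal sketch with a hypothetical `validateSpeechRequest` helper; it counts Unicode code points for the length check, which may not match exactly how the server counts characters:

```javascript
// Client-side pre-check mirroring the documented request fields.
// The server remains the authority; this only catches obvious mistakes early.
const SPEECH_FORMATS = ["mp3", "wav", "aac", "opus", "flac", "pcm"];
const SPEECH_STYLES = ["natural", "conversational", "presenter", "storyteller", "support", "calm"];
const MAX_TEXT_CHARS = 4096;

function validateSpeechRequest(body = {}) {
  const errors = [];
  if (typeof body.text !== "string" || body.text.trim() === "") {
    errors.push("text is required");
  } else if ([...body.text].length > MAX_TEXT_CHARS) {
    // Spread counts code points; the server's own counting rule is an assumption.
    errors.push(`text exceeds ${MAX_TEXT_CHARS} characters`);
  }
  if (body.voice !== undefined && (typeof body.voice !== "string" || body.voice === "")) {
    errors.push("voice must be a non-empty string");
  }
  if (body.format !== undefined && !SPEECH_FORMATS.includes(body.format)) {
    errors.push(`format must be one of ${SPEECH_FORMATS.join(", ")}`);
  }
  // NaN and non-numbers fail the range comparison and are reported.
  if (body.speed !== undefined && !(typeof body.speed === "number" && body.speed >= 0.25 && body.speed <= 4)) {
    errors.push("speed must be a number from 0.25 to 4");
  }
  if (body.style !== undefined && !SPEECH_STYLES.includes(body.style)) {
    errors.push(`style must be one of ${SPEECH_STYLES.join(", ")}`);
  }
  if (body.instructions !== undefined && typeof body.instructions !== "string") {
    errors.push("instructions must be a string");
  }
  return errors;
}
```

An empty array means the payload passed every check; otherwise each entry names one problem, which is convenient for showing inline form errors.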

Production options

Optional app key

Set SPEECH_API_KEY to protect the endpoint without exposing your OpenAI key to clients.

Controlled CORS

Set CORS_ALLOW_ORIGIN when your frontend is hosted elsewhere.
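One way a server could turn CORS_ALLOW_ORIGIN into response headers, sketched as a hypothetical `corsHeaders` helper. It assumes the variable holds either a single origin or `*`, and that `Authorization` and `x-api-key` must be allowed for authenticated browser calls; the project's actual handling may differ:

```javascript
// Hypothetical helper: build CORS response headers from CORS_ALLOW_ORIGIN.
// `allowOrigin` is the configured value; `requestOrigin` is the request's Origin header.
function corsHeaders(allowOrigin, requestOrigin) {
  // Unset: emit nothing, so cross-origin browser scripts cannot read responses.
  if (!allowOrigin) return {};
  // A specific origin must match the request's Origin header exactly.
  if (allowOrigin !== "*" && allowOrigin !== requestOrigin) return {};
  const headers = {
    "Access-Control-Allow-Origin": allowOrigin,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type, Authorization, x-api-key",
  };
  // Tell caches the response depends on Origin when it is not a wildcard.
  if (allowOrigin !== "*") headers["Vary"] = "Origin";
  return headers;
}
```

Because requests carry a JSON body and an auth header, browsers send an OPTIONS preflight first, so the same headers belong on that OPTIONS response too. Cross-origin scripts can read `Content-Type` by default, but any custom metadata headers would also need to be listed in `Access-Control-Expose-Headers`.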

Binary response

The response body is the raw audio. The Content-Type header identifies the audio format, and additional speech metadata is returned in response headers.
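When saving the audio rather than playing it, the Content-Type header can decide the file extension. A hypothetical sketch; the MIME strings below are common conventions, not values confirmed for this server:

```javascript
// Hypothetical mapping from a response Content-Type to a file extension for downloads.
// These MIME strings are assumptions; check what your deployment actually sends.
const EXTENSION_BY_MIME = {
  "audio/mpeg": "mp3",
  "audio/wav": "wav",
  "audio/x-wav": "wav",
  "audio/aac": "aac",
  "audio/opus": "opus",
  "audio/flac": "flac",
  "audio/pcm": "pcm",
};

function extensionForContentType(contentType, fallback = "bin") {
  if (!contentType) return fallback;
  // Drop parameters such as "; rate=24000" and normalize case before the lookup.
  const mime = contentType.split(";")[0].trim().toLowerCase();
  return EXTENSION_BY_MIME[mime] ?? fallback;
}
```

Falling back to a neutral `bin` extension avoids mislabeling a file when the server reports a type the table does not know.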

Natural defaults

The server defaults to the marin voice and a delivery prompt tuned for human pacing instead of flat, synthetic delivery.
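How the defaults and presets might combine on the server, as a hypothetical `resolveSpeechOptions` sketch. The output is shaped like the body of OpenAI's `/v1/audio/speech` request (`model`, `input`, `voice`, `response_format`, `speed`, `instructions`); the preset sentences, the `mp3` default format, and the fallback for unknown styles are invented or assumed for illustration:

```javascript
// Hypothetical presets: the real prompt wording is not published on this page.
const STYLE_PRESETS = {
  natural: "Speak naturally with relaxed, human pacing.",
  conversational: "Sound like a friendly, casual conversation.",
  presenter: "Deliver clearly and confidently, like a presenter.",
  storyteller: "Read expressively, like telling a story.",
  support: "Sound patient, warm, and helpful.",
  calm: "Speak slowly and soothingly.",
};

function resolveSpeechOptions(body) {
  const style = body.style ?? "natural";
  // Assumption: unknown styles fall back to "natural" (a real server might reject them).
  const preset = STYLE_PRESETS[style] ?? STYLE_PRESETS.natural;
  const extra = typeof body.instructions === "string" ? body.instructions.trim() : "";
  return {
    model: "gpt-4o-mini-tts",
    input: body.text,
    voice: body.voice ?? "marin",
    response_format: body.format ?? "mp3", // assumed default; the page only shows mp3 in examples
    speed: body.speed ?? 1,
    // Extra guidance is appended after the preset, so it refines rather than replaces it.
    instructions: extra ? `${preset}\n${extra}` : preset,
  };
}
```

Appending `instructions` after the preset, instead of replacing it, follows the field description: extra guidance layered on top of the preset.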

Playground

Generate natural speech

Model: gpt-4o-mini-tts

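The form exposes the same controls as the API: input text (with a live count against the 4096-character limit), voice, format, speed (default 1.00x), delivery preset, and extra instructions.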

Browser playback

The API returns raw audio bytes. This demo converts the response into a Blob and plays it locally.
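The Blob approach works for container formats such as mp3 or wav, but `pcm` output has no header, so an Audio element generally cannot play it directly. Assuming the server passes through OpenAI's raw PCM (24 kHz, 16-bit signed little-endian, mono; verify this for your deployment), a hypothetical `pcmToWav` helper can prepend a standard 44-byte WAV header:

```javascript
// Wrap raw 16-bit little-endian PCM samples in a minimal 44-byte WAV (RIFF) header.
// The 24 kHz mono defaults are assumptions about the server's pcm output.
function pcmToWav(pcmBytes, sampleRate = 24000, channels = 1) {
  const bitsPerSample = 16;
  const blockAlign = channels * (bitsPerSample / 8);
  const byteRate = sampleRate * blockAlign;
  const buffer = new ArrayBuffer(44 + pcmBytes.length);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcmBytes.length, true); // RIFF chunk size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcmBytes.length, true); // data chunk size
  new Uint8Array(buffer, 44).set(pcmBytes); // copy the samples after the header
  return new Uint8Array(buffer);
}
```

In the browser, `new Blob([pcmToWav(new Uint8Array(await response.arrayBuffer()))], { type: "audio/wav" })` can then follow the same `URL.createObjectURL` playback path.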

Request

Integration-ready payload

{
  "text": "Hello, this is the SayBrain Speech API. I'll read your text aloud in a clear, natural, human-like voice.",
  "voice": "marin",
  "format": "mp3",
  "speed": 1,
  "style": "natural"
}

Code

Copy and integrate

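// Run inside an ES module or async function (top-level await) in the browser.
const response = await fetch("https://your-domain.com/api/speech", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    // Only required when the server sets SPEECH_API_KEY.
    "Authorization": "Bearer YOUR_APP_API_KEY",
  },
  body: JSON.stringify({
    text: "Hello, this is the SayBrain Speech API. I'll read your text aloud in a clear, natural, human-like voice.",
    voice: "marin",
    format: "mp3",
    speed: 1,
    style: "natural",
  }),
});

if (!response.ok) {
  // Include the status and any error body the server sent back.
  const detail = await response.text();
  throw new Error(`Speech generation failed (${response.status}): ${detail}`);
}

// The body is raw audio bytes; expose them to an audio element via a Blob URL.
const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
// Free the Blob URL once playback has finished.
audio.addEventListener("ended", () => URL.revokeObjectURL(audioUrl));
// play() returns a promise that rejects if the browser blocks autoplay.
await audio.play();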

Optional auth

When SPEECH_API_KEY is set, every request must send the key, either as an Authorization: Bearer <key> header or as an x-api-key header.
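A sketch of the check a server could run for this rule, accepting either header form. `extractApiKey`, `safeEqual`, and `isAuthorized` are hypothetical names, and the headers object is assumed to use lower-cased keys, as Node's `http` module provides:

```javascript
// Pull the key from "Authorization: Bearer <key>" or, failing that, "x-api-key: <key>".
function extractApiKey(headers) {
  const auth = headers["authorization"];
  if (typeof auth === "string") {
    const match = /^Bearer\s+(.+)$/i.exec(auth.trim());
    if (match) return match[1].trim();
  }
  const xKey = headers["x-api-key"];
  return typeof xKey === "string" ? xKey.trim() : null;
}

// Compares every character so timing does not reveal where the first mismatch is
// (the length check still reveals whether lengths differ).
function safeEqual(a, b) {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

function isAuthorized(headers, expectedKey) {
  if (!expectedKey) return true; // SPEECH_API_KEY unset: the endpoint stays open.
  const provided = extractApiKey(headers);
  return provided !== null && safeEqual(provided, expectedKey);
}
```

In Node, `crypto.timingSafeEqual` on equal-length Buffers is the standard-library alternative to the hand-written `safeEqual`. Returning `true` when no key is configured reflects the page's description of auth as optional.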

Cross-origin browser use

Set CORS_ALLOW_ORIGIN when the client runs on a different domain.