Overview
When someone calls your 46elks number, 46elks opens a WebSocket connection to your server. Your server receives audio from the caller, forwards it to an AI model, and streams the model's response back — in real time.
Caller ↔ 46elks ↔ Your server (WebSocket) ↔ AI model
Prerequisites
- A 46elks account. Don't have one? Sign up here.
- A voice-enabled 46elks phone number. Allocate one in your dashboard.
- A free 46elks websocket-number. Allocate one in your dashboard.
- A public server with an open TCP port. Any VPS will do; the server must be reachable from the internet.
- Python 3.10+ with `websockets` and `python-dotenv`.
1. Point your numbers at your server
In the 46elks dashboard, set the virtual number's `voice_start` to `{"connect":"YOUR-WEBSOCKET-NUMBER"}`.
Then set the websocket-number's `voice_start` to a WebSocket URL pointing at your server:

`ws://YOUR-SERVER-IP:8095`
When someone calls, 46elks will connect to this address. If the connection fails (server down, port closed, wrong URL), the caller hears a busy tone.
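If you prefer to script this step instead of clicking through the dashboard, the number can also be updated over the 46elks REST API. A minimal sketch, assuming basic-auth API credentials from your dashboard; the number ID and phone number below are placeholders:

```python
# Sketch: point a 46elks number at a target by updating voice_start
# over the REST API. Credentials and the number ID are placeholders --
# substitute your own from the 46elks dashboard.
import base64
import json
import urllib.parse
import urllib.request

def voice_start_payload(target: str) -> str:
    """Build the voice_start value that connects incoming calls to `target`."""
    return json.dumps({"connect": target})

def update_number(number_id: str, voice_start: str, user: str, password: str):
    """POST the new voice_start to the 46elks numbers endpoint."""
    data = urllib.parse.urlencode({"voice_start": voice_start}).encode()
    req = urllib.request.Request(
        f"https://api.46elks.com/a1/numbers/{number_id}", data=data
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return urllib.request.urlopen(req)

# update_number("nXXXXXXXXXXXXXXXXXXXXXXXXX",
#               voice_start_payload("+46700000000"), API_USER, API_PASS)
```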
2. Understand the message protocol
All messages are JSON. The field `t` identifies the message type. The session follows a strict lifecycle:

1. 46elks sends `hello`: the call has started.
2. Your server declares audio formats with `sending` and `listening`.
3. Audio streams bidirectionally via `audio` messages.
4. Either side sends `bye` to end the call.
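The lifecycle above can be sketched as a minimal `websockets` handler; format choice and structure are illustrative, not production code:

```python
# Minimal sketch of the 46elks session lifecycle.
import json

async def lifecycle(ws):
    # 1. First message from 46elks announces the call
    hello = json.loads(await ws.recv())
    assert hello["t"] == "hello"
    # 2. Declare both audio directions before any audio flows
    await ws.send(json.dumps({"t": "sending", "format": "pcm_24000"}))
    await ws.send(json.dumps({"t": "listening", "format": "pcm_24000"}))
    # 3./4. Stream audio until either side says bye
    async for raw in ws:
        msg = json.loads(raw)
        if msg["t"] == "audio":
            pass  # feed msg["data"] to your model; send model audio back
        elif msg["t"] == "bye":
            break
```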
3. Message reference
From 46elks → your server
| Type | Fields | Description |
|---|---|---|
| `hello` | `callid`, `from`, `to` | Call started. |
| `audio` | `data` (base64) | Audio from the caller. |
| `sync` | — | Buffer checkpoint acknowledgment. |
| `bye` | `reason` (`done` / `hangup` / `error`) | Call ended. |
From your server → 46elks
| Type | Fields | Description |
|---|---|---|
| `sending` | `format` | Declare outbound audio format (agent → caller). |
| `listening` | `format` | Declare inbound audio format (caller → agent). |
| `audio` | `data` (base64) | Audio to play to the caller. |
| `interrupt` | — | Clear the playback buffer. Send `sending` again before resuming. |
| `sync` | — | Request a buffer checkpoint. |
| `bye` | — | End the call. |
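One consequence of the `interrupt` semantics is worth wrapping in a helper: after clearing the buffer you must re-declare the outbound format before sending more audio. A sketch for barge-in handling (the function name is ours):

```python
import json

async def barge_in(elks_ws, codec: str = "pcm_24000"):
    """Stop current playback so the caller can interrupt the agent."""
    # Flush whatever audio 46elks still has queued for playback
    await elks_ws.send(json.dumps({"t": "interrupt"}))
    # interrupt drops the declared outbound format, so re-declare it
    # before any new audio is sent
    await elks_ws.send(json.dumps({"t": "sending", "format": codec}))
```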
Supported audio formats
| Format string | Description |
|---|---|
| `pcm_8000` / `pcm_16000` / `pcm_24000` | 16-bit PCM, mono. |
| `alaw` / `ulaw` | G.711, 8 kHz. |
| `g722` | Wideband, 16 kHz. |
| `ogg` | Opus in an Ogg container. |
| `wav` / `mp3` | Outbound (`sending`) only. |
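For the raw PCM formats, chunk size maps directly to playback time: 16-bit mono means 2 bytes per sample. A small helper for sanity-checking `audio` messages (the function name is ours):

```python
import base64

def pcm_chunk_ms(b64_data: str, sample_rate: int = 24000) -> float:
    """Playback duration of a base64-encoded pcm_* chunk (16-bit mono)."""
    raw = base64.b64decode(b64_data)
    samples = len(raw) // 2  # 2 bytes per 16-bit sample
    return samples * 1000 / sample_rate

# e.g. 960 raw bytes of pcm_24000 = 480 samples = 20 ms of audio
```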
4. Prompt for Claude or Lovable
Paste the prompt below into Claude, Lovable, or any AI assistant to generate a custom voice agent. Fill in your assistant's task at the bottom.
Build a Python WebSocket server that accepts incoming phone calls from 46elks and bridges audio to OpenAI Realtime API.
46elks WebSocket protocol (use these exact field names):
- All messages are JSON with field `t` (not `type`) as the message type.
- First message from 46elks: `{"t": "hello", "callid": "…", "from": "…", "to": "…"}`
- Declare outbound format: `{"t": "sending", "format": "pcm_24000"}`
- Declare inbound format: `{"t": "listening", "format": "pcm_24000"}`
- Audio from 46elks: `{"t": "audio", "data": "<base64>"}`
- Send audio to 46elks: `{"t": "audio", "data": "<base64>"}`
- Call ended: `{"t": "bye", "reason": "hangup"}`
OpenAI Realtime API:
- Model: `gpt-4o-realtime-preview`
- Header: `OpenAI-Beta: realtime=v1`
- Use `input_audio_format: "pcm16"` and `output_audio_format: "pcm16"`; these match `pcm_24000`.
- Forward caller audio via `input_audio_buffer.append`.
- Play AI responses when `response.audio.delta` arrives.
The assistant's task: [describe what your assistant should do]
Listen on port 8095. Use `websockets` and `python-dotenv`.
5. Example: bridge to OpenAI Realtime
The example below creates a WebSocket server that accepts calls from 46elks and bridges audio to the OpenAI Realtime API. Use `pcm_24000` on the 46elks side; it matches OpenAI's PCM16 format at 24 kHz.
```python
#!/usr/bin/env python3
# pip install websockets python-dotenv
import asyncio
import json
import logging
import os
import sys

import websockets
from websockets.asyncio.client import connect as ws_connect
from dotenv import load_dotenv

load_dotenv()
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
OPENAI_MODEL = "gpt-4o-realtime-preview"
CODEC = "pcm_24000"

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

INSTRUCTIONS = "You are a helpful voice assistant. Keep answers short and clear."


async def handle_call(elks_ws):
    # 1. Receive hello from 46elks
    data = json.loads(await elks_ws.recv())
    if data.get("t") != "hello":
        log.error("Expected hello, got: %s", data)
        return
    call_id = data.get("callid", "?")
    caller = data.get("from", "?")
    log.info("Call %s from %s", call_id, caller)

    # 2. Connect to OpenAI Realtime
    async with ws_connect(
        f"wss://api.openai.com/v1/realtime?model={OPENAI_MODEL}",
        additional_headers={
            "Authorization": f"Bearer {OPENAI_API_KEY}",
            "OpenAI-Beta": "realtime=v1",
        },
    ) as openai_ws:
        # Configure session
        await openai_ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "instructions": INSTRUCTIONS,
                "voice": "shimmer",
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.9,
                    "silence_duration_ms": 800,
                },
            },
        }))

        # Send greeting
        await openai_ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the caller briefly.",
            },
        }))

        # 3. Declare audio formats to 46elks
        await elks_ws.send(json.dumps({"t": "sending", "format": CODEC}))
        await elks_ws.send(json.dumps({"t": "listening", "format": CODEC}))

        is_speaking = False

        # 4a. Caller audio -> OpenAI
        async def elks_to_openai():
            async for message in elks_ws:
                msg = json.loads(message)
                if msg.get("t") == "audio" and not is_speaking:
                    await openai_ws.send(json.dumps({
                        "type": "input_audio_buffer.append",
                        "audio": msg["data"],
                    }))
                elif msg.get("t") == "bye":
                    log.info("Call ended: %s", msg.get("reason"))
                    break

        # 4b. OpenAI audio -> caller
        async def openai_to_elks():
            nonlocal is_speaking
            async for message in openai_ws:
                msg = json.loads(message)
                t = msg.get("type")
                if t == "response.created":
                    is_speaking = True
                elif t == "response.done":
                    is_speaking = False
                elif t == "response.audio.delta":
                    await elks_ws.send(json.dumps({
                        "t": "audio", "data": msg["delta"],
                    }))

        await asyncio.gather(elks_to_openai(), openai_to_elks())


async def main():
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 8095
    log.info("Listening on port %d", port)
    async with websockets.serve(handle_call, "0.0.0.0", port):
        await asyncio.Future()  # run forever


asyncio.run(main())
```
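Before dialing a real phone, you can exercise the server with a fake 46elks client. A sketch with made-up call metadata (the helper names and values are ours):

```python
# Sketch: pretend to be 46elks -- connect, send hello plus a short
# burst of silence, and print whatever the server replies.
import asyncio
import base64
import json

def hello_msg(callid="test-1", frm="+46700000000", to="+46700000001"):
    """A fabricated hello message in the 46elks wire format."""
    return json.dumps({"t": "hello", "callid": callid, "from": frm, "to": to})

def silence_chunk(ms: int = 20, rate: int = 24000) -> str:
    """`ms` milliseconds of 16-bit mono silence, base64-encoded."""
    return base64.b64encode(b"\x00" * (2 * rate * ms // 1000)).decode()

async def fake_call(url: str = "ws://localhost:8095"):
    from websockets.asyncio.client import connect  # pip install websockets
    async with connect(url) as ws:
        await ws.send(hello_msg())
        await ws.send(json.dumps({"t": "audio", "data": silence_chunk()}))
        async for raw in ws:
            print("server sent:", json.loads(raw).get("t"))

# asyncio.run(fake_call())
```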
6. Run as a systemd service
Create `/etc/systemd/system/voice-agent.service`:

```ini
[Unit]
Description=46elks Voice Agent
After=network.target

[Service]
Type=simple
User=www-data
WorkingDirectory=/var/www/apps/voice-agent
ExecStart=/var/www/apps/voice-agent/venv/bin/python voice_agent.py 8095
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```
Then enable and start it:
```shell
systemctl daemon-reload
systemctl enable --now voice-agent
journalctl -u voice-agent -f   # follow logs
```
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Busy tone immediately | Server down, port not open, or wrong URL in dashboard. | Check `systemctl status voice-agent` and open the port in your firewall. |
| Call connects then drops within seconds | Wrong OpenAI model name or invalid API key. | Use `gpt-4o-realtime-preview` exactly, and verify `OPENAI_API_KEY`. |
| No audio heard | `sending` / `listening` not sent, or sent before the OpenAI session is ready. | Send both declarations after the OpenAI session is configured. |
| Echo / AI interrupts itself | Caller audio fed back to OpenAI while the AI is speaking. | Track an `is_speaking` flag and skip `input_audio_buffer.append` while it's true. |
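To rule out the first row's causes from any other machine, a plain TCP probe is enough (host and port are placeholders; this checks reachability only, not the WebSocket handshake):

```python
# Quick reachability probe: can we open a TCP connection to the
# agent's port at all?
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# print(port_open("YOUR-SERVER-IP", 8095))
```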
Pre-launch checklist
- ☐ Port open in server firewall.
- ☐ 46elks virtual number `voice_start` set to `{"connect":"YOUR-WEBSOCKET-NUMBER"}`.
- ☐ 46elks websocket-number `voice_start` set to `ws://YOUR-IP:PORT`.
- ☐ `OPENAI_API_KEY` set in the environment.
- ☐ Model name is `gpt-4o-realtime-preview`.
- ☐ Service running with `Restart=always`.