Audio Processing Models
We provide several popular audio processing models for scenarios such as TTS (Text To Speech) and ASR (Automatic Speech Recognition), also known as STT (Speech To Text).
Before you start
Before you get started, you need three parameters: SERVICE_ID, API_KEY, and MODEL. You can find them on our dashboard.
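One way to keep these three values out of your source code is to read them from environment variables. The following is a minimal sketch, not part of the service itself: the helper name `load_settings` and the choice of environment variable names are assumptions, and the base URL pattern is copied from the usage examples on this page.

```python
import os


def load_settings(env=os.environ):
    """Collect SERVICE_ID, API_KEY and MODEL from a mapping (hypothetical helper)."""
    required = ("SERVICE_ID", "API_KEY", "MODEL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return {
        # URL pattern as used in the usage examples on this page.
        "base_url": f"https://modelapi.holmesai.xyz/{env['SERVICE_ID']}/v1",
        "api_key": env["API_KEY"],
        "model": env["MODEL"],
    }
```

The returned `base_url` and `api_key` can then be passed to `openai.OpenAI(...)` instead of hard-coded strings.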
API reference
TTS
Supported models
- fish-speech-1.4
/v1/audio/speech
Request Parameters
Parameter name | Type | Description | Required |
---|---|---|---|
model | String | Model to use. For TTS the supported model is fish-speech-1.4. | Yes |
input | String | The text to generate audio for. Maximum length is 4096. | Yes |
response_format | String | Audio format: mp3, wav, or pcm. Default: wav. | No |
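The constraints in the table above can be checked on the client before a request is sent. This is a minimal sketch under stated assumptions: `build_speech_request` is a hypothetical helper, and the 4096 limit is assumed to be counted in characters, which the table does not specify.

```python
MAX_INPUT_LENGTH = 4096  # limit from the table; assumed to mean characters
AUDIO_FORMATS = {"mp3", "wav", "pcm"}


def build_speech_request(model, text, response_format="wav"):
    """Return a JSON-ready body for /v1/audio/speech (hypothetical helper)."""
    if len(text) > MAX_INPUT_LENGTH:
        raise ValueError(f"input is {len(text)} long, maximum is {MAX_INPUT_LENGTH}")
    if response_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    return {"model": model, "input": text, "response_format": response_format}
```

Failing early like this avoids a round trip for requests that the table already rules out.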
Response Elements
Parameter name | Type | Description |
---|---|---|
(response body) | Binary | Audio file content |
ASR
Supported models
- fish-speech-1.4
- Whisper-large-v3
- Whisper-large-v3-turbo
/v1/audio/transcriptions
Request Parameters
Parameter name | Type | Description | Required |
---|---|---|---|
model | String | model type: fish-speech-1.4, Whisper-large-v3, Whisper-large-v3-turbo | Yes |
file | File | The audio file object to transcribe. It must be in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm. | Yes |
language | String | The language of the audio file, as an ISO-639-1 code (for example, en). | Yes |
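The file format and language rules in the table above can be pre-checked locally. The sketch below is illustrative only: `check_transcription_input` is a hypothetical helper, the format is inferred from the file extension rather than the file contents, `mpga` is assumed to be the MPEG audio extension the table refers to, and the language check only verifies the two-letter shape of an ISO-639-1 code, not that the code is actually assigned.

```python
import re
from pathlib import Path

AUDIO_EXTENSIONS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def check_transcription_input(filename, language):
    """Validate a file name and language code before upload (hypothetical helper)."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in AUDIO_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {extension or '(none)'}")
    # ISO-639-1 codes are two lowercase letters; this checks the shape only.
    if not re.fullmatch(r"[a-z]{2}", language):
        raise ValueError(f"not an ISO-639-1 style code: {language!r}")
    return extension, language
```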
Response Elements
Parameter name | Type | Description |
---|---|---|
text | String | The transcribed text. |
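If you call the endpoint without an SDK, the `text` element has to be read from the raw response yourself. The following sketch assumes the response body is a JSON object with a string `text` field, which this page implies but does not state explicitly; `extract_text` is a hypothetical helper.

```python
import json


def extract_text(body):
    """Return the `text` field from a raw JSON response body (hypothetical helper)."""
    data = json.loads(body)
    if not isinstance(data, dict) or not isinstance(data.get("text"), str):
        raise ValueError("response has no string 'text' field")
    return data["text"]
```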
Usage
python
/v1/audio/speech
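from pathlib import Path

import openai

# Replace $SERVICE_ID, $API_KEY and $MODEL with the values from the dashboard.
client = openai.OpenAI(
    base_url="https://modelapi.holmesai.xyz/$SERVICE_ID/v1",
    api_key="$API_KEY",
)

output_file_path = Path(__file__).parent / "output.wav"

# Stream the generated audio straight to disk.
with client.audio.speech.with_streaming_response.create(
    model="$MODEL",
    input="The quick brown fox jumped over the lazy dog.",
    # The openai SDK requires a voice argument; "default" is a placeholder,
    # since this page does not document which values the service accepts.
    voice="default",
    response_format="wav",
) as response:
    response.stream_to_file(output_file_path)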
/v1/audio/transcriptions
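import openai

# Replace $SERVICE_ID, $API_KEY and $MODEL with the values from the dashboard.
client = openai.OpenAI(
    base_url="https://modelapi.holmesai.xyz/$SERVICE_ID/v1",
    api_key="$API_KEY",
)

# Open the audio in binary mode; the file is closed when the block exits.
with open("input.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="$MODEL",
        file=audio_file,
        language="en",  # ISO-639-1 code of the spoken language
    )

print(transcript.text)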
curl
/v1/audio/speech
curl -v https://modelapi.holmesai.xyz/$SERVICE_ID/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"input\": \"The quick brown fox jumped over the lazy dog.\"}" \
  --output output.wav
/v1/audio/transcriptions
curl -v https://modelapi.holmesai.xyz/$SERVICE_ID/v1/audio/transcriptions \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: multipart/form-data" \
-F "file=@input.wav" \
-F "metadata={\"model\":\"$MODEL\"}"