
Media Server Deployment and Configuration Guide (With Azure Transcription Support)

Prerequisites

Software Requirements

Item | Recommended | Installation guide
Operating System | Debian 12 | -
FQDN mapped to server IP address | - | -

Hardware Requirements

Item | Minimum
RAM | 16 GB
Disk space | 150 GB
CPU | 8 cores

Port Utilization Requirements

The following ports must be open on the server for the voice connector to function.

Firewall Ports/Port Range | Network Protocol | Description
5060:5091 | udp | Used for SIP signaling.
5060:5091 | tcp | Used for SIP signaling.
8021 | tcp | Media Server Event Socket
16384:32768 | udp | Used for audio/video data in SIP, WSS, and other protocols
7443 | tcp | Used for WebRTC
8115 | tcp | Voice Connector API
5432 | tcp | PostgreSQL database
3000 | tcp | Outbound Dialer API
22 | tcp | SSH
80 | tcp | HTTP
443 | tcp | HTTPS
1194 | udp | OpenVPN

The ports can be opened as follows:

  1. SSH into the Debian server.

    1. Use the command

      CODE
      ssh username@server-ip
    2. Enter the user password.

    3. Use the command

      CODE
      su
    4. Enter the root password.

  2. Run the following command for each entry in the table above:

    • CODE
      sudo iptables -A INPUT -p PROTOCOL -m PROTOCOL --dport PORT -j ACCEPT
    • Here PORT is the required firewall port or port range and PROTOCOL is the associated network protocol (tcp or udp).

  3. Persist this port configuration so the rules survive a reboot. Note that iptables-save on its own only prints the rules to standard output; with the iptables-persistent package installed, redirect its output to the rules file:

    CODE
    sudo iptables-save > /etc/iptables/rules.v4
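
For example, opening the SIP signaling range and the WebRTC port from the table above looks like this (a sketch; repeat for each port your deployment needs):

    CODE
    # SIP signaling over UDP and TCP (ranges use start:end syntax)
    sudo iptables -A INPUT -p udp -m udp --dport 5060:5091 -j ACCEPT
    sudo iptables -A INPUT -p tcp -m tcp --dport 5060:5091 -j ACCEPT
    # WebRTC
    sudo iptables -A INPUT -p tcp -m tcp --dport 7443 -j ACCEPT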

Additional Firewall Rules

Apply the following rules as well. They keep loopback, established, and ping traffic flowing, and tighten the default policies last, once the ACCEPT rules above are in place:

  CODE
  # Allow all traffic on the loopback interface
  iptables -A INPUT -i lo -j ACCEPT
  # Allow replies to connections this server initiated
  iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
  # Allow incoming ping
  iptables -A INPUT -p icmp --icmp-type echo-request -j ACCEPT
  # Default policies: drop unmatched inbound/forwarded traffic, allow outbound
  iptables -P INPUT DROP
  iptables -P FORWARD DROP
  iptables -P OUTPUT ACCEPT

Set Up Azure Speech Service

  • Create a Microsoft Azure Account.

  • Create an Azure Speech resource.

  • Create a billing account to use with the speech resource.

  • Note the Subscription Key, Region, and API Endpoint for this resource.

  • For the following APIs, use Authentication type API Key, with the Key name set to Ocp-Apim-Subscription-Key, the Value set to the Subscription Key above, and the key added to the request headers.

  • Obtain a list of models from the endpoint using the following API:

    • GET <endpoint>/speechtotext/v3.2/models/base

  • Choose a model whose properties.deprecationDates.transcriptionDateTime date has not yet been reached.

  • The language code for each model is specified in the locale field.

  • The properties.features.supportsTranscriptions field must be true.

  • Example API response (shortened for brevity):

    CODE
    {
      "self": "<endpoint>/speechtotext/v3.2/models/base/model-id",
      "properties": {
        "deprecationDates": {
          "adaptationDateTime": "2025-01-15T00:00:00Z",
          "transcriptionDateTime": "2026-01-15T00:00:00Z"
        },
        "features": {
          "supportsTranscriptions": true,
          "supportsEndpoints": true,
          "supportsTranscriptionsOnSpeechContainers": false,
          "supportsAdaptationsWith": [
            "Language",
            "Acoustic",
            "AudioFiles",
            "Pronunciation",
            "OutputFormatting"
          ],
          "supportedOutputFormats": [
            "Display",
            "Lexical"
          ]
        }
      },
      "lastActionDateTime": "2024-01-18T15:05:45Z",
      "createdDateTime": "2024-01-18T15:04:17Z",
      "locale": "en-US",
      "description": "en-US base model"
    }
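  • Such a request can be made with curl, for example (a sketch; <subscription-key> and <endpoint> are the values noted for the Speech resource above):

    CODE
    curl -s -H "Ocp-Apim-Subscription-Key: <subscription-key>" \
      "<endpoint>/speechtotext/v3.2/models/base"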
  • Note the self field of that model.

  • Use this model’s self and locale values to create a new speech endpoint using the following API:

    • POST <endpoint>/speechtotext/v3.2-preview.2/endpoints

    • CODE
      {
        "model": {
          "self": "<model-self>"
        },
        "properties": {
          "loggingEnabled": true
        },
        "locale": "<model-locale>",
        "displayName": "My New Speech Endpoint",
        "description": "This is a speech endpoint"
      }
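    • As a sketch with curl (substituting the values noted above):

      CODE
      curl -s -X POST \
        -H "Ocp-Apim-Subscription-Key: <subscription-key>" \
        -H "Content-Type: application/json" \
        -d '{
              "model": { "self": "<model-self>" },
              "properties": { "loggingEnabled": true },
              "locale": "<model-locale>",
              "displayName": "My New Speech Endpoint",
              "description": "This is a speech endpoint"
            }' \
        "<endpoint>/speechtotext/v3.2-preview.2/endpoints"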
  • The response will contain a list of socket/REST endpoints usable for speech transcription. Pick the links.webSocketConversation field.

    CODE
    {
      "values": [
        {
          "self": "https://northeurope.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/endpoints/e8e25604-a868-454e-9553-37e5cdb0bb7f",
          "model": {
            "self": "https://northeurope.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/models/base/ca751a0b-4b8c-4a80-baa7-ad1a6388488e"
          },
          "links": {
            "logs": "https://northeurope.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/endpoints/e8e25604-a868-454e-9553-37e5cdb0bb7f/files/logs",
            "restInteractive": "https://northeurope.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?cid=e8e25604-a868-454e-9553-37e5cdb0bb7f",
            "restConversation": "https://northeurope.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=e8e25604-a868-454e-9553-37e5cdb0bb7f",
            "restDictation": "https://northeurope.stt.speech.microsoft.com/speech/recognition/dictation/cognitiveservices/v1?cid=e8e25604-a868-454e-9553-37e5cdb0bb7f",
            "webSocketInteractive": "wss://northeurope.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?cid=e8e25604-a868-454e-9553-37e5cdb0bb7f",
            "webSocketConversation": "wss://northeurope.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=e8e25604-a868-454e-9553-37e5cdb0bb7f",
            "webSocketDictation": "wss://northeurope.stt.speech.microsoft.com/speech/recognition/dictation/cognitiveservices/v1?cid=e8e25604-a868-454e-9553-37e5cdb0bb7f"
          },
          "lastActionDateTime": "2024-07-08T06:37:37Z",
          "createdDateTime": "2024-07-08T06:37:03Z",
          "locale": "en-US",
          "displayName": "My New Speech Endpoint",
          "description": "This is a speech endpoint"
        }
      ]
    }
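  • One way to pull out just the links.webSocketConversation value is with jq (a sketch, assuming the response shape shown above):

    CODE
    curl -s -H "Ocp-Apim-Subscription-Key: <subscription-key>" \
      "<endpoint>/speechtotext/v3.2-preview.2/endpoints" \
      | jq -r '.values[0].links.webSocketConversation'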
  • After these steps, we have the Subscription Key, the Azure Speech Service region, and the Azure Speech Service socket endpoint. These will be used in the Media Server configuration below.

Install Media Server

  1. SSH into the Debian server onto which the Media Server will be deployed.

    1. Use the command

      CODE
      ssh username@server-ip
    2. Enter the user password.

    3. Use the command

      CODE
      su
    4. Enter the root password.

  2. Run the following commands:

    • CODE
      sudo apt update
      sudo apt install -y lua-sec certbot lua-socket lua-json lua-dkjson
      sudo apt install -y git
      git clone -b azure_transcribe https://efcx:RecRpsuH34yqp56YRFUb@gitlab.expertflow.com/rtc/media-server-setup.git "/usr/src/fusionpbx-install.sh"
      chmod -R 777 /usr/src/fusionpbx-install.sh
      cd /usr/src/fusionpbx-install.sh/debian && ./install.sh
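    • Optionally, once the installer completes, verify that the core services are running (this assumes the installer registers freeswitch and nginx as systemd services):

      CODE
      systemctl status freeswitch nginx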
  3. Once the installation has finished, the installer prints access details, including the domain name URL and the login username and password.

  4. In a web browser, open the domain name URL and use the provided username and password to log in.

    1. For a successful installation, the Media Server login screen should open.

  5. If the page does not open, go to the command line and run:

    CODE
    systemctl stop apache2
    systemctl restart nginx
  6. Try opening the page from Step 4 again; if it still does not open, reset the server and start the installation again.

  7. In the command line, access the FreeSWITCH command line with the following command:

    CODE
    fs_cli
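
    A quick health check at the FreeSWITCH prompt is the status command, which reports uptime and active sessions:

    CODE
    status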

Configure Global Transcription

  • Log in to the Media Server web interface.

    • Open in a browser: https://IP-addr, where IP-addr is the IP address of the Media Server.

  • Enter the username and password that were shown upon installation and press LOGIN.

  • Select the Variables option from the Advanced tab.

  • Press ADD on the top right.

  • Set the Category to Defaults, the Name to transcription-vendor, and the Value to azure.

  • Save the changes by pressing the SAVE button in the top right corner.

  • Open SIP Status under the Status tab.

  • Press the RELOAD XML button at the top right.
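
  • To confirm the variable is now active, it can be queried from the FreeSWITCH CLI (this assumes the Defaults category provisions the value as a global variable, which is standard FusionPBX behavior):

    CODE
    fs_cli -x "global_getvar transcription-vendor"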

  • Open the Dialplan Manager section under the Dialplan tab.

  • Add a new dialplan by pressing the ADD button at the top.

  • Fill in the form with the following details:

    • Name = azure_transcription

    • Condition 1 = Select destination_number from the list and enter a placeholder number, then click the field to edit it and replace the placeholder with ${transcription-vendor}

      • In the field to the right, enter ^azure$

    • Action 1 = Select the first item from the list

  • Save the form by pressing the SAVE button in the top right corner.

  • Re-open the azure_transcription dialplan.

  • Set the Continue field to True.

  • Set the Order field to 889.

  • Set the Context field to global.

  • Set the Domain field to Global.

  • Delete the second row by checking the box in the Delete column for the second row and pressing SAVE in the top right.

  • Add the following information to this dialplan:

Tag | Type | Data | Group | Order | Enabled
action | set | START_RECOGNIZING_ON_VAD=true | 0 | 10 | true
action | export | START_RECOGNIZING_ON_VAD=true | 0 | 15 | true
action | set | AZURE_SUBSCRIPTION_KEY=<key> | 0 | 20 | true
action | export | AZURE_SUBSCRIPTION_KEY=<key> | 0 | 25 | true
action | set | AZURE_SERVICE_ENDPOINT=<endpoint> | 0 | 30 | true
action | export | AZURE_SERVICE_ENDPOINT=<endpoint> | 0 | 35 | true
action | set | AZURE_REGION=<region> | 0 | 40 | true
action | export | AZURE_REGION=<region> | 0 | 45 | true
action | export | nolocal:execute_on_answer=lua start_transcribe.lua azure <language> ${uuid} | 0 | 50 | true

  • The values of <key>, <endpoint>, and <region> can be obtained from the Set Up Azure Speech Service section above.

  • For <language>, a language tag can be chosen from this list, based on the Azure Speech model used, e.g. en-US.

  • Save the changes by pressing the SAVE button in the top right corner.
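
  • After saving, reload the dialplan so the changes take effect, either with the RELOAD XML button on the SIP Status page used earlier or from the command line:

    CODE
    fs_cli -x "reloadxml"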
