Creating a WebSocket Server and Client for Audio Transmission with React

In this article, we'll see how to create a WebSocket server and client for transmitting audio using Node.js (actually Bun) and React. We'll use the ws library for the WebSocket server; the browser's native WebSocket API covers the client side. The server receives audio data from a streaming client and broadcasts it to all other connected clients, while the client captures audio from the microphone and sends it to the server. I needed this for a small walkie-talkie app to use in my home. I also set up a local DNS server so that all clients connected to my router can use a domain name instead of an IP address.
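The server side only needs express and ws (the frontend is a standard React setup). With Bun, installing them together with their type definitions looks like this:

bun add express ws
bun add -d @types/express @types/ws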

My folder structure is as follows:

.
├── server.ts               # Main server file
├── fe
│   ├── App.tsx             # App component of React
│   ├── Receive.tsx         # Receive component of React
│   └── Stream.tsx          # Stream component of React
└── ...

Let's start with the server:
// server.ts
import express from "express";
import http from "http";
import path from "path";
import wslib from "ws";

const app = express();
const server = http.createServer(app);
const wss = new wslib.Server({ server, path: "/socket" });

// Store connected clients
let clients: wslib.WebSocket[] = [];
// Serve the static frontend files
app.use(express.static("fe/dist"));

wss.on("connection", (ws) => {
  console.log("Client connected");
  clients.push(ws);

  ws.on("message", (message) => {
    // Broadcast the message to all other clients
    clients.forEach((client) => {
      if (client !== ws && client.readyState === wslib.OPEN) {
        client.send(message);
        process.stdout.write(".");
      }
    });
  });

  ws.on("error", (error) => {
    console.error(error);
  });

  ws.on("close", () => {
    console.log("\nClient disconnected");
    clients = clients.filter((client) => client !== ws);
  });
});

// Serve the react-router-dom routes
app.get("*", (req, res) => {
  console.log(req.url, req.hostname);
  return res.sendFile(path.resolve("fe", "dist", "index.html"));
});

const PORT = process.env.PORT || 4096; // fall back to the port Apache proxies to

server.listen(PORT, () => {
  console.log(`Server is listening on port ${PORT}`);
});

In the above code, we create a WebSocket server using the ws library and attach it to the same HTTP server that serves the frontend, under the /socket path. We store the connected clients in the clients array; when a client sends a message, we broadcast it to every other client whose connection is open. We also handle the error and close events so that disconnected clients are removed from the array. You can run the server directly with bun server.ts.
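To sanity-check the broadcast logic before wiring up the frontend, you can connect two clients from a terminal. This is a hypothetical smoke test (the file name and port 4096 are assumptions matching the Apache config later in this article):

// test-broadcast.ts -- hypothetical smoke test; assumes the server runs on port 4096
import WebSocket from "ws";

const receiver = new WebSocket("ws://localhost:4096/socket");
receiver.on("message", (data) =>
  console.log("receiver got", (data as Buffer).length, "bytes")
);

receiver.on("open", () => {
  const sender = new WebSocket("ws://localhost:4096/socket");
  sender.on("open", () => {
    // The server should relay this to the receiver, but not echo it back
    sender.send(new Int16Array([0, 1, 2, 3]).buffer);
  });
});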

There will be two routes in the frontend, named stream and receive. The stream route will capture audio from the microphone and send it to the server, and the receive route will receive audio from the server and play it.
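Those routes live in App.tsx. For context, a minimal version wiring them up might look like this (only a sketch, assuming react-router-dom v6; your actual setup may differ):

// App.tsx -- a minimal routing sketch (assumes react-router-dom v6)
import { BrowserRouter, Link, Route, Routes } from "react-router-dom";
import Receive from "./Receive";
import Stream from "./Stream";

export default function App() {
  return (
    <BrowserRouter>
      <nav className="flex gap-4 justify-center py-4">
        <Link to="/stream">Stream</Link>
        <Link to="/receive">Receive</Link>
      </nav>
      <Routes>
        <Route path="/stream" element={<Stream />} />
        <Route path="/receive" element={<Receive />} />
      </Routes>
    </BrowserRouter>
  );
}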

Of course, since getUserMedia only works in a secure context, the frontend must be served over HTTPS, so we need a certificate the browsers will trust. We can use the mkcert command to create a locally-trusted certificate and put Apache in front of the server as a TLS-terminating reverse proxy.
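Assuming mkcert is installed, the commands look like this (the output file names are passed explicitly so they match the Apache config below):

mkcert -install
mkcert -cert-file yourdomain.com.crt -key-file yourdomain.com.key yourdomain.com

The Apache virtual host then terminates TLS and proxies both HTTP and WebSocket traffic to the Node server on port 4096: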

<VirtualHost *:443>
	ServerName yourdomain.com
	SSLEngine on
	SSLCertificateFile "path/to/certs/yourdomain.com.crt"
	SSLCertificateKeyFile "path/to/certs/yourdomain.com.key"
	ProxyPreserveHost On
	ProxyPass / http://localhost:4096/
	ProxyPassReverse / http://localhost:4096/
	RewriteEngine on
	RewriteCond %{HTTP:Upgrade} websocket [NC]
	RewriteCond %{HTTP:Connection} upgrade [NC]
	RewriteRule ^/?(.*) "ws://localhost:4096/$1" [P,L]
</VirtualHost>

Here the ProxyPass and ProxyPassReverse directives forward ordinary HTTP requests to the Node server, while the RewriteCond/RewriteRule pair detects the Upgrade header and proxies WebSocket requests over ws://. For this to work you need the mod_proxy, mod_proxy_http, mod_proxy_wstunnel, mod_rewrite, and mod_ssl modules enabled, and your /etc/hosts file must point the domain at localhost by adding 127.0.0.1 yourdomain.com.
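On Debian-based systems, for example, enabling the modules looks like this (assuming the stock apache2 tooling):

sudo a2enmod proxy proxy_http proxy_wstunnel rewrite ssl
sudo systemctl restart apache2
echo "127.0.0.1 yourdomain.com" | sudo tee -a /etc/hosts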

Here is the frontend code for the stream route:

// Stream.tsx
import React from "react";
import { getSocketError } from "../hooks/socketEvents";

interface SocketState {
  message: string;
  started: boolean;
  deviceAwake: boolean;
}

export default function Stream() {
  const [state, setState] = React.useState<SocketState>({
    message: "",
    started: false,
    deviceAwake: false,
  });

  const socket = React.useRef<WebSocket | undefined>(undefined);
  const setMessage = (msg: string) => setState((s) => ({ ...s, message: msg }));

  React.useEffect(() => {
    if (state.started) {
      socket.current = new WebSocket("/socket");

      socket.current.onopen = () => {
        setState((s) => ({
          ...s,
          message: "Connected to signaling server",
        }));
        startStreamingAudio(
          socket.current!,
          () => setMessage("Streaming audio"),
          setMessage
        );
      };

      socket.current.onmessage = (message) => {
        console.log("Received message:", message.data);
      };

      socket.current.onclose = (event) => {
        setState((s) => ({
          ...s,
          message: "Disconnected from websocket server: " + getSocketError(event),
        }));
      };

      socket.current.onerror = (error) => {
        // Error events carry no close code; the details arrive with the close event
        setState((s) => ({ ...s, message: "Websocket error" }));
        console.error("WebSocket error:", error);
      };
    } else if (socket.current) {
      socket.current.onopen = null;
      socket.current.close();
      socket.current = undefined;
    }
  }, [state.started]);

  return (
    <div>
      <h3>Stream Audio</h3>
      <div className="flex gap-4 justify-center py-4">
        <button
          disabled={state.started}
          onClick={() => setState((s) => ({ ...s, started: true }))}
        >
          Start
        </button>
        <button
          disabled={!state.started}
          onClick={() => setState((s) => ({ ...s, started: false }))}
        >
          Stop
        </button>
      </div>
      <div className="flex gap-4 justify-center pb-4">
        <button disabled={state.deviceAwake} onClick={keepDeviceAwake}>
          {state.deviceAwake ? "Device doesn't sleep" : "Keep device awake"}
        </button>
      </div>
      <p className="text-center min-h-6">{state.message}</p>
    </div>
  );

  function keepDeviceAwake() {
    if ("wakeLock" in navigator) {
      navigator.wakeLock.request("screen").then((wakeLock) => {
        setState((s) => ({
          ...s,
          message: "Screen Wake Lock active",
          deviceAwake: !wakeLock.released,
        }));
        wakeLock.addEventListener("release", () => {
          console.log("Screen Wake Lock released:", wakeLock.released);
          setState((s) => ({
            ...s,
            message: "Screen Wake Lock released",
            deviceAwake: false,
          }));
        });
      });
    } else {
      setState((s) => ({
        ...s,
        message: "Wake Lock API not supported",
      }));
    }
  }
}

function startStreamingAudio(
  ws: WebSocket,
  cb: () => void = () => {},
  errCb: (msg: string) => void
) {
  navigator.mediaDevices
    .getUserMedia({ audio: true })
    .then((stream) => {
      console.log("Microphone access granted");
      const audioContext = new (window.AudioContext ||
        window.webkitAudioContext)();
      const source = audioContext.createMediaStreamSource(stream);
      const processor = audioContext.createScriptProcessor(4096, 1, 1);

      source.connect(processor);
      processor.connect(audioContext.destination);

      processor.onaudioprocess = (e) => {
        const audioData = e.inputBuffer.getChannelData(0);
        // Convert Float32Array to Int16Array for transmission
        const int16Array = new Int16Array(audioData.length);
        for (let i = 0; i < audioData.length; i++) {
          int16Array[i] = audioData[i] * 0x7fff; // Convert to 16-bit PCM
        }
        if (ws.readyState === ws.OPEN) {
          ws.send(int16Array.buffer);
        }
      };
      cb();
    })
    .catch((err) => {
      errCb("Error accessing microphone");
      console.error("Error accessing microphone:", err);
    });
}
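A note on createScriptProcessor: it is deprecated in favor of AudioWorklet, but it still works in all major browsers and keeps the example short. If you want to modernize this later, the capture side could be replaced with a small worklet along these lines (only a sketch; the pcm-processor.js file name and PCMProcessor class are made up for illustration):

// pcm-processor.js -- runs on the audio rendering thread
class PCMProcessor extends AudioWorkletProcessor {
  process(inputs) {
    const channel = inputs[0][0]; // first channel of the first input
    if (channel) {
      // Post a copy of the 128-sample block to the main thread
      this.port.postMessage(channel.slice());
    }
    return true; // keep the processor alive
  }
}
registerProcessor("pcm-processor", PCMProcessor);

// In startStreamingAudio, instead of createScriptProcessor:
await audioContext.audioWorklet.addModule("/pcm-processor.js");
const worklet = new AudioWorkletNode(audioContext, "pcm-processor");
source.connect(worklet);
worklet.port.onmessage = (e) => {
  // e.data is a Float32Array block; convert to Int16Array and ws.send() as before
};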

Since WebSocket close events only expose a numeric code, we have a helper function getSocketError that maps the code to the human-readable reason from RFC 6455. We also have a helper function keepDeviceAwake that uses the Screen Wake Lock API to stop the phone from sleeping mid-stream. The startStreamingAudio function captures audio from the microphone and sends it to the server. Here's the code for getSocketError:

// hooks/socketEvents.ts
export function getSocketError(event: CloseEvent) {
  let reason: string;
  if (event.code == 1000)
    reason =
      "Normal closure, meaning that the purpose for which the connection was established has been fulfilled.";
  else if (event.code == 1001)
    reason =
      'An endpoint is "going away", such as a server going down or a browser having navigated away from a page.';
  else if (event.code == 1002)
    reason =
      "An endpoint is terminating the connection due to a protocol error";
  else if (event.code == 1003)
    reason =
      "An endpoint is terminating the connection because it has received a type of data it cannot accept (e.g., an endpoint that understands only text data MAY send this if it receives a binary message).";
  else if (event.code == 1004)
    reason = "Reserved. The specific meaning might be defined in the future.";
  else if (event.code == 1005) reason = "No status code was actually present.";
  else if (event.code == 1006)
    reason =
      "The connection was closed abnormally, e.g., without sending or receiving a Close control frame";
  else if (event.code == 1007)
    reason =
      "An endpoint is terminating the connection because it has received data within a message that was not consistent with the type of the message (e.g., non-UTF-8 [https://www.rfc-editor.org/rfc/rfc3629] data within a text message).";
  else if (event.code == 1008)
    reason =
      'An endpoint is terminating the connection because it has received a message that "violates its policy". This reason is given either if there is no other suitable reason, or if there is a need to hide specific details about the policy.';
  else if (event.code == 1009)
    reason =
      "An endpoint is terminating the connection because it has received a message that is too big for it to process.";
  else if (event.code == 1010)
    // Note that this status code is not used by the server, because it can fail the WebSocket handshake instead.
    reason =
      "An endpoint (client) is terminating the connection because it expected the server to negotiate one or more extensions, but the server didn't return them in the response message of the WebSocket handshake. <br /> Specifically, the extensions that are needed are: " +
      event.reason;
  else if (event.code == 1011)
    reason =
      "A server is terminating the connection because it encountered an unexpected condition that prevented it from fulfilling the request.";
  else if (event.code == 1015)
    reason =
      "The connection was closed due to a failure to perform a TLS handshake (e.g., the server certificate can't be verified).";
  else reason = "Unknown reason";

  return reason;
}

Here is the frontend code for the receive route:

// Receive.tsx
import React, { useEffect } from "react";
import { getSocketError } from "../hooks/socketEvents";

interface SocketState {
  response: string;
  started: boolean;
}

export default function Receive() {
  const [state, setState] = React.useState<SocketState>({
    response: "",
    started: false,
  });

  const socket = React.useRef<WebSocket | undefined>(undefined);
  const audioQueue = React.useRef<ArrayBufferLike[]>([]);
  const audioContext = React.useRef<AudioContext | undefined>(undefined);

  const setMessage = (msg: string) =>
    setState((s) => ({ ...s, response: msg }));

  useEffect(() => {
    if (state.started) {
      const processAudioQueue = async () => {
        if (audioQueue.current.length > 0 && audioContext.current) {
          const audioData = new Int16Array(
            audioQueue.current.shift() as ArrayBufferLike
          );
          if (audioData.length > 0) {
            const float32Array = new Float32Array(audioData.length);
            for (let i = 0; i < audioData.length; i++) {
              float32Array[i] = audioData[i] / 0x7fff;
            }

            const audioBuffer = audioContext.current.createBuffer(
              1,
              float32Array.length,
              audioContext.current.sampleRate
            );
            audioBuffer.copyToChannel(float32Array, 0);

            const source = audioContext.current.createBufferSource();
            source.buffer = audioBuffer;
            source.connect(audioContext.current.destination);
            source.start(0);

            source.onended = () => {
              processAudioQueue();
            };
          } else {
            setTimeout(processAudioQueue, 100);
          }
        } else if (audioContext.current) {
          // Queue is empty; poll again while the audio context is alive
          setTimeout(processAudioQueue, 100);
        }
      };
      console.log("Starting WebSocket connection...");
      socket.current = new WebSocket("/socket");
      socket.current.binaryType = "arraybuffer";

      socket.current.onopen = () => {
        setMessage("Connected to signaling server");
        audioContext.current = new (window.AudioContext ||
          window.webkitAudioContext)();
        console.log("Audio context created");

        if (audioContext.current.state === "suspended") {
          audioContext.current.resume().then(() => {
            console.log("Audio context resumed");
            processAudioQueue();
          });
        } else {
          processAudioQueue();
        }
      };

      socket.current.onmessage = (message) => {
        if (typeof message.data === "string") {
          console.log("Received message:", message.data);
        } else {
          audioQueue.current.push(message.data);
        }
      };

      socket.current.onclose = (event) => {
        setMessage("Disconnected from signaling server: " + getSocketError(event));
      };

      socket.current.onerror = (error) => {
        // Error events carry no close code; the details arrive with the close event
        setMessage("Websocket error");
        console.error("WebSocket error:", error);
      };
    } else if (socket.current) {
      console.log("Closing WebSocket connection...");
      socket.current.close();
      socket.current = undefined;
      audioQueue.current = [];
      // Closing the audio context also stops the polling loop above
      audioContext.current?.close();
      audioContext.current = undefined;
    }
  }, [state.started]);

  return (
    <div>
      <h3>Receive Audio</h3>
      <div className="flex gap-4 justify-center py-4">
        <button
          disabled={state.started}
          onClick={() => setState((s) => ({ ...s, started: true }))}
        >
          Start
        </button>
        <button
          disabled={!state.started}
          onClick={() => setState((s) => ({ ...s, started: false }))}
        >
          Stop
        </button>
      </div>
      <div className="text-center min-h-6">{state.response}</div>
    </div>
  );
}

A few closing notes on the receive side. We convert the 32-bit float samples to 16-bit integers before sending not because WebSocket can't carry floats (frames are raw bytes either way) but because 16-bit PCM halves the payload size and is plenty for voice; on playback we convert back to 32-bit floats. The receiver creates its own AudioContext and resumes it if the browser starts it in the suspended state, then for each queued chunk builds an AudioBuffer and an AudioBufferSourceNode, using the source's onended event to chain straight into the next chunk in the queue.
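To make the scaling concrete, here is a single sample's round trip (the truncation mirrors what assigning to an Int16Array does in Stream.tsx):

// One sample's round trip between the two components
const sample = 0.5;                // Float32 microphone sample in [-1, 1]
const pcm = (sample * 0x7fff) | 0; // 16383 -- the 16-bit value sent over the socket
const restored = pcm / 0x7fff;     // ~0.49998 -- the Float32 value that gets played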