I created matrix-enact as a fun way to render Matrix rooms - it essentially "performs" the room history by progressively speaking each message event in chronological order. In this way, matrix-enact is effectively a simple, read-only Matrix client. Let's see how it was built.
This article will introduce two important concepts in Matrix, specifically in the Matrix Client-Server API:
- Guest access, which lets a client register a guest user and immediately receive an access token
- The /context endpoint, which gets messages before and after a given event
Although written in JavaScript (with React), this project does not use the matrix-js-sdk; instead, it makes direct HTTP calls to the Matrix Client-Server API. Because there are only three endpoints we need to hit, we can keep the project very light by not including an SDK.
Matrix allows for guest access by providing an interface to register a new guest user and immediately receive an access token. To do this, we call the /register endpoint with the query parameter kind set to guest. In matrix-enact, this looks like:
import axios from 'axios';
const url = "https://matrix.org/_matrix/client/r0/register?kind=guest";
// POST an empty JSON body; axios resolves to the response object
const { data } = await axios.post(url, {});
// data.access_token will contain the access token, we must store it
Once we have the access token, we use it in the same way as if logged in with a normal user.
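As a rough illustration of using the guest token like any other access token, the sketch below builds the request options for an authenticated call. The helper name `authConfig` is hypothetical and not part of matrix-enact. The Client-Server API accepts the token either as an `access_token` query parameter (which is what matrix-enact does later) or in an `Authorization: Bearer` header, which is what this sketch uses.

```javascript
// Hypothetical helper, not taken from matrix-enact.
// Returns an axios-style request config carrying the access token.
function authConfig(accessToken) {
  return { headers: { Authorization: `Bearer ${accessToken}` } };
}

// Possible usage with axios, assuming `url` and `data` from the snippet above:
//   const res = await axios.get(url, authConfig(data.access_token));
```

Sending the token in a header keeps it out of the URL, which can matter if URLs end up in logs.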
In the UI, the user can enter either a room alias or a room ID. Whichever they enter, to get message content from a room we need the ID. This means we need to detect if an alias has been entered, and if so get the correct room ID for that alias:
// we know that if the first character is a '#', we have an alias not an id
if (this.state.roomEntry[0] === "#") {
  const getIdUrl = "https://matrix.org/_matrix/client/r0/directory/room/"
    + encodeURIComponent(this.state.roomEntry);
  const { data } = await axios.get(getIdUrl);
  // data.room_id contains the room id for the alias
}
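To make the encoding step concrete, here is a minimal sketch of the alias check and URL construction as standalone functions. The names `HOMESERVER`, `isRoomAlias` and `directoryUrl` are assumptions made for illustration; only the endpoint path comes from the code above.

```javascript
// Hypothetical helpers; the homeserver base URL is an assumption.
const HOMESERVER = "https://matrix.org";

// Room aliases start with '#', while room IDs start with '!'.
function isRoomAlias(entry) {
  return entry.startsWith("#");
}

// Build the directory lookup URL; '#' and ':' must be percent-encoded.
function directoryUrl(alias) {
  return `${HOMESERVER}/_matrix/client/r0/directory/room/${encodeURIComponent(alias)}`;
}

console.log(directoryUrl("#matrix:matrix.org"));
```

Without `encodeURIComponent`, the `#` in the alias would be interpreted as the start of a URL fragment and never reach the server.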
/context endpoint
We use the /context endpoint to get the chronological history of a room timeline.
Looking at this section of the Client-Server API we see:
This API returns a number of events that happened just before and after the specified event. This allows clients to get the context surrounding an event.
To get messages from this endpoint we need to provide a room id and the event id we want context for. Check out the comments in the code below to follow along.
async loadScriptFromEventId(startEventId) {
  // first we construct the url as per the CS API
  // (roomId is assumed to hold the room ID resolved in the previous step)
  const url = `https://matrix.org/_matrix/client/r0/rooms/${encodeURIComponent(roomId)}/context/${encodeURIComponent(startEventId)}?limit=100&access_token=${this.state.accessToken}`;
  const res = await axios.get(url);
  // we only want the events that follow our start event,
  // and we only want events that contain a body field, i.e. that are messages
  const newEvents = res.data.events_after.filter(e => e.content.body);
  // finally, since we're using React for this app,
  // we store these messages in the state object
  this.setState({events: this.state.events.concat(newEvents)});
}
Notice the URL we used when calling /context: we specified a limit value of 100. In fact, 100 is usually the limit enforced by the homeserver. This limit refers to the number of events, not the number of messages; remember that we are filtering them in the code above.
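The difference between events and messages can be seen with a small, made-up response. The sketch below is only an illustration: `sampleResponse` is a hypothetical, trimmed-down /context response (real events carry more fields), and `messagesAfter` is an assumed helper name that applies a slightly stricter version of the body filter.

```javascript
// Hypothetical, trimmed-down /context response; real events carry more fields.
const sampleResponse = {
  events_before: [],
  event: { event_id: "$start", type: "m.room.message", content: { body: "hello" } },
  events_after: [
    { event_id: "$1", type: "m.room.member", content: { membership: "join" } },
    { event_id: "$2", type: "m.room.message", content: { msgtype: "m.text", body: "hi there" } },
    { event_id: "$3", type: "m.room.message", content: {} }, // e.g. a message with no body left
  ],
};

// Keep only the events after the start event that have a non-empty text body.
function messagesAfter(contextResponse) {
  return contextResponse.events_after.filter(
    (e) => e.content && typeof e.content.body === "string" && e.content.body.length > 0
  );
}

console.log(messagesAfter(sampleResponse).map((e) => e.content.body));
```

Here three events come back from the server, but only one of them survives the filter as a speakable message.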
If we say that we want our script to be 50 lines long, but after filtering we are left with only 30 messages, what should we do? We get more events after the latest one and append the new events to our script. We have taken a value from the form and stored it in state.messageCount, and in the previous section we inserted message events into state.events, so we can compare these two values and, if needed, call loadScriptFromEventId() again with the last known event.
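const eventsAfter = res.data.events_after;
if (this.state.messageCount > this.state.events.length && eventsAfter.length > 0) {
  // get last known event and continue loading from there
  const lastEvent = eventsAfter[eventsAfter.length - 1];
  this.loadScriptFromEventId(lastEvent.event_id);
} else {
  // enough messages (or no more events): trim the script to the requested length
  this.setState({events: this.state.events.slice(0, this.state.messageCount), statusMessage: "Done"});
}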
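The same "keep fetching until we have enough" idea can be sketched outside React as a plain loop. This is a sketch under stated assumptions, not the matrix-enact implementation: `collectMessages` is a hypothetical name, and `fetchEventsAfter` stands in for whatever function performs the /context request and returns its `events_after` array.

```javascript
// Hypothetical helper: keep requesting events until we have `wanted` messages
// or the server returns no further events.
// `fetchEventsAfter(eventId)` is assumed to resolve to an array of events.
async function collectMessages(fetchEventsAfter, startEventId, wanted) {
  const messages = [];
  let cursor = startEventId;
  while (messages.length < wanted) {
    const eventsAfter = await fetchEventsAfter(cursor);
    if (eventsAfter.length === 0) break; // reached the end of the timeline
    for (const e of eventsAfter) {
      // only events with a body count as messages
      if (e.content && e.content.body) messages.push(e);
    }
    // continue from the last event we saw, whether or not it was a message
    cursor = eventsAfter[eventsAfter.length - 1].event_id;
  }
  return messages.slice(0, wanted);
}
```

Stopping when no events come back avoids looping forever on a room with fewer messages than requested.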
The Web Speech API is a large topic, outside the scope of this article. We'll cover just enough to show the "happy path" of performing Text-to-Speech (TTS) sequentially.
To deliver a line as audio, the fundamental code is as follows:
var utterance = new SpeechSynthesisUtterance();
utterance.text = "some string";
var someVoice = window.speechSynthesis.getVoices()[0];
utterance.voice = someVoice;
window.speechSynthesis.speak(utterance);
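The snippet above simply takes the first voice. In some browsers the list returned by `getVoices()` can be empty until the voices have loaded, so a slightly more defensive selection might look like the sketch below. `pickVoice` is a hypothetical helper; it relies only on the `lang` property that voice objects expose.

```javascript
// Hypothetical helper: prefer a voice whose language tag starts with `langPrefix`,
// otherwise fall back to the first available voice, or null if none are loaded.
function pickVoice(voices, langPrefix) {
  const match = voices.find((v) => v.lang && v.lang.startsWith(langPrefix));
  return match || voices[0] || null;
}

// Possible usage in a browser:
//   const voice = pickVoice(window.speechSynthesis.getVoices(), "en");
//   if (voice) utterance.voice = voice;
```

If no voice is assigned, the browser falls back to its default voice, so a `null` result can simply be skipped.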
To find out when an utterance ends, attach a function to the onend event:
utterance.onend = function() {
// do something when the line ends
};
Knowing that we can perform TTS on strings we provide, and that we can call a function when a line ends, from here it's easy to see how we can use the list of messages to "enact" the message history.
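Stripped of React, the core idea of chaining lines through onend can be sketched as follows. The function name `speakSequentially` and its parameters are assumptions for illustration; the synthesizer and the utterance factory are passed in so that the same logic can run against `window.speechSynthesis` in a browser or against a stand-in elsewhere.

```javascript
// Hypothetical helper: speak each text in order, starting the next one
// only when the previous utterance reports that it has ended.
function speakSequentially(texts, synth, makeUtterance) {
  let index = 0;
  function speakNext() {
    if (index >= texts.length) return; // nothing left to say
    const utterance = makeUtterance(texts[index]);
    index += 1;
    utterance.onend = speakNext; // chain to the following line
    synth.speak(utterance);
  }
  speakNext();
}

// Possible usage in a browser:
//   speakSequentially(["Hello", "world"], window.speechSynthesis,
//     (text) => new SpeechSynthesisUtterance(text));
```

matrix-enact achieves the same chaining through React state, as described next.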
Let's create a nextLine() function in our App component, and use it to insert lines associated with "Parts", where each Part is a separate user with an assigned voice.
nextLine() {
  const line = this.state.line;
  // stop when there are no more events to perform
  if (!this.state.events[line]) return;
  const newPart = this.state.events[line].sender;
  // a sender we have not seen before gets a new Part with a random voice
  if (!this.state.parts.some(p => p.name === newPart)) {
    this.setState({
      parts: this.state.parts.concat([{
        name: newPart,
        voice: voices[getRandomInt(0, voices.length)]
      }])
    });
  }
  this.setState({
    script: this.state.script.concat(this.state.events[line]),
    line: this.state.line + 1,
    nextText: "Continue"
  });
}
By incrementing the line counter, we progress through the script, adding one line at a time to the correct Part.
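The "add a Part only for a new sender" step can also be written as a small pure function, which makes the behaviour easier to check in isolation. This is a sketch, not the matrix-enact code: `ensurePart` and `pickIndex` are hypothetical names, and voices are represented by whatever values the caller passes in.

```javascript
// Hypothetical pure version of the "add a Part if the sender is new" step.
// `pickIndex(n)` is assumed to return an integer in the range [0, n).
function ensurePart(parts, sender, voices, pickIndex) {
  // a known sender keeps the voice it was first given
  if (parts.some((p) => p.name === sender)) return parts;
  return parts.concat([{ name: sender, voice: voices[pickIndex(voices.length)] }]);
}

// Possible usage with a random voice:
//   parts = ensurePart(parts, event.sender, voices, (n) => Math.floor(Math.random() * n));
```

Returning the same array for a known sender means each user keeps one voice for the whole performance.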
During rendering, the App renders an array of Part components, which in turn render an array of lines, filtered for that particular Part:
// copy each line together with its position in the script, without mutating props
const lines = this.props.script
  .map((line, lineNumber) => ({ ...line, lineNumber }))
  .filter(l => l.sender === part.name);
Knowing that, in React, the constructor for a Component is called only once, we perform the TTS process itself inside the constructor method:
class Line extends Component {
  constructor(props) {
    super(props);
    // synth is assumed to refer to window.speechSynthesis
    const utterance = new SpeechSynthesisUtterance();
    utterance.text = this.props.lineText;
    utterance.voice = this.props.part.voice;
    synth.speak(utterance);
  }
}
Finally, we'll use what we already learned about the onend event to insert the next line:
class Line extends Component {
  constructor(props) {
    super(props);
    const utterance = new SpeechSynthesisUtterance();
    const nextLine = this.props.nextLine;
    // when this line has been spoken, ask the App for the next one
    utterance.onend = function() {
      nextLine();
    };
    utterance.text = this.props.lineText;
    utterance.voice = this.props.part.voice;
    synth.speak(utterance);
  }
}
In this way, nextLine() is called in a loop, meaning that the lines are added to React sequentially, and spoken aloud as they are added.
This article covered a lot of ground: guest access, resolving room aliases, the /context API endpoint, and sequential text-to-speech in the browser.
To learn more about Matrix development, check out the Matrix Documentation.