Week 4
My team made more progress this week than last week.
This is especially great news because we didn't make much progress last week due to several obstacles. Honestly, I was expecting this week to be the same: constantly troubleshooting without solutions in sight. Even with the better pace, we still hit some roadblocks. For context: we use the Chrome browser as our default. Last week we figured out that for Chrome to grant microphone and webcam permissions, the page has to be served over HTTPS. So to use HTTPS for our research, we needed to get an SSL certificate.
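The rule Chrome applies here is the "secure context" requirement: microphone and webcam access via getUserMedia is only exposed on HTTPS origins (plus localhost, which gets a pass for development). A rough sketch of that check as a plain function, to show why our plain-HTTP server was rejected (the helper name is mine, not Chrome's internals; real pages can just read window.isSecureContext):

```javascript
// Rough approximation of the secure-context rule that gates getUserMedia.
// Hypothetical helper for illustration only.
function isSecureOrigin(url) {
  const { protocol, hostname } = new URL(url);
  if (protocol === "https:") return true;
  // localhost is treated as potentially trustworthy even over plain http
  return hostname === "localhost" || hostname === "127.0.0.1";
}

console.log(isSecureOrigin("https://sandbox2.example.edu")); // true
console.log(isSecureOrigin("http://sandbox1.example.edu"));  // false
console.log(isSecureOrigin("http://localhost:3000"));        // true
```

This is why the whole certificate chase below was unavoidable: no valid HTTPS, no microphone.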
HTTPS w/ SSL certificate
We chose Let's Encrypt as our SSL provider since it is free and its certificates are valid for up to 3 months. This was the only missing piece for our research to "officially" use HTTPS. Why do I say officially? Well, last week I tried to set up HTTPS with an unofficial certificate using a GitHub project called mkcert (link: https://github.com/FiloSottile/mkcert). mkcert is probably the simplest way to get an SSL certificate up and running. Unfortunately, it did not work in our favor: Chrome immediately recognized it as not an official SSL certificate. This was on the sandbox1 server.

Raja, our mentor, realized that for Chrome to trust the certificate, we needed one from an official certificate authority, so we chose Let's Encrypt. But there was one problem: for Let's Encrypt to verify the SSL certificate, it has to be able to reach the server on port 443, and by default Gallaudet University blocks that port. Let's Encrypt cannot verify on any port other than 443. This meant our mentor would have to submit a ticket to the network/database team to open port 443 on our VM, which could take several days, and we did not want that to stall our work.

So our mentor took some time to think about what we could do next. Then he thought of an old server that was not in use and decided to bring it to his house. Since his home network is outside of Gallaudet University's network, he could open port 443 there for Let's Encrypt to verify. We are thankful to our mentor for doing that for the team! Let's Encrypt verified the certificate, and we decided to name this server sandbox2. So, yay! On to the next step: making the Web Speech API display a live transcript right in the browser.
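For reference, the two approaches look roughly like this on the command line. The tools are real (mkcert, and Let's Encrypt's certbot client), but the hostnames are placeholders and our exact flags may have differed:

```shell
# Locally-trusted certificate via mkcert -- what we tried first.
# Chrome only trusts it on machines where mkcert's root CA is installed,
# which is why it showed up as untrusted in our setup.
mkcert -install
mkcert sandbox1.example.edu

# Publicly trusted certificate via Let's Encrypt -- what we ended up using.
# In standalone mode, certbot runs its own temporary server to answer the
# validation challenge, so Let's Encrypt's servers must be able to reach
# this machine from the internet: the step our campus firewall blocked.
sudo certbot certonly --standalone -d sandbox2.example.edu
```

The key difference is trust: mkcert mints its own certificate authority that only your own machines know about, while Let's Encrypt is a CA that browsers already ship with, so its certificates work for everyone.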
Sandbox2 server
A new server means configuring our development environment all over again. Based on what we learned setting up the first two servers (the TAP server and sandbox1), sandbox2 turned out to be much more like sandbox1. It wasn't difficult to set up; it only took us a day and a half to get things back to how they were on sandbox1. That meant configuring things like RDP, SMB, SFTP, and the other essentials we need to be productive. With everything set up, I could get back to focusing on the Web Speech API, while my co-worker started working on the same task, but with Microsoft Azure instead of the Web Speech API.
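Assuming a Debian/Ubuntu-style server (I'm not stating our actual distro or configs here), the re-setup boils down to a handful of standard packages, sketched roughly as:

```shell
# RDP for remote desktop sessions
sudo apt install -y xrdp
sudo systemctl enable --now xrdp

# SMB file shares (edit /etc/samba/smb.conf for the shares you need)
sudo apt install -y samba
sudo systemctl restart smbd

# SFTP comes bundled with the OpenSSH server
sudo apt install -y openssh-server
```

Having done this twice already is what made the third time take only a day and a half.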
Web Speech API + live microphone + live transcript
My emphasis was on getting the Web Speech API up and running. The main goal was to display a transcript in the browser using the live microphone as input. I followed a couple of guides (I did not bookmark them, sorry, otherwise I would paste them here) and was able to clone a working example and run it. Crucially, the browser could listen to the live microphone because the page was served over HTTPS with a valid certificate. The operation was a success! The next step was to add a live webcam feed to the page and have a text area display the transcript of whatever was spoken into the microphone. This took some time to configure, but it all worked in the end. Now, the transcript in the guide I followed had a toggle button to turn it on and off. In a real-world video conferencing application a toggle would be nice, but it's not a primary task for us. So I asked my team if it was OK for me to try having it listen without any toggle (constant mode). That took some more time, but eventually it all worked. On to the next step: implementing what I learned with WebRTC.
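The core of what those guides do is feed the SpeechRecognition result list into a text area, keeping "final" results separate from interim ones that may still change. That merge step is plain logic, so here is a sketch of it as a standalone function, with the browser wiring shown in comments (the function name and the simplified result shape are my own, not from the guides):

```javascript
// Build the displayed transcript from SpeechRecognition-style results.
// Each result here is { isFinal, transcript } -- a simplified stand-in
// for the browser's SpeechRecognitionResult objects.
function buildTranscript(results) {
  let finalText = "";
  let interimText = "";
  for (const r of results) {
    if (r.isFinal) finalText += r.transcript;
    else interimText += r.transcript;
  }
  return { finalText, interimText };
}

// In Chrome, the wiring looks roughly like this (browser-only, sketched):
//   const rec = new webkitSpeechRecognition();
//   rec.continuous = true;        // keep listening across pauses
//   rec.interimResults = true;    // show text while it's still changing
//   rec.onresult = (e) => {
//     const results = [...e.results].map(r => ({
//       isFinal: r.isFinal, transcript: r[0].transcript
//     }));
//     const t = buildTranscript(results);
//     textArea.value = t.finalText + t.interimText;
//   };
//   rec.onend = () => rec.start(); // "constant mode": restart, no toggle
//   rec.start();                   // only allowed on a secure (https) origin
```

The `onend` restart is one common way to approximate constant listening, since recognition sessions end on their own after silence.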
WebRTC + live mic/transcript
So, our research team already has a WebRTC implementation built to suit our needs for video conferencing in the browser. The video quality is surprisingly good, better than what I normally see on Zoom. Anyway, that's beside the point. With the progress my team and I made on the Web Speech API and on displaying the webcam plus transcript in a browser, the next step was to display the transcript in a caption-like position (at the bottom of the video). In the setup from the previous section, the web interface showed the video and transcript separately, each in its own panel. Our team's WebRTC implementation has Real-Time Text (RTT), rendered near the bottom of the video (each participant's webcam has its own RTT space). RTT works by simply typing away: as soon as you press a key, the other people on the video conference see it immediately, which is unusual. With typical text communication, you can type for as long as you like, and others can't see anything until you press the Enter key, which sends the message as a whole sentence or paragraph. The RTT UI design is a good fit for our research project. The next step is to replace the RTT content with the Web Speech API's text and have it "captioned" live as people speak. This is to be resumed next week.
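Since RTT already pushes each keystroke to the other peers as it happens, swapping in Web Speech API output mostly means sending recognition updates over the same path. A sketch of what the message shape and caption trimming could look like; every name below is an assumption for illustration, not our actual RTT code:

```javascript
// Hypothetical caption message for the RTT slot under each video.
// "final" distinguishes settled text from interim text that may change.
function makeCaptionMessage(speakerId, text, isFinal) {
  return JSON.stringify({ type: "caption", speakerId, text, final: isFinal });
}

// Receiver side: keep only the tail of the transcript so it reads like
// captions rather than an ever-growing log.
function trimCaption(text, maxChars = 80) {
  return text.length <= maxChars ? text : "..." + text.slice(-maxChars);
}

// In the app, the sender would run inside recognition.onresult, e.g.:
//   dataChannel.send(makeCaptionMessage(myId, transcript, isFinal));
// and each peer would render trimCaption(text) into that speaker's
// RTT area, replacing the keyboard input as the text source.
```

Keeping the per-update message shape means the existing "show it as it arrives" RTT behavior carries over unchanged; only the source of the text changes from keystrokes to recognition results.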