Our primary focus has been on two ASRs: the Web Speech API and Microsoft Azure. They have proven to be strong candidates for our research project, as other ASRs turned out to be difficult and complicated to set up, mostly because of cloud permissions and API keys. Even so, we were hoping to bring a third ASR into the project: Apptek, a company based in Northern Virginia that has a partnership with Gallaudet University for accessibility-related work. We had an initial meeting with them during the first week of the REU internship, but it wasn't until this week that we received their API key and could get started with their ASR. After this week's meeting, we concluded that we had different goals: we were expecting their ASR to be ready to provide accessibility in a web browser with WebRTC. That isn't the case, as they have just started building their product and need feedback and results from users before moving further along the software development cycle. Nobody is to blame here; it's simply an opportunity that would have been great if the ASR had been ready for use in our research project. As REU interns, we are only given 10 weeks to work on this research project, which doesn't leave enough time to do all of these tasks.
Web Speech Transcript/Captions
This week, I mostly polished the source code for the Web Speech transcript and captions, with big help from Emelia and Norman on fixing some final issues I had with the Web Speech source code. I also merged some of their code toward the main goal of making it usable both ways simultaneously: live captions and a transcript. There isn't much more to say about the Web Speech transcript and captions, as I've covered most of it in previous weekly blogs. I got to wrap this up, and it is working perfectly for our test use case.
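The way the Web Speech API makes "both ways simultaneously" possible can be sketched roughly as follows: interim recognition results drive the live caption, while results marked final get committed to the running transcript. This is a minimal sketch under my own assumptions, not our actual source code; the `foldResult` helper and the state shape are hypothetical names.

```javascript
// Pure helper: fold one SpeechRecognition result event into shared state.
// Interim hypotheses replace the live caption; final results are appended
// to the transcript and the caption is cleared.
function foldResult(event, state) {
  let interim = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const res = event.results[i];
    if (res.isFinal) {
      state.transcript.push(res[0].transcript.trim());
    } else {
      interim += res[0].transcript;
    }
  }
  state.caption = interim;
  return state;
}

// Browser-only wiring (assumes a Chromium-based browser exposing
// webkitSpeechRecognition); onUpdate is a hypothetical render callback.
function startWebSpeech(onUpdate) {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const rec = new Recognition();
  rec.continuous = true;
  rec.interimResults = true; // needed for word-by-word captions
  const state = { caption: "", transcript: [] };
  rec.onresult = (e) => onUpdate(foldResult(e, state));
  rec.start();
  return rec;
}
```

The key design point is that one recognizer feeds both views at once, so captions and transcript can never drift apart.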
With the Web Speech transcript and captions done, it was time for me to focus on the ASR switching application, which can switch between two different ASRs in two steps (hamburger menu icon > click on an ASR), and that's it. In past weeks, I had gotten the ASR switching application to work properly with captions, but not transcripts. This week, I cloned Emelia and Norman's source code for the fully working transcripts, which meet the "real-world expectations" of speed and usability. With the clone in place, I began merging the code I had been working on in the ASR switching application into the clone directory. The merge was not as easy as I thought, so it took some time for everything to work properly. I spent a lot of time hunting bugs, making things work, and moving things around to see whether they worked. All of that took approximately two days of work, but I'm finally seeing the light at the end of the tunnel.

The ASR switching application mostly works now, but the next thing to tackle is getting the remote captions/transcript to "fast-sync." By this I mean that the local captions/transcript (on my client side) keep pace with the live spoken content. However, if a second person joins the WebRTC conference, their captions/transcript transcribe much more slowly, showing about 10 words at once each time, whereas the local captions/transcript appear word by word instead of a sentence (or even two sentences) at a time. Emelia and Norman are working mostly on this; they've solved the remote captions and are on the remote transcript problem now. I am working in parallel with their remote captions work, trying to merge it into the ASR switching application. That's pretty much it for this week. Next week (Week 8), I'll continue merging the remote captions/transcript into the ASR switching application.
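The two-step switching flow (open the menu, click an ASR) boils down to tearing down the active recognizer and starting the selected one. Here is a minimal sketch of that idea under my own assumptions; the engine names and the `makeSwitcher` helper are hypothetical, not our application's actual API.

```javascript
// Sketch: a switcher over a registry of ASR engines, each exposing
// start()/stop(). Clicking a menu item maps to one switchTo() call.
function makeSwitcher(engines) {
  let active = null;
  return {
    get active() { return active; },
    switchTo(name) {
      if (!(name in engines)) throw new Error(`unknown ASR: ${name}`);
      if (active) engines[active].stop(); // tear down the current recognizer
      active = name;
      engines[name].start();              // bring up the selected one
    },
  };
}
```

Keeping all engines behind the same start/stop interface is what makes the switch itself a two-step user action regardless of which ASR backs it.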
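The "10 words at once" symptom on the remote side is consistent with relaying only final recognition results, while the local view also sees interim hypotheses. One way to get word-by-word remote captions is to forward interim updates over the WebRTC data channel too. This is a sketch of that approach under my own assumptions; the message shape and helper names are hypothetical, not the protocol Emelia and Norman are building.

```javascript
// Sender side: forward every caption update over an RTCDataChannel-like
// channel. Relaying only finals is what makes remote captions arrive a
// whole sentence at a time.
function attachCaptionSender(channel) {
  return (text, isFinal) => {
    channel.send(JSON.stringify({ kind: isFinal ? "final" : "interim", text }));
  };
}

// Receiver side: interim messages replace the live caption in place;
// final messages are committed to the transcript and clear the caption.
function attachCaptionReceiver(channel, view) {
  channel.onmessage = (msg) => {
    const { kind, text } = JSON.parse(msg.data);
    if (kind === "interim") {
      view.caption = text;
    } else {
      view.transcript.push(text);
      view.caption = "";
    }
  };
}
```

The trade-off is extra data-channel traffic, since each interim hypothesis is sent as it changes, in exchange for the remote caption keeping pace with the speaker.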