Week 1
First week
In my first week of the REU internship was basically like any other first week, an orientation week, where everything is laid out in front of you and you learn everything there is that surrounds the project and its goals for the next 10 weeks. The only difference this time around is the fact that we are living in amidst the coronavirus crisis. This means I won’t get the opportunity of experiencing the greatness of the REU working with other students and mentors on campus. Nevertheless, I am more than fortunate to be able to work with my mentor in this internship working remotely from the comforts of my home. with this being said, me and my co-intern Emelia Beldon took efforts to set up our development environments on our desktop computers and laptops.
Development environment
The development environment consists of having a Virtual Private Network (VPN) set up in order to access our project codebases on Gallaudet University’s servers from our homes. We also set up other things like verifying our account authenications and connections work with networked file platform, SAMBA, and SSH. SAMBA and SSH is there for us to be able to access our projects from our local computers at home and doing productive work remotely. With the development environment already set up, we started to dive in and learn the inside-outs of how nodejs and WebRTC is run. This is essentially the important factor of our project’s primary goal which is to have an Automatic Speech Recognition (ASR) platform recognize speech and provide automatic captions that serves a video conferencing application.
Literate Review & Research
We did a literature review of several video conferencing applications and their caption setups. This includes Zoom, Google Meet, and MS Teams. Surprisingly enough, all of them have different implementations of how the captions are set up on their platform. There are pros and cons to each caption setup that is seen on their platform. Our research is based on that and try to implement the caption setup in our development environment as the best as possible to work efficiently with the chosen ASR.
Automatic Speech Recognition
Once we were done with the research of various caption setups, we started to work on the 3 ASRs: Google Cloud Platform (GCP), Microsoft Azure (AZ), and IBM Cloud (IBM) and their respective JavaScript (JS) / Linux command line (CLI) programs onto the server that our mentor, Dr. Raja Kushalnagar, had set up. We spent some time studying and figuring out the puzzle pieces to each ASR and their programs both in JS and CLI.
Sample videos
In order to test our development environment similar to a ‘live’ real world video conferencing app, we had to find 4 videos to use as “streams” in the WebRTC application that we are developing. It was easy finding sample videos online that had open source license. With these 4 videos we found online, we set them up to use as a live stream via WebRTC’s method: “Video to Peer Connection”. We need to do this so we can test ASRs and how it translates text into a usable file that can be used to display captions on the screen itself.
The first week itself went well, starting with a lot of research then ended with more hands-on approach with the project codebase, and tinker with ASR Application Programming Interfaces. Continuing a bit of this into the second week.