00:00:00 - 00:00:21
In this lecture, we will look at different approaches to building automatic speech recognition systems, and we will look in depth at one type of ASR system: the RNN-T based ASR system. In the rest of the lecture, ASR stands for automatic speech recognition and RNN-T stands for
00:00:21 - 00:00:42
recurrent neural network transducer. Before we get into the details, a bit about myself: I am Mahaveer, and I work as a software engineer at Facebook. I led the development of the first production-ready RNN-T based ASR systems at Facebook. My current focus is on enabling personalization of
00:00:42 - 00:01:01
RNN-T based ASR systems. Before working at Facebook, I did my master's at the Language Technologies Institute at Carnegie Mellon University. This lecture will be divided into small modules. We will first look at the task of a speech recognition system, where we will look at what is input to
00:01:01 - 00:01:20
ASR and what is the expected output of ASR. We will look at two different paradigms for building ASR systems: the modularized and non-modularized paradigms. We will then look at RNN-T training. After this, we will look at RNN-T greedy decoding and RNN-T beam search decoding. There will be supplemental
00:01:20 - 00:01:24
content on personalization of RNN-T ASR systems in the slides as well.
00:00:00 - 00:00:48
Different Approaches for ASR & In-Depth Look At RNN-T Based ASR Systems FACEBOOK AI
00:00:48 - 00:01:20
Lecturer Introduction: Mahaveer Jain. Mahaveer Jain is a Software Engineer at Facebook, Inc. Before this, he was a graduate research assistant at the Language Technologies Institute at Carnegie Mellon, where he finished his master’s in language technologies. Mahaveer has worked extensively on building production-ready RNN-T ASR systems at Facebook from scratch. His current focus is to enable contextualization for End2End ASR systems such as RNN-T. His work has been published in leading ASR conferences such as Interspeech and ICASSP. Mahaveer enjoys teaching and has given invited tutorial talks at UIUC and CMU on RNN-T ASR systems. FACEBOOK AI Georgia
00:01:20 - 00:01:25
Different Approaches for ASR & In-Depth Look At RNN-T Based ASR Systems: Task of Automatic Speech Recognition (ASR) System; Input And Output; Modularized (Hybrid) ASR; Non Modularized (End2End) ASR; RNN-T Training; RNN-T Decoding: Greedy Decoding; RNN-T Beam Search Decoding; Supplemental Content: Personalization of RNN-T ASR Systems