Reconocimiento de voz basado en MFCC, SBC y Espectrogramas

Guillermo Arturo Martínez Mascorro; Gualberto Aguilar Torres

doi:10.17163/ings.n10.2013.02

Published: 2013-12-30

Keywords:

Speech recognition with voice changes, Mel Frequency Cepstral Coefficients, Subband-Based Cepstral Parameters, Spectrogram, Support Vector Machine.

Abstract

One of the problems of Automatic Speech Recognition systems is change in the speaker's voice. A person's voice can change voluntarily or involuntarily, and the change can be natural or artificial; in these cases the system can become confused. This paper proposes a recognition system with parallel identification, using three different algorithms: MFCC, SBC and spectrogram. With a Support Vector Machine as the classifier, each algorithm yields the group of persons with the highest likelihood and, after an evaluation, the final result is obtained. The aim of this paper is to take advantage of all three algorithms.
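The parallel scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the evaluation step, so the fusion rule here (majority vote across the three branches, with ties broken by summed rank) is only an assumption, as are the function names `top_candidates` and `fuse`, the speaker labels and the scores. In the paper, the per-speaker scores would come from a Support Vector Machine trained on each feature type; here they are hard-coded placeholders.

```python
from collections import Counter


def top_candidates(scores, k=2):
    """Return the k speaker labels with the highest score, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def fuse(candidate_lists):
    """Hypothetical evaluation step: choose the speaker proposed by the
    most branches; among equally voted speakers, prefer the lowest
    summed rank (rank 0 is the best position in a branch's list)."""
    votes = Counter()
    rank_sum = Counter()
    for candidates in candidate_lists:
        for rank, speaker in enumerate(candidates):
            votes[speaker] += 1
            rank_sum[speaker] += rank
    return min(votes, key=lambda s: (-votes[s], rank_sum[s]))


# Placeholder per-speaker scores for one utterance, one dict per branch.
# These values are invented for illustration only.
mfcc_scores = {"ana": 0.70, "luis": 0.20, "eva": 0.10, "raul": 0.05}
sbc_scores = {"luis": 0.50, "ana": 0.40, "raul": 0.30, "eva": 0.10}
spectrogram_scores = {"eva": 0.60, "ana": 0.55, "luis": 0.10, "raul": 0.05}

# Each branch proposes its most likely speakers, then the lists are fused.
branches = [
    top_candidates(mfcc_scores),
    top_candidates(sbc_scores),
    top_candidates(spectrogram_scores),
]
decision = fuse(branches)
print(decision)
```

In this toy run no single branch is trusted on its own: the SBC and spectrogram branches each rank a different speaker first, yet the speaker who appears in all three candidate groups is selected. This is the kind of benefit a parallel design is meant to provide when one feature type is misled by a voice change.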

Issue

No. 10 (2013): July / December

Section

Scientific Paper

The Universidad Politécnica Salesiana of Ecuador preserves the copyrights of the published works and will favor the reuse of the works. The works are published in the electronic edition of the journal under a Creative Commons Attribution/Noncommercial-No Derivative Works 4.0 Ecuador license: they can be copied, used, disseminated, transmitted and publicly displayed.

The undersigned author partially transfers the copyrights of this work to the Universidad Politécnica Salesiana of Ecuador for printed editions.

It is also stated that they have respected the ethical principles of research and are free from any conflict of interest. The author(s) certify that this work has not been published, nor is it under consideration for publication in any other journal or editorial work.

The author(s) are responsible for the content of the work, have contributed to its conception, design and completion and to the analysis and interpretation of data, and have participated in writing the text and its revisions, as well as in approving the version that is finally submitted as an attachment.

Author Biographies

Guillermo Arturo Martínez Mascorro

Electronics Engineer; student in the Master of Science in Microelectronics Engineering program, Instituto Politécnico Nacional, Mexico City, Mexico.

Gualberto Aguilar Torres

Doctor of Science in Communications and Electronics, Master of Science in Microelectronics Engineering, Communications and Electronics Engineer; faculty member of the Instituto Politécnico Nacional in the Graduate Studies and Research Section of ESIME Culhuacán, Mexico City, Mexico.

References

I. Mporas, T. Ganchev, M. Siafarikas, and N. Fakotakis, “Comparison of speech features on the speech recognition task,” Journal of Computer Science, vol. 3, no. 8, pp. 608–616, 2007.

B. Logan, “Mel frequency cepstral coefficients for music modeling,” in International Symposium on Music Information Retrieval, 2000.

R. Sarikaya and J. H. Hansen, “High resolution speech feature parametrization for monophone-based stressed speech recognition,” Signal Processing Letters, IEEE, vol. 7, no. 7, pp. 182–185, 2000.

G. A. Martínez and G. Aguilar, “Sistema para identificación de hablantes robusto a cambios en la voz,” Ingenius, no. 8, pp. 45–53, 2012.

T. Acharya and A. K. Ray, Image processing: principles and applications. Wiley, 2005.

R. Solera-Urena, J. Padrell-Sendra, D. Martín-Iglesias, A. Gallardo-Antolín, C. Peláez-Moreno, and F. Díaz-De-María, “SVMs for automatic speech recognition: a survey,” Progress in nonlinear speech processing, pp. 190–216, 2007.
