Name: | Voice technologies for support of information society |
Sponsor: | Czech Science Foundation |
Principal investigator: | Assoc. Prof. Pavel Sovka, Ph.D. |
Co-principal investigator: | Petr Horák, Ph.D. |
From: | 2002-01-01 |
To: | 2004-12-31 |
The project focuses on theoretical research leading to the following applications:
- Telecommunications: information systems for telephone services in fixed and mobile networks, information systems for transport, database querying by telephone, and the use of mobile telephones in noisy environments such as moving cars.
- Multimedia using computers: interactive systems for information records and interactive education, and also, e.g., automatic generation of subtitles for TV.
- Command systems: voice control of devices, voice control of various functions in a car, voice control of robots.
- Support systems for people with disabilities: improvement of cochlear implants for the hearing impaired, speech enhancement for people with disabilities.
The project connects the main institutions engaged in this field in the Czech Republic. It is interdisciplinary: it draws on results from both the exact sciences and the arts and applies them in technical disciplines and technology. Because the research is tied to the Czech language, the results cannot simply be “bought”.
Goals:
Recently, great attention has been paid to the naturalness of synthetic speech, including its prosody. Different voices are required for various types of announcements, etc. This trend calls for the development of a text-to-speech (TTS) system capable of synthesizing speech in several voices. Voice transformation in the spectral and time domains appears to be an effective and elegant solution for transformation at both the segmental and suprasegmental levels. Concatenation of speech elements modelled in the spectral domain is generally considered a convenient speech coding approach for such a system.
Based on an analysis of the state of the art, the proposed solution is as follows:
- Construction of a new speech production model with finite impulse response, based on homomorphic signal processing and suitable for modelling male, female and children’s voices.
- Male, female and children’s voice modelling and voice transformation.
- Development of an automatic segmentation algorithm for prosodic database labelling.
- Collection of a new prosodic database (in cooperation with the Faculty of Arts, Charles University) for research in prosodic modelling. Boundaries of suprasegmental segments will be inserted into the database automatically and then corrected manually.
- Construction of prosody generation models for different speaking styles and several speakers, based on homomorphic signal processing and linear prediction.
- Formulation of prosodic rules for the Czech language, based on the developed prosody generation models, for the different speaking styles included in the new prosodic database.
- Implementation of TTS systems with highly natural synthetic speech in telecommunication and information systems.
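Several of the points above rely on homomorphic signal processing. The project's actual speech production model is not described here, so the following is only a generic textbook sketch of the underlying idea: the real cepstrum turns the convolution of excitation and vocal tract response into a sum, and keeping only the low-quefrency part yields a smoothed spectral envelope. The function names `real_cepstrum` and `spectral_envelope` and the parameters `n_fft` and `n_lifter` are assumptions introduced for illustration.

```python
import numpy as np

def real_cepstrum(frame, n_fft=512):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum."""
    spectrum = np.fft.fft(frame, n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small floor avoids log(0)
    return np.fft.ifft(log_mag).real

def spectral_envelope(frame, n_fft=512, n_lifter=30):
    """Smoothed log-magnitude spectrum via low-quefrency liftering.

    n_lifter is a hypothetical cutoff (in samples); coefficients below it
    are kept, together with their mirror images at the end of the cepstrum.
    """
    cep = real_cepstrum(frame, n_fft)
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    lifter[n_fft - n_lifter + 1:] = 1.0  # symmetric counterpart of 1..n_lifter-1
    # The liftered cepstrum is real and even, so its FFT is real.
    return np.fft.fft(cep * lifter).real
```

For a speech frame, `spectral_envelope(frame)` returns `n_fft` log-magnitude values. A smaller `n_lifter` gives a smoother envelope, while a larger one retains more of the fine harmonic structure.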