Postdoctoral position in New Techniques for Vision-assisted Speech Processing - [ Postdoc ]

Workplace: CTNSC@UniFe
Added on: 02/02/2016 - Expires on: 20/10/2016

The Centre for Translational Neurophysiology is looking for one post-doc to work within the H2020 EcoMode project (“Event-Driven Compressive Vision for Multimodal Interaction with Mobile Devices”), funded by the European Commission under grant agreement n. 644096.

Job Description: Robust automatic speech detection and recognition for human-robot interaction remain challenging tasks in many realistic environments (where speech is typically noisy and distant) and settings (where the robot must continuously be able to distinguish verbal commands from non-verbal audio streams). Vision can be used to increase speech recognition robustness by adding complementary speech-production-related information. In this project, visual information will be provided by an event-driven (ED) camera. ED vision sensors transmit information as soon as a change occurs in their visual field, achieving very high temporal resolution coupled with an extremely low data rate and automatic segmentation of significant events. In an audio-visual speech recognition setting, ED vision can not only provide additional visual information to the speech recognizer, but can also drive the temporal processing of speech by locating (in the temporal dimension) visual events related to speech production landmarks.

The goal of the proposed research is to exploit the highly dynamic information from ED vision sensors for robust speech detection and processing. The temporal information provided by ED sensors will make it possible to experiment with new audio-visual techniques for voice activity detection and with new models of speech temporal dynamics based on events, as opposed to the typical fixed-length segments (i.e., frames).

In this context, we are looking for a highly motivated post-doc who will work on speech processing. The post-doc will mainly develop a novel speech recognition system based on visual, acoustic and (recovered) articulatory features (i.e., features describing the inner vocal tract), targeted at users with mild speech impairments. The temporal information provided by ED sensors will make it possible to experiment with new strategies to model the temporal dynamics of normal and atypical speech. The main outcomes of the project will be: (i) a fast, lightweight audio-visual voice detection system combined with (ii) a computationally efficient audio-visual recognition system that robustly recognizes the most relevant commands (key phrases) delivered by users to devices in real-world usage scenarios.

The resulting methods for improving speech detection and recognition will be exploited for the implementation of a tablet with robust speech processing. Given the automatic adaptation of the speech processing to the speech production rhythm, the speech recognition system will target speakers with mild speech impairments, specifically subjects with atypical speech flow and rhythm, typical of some disabilities and of the ageing population. The same approach will then be applied to the humanoid robot iCub to improve its interaction with humans in cooperative tasks.

We are looking for highly motivated people and inquisitive minds with the curiosity to use a new and challenging technology that requires a rethinking of audio-visual speech processing to achieve a high payoff in terms of speed, efficiency and robustness.

The ideal candidates also have the following skills:

  • PhD in Computer Science, Robotics, Engineering (or equivalent) with a background in machine learning, signal processing or related areas;
  • Ability to analyze, improve and propose new algorithms;
  • Good knowledge of the C and C++ programming languages, with proven experience.

Teamwork, PhD tutoring and general lab-related activities are expected.

An internationally competitive salary depending on experience will be offered.

Please submit a CV, list of publications, 2 reference letters and a statement of research interest, quoting “Postdoctoral position in New techniques for vision-assisted speech processing BC: 69724” in the subject line.

Please apply by October 20, 2016.


Istituto Italiano di Tecnologia is a private Foundation with the objective of promoting Italy's technological development and higher education in science and technology. Research at IIT is carried out in highly innovative scientific fields with state-of-the-art technology.

In order to comply with Italian law (art. 23 of Privacy Law of the Italian Legislative Decree n. 196/03), the candidate is kindly asked to give his/her consent to allow IIT to process his/her personal data. We inform you that the information you provide will be solely used for the purpose of assessing your professional profile to meet the requirements of IIT. Your data will be processed by IIT, with its headquarters in Genoa, Via Morego, 30, acting as the Data Holder, using computer and paper-based means, observing the rules on the protection of personal data, including those relating to the security of data.

Please also note that, pursuant to art. 7 of Legislative Decree 196/2003, you may exercise your rights at any time as a party concerned by contacting the Data Manager.

Istituto Italiano di Tecnologia is an Equal Opportunity Employer that actively seeks diversity in the workforce.


IIT: the numbers

Istituto Italiano di Tecnologia (IIT) is a public research institute that adopts the organizational model of a private law foundation. IIT is overseen by Ministero dell'Istruzione, dell'Università e della Ricerca and Ministero dell'Economia e delle Finanze (the Italian Ministry of Education, Universities and Research and the Ministry of Economy and Finance). The Institute was set up under Italian law 326/2003 with the objective of promoting excellence in basic and applied research and fostering Italy’s economic development. Construction of the Laboratories started in 2006 and finished in 2009.

IIT has an overall staff of about 1,440 people. Scientific staff make up about 85% of the total. 45% of researchers come from abroad: 29% are foreigners from more than 50 countries and 16% are returning Italians. The scientific staff currently consists of approximately 60 Principal Investigators, 110 researchers and technologists, 350 post-docs, 500 PhD students and grant holders, and 130 technicians. External funding has allowed the creation of more than 330 positions. The average age is 34 and the gender balance is 41% female to 59% male.

In 2015 IIT received about 96 million euros in public funding (accounting for 80% of its budget) and obtained 22 million euros in external funding (accounting for 20% of its budget). External funding comes from 18 European projects, 17 other national and international competitive projects and approximately 60 industrial projects.

So far IIT accounts for about 6,990 publications, more than 130 European grants and 11 ERC grants, more than 350 patents or patent applications, and more than 12 start-ups, with as many about to be launched. The Institute’s scientific activity has been further strengthened since 2009 with the establishment of 11 research nodes throughout Italy (Torino, Milano, Trento, Parma, Roma, Pisa, Napoli, Lecce, Ferrara) and abroad (MIT and Harvard University, USA), which, along with the Genoa-based Central Lab, implement the research programs included in the 2015-2017 Strategic Plan.