Short Biography

Naoki KIMURA is a second-year Ph.D. student at The University of Tokyo, advised by Prof. Jun Rekimoto and by Prof. Thad Starner of Georgia Tech. His research focuses on machine learning for human-computer interaction, specifically (1) silent speech interaction for wearable devices and (2) deep generative models for enhancing immersive experiences.

Google PhD Fellow, Microsoft Research D-CORE, ACT-X, PFN intern (2018)

Download CV

Education

  • 2019 - present Ph.D. student in Applied Computer Science

    The University of Tokyo, Japan.
    Supervisor: Prof. Jun Rekimoto

  • 2017 - 2019 Master of Applied Computer Science

    The University of Tokyo, Japan.
    Supervisor: Prof. Jun Rekimoto

  • 2013 - 2019 Bachelor of Urban Engineering

    The University of Tokyo, Japan.

Research Projects

SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks

Naoki Kimura, Michinari Kono, Jun Rekimoto

[VIDEO] [DOI] [PDF]

🏅Honorable Mention Award @CHI2019

The availability of digital devices operated by voice is rapidly expanding, but the situations in which voice interfaces can be used are still restricted: speaking in public places is an annoyance to surrounding people, secret information should not be uttered aloud, and environmental noise may reduce the accuracy of speech recognition. To address these limitations, SottoVoce detects a user’s unuttered voice. From internal information observed by an ultrasonic imaging sensor attached to the underside of the jaw, the system recognizes the utterance content without the user actually vocalizing. Our deep neural network model converts a sequence of ultrasound images into acoustic features. We confirmed that audio signals generated by our system can control existing smart speakers, and we also observed that users can adjust their oral movements to learn the system’s behavior and improve recognition accuracy.
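
The core of the system is a model that maps a sequence of ultrasound images to a sequence of acoustic feature frames that can then be rendered as audio. The PyTorch sketch below illustrates only this image-sequence-to-acoustic-features idea; the layer choices, sizes, and the mel-spectrogram target are assumptions for illustration, not the paper’s exact architecture.

```python
# Illustrative sketch (not the paper's exact model): a CNN encodes each
# ultrasound frame, and a recurrent layer maps the frame sequence to
# acoustic feature vectors (e.g., mel-spectrogram frames) that a vocoder
# could turn into audio. All layer sizes here are assumptions.
import torch
import torch.nn as nn

class UltrasoundToAcoustic(nn.Module):
    def __init__(self, n_mels: int = 80):
        super().__init__()
        # Per-frame image encoder: 1-channel ultrasound image -> feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),                      # -> 32 * 4 * 4 = 512
        )
        # Temporal model over the frame sequence.
        self.rnn = nn.GRU(512, 256, batch_first=True, bidirectional=True)
        self.head = nn.Linear(512, n_mels)     # one acoustic frame per image

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 1, height, width)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out)                  # (batch, time, n_mels)

model = UltrasoundToAcoustic()
mels = model(torch.randn(2, 50, 1, 64, 64))   # 50 frames of 64x64 images
print(mels.shape)                              # torch.Size([2, 50, 80])
```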

SonoSpace: Visual Feedback of Timbre with Unsupervised Learning

Naoki Kimura, Keisuke Shiro, Yota Takakura, Hiromi Nakamura, Jun Rekimoto

[VIDEO] [DOI] [PDF]

Accepted as Oral @ACMMM2020

One of the most difficult aspects of practicing a musical instrument is improving timbre. Unlike pitch and rhythm, timbre is a high-dimensional, perceptual concept, and learners cannot easily evaluate their own timbre. To improve their timbre control efficiently, learners generally need a teacher to provide feedback about timbre, but hiring a teacher is often expensive and sometimes difficult. Our goal is to develop a low-cost learning system that substitutes for the teacher. We found that a variational autoencoder (VAE), an unsupervised neural network model, provides a user-friendly two-dimensional mapping of timbre. Our system, SonoSpace, maps the learner’s timbre into a 2D latent space extracted from an advanced player’s performance. Looking at this latent space, the learner can visually grasp the relative distance between their timbre and that of the advanced player. Although our system was evaluated mainly with an alto saxophone, SonoSpace could also be applied to other instruments, such as trumpets, flutes, and drums.
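
The sketch below is a minimal PyTorch VAE with a two-dimensional latent space of the kind described above. The input representation (fixed-length spectral feature vectors) and all layer sizes are illustrative assumptions, not SonoSpace’s actual model.

```python
# Minimal VAE sketch with a 2-D latent space, assuming timbre is represented
# as fixed-length spectral feature vectors (e.g., one spectrum per note).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimbreVAE(nn.Module):
    def __init__(self, n_bins: int = 128, z_dim: int = 2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_bins, 64), nn.ReLU())
        self.mu = nn.Linear(64, z_dim)
        self.logvar = nn.Linear(64, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_bins))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    # Reconstruction term + KL divergence to the standard normal prior.
    rec = F.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

recon, mu, logvar = TimbreVAE()(torch.randn(16, 128))
# After training on an advanced player's spectra, encode the learner's
# spectrum and plot its mean (mu) as a point in the 2-D latent space
# next to the expert's points.
```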

End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge

Naoki Kimura, Zixiong Su, Takaaki Saeki

[DOI] [PDF]

Show-and-Tell @INTERSPEECH2020

This work is the first attempt to apply an end-to-end, deep neural network-based automatic speech recognition (ASR) pipeline to the Silent Speech Challenge (SSC) dataset, which contains synchronized ultrasound images and lip images captured while a single speaker read the TIMIT corpus without uttering audible sounds. Previous silent speech research on the SSC dataset adapted established ASR methods for visual speech recognition. In this work, we tested a state-of-the-art end-to-end ASR method on the SSC dataset using the End-to-End Speech Processing Toolkit, ESPnet. The experimental results show that this end-to-end method, combined with SpecAugment, achieved a character error rate (CER) of 10.1% and a word error rate (WER) of 20.5%, suggesting that performance could be improved further with additional data collection.
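
SpecAugment, mentioned above, augments training data by masking random frequency and time bands of the input features. Below is a minimal NumPy sketch of that masking (omitting SpecAugment’s time warping); the mask sizes and the 80-dimensional feature assumption are illustrative, and the actual experiments used ESPnet.

```python
# A minimal SpecAugment-style masking sketch (frequency and time masks only),
# applied to a (time, feature) matrix such as the feature sequence fed to
# the recognizer. Mask widths and counts are illustrative.
import numpy as np

def spec_augment(feats: np.ndarray, n_freq_masks=2, freq_width=8,
                 n_time_masks=2, time_width=20) -> np.ndarray:
    t, f = feats.shape
    out = feats.copy()
    rng = np.random.default_rng()
    for _ in range(n_freq_masks):
        w = int(rng.integers(0, freq_width + 1))
        start = int(rng.integers(0, max(1, f - w)))
        out[:, start:start + w] = 0.0          # zero a band of feature bins
    for _ in range(n_time_masks):
        w = int(rng.integers(0, time_width + 1))
        start = int(rng.integers(0, max(1, t - w)))
        out[start:start + w, :] = 0.0          # zero a span of time frames
    return out

augmented = spec_augment(np.random.randn(300, 80))  # 300 frames, 80 features
```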

Elicitation of Alternative Pen-Holding Postures for Quick Action Triggers with Suitability for EMG Armband Detection

Fabrice Matulic, Brian Vogel, Naoki Kimura, Daniel Vogel

[DOI] [PDF]

@ISS2019

In this project, we study which alternative ways of gripping a digital pen people might choose to trigger actions and shortcuts in applications (e.g., extending the pinkie while holding the pen to invoke a menu). We also investigate how well these pen-holding postures can be recognized from data collected with an EMG armband using deep learning.
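
As a rough illustration of the recognition side, the sketch below classifies a window of multi-channel EMG samples with a small 1-D CNN. The channel count, window length, number of postures, and architecture are all assumptions for illustration, not the project’s actual model.

```python
# Toy posture classifier over EMG windows: a small 1-D CNN over an
# 8-channel window (e.g., from an armband). All sizes are assumptions.
import torch
import torch.nn as nn

N_CHANNELS, WINDOW, N_POSTURES = 8, 200, 6

classifier = nn.Sequential(
    nn.Conv1d(N_CHANNELS, 32, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool1d(2),
    nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),     # pool over time -> (batch, 64)
    nn.Linear(64, N_POSTURES),
)

logits = classifier(torch.randn(4, N_CHANNELS, WINDOW))  # batch of 4 windows
posture = logits.argmax(dim=1)                           # predicted posture ids
```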

ExtVision: Augmentation of Visual Experiences with Generation of Context Images for Peripheral Vision Using Deep Neural Network

Naoki Kimura, Jun Rekimoto

[VIDEO] [DOI]

🏅Honorable Mention Award @CHI2018

We propose a system, called ExtVision, that augments visual experiences by generating context images and projecting them onto the periphery of a television or computer screen. Peripheral projection of a context image is one of the most effective techniques for enhancing visual experiences, but it is not commonly used at present because preparing the context image is difficult. In this paper, we propose a deep neural network-based method that generates context images for peripheral projection automatically. A user study was performed to investigate how the proposed system augments traditional visual experiences. In addition, we present applications and future prospects of the developed system.
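
The sketch below illustrates the generation step, assuming a pix2pix-style encoder-decoder generator (the related Japanese publication below uses pix2pix): a toy generator predicts a wider context image, and the original frame is pasted back at the center so only the periphery is synthesized. Layer sizes and resolutions are illustrative, not the trained model from the paper.

```python
# Toy context-image generator: predict a 2x-wider image from the on-screen
# frame, then keep the original frame sharp at the center and project only
# the synthesized periphery.
import torch
import torch.nn as nn

class ContextGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),       # 128->64
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),      # 64->32
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, frame):                  # frame: (b, 3, 128, 128)
        wide = nn.functional.interpolate(self.net(frame), scale_factor=2)
        return wide                            # (b, 3, 256, 256) context image

gen = ContextGenerator()
frame = torch.rand(1, 3, 128, 128) * 2 - 1     # frame in [-1, 1], like Tanh
context = gen(frame)
context[:, :, 64:192, 64:192] = frame          # original stays exact at center
```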

Deep Dive: Deep-Neural-Network-Based Video Extension for Immersive Head-Mounted Display Experiences

Naoki Kimura, Michinari Kono, Jun Rekimoto

[VIDEO]

Accepted @PerDis2019

Immersion is an important factor in video experiences, and various viewing methods and systems have been proposed to enhance it. Head-mounted displays (HMDs) are home-friendly pervasive devices that can provide an immersive video experience owing to their wide field of view (FoV) and the way they separate users from the outside environment. They are often used for viewing panoramic and stereoscopic recorded videos or virtually generated environments, but the demand for viewing standard plane videos on HMDs has also increased. Plane videos, however, are usually shown in a theater mode that restricts the FoV, so the advantages of HMDs are not fully utilized. We therefore explored a method for viewing plane videos on an HMD, combined with view augmentation by LEDs embedded in the HMD. We constructed a system for viewing plane videos with an HMD, using a deep neural network (DNN) model optimized for generating and extending images for peripheral vision, together with wide-FoV customization. We found that enlarging the original video and extending it with our DNN model can improve the user experience, and that our method provides more comfortable viewing by preventing motion sickness in first-person-view videos.
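
The per-frame composition described above can be sketched as follows, with the DNN video-extension model stubbed out. The resolutions and the simple center-paste composition are assumptions for illustration, not the system’s exact pipeline.

```python
# Per-frame composition sketch (NumPy): generate a wide extended frame with
# a DNN (stubbed here as a hypothetical placeholder), then paste the original
# plane-video frame at the center so it stays exact while the periphery comes
# from the generated extension.
import numpy as np

def extend_with_dnn(frame: np.ndarray, out_hw=(512, 512)) -> np.ndarray:
    # Hypothetical stub standing in for the trained video-extension model.
    h, w = out_hw
    return np.zeros((h, w, 3), dtype=frame.dtype)

def compose_hmd_frame(frame: np.ndarray) -> np.ndarray:
    wide = extend_with_dnn(frame)                 # (512, 512, 3) extension
    h, w = wide.shape[:2]
    ch, cw = frame.shape[:2]
    top, left = (h - ch) // 2, (w - cw) // 2
    wide[top:top + ch, left:left + cw] = frame    # keep center exact
    return wide

hmd_frame = compose_hmd_frame(np.random.rand(256, 256, 3).astype(np.float32))
```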

Selected Awards

  • 2020 Microsoft Research Asia Fellowship

  • 2019 Google PhD Fellowship

  • 2019 Best Paper Honorable Mention Award @ CHI2019

    SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks

  • 2018 Best Paper Honorable Mention Award @ CHI2018

    ExtVision: Augmentation of Visual Experiences with Generation of Context Images for Peripheral Vision Using Deep Neural Network

  • 2019 Best Master Thesis Award @ The University of Tokyo

  • 2019 UTokyo - TOYOTA Study Abroad Scholarships (5,000,000 yen)

  • 2019 Nominee for the President’s Award of the University of Tokyo

  • 2019 KUMA FOUNDATION Creator Scholarship (1,200,000 yen)

  • 2019 TOYOTA/Dwango AI Scholarship (1,200,000 yen)

  • 2018 TOYOTA/Dwango AI Scholarship (1,200,000 yen)

  • 2018 37th and 38th Leave a Nest Research Awards (1,500,000 yen)

Selected Publications

SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks

In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019
Kimura, Naoki and Kono, Michinari and Rekimoto, Jun
Publisher's website

Deep Dive: Deep-neural-network-based Video Extension for Immersive Head-mounted Display Experiences

In Proceedings of the 8th ACM International Symposium on Pervasive Displays, 2019
Kimura, Naoki and Kono, Michinari and Rekimoto, Jun
Publisher's website

ExtVision: Augmentation of Visual Experiences with Generation of Context Images for Peripheral Vision Using Deep Neural Network

In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018
Kimura, Naoki and Rekimoto, Jun
Publisher's website

Peripheral Vision Video Generation Using pix2pix for an Immersion Enhancement System (in Japanese)

25th Workshop on Interactive Systems and Software (WISS), Yamanashi, Japan (refereed; acceptance rate 37%)
Kimura, Naoki and Rekimoto, Jun

SottoVoce: Silent Speech Interaction Using Ultrasound Images and Deep Learning (in Japanese)

IPSJ Interaction 2019, pages 82-91, Tokyo, Japan, February 2019 (refereed; acceptance rate approx. 40%)
Rekimoto, Jun and Kimura, Naoki and Kono, Michinari
Publisher's website
