Difference between revisions of "Speech recognition"

From ArchWiki
m (VEDICS: added link)
(some style improvements)
Line 1: Line 1:
[[Category:Software  (English)]]
+
[[Category:Accessibility (English)]]
 +
[[Category:Audio/Video (English)]]
 +
{{i18n|Speech Recognition}}
  
=Overview=
 
 
Speech recognition is any means by which you can interface with your computer via spoken word.  This page is designed to identify applications that can facilitate speech recognition and to serve as a guide in installing and using this software in Arch.   
 
  
 
'''A note to newcomers:'''  Speech recognition is something that has traditionally not been well supported in Linux.  If you become interested and choose to dig below the immediate surface, you can expect difficulty in finding documentation or help from the community.   
 
  
===Types of Speech Recognition===
+
==Types of Speech Recognition==
 
Speech recognition can mean several things:
 
 
* Text-To-Speech:
 
Line 15: Line 16:
 
*:Full dictation/recognition software allows the user to read full sentences or paragraphs and translates that data into text on the fly.  This could be used, for instance, to dictate an entire letter into the window of an email client.  By far the most difficult aspect of speech recognition and this sort of software is not as easy/readily-available in linux.  In some cases, these types of applications need to be trained to your voice - several even improve in accuracy the more you use them.   
 
  
===Development Status===
+
==Development Status==
 
Several years ago there was a push to implement speech recognition in Linux.  Since then, many of those projects have stagnated.   
 
  
=Text-To-Speech=
+
==Text-To-Speech==
 
The two major players in text-to-speech applications are Festival and eSpeak.  Comparison available [http://braille.uwo.ca/pipermail/speakup/2008-July/046755.html here]
 
  
Line 41: Line 42:
 
[[mbrola]] is a '''non-free''' phonemes-to-audio program.
 
  
=Voiced Commands=
+
==Voiced Commands==
 
===Gnome-Voice-Control===
 
 
In AUR
 
Line 76: Line 77:
  
  
=Speech Recognition=
+
==Speech Recognition==
==Free Speech Recognition Engines==
+
===Free Speech Recognition Engines===
===CMU Sphinx===
+
====CMU Sphinx====
===Simon===
+
====Simon====
===Julius===
+
====Julius====
===XVoice===
+
====XVoice====
===ViaVoice===
+
====ViaVoice====
===sphinxkeys===
+
====sphinxkeys====
===VoxForge===
+
====VoxForge====
  
==Proprietary Speech Recognition Engines==
+
===Proprietary Speech Recognition Engines===
===Dragon Naturally Speaking in Wine===
+
====Dragon Naturally Speaking in Wine====
 
[http://thenerdshow.com/platypus.html Platypus Project]
 
===Wizzscribe SI===
+
====Wizzscribe SI====
===Verbio ASR===
+
====Verbio ASR====
===DynaSpeak from SRI International===
+
====DynaSpeak from SRI International====
===LumenVox Speech Engine===
+
====LumenVox Speech Engine====

Revision as of 12:05, 8 February 2012


Speech recognition is any means by which you can interface with your computer via spoken word. This page is designed to identify applications that can facilitate speech recognition and to serve as a guide in installing and using this software in Arch.

A note to newcomers: Speech recognition is something that has traditionally not been well supported in Linux. If you become interested and choose to dig below the immediate surface, you can expect difficulty in finding documentation or help from the community.

Types of Speech Recognition

Speech recognition can mean several things:

  • Text-To-Speech:
    As it sounds, Text-To-Speech (or TTS) will turn a string of text into an audio clip. There are several programs available that perform TTS, some of which are command-line based (ideal for scripting) and others which provide a handy GUI.
  • Simple Voice Control/Commands:
    This is the most basic form of Speech-To-Text application. These applications are designed to recognize a small number of specific, typically one-word commands and then perform an action. This is often used as an alternative to an application launcher, allowing the user, for instance, to say the word “firefox” and have the OS open a new browser window.
  • Full dictation/recognition:
    Full dictation/recognition software allows the user to read full sentences or paragraphs aloud and transcribes that speech into text on the fly. This could be used, for instance, to dictate an entire letter into the window of an email client. This is by far the most difficult aspect of speech recognition, and this sort of software is not as readily available on Linux. In some cases, these applications need to be trained to your voice, and several even improve in accuracy the more you use them.
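
The simple voice control case in the list above boils down to a lookup from a recognized word to an action. The following is a minimal Python sketch of that idea, not the method of any particular application on this page. It assumes some recognizer has already produced the word as text; the COMMANDS table, the function names, and the choice of xterm are hypothetical.

```python
import subprocess

# Hypothetical table: recognized word -> program (argument list) to launch.
COMMANDS = {
    "firefox": ["firefox"],
    "terminal": ["xterm"],
}

def resolve_command(recognized_word, commands=COMMANDS):
    """Return the argument list for a recognized word, or None if unknown."""
    key = recognized_word.strip().lower()
    return commands.get(key)

def dispatch(recognized_word, commands=COMMANDS):
    """Launch the program mapped to the word; return True if one was started."""
    argv = resolve_command(recognized_word, commands)
    if argv is None:
        return False
    subprocess.Popen(argv)  # start the program without waiting for it to exit
    return True
```

Keeping the lookup (resolve_command) separate from the launch (dispatch) makes it possible to check the vocabulary without starting any program.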

Development Status

Several years ago there was a push to implement speech recognition in Linux. Since then, many of those projects have stagnated.

Text-To-Speech

The two major players in text-to-speech applications are Festival and eSpeak. A comparison is available at http://braille.uwo.ca/pipermail/speakup/2008-July/046755.html

Festival

Festival offers a general framework for building speech synthesis systems and includes examples of various modules. As a whole it offers full text-to-speech through a number of APIs: from shell level, through a Scheme command interpreter, as a C++ library, from Java, and through an Emacs interface. Festival is multi-lingual (currently English (British and American) and Spanish), though English is the most advanced.

  • Free
  • Can install several different voices/accents.
  • Available in Extra
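
As a rough illustration of the shell-level interface mentioned above, the Python sketch below pipes text to `festival --tts`, which reads text from standard input and speaks it. The function names are hypothetical, and the sketch assumes Festival and at least one voice are installed. If the `festival` binary is not found, it returns False instead of failing.

```python
import shutil
import subprocess

def festival_command():
    """Argument list that makes Festival speak text read from standard input."""
    return ["festival", "--tts"]

def speak_with_festival(text):
    """Speak `text` through Festival; return False if Festival is not installed."""
    if shutil.which("festival") is None:
        return False
    # Feed the text on stdin, as `echo "text" | festival --tts` would.
    subprocess.run(festival_command(), input=text.encode("utf-8"), check=True)
    return True
```

For example, `speak_with_festival("Hello from Arch")` would read the sentence aloud on a system where Festival is set up.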

Site Link

eSpeak

eSpeak is "a compact open source software speech synthesizer for English and other languages, for Linux and Windows".

  • Open source
  • Lightweight
  • Available in the community repository
  • Excellent language support
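
Since eSpeak is a command-line program, it is easy to script. The Python sketch below only builds an `espeak` argument list, using the `-v` (voice), `-s` (speed in words per minute) and `-w` (write a WAV file instead of playing) options. The function name and default values are assumptions chosen for illustration. Running the command requires eSpeak to be installed.

```python
def espeak_command(text, voice="en", words_per_minute=160, wav_path=None):
    """Build the argument list for one eSpeak invocation."""
    cmd = ["espeak", "-v", voice, "-s", str(words_per_minute)]
    if wav_path is not None:
        # Write the audio to a WAV file rather than playing it.
        cmd += ["-w", wav_path]
    cmd.append(text)
    return cmd
```

The resulting list can be passed to `subprocess.run`, for example `subprocess.run(espeak_command("Hello", voice="en"), check=True)`.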

mbrola

mbrola is a non-free phonemes-to-audio program.

Voiced Commands

Gnome-Voice-Control

In AUR

VEDICS

VEDICS (Voice Enabled Desktop Interaction and Control System) is assistive software that lets the user interact with the OS using voice commands.

Note: Not tested in Arch

Site Link

Features:

  1. Perform common window operations like close, minimize, maximize etc.
  2. Invoke default applications like browsers, mail clients etc.
  3. Access any element on the desktop just by saying its name.
  4. Supports GNOME3, GNOME2

Perlbox-Voice

Perlbox Voice is a voice-enabled application that brings your desktop under your command.

Note:

  • Last updated in 2005
  • Package is in AUR, but missing festival-don dependency.

Site Link

Features:

  1. Text to speech (Thanks to the Festival speech synthesizer)
  2. Voice control to open user specified applications. For example, if you say "Web", the Perlbox-Voice Control will open the browser of your choice.
  3. Desktop plugins to control your Linux desktop using only your voice. You can switch virtual screens, cycle through desktops, invoke the run dialog, and quickly lock the screen.
  4. Custom commands are fully supported, and you can add commands on the fly.
  5. "Pseudo Commands" allow you to enter phrases that the computer should say in response. For example, if you say "Good morning", the computer voice could say "And good morning to you".


Speech Recognition

Free Speech Recognition Engines

CMU Sphinx

Simon

Julius

XVoice

ViaVoice

sphinxkeys

VoxForge

Proprietary Speech Recognition Engines

Dragon Naturally Speaking in Wine

Platypus Project

Wizzscribe SI

Verbio ASR

DynaSpeak from SRI International

LumenVox Speech Engine