US3828132A - Speech synthesis by concatenation of formant encoded words - Google Patents

Speech synthesis by concatenation of formant encoded words

Info

Publication number
US3828132A
US3828132A (application US00085660A)
Authority
US
United States
Prior art keywords
word
words
message
descriptions
speech
Prior art date
Legal status
Expired - Lifetime
Application number
US00085660A
Inventor
J Flanagan
L Rabiner
R Schafer
Current Assignee
AT&T Corp
Original Assignee
Bell Telephone Laboratories Inc
Priority date
Filing date
Publication date
Application filed by Bell Telephone Laboratories Inc filed Critical Bell Telephone Laboratories Inc
Priority: US00085660A (US3828132A)
Priority claimed by: CA941968A, DE2115258C3, JPS539041B1
Application granted; publication of US3828132A
Legal status: Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/06 Elementary speech units used in speech synthesisers; Concatenation rules
    • G10L13/07 Concatenation rules


Abstract

Audio response units that select speech sounds, stored in analog or coded digital form, as the excitation for a speech synthesizer are widely used, for example in telephone audio announcement terminals. The speech produced by most units is noticeably artificial and mechanical sounding. According to this invention, human speech is analyzed in terms of formant structure and coded for storage in the unit. As the individual words are called for, a stored program assembles them into a complete utterance, taking into account the durations of the words in the context of the complete utterance, pitch variations common to the language, and transitions between voiced portions of the speech. The result is a more natural sounding synthetic utterance.

Description

United States Patent [19]
Flanagan et al.
Aug. 6, 1974

SPEECH SYNTHESIS BY CONCATENATION OF FORMANT ENCODED WORDS

Inventors: James Loton Flanagan, Warren; Lawrence Richard Rabiner, Berkeley Heights; Ronald William Schafer, New Providence, all of N.J.
Assignee: Bell Telephone Laboratories Incorporated, Murray Hill, N.J.
Filed: Oct. 30, 1970
Appl. No.: 85,660

References Cited

UNITED STATES PATENTS
11/1958 David 179/15.55 R
11/1964 Gerstman 179/1 SA
5/1967 DeClerk 179/1 SA
2/1968 French 179/1 SA
10/1970 Nakata 179/1 SA
6/1971 Martin 179/1 SA

OTHER PUBLICATIONS
Rabiner, "A Model for Synthesizing Speech by Rule," IEEE Transactions AU-17, 3/69, pp. 7-13.
J. L. Flanagan et al., "Synthetic Voices for Computers," IEEE Spectrum, pp. 22-45, October 1970.

Primary Examiner: Kathleen H. Claffy
Assistant Examiner: Jon Bradford Leaheey
Attorney, Agent, or Firm: A. E. Hirsch; G. E. Murphy

[57] ABSTRACT

Audio response units that select speech sounds, stored in analog or coded digital form, as the excitation for a speech synthesizer are widely used, for example in telephone audio announcement terminals. The speech produced by most units is noticeably artificial and mechanical sounding.
According to this invention, human speech is analyzed in terms of formant structure and coded for storage in the unit. As the individual words are called for, a stored program assembles them into a complete utterance, taking into account the durations of the words in the context of the complete utterance, pitch variations common to the language, and transitions between voiced portions of the speech. The result is a more natural sounding synthetic utterance.
13 Claims, 8 Drawing Figures

[Patent drawings, sheets 1-6: FIG. 1, block diagram of the word-concatenation synthesis system; FIG. 2, formant interpolation for four overlap cases; FIG. 3, word-duration table for digit strings; FIG. 4, concatenation of "We were away"; FIG. 5, concatenation of "I saw this man"; FIGS. 6A-6C, flow chart of the concatenation processor.]

SPEECH SYNTHESIS BY CONCATENATION OF FORMANT ENCODED WORDS

This invention relates to the synthesis of limited context messages from stored data, and more particularly to processing techniques for assembling stored information into an appropriate specification for energizing a speech synthesizer.
BACKGROUND OF THE INVENTION

In automatic voice answer-back services, such as telephone intercept or information services, an inquiry is put to a machine, which must assemble the information with which to energize a speech synthesizer. The response to the question is eventually provided in the form of complete spoken utterances.
For such a service, it is evident that the system must have a large and flexible vocabulary. The system, therefore, must store sizable quantities of speech information and it must have the information in a form amenable to the production of a great variety of messages. Speech generated by the system would be as intelligible as natural speech. Indeed, the possibility exists that it might be made more intelligible than natural speech. It need not, however, sound like any particular human and may even be permitted to have a machine accent.
DESCRIPTION OF THE PRIOR ART One technique for the synthesis of messages is to store individually spoken words and to select the words in accordance with the desired message output. Words pieced together in this fashion yield intelligible but highly unnatural sounding messages. One difficulty is that word waveforms cannot easily be adjusted in duration. Also, it is difficult to make smooth transitions from one word to the next. Nonetheless, such systems are relatively simple to implement and afford a relatively large vocabulary with simple storage apparatus.
To avoid some of the difficulties of word storage and to reduce the size of the store needed for a reasonable variety of message responses, individual speech sounds may be stored in the form of phoneme specifications. Such specifications can be called out of storage in accordance with word and message assembly rules and used to energize a speech synthesizer. However, speech at the acoustic level is not particularly discrete. Articulations of adjacent phonemes interact, and transient movements of the vocal tract in the production of any phoneme last much longer than the average duration of the phoneme. That is, the articulatory gestures overlap and are superimposed on one another. Hence, transient motions of the vocal tract are perceptually important. Moreover, much information about the identity of a consonant is carried, not by the spectral shape at the steady-state time of the consonant, but by its dynamic interactions with adjacent phonemes.
Speech synthesis, therefore, is strongly concerned with dynamics. A synthesizer must reproduce not only the characteristics of sounds when they most nearly represent the ideal of each phoneme, but also the dynamics of vocaltract motion as it progresses from one phoneme to another. This fact highlights a difference between speech synthesis from word or phrase storage and synthesis from more elementary speech units. If the library of speech elements is a small number of short units, such as phonemes, the linking procedures approach the complexity of the vocal tract itself. Conversely, if the library of speech elements is a much larger number of longer segments of speech, such as words or phrases, the elements can be linked together at points in the message where information in transients is minimal.
Thus, although phoneme synthesis techniques are attractive and sometimes adequate, the intermediate steps of assembling elementary speech specifications into words and words into messages according to prescribed rules requires complicated equipment and, at best, yields mechanical sounding speech.
SUMMARY OF THE INVENTION These shortcomings are overcome in accordance with the present invention by storing representations of spoken words or phrases in terms of individual formant and other speech defining characteristics. Formants are the natural resonances of the vocal tract, and they take on different frequency values as the vocal tract changes its shape during talking. Typically, three such resonances occur in the frequency range most important to intelligibility, namely, 0 to 3 kHz. Representation of the speech wave as a set of slowly varying excitation parameters and vocal tract resonances is attractive for at least two reasons. First, it is more efficient for data storage than, for example, a pulse code modulation (PCM) representation of the speech waveform. Secondly, a formant representation permits flexibility in manipulation of the speech signal for the concatenation of words or phrases.
Thus, in accordance with the invention, individual, naturally spoken, isolated words are analyzed to produce a word library which is stored in terms of formant frequencies. In the formant representation of an utterance, formant frequencies, voice pitch, amplitude, and timing can all be manipulated independently. Thus, in synthesizing an utterance, an artificial pitch contour, i.e., the time course of the relevant parameters, can be substituted for the natural contour; a steady-state sound can be lengthened or shortened, and even the entire utterance can be speeded up or slowed down with little or no loss in intelligibility. Formants can be locally distorted, and the entire formant contour can be uniformly raised or lowered to alter voice quality.
Upon program demand, word length formant data are accessed and concatenated to form complete formant functions for the desired utterance. The formant functions are interpolated in accordance with spectral derivatives to establish contours which define smooth transitions between words. Speech contour and word duration data are calculated according to stored rules. Following the necessary processing and interpolation, concatenated formant functions are used to synthesize a waveform which approximates a naturally spoken message. As an added advantage, economy in storage is achieved because formant and excitation parameters change relatively slowly and can be specified by fewer binary numbers per second (bits) than can, for example, the speech waveform.
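For a rough sense of the storage economy involved (the figures here are illustrative assumptions, not values given in the patent): a PCM representation of the waveform at 8,000 samples per second and 8 bits per sample requires about 8,000 x 8 = 64,000 bits per second, whereas roughly ten slowly varying control parameters (three formants, pitch period, two amplitudes, fricative pole and zero, and so on) updated once per 10 msec at, say, 6 bits apiece require only about 10 x 100 x 6 = 6,000 bits per second, an order of magnitude less storage for the same utterance.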
BRIEF DESCRIPTION OF THE DRAWINGS The invention will be fully apprehended from the following detailed description of illustrative embodiments thereof taken in connection with the appended drawings in which:
FIG. 1 illustrates schematically a suitable arrangement in accordance with the invention for synthesizing message-length utterances upon command;
FIG. 2 illustrates the manner of overlapping individual word formants, in accordance with the invention, for four different combinations of words;
FIG. 3 illustrates timing data which may be used for processing formant data;
FIG. 4 illustrates the processing of voiced formant data for individual words to produce a concatenated formant structure useful for actuating a speech synthesizer;
FIG. 5 illustrates the processing of both voiced and fricative formant data for individual words to produce a concatenated formant structure useful for actuating a speech synthesizer; and
FIGS. 6A, 6B and 6C illustrate by way of a flow chart the operations employed, in accordance with the invention, for processing parametric data and for concatenating these data to produce a complete set of control signals for energizing a formant speech synthesizer.
DETAILED DESCRIPTION OF THE INVENTION A system for synthesizing speech by the concatenation of formant encoded words, in accordance with the invention, is illustrated schematically in FIG. 1. Isolated words spoken by a human being are analyzed to estimate the parameters required for synthesis. Thus, naturally spoken, isolated words originating, for example, in system 10, which may include either studio generated or recorded words, are converted, if desired, to digital form in converter 11. The individual words, in whatever format, are supplied to speech analyzer 12, wherein individual formants, amplitudes, pitch period designations, and fricative pole and zero identifications are developed at the Nyquist rate. A suitable speech analyzer is described in detail in a copending application of Rabiner-Schafer, Ser. No. 872,050, filed Oct. 29, 1969, now U.S. Pat. No. 3,649,765, granted Mar. 14, 1972. In essence, analyzer 12 includes individual channels, including analyzer 13 for identifying formant (voiced) frequencies F1, F2, F3, analyzer 14 for developing a pitch period signal P, analyzer 15 for developing buzz, A_V, and hiss, A_N, level control signals, and analyzer 16 for developing fricative (unvoiced) pole and zero signals, F_P and F_Z. These control parameter values are delivered to parametric description storage unit 17, which may take any desired form. Both analog and digital stores, which may be accessed upon command, are known in the art. When completed, storage unit 17 constitutes a word catalog which may be referenced by the word concatenation portion of the system. The parameter values maintained in catalog 17 may be revised from time to time by the addition or deletion of new words.
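For concreteness, the parametric description held in storage unit 17 for one catalog word can be pictured as a sequence of 10 msec analysis frames, each carrying the three formants, the pitch period, the buzz and hiss amplitudes, and the fricative pole and zero. The sketch below is a minimal modern Python rendering of that idea, not the patent's own storage format; the field names and the use of a dataclass are assumptions.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Frame:
        """One 10 msec analysis interval of a catalog word (illustrative field names)."""
        f1: float            # first formant frequency, Hz
        f2: float            # second formant frequency, Hz
        f3: float            # third formant frequency, Hz
        pitch_period: float  # pitch period P, msec
        a_v: float           # buzz (voiced) amplitude control
        a_n: float           # hiss (unvoiced) amplitude control
        f_pole: float        # fricative pole frequency, Hz
        f_zero: float        # fricative zero frequency, Hz
        voiced: bool         # True if the interval is voiced

    @dataclass
    class CatalogWord:
        """Parametric description of one isolated, naturally spoken word."""
        name: str
        frames: List[Frame]  # isolated-word duration = 10 msec * len(frames)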
INPUT COMMAND An input command from word sequence input 18 initiates the necessary operations to synthesize a message composed of words from catalog 17. The exact form of input 18 depends upon the particular application of the word synthesis system. Typically, an inquiry of some form is made to the system embodied by unit 18, the necessary data for a response is formulated, and the appropriate word designations for the response, for example, in the English language, are assembled in code language and delivered to the synthesis system as the output signal of unit 18. Such response units are known to those skilled in the art and are described in various patents and publications. The output developed by such a responsive unit may thus be in the form of machine code language, phoneme or other linguistic symbols, or the like. Whatever the form of the output signal, it is delivered, in accordance with this invention, to word processing system 20, wherein required word data is assembled, processed, and delivered to speech synthesizer 26.
To synthesize a message composed of words from storage unit 17 requires the generation of timing contours, a pitch contour, and formant and amplitude contours. Processor 20, in accordance with the invention, employs separate strategies for handling the segmental features of the message, such as formant frequencies, unvoiced pole and zero frequencies and amplitudes, and the prosodic features, such as timing and pitch. Program strategy for treating the segmental features is self-stored in the processor. The prosodic feature information needed for processing is derived in or is supplied to processor 20. It is this flexibility in manipulating formant-coded speech that permits the breaking of the synthesis problem into two parts.
TIMING DATA Timing information may be derived in one of several ways. For limited vocabulary applications, such as automatic intercept services, the timing rules need be nothing more complicated than a table specifying word duration as a function of position in an input string of data and as a function of the number of phonemes per word. Timing data for a typical seven-digit string is illustrated in the table of FIG. 3 and is normally stored in timing unit 22. For more sophisticated applications, word duration is determined from rules which take into account the syntax of the specific message to be produced, i.e., rules based on models of the English language. Such data also is stored in timing store 22. It is also possible to specify the duration of each word in the input string to be synthesized from external timing data supplied from unit 23. In this case, word duration is chosen according to some external criterion, for example, measured from a naturally spoken version of the message to be synthesized, and is not necessarily a typical duration for that word, independent of context. Thus, external timing data may be supplied from stored data or from real time adjustments made during synthesis.
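As a minimal sketch of such a table-lookup rule (in Python, with purely hypothetical durations rather than the values of FIG. 3), the context duration of each word can be read out by its position in the string and by its phoneme count:

    # Hypothetical durations in msec, indexed by digit position; columns correspond to
    # increasing numbers of phonemes per word. These numbers are placeholders only.
    DIGIT_TIMING = {
        1: [300, 340, 380, 420],
        2: [280, 320, 360, 400],
        3: [450, 500, 560, 610],
    }

    def word_duration_ms(position: int, n_phonemes: int, min_phonemes: int = 2) -> int:
        """Look up the in-context duration of a word in a digit string."""
        row = DIGIT_TIMING[position]
        col = min(max(n_phonemes - min_phonemes, 0), len(row) - 1)
        return row[col]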
PITCH DATA Synthesis also requires the determination of the appropriate pitch contour, i.e., pitch period as a function of time, for the message being synthesized. Pitch information can be obtained in several ways. For example, the pitch character of the original sequence of spoken words may be measured. Alternatively, a monotone or an arbitrarily shaped contour may be used. However,
in practice both of these have been found to give unacceptable, unnatural results. Accordingly, it is in accordance with this invention to use a time-normalized pitch contour, stored in unit 24, and to modify it to match the word portions as determined from the timing rules. Thus, pitch data stored in unit 24 are supplied to concatenating processor 21, wherein the contour is locally lengthened or shortened as required by the specific utterance timing as specified by the timing data. If desired, pitch variation data may be supplied from external source 25, either in the form of auxiliary stored data, or as real time input data. For example, a pitch contour extracted from a naturally spoken version of the message may be used. Such data would normally be used when word durations have been obtained in a similar manner, i.e., from external timing unit 23.
Pitch and timing information obtained externally in this manner provide the most natural sounding synthesized speech. It is also possible to calculate pitch contour information by rule. Thus, there are many ways in which the prosodic information for a message can be obtained, and the choice depends strongly on the desired quality of the synthetic speech and the specific application for which it is to be used.
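A minimal sketch of the time-normalized pitch contour idea, assuming the stored contour is a list of pitch-period values on a normalized time axis that is simply resampled (here by linear interpolation, which is an assumption) to cover the 10 msec frames of the assembled message:

    def fit_pitch_contour(stored_contour, n_frames):
        """Stretch or compress a time-normalized pitch contour to n_frames intervals."""
        m = len(stored_contour)
        out = []
        for i in range(n_frames):
            t = i * (m - 1) / max(n_frames - 1, 1)   # position on the stored contour
            k = int(t)
            frac = t - k
            nxt = stored_contour[min(k + 1, m - 1)]
            out.append((1.0 - frac) * stored_contour[k] + frac * nxt)
        return out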
WORD DURATION ADJUSTMENT Once the timing pattern for the message is established, isolated words in word catalog 17 can be withdrawn and altered to match the specified timing. Thus, formant data for a word in the catalog may be either lengthened or shortened. The formant contours for successive voiced words are smoothly connected together to form continuous transitions and continuous formant contours for the message. The choice of place in a word to alter duration is based on the dynamics of the formant contours. For each subinterval of a voiced sound, typically 10 msec in duration, a measure of the rate of change of formant contours is computed in processor 21. This measure is called the spectral derivative. Regions of the word where the spectral derivative is small are regions where the word can be shortened or lengthened with the least effect on word intelligibility. Thus, to shorten a word by a given amount, an appropriate number of 10 msec intervals are deleted in the region of the smallest spectral derivative. To lengthen a word, the region of the lowest spectral derivative is lengthened by adding an appropriate number of 10 msec intervals. Unvoiced regions of words are never modified.
In practice, the measure of spectral derivative, SD_i, is calculated for each interval from the rate of change of the formant frequencies from one interval to the next, where i = 1, 2, ... indexes the 10 msec intervals and F_j(i) is the value of the j-th formant in the i-th time interval. To determine how many 10 msec intervals must be added to (or subtracted from) the isolated word controls, an equation is used based on desired word duration, isolated word duration, and some simple contextual information concerning how the current word is concatenated with its preceding and following neighbors. By defining the symbols:
I_PM = 1 if the end of the preceding word is voiced and the beginning of the current word is also voiced; 0 otherwise
I_NM = 1 if the end of the current word is voiced and the beginning of the following word is also voiced; 0 otherwise
W_I = duration of the current word spoken in isolation
W_C = duration of the current word spoken in context (as determined from the timing rules)
W = number of 10 msec intervals to be added (if W > 0) or subtracted (if W < 0),

then

W = W_C - W_I + 5 x (I_PM + I_NM).

The reason for the last term in the above equation is that whenever either I_PM = 1 or I_NM = 1, it means that the two words must be smoothly merged together, and will overlap each other by 100 msec. However, this 100 msec region is shared by the two words; hence 50 msec (5 intervals) are allotted to each word separately in terms of the overall timing. The technique by which the W additional 10 msec intervals are inserted, or removed, is based entirely on the spectral derivative measurement. As noted above, for each 10 msec voiced interval of the isolated word, the spectral derivative is calculated. To shorten a word, the W intervals having the smallest spectral derivatives are removed. To lengthen a word, the region of the word having the smallest spectral derivative is located and W intervals are inserted at the middle of this region. Each of the W intervals is given the control parameters of the center of the interval, i.e., a steady-state region of W intervals is added.
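The following Python sketch (reusing the Frame structure from the earlier sketch) illustrates the duration-adjustment step, the role played by subroutine CRDELL. The exact spectral-derivative expression is garbled in this copy of the patent, so the sum of the magnitudes of the frame-to-frame formant changes is assumed here; the insertion and deletion rules follow the text above.

    def spectral_derivative(frames, i):
        """Assumed SD measure: summed magnitude of the formant change into interval i."""
        a, b = frames[i - 1], frames[i]
        return abs(b.f1 - a.f1) + abs(b.f2 - a.f2) + abs(b.f3 - a.f3)

    def adjust_duration(frames, w_context, merged_prev, merged_next):
        """Lengthen or shorten a word (list of Frame) to its in-context duration,
        expressed in 10 msec intervals."""
        w = w_context - len(frames) + 5 * (int(merged_prev) + int(merged_next))
        frames = list(frames)
        if w == 0:
            return frames
        # spectral derivative of every voiced interval (unvoiced regions are never modified)
        sd = {i: spectral_derivative(frames, i)
              for i in range(1, len(frames))
              if frames[i].voiced and frames[i - 1].voiced}
        if w < 0:
            # shorten: delete the |w| voiced intervals with the smallest spectral derivatives
            for i in sorted(sorted(sd, key=sd.get)[:-w], reverse=True):
                del frames[i]
        else:
            # lengthen: insert w copies of the flattest interval (a steady-state region)
            i = min(sd, key=sd.get)
            frames[i:i] = [frames[i]] * w
        return frames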
OVERLAP OF WORD DESCRIPTIONS Except for the case when the end of the current word, as well as the beginning of the following word, are both voiced, the control data from word to word are simply abutted. Whenever the end of one word is voiced and the beginning of the next word is also voiced, a smooth transition is thus made from the formants at the end of one word to those at the beginning of the next word. This transition is made, for example, over the last 100 msec of the first word and the first 100 msec of the second. The transition rate depends on the relative rates of spectrum change of the two words over the merging region.
To perform this transition task, an interpolation function is used whose parameters depend strongly on the average spectral derivatives of the two words during the merging region. If the spectral derivative symbols are defined as

\overline{SD1} = \sum_{i=n_1}^{n_1+9} SD1_i ,  \overline{SD2} = \sum_{i=n_2}^{n_2+9} SD2_i ,

where

n_1 = starting interval of the merging region for the current word,
n_2 = starting interval of the merging region for the following word,
F_j(l) = value of formant j of the message contour at time l during the merger region, l = 0, 1, ..., 9,

then the interpolation function used is

F_j(l) = \frac{F_j^{(1)}(n_1+l)\,(9-l)\,\overline{SD1} + F_j^{(2)}(n_2+l)\,l\,\overline{SD2}}{(9-l)\,\overline{SD1} + l\,\overline{SD2}}   (3)

where F_j^{(k)} is the value of the j-th formant, at time l, for word k (k = 1 is the current word, k = 2 is the following word).
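A minimal sketch of the overlap step, the role played by subroutine INTPL, implementing the interpolation of equation (3) over an assumed 10 interval (100 msec) merge region. It reuses the Frame structure and spectral_derivative helper from the sketches above; the treatment of the non-formant parameters here is a simplification.

    def merge_region(word1, word2, n=10):
        """Interpolate formants across the last n frames of word1 and the first n frames of word2."""
        tail, head = word1[-n:], word2[:n]
        # average spectral derivatives of the two words over the merging region
        sd1 = sum(spectral_derivative(word1, len(word1) - n + k) for k in range(1, n))
        sd2 = sum(spectral_derivative(word2, k) for k in range(1, n))
        merged = []
        for l in range(n):
            w1, w2 = (n - 1 - l) * sd1, l * sd2   # word-1 weight dies out, word-2 weight grows
            if w1 + w2 == 0.0:                    # both words steady: plain linear crossfade
                w1, w2 = float(n - 1 - l), float(l)
            mix = lambda a, b: (a * w1 + b * w2) / (w1 + w2)
            merged.append(Frame(mix(tail[l].f1, head[l].f1),
                                mix(tail[l].f2, head[l].f2),
                                mix(tail[l].f3, head[l].f3),
                                tail[l].pitch_period,
                                mix(tail[l].a_v, head[l].a_v),  # voiced intensity interpolated too
                                tail[l].a_n, tail[l].f_pole, tail[l].f_zero, True))
        return merged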
FORMANT INTERPOLATION FIG. 2 illustrates the type of interpolation performed for four simple cases in accordance with these considerations. Although all three formants of a sound are interpolated, only one formant is illustrated for each word to simplify the presentation. For the words in column 1, word 1 (the top spectrum) exhibits a very small change over its last 100 msec of voicing, whereas word 2 (middle spectrum) exhibits a large change. The interpolated curve shown at the bottom of the first column, although beginning at the formants of word 1, rapidly makes a transition and follows the formants of word 2. Column 2 shows the reverse situation; word 2 exhibits little spectrum change whereas word 1 has a large spectrum change. The interpolated curve, therefore, follows the formants of word 1 for most of the merging or overlap region and makes the transition to the formants of word 2 at the end of the region. Columns 3 and 4 show examples in which spectrum changes in both words are relatively the same. When they are small, as in column 3, the interpolated curve is essentially linear. When they are large, as in column 4, the interpolated curve tends to follow the formants of the first word for half of the overlap region, and the formants of the second word for the other half.
The interpolated curve thus always begins at the formants of word 1 (the current word) and terminates with the formants of word 2 (the following word). The rate at which the interpolated curve makes a transition from the formants of the first word to those of the second is determined by the average spectral derivatives SD1 and SD2. In the example of column 1, the spectral derivative of the second word is much greater than that of the first, so the transition occurs rapidly at the beginning of the overlap region. For the example of the second column, the spectral derivative of the first word is the greater, so that the transition occurs rapidly at the end of the overlap region. As indicated above, the spectral derivatives for both words in the examples of columns 3 and 4 are much the same, so that no rapid transitions take place in the overlap region.
EXAMPLES OF CONCATENATION FIGS. 4 and 5 illustrate the manner in which these rules and considerations are turned to account in the practice of the invention. FIG. 4 illustrates the manner in which three voiced words, "We", "Were", and "Away", are linked together to form the sentence "We were away." As spoken, the words have durations W1, W2, W3, as indicated, and through analysis have been determined to have formants F1, F2, and F3. These formant data are stored in storage unit 17 (FIG. 1) for the individual words, as discussed above. Upon an input command from word sequence unit 18 to assemble the three words into the sentence "We were away," the formant data is drawn from storage unit 17 and delivered to word concatenating processor 21. Timing data from storage 22 (or alternatively from external unit 23) and pitch variation data from store 24 (or alternatively from external source 25) are supplied to the processor. It is initially determined that the words "We" and "Were" are normally linked together in speech by a smooth transition and uttered as one continuous phrase, "Wewere." Hence, the two voiced words are adjusted in duration to values D1, D2 in accordance with the context of the utterance, and the formants of the words are overlapped and interpolated to provide the smooth transition. Similarly, the words "were" and "away" are normally spoken as "wereaway" with time emphasis on "away." Hence, the duration of "away" is lengthened to D3 and the formants for the two words are overlapped and interpolated.
The resulting smoothly interpolated formant specification is further modified by superimposing the pitch period contour illustrated in the figure. The resultant is a contiguous formant specification of the entire utterance. These formant data as modified, together with the pitch period contour and voiced-unvoiced character data A_V and A_N, are delivered to speech synthesizer 26 (FIG. 1).
FIG. 5 illustrates the concatenation of the words "I", "Saw", "This", and "Man" to form the phrase "I saw this man." In this case the words "I" and "Saw" are not overlapped because of the intervening fricative at the beginning of "Saw." However, the words "Saw" and "This" are generally spoken with a smooth transition. Hence, these words are overlapped and the formants are interpolated. Since the word "This" ends in a fricative, the words "This" and "Man" are not overlapped. In accordance with the context of the expression, the individual word lengths W are each modified to the new values D. Finally, a stored pitch period contour is superimposed according to a stored rule. The resultant specification of the phrase "I saw this man" is thus delivered, together with voiced-unvoiced character data A_V and A_N and fricative pole-zero data F_P and F_Z, to the speech synthesizer.
INTENSITY DATA The unvoiced intensity parameter, A_N, is obtained directly from the stored controls in word catalog 17 when the interval to be synthesized is unvoiced. The voiced intensity parameter, A_V, is similarly obtained directly from word catalog 17, except during a merging region of two voiced intervals, in which case it is obtained by interpolation of the individual voiced intensities of the two words in a fashion similar to that described for the interpolation of formants.
CONCATENATION PROCESSOR IMPLEMENTATION Although the operations described above for processing word formant data to form word sequence information may be carried out using any desired apparatus and techniques, one suitable arrangement used in practice relies upon the high-speed processing ability of a digital computer. In practice, general purpose digital computers, namely the Honeywell DDP-516 and the GE-635, have been found to be satisfactory. The two machines and their software systems are equally adaptable for receiving a program prepared to convert them from general purpose machines to a special purpose processor for use in the practice of the invention.
A flow chart of the programming steps employed to convert such a machine into special purpose processing apparatus which turns to account the features of the invention, is shown in FIGS. 6A, 6B, and 6C, taken together as one complete description. Each step illustrated in the flow chart is itself well known and can be reduced to a suitable program by any one skilled in the programming art. The unique subroutines employed in the word length modification operation and in the overlapping operation are set forth in Fortran IV language in Appendices A and B attached hereto.
Although any general purpose digital computer may be adapted to perform the operations required by the flow chart of FIG. 6, a unit with characteristics similar to that of the DDP-516 is preferred. The DDP-516 includes 16k of core memory, hardware multiply and divide, direct multiplex control with 16 data channels (0.25 MHz each), and a direct memory access channel (1.0 MHz). Input is by way of a teletypewriter. A Fortran IV compiler, DAP-16 machine-language assembler, math libraries, and various utility software are standard items supplied by the manufacturer and delivered with the machine. If desired, a number of peripheral units may be interfaced with the computer for convenience. These may include auxiliary word stores, card readers, display scopes, printers, tape readers, registers, and the like. Such units are well known to those skilled in the art and are generally available on the open market. They may be interconnected with the basic computer as required by the specific application to which the processor of this invention is to be put.
PROCESSOR OPERATIONS In the portion of the flow chart shown at the top of FIG. 6A there is indicated schematically the parametric description storage unit 17 of FIG. 1, which contains a catalog of formant, pitch, amplitude, and fricative specifications for each of the words in the catalog. Upon command from word sequence input 18, these data are transferred to word concatenating processor system 20, which is illustrated by the remainder of the flow chart.
Initially, the duration of each word in the connected sequence is determined, as indicated in block 61, for example, by examining a stored table of timing data 62, of the sort illustrated in FIG. 3 and by unit 22 in FIG. 1. If a timing change is necessary, the program statements of unit 63 determine whether data in store 62 is sufficient or whether external timing data from unit 64 (block 23 of FIG. 1) should be used. In either event, the duration of each commanded word is established and a word sequence counter, in unit 65, is initialized by setting I = 1.
It is then necessary to modify the parametric description of the first word in accordance with timing data and other stored rules. Accordingly, it is determined whether the I-th word was merged with the (I-1)-st word. This determination is represented by block 66. If it was not, information for the I-th word is withdrawn from word catalog 17 and the first 50 msec of the I-th word is synthesized by unit 67. If the I-th word was so merged, the I-th word is lengthened or shortened to make timing agree with durational data supplied as above. This operation takes place in unit 68 in conjunction with subroutine CRDELL, a listing for which appears in Appendix A.
It is then ascertained whether the Ith word is to be merged with the (I+1)th word via the steps of block 69. If there is to be a merger, the operations of block 70 are carried out to overlap the end of the Ith word with the beginning of the (I+1)th word. This operation is carried out in conjunction with subroutine INTPL, a listing for which appears as Appendix B. If it is determined in block 69 that there is to be no merging, the operations of block 71 synthesize the last 50 msec of the Ith word using data for that word supplied from store 17.
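A rough sketch of such an overlap for a single control parameter follows. The names are hypothetical, and the weighting shown, in which each word's contribution fades in or out in proportion to its average spectral derivative over the merging region, is one plausible reading of subroutine INTPL rather than a transcription of it:

! Illustrative sketch only: cross-fade one control parameter (e.g., F1)
! over an overlap of IW intervals between the end of word 1 and the
! start of word 2, weighting the fade by each word's average spectral
! derivative over the merging region.
program merge_words
  implicit none
  integer, parameter :: iw = 5
  real :: f1_end(iw), f2_start(iw), merged(iw)
  real :: sd1, sd2, w1, w2
  integer :: i
  f1_end   = [620., 600., 580., 560., 550.]   ! last intervals of word 1 (Hz)
  f2_start = [300., 320., 360., 400., 430.]   ! first intervals of word 2 (Hz)
  sd1 = 4.0                                   ! average spectral derivative, word 1
  sd2 = 2.0                                   ! average spectral derivative, word 2
  do i = 1, iw
     w1 = sd1 * real(iw + 1 - i)              ! word 1 fades out
     w2 = sd2 * real(i)                       ! word 2 fades in
     merged(i) = (w1 * f1_end(i) + w2 * f2_start(i)) / (w1 + w2)
  end do
  print '(a,5f8.1)', 'merged F1 track:', merged
end program merge_words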
It is then necessary in unit 72 to update the word sequencing index I and, in operation 73, to determine whether the word sequencing index is greater than the index of the last word in the input sequence. If it is not, control is returned to block 66, and the next word is composed in the fashion just described. The operations are thus iterated until the index is equal to the index of the last word in the input sequence, at which time the data from block 73 are transferred to block 74.
Pitch data is then superimposed on the formant and gain structure of each word in the utterance in the fashion described in detail above. These data are available in pitch variation data store 75 (store 24 of FIG. 1). It is next determined by the steps indicated in block 76 whether external pitch data is to be used. If it is, such data from unit 77 (unit 25 in FIG. 1) is supplied by way of data store 75 to the operations of unit 74.
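As a minimal sketch of this step (the contour values, frame count, and names are hypothetical), the stored, time-normalized contour can be stretched to the length of the assembled utterance and sampled at every analysis interval:

! Illustrative sketch only: stretch a stored, time-normalized pitch
! contour to the length of the concatenated utterance and assign a
! pitch value to every analysis interval by linear interpolation.
program apply_pitch
  implicit none
  integer, parameter :: ncont = 5, nframes = 20
  real :: contour(ncont), pitch(nframes)
  real :: t, x
  integer :: i, k
  contour = [120., 135., 128., 110., 95.]    ! Hz, at normalized times 0..1
  do i = 1, nframes
     t = real(i - 1) / real(nframes - 1)     ! normalized time of this interval
     x = t * real(ncont - 1) + 1.0           ! position within the stored contour
     k = min(int(x), ncont - 1)
     pitch(i) = contour(k) + (x - real(k)) * (contour(k+1) - contour(k))
  end do
  print '(a,10f7.1)', 'pitch, first 10 intervals:', pitch(1:10)
end program apply_pitch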
When the pitch contour operation has been completed, all of the data in word concatenating processor 20, as modified by the program of FIG. 6, are transferred, for example, to speech synthesizer 26 of FIG. 1.
FORMANT SYNTHESIS

When all of the control parameter contours of the commanded utterance have been generated, they may, if desired, be smoothed and band-limited to about 16 Hz. They are then used to control a formant synthesizer which produces a continuous speech output. Numerous systems, both analog and digital, have been described for synthesizing speech from formant data. One suitable synthesizer is described in J. L. Flanagan Pat. No. 3,330,910, another in David-Flanagan Pat. No. 3,190,963, FIG. 5, and another is described in Gerstman-Kelly Pat. No. 3,158,685. The above-cited Rabiner-Schafer application illustrates a typical formant synthesizer and relates the exact parameters described hereinabove to the input of the synthesizer described in the Flanagan patent. Very simply, a formant synthesizer includes a system for producing excitation as a train of impulses with a spacing proportional to the fundamental pitch period of the desired signal. The intensity of the pulse excitation is controlled and the signal is applied to a cascade of variable resonators.
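The following miniature sketch illustrates that principle only; it is not the synthesizer of the cited patents, and the sampling rate, frequencies, bandwidths, and names are hypothetical:

! Illustrative sketch only: a pitch-period impulse train excites a
! cascade of two-pole digital resonators tuned to the formant
! frequencies, the classic structure of a cascade formant synthesizer.
program formant_cascade
  implicit none
  real, parameter :: fs = 10000.0, pi = 3.14159265
  integer, parameter :: n = 1000
  real :: formant(3), bw(3), x(n), y(n), b1, b2, a0, pitch
  integer :: i, j, period
  formant = [500., 1500., 2500.]     ! formant frequencies, Hz
  bw      = [60., 90., 120.]         ! formant bandwidths, Hz
  pitch   = 100.0                    ! fundamental frequency, Hz
  period  = nint(fs / pitch)
  x = 0.0
  x(1:n:period) = 1.0                ! impulse-train excitation
  do j = 1, 3                        ! cascade of second-order resonators
     b2 = -exp(-2.0 * pi * bw(j) / fs)
     b1 = 2.0 * exp(-pi * bw(j) / fs) * cos(2.0 * pi * formant(j) / fs)
     a0 = 1.0 - b1 - b2
     y(1) = a0 * x(1)
     y(2) = a0 * x(2) + b1 * y(1)
     do i = 3, n
        y(i) = a0 * x(i) + b1 * y(i-1) + b2 * y(i-2)
     end do
     x = y                           ! output of one stage feeds the next
  end do
  print '(a,f10.4)', 'peak output sample: ', maxval(abs(y))
end program formant_cascade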
Suffice it to say, speech synthesizer 26 generates a waveform which approximates that required for the desired utterance. This signal is utilized in any desired fashion, for example, to energize output unit 27 which may be in the form of a loudspeaker, recording device, or the like.
APPENDIX A
      SUBROUTINE CRDELL(IST,ISP,NEL)
      COMMON JPTCH(300),JADR(8),LSTA,LNO,LTYPE,LCALL,LLENG
      COMMON JAR(720,8),IA(25),IAD(25),JSEQ(25),IWTIM(25),J2,J3,JRD2,JRD4,JRD8,JAV
      COMMON IFCT,JF1(500),JF2(500),JF3(500),JF4(500)
      IC=I1-IST
      IF(IC.NE.0)CALL CRDSYN(IST,IC)
      CALL CRDSYN(I1,1)
      IC=ISP-IST+1-IC
      RETURN
      CONTINUE
C     FIND SPECTRAL DERIVATIVE LEVEL SUCH THAT NEL INTERVALS HAVE THIS
C     LEVEL OR SMALLER
      IDIF=500
      ITHR=0
   37 JC=0
      DO 40 I=IST,ISP
      IF(JAR(I,1).EQ.0)GOTO40
      IF(JAR(I,8).LE.ITHR)JC=JC+1
   40 CONTINUE
      IDIF=IDIF/2
      IF(IDIF.EQ.0)GOTO55
      IF(JC.GT.(NEL+1))GOTO45
      IF(JC.LT.NEL)GOTO50
      GOTO55
   45 ITHR=ITHR-IDIF
      GOTO37
   50 ITHR=ITHR+IDIF
      GOTO37
   55 ICNT=0
C     ELIMINATE THE INTERVALS WITH SPECTRAL DERIVATIVES LESS THAN THIS LEVEL
      DO 60 I=IST,ISP
      IF(JAR(I,1).EQ.0)GOTO56
      IF(JAR(I,8).LE.ITHR)GOTO57
   56 CONTINUE
      CALL CRDSYN(I,1)
      GOTO60
   57 ICNT=ICNT+1
      IF(ICNT.GE.NEL)ITHR=-1
   60 CONTINUE
      RETURN
      END

APPENDIX B
C     SUBROUTINE TO MERGE WORDS AND INTERPOLATE THEIR CONTROL SIGNALS
      SUBROUTINE INTPL(IST,JST,IW,LST,IL1,IL2,ITRANS,INUM)
      COMMON JPTCH(300),JADR(8),LSTA,LNO,LTYPE,LCALL,LLENG
      COMMON JAR(720,8),IA(25),IAD(25),JSEQ(25),IWTIM(25),J2,J3,JRD2,JRD4,JRD8,JAV
      COMMON IFCT,JF1(500),JF2(500),JF3(500),JF4(500)
      COMMON JF5(500),JF6(500),JF7(600)
      DIMENSION JFIN(7)
      DIMENSION JSAT(7)
C     CALCULATE AVERAGE SPECTRAL DERIVATIVES OF BOTH WORDS OVER THE
C     MERGING REGION WHICH CONSISTS OF IW INTERVALS
      CONTINUE
      JS1=0
      JS2=0
      DO 5 I=1,IW
      I1=IST+I-1
      I2=JST+I-1
      JS1=JS1+JAR(I1,8)
    5 JS2=JS2+JAR(I2,8)
      IND=1
      DO 30 I=1,IW
C**** GET STARTING ADDRESSES OF DATA FOR BOTH WORDS
      IL=IST+I-1
      JL=JST+I-1
      KL=LST+I-1
      LL=IND+I-1
      LM=IW+1-LL
      NORM=JS1*LM+JS2*LL
C     MERGE AND INTERPOLATE CONTROL SIGNALS OVER THESE IW INTERVALS
      DO 20 J=1,7
      JL1=JAR(IL,J)*LM
      JL2=JAR(JL,J)*LL
      JAR(KL,J)=JKOL(JL1,JS1,NORM)+JKOL(JL2,JS2,NORM)
   20 CONTINUE
   30 CONTINUE
      CALL CRDSYN(LST,IW)
      RETURN
      END

What is claimed is:

1. A system for composing speech messages from sequences of prerecorded words, which comprises:
means for analyzing each word of a vocabulary of spoken words to produce a separate parametric description of each;
means for storing said parametric descriptions;
means under control of an applied command signal for sequentially withdrawing from storage those descriptions required to assemble a desired spoken message;
means for individually altering the duration of the description of each word of said message in accordance with prescribed timing rules;
means for merging consecutive word descriptions together on the basis of the respective, voiced-unvoiced character of the merged word descriptions;
means for altering the pitch characteristic of said continuous message description in accordance with a prescribed contour; and
means for utilizing said continuous description to control a speech synthesizer.
2. A system for composing speech messages as defined in claim 1, wherein,
said parametric description of each word in said vocabulary comprises:
a representation of the formants, voiced and unvoiced amplitudes, and fricative pole-zero characteristics of said spoken word.
3. A system for composing speech messages as defined in claim 2, wherein,
said representations are in a coded digital format.
4. Apparatus for processing parametric descriptions of selected prerecorded spoken words to form a continuous description of a prescribed message suitable for actuating a speech synthesizer, which comprises:
means for deriving a spectral derivative function for each word description of said message;
means for individually altering the durations of selected word descriptions in accordance with stored timing information;
means operative in response to said spectral derivative functions for developing parametric descriptions of transitions between voiced word regions scheduled to be merged to form said message;
means for concatenating said altered word descriptions with said transition descriptions in accordance with said prescribed message to form a continuous parametric message description; and
means for altering the pitch characteristic of said message description in accordance with prescribed rules.
5. Apparatus for processing parametric descriptions as defined in claim 4, wherein:
said stored timing information comprises a schedule of word durations as a function of position in an input string of words, and of the number of phonemes per word.
6. Apparatus for processing parametric descriptions as defined in claim 4, wherein, said stored timing information comprises:
a schedule of word durations derived from rules based on common language usage.
7. Apparatus for processing parametric descriptions as defined in claim 4, wherein, said stored timing information comprises:
a schedule of word durations assembled from measurements of a naturally spoken version of said prescribed message.
8. Apparatus for processing parametric descriptions of selected prerecorded words, as defined in claim 4, wherein,
said parametric descriptions of transitions are developed for the last 100 msec of the first of two words to be merged and the first 100 msec of the second of said two words to be merged.
9. Apparatus as defined in claim 8, wherein,
the rate of transition between said two words is proportional to the average of said spectral derivatives for said two words.
10. Apparatus for processing parametric descriptions of selected words as defined in claim 4, wherein said means for altering the pitch characteristic of said message description comprises:
a stored, time-normalized pitch contour for a selected number of different messages; and
means for modifying said contour in accordance with said altered word description durations.
11. Apparatus for developing control signals for a speech synthesizer, which comprises:
means supplied with word length segmental and prosodic functions of each individual word of a desired message for deriving the spectral derivatives of each of said functions;
means responsive to said spectral derivatives for interpolating said segmental functions to establish contours which define smooth transitions between the words of said message;
means for concatenating said segmental functions in accordance with said transition contours; and
means for utilizing said prosodic functions to alter said concatenated segmental functions to develop control waveform signals which approximate the waveform of said desired message.
12. Apparatus as defined in claim 11, wherein, said segmental functions include the formant frequencies, unvoiced pole and zero frequencies and amplitudes of each of said words.
13. Apparatus as defined in claim 11, wherein, said prosodic functions include timing and pitch variations for said words as a function of message syntax.
UNITED STATES PATENT OFFICE
CERTIFICATE OF CORRECTION

Patent No. 3,828,132                         Dated August 6, 1974
Inventor(s): James L. Flanagan, Lawrence R. Rabiner, Ronald W. Schafer

It is certified that error appears in the above-identified patent and that said Letters Patent are hereby corrected as shown below:

Col. 1, line 31, "should" di read -should;
line 66, "constant" should read --consonant--.
Col. 2, line 49, "parameters" should read --parameter--.
Col. 6, line 49, "spectram" should read --spectrum--;
line 61, "1 6(1)" should read --F .($Z.)--;
line 67, equation (3) should read -F z F (n +,Q,) 9-2 SDl F 1) -2,- SD2 (9mm +2-s D 2 (3). The bar should be only over SD1 and SD2, and the numeral (3) should be separated from the equation by spaces, as this numeral is not part of the equation but is only intended to identify same.
Col. , line 13, "eontinguous" should read --contiguous--;
Col. 10, line 67, "IF(TS7)" should read --IF(TST)--.
Col. 11, line 1, "IF(N C.EQ.O|)GOTO22" should read --IF(NC.EQ.Ol)GOTO22--;
line 7, "Il =T+J-l" should read --Il=I+J-l--;
line 18, "I1 TLOC+NC/2" should read --Il=ILOC+NC/2--;
line 29, "IF(NEL.E0.0)GOTO30" should read --IF(NEL.EQ.O)GOTO30--;
line 22, "CALLCRDSYN(T1,1)" should read --CALLCRDSYN(Il,l)--;
Following line 23, add: --IF(IC.NE.O)CALLCRDSYN(I1,IC)--;
line 10, "FLIMINATE" should read --ELIMINATE--;
line 55, "SUBROUTINE NTPL" should read --SUBROUTINE INTPL--;
line 62, "DIRENSIONJSAT(7)" should read --DIMENSION JSAT(7)--.

Signed and sealed this 1st day of April 1975.

(SEAL)
Attest:

RUTH C. MASON                          MARSHALL DANN
Attesting Officer                      Commissioner of Patents and Trademarks

Claims (13)

1. A system for composing speech messages from sequences of prerecorded words, which comprises: means for analyzing each word of a vocabulary of spoken words to produce a separate parametric description of each; means for storing said parametric descriptions; means under control of an applied command signal for sequentially withdrawing from storage those descriptions required to assemble a desired spoken message; means for individually altering the duration of the description of each word of said message in accordance with prescribed timing rules; means for merging consecutive word descriptions together on the basis of the respective, voiced-unvoiced character of the merged word descriptions; means for altering the pitch characteristic of said continuous message description in accordance with a prescribed contour; and means for utilizing said continuous description to control a speech synthesizer.
2. A system for composing speech messages as defined in claim 1, wherein, said parametric description of each word in said vocabulary comprises: a representation of the formants, voiced and unvoiced amplitudes, and fricative pole-zero characteristics of said spoken word.
3. A system for composing speech messages as defined in claim 2, wherein, said representations are in a coded digital format.
4. Apparatus for processing parametric descriptions of selected prerecorded spoken words to form a continuous description of a prescribed message suitable for actuating a speech synthesizer, which comprises: means for deriving a spectral derivative function for each word description of said message; means for individually altering the durations of selected word descriptions in accordance with stored timing information; means operative in response to said spectral derivative functions for developing parametric descriptions of transitions between voiced word regions scheduled to be merged to form said message; means for concatenating said altered word descriptions with said transition descriptions in accordance with said prescribed message to form a continuous parametric message description; and means for altering the pitch characteristic of said message description in accordance with prescribed rules.
5. Apparatus for processing parametric descriptions as defined in claim 4, wherein: said stored timing information comprises a schedule of word durations as a function of position in an input string of words, and of the number of phonemes per word.
6. Apparatus for processing parametric descriptions as defined in claim 4, wherein, said stored timing information comprises: a schedule of word durations derived from rules based on common language usage.
7. Apparatus for processing parametric descriptions as defined in claim 4, wherein, said stored timing information comprises: a schedule of word durations assembled from measurements of a naturally spoken version of said prescribed message.
8. Apparatus for processing parametric descriptions of selected prerecorded words, as defined in claim 4, wherein, said parametric descriptions of transitions are developed for the last 100 msec of the first of two words to be merged and the first 100 msec of the second of said two words to be merged.
9. Apparatus as defined in claim 8, wherein, the rate of transition between said two words is proportional to the average of said spectral derivatives for said two words.
10. Apparatus for processing parametric descriptions of selected words as defined in claim 4, wherein said means for altering the pitch characteristic of said message description comprises: a stored, time-normalized pitch contour for a selected number of different messages; and means for modifying said contour in accordance with said altered word description durations.
11. Apparatus for developing control signals for a speech synthesizer, which comprises: means supplied with word length segmental and prosodic functions of each individual word of a desired message for deriving the spectral derivatives of each of said functions; means responsive to said spectral derivatives for interpolating said segmental functions to establish contours which define smooth transitions between the words of said message; means for concatenating said segmental functions in accordance with said transition contours, and, means for utilizing said prosodic functions to alter said concatenated segmental functions to develop control waveform signals which approximate the waveform of said desired message.
12. Apparatus as defined in claim 11, wherein, said segmental functions include the formant frequencies, unvoiced pole and zero frequencies and amplitudes of each of said words.
13. Apparatus as defined in claim 11, wherein, said prosodic functions include timing and pitch variations for said words as a function of message syntax.
US00085660A 1970-10-30 1970-10-30 Speech synthesis by concatenation of formant encoded words Expired - Lifetime US3828132A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US00085660A US3828132A (en) 1970-10-30 1970-10-30 Speech synthesis by concatenation of formant encoded words
CA107,266A CA941968A (en) 1970-10-30 1971-03-09 Speech synthesis by concatenation of formant encoded words
DE2115258A DE2115258C3 (en) 1970-10-30 1971-03-30 Method and arrangement for speech synthesis from representations of individually spoken words
JP1928771A JPS539041B1 (en) 1970-10-30 1971-04-01

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US00085660A US3828132A (en) 1970-10-30 1970-10-30 Speech synthesis by concatenation of formant encoded words

Publications (1)

Publication Number Publication Date
US3828132A true US3828132A (en) 1974-08-06

Family

ID=22193116

Family Applications (1)

Application Number Title Priority Date Filing Date
US00085660A Expired - Lifetime US3828132A (en) 1970-10-30 1970-10-30 Speech synthesis by concatenation of formant encoded words

Country Status (4)

Country Link
US (1) US3828132A (en)
JP (1) JPS539041B1 (en)
CA (1) CA941968A (en)
DE (1) DE2115258C3 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2020077B (en) 1978-04-28 1983-01-12 Texas Instruments Inc Learning aid or game having miniature electronic speech synthesizer chip

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US2860187A (en) * 1955-12-08 1958-11-11 Bell Telephone Labor Inc Artificial reconstruction of speech
US3158685A (en) * 1961-05-04 1964-11-24 Bell Telephone Labor Inc Synthesis of speech from code signals
US3319002A (en) * 1963-05-24 1967-05-09 Clerk Joseph L De Electronic formant speech synthesizer
US3369077A (en) * 1964-06-09 1968-02-13 Ibm Pitch modification of audio waveforms
US3532821A (en) * 1967-11-29 1970-10-06 Hitachi Ltd Speech synthesizer
US3588353A (en) * 1968-02-26 1971-06-28 Rca Corp Speech synthesizer utilizing timewise truncation of adjacent phonemes to provide smooth formant transition


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
J. L. Flanagan et al., Synthetic Voices for Computers, IEEE Spectrum, pp. 22-45, October 1970. *
Rabiner, A Model for Synthesizing Speech by Rule, IEEE Transactions AU-17, 3/69, pp. 7-13. *

Cited By (276)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4060848A (en) * 1970-12-28 1977-11-29 Gilbert Peter Hyatt Electronic calculator system having audio messages for operator interaction
US4144582A (en) * 1970-12-28 1979-03-13 Hyatt Gilbert P Voice signal processing system
US3982070A (en) * 1974-06-05 1976-09-21 Bell Telephone Laboratories, Incorporated Phase vocoder speech synthesis system
US3995116A (en) * 1974-11-18 1976-11-30 Bell Telephone Laboratories, Incorporated Emphasis controlled speech synthesizer
US4075424A (en) * 1975-12-19 1978-02-21 International Computers Limited Speech synthesizing apparatus
US4092495A (en) * 1975-12-19 1978-05-30 International Computers Limited Speech synthesizing apparatus
US4384170A (en) * 1977-01-21 1983-05-17 Forrest S. Mozer Method and apparatus for speech synthesizing
DE2854601A1 (en) * 1977-12-16 1979-06-21 Sanyo Electric Co CLAY SYNTHESIZER AND METHOD FOR CLAY PROCESSING
US4163120A (en) * 1978-04-06 1979-07-31 Bell Telephone Laboratories, Incorporated Voice synthesizer
WO1979000892A1 (en) * 1978-04-06 1979-11-15 Western Electric Co Voice synthesizer
DE3019823A1 (en) * 1979-05-29 1980-12-11 Texas Instruments Inc DATA CONVERTER AND LANGUAGE SYNTHESIS ARRANGEMENT THEREFORE
US4455551A (en) * 1980-01-08 1984-06-19 Lemelson Jerome H Synthetic speech communicating system and method
US4559602A (en) * 1983-01-27 1985-12-17 Bates Jr John K Signal processing and synthesizing method and apparatus
US5146502A (en) * 1990-02-26 1992-09-08 Davis, Van Nortwick & Company Speech pattern correction device for deaf and voice-impaired
US6591240B1 (en) * 1995-09-26 2003-07-08 Nippon Telegraph And Telephone Corporation Speech signal modification and concatenation method by gradually changing speech parameters
US6366884B1 (en) * 1997-12-18 2002-04-02 Apple Computer, Inc. Method and apparatus for improved duration modeling of phonemes
US6785652B2 (en) * 1997-12-18 2004-08-31 Apple Computer, Inc. Method and apparatus for improved duration modeling of phonemes
US6553344B2 (en) 1997-12-18 2003-04-22 Apple Computer, Inc. Method and apparatus for improved duration modeling of phonemes
US6405169B1 (en) * 1998-06-05 2002-06-11 Nec Corporation Speech synthesis apparatus
US20130080176A1 (en) * 1999-04-30 2013-03-28 At&T Intellectual Property Ii, L.P. Methods and Apparatus for Rapid Acoustic Unit Selection From a Large Speech Corpus
US8788268B2 (en) * 1999-04-30 2014-07-22 At&T Intellectual Property Ii, L.P. Speech synthesis from acoustic units with default values of concatenation cost
US9236044B2 (en) 1999-04-30 2016-01-12 At&T Intellectual Property Ii, L.P. Recording concatenation costs of most common acoustic unit sequential pairs to a concatenation cost database for speech synthesis
US9691376B2 (en) 1999-04-30 2017-06-27 Nuance Communications, Inc. Concatenation cost in speech synthesis for acoustic unit sequential pair using hash table and default concatenation cost
WO2000070799A1 (en) * 1999-05-19 2000-11-23 New Horizons Telecasting, Inc. Streaming media automation and distribution system for multi-window television programming
US20050060759A1 (en) * 1999-05-19 2005-03-17 New Horizons Telecasting, Inc. Encapsulated, streaming media automation and distribution system
US8621508B2 (en) 1999-05-19 2013-12-31 Xialan Chi Ltd., Llc Encapsulated, streaming media automation and distribution system
US6708154B2 (en) * 1999-09-03 2004-03-16 Microsoft Corporation Method and apparatus for using formant models in resonance control for speech systems
US20030097266A1 (en) * 1999-09-03 2003-05-22 Alejandro Acero Method and apparatus for using formant models in speech systems
US8645137B2 (en) 2000-03-16 2014-02-04 Apple Inc. Fast, language-independent method for user authentication by voice
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
EP1193615A2 (en) * 2000-09-28 2002-04-03 Global Language Communication Systems e.K. Electronic text translation apparatus
EP1193615A3 (en) * 2000-09-28 2005-07-13 Global Language Communication Systems e.K. Electronic text translation apparatus
US20020123130A1 (en) * 2001-03-01 2002-09-05 Cheung Ling Y. Methods and compositions for degrading polymeric compounds
US20020133349A1 (en) * 2001-03-16 2002-09-19 Barile Steven E. Matching a synthetic disc jockey's voice characteristics to the sound characteristics of audio programs
US6915261B2 (en) 2001-03-16 2005-07-05 Intel Corporation Matching a synthetic disc jockey's voice characteristics to the sound characteristics of audio programs
US8718047B2 (en) 2001-10-22 2014-05-06 Apple Inc. Text to speech conversion of text messages from mobile communication devices
US8229086B2 (en) 2003-04-01 2012-07-24 Silent Communication Ltd Apparatus, system and method for providing silently selectable audible communication
US7409347B1 (en) 2003-10-23 2008-08-05 Apple Inc. Data-driven global boundary optimization
US8015012B2 (en) 2003-10-23 2011-09-06 Apple Inc. Data-driven global boundary optimization
US20090048836A1 (en) * 2003-10-23 2009-02-19 Bellegarda Jerome R Data-driven global boundary optimization
US20100145691A1 (en) * 2003-10-23 2010-06-10 Bellegarda Jerome R Global boundary-centric feature extraction and associated discontinuity metrics
US7930172B2 (en) 2003-10-23 2011-04-19 Apple Inc. Global boundary-centric feature extraction and associated discontinuity metrics
US7643990B1 (en) 2003-10-23 2010-01-05 Apple Inc. Global boundary-centric feature extraction and associated discontinuity metrics
US8677377B2 (en) 2005-09-08 2014-03-18 Apple Inc. Method and apparatus for building an intelligent automated assistant
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US9501741B2 (en) 2005-09-08 2016-11-22 Apple Inc. Method and apparatus for building an intelligent automated assistant
US9389729B2 (en) 2005-09-30 2016-07-12 Apple Inc. Automated response to and sensing of user activity in portable devices
US8614431B2 (en) 2005-09-30 2013-12-24 Apple Inc. Automated response to and sensing of user activity in portable devices
US9958987B2 (en) 2005-09-30 2018-05-01 Apple Inc. Automated response to and sensing of user activity in portable devices
US9619079B2 (en) 2005-09-30 2017-04-11 Apple Inc. Automated response to and sensing of user activity in portable devices
US9117447B2 (en) 2006-09-08 2015-08-25 Apple Inc. Using event alert text as input to an automated assistant
US8942986B2 (en) 2006-09-08 2015-01-27 Apple Inc. Determining user intent based on ontologies of domains
US8930191B2 (en) 2006-09-08 2015-01-06 Apple Inc. Paraphrasing of user requests and results by automated digital assistant
US9706030B2 (en) 2007-02-22 2017-07-11 Mobile Synergy Solutions, Llc System and method for telephone communication
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US20080270137A1 (en) * 2007-04-27 2008-10-30 Dickson Craig B Text to speech interactive voice response system
US7895041B2 (en) * 2007-04-27 2011-02-22 Dickson Craig B Text to speech interactive voice response system
US20090048844A1 (en) * 2007-08-17 2009-02-19 Kabushiki Kaisha Toshiba Speech synthesis method and apparatus
US8175881B2 (en) * 2007-08-17 2012-05-08 Kabushiki Kaisha Toshiba Method and apparatus using fused formant parameters to generate synthesized speech
US9275631B2 (en) * 2007-09-07 2016-03-01 Nuance Communications, Inc. Speech synthesis system, speech synthesis program product, and speech synthesis method
US20130268275A1 (en) * 2007-09-07 2013-10-10 Nuance Communications, Inc. Speech synthesis system, speech synthesis program product, and speech synthesis method
US8370149B2 (en) * 2007-09-07 2013-02-05 Nuance Communications, Inc. Speech synthesis system, speech synthesis program product, and speech synthesis method
US20090070115A1 (en) * 2007-09-07 2009-03-12 International Business Machines Corporation Speech synthesis system, speech synthesis program product, and speech synthesis method
US9053089B2 (en) 2007-10-02 2015-06-09 Apple Inc. Part-of-speech tagging using latent analogy
US8620662B2 (en) 2007-11-20 2013-12-31 Apple Inc. Context-aware unit selection
US10002189B2 (en) 2007-12-20 2018-06-19 Apple Inc. Method and apparatus for searching using an active ontology
US11023513B2 (en) 2007-12-20 2021-06-01 Apple Inc. Method and apparatus for searching using an active ontology
US10381016B2 (en) 2008-01-03 2019-08-13 Apple Inc. Methods and apparatus for altering audio output signals
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US8688446B2 (en) 2008-02-22 2014-04-01 Apple Inc. Providing text input using speech data and non-speech data
US9361886B2 (en) 2008-02-22 2016-06-07 Apple Inc. Providing text input using speech data and non-speech data
US8996376B2 (en) 2008-04-05 2015-03-31 Apple Inc. Intelligent text-to-speech conversion
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9865248B2 (en) 2008-04-05 2018-01-09 Apple Inc. Intelligent text-to-speech conversion
US9946706B2 (en) 2008-06-07 2018-04-17 Apple Inc. Automatic language identification for dynamic text processing
US10108612B2 (en) 2008-07-31 2018-10-23 Apple Inc. Mobile device having human language translation capability with positional feedback
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US8768702B2 (en) 2008-09-05 2014-07-01 Apple Inc. Multi-tiered voice feedback in an electronic device
US9691383B2 (en) 2008-09-05 2017-06-27 Apple Inc. Multi-tiered voice feedback in an electronic device
US8898568B2 (en) 2008-09-09 2014-11-25 Apple Inc. Audio user interface
US8712776B2 (en) 2008-09-29 2014-04-29 Apple Inc. Systems and methods for selective text to speech synthesis
US8583418B2 (en) 2008-09-29 2013-11-12 Apple Inc. Systems and methods of detecting language and natural language strings for text to speech synthesis
US11348582B2 (en) 2008-10-02 2022-05-31 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US9412392B2 (en) 2008-10-02 2016-08-09 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US8713119B2 (en) 2008-10-02 2014-04-29 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US10643611B2 (en) 2008-10-02 2020-05-05 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US8762469B2 (en) 2008-10-02 2014-06-24 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US8862252B2 (en) 2009-01-30 2014-10-14 Apple Inc. Audio user interface for displayless electronic device
US8751238B2 (en) 2009-03-09 2014-06-10 Apple Inc. Systems and methods for determining the language to use for speech generated by a text to speech engine
US9565551B2 (en) 2009-05-11 2017-02-07 Mobile Synergy Solutions, Llc Systems, methods, circuits and associated software for augmenting contact details stored on a communication device with data relating to the contact contained on social networking sites
US8494490B2 (en) 2009-05-11 2013-07-23 Silent Communicatin Ltd. Method, circuit, system and application for providing messaging services
US8792874B2 (en) 2009-05-11 2014-07-29 Silent Communication Ltd. Systems, methods, circuits and associated software for augmenting contact details stored on a communication device with data relating to the contact contained on social networking sites
US20100285778A1 (en) * 2009-05-11 2010-11-11 Max Bluvband Method, circuit, system and application for providing messaging services
US11080012B2 (en) 2009-06-05 2021-08-03 Apple Inc. Interface for a virtual digital assistant
US10540976B2 (en) 2009-06-05 2020-01-21 Apple Inc. Contextual voice commands
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10795541B2 (en) 2009-06-05 2020-10-06 Apple Inc. Intelligent organization of tasks items
US10475446B2 (en) 2009-06-05 2019-11-12 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US9431006B2 (en) 2009-07-02 2016-08-30 Apple Inc. Methods and apparatuses for automatic speech recognition
US8682649B2 (en) 2009-11-12 2014-03-25 Apple Inc. Sentiment prediction from textual data
US8600743B2 (en) 2010-01-06 2013-12-03 Apple Inc. Noise profile determination for voice-related feature
US9311043B2 (en) 2010-01-13 2016-04-12 Apple Inc. Adaptive audio feedback system and method
US8670985B2 (en) 2010-01-13 2014-03-11 Apple Inc. Devices and methods for identifying a prompt corresponding to a voice input in a sequence of prompts
US10706841B2 (en) 2010-01-18 2020-07-07 Apple Inc. Task flow identification based on user intent
US8731942B2 (en) 2010-01-18 2014-05-20 Apple Inc. Maintaining context information between user interactions with a voice assistant
US8903716B2 (en) 2010-01-18 2014-12-02 Apple Inc. Personalized vocabulary for digital assistant
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US8660849B2 (en) 2010-01-18 2014-02-25 Apple Inc. Prioritizing selection criteria by automated assistant
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US9548050B2 (en) 2010-01-18 2017-01-17 Apple Inc. Intelligent automated assistant
US8706503B2 (en) 2010-01-18 2014-04-22 Apple Inc. Intent deduction based on previous user interactions with voice assistant
US8799000B2 (en) 2010-01-18 2014-08-05 Apple Inc. Disambiguation based on active input elicitation by intelligent automated assistant
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US8892446B2 (en) 2010-01-18 2014-11-18 Apple Inc. Service orchestration for intelligent automated assistant
US8670979B2 (en) 2010-01-18 2014-03-11 Apple Inc. Active input elicitation by intelligent automated assistant
US9431028B2 (en) 2010-01-25 2016-08-30 Newvaluexchange Ltd Apparatuses, methods and systems for a digital conversation management platform
US8977584B2 (en) 2010-01-25 2015-03-10 Newvaluexchange Global Ai Llp Apparatuses, methods and systems for a digital conversation management platform
US9424861B2 (en) 2010-01-25 2016-08-23 Newvaluexchange Ltd Apparatuses, methods and systems for a digital conversation management platform
US9424862B2 (en) 2010-01-25 2016-08-23 Newvaluexchange Ltd Apparatuses, methods and systems for a digital conversation management platform
US9190062B2 (en) 2010-02-25 2015-11-17 Apple Inc. User profiling for voice input processing
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US10049675B2 (en) 2010-02-25 2018-08-14 Apple Inc. User profiling for voice input processing
US8682667B2 (en) 2010-02-25 2014-03-25 Apple Inc. User profiling for selecting user specific voice input processing information
US8713021B2 (en) 2010-07-07 2014-04-29 Apple Inc. Unsupervised document clustering using latent semantic density analysis
US8719006B2 (en) 2010-08-27 2014-05-06 Apple Inc. Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis
US9075783B2 (en) 2010-09-27 2015-07-07 Apple Inc. Electronic device with text error correction based on voice recognition data
US8719014B2 (en) 2010-09-27 2014-05-06 Apple Inc. Electronic device with text error correction based on voice recognition data
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US10515147B2 (en) 2010-12-22 2019-12-24 Apple Inc. Using statistical language models for contextual lookup
US8781836B2 (en) 2011-02-22 2014-07-15 Apple Inc. Hearing assistance system for providing consistent human speech
US10102359B2 (en) 2011-03-21 2018-10-16 Apple Inc. Device access using voice authentication
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US20120309363A1 (en) * 2011-06-03 2012-12-06 Apple Inc. Triggering notifications associated with tasks items that represent tasks to perform
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10255566B2 (en) 2011-06-03 2019-04-09 Apple Inc. Generating and processing task items that represent tasks to perform
US10672399B2 (en) 2011-06-03 2020-06-02 Apple Inc. Switching between text data and audio data based on a mapping
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US8812294B2 (en) 2011-06-21 2014-08-19 Apple Inc. Translating phrases from one language into another using an order-based set of declarative rules
US8706472B2 (en) 2011-08-11 2014-04-22 Apple Inc. Method for disambiguating multiple readings in language conversion
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US8762156B2 (en) 2011-09-28 2014-06-24 Apple Inc. Speech recognition repair using contextual information
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9280610B2 (en) 2012-05-14 2016-03-08 Apple Inc. Crowd sourcing information to fulfill user requests
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US10417037B2 (en) 2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US8775442B2 (en) 2012-05-15 2014-07-08 Apple Inc. Semantic search using a single-source semantic model
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US9721563B2 (en) 2012-06-08 2017-08-01 Apple Inc. Name recognition system
US10019994B2 (en) 2012-06-08 2018-07-10 Apple Inc. Systems and methods for recognizing textual identifiers within a plurality of words
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9547647B2 (en) 2012-09-19 2017-01-17 Apple Inc. Voice-based media searching
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US8935167B2 (en) 2012-09-25 2015-01-13 Apple Inc. Exemplar-based latent perceptual modeling for automatic speech recognition
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US10642574B2 (en) 2013-03-14 2020-05-05 Apple Inc. Device, method, and graphical user interface for outputting captions
US9977779B2 (en) 2013-03-14 2018-05-22 Apple Inc. Automatic supplementation of word correction dictionaries
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US9733821B2 (en) 2013-03-14 2017-08-15 Apple Inc. Voice control to diagnose inadvertent activation of accessibility features
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US10572476B2 (en) 2013-03-14 2020-02-25 Apple Inc. Refining a search based on schedule items
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US11151899B2 (en) 2013-03-15 2021-10-19 Apple Inc. User training by intelligent digital assistant
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US10748529B1 (en) 2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant
US10078487B2 (en) 2013-03-15 2018-09-18 Apple Inc. Context-sensitive handling of interruptions
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9966060B2 (en) 2013-06-07 2018-05-08 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10657961B2 (en) 2013-06-08 2020-05-19 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US10791216B2 (en) 2013-08-06 2020-09-29 Apple Inc. Auto-activating smart responses based on activities from remote devices
US10296160B2 (en) 2013-12-06 2019-05-21 Apple Inc. Method for extracting salient dialog usage from live data
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US10497365B2 (en) 2014-05-30 2019-12-03 Apple Inc. Multi-command single utterance input method
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US10083690B2 (en) 2014-05-30 2018-09-25 Apple Inc. Better resolution when referencing to concepts
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US10169329B2 (en) 2014-05-30 2019-01-01 Apple Inc. Exemplar-based natural language processing
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9668024B2 (en) 2014-06-30 2017-05-30 Apple Inc. Intelligent automated assistant for TV user interactions
US10904611B2 (en) 2014-06-30 2021-01-26 Apple Inc. Intelligent automated assistant for TV user interactions
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US10431204B2 (en) 2014-09-11 2019-10-01 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9986419B2 (en) 2014-09-30 2018-05-29 Apple Inc. Social reminders
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US9756185B1 (en) * 2014-11-10 2017-09-05 Teton1, Llc System for automated call analysis using context specific lexicon
US11556230B2 (en) 2014-12-02 2023-01-17 Apple Inc. Data detection
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US10311871B2 (en) 2015-03-08 2019-06-04 Apple Inc. Competing devices responding to voice triggers
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US11069347B2 (en) 2016-06-08 2021-07-20 Apple Inc. Intelligent automated assistant for media exploration
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10915227B1 (en) 2019-08-07 2021-02-09 Bank Of America Corporation System for adjustment of resource allocation based on multi-channel inputs

Also Published As

Publication number | Publication date
JPS539041B1 (en) 1978-04-03
DE2115258B2 (en) 1973-06-07
CA941968A (en) 1974-02-12
DE2115258A1 (en) 1972-05-10
DE2115258C3 (en) 1974-01-24

Similar Documents

Publication | Title
US3828132A (en) Speech synthesis by concatenation of formant encoded words
US4912768A (en) Speech encoding process combining written and spoken message codes
JP3408477B2 (en) Semisyllable-coupled formant-based speech synthesizer with independent crossfading in filter parameters and source domain
EP0831460B1 (en) Speech synthesis method utilizing auxiliary information
US20040073427A1 (en) Speech synthesis apparatus and method
US5400434A (en) Voice source for synthetic speech system
US5682502A (en) Syllable-beat-point synchronized rule-based speech synthesis from coded utterance-speed-independent phoneme combination parameters
JPH0833744B2 (en) Speech synthesizer
Rabiner et al. Computer synthesis of speech by concatenation of formant-coded words
US5659664A (en) Speech synthesis with weighted parameters at phoneme boundaries
US5163110A (en) Pitch control in artificial speech
JP5175422B2 (en) Method for controlling time width in speech synthesis
US7130799B1 (en) Speech synthesis method
GB2284328A (en) Speech synthesis
EP1543500A1 (en) Speech synthesis using concatenation of speech waveforms
JPH0642158B2 (en) Speech synthesizer
JPH11249676A (en) Voice synthesizer
KR970003092B1 (en) Method for constituting speech synthesis unit and sentence speech synthesis method
JP2573586B2 (en) Rule-based speech synthesizer
JPH0863187A (en) Speech synthesizer
JPH08160991A (en) Method for generating speech element piece, and method and device for speech synthesis
Eady et al. Pitch assignment rules for speech synthesis by word concatenation
JPH05257494A (en) Voice rule synthesizing system
JP2002244693A (en) Device and method for voice synthesis
JP3133347B2 (en) Prosody control device