Apparatus and method for transforming audio characteristics of an audio recording

Bibliographic Details
Title: Apparatus and method for transforming audio characteristics of an audio recording
Patent Number: 8,825,483
Publication Date: September 02, 2014
Appl. No: 12/375792
Application Filed: October 17, 2007
Abstract: A method of audio processing comprises composing one or more transformation profiles for transforming audio characteristics of an audio recording, and then generating, for each transformation profile, a metadata set comprising transformation profile data and location data indicative of where in the recording the transformation profile data is to be applied; each metadata set is then stored in association with the corresponding recording. A corresponding method of audio reproduction comprises reading a recording and a metadata set associated with that recording from storage, applying transformations to the recording data in accordance with the metadata set's transformation profile, and then outputting the transformed recording.
Inventors: Bardino, Daniele Giuseppe (London, GB); Griffiths, Richard James (London, GB)
Assignees: Sony Computer Entertainment Europe Limited (GB)
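
For orientation, here is a minimal Python sketch of the data model the abstract describes: transformation profile data paired with location data, stored in association with a recording. All names and field choices are illustrative assumptions, not taken from the patent itself.

    # Illustrative sketch only; names and fields are assumptions.
    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class TransformationProfile:
        parameter: str                      # "amplitude", "pitch" or "duration"
        elements: List[Tuple[str, float]]   # e.g. ("ramp_up", 1.2)

    @dataclass
    class MetadataSet:
        profile: TransformationProfile
        start_sample: int                   # location data: where in the
        end_sample: int                     # recording the profile applies

    @dataclass
    class Recording:
        samples: List[float]
        metadata_sets: List[MetadataSet] = field(default_factory=list)
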
Claim: 1. A method of audio processing comprising the steps of: composing one or more transformation profiles for transforming audio characteristics of an audio recording; generating, for each of the one or more transformation profiles, a metadata set comprising respective transformation profile data and location data indicative of where in the recording the transformation profile data is to be applied; storing each metadata set in association with the corresponding recording; selecting which associated metadata set is to be read from storage according to a degree of correspondence between emotion data in the metadata sets and one or more current emotion parameters in an application; applying random variations to the emotion data in the selected metadata sets; prior to composing the one or more transformation profiles, identifying locations of speech syllables in the recording; and adjusting one or more of intensity, pitch and duration transformation profiles extracted from the selected metadata.
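
The selection step of claim 1 picks the stored metadata set whose emotion data best corresponds to the application's current emotion parameters, then applies random variations to it. A hedged sketch, assuming emotion data is a numeric vector and the "degree of correspondence" is a simple distance measure (both assumptions for illustration):

    import random

    def select_metadata_set(metadata_sets, current_emotion, jitter=0.05):
        # Degree of correspondence modelled (by assumption) as negative
        # squared Euclidean distance between emotion vectors.
        def correspondence(ms):
            return -sum((a - b) ** 2
                        for a, b in zip(ms["emotion"], current_emotion))

        selected = max(metadata_sets, key=correspondence)
        # Apply random variations to the emotion data of the selected set,
        # so a repeated line of dialogue does not sound identical each time.
        selected["emotion"] = [v + random.uniform(-jitter, jitter)
                               for v in selected["emotion"]]
        return selected
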
Claim: 2. A method of audio processing according to claim 1 in which each transformation profile comprises at least one sequence of predefined profile elements whose parameters are adjustable by a user.
Claim: 3. A method of audio processing according to claim 2 in which at least one of the predefined profile elements is selected from a list consisting of: i. uniform alteration of amplitude, pitch or duration; ii. ramp-up change in amplitude or pitch; iii. ramp-down change in amplitude or pitch; iv. peaked change in amplitude or pitch; v. point change in amplitude; and vi. non-linear alteration in duration.
Claim: 4. A method of audio processing according to claim 1 in which each transformation profile comprises at least one user-defined profile.
Claim: 5. The method of claim 1, further comprising interpolating pitch profile elements between the speech syllables to avoid exceeding a defined pitch gradient threshold.
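
Claim 5 interpolates pitch between identified syllables while respecting a gradient threshold. A minimal sketch, assuming pitch targets in semitones at syllable-centre frames and a simple slope clamp; the threshold units and value are assumptions:

    def interpolate_pitch(frames, pitches, max_gradient=0.5):
        """frames: syllable-centre frame indices (ascending);
        pitches: pitch targets in semitones;
        max_gradient: maximum allowed slope in semitones per frame."""
        contour = {}
        for i in range(len(frames) - 1):
            f0, f1 = frames[i], frames[i + 1]
            p0, p1 = pitches[i], pitches[i + 1]
            slope = (p1 - p0) / (f1 - f0)
            # Clamp the slope so the contour never exceeds the threshold.
            # This sketch may then land short of the target; a fuller
            # implementation would lengthen the transition instead.
            slope = max(-max_gradient, min(max_gradient, slope))
            for f in range(f0, f1 + 1):
                contour[f] = p0 + slope * (f - f0)
        return contour
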
Claim: 6. A method of audio processing according to claim 1 in which identifying the locations of speech syllables in the recording is performed by a hidden Markov model.
Claim: 7. A method of audio processing according to claim 1 in which identifying the locations of speech syllables in the recording is performed by a comb filter, the comb filter being configured to detect instances of voiced harmonics.
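
For the comb filter of claim 7, a feedforward comb y[n] = x[n] + x[n - L] reinforces a signal that is periodic at lag L, so a strong output energy over candidate lags indicates voiced harmonics (and hence a likely syllable nucleus). A rough sketch; the frame length, pitch search range and step are assumptions:

    import numpy as np

    def voiced_strength(frame, sample_rate, f0_min=80, f0_max=400):
        """Returns roughly 2.0 for strongly periodic (voiced) frames and
        roughly 1.0 for noise-like (unvoiced) frames."""
        energy = np.sum(frame ** 2) + 1e-12
        best = 0.0
        for f0 in range(f0_min, f0_max + 1, 5):
            lag = int(sample_rate / f0)
            if lag >= len(frame):
                continue
            combed = frame[lag:] + frame[:-lag]  # feedforward comb at this lag
            best = max(best, np.sum(combed ** 2) / (2 * energy))
        return best
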
Claim: 8. A method of audio processing according to claim 1, further comprising selecting a predefined profile element for use in a given one of the one or more transformation profiles to be applied to a segment of the recording corresponding to a given one of the identified syllables.
Claim: 9. A method of audio processing according to claim 1, further comprising arranging recorded dialogue into lines.
Claim: 10. A method of audio processing according to claim 1, further comprising constraining each transformation profile to substantially maintain a relative formant structure of speech within the recording upon transformation.
Claim: 11. A method of audio processing according to claim 1 in which the metadata set further comprises at least a first tag indicative of an emotion conveyed by the recording when modified according to the transformation profile of the metadata set.
Claim: 12. A method of audio processing according to claim 11 where the first tag indicates one or more selected from a list consisting of: i. an emotion state within a preset list of emotion states; and ii. a value on a scale indicating the positive or negative extent of an emotion state.
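
Claims 11 and 12 allow the emotion tag to be a categorical state, a signed scale value, or both. One plausible encoding, with the state names and value range assumed for illustration:

    from dataclasses import dataclass
    from enum import Enum
    from typing import Optional

    class EmotionState(Enum):               # i. a preset list of emotion states
        HAPPY = "happy"
        SAD = "sad"
        ANGRY = "angry"
        FEARFUL = "fearful"

    @dataclass
    class EmotionTag:
        state: Optional[EmotionState] = None
        valence: Optional[float] = None     # ii. scale value, e.g. -1.0 .. +1.0
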
Claim: 13. A method of audio processing according to claim 1, further comprising: reading, from storage, the recording and the metadata set; applying transformations to the recording in accordance with the one or more transformation profiles to obtain a transformed recording; and outputting the transformed recording.
Claim: 14. Audio processing apparatus, comprising: composition means for composing one or more transformation profiles for transforming audio characteristics of an audio recording; metadata set generation means for generating, for each of the one or more transformation profiles, a metadata set comprising respective transformation profile data and location data indicative of where in the recording the transformation profile data is to be applied; storage writing means for storing each metadata set in association with the corresponding recording, and means for selecting which associated metadata set is to be read from storage according to a degree of correspondence between emotion data in the metadata sets and one or more current emotion parameters in an application and for applying random variations to the emotion data in the selected metadata sets; the audio processing apparatus being configured to, prior to composing the one or more transformation profiles, identify locations of speech syllables in the recording and adjust one or more of intensity, pitch and duration transformation profiles extracted from the selected metadata.
Claim: 15. A method of audio reproduction, comprising the steps of: reading from storage a recording and a metadata set associated with said recording, in which the metadata set comprises a transformation profile, the metadata set being read from storage according to a degree of correspondence between emotion data in the metadata set and one or more current emotion parameters in an application; applying random variations to the emotion data in the metadata set; identifying locations of speech syllables in the recording; applying transformations to the recording in accordance with said transformation profile, including adjusting one or more of intensity, pitch and duration transformation profiles extracted from the read metadata; and outputting the transformed recording.
Claim: 16. A method of audio reproduction according to claim 15 in which transformations are applied to one or more characteristics of the recording selected from a list consisting of: i. amplitude; ii. pitch; and iii. duration.
Claim: 17. A method of audio reproduction according to claim 15 in which the transformation profile comprises one or more predefined profile elements, wherein at least some of the predefined profile elements are selected from a list consisting of: i. uniform alteration of amplitude, pitch or duration; ii. ramp-up change in amplitude or pitch; iii. ramp-down change in amplitude or pitch; iv. peaked change in amplitude or pitch; v. point change in amplitude; and vi. non-linear alteration in duration.
Claim: 18. A method of audio reproduction according to claim 15 in which a transformation profile comprises at least one user-defined profile.
Claim: 19. A method of audio reproduction according to claim 15, further comprising selecting one metadata set based upon a respective emotion tag of the metadata set from among a plurality of metadata sets associated with a recording.
Claim: 20. A method of audio reproduction according to claim 19 in which the emotion tag indicates a specific emotion conveyed by a recording when modified according to the transformation profile of the corresponding metadata set.
Claim: 21. A method of audio reproduction according to claim 19 in which the emotion tag is a value on an emotional scale indicative of degree of positive or negative emotion conveyed in a recording when modified according to the transformation profile of the corresponding metadata set.
Claim: 22. A method of audio reproduction according to claim 15, further comprising modifying lip synchronisation of a video game character according to transformation profile data relating to changes in duration when dialogue being delivered by the video game character is also modified according to said transformation profile data.
Claim: 23. A method of audio reproduction according to claim 15, further comprising modifying facial animation of a video game character according to transformation profile data relating to changes in any or all of amplitude and pitch when dialogue being delivered by the video game character is also modified according to said transformation profile data.
Claim: 24. A method of audio reproduction according to claim 15, further comprising modifying an expression of a video game character according to an emotion tag of a selected metadata set when dialogue being delivered by the video game character is also modified according to transformation profile data associated with the selected metadata set.
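
Claims 22 to 24 tie character animation to the same transformation data as the audio. For the duration case in claim 22, the viseme (mouth-shape) keyframes can be passed through the same time warp as the dialogue. A sketch, with the warp representation assumed:

    def remap_viseme_times(viseme_times, warp):
        """viseme_times: original keyframe times in seconds;
        warp: function mapping original time to transformed time,
        derived from the duration transformation profile (assumed form)."""
        return [warp(t) for t in viseme_times]

    # Example: a uniform 20% slow-down of the whole line keeps the lips
    # synchronised with the stretched audio.
    stretched = remap_viseme_times([0.00, 0.12, 0.31, 0.48], lambda t: t * 1.2)
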
Claim: 25. A method of audio reproduction according to claim 15, further comprising altering one or more values of a transformation profile prior to applying transformations to the recording, according to a value of one or more parameters of a video-game outputting the recording.
Claim: 26. A method of audio reproduction according to claim 15, further comprising randomly altering one or more values of the transformation profile prior to applying transformations to the recording.
Claim: 27. A method of audio reproduction according to claim 26 in which any or all of: i. a degree of random change; and ii. a number of random changes, is dependent upon a duration of game-play from a last re-load of a video-game that is outputting the recording.
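
Claim 27 scales the amount of randomness by the duration of game-play since the last re-load. A sketch with an assumed saturating curve; the constants are illustrative:

    import random

    def jitter_amount(seconds_since_reload, base=0.02, cap=0.15):
        # Grows with uninterrupted play time and saturates at `cap`, so
        # dialogue replayed soon after a re-load stays close to the
        # authored performance. Curve shape and constants are assumptions.
        return min(cap, base * (1.0 + seconds_since_reload / 600.0))

    def perturb(value, seconds_since_reload):
        return value + random.uniform(-1.0, 1.0) * jitter_amount(seconds_since_reload)
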
Claim: 28. A method of audio reproduction according to claim 15, further comprising randomly composing a given transformation profile from one or more available predefined profile elements.
Claim: 29. A method of audio reproduction according to claim 15, further comprising constraining any changes to the transformation profile to substantially maintain a relative formant structure of speech within the recording upon transformation.
Claim: 30. Audio reproduction apparatus, comprising: storage reading means for reading, from storage, an audio recording and a metadata set, the metadata set configured to be read from storage according to a degree of correspondence between emotion data in the metadata set and one or more current emotion parameters in an application; means for applying random variations to the emotion data in the metadata set; transformation processing means for applying transformations to the recording in accordance with the transformation profile, including adjusting one or more of intensity, pitch and duration transformation profiles extracted from the read metadata; and audio output means for outputting the transformed recording.
Claim: 31. A non-transitory computer-readable recording medium storing computer-readable instructions thereon which, when executed by a computer, cause the computer to carry out the method of audio processing according to claim 1.
Claim: 32. A non-transitory computer-readable recording medium storing computer-readable instructions thereon which, when executed by a computer, cause the computer to carry out the method of audio reproduction according to claim 15.
Claim: 33. Audio processing apparatus comprising: a profile composer to compose one or more transformation profiles for transforming audio characteristics of an audio recording; a generator to generate, for each transformation profile, a metadata set comprising respective transformation profile data and location data indicative of where in the recording the transformation profile data is to be applied, the metadata set including emotion data; a metadata store to store each metadata set in association with the corresponding recording; means for selecting which metadata set is to be read from the metadata store according to a degree of correspondence between emotion data in the metadata sets and one or more current emotion parameters in an application; and means for applying random variations to the emotion data in the selected metadata sets; wherein the apparatus is configured, prior to composing the one or more transformation profiles by the profile composer, to identify locations of speech syllables in the recording.
Claim: 34. Audio reproduction apparatus, comprising: a storage reader to read from storage a recording and a metadata set associated with said recording, in which the metadata set comprises a transformation profile, the metadata set being read from storage according to a degree of correspondence between emotion data in the metadata set and one or more current emotion parameters in an application; means for applying random variations to the emotion data in the metadata set; a transformer to apply transformations to the recording in accordance with said transformation profile; and an output to output the transformed recording.
Current U.S. Class: 704/258
Patent References Cited: 5113449 May 1992 Blanton et al.
5630017 May 1997 Gasper et al.
5687240 November 1997 Yoshida et al.
5749073 May 1998 Slaney
6446040 September 2002 Socher et al.
6539354 March 2003 Sutton et al.
7065490 June 2006 Asano et al.
7379872 May 2008 Cabezas et al.
7478047 January 2009 Loyall et al.
7865365 January 2011 Anglin et al.
7983910 July 2011 Subramanian et al.
2002/0173962 November 2002 Tang et al.
2003/0110026 June 2003 Minoru
2003/0158721 August 2003 Kato et al.
2003/0158728 August 2003 Bi et al.
2004/0143438 July 2004 Cabezas et al.
2004/0166918 August 2004 Walker et al.
2005/0065784 March 2005 McAulay et al.
2005/0223078 October 2005 Sato et al.
2006/0095265 May 2006 Chu et al.
2006/0136213 June 2006 Hirose et al.
2006/0143000 June 2006 Setoguchi
2006/0259303 November 2006 Bakis
2007/0233472 October 2007 Sinder et al.
2007/0233489 October 2007 Hirose et al.
2008/0147413 June 2008 Sobol-Shikler
2008/0235024 September 2008 Goldberg et al.
Foreign Patent References Cited: GB 2370954 July 2002
JP 2001009157 January 2001
JP 2001034282 February 2001
JP 2001333378 November 2001
JP 2002113262 April 2002
JP 2002351489 December 2002
JP 2003036100 February 2003
JP 2004313767 November 2004
JP 2005065191 March 2005
JP 2005210196 August 2005
JP 2006079712 March 2006

Other References: Preliminary Report on Patentability dated Jun. 10, 2009, for corresponding International Application PCT/GB2007/003956. cited by applicant
European Communication pursuant to Article 94(3) EPC dated Dec. 9, 2009, from the corresponding European Application. cited by applicant
http://manuals.info.apple.com/en/logic/LogicTest_Pro_7_Reference_Manual.pdf, section 7, 2004. cited by applicant
Notification of Transmittal of the International Preliminary Report on Patentability dated Oct. 17, 2008, for corresponding International Application PCT/GB2007/003956. cited by applicant
Notification of Transmittal of the International Search Report, The International Search Report and the Written Opinion of the International Searching Authority dated Feb. 21, 2008, for corresponding International Application PCT/GB2007/003956. cited by applicant
Combined Search and Examination Report under Sections 17 and 18(3) dated Sep. 25, 2007, for corresponding British Application GB0620829.2. cited by applicant
Japanese Office Action for Application No. 2009-532891 dated Mar. 12, 2013. cited by applicant
Japanese Office Action for Application No. JP2013-145999 dated Nov. 12, 2013. cited by applicant
Lawrence R. Rabiner, ‘A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition’, Proceedings of the IEEE, 77(2), pp. 257-286, Feb. 1989; in particular, pp. 275-276. cited by applicant
Japanese Office Action for Application No. 2013-145999 dated Jun. 3, 2014. cited by applicant
Primary Examiner: He, Jialong
Attorney, Agent or Firm: Lerner, David, Littenberg, Krumholz & Mentlik, LLP
Accession Number: edspgr.08825483
Database: USPTO Patent Grants