Academic thesis

Relative music loudness estimation in TV broadcast audio using deep learning: an industrial perspective

Bibliographic details
Title: Relative music loudness estimation in TV broadcast audio using deep learning: an industrial perspective
Authors: Meléndez Catalán, Blai
Contributors: Molina, Emilio; Gómez Gutiérrez, Emilia; Universitat Pompeu Fabra. Departament de Tecnologies de la Informació i les Comunicacions
Source: TDX (Tesis Doctorals en Xarxa)
Publication information: Universitat Pompeu Fabra
Publication year: 2021
Collection: Tesis Doctorals de la Universitat d'Andorra (TDX)
Subject terms: Music detection, Relative music loudness estimation, Deep learning, Copyright management industry, Public dataset, Annotation tool, Convolutional neural networks, Temporal convolutional networks, TV broadcast audio, Audio processing, Detecció de música, Estimació del volum relatiu de la música, Aprenentatge profund, Indústria del dret d’autor, Conjunt de dades públic, Eina d’anotació, Xarxes neuronals convolucionals, Xarxes temporals convolucionals, Àudio emès per TV, Processament d’àudio
Time: 62
Description: Under the current copyright management business model, broadcasters are taxed by the corresponding copyright management organization according to the percentage of music they broadcast, and the collected money is then distributed among the copyright holders of that music. In the specific case of TV broadcasts, whether a musical piece is played in the foreground or the background is often a relevant factor that affects the amount of money collected and distributed. In recent years, the music industry has increasingly adopted technological solutions to automate this process. We have conducted this industrial PhD at BMAT, a company that has an active role in providing these solutions: since 2015, this company has been offering a service that currently monitors about 4300 radio stations and TV channels to automatically detect the presence of music, and to classify it as foreground or background music. We name this task relative music loudness estimation. From an industrial point of view, this thesis focuses on improving the technology behind the service; from an academic point of view, it aims to introduce and promote the task in the research field of music information retrieval, and provides computational approaches to it. The industrial and academic contributions of this thesis result from logical steps towards these goals. We first create BAT: a new open-source, web-based tool for the efficient annotation of audio events and their partial loudness in the presence of other simultaneous events. We use BAT to annotate two datasets: one private and the other public. We use the private dataset to train BMAT's new relative music loudness estimation algorithm, called the Deep Music Detector. The Deep Music Detector represents the first application of deep learning within BMAT, and provides a significant boost in performance with respect to its predecessor.
The public dataset, called OpenBMAT, is released in order to foster transparent, comparable and reproducible ...
Document type: doctoral or postdoctoral thesis
File description: 166 p.; application/pdf
Language: English
Relation: http://hdl.handle.net/10803/671425
Availability: http://hdl.handle.net/10803/671425
Rights: Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by-nc-sa/4.0/ ; info:eu-repo/semantics/openAccess
Accession number: edsbas.B1AD61A0
Database: BASE