A BWT-based algorithm for random de Bruijn sequence construction

التفاصيل البيبلوغرافية
العنوان: A BWT-based algorithm for random de Bruijn sequence construction
المؤلفون: Zsuzsanna Liptak, Luca Parmigiani
المساهمون: Liptak, Zsuzsanna, Parmigiani, Luca
سنة النشر: 2024
المجموعة: Università degli Studi di Verona: Catalogo dei Prodotti della Ricerca (IRIS)
مصطلحات موضوعية: De Bruijn sequence, Burrows-Wheeler Transform, extended BWT, random generation, spanning tree, standard permutation, Lyndon words
الوصف: A binary de Bruijn sequence (dB sequence) of order k is a circular binary string that contains each k-length word exactly once as a substring. Most existing algorithms construct a specific dB sequence, or members of a specific class of dB sequences, representing only a tiny frac- tion of the complete set. The only algorithms capable of generating all dB sequences are based on finding Euler cycles in de Bruijn graphs. Here, we present an algorithm for constructing random binary dB sequences which uses the extended Burrows-Wheeler Transform. Our method is simple to implement (less than 120 lines of C++ code) and can produce random dB sequences of any order. Even though it does not output dB sequences uniformly at random, it provably outputs each dB sequence with positive probability. The algorithm runs in linear space and near-linear time in the length of the dB sequence and needs less than one second on a lap- top computer for orders up to 23, including outputting the sequence. It can be straightforwardly extended to any constant-size alphabet. To the best of our knowledge, this is the first practical algorithm for generating random dB sequences which is capable of producing all dB sequences. Apart from its immediate usefulness in contexts where it is desirable to use a dB sequence that cannot be guessed easily, we also demonstrate our algorithm’s potential in theoretical studies, giving hitherto unknown estimates of the average discrepancy of binary dB sequences. The code is availabe (in C++ and python) at https://github.com/lucaparmigianiTest/ rnd dbseq.
نوع الوثيقة: book part
وصف الملف: STAMPA
اللغة: English
العلاقة: ispartofbook:Proc. of the 16th Latin American Symposium on Theoretical Informatics (LATIN 2024); volume:14578; firstpage:1; lastpage:15; numberofpages:15; serie:LECTURE NOTES IN COMPUTER SCIENCE; https://hdl.handle.net/11562/1120694Test
الإتاحة: https://hdl.handle.net/11562/1120694Test
حقوق: info:eu-repo/semantics/restrictedAccess
رقم الانضمام: edsbas.5789D1A7
قاعدة البيانات: BASE