دورية أكاديمية

Weighted Graph Compression for Parameter-free Clustering With PaCCo

التفاصيل البيبلوغرافية
العنوان: Weighted Graph Compression for Parameter-free Clustering With PaCCo
المؤلفون: Nikola S. Mueller, Katrin Haegler, Junming Shao, Claudia Plant, Christian Böhm
المساهمون: The Pennsylvania State University CiteSeerX Archives
المصدر: http://siam.omnibooksonline.com/2011datamining/data/papers/040.pdfTest.
المجموعة: CiteSeerX
مصطلحات موضوعية: Clustering, Weighted Graph, Data Compression, Biological Network ∗These authors contributed equally to this work
الوصف: Object similarities are now more and more characterized by connectivity information available in form of network or graph data. Complex graph data arises in various fields like e-commerce, social networks, high throughput biological analysis etc. The generated interaction information for objects is often not simply binary but rather associated with interaction strength which are in turn represented as edge weights in graphs. The identification of groups of highly connected nodes is an important task and results in valuable knowledge of the data set as a whole. Many popular clustering techniques are designed for vector or unweighted graph data, and can thus not be directly applied for weighted graphs. In this paper, we propose a novel clustering algorithm for weighted graphs, called PaCCo (Parameter-free C lustering by Coding costs), which is based on the Minimum Description Length (MDL) principle in combination with a bisecting k-Means strategy. MDL relates the clustering problem to the problem of data compression: A good cluster structure on graphs enables strong graph compression. The compression efficiency depends on the underlying edges which constitute the graph connectivity. The compression rate serves as similarity or distance metric for nodes. The MDL principle ensures that our algorithm is parameter free (automatically finds the number of clusters) and avoids restrictive assumptions that no information on the data is required. We systematically evaluate our clustering approach PaCCo on synthetic as well as on real data to demonstrate the superiority of our developed algorithm over existing approaches.
نوع الوثيقة: text
وصف الملف: application/pdf
اللغة: English
العلاقة: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.231.3379Test; http://siam.omnibooksonline.com/2011datamining/data/papers/040.pdfTest
الإتاحة: http://siam.omnibooksonline.com/2011datamining/data/papers/040.pdfTest
حقوق: Metadata may be used without restrictions as long as the oai identifier remains attached to it.
رقم الانضمام: edsbas.31BB1977
قاعدة البيانات: BASE