H-Prop and H-Prop-News Propaganda Datasets in Hindi
Title: | H-Prop and H-Prop-News Propaganda Datasets in Hindi |
---|---|
Authors: | Chaudhari, Deptii, Pawar, Ambika, Barrón-Cedeño, Alberto |
Publication Year: | 2022 |
Collection: | Zenodo |
Subject Terms: | propaganda detection, Hindi news, news bias, news article analysis, Hindi text processing |
Description: | The H-Prop dataset contains 28,630 articles created by translating a portion of the Proppy corpus into Hindi. Each article is labeled as either “propagandistic” (positive class) or “non-propagandistic” (negative class). The labels are retained from the Proppy corpus, where they were assigned indirectly through distant supervision. The H-Prop-News dataset contains 5,500 Hindi news articles collected from more than 30 prominent Hindi news websites. Each article is labeled as either “propagandistic” (positive class) or “non-propagandistic” (negative class). The labeling was done by human annotators; the inter-annotator agreement, measured with Cohen’s kappa, is 0.81. ## Data format The H-Prop dataset is provided as three TSV files, one each for the training, testing, and validation partitions. The H-Prop-News dataset is provided as CSV files with the same training, testing, and validation partitions. Each line of an H-Prop file represents one article, with the following fields: 1. article_text: the text of the article, translated from the Proppy corpus. 2. propaganda_label: the article's label, retained from the Proppy corpus. Each line of an H-Prop-News file represents one article, with the following fields: 1. news_website: the name of the source news website. 2. article_url: the direct URL of the article on its source website. 3. news_headline: the news headline. 4. article_text: the text of the article, retrieved with the ParseHub tool. 5. propaganda_label: the article's label. ## About The H-Prop dataset was translated using IBM Watson Language Translator. ## Credit Please cite the dataset as: [HProp-News] Deptii Chaudhari, Ambika Pawar, and Alberto Barrón-Cedeño. 2022. H-Prop and H-Prop-News: Computational Propaganda Datasets in Hindi. doi:10.5281/zenodo.5828240 ## Authors Deptii Chaudhari; Ambika Pawar; Alberto Barrón-Cedeño |
Document Type: | dataset |
Language: | Hindi |
Relation: | https://zenodo.org/record/5828240; https://doi.org/10.5281/zenodo.5828240; oai:zenodo.org:5828240 |
DOI: | 10.5281/zenodo.5828240 |
Availability: | https://doi.org/10.5281/zenodo.5828240 https://doi.org/10.5281/zenodo.5828239 https://zenodo.org/record/5828240 |
Rights: | info:eu-repo/semantics/openAccess; https://creativecommons.org/licenses/by/4.0/legalcode |
Accession Number: | edsbas.1643C73A |
Database: | BASE |
---|
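
The field layout given in the description can be read with Python's standard `csv` module. The following is a minimal sketch, not the authors' tooling: the function name `read_articles`, the `has_header` flag, and the sample rows are hypothetical placeholders, and the record does not say whether the files carry a header row or how the label values are encoded.

```python
import csv
import io

# Field names as listed in the dataset description.
HPROP_COLUMNS = ["article_text", "propaganda_label"]
HPROP_NEWS_COLUMNS = [
    "news_website",
    "article_url",
    "news_headline",
    "article_text",
    "propaganda_label",
]


def read_articles(handle, columns, delimiter, has_header=False):
    """Yield one dict per article, keyed by the given column names.

    has_header is an assumption knob: set it to True if the first
    line of the file turns out to be a header row.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip the header row
    for row in reader:
        if len(row) != len(columns):
            raise ValueError(
                f"expected {len(columns)} fields, got {len(row)}"
            )
        yield dict(zip(columns, row))


# Placeholder rows for illustration only; these are not dataset records.
tsv_sample = "placeholder article text\tPLACEHOLDER_LABEL\n"
csv_sample = (
    "example-site,https://example.org/a1,"
    '"placeholder, headline",placeholder text,PLACEHOLDER_LABEL\n'
)

# H-Prop partitions are tab-separated; H-Prop-News partitions are comma-separated.
hprop_rows = list(read_articles(io.StringIO(tsv_sample), HPROP_COLUMNS, "\t"))
news_rows = list(read_articles(io.StringIO(csv_sample), HPROP_NEWS_COLUMNS, ","))

print(hprop_rows[0]["propaganda_label"])
print(news_rows[0]["news_headline"])
```

To read a real partition, pass a file opened with `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` object; the actual file names are not given in this record.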