Datasets
Open datasets released by the SCIRE Group and collaborators, archived on Zenodo with permanent DOIs.
DNIPRO
Diverse Narratives and International Perspectives on the Russo-Ukrainian Offensive
A longitudinal, cross-lingual corpus for studying how the same armed conflict is narrated differently across national media ecosystems — enabling research in computational journalism, propaganda detection, and cross-lingual narrative analysis.
246,229 articles · 3 languages · 5 nations · 11 media outlets · 31 months
Mohanty, Sabadyn, Rodrigues, Wang, Kalugade, and Banerjee · Zenodo · February 2026
cite this dataset
@dataset{mohanty2026dnipro,
author = {Mohanty, Dikshya and Sabadyn, Taisiia and Rodrigues, Jelwin
and Wang, Chenlu and Kalugade, Abhishek and Banerjee, Ritwik},
title = {Diverse Narratives and International Perspectives on the Russo-Ukrainian Offensive (DNIPRO)},
month = feb,
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.18470677},
url = {https://doi.org/10.5281/zenodo.18470677}
}
HDCR
Cross-lingual Medical Misinformation Detection Dataset
A cross-lingual benchmark pairing Chinese health claims from news sources with English biomedical evidence across four fine-grained distortion types — supporting research in health claim verification and medical NLP.
72,275 claim-evidence pairs · 2 languages · 4 distortion types
Chaoyuan Zuo (SCIRE alum) · Zenodo · 2025 · Paper co-authored with Ritwik Banerjee
cite this dataset
@dataset{zuo2025hdcr_data,
author = {Zuo, Chaoyuan},
title = {HDCR: Cross-lingual Medical Misinformation Detection Dataset},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.17486207},
url = {https://doi.org/10.5281/zenodo.17486207}
}