Datasets

Open datasets released by the SCIRE Group and collaborators, archived on Zenodo with permanent DOIs.

DNIPRO

Diverse Narratives and International Perspectives on the Russo-Ukrainian Offensive

A longitudinal, cross-lingual corpus for studying how the same armed conflict is narrated differently across national media ecosystems — enabling research in computational journalism, propaganda detection, and cross-lingual narrative analysis.

246,229 articles · 3 languages · 5 nations · 11 media outlets · 31 months

Mohanty, Sabadyn, Rodrigues, Wang, Kalugade, and Banerjee · Zenodo · February 2026

cite this dataset
@dataset{mohanty2026dnipro,
  author    = {Mohanty, Dikshya and Sabadyn, Taisiia and Rodrigues, Jelwin
               and Wang, Chenlu and Kalugade, Abhishek and Banerjee, Ritwik},
  title     = {Diverse Narratives and International Perspectives on the Russo-Ukrainian Offensive (DNIPRO)},
  month     = feb,
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18470677},
  url       = {https://doi.org/10.5281/zenodo.18470677}
}

HDCR

Cross-lingual Medical Misinformation Detection Dataset

A cross-lingual benchmark pairing Chinese health claims from news sources with English biomedical evidence across four fine-grained distortion types — supporting research in health claim verification and medical NLP.

72,275 claim-evidence pairs · 2 languages · 4 distortion types

Chaoyuan Zuo (SCIRE alum) · Zenodo · 2025 · Paper co-authored with Ritwik Banerjee

cite this dataset
@dataset{zuo2025hdcr_data,
  author    = {Zuo, Chaoyuan},
  title     = {HDCR: Cross-lingual Medical Misinformation Detection Dataset},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17486207},
  url       = {https://doi.org/10.5281/zenodo.17486207}
}
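
Both DOIs above follow the pattern 10.5281/zenodo.<record id>. As a minimal sketch for anyone scripting access to these records, the Python below derives the record page and REST API URLs from a DOI. The URL layouts (`https://zenodo.org/records/<id>` and `https://zenodo.org/api/records/<id>`) are assumptions based on Zenodo's current public site, not something this page specifies, and the helper names `zenodo_record_id` and `zenodo_urls` are hypothetical. The sketch builds URLs only and makes no network requests.

```python
import re

# Zenodo DOIs look like "10.5281/zenodo.<numeric record id>".
ZENODO_DOI = re.compile(r"^10\.5281/zenodo\.(\d+)$")

# DOIs copied from the citation entries on this page.
DATASET_DOIS = {
    "DNIPRO": "10.5281/zenodo.18470677",
    "HDCR": "10.5281/zenodo.17486207",
}


def zenodo_record_id(doi: str) -> str:
    """Return the numeric record ID embedded in a Zenodo DOI."""
    match = ZENODO_DOI.match(doi.strip())
    if match is None:
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    return match.group(1)


def zenodo_urls(doi: str) -> dict:
    """Build the resolver, record page, and API URLs for a Zenodo DOI.

    The record and API URL patterns are assumed, not taken from this page.
    """
    doi = doi.strip()
    record_id = zenodo_record_id(doi)
    return {
        "doi": f"https://doi.org/{doi}",
        "record": f"https://zenodo.org/records/{record_id}",
        "api": f"https://zenodo.org/api/records/{record_id}",
    }


if __name__ == "__main__":
    for name, doi in DATASET_DOIS.items():
        print(name, zenodo_urls(doi)["record"])
```

The `doi` URL is the stable one to cite and link to, since the DOI resolver redirects to wherever the record currently lives. The other two are conveniences that may change if Zenodo changes its URL scheme.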