
New multimodal dataset will help in the development of ethical AI systems

Vector Institute · by Ian Gormely · October 23, 2024

By Shaina Raza and Deval Pandya

The Vector Institute’s AI Engineering team has developed Newsmediabias-plus (NMB+), a new multimodal dataset. It includes full-text articles alongside comprehensive publication details. It also features extensive bias categorization, addressing critical issues such as gender and racial biases, and specific topics including ideological leanings and framing, gender discrimination, and environmental concerns.

NMB+ is designed for academic researchers, NGOs, and socially focused groups. This aligns with Vector’s goal of addressing both near- and long-term risks by providing practical tools for safe AI systems. Potential uses include:

  • Ensuring AI adheres to Vector’s AI trust and safety principles

  • Analyzing media trends and reporting styles across different outlets

  • Training AI to fairly detect and address disinformation in texts and images

Developed by Shaina Raza, Vector Institute Applied Machine Learning Scientist, Responsible AI, the dataset builds on the previously released UnBIAS work by incorporating images alongside text.

Dataset features

The dataset includes around 90,000 news articles, curated from a broad spectrum of reliable sources, including major news outlets from around the globe, from May 2023 to September 2024. These articles were gathered through open data sources using Google RSS, adhering to research ethics guidelines [1, 2].

Various machine learning models were built to evaluate the dataset’s effectiveness in detecting biases and fake content, demonstrating its versatility and utility. This benchmarking process shows how the dataset performs across different modalities, including text and images, highlighting its potential for training advanced AI models designed to combat disinformation.

Each entry in the dataset features full article text, publication details (date, outlet, URL), bias assessments for both text and images, as well as topic categorizations and image descriptions and analyses.

A commitment to ethical AI governance requires designing transparent AI systems that can be understood and audited, holding developers accountable for the content their AI tools generate, and establishing clear ethical standards for the development and deployment of AI technologies. Developers and researchers should focus on building robust and transparent algorithms, integrating ethical considerations and personal information protection in data, and collaborating with experts across disciplines to enhance disinformation detection techniques. This commitment also requires continuously adapting AI tools to counter evolving disinformation tactics.
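To make the per-entry structure described above concrete, here is a minimal Python sketch of how one record might be represented. Every field name, the `NewsEntry` class, the `flagged_in_any_modality` helper, the `"biased"` / `"unbiased"` label values, and the sample record are assumptions chosen for illustration; they mirror the categories listed in the article and are not the released schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class NewsEntry:
    """Hypothetical record layout; field names are illustrative, not the real schema."""
    article_text: str                 # full article text
    published: date                   # publication date
    outlet: str                       # publishing outlet
    url: str                          # article URL
    text_bias: str                    # assumed label for the text, e.g. "biased" / "unbiased"
    image_bias: str                   # assumed label for the accompanying image
    topics: list[str] = field(default_factory=list)  # topic categorizations
    image_description: str = ""       # description or analysis of the image


def flagged_in_any_modality(entry: NewsEntry, biased_label: str = "biased") -> bool:
    """Return True if either the text or the image carries the (assumed) biased label."""
    return biased_label in (entry.text_bias, entry.image_bias)


# Made-up sample record, used only to exercise the sketch.
example = NewsEntry(
    article_text="Placeholder article body.",
    published=date(2024, 1, 15),
    outlet="Example Outlet",
    url="https://example.com/article",
    text_bias="unbiased",
    image_bias="biased",
    topics=["environment"],
    image_description="Placeholder image description.",
)

print(flagged_in_any_modality(example))
```

Keeping the text and image assessments as separate fields reflects the article's point that bias is assessed for both modalities; a record can be flagged on one and not the other.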

NMB+’s development and use are governed by strict ethical standards to align regulatory requirements with technical work. Comprehensive human reviews have been implemented to ensure the accuracy and reliability of the data and its labels. The dataset also underwent extensive audits to validate the data collection and labeling methodologies. In these audits, independent reviewers assess the dataset for accuracy and adherence to ethical standards, examining the data sources, collection procedures, and labeling criteria to ensure that all elements meet established research integrity and reliability guidelines. This thorough review helps to confirm that the dataset is both robust and trustworthy for use in training and evaluating AI systems.

Researchers, technologists, and the general public are invited to explore the NMB+ dataset and delve into the findings. The dataset is accessible on Vector’s Hugging Face page under a non-commercial license. Details can be found on the News Media Bias Plus page.

References

[1] Does my data collection activity require ethics review? | Research | University of Waterloo

[2] What Can Open Data be Used For?
