Knowledge Quiz
Test your understanding of this article
1. What is the primary purpose of the OS-Harm benchmark?
2. Which of the following best describes 'Misalignment' as a type of harm in OS-Harm?
3. Why does OS-Harm use a dual evaluation scheme, assessing both task completion and harmful behavior?
4. OS-Harm is built on top of which existing benchmark?
