Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Report
arXiv:2511.16417v3 (Announce Type: replace)
Authors: Yan Chen, Yu Zou, Jialei Zeng, Haoran You, Xiaorui Zhou, Aixi Zhong
Abstract: Environmental, Social, and Governance (ESG) principles are reshaping the foundations of global financial governance, transforming capital allocation architectures, regulatory frameworks, and systemic risk coordination mechanisms. However, ESG reports, the core medium for assessing corporate ESG performance, are difficult to understand at scale: their slide-like, irregular layouts produce a chaotic reading order, and their lengthy, weakly structured content leaves the document hierarchy implicit. To address these challenges, we propose Pharos-ESG, a unified framework that transforms ESG reports into structured representations through multimodal parsing, contextual narration, and hierarchical labeling. It integrates a reading-order modeling module based on layout flow, hierarchy-aware segmentation guided by table-of-contents anchors, and a multimodal aggregation pipeline that contextually transforms visual elements into coherent natural language. The framework further enriches its outputs with ESG, GRI, and sentiment labels, yielding annotations aligned with the analytical demands of financial research. Extensive experiments on annotated benchmarks demonstrate that Pharos-ESG consistently outperforms both dedicated document parsing systems and general-purpose multimodal models. In addition, we release Aurora-ESG, the first large-scale public dataset of ESG reports, spanning the Mainland China, Hong Kong, and U.S. markets. It features unified structured representations of multimodal content, enriched with fine-grained layout and semantic annotations, to better support ESG integration in financial governance and decision-making.
Subjects: Artificial Intelligence (cs.AI)
ACM classes: I.2.7
Cite as: arXiv:2511.16417 [cs.AI] (or arXiv:2511.16417v3 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2511.16417 (arXiv-issued DOI via DataCite)
Submission history
From: Yu Zou
[v1] Thu, 20 Nov 2025 14:41:44 UTC (5,894 KB)
[v2] Wed, 25 Mar 2026 11:17:23 UTC (5,894 KB)
[v3] Sun, 29 Mar 2026 07:20:09 UTC (4,573 KB)