Structural Feature Engineering for Generative Engine Optimization: How Content Structure Shapes Citation Behavior
Abstract: The proliferation of AI-powered search engines has shifted information discovery from traditional link-based retrieval to direct answer generation with selective source citation, creating new challenges for content visibility. While existing Generative Engine Optimization (GEO) approaches focus primarily on semantic content modification, the role of structural features in influencing citation behavior remains underexplored. In this paper, we propose GEO-SFE, a systematic framework for structural feature engineering in generative engine optimization. Our approach decomposes content structure into three hierarchical levels: macro-structure (document architecture), meso-structure (information chunking), and micro-structure (visual emphasis), and models their impact on citation probability across different generative engine architectures. We develop architecture-aware optimization strategies and predictive models that preserve semantic integrity while improving structural effectiveness. Experimental evaluation across six mainstream generative engines demonstrates consistent improvements in citation rate (17.3 percent) and subjective quality (18.5 percent), validating the effectiveness and generalizability of the proposed framework. This work establishes structural optimization as a foundational component of GEO, providing a data-driven methodology for enhancing content visibility in LLM-powered information ecosystems.
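The abstract's three-level decomposition (macro-, meso-, and micro-structure) feeding a citation-probability model can be illustrated with a toy sketch. This is not the paper's method: the feature set, the weights, and the function names (`structural_features`, `citation_score`) are all invented here purely to make the decomposition concrete for a Markdown document.

```python
import math
import re


def structural_features(markdown_text):
    """Extract toy structural features at three hierarchical levels (illustrative only)."""
    lines = markdown_text.splitlines()

    # Macro-structure: document architecture (heading count, heading depth)
    heading_depths = [len(m.group(1)) for l in lines
                      if (m := re.match(r"(#+)\s", l))]
    macro = {"n_headings": len(heading_depths),
             "max_depth": max(heading_depths, default=0)}

    # Meso-structure: information chunking (paragraph blocks, list items)
    chunks = [c for c in markdown_text.split("\n\n") if c.strip()]
    meso = {"n_chunks": len(chunks),
            "mean_chunk_chars": sum(len(c) for c in chunks) / max(len(chunks), 1),
            "n_list_items": sum(1 for l in lines if re.match(r"\s*[-*+]\s", l))}

    # Micro-structure: visual emphasis (bold/italic spans)
    micro = {"n_emphasis": len(re.findall(r"\*\*[^*]+\*\*|_[^_]+_", markdown_text))}

    return {**macro, **meso, **micro}


def citation_score(features, weights=None):
    """Toy logistic 'citation probability'; these weights are made up,
    standing in for the fitted, architecture-aware models in the paper."""
    weights = weights or {"n_headings": 0.3, "n_chunks": 0.1,
                          "n_list_items": 0.2, "n_emphasis": 0.15}
    z = sum(w * features.get(k, 0) for k, w in weights.items())
    return 1 / (1 + math.exp(-z))
```

Under this sketch, "architecture-aware optimization" would amount to fitting a separate weight vector per generative engine and editing a page's structure (not its wording) to raise the resulting score, which mirrors the semantics-preserving constraint the abstract describes.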
Comments: 12 pages, 5 figures. This paper proposes GEO-SFE, a structural feature engineering framework for generative engine optimization
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
ACM classes: H.3.3; I.2.7
Cite as: arXiv:2603.29979 [cs.CL]
(or arXiv:2603.29979v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.29979
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: MuFeng Yang [v1] Tue, 31 Mar 2026 16:41:43 UTC (953 KB)