Models model language model announce product valuation perspective

EventChat: Implementation and user-centric evaluation of a large language model-driven conversational recommender system for exploring leisure events in an SME context

arXiv cs.IRby Hannes Kunstmann, Joseph Ollier, Joel Persson, Florian von WangenheimApril 1, 20262 min read0 views

arXiv:2407.04472v4 Announce Type: replace Abstract: Large language models (LLMs) present an enormous evolution in the strategic potential of conversational recommender systems (CRS). Yet to date, research has predominantly focused upon technical frameworks to implement LLM-driven CRS, rather than end-user evaluations or strategic implications for firms, particularly from the perspective of a small to medium enterprises (SME) that makeup the bedrock of the global economy. In the current paper, we detail the design of an LLM-driven CRS in an SME setting, and its subsequent performance in the field using both objective system metrics and subjective user evaluations. While doing so, we additionally outline a short-form revised ResQue model for evaluating LLM-driven CRS, enabling replicability

View PDF

Abstract:Large language models (LLMs) present an enormous evolution in the strategic potential of conversational recommender systems (CRS). Yet to date, research has predominantly focused upon technical frameworks to implement LLM-driven CRS, rather than end-user evaluations or strategic implications for firms, particularly from the perspective of a small to medium enterprises (SME) that makeup the bedrock of the global economy. In the current paper, we detail the design of an LLM-driven CRS in an SME setting, and its subsequent performance in the field using both objective system metrics and subjective user evaluations. While doing so, we additionally outline a short-form revised ResQue model for evaluating LLM-driven CRS, enabling replicability in a rapidly evolving field. Our results reveal good system performance from a user experience perspective (85.5% recommendation accuracy) but underscore latency, cost, and quality issues challenging business viability. Notably, with a median cost of $0.04 per interaction and a latency of 5.7s, cost-effectiveness and response time emerge as crucial areas for achieving a more user-friendly and economically viable LLM-driven CRS for SME settings. One major driver of these costs is the use of an advanced LLM as a ranker within the retrieval-augmented generation (RAG) technique. Our results additionally indicate that relying solely on approaches such as Prompt-based learning with ChatGPT as the underlying LLM makes it challenging to achieve satisfying quality in a production environment. Strategic considerations for SMEs deploying an LLM-driven CRS are outlined, particularly considering trade-offs in the current technical landscape.

Comments: Just accepted version

Subjects:

Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

MSC classes: 68T50

ACM classes: I.2.7; H.5.2

Cite as: arXiv:2407.04472 [cs.IR]

(or arXiv:2407.04472v4 [cs.IR] for this version)

https://doi.org/10.48550/arXiv.2407.04472

arXiv-issued DOI via DataCite

Related DOI:

https://doi.org/10.1145/3803546

DOI(s) linking to related resources

Submission history

From: Joseph Ollier Dr [view email] [v1] Fri, 5 Jul 2024 12:42:31 UTC (691 KB) [v2] Mon, 8 Jul 2024 14:50:49 UTC (693 KB) [v3] Tue, 9 Jul 2024 13:31:00 UTC (699 KB) [v4] Tue, 31 Mar 2026 08:47:21 UTC (1,169 KB)

Original source

arXiv cs.IR

https://arxiv.org/abs/2407.04472

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

modellanguage modelannounce

ModelsLive

A Very Fine Untuning

How fine-tuning made my chatbot worse (and broke my RAG pipeline) I spent weeks trying to improve my personal chatbot, Virtual Alexandra , with fine-tuning. Instead I got increased hallucination rate and broken retrieval in my RAG system. Yes, this is a story about a failed attempt, not a successful one. My husband and I called fine tuning results “Drunk Alexandra” — incoherent answers that were initially funny, but quickly became annoying. After weeks of experiments, I reached a simple conclusion: for this particular project, a small chatbot that answers questions based on my writing and instructions, fine tuning was not a good option. It was not just unnecessary, it actively degraded the experience and didn’t justify the extra time, cost, or complexity compared to the prompt + RAG system

Towards AI

11mabout 1 hour ago

ModelsLive

Google's TurboQuant saves memory, but won't save us from DRAM-pricing hell

<h4>Chocolate Factory’s compression tech clears the way to cheaper AI inference, not more affordable memory</h4> <p>When Google unveiled <a target="_blank" rel="nofollow" href="https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/">TurboQuant</a>, an AI data compression technology that promises to slash the amount of memory required to serve models, many hoped it would help with a memory shortage that has seen prices triple since last year. Not so much.…</p>

The Register AI/ML

1m41 minutes ago

ProductsLive

Writing Better RFCs and Design Docs

<p>RFCs (Request for Comments) and design docs are how engineering teams align on the “what” and “why” before writing code. Done well, they reduce rework and create a record of decisions. Done poorly, they sit unread or trigger endless debate. Here’s how to write <strong>better RFCs and design docs</strong> that get read, get feedback, and lead to decisions.</p> <h2> Why Write Them at All? </h2> <ul> <li> <strong>Alignment:</strong> Everyone works from the same understanding of the problem and the approach.</li> <li> <strong>Async review:</strong> People can respond in their own time, including across time zones.</li> <li> <strong>Memory:</strong> Later you have a record of why you chose X and what you rejected.</li> <li> <strong>Onboarding:</strong> New joiners (and future you) can unders

DEV Community

4m42 minutes ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 188 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Models

ModelsLive

A Very Fine Untuning

Towards AI

11mabout 1 hour ago

ModelsLive

Google's TurboQuant saves memory, but won't save us from DRAM-pricing hell

The Register AI/ML

1m41 minutes ago

ModelsLive

Introducing The Screwtape Ladders

The time has come for me to find a new home for my writings. Like many an author before me, I've enjoyed improving my craft and getting feedback on my essays here. LessWrong is a good incubator for honing one's skills in that arena. There's a chance to get your point out in front of a really broad audience of really smart people. There's been some cool moments. My oldest visible post, Write A Thousand Roads to Rome , got cited in a discussion with Eliezer Yudkowsky once. I keep seeing people bring up Loudly Give Up, Don't Quietly Fade as a motivator for speaking out. Sometimes it's really cool people working on awesome projects, and I feel a flash of sadness at 'aww, it's not going to happen' and also a bit of cool 'whoa, they remember that post?' You've all also let me get away with a lot

LessWrong AI

3mabout 1 hour ago

ModelsLive

Anthropic Executive Sees Cowork Agent as Bigger Than Claude Code - Bloomberg.com

<a href="https://news.google.com/rss/articles/CBMitgFBVV95cUxOM0VfSzdRYUNpT21XMlVuNXhsVEY4TUFxM3UzWUJDOEhFcUtJQnhTbjY2VjBXOUw1d1ZOUDRKeHVKMzkta3pFVWRWSGNoQkp3aWVndlRBQlpVUGxVN0ZnQW80OUZnYWN6RlhJWHRjT0V4RVhPcGhxMmE3b3oyVDlUV2RLY0g2NEx4M1dfMXhvTlhPTW50eFR1cEhxcHB3SXpURnRtbDZtZHp6bGQ2Z09IMjZBODBjdw?oc=5" target="_blank">Anthropic Executive Sees Cowork Agent as Bigger Than Claude Code</a> <font color="#6f6f6f">Bloomberg.com</font>

Google News: Claude

1mabout 1 hour ago