From 1.5s to 250ms: How We 6x'd API Latency with Spring Boot Optimization
The Hook: When Your System Meets Reality
Picture this: It's Tuesday morning, and your leave management system—serving 20,000 employees across enterprise teams—suddenly becomes the bottleneck. Approvals that should take seconds are taking 1.5 seconds per request. Dashboard loads feel sluggish. Support tickets flood in. The database team reports 85% CPU utilization, and your ops team is preparing incident escalation.
The system was built to "work." But production doesn't reward "working"—it demands reliability, speed, and efficiency at scale.
This is the story of how we inherited a system with p99 latency of 1.5s on a platform serving 15K+ daily requests, and through methodical architectural profiling and optimization, reduced it to 250ms (6x faster), while simultaneously cutting database CPU consumption by 40%.
Note: Even after migrating from legacy Python/Flask to Spring Boot, the latency bottleneck persisted. The migration provided better observability and scalability, but the root causes—inefficient queries, undersized connection pools, and missing caching—remained. The optimizations below solved the actual problem.
The Scenario: A System Under Pressure
Current State (Post-Migration, Spring Boot)
- System: Newly migrated Spring Boot microservice (from legacy Python/Flask)
- Scale: 20,000 employees, 15,000+ requests/day
- Latency: p99 = 1.5 seconds (unacceptable for user experience)
- Database Load: 85% CPU utilization on a 4-core instance
- Problem: Migration alone didn't solve the performance issues; architectural inefficiencies remained
Why This Matters
At scale, every 100ms saved means better UX, lower operational costs, and reduced infrastructure spend. For 15K daily requests, cutting latency by 1.25s per request eliminates roughly 5 compute-hours of cumulative wait time every day. That's the difference between needing two database instances or one.
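The cumulative saving can be worked out directly from the numbers above:

```java
public class LatencySavings {

    /** Cumulative wait time saved per day, in hours. */
    static double savedHoursPerDay(long dailyRequests, double savedSecondsPerRequest) {
        return dailyRequests * savedSecondsPerRequest / 3600.0;
    }

    public static void main(String[] args) {
        // 1.5s -> 250ms saves 1.25s per request
        System.out.printf("~%.1f compute-hours/day%n", savedHoursPerDay(15_000, 1.25));
    }
}
```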
Root Cause Analysis: Finding the Bottleneck
Our investigation focused on three key areas:
1. Database Query Profiling
Using Spring Boot Actuator and MySQL slow query logs, we discovered the smoking gun: The N+1 Query Problem — A textbook horror:
```java
// ❌ BEFORE: N+1 query anti-pattern
public List<LeaveRequest> getLeaveRequestsByDepartment(Long deptId) {
    List<Department> depts = em.createQuery(
            "SELECT d FROM Department d WHERE d.id = :deptId", Department.class)
        .setParameter("deptId", deptId)
        .getResultList();                      // Query 1: fetch department

    List<LeaveRequest> result = new ArrayList<>();
    for (Department dept : depts) {
        List<Employee> employees = em.createQuery(
                "SELECT e FROM Employee e WHERE e.department.id = :deptId", Employee.class)
            .setParameter("deptId", dept.getId())
            .getResultList();                  // Query 2..N: fetch employees per department

        for (Employee emp : employees) {
            List<LeaveRequest> leaves = em.createQuery(
                    "SELECT l FROM LeaveRequest l WHERE l.employee.id = :empId", LeaveRequest.class)
                .setParameter("empId", emp.getId())
                .getResultList();              // Query N+1..: fetch leaves per employee
            result.addAll(leaves);
        }
    }
    return result;
}
```
Impact: For a department with 500 employees, this triggered 1 + 500 + (500 * avg_leaves) = ~2000+ queries.
Database response time per request: ~1.2 seconds (just waiting for the database).
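Plugging numbers into that formula shows how quickly it blows up. (The average of 3 leave records per employee below is an assumption for illustration; the article leaves avg_leaves unspecified.)

```java
public class NPlusOneCost {

    /**
     * Query count per request, following the article's formula:
     * 1 (department) + N (per-employee queries) + N * avgLeaves
     * (lazy-loaded associations touched per leave record).
     */
    static long totalQueries(long employees, long avgLeavesPerEmployee) {
        return 1 + employees + employees * avgLeavesPerEmployee;
    }

    public static void main(String[] args) {
        System.out.println(totalQueries(500, 3)); // 2001 queries for one request
    }
}
```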
2. Connection Pool Exhaustion
HikariCP was configured with:
```properties
spring.datasource.hikari.maximum-pool-size=10
```
With 15K requests/day and 1.2s per database roundtrip, connections were being held too long. We hit connection pool saturation, causing requests to queue.
3. Missing Caching Layer
User roles, department hierarchies, and leave policies were fetched on every request—data that changes infrequently but was being queried thousands of times daily.
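The fix for this, detailed in Solution 3, is the cache-aside pattern: check a fast store first, fall back to the database on a miss, and invalidate on change. Stripped of Redis and Spring, the pattern itself is small. This in-process sketch (all names illustrative, not from the production code) shows the shape:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CacheAside<K, V> {

    private record Entry<T>(T value, long expiresAtMillis) {}

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public CacheAside(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Cache-aside read: return the cached value, or load it from the source and cache it. */
    public V get(K key, Function<K, V> loader) {
        Entry<V> hit = store.get(key);
        if (hit != null && System.currentTimeMillis() < hit.expiresAtMillis()) {
            return hit.value();                  // cache hit: no database roundtrip
        }
        V fresh = loader.apply(key);             // cache miss: hit the source of truth
        store.put(key, new Entry<>(fresh, System.currentTimeMillis() + ttlMillis));
        return fresh;
    }

    /** Event-driven invalidation: drop the entry when the source changes. */
    public void invalidate(K key) {
        store.remove(key);
    }
}
```

The Redis version in Solution 3 follows exactly this structure, with the TTL enforced by Redis itself.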
Solution 1: Hibernate JOIN FETCH (Eliminate N+1)
The most impactful change was using Hibernate's JOIN FETCH to eagerly load relationships in a single query:
```java
// ✅ AFTER: Single JOIN FETCH query
@Query(value = """
        SELECT DISTINCT d FROM Department d
        LEFT JOIN FETCH d.employees e
        LEFT JOIN FETCH e.leaveRequests l
        WHERE d.id = :deptId
        """)
List<Department> getLeaveRequestsByDepartment(@Param("deptId") Long deptId);
```
Trade-off: This fetches all data upfront (potentially unnecessary if you only need recent leaves). But in this case, the leave request list was always needed.
Result: From 2000+ queries → 1 query. Database latency dropped from 1.2s → 350ms.
Solution 2: HikariCP Connection Pool Tuning
Before diving into pool size, we profiled connection lifecycle:
```properties
# ❌ BEFORE (undersized)
spring.datasource.hikari.maximum-pool-size=10
spring.datasource.hikari.minimum-idle=2
```

```properties
# ✅ AFTER (right-sized for concurrency)
spring.datasource.hikari.maximum-pool-size=50
spring.datasource.hikari.minimum-idle=10
# 10s wait before a connection request fails
spring.datasource.hikari.connection-timeout=10000
# 10 min before closing idle connections
spring.datasource.hikari.idle-timeout=600000
# flag connections held longer than 60s as leaks
spring.datasource.hikari.leak-detection-threshold=60000
```
Why 50? Using Little's Law (required connections ≈ request arrival rate × average query time), cross-checked against measured concurrency:
- Measured concurrent requests at p99: 30-40 (from production JMX metrics)
- Safety buffer for traffic spikes: +10-20
- Total: 50 connections (30-40 + 10-20)
This sizing eliminates connection queue time while avoiding resource waste.
Result: Connection queue time eliminated. No more "Connection pool exhausted" errors.
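The same sizing falls out of Little's Law directly. The 30 req/s peak arrival rate and the 14-connection buffer below are assumptions chosen for illustration; the article reports measured concurrency rather than arrival rate:

```java
public class PoolSizing {

    /** Little's Law: concurrency L = arrival rate λ × time in system W, plus headroom. */
    static long poolSize(double peakArrivalRate, double avgQuerySeconds, long spikeBuffer) {
        double inFlight = peakArrivalRate * avgQuerySeconds; // connections busy at once
        return Math.round(inFlight) + spikeBuffer;
    }

    public static void main(String[] args) {
        // 30 req/s peak (assumed) × 1.2s per DB roundtrip ≈ 36 in-flight connections,
        // which plus a spike buffer lands on the 50 chosen above.
        System.out.println(poolSize(30.0, 1.2, 14)); // 50
    }
}
```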
Solution 3: Redis-Backed Caching Strategy
We implemented a two-tier caching strategy for frequently-accessed, infrequently-changed data:
```java
@Service
@RequiredArgsConstructor
public class LeavePolicyCache {

    private final RedisTemplate<String, LeavePolicy> redisTemplate;
    private final LeavePolicyRepository leavePolicyRepository;

    private static final String CACHE_KEY_PREFIX = "leave:policy:";
    private static final long TTL_SECONDS = 86_400; // 24 hours

    /**
     * Fetch the leave policy for an employee (vacation quota, sick leave,
     * carry-forward rules). Policies rarely change, so we cache them aggressively.
     */
    public LeavePolicy getLeavePolicy(Long employeeId) {
        String cacheKey = CACHE_KEY_PREFIX + employeeId;

        // Try Redis first
        LeavePolicy cached = redisTemplate.opsForValue().get(cacheKey);
        if (cached != null) {
            return cached; // Cache hit: O(1) lookup, <1ms response
        }

        // Cache miss: fetch from the database
        // (joins employee → employment_type → leave_policy tables)
        LeavePolicy policy = leavePolicyRepository.findByEmployeeId(employeeId)
            .orElseThrow();

        // Populate the cache with a 24-hour TTL
        redisTemplate.opsForValue()
            .set(cacheKey, policy, Duration.ofSeconds(TTL_SECONDS));

        return policy;
    }

    @EventListener
    public void onLeavePolicyUpdated(LeavePolicyUpdatedEvent event) {
        // Invalidate when policies change (e.g., annual quota reset, policy update)
        String cacheKey = CACHE_KEY_PREFIX + event.getEmployeeId();
        redisTemplate.delete(cacheKey);
    }
}
```
What We Cache (and Why):

- Leave Policies: Vacation quotas, sick leave limits, carry-forward rules — accessed on every leave request submission/validation, but change only during annual resets or policy updates.
- Approval Workflows: Manager hierarchies and approval chains — needed to route leave requests, but organization structure changes infrequently.
Cache Invalidation Strategy: Event-driven (on policy/org change, we emit events and invalidate) + 24-hour TTL.
Why Long TTL? Leave policies are extremely stable (change annually or when employee switches departments). 24-hour TTL ensures we catch manual DB changes without event emission.
Result: Leave policy lookups: 120ms (database join across 3 tables) → <1ms (Redis). Reduced database CPU by ~40% since these queries were happening on every leave request view/submission.
Architecture Overview: Before vs. After
Before Optimization
```
Client Request
  ↓
Spring Boot Controller
  ↓
Service Layer (Business Logic)
  ↓
[N+1 Queries] → Database (1.2s latency)
  ↓
[No Caching]
  ↓
Response (1.5s p99)
```
After Optimization
```
Client Request
  ↓
Spring Boot Controller
  ↓
Cache Check (Redis) ← 1ms ✅
  ├─ Hit: Return cached Leave Policy
  └─ Miss: Continue to database
  ↓
Service Layer (Business Logic)
  ↓
[Single JOIN FETCH Query] → Database (350ms)
  ↓
[HikariCP Optimized] (50-connection pool)
  ↓
Response (250ms p99) ✅
```
Results: The Metrics That Matter
| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| p99 Latency | 1.5s | 250ms | 6x faster ✅ |
| p50 Latency | 850ms | 180ms | 4.7x faster |
| Database CPU | 85% | 51% | 40% reduction ✅ |
| Queries/Request | 2000+ | 1 | 99.95% fewer |
| Redis Hit Rate | N/A | 87% | — |
| Connection Pool Timeout Errors | 45/day | 0 | 100% elimination |
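A quick sanity check that the headline ratios are internally consistent:

```java
public class MetricsCheck {
    public static void main(String[] args) {
        System.out.println(1500.0 / 250.0);                       // p99: 6.0x faster
        System.out.printf("%.1f%n", 850.0 / 180.0);               // p50: 4.7x faster
        System.out.printf("%.0f%%%n", (85 - 51) / 85.0 * 100);    // CPU: 40% reduction
        System.out.printf("%.2f%%%n", (2000 - 1) / 2000.0 * 100); // queries: 99.95% fewer
    }
}
```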
Business Impact
- User Experience: Dashboard loads feel snappy (250ms vs 1.5s)
- Infrastructure: Supports 15K+ daily requests on a smaller database instance
- Reliability: No more p99 latency spikes during business hours
Lessons Learned: Engineering Trade-offs
What We Sacrificed
- Flexibility: JOIN FETCH loads all data. If you only needed recent leaves, you'd fetch unnecessary historical data. Solution: separate queries for different use cases (leave summary vs. full history).
- Memory: Increasing the HikariCP pool size (10 → 50) uses more heap. Reality: better to spend extra memory than let requests queue. Monitoring showed peak heap of 1.2GB against 4GB available.
- Cache Consistency: Redis introduces eventual consistency. Mitigation: event-driven invalidation plus the 24-hour TTL bounds staleness to one day at worst, and typically to near-zero when events fire.
What Worked
- JOIN FETCH: the single largest impact, with a 99.95% query reduction.
- Connection pooling: eliminated queueing entirely.
- Caching strategy: an 87% hit rate validates our data access patterns.
Key Takeaways
- Profile First, Optimize Second: Don't guess where the bottleneck is. We assumed CPU was the issue; it was actually database queries. Use Spring Boot Actuator, MySQL slow query logs, and flamegraphs.
- N+1 is Insidious: With 20K employees and historical leave records, N+1 queries cascaded into 2000+ database roundtrips per request. Always run EXPLAIN and test with realistic data sizes.
- Connection Pooling Matters More Than You Think: An undersized pool (10 connections) caused request queueing, which is invisible in application metrics but devastating to latency. Right-size based on concurrency math, not gut feel.
- Caching is Not Free: Cache invalidation is hard. We chose event-driven invalidation plus a TTL because it's reliable and simple. Measure hit rates to validate your strategy.
- 6x Latency Improvement = 6x Better UX: Users feel the difference between 1.5s and 250ms. This wasn't just a technical victory; it was a product improvement.