From 1.5s to 250ms: How We 6x'd API Latency with Spring Boot Optimization
The Hook: When Your System Meets Reality
Picture this: It's Tuesday morning, and your leave management system—serving 20,000 employees across enterprise teams—suddenly becomes the bottleneck. Approvals that should take seconds are taking 1.5 seconds per request. Dashboard loads feel sluggish. Support tickets flood in. The database team reports 85% CPU utilization, and your ops team is preparing incident escalation.
The system was built to "work." But production doesn't reward "working"—it demands reliability, speed, and efficiency at scale.
This is the story of how we inherited a system with p99 latency of 1.5s on a platform serving 15K+ daily requests, and through methodical architectural profiling and optimization, reduced it to 250ms (6x faster), while simultaneously cutting database CPU consumption by 40%.
Note: Even after migrating from legacy Python/Flask to Spring Boot, the latency bottleneck persisted. The migration provided better observability and scalability, but the root causes—inefficient queries, undersized connection pools, and missing caching—remained. The optimizations below solved the actual problem.
The Scenario: A System Under Pressure
Current State (Post-Migration, Spring Boot)
- System: Newly migrated Spring Boot microservice (from legacy Python/Flask)
- Scale: 20,000 employees, 15,000+ requests/day
- Latency: p99 = 1.5 seconds (unacceptable for user experience)
- Database Load: 85% CPU utilization on a 4-core instance
- Problem: Migration alone didn't solve the performance issues; architectural inefficiencies remained
Why This Matters
At scale, every 100ms saved means better UX, lower operational costs, and reduced infrastructure spend. For 15K daily requests, cutting latency by 1.25s per request eliminates roughly 5 compute-hours of cumulative wait time every day. That's the difference between needing two database instances or one.
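The cumulative saving can be worked out directly from the numbers above:

```java
public class LatencySavings {

    /** Cumulative wait time saved per day, in hours. */
    static double savedHoursPerDay(long dailyRequests, double savedSecondsPerRequest) {
        return dailyRequests * savedSecondsPerRequest / 3600.0;
    }

    public static void main(String[] args) {
        // 1.5s -> 250ms saves 1.25s per request
        System.out.printf("~%.1f compute-hours/day%n", savedHoursPerDay(15_000, 1.25));
    }
}
```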
Root Cause Analysis: Finding the Bottleneck
Our investigation focused on three key areas:
1. Database Query Profiling
Using Spring Boot Actuator and MySQL slow query logs, we discovered the smoking gun: The N+1 Query Problem — A textbook horror:
```java
// ❌ BEFORE: N+1 query anti-pattern
public List<LeaveRequest> getLeaveRequestsByDepartment(Long deptId) {
    List<Department> depts = em.createQuery(
            "SELECT d FROM Department d WHERE d.id = :deptId", Department.class)
        .setParameter("deptId", deptId)
        .getResultList();                      // Query 1: fetch department

    List<LeaveRequest> result = new ArrayList<>();
    for (Department dept : depts) {
        List<Employee> employees = em.createQuery(
                "SELECT e FROM Employee e WHERE e.department.id = :deptId", Employee.class)
            .setParameter("deptId", dept.getId())
            .getResultList();                  // Query 2..N: fetch employees per department

        for (Employee emp : employees) {
            List<LeaveRequest> leaves = em.createQuery(
                    "SELECT l FROM LeaveRequest l WHERE l.employee.id = :empId", LeaveRequest.class)
                .setParameter("empId", emp.getId())
                .getResultList();              // Query N+1..: fetch leaves per employee
            result.addAll(leaves);
        }
    }
    return result;
}
```
Impact: For a department with 500 employees, this triggered 1 + 500 + (500 * avg_leaves) = ~2000+ queries.
Database response time per request: ~1.2 seconds (just waiting for the database).
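Plugging numbers into that formula shows how quickly it blows up. (The average of 3 leave records per employee below is an assumption for illustration; the article leaves avg_leaves unspecified.)

```java
public class NPlusOneCost {

    /**
     * Query count per request, following the article's formula:
     * 1 (department) + N (per-employee queries) + N * avgLeaves
     * (lazy-loaded associations touched per leave record).
     */
    static long totalQueries(long employees, long avgLeavesPerEmployee) {
        return 1 + employees + employees * avgLeavesPerEmployee;
    }

    public static void main(String[] args) {
        System.out.println(totalQueries(500, 3)); // 2001 queries for one request
    }
}
```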
2. Connection Pool Exhaustion
HikariCP was configured with:
```properties
spring.datasource.hikari.maximum-pool-size=10
```
With 15K requests/day and 1.2s per database roundtrip, connections were being held too long. We hit connection pool saturation, causing requests to queue.
3. Missing Caching Layer
User roles, department hierarchies, and leave policies were fetched on every request—data that changes infrequently but was being queried thousands of times daily.
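The fix for this, detailed in Solution 3, is the cache-aside pattern: check a fast store first, fall back to the database on a miss, and invalidate on change. Stripped of Redis and Spring, the pattern itself is small. This in-process sketch (all names illustrative, not from the production code) shows the shape:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CacheAside<K, V> {

    private record Entry<T>(T value, long expiresAtMillis) {}

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public CacheAside(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Cache-aside read: return the cached value, or load it from the source and cache it. */
    public V get(K key, Function<K, V> loader) {
        Entry<V> hit = store.get(key);
        if (hit != null && System.currentTimeMillis() < hit.expiresAtMillis()) {
            return hit.value();                  // cache hit: no database roundtrip
        }
        V fresh = loader.apply(key);             // cache miss: hit the source of truth
        store.put(key, new Entry<>(fresh, System.currentTimeMillis() + ttlMillis));
        return fresh;
    }

    /** Event-driven invalidation: drop the entry when the source changes. */
    public void invalidate(K key) {
        store.remove(key);
    }
}
```

The Redis version in Solution 3 follows exactly this structure, with the TTL enforced by Redis itself.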
Solution 1: Hibernate JOIN FETCH (Eliminate N+1)
The most impactful change was using Hibernate's JOIN FETCH to eagerly load relationships in a single query:
```java
// ✅ AFTER: Single JOIN FETCH query
@Query(value = """
        SELECT DISTINCT d FROM Department d
        LEFT JOIN FETCH d.employees e
        LEFT JOIN FETCH e.leaveRequests l
        WHERE d.id = :deptId
        """)
List<Department> getLeaveRequestsByDepartment(@Param("deptId") Long deptId);
```
Trade-off: This fetches all data upfront (potentially unnecessary if you only need recent leaves). But in this case, the leave request list was always needed.
Result: From 2000+ queries → 1 query. Database latency dropped from 1.2s → 350ms.
Solution 2: HikariCP Connection Pool Tuning
Before diving into pool size, we profiled connection lifecycle:
```properties
# ❌ BEFORE (undersized)
spring.datasource.hikari.maximum-pool-size=10
spring.datasource.hikari.minimum-idle=2
```

```properties
# ✅ AFTER (right-sized for concurrency)
spring.datasource.hikari.maximum-pool-size=50
spring.datasource.hikari.minimum-idle=10
# 10s wait before a connection request fails
spring.datasource.hikari.connection-timeout=10000
# 10 min before closing idle connections
spring.datasource.hikari.idle-timeout=600000
# flag connections held longer than 60s as leaks
spring.datasource.hikari.leak-detection-threshold=60000
```
Why 50? Using Little's Law (required connections ≈ request arrival rate × average query time), cross-checked against measured concurrency:
- Measured concurrent requests at p99: 30-40 (from production JMX metrics)
- Safety buffer for traffic spikes: +10-20
- Total: 50 connections (30-40 + 10-20)
This sizing eliminates connection queue time while avoiding resource waste.
Result: Connection queue time eliminated. No more "Connection pool exhausted" errors.
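The same sizing falls out of Little's Law directly. The 30 req/s peak arrival rate and the 14-connection buffer below are assumptions chosen for illustration; the article reports measured concurrency rather than arrival rate:

```java
public class PoolSizing {

    /** Little's Law: concurrency L = arrival rate λ × time in system W, plus headroom. */
    static long poolSize(double peakArrivalRate, double avgQuerySeconds, long spikeBuffer) {
        double inFlight = peakArrivalRate * avgQuerySeconds; // connections busy at once
        return Math.round(inFlight) + spikeBuffer;
    }

    public static void main(String[] args) {
        // 30 req/s peak (assumed) × 1.2s per DB roundtrip ≈ 36 in-flight connections,
        // which plus a spike buffer lands on the 50 chosen above.
        System.out.println(poolSize(30.0, 1.2, 14)); // 50
    }
}
```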
Solution 3: Redis-Backed Caching Strategy
We implemented a two-tier caching strategy for frequently-accessed, infrequently-changed data:
```java
@Service
@RequiredArgsConstructor
public class LeavePolicyCache {

    private final RedisTemplate<String, LeavePolicy> redisTemplate;
    private final LeavePolicyRepository leavePolicyRepository;

    private static final String CACHE_KEY_PREFIX = "leave:policy:";
    private static final long TTL_SECONDS = 86_400; // 24 hours

    /**
     * Fetch the leave policy for an employee (vacation quota, sick leave,
     * carry-forward rules). Policies rarely change, so we cache them aggressively.
     */
    public LeavePolicy getLeavePolicy(Long employeeId) {
        String cacheKey = CACHE_KEY_PREFIX + employeeId;

        // Try Redis first
        LeavePolicy cached = redisTemplate.opsForValue().get(cacheKey);
        if (cached != null) {
            return cached; // Cache hit: O(1) lookup, <1ms response
        }

        // Cache miss: fetch from the database
        // (joins employee → employment_type → leave_policy tables)
        LeavePolicy policy = leavePolicyRepository.findByEmployeeId(employeeId)
            .orElseThrow();

        // Populate the cache with a 24-hour TTL
        redisTemplate.opsForValue()
            .set(cacheKey, policy, Duration.ofSeconds(TTL_SECONDS));

        return policy;
    }

    @EventListener
    public void onLeavePolicyUpdated(LeavePolicyUpdatedEvent event) {
        // Invalidate when policies change (e.g., annual quota reset, policy update)
        String cacheKey = CACHE_KEY_PREFIX + event.getEmployeeId();
        redisTemplate.delete(cacheKey);
    }
}
```
What We Cache (and Why):

- Leave Policies: Vacation quotas, sick leave limits, carry-forward rules — accessed on every leave request submission/validation, but change only during annual resets or policy updates.
- Approval Workflows: Manager hierarchies and approval chains — needed to route leave requests, but organization structure changes infrequently.
Cache Invalidation Strategy: Event-driven (on policy/org change, we emit events and invalidate) + 24-hour TTL.
Why Long TTL? Leave policies are extremely stable (change annually or when employee switches departments). 24-hour TTL ensures we catch manual DB changes without event emission.
Result: Leave policy lookups: 120ms (database join across 3 tables) → <1ms (Redis). Reduced database CPU by ~40% since these queries were happening on every leave request view/submission.
Architecture Overview: Before vs. After
Before Optimization
```
Client Request
  ↓
Spring Boot Controller
  ↓
Service Layer (Business Logic)
  ↓
[N+1 Queries] → Database (1.2s latency)
  ↓
[No Caching]
  ↓
Response (1.5s p99)
```
After Optimization
```
Client Request
  ↓
Spring Boot Controller
  ↓
Cache Check (Redis) ← 1ms ✅
  ├─ Hit: Return cached Leave Policy
  └─ Miss: Continue to database
  ↓
Service Layer (Business Logic)
  ↓
[Single JOIN FETCH Query] → Database (350ms)
  ↓
[HikariCP Optimized] (50-connection pool)
  ↓
Response (250ms p99) ✅
```
Results: The Metrics That Matter
| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| p99 Latency | 1.5s | 250ms | 6x faster ✅ |
| p50 Latency | 850ms | 180ms | 4.7x faster |
| Database CPU | 85% | 51% | 40% reduction ✅ |
| Queries/Request | 2000+ | 1 | 99.95% fewer |
| Redis Hit Rate | N/A | 87% | — |
| Connection Pool Timeout Errors | 45/day | 0 | 100% elimination |
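A quick sanity check that the headline ratios are internally consistent:

```java
public class MetricsCheck {
    public static void main(String[] args) {
        System.out.println(1500.0 / 250.0);                       // p99: 6.0x faster
        System.out.printf("%.1f%n", 850.0 / 180.0);               // p50: 4.7x faster
        System.out.printf("%.0f%%%n", (85 - 51) / 85.0 * 100);    // CPU: 40% reduction
        System.out.printf("%.2f%%%n", (2000 - 1) / 2000.0 * 100); // queries: 99.95% fewer
    }
}
```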
Business Impact
- User Experience: Dashboard loads feel snappy (250ms vs 1.5s)
- Infrastructure: Supports 15K+ daily requests on a smaller database instance
- Reliability: No more p99 latency spikes during business hours
Lessons Learned: Engineering Trade-offs
What We Sacrificed
- Flexibility: JOIN FETCH loads all data. If you only needed recent leaves, you'd fetch unnecessary historical data. Solution: separate queries for different use cases (leave summary vs. full history).
- Memory: Increasing the HikariCP pool size (10 → 50) uses more heap. Reality: better to spend extra memory than let requests queue. Monitoring showed peak heap of 1.2GB against 4GB available.
- Cache Consistency: Redis introduces eventual consistency. Mitigation: event-driven invalidation plus the 24-hour TTL bounds staleness to one day at worst, and typically to near-zero when events fire.
What Worked
- JOIN FETCH: the single largest impact, with a 99.95% query reduction.
- Connection pooling: eliminated queueing entirely.
- Caching strategy: an 87% hit rate validates our data access patterns.
Key Takeaways
- Profile First, Optimize Second: Don't guess where the bottleneck is. We assumed CPU was the issue; it was actually database queries. Use Spring Boot Actuator, MySQL slow query logs, and flamegraphs.
- N+1 is Insidious: With 20K employees and historical leave records, N+1 queries cascaded into 2000+ database roundtrips per request. Always run EXPLAIN and test with realistic data sizes.
- Connection Pooling Matters More Than You Think: An undersized pool (10 connections) caused request queueing, which is invisible in application metrics but devastating to latency. Right-size based on concurrency math, not gut feel.
- Caching is Not Free: Cache invalidation is hard. We chose event-driven invalidation plus a TTL because it's reliable and simple. Measure hit rates to validate your strategy.
- 6x Latency Improvement = 6x Better UX: Users feel the difference between 1.5s and 250ms. This wasn't just a technical victory; it was a product improvement.