Question 1: Concurrent Web Crawler Implementation

Design and implement a multi-threaded web crawler that efficiently crawls a specific domain and counts unique URLs.

Requirements:

  • Handle circular references and invalid URLs
  • Implement rate limiting and exponential backoff
  • Process robots.txt compliance
  • Support different URL schemes and redirects
  • Optimize for memory usage with large datasets

Overview

[Seed URLs]
     |
[Frontier in-memory queue] <--- [Persistent overflow queue (RocksDB/file)]
     |
[Worker Pool: N threads]
     |---> [RobotsCache] (check)
     |---> [URL Normalizer]
     |---> [Seen: BloomFilter (mem) + RocksDB (exact)]
     |---> [RateLimiter per-host]
     |---> [Fetcher (HTTP client)] --> [Redirect Handler]
     |                                       |
     |                                       v
     |                                [Parser (streaming)]
     |                                       |
     |<------ enqueue new links -------------+
     |
[Storage: only metadata (URL, status, timestamp)]
     |
[Metrics/Logging/Monitor]

Frontier

function offer(url):
    if not isValid(url): return
    if inMemoryQueue.size < MAX_INMEM: inMemoryQueue.push(url)
    else: persistQueue.append(url)

function refillWorker():
    if inMemoryQueue.isEmpty: load up to N URLs from persistQueue into inMemoryQueue
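
A minimal Java sketch of this two-tier frontier, assuming a bounded in-memory deque and an append-only overflow file standing in for the RocksDB-backed queue (MAX_INMEM, the file path, and the omitted refill path are illustrative, not part of the original design):

// Sketch only: bounded in-memory queue that spills to an overflow file.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.ArrayDeque;

class Frontier {
    private static final int MAX_INMEM = 100_000;            // illustrative bound
    private final ArrayDeque<String> inMemory = new ArrayDeque<>();
    private final Path overflow = Paths.get("frontier-overflow.txt");

    synchronized void offer(String url) throws IOException {
        if (inMemory.size() < MAX_INMEM) {
            inMemory.addLast(url);
        } else {
            // spill to disk; a real crawler would use RocksDB or a segmented file
            Files.writeString(overflow, url + "\n", StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    synchronized String poll() {
        return inMemory.pollFirst();   // null when empty; a refill worker reloads from the overflow file
    }
}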

Seen (BloomFilter + RocksDB)

if bloom.mightContain(url):
    if rocksdb.contains(url): skip    // confirmed duplicate
    // else: Bloom false positive -> treat as new
bloom.put(url)
rocksdb.put(url, meta)
uniqueCount++
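
A hedged Java equivalent, using Guava's BloomFilter as the probabilistic layer and an in-memory set standing in for the exact RocksDB store (capacity and false-positive rate are illustrative):

// Sketch: Bloom filter fast path + exact store for confirmation.
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class SeenStore {
    private final BloomFilter<CharSequence> bloom =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 50_000_000, 0.01);
    private final Set<String> exact = ConcurrentHashMap.newKeySet();   // stand-in for RocksDB
    private final AtomicLong uniqueCount = new AtomicLong();

    // Returns true if the URL has not been seen before (and records it).
    boolean markIfNew(String url) {
        if (bloom.mightContain(url) && exact.contains(url)) {
            return false;                      // confirmed duplicate
        }
        // definitely new, or a Bloom false positive that the exact store disproved
        bloom.put(url);
        if (exact.add(url)) {                  // add() is atomic, so concurrent workers count once
            uniqueCount.incrementAndGet();
            return true;
        }
        return false;
    }

    long uniqueCount() { return uniqueCount.get(); }
}

Because every Bloom hit is re-checked against the exact store, false positives never cause a unique URL to be dropped from the count.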

Robots.txt

if not robotsCache.has(host):
    rules = fetchRobots(host)
    robotsCache.put(host, rules)
rules = robotsCache.get(host)
if not rules.isAllowed(userAgent, path):
    skip
else:
    delay = rules.getCrawlDelayOrDefault()    // crawl-delay in seconds
    hostRateLimiter.setRate(host, 1/delay)    // requests per second
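
A per-host cache around this flow might look like the sketch below; RobotRules and parseRobots are placeholders for whatever robots.txt parser the crawler uses (e.g. crawler-commons), not a real API:

// Sketch: cache robots.txt rules per host; RobotRules/parseRobots are hypothetical.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RobotsCache {
    interface RobotRules {                        // placeholder for a real parser's rule object
        boolean isAllowed(String userAgent, String path);
        double crawlDelaySecondsOrDefault();
    }

    private final HttpClient client = HttpClient.newHttpClient();
    private final Map<String, RobotRules> cache = new ConcurrentHashMap<>();

    RobotRules rulesFor(String host) {
        return cache.computeIfAbsent(host, h -> {
            try {
                // scheme should match the target URL in a real crawler
                HttpRequest req = HttpRequest.newBuilder(URI.create("https://" + h + "/robots.txt")).GET().build();
                HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
                return parseRobots(resp.statusCode() == 200 ? resp.body() : "");
            } catch (Exception e) {
                return parseRobots("");           // on failure, fall back to "allow all"
            }
        });
    }

    private RobotRules parseRobots(String body) {
        // Delegate to a real robots.txt parser here; omitted in this sketch.
        return new RobotRules() {
            public boolean isAllowed(String ua, String path) { return true; }
            public double crawlDelaySecondsOrDefault() { return 1.0; }
        };
    }
}

The crawl-delay returned here feeds the per-host rate limiter exactly as in the pseudocode above.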

Rate Limiter

TokenBucket lim = hostLimit.computeIfAbsent(host, h -> new TokenBucket(defaultRate, burst));
lim.acquire();   // blocks until this host has a token available
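
TokenBucket is not a JDK class; a minimal blocking implementation along these lines could back the per-host map (Guava's RateLimiter.create(permitsPerSecond) is a production-ready alternative):

// Sketch: simple thread-safe token bucket that refills continuously at ratePerSec.
class TokenBucket {
    private final double ratePerSec;
    private final double burst;
    private double tokens;
    private long lastRefillNanos = System.nanoTime();

    TokenBucket(double ratePerSec, double burst) {
        this.ratePerSec = ratePerSec;
        this.burst = burst;
        this.tokens = burst;
    }

    synchronized void acquire() throws InterruptedException {
        while (true) {
            refill();
            if (tokens >= 1.0) { tokens -= 1.0; return; }
            wait(50);                            // re-check periodically; crude but adequate for a sketch
        }
    }

    private void refill() {
        long now = System.nanoTime();
        tokens = Math.min(burst, tokens + (now - lastRefillNanos) / 1e9 * ratePerSec);
        lastRefillNanos = now;
    }
}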

HTTP Fetcher

import java.io.InputStream;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;

// client is a shared java.net.http.HttpClient; send() may throw IOException/InterruptedException
HttpRequest req = HttpRequest.newBuilder(uri).GET().build();
HttpResponse<InputStream> resp = client.send(req, BodyHandlers.ofInputStream());
int status = resp.statusCode();
if (status >= 200 && status < 300) processBody(resp.body());
else if (status >= 300 && status < 400) handleRedirect(resp.headers().firstValue("Location"));
else if (status == 429 || status >= 500) retryWithBackoff();   // transient: retry with backoff
else recordAndDrop(uri, status);                               // other 4xx: record and drop

Exponential Backoff + Jitter

sleep = random(0, min(maxBackoff, base * 2^attempt))   // capped exponential backoff with full jitter

for attempt in 0..max:
    try fetch()
    if status ok: break
    if status is retryable:
        sleep = min(maxBackoff, base * 2^attempt)
        sleep = random(0, sleep)   // full jitter
        sleepMillis(sleep)
    else: break
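
A Java version of the loop, assuming a hypothetical fetch(uri) helper that returns the HTTP status and an enclosing method that may throw InterruptedException; the limits are illustrative:

// Sketch: retry with capped exponential backoff and full jitter.
import java.util.concurrent.ThreadLocalRandom;

int maxAttempts = 5;
long baseMillis = 250, maxBackoffMillis = 30_000;

for (int attempt = 0; attempt < maxAttempts; attempt++) {
    int status = fetch(uri);                         // hypothetical helper returning the HTTP status
    if (status >= 200 && status < 300) break;        // success
    boolean retryable = (status == 429 || status >= 500);
    if (!retryable) break;                           // 3xx/4xx are handled elsewhere (redirects, record-and-drop)
    long cap = Math.min(maxBackoffMillis, baseMillis * (1L << attempt));
    long sleep = ThreadLocalRandom.current().nextLong(cap + 1);   // full jitter: uniform in [0, cap]
    Thread.sleep(sleep);
}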

Canonicalization

from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def normalize(u):
    p = urlparse(u)
    scheme = p.scheme.lower()
    host = p.hostname.lower()
    # drop the port only when it is the default for the scheme
    default_port = {'http': 80, 'https': 443}.get(scheme)
    port = '' if p.port in (default_port, None) else f":{p.port}"
    # sort query params and drop tracking params for a stable canonical form
    q = sorted((k, v) for k, v in parse_qsl(p.query) if not is_tracking_param(k))
    return urlunparse((scheme, host + port, norm_path(p.path), '', urlencode(q), ''))

Parser

for link in streamingParse(htmlStream):
    abs = resolve(baseUrl, link)
    if isSameDomain(abs): offer(abs)
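
One concrete way to implement this step in Java is jsoup, shown below; note that jsoup parses the whole document in memory, so it trades the streaming property for convenience (isSameDomain and Frontier come from the earlier sketches):

// Sketch: extract and resolve same-domain links with jsoup.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

void extractLinks(String html, String baseUrl, Frontier frontier) throws Exception {
    Document doc = Jsoup.parse(html, baseUrl);
    for (Element a : doc.select("a[href]")) {
        String abs = a.attr("abs:href");           // jsoup resolves the link against baseUrl
        if (!abs.isEmpty() && isSameDomain(abs)) { // isSameDomain: hypothetical domain check
            frontier.offer(abs);
        }
    }
}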

Worker Pool

ExecutorService worker pool consuming URLs from a concurrent frontier queue
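
A minimal wiring of the pool, assuming the Frontier from the earlier sketch plus hypothetical crawlOne(url) and done() helpers; the thread count and poll interval are illustrative:

// Sketch: fixed worker pool draining the frontier.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService pool = Executors.newFixedThreadPool(32);   // N worker threads
for (int i = 0; i < 32; i++) {
    pool.submit(() -> {
        while (!done()) {                                  // done(): hypothetical termination check
            String url = frontier.poll();
            if (url == null) { Thread.sleep(100); continue; }   // frontier momentarily empty
            crawlOne(url);                                 // robots -> rate limit -> fetch -> parse -> offer links
        }
        return null;                                       // Callable, so checked exceptions are allowed
    });
}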

Monitor

Prometheus metrics + Grafana dashboards
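
For instance, with the Prometheus Java simpleclient the crawler can expose a scrape endpoint and a few counters/gauges for Grafana to chart; the port and metric names below are illustrative:

// Sketch: expose crawler metrics for Prometheus to scrape.
import io.prometheus.client.Counter;
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.HTTPServer;

Counter pagesFetched = Counter.build()
        .name("crawler_pages_fetched_total").help("Pages fetched").register();
Counter fetchErrors = Counter.build()
        .name("crawler_fetch_errors_total").help("Fetch errors").register();
Gauge frontierSize = Gauge.build()
        .name("crawler_frontier_size").help("URLs waiting in the frontier").register();

HTTPServer server = new HTTPServer(9400);   // scrape endpoint; Grafana reads from Prometheus

// inside the worker loop:
pagesFetched.inc();
frontierSize.set(currentFrontierSize);      // assumes the frontier exposes its current size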

Question 2: Distributed LLM Inference System Design

Design a high-throughput system for serving large language model inference requests at scale.

Requirements:

  • Support 15,000+ requests per second
  • Handle multiple model variants and sizes
  • Implement dynamic load balancing
  • Achieve sub-100 ms latency for standard requests
  • Support zero-downtime model updates
  • Include comprehensive monitoring and alerting

Question 3: High-Performance Document Similarity Search

Implement an efficient algorithm for finding the top-k most similar documents in a massive corpus.

Requirements:

  • Handle 15M+ document corpus
  • Average document length: 1500+ words
  • Query response time: < 30 ms
  • Support real-time document additions
  • Optimize memory footprint

Question 4: Distributed System Debugging Challenge

Debug a message queue system experiencing performance degradation under high load.

Common Issues:

  • Race conditions in concurrent message processing
  • Memory leaks from unclosed connections
  • Improper error handling causing silent failures
  • Inefficient connection pooling strategies
  • Deadlocks in multi-threaded environments

Question 5: Real-time Data Processing Pipeline

Design a system for processing streaming data with low latency and high reliability.

Requirements:

  • Process 100,000+ events per second
  • Maintain exactly-once processing semantics
  • Support complex event transformations
  • Handle out-of-order events
  • Implement fault tolerance and recovery

Question 6: Machine Learning Model Serving Optimization

Optimize an existing ML model serving system for better performance and resource utilization.

Focus Areas:

  • Batch processing optimization
  • GPU memory management
  • Model parallelism strategies
  • Caching mechanisms for predictions
  • Auto-scaling based on demand

Question 7: Database Performance Tuning

Analyze and optimize a database system experiencing slow query performance.

Common Problems:

  • Missing or inefficient indexes
  • Poor query optimization
  • Lock contention issues
  • Inadequate connection pooling
  • Suboptimal schema design