Question 1: Concurrent Web Crawler Implementation

Design and implement a multi-threaded web crawler that efficiently crawls a specific domain and counts unique URLs.

Requirements:

  • Handle circular references and invalid URLs
  • Implement rate limiting and exponential backoff
  • Process robots.txt compliance
  • Support different URL schemes and redirects
  • Optimize for memory usage with large datasets

Overview

[Seed URLs]
     |
[Frontier in-memory queue] <--- [Persistent overflow queue (RocksDB/file)]
     |
[Worker Pool: N threads]
     |---> [RobotsCache] (check)
     |---> [URL Normalizer]
     |---> [Seen: BloomFilter (mem) + RocksDB (exact)]
     |---> [RateLimiter per-host]
     |---> [Fetcher (HTTP client)] --> [Redirect Handler]
     |                                       |
     |                                       v
     |                                [Parser (streaming)]
     |                                       |
     |<------ enqueue new links -------------+
     |
[Storage: only metadata (URL, status, timestamp)]
     |
[Metrics/Logging/Monitor]

Frontier

function offer(url):
    if not isValid(url): return
    if inMemoryQueue.size < MAX_INMEM: inMemoryQueue.push(url)
    else: persistQueue.append(url)

function refillWorker():
    if inMemoryQueue.isEmpty: load up to N URLs from persistQueue into inMemoryQueue
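
A minimal Java sketch of this two-tier frontier, assuming a bounded in-memory deque and an append-only overflow file standing in for the RocksDB-backed queue (MAX_INMEM, the file path, and the omitted refill path are illustrative, not part of the original design):

// Sketch only: bounded in-memory queue that spills to an overflow file.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.ArrayDeque;

class Frontier {
    private static final int MAX_INMEM = 100_000;            // illustrative bound
    private final ArrayDeque<String> inMemory = new ArrayDeque<>();
    private final Path overflow = Paths.get("frontier-overflow.txt");

    synchronized void offer(String url) throws IOException {
        if (inMemory.size() < MAX_INMEM) {
            inMemory.addLast(url);
        } else {
            // spill to disk; a real crawler would use RocksDB or a segmented file
            Files.writeString(overflow, url + "\n", StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    synchronized String poll() {
        return inMemory.pollFirst();   // null when empty; a refill worker reloads from the overflow file
    }
}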

Seen (BloomFilter + RocksDB)

if bloom.mightContain(url):
    if rocksdb.contains(url): skip    // confirmed duplicate
    // else: Bloom false positive -> treat as new
bloom.put(url)
rocksdb.put(url, meta)
uniqueCount++
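
A hedged Java equivalent, using Guava's BloomFilter as the probabilistic layer and an in-memory set standing in for the exact RocksDB store (capacity and false-positive rate are illustrative):

// Sketch: Bloom filter fast path + exact store for confirmation.
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class SeenStore {
    private final BloomFilter<CharSequence> bloom =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 50_000_000, 0.01);
    private final Set<String> exact = ConcurrentHashMap.newKeySet();   // stand-in for RocksDB
    private final AtomicLong uniqueCount = new AtomicLong();

    // Returns true if the URL has not been seen before (and records it).
    boolean markIfNew(String url) {
        if (bloom.mightContain(url) && exact.contains(url)) {
            return false;                      // confirmed duplicate
        }
        // definitely new, or a Bloom false positive that the exact store disproved
        bloom.put(url);
        if (exact.add(url)) {                  // add() is atomic, so concurrent workers count once
            uniqueCount.incrementAndGet();
            return true;
        }
        return false;
    }

    long uniqueCount() { return uniqueCount.get(); }
}

Because every Bloom hit is re-checked against the exact store, false positives never cause a unique URL to be dropped from the count.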

Robots.txt

if not robotsCache.has(host):
    rules = fetchRobots(host)
    robotsCache.put(host, rules)
rules = robotsCache.get(host)
if not rules.isAllowed(userAgent, path):
    skip
else:
    delay = rules.getCrawlDelayOrDefault()    // crawl-delay in seconds
    hostRateLimiter.setRate(host, 1/delay)    // requests per second
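
A per-host cache around this flow might look like the sketch below; RobotRules and parseRobots are placeholders for whatever robots.txt parser the crawler uses (e.g. crawler-commons), not a real API:

// Sketch: cache robots.txt rules per host; RobotRules/parseRobots are hypothetical.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RobotsCache {
    interface RobotRules {                        // placeholder for a real parser's rule object
        boolean isAllowed(String userAgent, String path);
        double crawlDelaySecondsOrDefault();
    }

    private final HttpClient client = HttpClient.newHttpClient();
    private final Map<String, RobotRules> cache = new ConcurrentHashMap<>();

    RobotRules rulesFor(String host) {
        return cache.computeIfAbsent(host, h -> {
            try {
                // scheme should match the target URL in a real crawler
                HttpRequest req = HttpRequest.newBuilder(URI.create("https://" + h + "/robots.txt")).GET().build();
                HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
                return parseRobots(resp.statusCode() == 200 ? resp.body() : "");
            } catch (Exception e) {
                return parseRobots("");           // on failure, fall back to "allow all"
            }
        });
    }

    private RobotRules parseRobots(String body) {
        // Delegate to a real robots.txt parser here; omitted in this sketch.
        return new RobotRules() {
            public boolean isAllowed(String ua, String path) { return true; }
            public double crawlDelaySecondsOrDefault() { return 1.0; }
        };
    }
}

The crawl-delay returned here feeds the per-host rate limiter exactly as in the pseudocode above.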

Rate Limiter

TokenBucket lim = hostLimit.computeIfAbsent(host, h -> new TokenBucket(defaultRate, burst));
lim.acquire();   // blocks until this host has a token available
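
TokenBucket is not a JDK class; a minimal blocking implementation along these lines could back the per-host map (Guava's RateLimiter.create(permitsPerSecond) is a production-ready alternative):

// Sketch: simple thread-safe token bucket that refills continuously at ratePerSec.
class TokenBucket {
    private final double ratePerSec;
    private final double burst;
    private double tokens;
    private long lastRefillNanos = System.nanoTime();

    TokenBucket(double ratePerSec, double burst) {
        this.ratePerSec = ratePerSec;
        this.burst = burst;
        this.tokens = burst;
    }

    synchronized void acquire() throws InterruptedException {
        while (true) {
            refill();
            if (tokens >= 1.0) { tokens -= 1.0; return; }
            wait(50);                            // re-check periodically; crude but adequate for a sketch
        }
    }

    private void refill() {
        long now = System.nanoTime();
        tokens = Math.min(burst, tokens + (now - lastRefillNanos) / 1e9 * ratePerSec);
        lastRefillNanos = now;
    }
}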

HTTP Fetcher

import java.io.InputStream;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;

// client is a shared java.net.http.HttpClient; send() may throw IOException/InterruptedException
HttpRequest req = HttpRequest.newBuilder(uri).GET().build();
HttpResponse<InputStream> resp = client.send(req, BodyHandlers.ofInputStream());
int status = resp.statusCode();
if (status >= 200 && status < 300) processBody(resp.body());
else if (status >= 300 && status < 400) handleRedirect(resp.headers().firstValue("Location"));
else if (status == 429 || status >= 500) retryWithBackoff();   // transient: retry with backoff
else recordAndDrop(uri, status);                               // other 4xx: record and drop

Exponential Backoff + Jitter

sleep = random(0, min(maxBackoff, base * 2^attempt))   // capped exponential backoff with full jitter

for attempt in 0..max:
    try fetch()
    if status ok: break
    if status is retryable:
        sleep = min(maxBackoff, base * 2^attempt)
        sleep = random(0, sleep)   // full jitter
        sleepMillis(sleep)
    else: break
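
A Java version of the loop, assuming a hypothetical fetch(uri) helper that returns the HTTP status and an enclosing method that may throw InterruptedException; the limits are illustrative:

// Sketch: retry with capped exponential backoff and full jitter.
import java.util.concurrent.ThreadLocalRandom;

int maxAttempts = 5;
long baseMillis = 250, maxBackoffMillis = 30_000;

for (int attempt = 0; attempt < maxAttempts; attempt++) {
    int status = fetch(uri);                         // hypothetical helper returning the HTTP status
    if (status >= 200 && status < 300) break;        // success
    boolean retryable = (status == 429 || status >= 500);
    if (!retryable) break;                           // 3xx/4xx are handled elsewhere (redirects, record-and-drop)
    long cap = Math.min(maxBackoffMillis, baseMillis * (1L << attempt));
    long sleep = ThreadLocalRandom.current().nextLong(cap + 1);   // full jitter: uniform in [0, cap]
    Thread.sleep(sleep);
}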

Canonicalization

from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def normalize(u):
    p = urlparse(u)
    scheme = p.scheme.lower()
    host = p.hostname.lower()
    # drop the port only when it is the default for the scheme
    default_port = {'http': 80, 'https': 443}.get(scheme)
    port = '' if p.port in (default_port, None) else f":{p.port}"
    # sort query params and drop tracking params for a stable canonical form
    q = sorted((k, v) for k, v in parse_qsl(p.query) if not is_tracking_param(k))
    return urlunparse((scheme, host + port, norm_path(p.path), '', urlencode(q), ''))

Parser

for link in streamingParse(htmlStream):
    abs = resolve(baseUrl, link)
    if isSameDomain(abs): offer(abs)
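
One concrete way to implement this step in Java is jsoup, shown below; note that jsoup parses the whole document in memory, so it trades the streaming property for convenience (isSameDomain and Frontier come from the earlier sketches):

// Sketch: extract and resolve same-domain links with jsoup.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

void extractLinks(String html, String baseUrl, Frontier frontier) throws Exception {
    Document doc = Jsoup.parse(html, baseUrl);
    for (Element a : doc.select("a[href]")) {
        String abs = a.attr("abs:href");           // jsoup resolves the link against baseUrl
        if (!abs.isEmpty() && isSameDomain(abs)) { // isSameDomain: hypothetical domain check
            frontier.offer(abs);
        }
    }
}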

Worker Pool

ExecutorService worker pool consuming URLs from a concurrent frontier queue
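
A minimal wiring of the pool, assuming the Frontier from the earlier sketch plus hypothetical crawlOne(url) and done() helpers; the thread count and poll interval are illustrative:

// Sketch: fixed worker pool draining the frontier.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService pool = Executors.newFixedThreadPool(32);   // N worker threads
for (int i = 0; i < 32; i++) {
    pool.submit(() -> {
        while (!done()) {                                  // done(): hypothetical termination check
            String url = frontier.poll();
            if (url == null) { Thread.sleep(100); continue; }   // frontier momentarily empty
            crawlOne(url);                                 // robots -> rate limit -> fetch -> parse -> offer links
        }
        return null;                                       // Callable, so checked exceptions are allowed
    });
}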

Monitor

Prometheus metrics + Grafana dashboards
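
For instance, with the Prometheus Java simpleclient the crawler can expose a scrape endpoint and a few counters/gauges for Grafana to chart; the port and metric names below are illustrative:

// Sketch: expose crawler metrics for Prometheus to scrape.
import io.prometheus.client.Counter;
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.HTTPServer;

Counter pagesFetched = Counter.build()
        .name("crawler_pages_fetched_total").help("Pages fetched").register();
Counter fetchErrors = Counter.build()
        .name("crawler_fetch_errors_total").help("Fetch errors").register();
Gauge frontierSize = Gauge.build()
        .name("crawler_frontier_size").help("URLs waiting in the frontier").register();

HTTPServer server = new HTTPServer(9400);   // scrape endpoint; Grafana reads from Prometheus

// inside the worker loop:
pagesFetched.inc();
frontierSize.set(currentFrontierSize);      // assumes the frontier exposes its current size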

Question 2: Distributed LLM Inference System Design

Design a high-throughput system for serving large language model inference requests at scale.

Requirements:

  • Support 15,000+ requests per second
  • Handle multiple model variants and sizes
  • Implement dynamic load balancing
  • Achieve sub-100 ms latency for standard requests
  • Support zero-downtime model updates
  • Include comprehensive monitoring and alerting

Question 3: High-Performance Document Similarity Search

Implement an efficient algorithm for finding the top-k most similar documents in a massive corpus.

Requirements:

  • Handle 15M+ document corpus
  • Average document length: 1500+ words
  • Query response time: < 30 ms
  • Support real-time document additions
  • Optimize memory footprint

Question 4: Distributed System Debugging Challenge

Debug a message queue system experiencing performance degradation under high load.

Common Issues:

  • Race conditions in concurrent message processing
  • Memory leaks from unclosed connections
  • Improper error handling causing silent failures
  • Inefficient connection pooling strategies
  • Deadlocks in multi-threaded environments

Question 5: Real-time Data Processing Pipeline

Design a system for processing streaming data with low latency and high reliability.

Requirements:

  • Process 100,000+ events per second
  • Maintain exactly-once processing semantics
  • Support complex event transformations
  • Handle out-of-order events
  • Implement fault tolerance and recovery

Question 6: Machine Learning Model Serving Optimization

Optimize an existing ML model serving system for better performance and resource utilization.

Focus Areas:

  • Batch processing optimization
  • GPU memory management
  • Model parallelism strategies
  • Caching mechanisms for predictions
  • Auto-scaling based on demand

Question 7: Database Performance Tuning

Analyze and optimize a database system experiencing slow query performance.

Common Problems:

  • Missing or inefficient indexes
  • Poor query optimization
  • Lock contention issues
  • Inadequate connection pooling
  • Suboptimal schema design