Top Causes of ANRs in Android and How to Eliminate Them (Complete Developer Guide)

Introduction

Imagine a user opens your Android app, taps a button, and nothing happens.

The screen freezes.

After a few seconds, Android shows:

Application Not Responding (ANR)

Most users don’t wait.

They close the app and uninstall it.

That’s why ANRs are one of the most dangerous performance issues in Android apps.

According to Android Developers documentation, ANRs happen when the main thread is blocked and cannot respond to user input within a specific time.

And if your ANR rate becomes high, Google Play can reduce your app’s visibility.

So understanding ANRs is not optional — it’s critical.

What is an ANR? (Simple Explanation)

Official Definition

ANR occurs when the UI thread is blocked and cannot process user input or draw frames.

Android shows ANR when:

  • Input event not handled within 5 seconds
  • BroadcastReceiver runs too long
  • Service takes too long to start
  • Job or foreground service delays response

Real-World Example

Think of your app like a restaurant.

  • UI Thread = Waiter
  • Background Thread = Kitchen
  • User = Customer

Good Flow

Customer orders → Waiter sends to kitchen → Kitchen prepares → Waiter delivers

Everything runs smoothly.

Bad Flow (ANR)

Customer orders → Waiter goes to kitchen and starts cooking himself

Now:

  • No one takes new orders
  • No one serves food
  • Restaurant stops working

This is exactly what happens when the UI thread does heavy work.

Top Causes of ANRs

1. Heavy Work on Main Thread ❌

Problem

Running:

  • Network calls
  • Database queries
  • File reading
  • JSON parsing
  • Image processing

on UI thread.

Android documentation clearly states:

Keep the main thread unblocked and move heavy work to worker threads

2. Long Database or Network Operations ❌

Example:

val data = api.getUsers() // running on main thread

This blocks UI.

Result → ANR.

3. BroadcastReceiver Doing Heavy Work ❌

A BroadcastReceiver should do small, quick tasks only.

If it runs too long → ANR.

Android recommends moving work to background threads.

4. Service Not Starting in Time ❌

Foreground service must call:

startForeground()

within 5 seconds.

Otherwise → ANR.

5. Deadlocks and Thread Blocking ❌

Example:

Thread A waiting for Thread B
Thread B waiting for Thread A

Result:

App freezes.

The Android documentation calls this a deadlock, and it is a major ANR cause.
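Here is a minimal Kotlin sketch of that situation, using plain JVM threads and illustrative lock names (on a real device, one of the two stuck threads would typically be the main thread):

```kotlin
import kotlin.concurrent.thread

val lockA = Any()
val lockB = Any()

fun main() {
    // Thread 1 takes lockA, then wants lockB
    thread {
        synchronized(lockA) {
            Thread.sleep(100)
            synchronized(lockB) { println("never printed") }
        }
    }
    // Thread 2 takes lockB, then wants lockA: classic lock-order inversion
    thread {
        synchronized(lockB) {
            Thread.sleep(100)
            synchronized(lockA) { println("never printed") }
        }
    }
    // Both threads now wait on each other forever. If one of them is the
    // main thread, Android reports an ANR. The fix: acquire locks in one
    // global order (always lockA before lockB) in every thread, or avoid
    // sharing locks across threads entirely.
}
```

Note that this program intentionally never finishes — it freezes exactly the way a deadlocked app does.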

Common Developer Mistakes

❌ Running API calls in Activity

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)

    val users = api.getUsers() // blocking network call on the main thread
}

Wrong approach.

❌ Large JSON parsing on UI thread

val json = File("data.json").readText() // blocking file read on the main thread

ANR risk.

Best Practices to Avoid ANRs

✅ Use Coroutines

viewModelScope.launch {
    val users = repository.getUsers()
}

Moves work off main thread.

✅ Use Dispatchers.IO

withContext(Dispatchers.IO) {
    api.getUsers()
}

Optimized for I/O.

✅ Use Flow for Continuous Data

repository.getUsersFlow()
    .flowOn(Dispatchers.IO)

Efficient and non-blocking.

✅ Use WorkManager for Background Tasks

Good for:

  • Sync
  • Upload
  • Downloads
  • Scheduling

✅ Keep BroadcastReceiver Lightweight

override fun onReceive(context: Context, intent: Intent) {
    // goAsync() keeps the receiver alive until finish() is called,
    // so the process isn't killed before the background work completes
    val pendingResult = goAsync()
    CoroutineScope(Dispatchers.IO).launch {
        try {
            repository.sync()
        } finally {
            pendingResult.finish()
        }
    }
}

Modern Kotlin Code Example
Repository

class UserRepository(
    private val api: UserApi
) {

    suspend fun getUsers(): List<User> {
        return withContext(Dispatchers.IO) {
            api.getUsers()
        }
    }
}

ViewModel

@HiltViewModel
class UserViewModel @Inject constructor(
    private val repository: UserRepository
) : ViewModel() {

    private val _users = MutableStateFlow<List<User>>(emptyList())
    val users = _users.asStateFlow()

    init {
        loadUsers()
    }

    private fun loadUsers() {
        viewModelScope.launch {
            _users.value = repository.getUsers()
        }
    }
}

Jetpack Compose UI

@Composable
fun UserScreen(viewModel: UserViewModel) {

    val users by viewModel.users.collectAsState()

    LazyColumn {
        items(users) { user ->
            Text(user.name)
        }
    }
}

Real-World Use Case
Video Fetching App

Scenario:

App scans device videos.

Wrong

Scanning files in Activity.

UI freezes.

ANR occurs.

Correct

  • Splash loads videos
  • Repository uses Dispatchers.IO
  • Flow emits data
  • UI updates smoothly

Result:

✔ No ANR
✔ Smooth scrolling
✔ Fast loading
✔ Better Play Store rating

Key Takeaways

Core Rule

Never block the main thread

Important Points

✔ UI thread must always be free
✔ Move heavy work to background threads
✔ Use Coroutines and Flow
✔ Keep BroadcastReceiver lightweight
✔ Use WorkManager for long tasks
✔ Avoid deadlocks and thread blocking
✔ Monitor ANR in Play Console

Conclusion

ANRs are not just performance issues.

They directly impact:

  • User experience
  • App rating
  • Play Store ranking
  • Revenue

The safest strategy is simple:

Keep the UI thread clean and move all heavy work to background threads.

If you follow Android’s official guidelines and modern Kotlin practices, ANRs can be almost eliminated.

Feel free to reach out to me with any questions or opportunities at (aahsanaahmed26@gmail.com)
LinkedIn (https://www.linkedin.com/in/ahsan-ahmed-39544b246/)
Facebook (https://www.facebook.com/profile.php?id=100083917520174).
YouTube (https://www.youtube.com/@mobileappdevelopment4343)
Instagram (https://www.instagram.com/ahsanahmed_03/)

The Cartographer’s Confession: How PostGIS Turned Me from a SQL Hack into a Spatial Artist

Let me start with a confession. For years, I treated geospatial data like a messy closet—shove everything in, slam the door, and pray nobody asks for a “nearby” anything. Then came the project that broke me: a real-time delivery tracker with 50k points and a naive WHERE sqrt((x1-x2)^2 + (y1-y2)^2) < 0.01 query that took forty-five seconds. My CTO’s Slack message just said: “Oof.”

That night, I discovered PostGIS. And I learned that working with space on a computer isn’t just math—it’s an art form. One where you’re both the cartographer and the gallery curator.

So grab coffee. Let me walk you through the journey from “it works on my laptop” to “this scales like a dream.” No marketing fluff. Just the battle scars and the beautiful abstractions that saved my sanity.

Act I: The Naive Cartographer (or, Why Euclidean Distance Lies)

You know the scene. You have a restaurants table with lat and lon as plain decimals. A user wants all taco joints within 1 km. Your first instinct:

SELECT * FROM restaurants
WHERE sqrt((lat - 40.7128)^2 + (lon - -74.0060)^2) < 0.009;  -- ~1km in deg?!

This is wrong on two levels. First, degrees are not kilometers—unless you enjoy eating polar-bear tacos at the equator. Second, that query will do a full table scan every time. Your database is now screaming like a dying server fan.

The awakening: PostGIS introduces geometry types and a proper spatial relationship model. The same query becomes:

SELECT * FROM restaurants
WHERE ST_DWithin(
  geom, 
  ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326),
  1000  -- meters, thank you very much
);

But wait—that still scanned everything? Right. Because we forgot the most important part.

Act II: The Index as a Legend (GIST is Your Compass)

Here’s where the art begins. A normal B-tree index is like alphabetizing a bookshelf—great for “title = X”. But spatial data is a map. You don’t search a map by flipping pages; you fold it, you zoom, you glance at regions.

Enter GIST (Generalized Search Tree). Think of it as an origami master that folds your 2D (or 3D, or 4D) space into a tree of bounding boxes. When you query “find points within 1 km,” PostGIS uses the index to discard entire continents of data instantly.

Create it:

CREATE INDEX idx_restaurants_geom ON restaurants USING GIST (geom);

That one line turned my 45-second query into 80 milliseconds. I literally laughed out loud. My cat left the room.

But indexing isn’t magic—it’s a trade-off. GIST indexes are slightly slower to update (insert/update/delete) than B-trees. For a write-heavy geospatial table, you’ll need to tune autovacuum or batch your writes. More on that later.

Art lesson: A GIST index is like the legend on a map—it doesn’t show every tree, but it tells you exactly how to find the forest.

Act III: The Palette of Spatial Functions (Don’t Paint with a Hammer)

PostGIS has hundreds of functions. You only need a dozen to be dangerous. Here’s my everyday toolkit, refined through actual pain:

| What you want | The function | Why it’s beautiful |
|---|---|---|
| Distance filter | ST_DWithin(geom1, geom2, radius) | Uses the index. Always. Don’t use ST_Distance in WHERE. |
| True intersection | ST_Intersects(geom1, geom2) | Handles boundaries, overlaps, touches. |
| Nearest neighbor | geom <-> ST_SetSRID(...) | The “knight move” of spatial indexes: uses KNN. |
| Area of a polygon | ST_Area(geom::geography) | Returns square meters. Geography type respects Earth’s curve. |
| Convert lat/lon to geometry | ST_SetSRID(ST_MakePoint(lon, lat), 4326) | Remember: longitude first. I’ve cried over swapped axes. |

Real example: Find the 10 closest coffee shops to a user, within 5 km, ordered by distance.

SELECT name, ST_Distance(geom, user_geom) AS dist
FROM coffee_shops
WHERE ST_DWithin(geom, user_geom, 5000)
ORDER BY geom <-> user_geom
LIMIT 10;

That <-> operator? It’s the KNN (K-Nearest Neighbor) index-assisted magic. Without it, PostGIS would calculate distance for every shop within 5 km, then sort. With it, the index walks the tree and returns candidates in approximate order. It’s not exact until the final sort, but it’s blindingly fast.

Act IV: The Geometry vs. Geography Schism (A Tale of Two Earths)

You’ll hit this around 2 AM. Your polygons on a city scale work fine. Then you try to calculate the area of a country and get numbers that would make a flat-earther nod approvingly.

Geometry: Treats the Earth as a flat Cartesian plane. Good for local projects (a few hundred km). Fast. Simple. Wrong for global distances.

Geography: Uses a spheroidal model (WGS84 by default). Accurate for distance, area, and bearing across the globe. Slower, because it’s doing real math.

My rule of thumb:

  • Store as geometry with SRID 4326 (lat/lon coordinates). It’s lightweight.
  • Use geography casting when you need Earth-aware calculations: geom::geography.
  • Index both – but a GIST on geography is larger and slightly slower.

Pro tip: For large tables with global queries, add a geog column as geography(Point, 4326) and index that. Then you can write clean queries like:

SELECT * FROM sensors
WHERE ST_DWithin(geog, ST_MakePoint(lon, lat)::geography, 50000); -- 50 km

No casting in the query means the index gets used without hesitation.

Act V: The Performance Trap (What They Don’t Put in the Brochure)

You’ve indexed everything. Queries are snappy. Then you deploy to production and… it’s slow again. Why?

Three silent killers:

  1. Implicit casting in the WHERE clause

     WHERE ST_DWithin(geom::geography, ...) – the cast happens before the index lookup. PostGIS can’t use a GIST on geometry for a geography query. Keep types consistent.

  2. Using ST_Distance for filtering

     -- This is a full scan. Always.
     WHERE ST_Distance(geom, point) < 1000

     ST_DWithin exists for a reason. Use it.

  3. Over-indexing on large polygons

     A GIST index on a column full of complex polygons (e.g., country borders) can be huge. Consider storing a simplified “envelope” geometry for coarse filtering, then refine with exact ST_Intersects.
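A minimal sketch of that two-stage filter, assuming a hypothetical countries table with a complex geom column (column and index names are illustrative):

```sql
-- Store a cheap bounding-box version of each polygon for coarse filtering
ALTER TABLE countries ADD COLUMN envelope geometry;
UPDATE countries SET envelope = ST_Envelope(geom);
CREATE INDEX idx_countries_envelope ON countries USING GIST (envelope);

-- Coarse pass on the envelope (index-assisted && overlap test),
-- then an exact pass on the real geometry
SELECT name
FROM countries
WHERE envelope && ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326)
  AND ST_Intersects(geom, ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326));
```

The envelope index stays small and fast to update, while the expensive exact check only runs on the handful of candidates that survive the coarse pass.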

Real story: We had a table of 2M GPS traces. Queries were fast in dev (10k rows). In prod, EXPLAIN ANALYZE showed a bitmap heap scan—PostGIS was reading half the table anyway. Why? The distribution was clustered, but our random test data wasn’t. We added CLUSTER restaurants USING idx_restaurants_geom to physically reorder rows by spatial locality. Query time dropped from 4 seconds to 200ms.

Act VI: The Artistic Workflow (How to Think Spatially)

After two years of wrestling with PostGIS, I’ve developed a kind of intuition. It’s like learning to see negative space in a drawing. Here’s my mental checklist before writing any spatial query:

  1. Draw it first – I keep a whiteboard or a quick QGIS window. Visualizing bounding boxes and intersections saves hours.
  2. Start with the index – Write the query assuming the index will do the heavy lifting. Filter early, refine late.
  3. Test with a point – Run EXPLAIN (ANALYZE, BUFFERS) on a single coordinate. Look for “Seq Scan” – if you see it, your index isn’t being used.
  4. Think in meters, store in degrees – Use geography for distances, geometry for operations. Cast explicitly.
  5. Batch your writes – A GIST index rebuild on 1M rows takes minutes. Do it nightly, not per insert.

Epilogue: You Are Now a Spatial Artist

PostGIS isn’t just a library. It’s a lens that changes how you see data. Suddenly every “near me” button, every delivery route, every heatmap becomes a solvable puzzle instead of a performance nightmare.

The journey from sqrt(lat^2 + lon^2) to elegant ST_DWithin with a GIST index is the difference between a child’s crayon scribble and a Monet. You’ve learned the brushstrokes. Now go paint some maps.

And when someone asks you, “Can you find all points within a polygon?” – smile, open your terminal, and whisper: “Watch this.”

Mandelbrot Set in JS – Smooth Scroll Zoom & Fixing Floating-Point Precision

This is a follow-up to Mandelbrot Set in JS – Zoom In.
In that article we built a Mandelbrot renderer using Canvas and Web Workers, with click-to-zoom.
This post covers what broke after ~16 zooms, why it broke (floating-point precision),
and how we replaced click zoom with a smooth scroll-based zoom that also lets you zoom back out.

The Problem: Everything Turns Black After ~16 Clicks

If you played with the previous demo long enough, you noticed something strange: after zooming in about 16 times, the fractal starts looking pixelated, blocky, and eventually the entire canvas turns solid black.

This isn’t a bug in the Mandelbrot math. The set is infinitely detailed, there’s always more structure to see. The problem is in how computers store decimal numbers.

Root Cause: JavaScript Numbers Have Limited Precision

JavaScript (like most languages) stores all numbers as 64-bit IEEE 754 doubles. This is just the standard format computers use for decimal numbers, and it gives you about 15 to 17 significant digits of precision. That sounds like a lot, but zoom burns through those digits very fast.

How the old zoom worked

Each click zoomed to a window of 2 × ZOOM_FACTOR × canvas_width pixels centered on the click point. With ZOOM_FACTOR = 0.1, each zoom reduced the visible range to 20% of the previous range:

const zfw = WIDTH * ZOOM_FACTOR;  // 800 * 0.1 = 80px on each side
REAL_SET = {
  start: getRelativePoint(e.pageX - canvas.offsetLeft - zfw, WIDTH, REAL_SET),
  end:   getRelativePoint(e.pageX - canvas.offsetLeft + zfw, WIDTH, REAL_SET),
};

The coordinate range after N clicks shrinks like this:

range_after_N = initial_range × 0.2^N
| Clicks | Real axis range |
|---|---|
| 0 | 3.0 (from -2 to 1) |
| 5 | ~9.6 × 10⁻⁴ |
| 10 | ~3.1 × 10⁻⁷ |
| 15 | ~9.8 × 10⁻¹¹ |
| 16 | ~2.0 × 10⁻¹¹ |

By click 16 the range is down to about 2 × 10⁻¹¹, so across an 800-pixel canvas adjacent pixels are separated by roughly 10⁻¹⁴. If your center is around -0.7, neighboring coordinates look like:

start: -0.700000000003750
end:   -0.700000000003751

Those two numbers share 15 digits. With only 15 to 17 significant digits of total precision, and with rounding error compounding on every iteration, the difference between adjacent pixels can no longer be represented reliably. Every pixel ends up mapping to the same value. Result: a grid of identical colors, pixelation, black.

This is called catastrophic cancellation: when you subtract two numbers that are almost the same, you lose all the useful digits.
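This collapse is easy to reproduce in plain JavaScript. The sketch below assumes the 800-pixel canvas and the 3.0 × 0.2^N shrink formula from above; pixelCoord mirrors what getRelativePoint does:

```javascript
// Map a pixel column to a real-axis coordinate, like getRelativePoint
const WIDTH = 800;
const pixelCoord = (px, start, range) => start + (px / WIDTH) * range;

// Visible range after N clicks, window centered near -0.7
const rangeAt = (n) => 3.0 * Math.pow(0.2, n);

// After 5 clicks, adjacent pixels still map to distinct doubles
const r5 = rangeAt(5);
console.log(pixelCoord(400, -0.7 - r5 / 2, r5) === pixelCoord(401, -0.7 - r5 / 2, r5)); // false

// After 30 clicks, the per-pixel step is far below the spacing of
// doubles near -0.7 (~1e-16), so neighboring pixels collide
const r30 = rangeAt(30);
console.log(pixelCoord(400, -0.7 - r30 / 2, r30) === pixelCoord(401, -0.7 - r30 / 2, r30)); // true
```

Once neighboring pixels collide, every column of the canvas iterates the exact same c, and the image degenerates into uniform blocks.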

The Fix, Part 1: Replace Click with Scroll Zoom

The first change is switching from click to wheel (scroll). This gives us:

  • Zoom in (scroll up) and zoom out (scroll down) with the same gesture
  • Smooth, step-by-step control over the zoom level
  • The zoom always centers on the cursor position

Here is the complete new listener:

const ZOOM_FACTOR = 0.8; // each scroll step = 80% of current range (zoom in)
const MIN_RANGE = 1e-12; // safety limit, stop before precision breaks down

const startListeners = () => {
  canvas.addEventListener('wheel', (e) => {
    e.preventDefault();
    const zoomIn = e.deltaY < 0;
    const factor = zoomIn ? ZOOM_FACTOR : 1 / ZOOM_FACTOR;

    const realRange = REAL_SET.end - REAL_SET.start;
    const imagRange = IMAGINARY_SET.end - IMAGINARY_SET.start;
    const newRealRange = realRange * factor;
    const newImagRange = imagRange * factor;

    // Stop zooming in before precision collapses
    if (newRealRange < MIN_RANGE || newImagRange < MIN_RANGE) return;

    // Map cursor pixel to a point in the complex plane
    const mouseX = e.pageX - canvas.offsetLeft;
    const mouseY = e.pageY - canvas.offsetTop;
    const centerReal = getRelativePoint(mouseX, WIDTH, REAL_SET);
    const centerImag = getRelativePoint(mouseY, HEIGHT, IMAGINARY_SET);

    REAL_SET = {
      start: centerReal - newRealRange / 2,
      end:   centerReal + newRealRange / 2,
    };
    IMAGINARY_SET = {
      start: centerImag - newImagRange / 2,
      end:   centerImag + newImagRange / 2,
    };

    Mandelbrot();
  }, { passive: false }); // passive: false is required to call e.preventDefault()
};

Let’s go through each decision:

e.preventDefault() + { passive: false }

By default, browsers treat wheel events as passive for performance, assuming you won’t stop the default scroll behavior. We need to prevent the page from scrolling while the user zooms the fractal, so we have to opt out. Without { passive: false }, calling preventDefault() does nothing and the page scrolls anyway.

factor = zoomIn ? ZOOM_FACTOR : 1 / ZOOM_FACTOR

Zooming in multiplies the range by 0.8 (makes it smaller). Zooming out divides by 0.8 (makes it bigger). This keeps zoom in and out symmetric, so ten zooms in followed by ten zooms out brings you back to essentially where you started (a whisper of floating-point rounding aside).

Centering on the cursor

The new approach maps the cursor pixel to a point in the complex plane, then builds the new window symmetrically around it:

const centerReal = getRelativePoint(mouseX, WIDTH, REAL_SET);
const centerImag = getRelativePoint(mouseY, HEIGHT, IMAGINARY_SET);

getRelativePoint converts a pixel position to a coordinate using a simple formula:

const getRelativePoint = (pixel, length, set) =>
  set.start + (pixel / length) * (set.end - set.start);

ZOOM_FACTOR = 0.8 instead of 0.1

With 0.1, each click zoomed to 20% of the range, which was very aggressive. The precision limit was hit in 16 steps. With 0.8, each scroll step reduces the range by only 20%, so you can zoom about 130 times before hitting the same limit. It also feels much smoother to use.
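A quick back-of-the-envelope check of that “about 130” number, using MIN_RANGE and the initial 3.0 real-axis range from the snippets above:

```javascript
// How many zoom steps until the initial 3.0 range shrinks below 1e-12?
const INITIAL_RANGE = 3.0;
const MIN_RANGE = 1e-12;

// New scroll zoom: range shrinks by 0.8 per step
const maxSteps = Math.ceil(Math.log(MIN_RANGE / INITIAL_RANGE) / Math.log(0.8));
console.log(maxSteps); // 129

// Old click zoom: range shrank to 0.2 per step, hitting the wall far sooner
const maxClicks = Math.ceil(Math.log(MIN_RANGE / INITIAL_RANGE) / Math.log(0.2));
console.log(maxClicks); // 18, in the same ballpark as the ~16 clicks observed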

The Fix, Part 2: The Precision Guard

if (newRealRange < MIN_RANGE || newImagRange < MIN_RANGE) return;

With MIN_RANGE = 1e-12 we stop zooming in when the coordinate window gets too small. At that scale, the numbers don’t have enough precision left to render a meaningful image. Instead of turning black, the fractal just stays frozen at the last good zoom level. The scroll event is silently ignored.

How the Renderer Still Works

For context, here is how each column of pixels is computed in the Web Worker. This part is the same as in the previous article:

// worker.ts, runs in a separate thread via Vite's ?worker import

const MAX_ITERATION = 1000;

function mandelbrot(c: { x: number; y: number }): [number, boolean] {
  let z = { x: 0, y: 0 };
  let n = 0;
  let d = 0;
  do {
    const p = {
      x: Math.pow(z.x, 2) - Math.pow(z.y, 2),
      y: 2 * z.x * z.y,
    };
    z = { x: p.x + c.x, y: p.y + c.y };
    d = 0.5 * (Math.pow(z.x, 2) + Math.pow(z.y, 2)); // d = |z|²/2, so d <= 2 ⟺ |z| <= 2
    n += 1;
  } while (d <= 2 && n < MAX_ITERATION);
  return [n, d <= 2];
}

This is the core iteration: z → z² + c. A point c is in the Mandelbrot set if |z| never escapes 2 after MAX_ITERATION steps. Points that do escape get colored by how fast they did it (the value of n).

The main thread sends one message per column and the worker replies with the results:

// columns are dispatched in random order for a cool reveal effect
const launchTasks = () => {
  while (TASKS.length > 0) {
    const [col] = TASKS.splice(Math.floor(Math.random() * TASKS.length), 1);
    worker.postMessage({ col });
  }
};

Current Limitations

Here is an honest list of what this implementation still can’t do:

| Limitation | Why it happens |
|---|---|
| ~130 scroll steps max zoom | JavaScript number precision (15-17 digits). You need a different approach to go deeper. |
| Re-renders the full canvas on every scroll event | The worker is restarted on each zoom. Fast scrolling queues many full renders. |
| No mobile support | wheel events don’t fire on touch screens. You’d need to handle pinch gestures separately. |
| Single worker for all columns | One worker handles all 800 columns. Multiple workers could be faster. |
| Fixed MAX_ITERATION = 1000 | Deep zoom areas need more iterations to look good, but raising this constant slows everything down. |

Future Improvements

1. Arbitrary Precision with decimal.js

To zoom beyond ~130 steps you need more than the standard 64-bit number format. The decimal.js library lets you set how many digits of precision you want:

import Decimal from 'decimal.js';

Decimal.set({ precision: 50 }); // 50 significant digits

const newRange = new Decimal(realRange).mul(factor);
const center  = new Decimal(realSet.start)
  .plus(new Decimal(mouseX).div(WIDTH).mul(new Decimal(realSet.end).minus(realSet.start)));

The downside is that this kind of math is 10 to 100 times slower than normal numbers, so you would need to lower the canvas resolution or the number of iterations to keep things running at a good speed.

2. Perturbation Theory

This is the technique used by professional deep-zoom renderers like Kalles Fraktaler. The idea is to compute one very precise reference point and then calculate all other pixels as small adjustments relative to that point, using regular numbers. This can reach zoom depths of 10^1000 and beyond, with good performance, but it requires a solid math background to implement.

3. Adaptive MAX_ITERATION

Instead of a fixed limit, scale the number of iterations based on how deep the zoom is, so shallow views are fast and deep views show more detail:

const maxIter = Math.floor(100 + zoomLevel * 50);
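One way to derive zoomLevel is to count how many 0.8× steps separate the current coordinate range from the initial one. This is a sketch under my own assumptions (the INITIAL_RANGE constant and the step-counting formula are not code from the demo):

```javascript
const INITIAL_RANGE = 3.0;

// Scale the iteration budget with zoom depth: shallow views stay fast,
// deep views get enough iterations to show detail
function adaptiveMaxIter(currentRange) {
  const zoomLevel = Math.max(0, Math.log(INITIAL_RANGE / currentRange) / Math.log(1 / 0.8));
  return Math.floor(100 + zoomLevel * 50);
}

console.log(adaptiveMaxIter(3.0));   // 100 at the initial view
console.log(adaptiveMaxIter(1e-10)); // a much higher budget when zoomed deep
```

The 100 base and the 50-per-level slope are tuning knobs; the important part is that the budget grows with the logarithm of the zoom, not linearly with scroll events.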

4. RAF Throttle

The scroll event fires much faster than the renderer can keep up. Using requestAnimationFrame would skip frames that come in too quickly and only render when the browser is ready:

let rafId: number;
canvas.addEventListener('wheel', (e) => {
  e.preventDefault();
  updateCoordinates(e);
  cancelAnimationFrame(rafId);
  rafId = requestAnimationFrame(() => Mandelbrot());
}, { passive: false });

5. Pinch-to-Zoom (Mobile)

Handle touchstart and touchmove with two fingers to calculate a scale factor and apply the same zoom logic.

Summary of Changes

| What changed | Before | After |
|---|---|---|
| Interaction | Click, zoom in only | Wheel, zoom in and out |
| Zoom center | Approximate click pixel | Exact cursor coordinate |
| Zoom step | Range shrinks to 20% per click | Range shrinks by 20% per scroll tick |
| Precision guard | None, canvas turns black | Stops at 1e-12 range |
| Max useful zooms | ~16 | ~130 |
| Page scroll behavior | Not a concern | Blocked with passive: false |

Try It Live

You can see the demo running on quijosakaf.com and find the full source on GitHub.


If you want to experiment, try changing ZOOM_FACTOR between 0.5 (aggressive) and 0.95 (very smooth). The math works the same either way, it’s just a personal preference.

Thanks for Reading

If you made it this far, thank you so much. This kind of topic can get complicated fast, and I appreciate you sticking with it.

I want to be honest: this post was written with the help of AI (Claude). Concepts like IEEE 754, catastrophic cancellation, arbitrary precision arithmetic, and perturbation theory were things I did not know about before I started digging into why the zoom was breaking. The AI helped me understand why each thing was happening and gave me the right words to describe it, which made it much easier to explain here.

The demo will keep improving. The improvements listed above (RAF throttling, adaptive iterations, arbitrary precision, pinch-to-zoom) are real next steps I plan to work on. If you have ideas, found a bug, or just want to talk about fractals, drop a comment below.

AWS Data Centres Got Bombed — 5 Cloud Engineering Roles Every Business Needs Now

The cloud was never abstract. It was always a building with an address — and on March 1, 2026, that address got hit by a drone.

Iranian Shahed drones struck two Amazon Web Services data centres in the United Arab Emirates and damaged a third facility in Bahrain. This was not a cyberattack. This was not a software vulnerability. This was kinetic warfare — missiles and drones targeting the physical infrastructure that powers the digital economy.

The consequences were immediate and devastating. Abu Dhabi Commercial Bank, Emirates NBD, First Abu Dhabi Bank, ride-hailing platform Careem, payment platforms Hubpay and Alaan, and enterprise data platform Snowflake — all experienced outages. AWS confirmed that two of three Availability Zones in the UAE region (ME-CENTRAL-1) were “significantly impaired.” The third zone stayed up, but with cascading degradation across services that depended on cross-zone redundancy.

Then it got worse. On April 1, fresh Iranian strikes hit an AWS data centre in Bahrain again. The Islamic Revolutionary Guard Corps (IRGC) named 18 US tech companies — including Microsoft, Google, Apple, Meta, Oracle, Intel, and Nvidia — as “legitimate military targets.” The statement was explicit: for every assassination, an American company’s infrastructure would be destroyed.

This is the first time in history that a nation-state has deliberately targeted commercial cloud data centres during wartime. And it changes everything about how businesses need to think about their infrastructure.

Why Multi-AZ Failed — And What That Means for Every Cloud Customer

Before we get into the five roles, we need to understand exactly what broke — because it challenges the foundational assumption most businesses make about cloud reliability.

AWS regions are designed with multiple Availability Zones (AZs) — physically separate data centres within the same geographic area. The promise is simple: if one AZ goes down, your workloads failover to another. This is the basis of every “highly available” architecture.

But here’s what happened in ME-CENTRAL-1: two out of three AZs were hit simultaneously. The drones didn’t respect availability zone boundaries. Standard multi-AZ redundancy models assume independent failure domains — a power outage here, a hardware failure there. They do not account for a military strike that takes out multiple facilities in the same city.

As one cloud architect at ABN AMRO Clearing Bank put it bluntly after the attacks: “Multi-AZ is NOT disaster recovery. It protects you from hardware failures, not a missile hitting an entire availability zone cluster in the same city.”

AWS’s own response confirmed the severity. They advised customers to replicate critical data out of the ME-SOUTH-1 (Bahrain) region entirely — an implicit admission that the region itself was compromised as a safe location. They waived all usage charges for ME-CENTRAL-1 for the entire month of March.

The lesson is clear: multi-AZ gives you high availability. It does not give you disaster recovery. And in a world where data centres are military targets, the distinction between those two concepts is the difference between staying online and going dark.

With that context, here are the five engineering capabilities your business needs — whether you hire for them, build them internally, or partner with an agency that can deliver them.

1. Multi-Cloud and Disaster Recovery Engineer

The Gap Exposed

The AWS attacks exposed a painful truth: most businesses have a single-provider dependency they’ve never stress-tested against a regional catastrophe. The October 2025 AWS outage had already cost an estimated $581 million globally. Now we’re looking at physical destruction — something no SLA covers.

Standard commercial property and business interruption insurance policies frequently exclude acts of war. Companies that had workloads running in ME-CENTRAL-1 or ME-SOUTH-1 discovered they had no financial recourse, no fallback infrastructure, and no tested plan for regional failure.

Paradoxically, Amazon’s stock rallied approximately 3% after the attacks. Why? Analysts predicted that enterprises would now be forced to adopt multi-region and multi-cloud deployments — effectively increasing their cloud spend across providers.

Understanding DR Strategies

Disaster recovery isn’t one-size-fits-all. AWS defines four DR strategies, each with different cost and recovery characteristics:

Backup and Restore is the simplest and cheapest approach. You regularly back up data to cloud storage in another region and restore when needed. Recovery Time Objective (RTO) — how long it takes to get back online — is measured in hours. Recovery Point Objective (RPO) — how much data you lose — depends on backup frequency. This is the bare minimum every business should have.

Pilot Light keeps a minimal version of your environment running in a secondary region. Core infrastructure like databases are replicated, but application servers aren’t running. When disaster strikes, you spin up the full environment. RTO is measured in tens of minutes to hours.

Warm Standby runs a scaled-down but fully functional copy of your production environment in another region. It can handle traffic immediately, albeit at reduced capacity. RTO drops to minutes.

Multi-Site Active/Active is the gold standard. You run fully functional deployments in multiple regions simultaneously. Traffic is distributed across all regions via global load balancers. There’s no failover because all regions are always serving traffic. If one goes down, the others absorb the load automatically. RTO is effectively zero, but cost is highest.

The Tech Stack

For cross-region replication within AWS, the key services are S3 Cross-Region Replication for object storage, DynamoDB Global Tables for NoSQL databases, Aurora Global Database for relational workloads, and AWS Elastic Disaster Recovery (formerly CloudEndure) for server replication.

But single-provider replication isn’t enough anymore. True resilience requires multi-cloud capability:

Terraform is the most critical tool here. As an infrastructure-as-code (IaC) platform, it’s cloud-agnostic — you can define your infrastructure once and deploy it to AWS, GCP, Azure, or any combination. If your current provider’s region goes dark, you can redeploy your entire stack elsewhere from code. Pulumi and AWS CloudFormation are alternatives, but Terraform’s multi-cloud support makes it the clear choice for DR scenarios.

Zerto provides real-time replication across cloud providers with automated failover. Veeam handles hybrid backup scenarios across on-premises and multi-cloud environments. For infrastructure configuration recovery — DNS, CDN, identity providers, network settings — ControlMonkey fills a gap most backup tools miss: they recover your data, but not the infrastructure configuration that makes it accessible.

Global load balancers are the traffic routing layer that makes all of this work. AWS Route 53 (with health checks and failover routing), Cloudflare (with their global Anycast network), or GCP Cloud DNS can automatically reroute traffic away from impaired regions.

What You Should Do Now

Start with an audit. Is your production workload in a single region? A single provider? If the answer to either is yes, you have a single point of failure that is now a known attack vector.

The lowest-effort, highest-impact step is enabling S3 Cross-Region Replication to a region on a different continent. If you’re running databases, enable cross-region read replicas at minimum.

Most importantly, codify your infrastructure with Terraform or equivalent IaC. If your infrastructure exists only as manually configured resources in a console, you cannot redeploy it elsewhere quickly. IaC is your portability insurance.

Finally, test your failover. Quarterly. An untested DR plan is no plan at all. Use AWS Fault Injection Simulator or Gremlin to simulate regional failures and verify your recovery actually works.

At Innovatrix, we build every client’s infrastructure with IaC from day one — from EC2 deployments to S3 backup automation to EBS snapshot scheduling. It’s not optional; it’s foundational. When a region goes dark, the businesses that survive are the ones who can redeploy from code.

2. Data Sovereignty and Compliance Engineer

The Gap Exposed

Here’s a fact that startled many businesses after the attacks: they had no idea their data was even routed through Middle East regions.

Data localization mandates — laws requiring that certain data be physically stored within a country’s borders — had driven hyperscalers to build aggressively in the Gulf. The UAE’s data centre market was projected to more than double from $3.29 billion in 2026 to $7.7 billion by 2031. Businesses that needed to serve Gulf customers were required to process data locally.

Now that data is in an active war zone. And the legal implications are cascading.

Many businesses had workloads routed through Gulf regions without explicit awareness. Their cloud provider optimised for latency, and traffic flowed through the nearest data centre. When that data centre was struck, the business discovered that “the cloud” had a very specific geographic address they hadn’t consented to.

The Regulatory Landscape

Data sovereignty is no longer a checkbox exercise. It’s a strategic imperative that intersects with national security.

India’s Digital Personal Data Protection Act (DPDP), 2023 is rolling out in phases. Phase 1 (November 2025) established the Data Protection Board. Phase 2 (November 2026) makes consent manager frameworks operational. Phase 3 (May 2027) brings all substantive provisions into effect. Significant Data Fiduciaries (SDFs) — entities handling large volumes of sensitive data — may face mandatory data localization within India. CERT-In requires enabling logs and retaining them for 180 days within India.

The DPDP Rules mandate that organisations audit how personal data enters, moves through, and exits their systems. They must document what is stored in India versus outside India and justify the rationale. Cloud and hosting agreements must support data residency needs, breach reporting timelines (the notification window starts from awareness, not occurrence), audit rights, and sub-processor transparency.

GDPR in Europe requires adequacy decisions or Standard Contractual Clauses for data transfers. Post-Schrems II, routing data through active conflict zones raises questions that no existing compliance framework anticipated.

The RBI's data localization mandate requires that payment system data be stored exclusively in India — no exceptions, no routing through intermediary regions.

The convergence of these regulations with physical conflict creates a new compliance category that didn’t exist before: geopolitical data risk.

The Tech Stack

Data sovereignty starts with visibility. You cannot comply with data residency requirements if you don’t know where your data actually lives.

Data classification and discovery tools like BigID, OneTrust, and Securiti.ai scan your entire cloud footprint to discover where personal data resides, how it moves across regions, and which jurisdictions apply. This isn’t a one-time audit — it needs to be continuous, because cloud providers can change routing and replication behaviour.

Cloud-native policy enforcement is the next layer. AWS Config Rules can enforce region restrictions — flagging or preventing resource creation outside approved regions. Azure Policy and GCP Organization Policy Constraints offer equivalent capabilities. These are your guardrails: even if a developer accidentally spins up a resource in the wrong region, the policy blocks it.

Sovereign cloud providers are emerging as alternatives to hyperscaler regions in sensitive geographies. India-specific options include Jio Cloud, ESDS, and BharathCloud — providers offering India-based hosting with DPDP-aligned compliance features. Globally, IBM launched Sovereign Core in January 2026, and Microsoft’s Azure Local (formerly Azure Stack HCI) enables running Azure workloads on on-premises hardware, keeping data within your physical control.

Encryption and key management close the loop. AWS KMS, Azure Key Vault, and HashiCorp Vault enable envelope encryption where you control the keys. The critical requirement: keys must reside in the same jurisdiction as the data. A data centre in India with encryption keys stored in Virginia isn’t truly sovereign.
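The envelope pattern behind these services can be sketched in a few lines. The toy below uses XOR as a stand-in cipher purely for illustration (real systems use AES-GCM, with KMS generating and wrapping the data key); the point is the key hierarchy: a per-object data key encrypts the data, and the master key only ever encrypts the data key, so it never has to leave its jurisdiction.

```python
import secrets

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # Toy symmetric cipher (repeating-key XOR), used ONLY to illustrate
    # the envelope pattern. Production systems use AES-GCM.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def envelope_encrypt(master_key: bytes, plaintext: bytes):
    data_key = secrets.token_bytes(32)                # fresh per-object key
    ciphertext = xor_cipher(data_key, plaintext)      # data under data key
    wrapped_key = xor_cipher(master_key, data_key)    # data key under master key
    # Store ciphertext + wrapped_key together; the master key stays in KMS.
    return ciphertext, wrapped_key

def envelope_decrypt(master_key: bytes, ciphertext: bytes, wrapped_key: bytes) -> bytes:
    data_key = xor_cipher(master_key, wrapped_key)    # unwrap the data key
    return xor_cipher(data_key, ciphertext)

master = secrets.token_bytes(32)  # stands in for a KMS key held in-region
ct, wk = envelope_encrypt(master, b"customer record")
assert envelope_decrypt(master, ct, wk) == b"customer record"
```

The sovereignty consequence falls out directly: whoever holds the master key controls access to everything, regardless of where the ciphertext is stored.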

For compliance audit trails, ensure your provider offers ISO 27001 and SOC 2 certifications, and that your contracts explicitly address breach notification timelines, sub-processor governance, and data deletion procedures.

What You Should Do Now

Run a data residency audit immediately. Tools like AWS Config or third-party platforms can show you exactly which regions your data touches. You may be surprised.

Review your cloud provider contracts for war exclusion clauses in insurance and SLAs. If your workloads ran through a conflict zone, understand your legal exposure.

Implement geo-fencing at the infrastructure level — not just at the policy level. AWS Service Control Policies (SCPs) can hard-block API calls from creating resources in specific regions.
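A hedged sketch of what such a guardrail looks like: the dictionary below is a hypothetical SCP that denies any action requested outside an approved region list, using the `aws:RequestedRegion` condition key. The approved regions and the exemptions for global services (which only resolve through us-east-1) are examples you would adapt.

```python
import json

# Hypothetical SCP: deny all API calls outside approved regions.
REGION_GUARDRAIL = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyOutsideApprovedRegions",
        "Effect": "Deny",
        # Global services are exempted; they have no regional endpoint choice.
        "NotAction": ["iam:*", "route53:*", "cloudfront:*", "support:*"],
        "Resource": "*",
        "Condition": {
            "StringNotEquals": {
                "aws:RequestedRegion": ["ap-south-1", "eu-west-1"]
            }
        },
    }],
}

print(json.dumps(REGION_GUARDRAIL, indent=2))
```

Attached at the organization root, a policy like this makes the region restriction a hard block rather than a convention developers are asked to remember.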

For businesses serving Indian customers, 2026 is the build-and-test year for DPDP compliance. Don’t wait for Phase 3 enforcement in May 2027.

At Innovatrix, we serve clients across India, UAE, UK, Singapore, and Australia — each with distinct data residency requirements. Our infrastructure setups are compliance-aware from the first architecture decision, whether that’s choosing an AWS region, configuring DNS, or setting up backup replication targets.

3. Edge Computing and Decentralised Infrastructure Specialist

The Gap Exposed

The fundamental flaw that the AWS attacks exposed isn’t just about multi-AZ or multi-region. It’s about the centralised model itself.

When Iran struck AWS’s me-central-1 region, every service that depended on it — banking apps, payment gateways, ride-hailing platforms, enterprise SaaS — went down in a cascading failure. The “cloud” was a single geographic location, and when that location was destroyed, the digital economy of an entire region collapsed.

This is the centralisation paradox: the cloud promised abstraction from physical infrastructure, but it actually concentrated risk into fewer, larger targets. A single data centre campus can host thousands of businesses. Destroy the campus, and you destroy them all simultaneously.

The numbers make the case for decentralisation. Gartner projects that 75% of enterprise data will be created and processed at the edge by 2026 — up from just 10% in 2018. Global IoT connections are projected to exceed 30 billion by 2026. And Cisco reports that AI agentic queries generate up to 25 times more network traffic than traditional chatbot queries — load that centralised architectures were never designed to handle.

The Architecture Shift

Edge computing doesn’t replace the cloud. It redistributes it. The model is layered:

Central cloud handles large-scale training, batch analytics, cold storage, and workloads where latency doesn’t matter. This is still AWS, GCP, or Azure — but it’s no longer the only tier.

Regional edge handles real-time inference, hot data, event processing, and latency-sensitive operations. These are smaller compute nodes distributed across metro areas, telecom exchanges, or customer premises.

Device edge handles on-device processing, sensor data pre-filtering, and offline-capable operations. This is where IoT, embedded systems, and mobile devices process data locally without any cloud dependency.

The resilience benefit is structural: there’s no single point of failure. If a regional edge node goes down, others absorb the load. If the central cloud is unreachable, edge nodes continue operating independently.

The Tech Stack

For web workloads, the easiest entry point into edge computing is Cloudflare Workers — serverless functions that run at over 300 edge locations globally. Your code executes at the edge location nearest to the user, with no central server dependency. Vercel Edge Functions and Deno Deploy offer similar capabilities, particularly useful for Next.js applications.

For AWS-native architectures, AWS Local Zones bring AWS infrastructure into metro areas (compute, storage, database services closer to end users), while AWS Outposts let you run AWS services on your own on-premises hardware. Azure IoT Edge and Google Distributed Cloud offer equivalent capabilities.

For edge AI inference, NVIDIA Jetson is the leading embedded AI platform — it can run computer vision, NLP, and sensor fusion models on device-grade hardware without cloud connectivity. ONNX Runtime enables cross-platform model deployment (train on any framework, deploy anywhere), and TensorRT optimises models for NVIDIA hardware specifically.

For container orchestration at the edge, K3s is a lightweight Kubernetes distribution designed for resource-constrained environments — it runs the same workloads as full Kubernetes but with a fraction of the memory and CPU footprint. Rafay provides multi-cluster Kubernetes management across edge and cloud environments from a single control plane.

For distributed databases, CockroachDB and YugabyteDB provide globally distributed SQL with automatic replication across regions and edge locations. They use consensus protocols that handle network partitions gracefully — exactly what you need when edge nodes have intermittent connectivity.

CDNs — Cloudflare, Fastly, AWS CloudFront — are the simplest form of edge infrastructure that most businesses already use. But post-attacks, think of your CDN not just as a performance layer but as an availability insurance policy. If your origin server goes down, a properly configured CDN can continue serving cached content.

What You Should Do Now

Identify your latency-critical and availability-critical workloads. These are your edge candidates. If a 200ms delay or a 5-minute outage costs you revenue or user trust, that workload should be at the edge.

Start with Cloudflare Workers or Vercel Edge for web workloads — lowest barrier to entry, no infrastructure to manage, and you get global distribution immediately.

For AI/ML workloads, evaluate whether inference can run at the edge. Smaller models — quantised to 4-bit or 8-bit precision — can run on surprisingly modest hardware. If you’re calling an API for every AI inference, you have a centralisation dependency.

Design for offline-first where possible. Edge nodes should degrade gracefully, not fail completely. If the user’s connection to your central cloud drops, what still works? That’s your resilience baseline.
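A minimal sketch of that degrade-gracefully pattern: try the central cloud first, fall back to the last known-good local copy on failure. The remote fetcher and the dict-backed cache are stand-ins you would replace with a real client and an on-node store (SQLite, files, etc.).

```python
def resilient_fetch(fetch_remote, local_cache: dict, key: str, timeout_s: float = 0.5):
    """Offline-first read: prefer the central cloud, degrade to cache.

    fetch_remote is any callable(key, timeout=...) you supply (hypothetical);
    local_cache stands in for durable local storage on the edge node.
    """
    try:
        value = fetch_remote(key, timeout=timeout_s)
        local_cache[key] = value              # refresh local copy on success
        return value, "remote"
    except Exception:
        if key in local_cache:
            return local_cache[key], "cache"  # degraded but still functional
        raise                                 # nothing local: surface the failure

def flaky_remote(key, timeout):
    raise TimeoutError("central region unreachable")

cache = {"feature_flags": {"dark_mode": True}}
value, source = resilient_fetch(flaky_remote, cache, "feature_flags")
assert source == "cache" and value == {"dark_mode": True}
```

The design choice worth noting: the fallback path is exercised on every failure, not just in disasters, so it stays tested by ordinary network blips.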

At Innovatrix, our Next.js deployments leverage edge functions for critical paths, and our Cloudflare experience — from R2 storage to Workers — means we build distributed resilience into web applications by default, not as an afterthought.

4. Cloud Security and Cyber Warfare Specialist

The Gap Exposed

The AWS attacks were kinetic — physical drones hitting physical buildings. But they exist within a broader context of coordinated physical and cyber warfare.

Iran’s IRGC didn’t just bomb data centres. They named 18 US tech companies as military targets, signalling coordinated campaigns that combine physical strikes with cyber operations. This is hybrid warfare, and it creates a threat model that most businesses have never planned for.

The collateral damage problem is severe: your business doesn’t need to be a target. You just need to be ON the target’s infrastructure. When Iran struck AWS to disrupt US military AI operations running on the same cloud, every commercial customer in that region was collateral damage.

Meanwhile, 17 submarine cables pass through the Red Sea, carrying the majority of data traffic between Europe, Asia, and Africa. With Iran’s closure of the Strait of Hormuz and renewed Houthi threats in the Red Sea, both critical data chokepoints are now in active conflict zones simultaneously. As one network intelligence expert noted, both chokepoints being in conflict zones at the same time is unprecedented — there’s no historical parallel for the potential disruption.

And the threat surface keeps expanding. A 2025 Fortinet survey found that 62% of organisations consider securing edge environments more complex than protecting centralised data centres. Every edge node, every IoT device, every distributed compute instance is a potential attack surface.

The Security Architecture

Post-attacks, your security posture needs to evolve from “protect the perimeter” to “assume everything is compromised.”

Zero Trust Architecture is the foundational shift. The principle is simple: never trust, always verify. Every request — whether from inside or outside your network — must be authenticated, authorised, and encrypted. Google’s BeyondCorp model pioneered this. Practical implementations include Cloudflare Zero Trust (ZTNA — Zero Trust Network Access), Azure AD Conditional Access, and Tailscale (a WireGuard-based mesh VPN that creates encrypted point-to-point connections without exposing public endpoints).

WAF and DDoS protection is your outer shield. Cloudflare WAF, AWS Shield Advanced, and Azure DDoS Protection filter malicious traffic before it reaches your infrastructure. In a cyber warfare scenario, volumetric DDoS attacks are often the opening salvo — designed to overwhelm defences before targeted exploitation.

SIEM and continuous monitoring give you visibility. CrowdStrike Falcon provides endpoint detection and response. Wiz offers cloud-native security posture management — it maps your entire cloud footprint and identifies misconfigurations, exposed secrets, and lateral movement paths. AWS GuardDuty provides threat detection using machine learning to identify anomalous API calls and potentially compromised instances.

Secrets management ensures that API keys, database credentials, and encryption keys aren’t hardcoded or exposed. HashiCorp Vault, AWS Secrets Manager, and Doppler provide centralised, audited, rotatable secret storage.

DNS security is often overlooked but critical. Implement DNSSEC to prevent DNS spoofing, DNS-over-HTTPS to prevent eavesdropping, and ensure your SPF, DKIM, and DMARC records are properly configured to prevent email-based attacks that often precede infrastructure compromises.
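On the record side, a DMARC policy is just a TXT record of semicolon-separated tag=value pairs at `_dmarc.<domain>`. A minimal parser makes the audit step concrete (the record string below is a typical example, not a live lookup, which would need a DNS library):

```python
def parse_dmarc(txt_record: str) -> dict:
    """Parse a DMARC TXT record into a tag -> value dict."""
    return dict(
        part.strip().split("=", 1)
        for part in txt_record.split(";")
        if "=" in part
    )

record = "v=DMARC1; p=reject; rua=mailto:dmarc@example.com; adkim=s; aspf=s"
policy = parse_dmarc(record)
assert policy["p"] == "reject"  # enforcing policy, not just monitoring (p=none)
```

The check that matters in an audit: `p=none` means you are only monitoring; `p=quarantine` or `p=reject` means spoofed mail actually gets stopped.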

Immutable backups are your last line of defence against both ransomware and physical destruction. WORM (Write Once Read Many) storage — available through AWS S3 Object Lock, Azure Immutable Blob Storage, or dedicated solutions like Veeam with immutability — ensures that backups cannot be encrypted, deleted, or modified by attackers. In a scenario where your primary infrastructure is physically destroyed and your backups are in a different region, immutable backups are what let you recover.
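As a sketch of what the S3 Object Lock variant involves: the helper below builds the configuration that boto3's `put_object_lock_configuration` takes. COMPLIANCE mode means no one, not even the root account, can delete or overwrite object versions until retention expires; note that Object Lock must be enabled when the bucket is created.

```python
def object_lock_config(retention_days: int = 30) -> dict:
    """Minimal S3 Object Lock (WORM) configuration sketch.

    COMPLIANCE mode: object versions are immutable until retention
    expires, even for administrators. Retention period is an example.
    """
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {
            "DefaultRetention": {"Mode": "COMPLIANCE", "Days": retention_days}
        },
    }

# Applying it requires credentials; illustration only:
# import boto3
# boto3.client("s3").put_object_lock_configuration(
#     Bucket="my-dr-backups",
#     ObjectLockConfiguration=object_lock_config(90),
# )
```

Pair this with a replication target in a geographically distant region and the backup survives both ransomware and physical destruction of the primary site.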

Incident Response and Chaos Engineering

Having security tools isn’t enough. You need documented runbooks and regular testing.

An incident response playbook answers: who does what when your primary region goes dark? Who is notified first? What’s the communication chain? Which workloads are restored first? How do you communicate with customers during the outage?

Chaos engineering tests your resilience before a real incident does. AWS Fault Injection Simulator and Gremlin let you simulate regional failures, network partitions, and service degradations in a controlled way. If your DR plan only exists on paper, the first time you test it shouldn’t be during an actual war.

What You Should Do Now

Implement zero trust today. Start with Tailscale or Cloudflare Zero Trust — both can be deployed in hours, not weeks.

Run a security audit against your cloud infrastructure. ScoutSuite (open-source, multi-cloud) or AWS Inspector can identify misconfigurations, open ports, and policy violations in minutes.

Harden your DNS. If you’ve done SPF, DKIM, and DMARC remediation, you’re ahead of most — but verify it’s current. DNS is often the first target in state-sponsored attacks.

Create an incident response playbook. Document it. Assign roles. Then drill it quarterly.

Enable immutable backups in a geographically isolated region. If your primary and backup are both in the same conflict zone, you have no backup.

At Innovatrix, we’ve done comprehensive DNS audits and SPF/DKIM/DMARC remediation across client domains, deploy infrastructure behind Tailscale-secured networks, and build security hardening into every deployment — because in this threat landscape, security isn’t a feature, it’s the foundation.

5. AI Infrastructure Relocation Engineer

The Gap Exposed

This is perhaps the most consequential role to emerge from the attacks — because it sits at the intersection of AI, cloud infrastructure, and geopolitics.

Here’s what happened: the US military was using Anthropic’s Claude AI model — hosted on AWS infrastructure — for intelligence analysis, target identification, and battle simulations during the Iran strikes. Iran’s stated rationale for attacking AWS data centres was precisely this: the infrastructure was supporting enemy military AI operations.

This means that commercial AI infrastructure is now a military target by association. If your AI workloads — your inference pipelines, your vector databases, your training jobs — share infrastructure with military AI, you are in the blast radius. Not metaphorically. Literally.

The deeper problem is that AI compute cannot be arbitrarily relocated. Unlike a web application that can be containerised and moved to a new region in hours, AI workloads are constrained by power availability, cooling infrastructure, GPU availability, network latency for distributed training, and the sheer volume of training data that needs to move with the compute.

As one research paper on AI infrastructure sovereignty noted: sovereignty strategies that focus solely on data localisation or model ownership risk becoming symbolic rather than effective. Without continuous visibility into infrastructure state and the ability to act on it in real time, operators lack practical control over AI systems.

The Sovereign AI Shift

The response to these attacks is accelerating a global trend: sovereign AI infrastructure.

Global spending on sovereign AI systems is projected to surpass $100 billion by 2026. Microsoft committed $10 billion to Japan AI infrastructure between 2026 and 2029 — a direct response to sovereign compute requirements forcing hyperscalers to partner with regional infrastructure players rather than deploying centralised data centres. The market noticed: Sakura Internet, a Japanese regional cloud provider, surged 20% on the announcement.

France has invested €109 billion in sovereign AI infrastructure, including a partnership with Fluidstack to build one of the world’s largest decarbonised AI supercomputers. India is accelerating through the IndiaAI mission and sovereign cloud mandates.

Forrester predicts 2026 is the year governments adopt “tech nationalism” — domestic-first AI procurement policies. And IDC forecasts that by 2028, 60% of organisations with digital sovereignty requirements will have migrated sensitive workloads to new cloud environments.

The writing is on the wall: AI infrastructure is becoming as geopolitically strategic as oil infrastructure. And just like oil, countries and businesses that don’t control their own supply are vulnerable.

The Tech Stack

Model portability is the first priority. If your AI models are locked into one provider’s format and one provider’s serving infrastructure, you can’t relocate them. ONNX (Open Neural Network Exchange) provides a standard format for model interoperability — train in PyTorch, export to ONNX, deploy anywhere. MLflow handles experiment tracking and model registry — versioning your models so you know exactly which model is running where and can reproduce it. Kubeflow provides Kubernetes-native ML pipelines for training and serving.

Self-hosted inference eliminates provider dependency entirely. vLLM is a high-throughput, memory-efficient inference engine for large language models — it can serve models on your own GPU hardware (cloud or on-premises) with performance rivalling managed API services. Ollama simplifies local LLM deployment for development and testing. llama.cpp enables CPU-based inference for smaller models.

GPU cloud alternatives beyond the hyperscalers provide options when AWS or Azure regions are compromised. Lambda Labs, CoreWeave, RunPod, and Paperspace offer GPU compute without the hyperscaler dependency. For India-specific sovereign GPU infrastructure, providers like BharathCloud are emerging with DPDP-aligned offerings.

Vector databases need to be portable too. If your RAG (Retrieval-Augmented Generation) pipeline depends on a managed vector database in a specific region, you need alternatives. pgvector — a PostgreSQL extension — is the most portable option: it runs anywhere PostgreSQL runs, which means you can deploy it on any cloud, any region, or on-premises. Qdrant, Milvus, and Weaviate are dedicated vector databases with self-hosted deployment options.
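The retrieval step pgvector performs with `ORDER BY embedding <=> query LIMIT k` (its `<=>` operator is cosine distance) is just distance ranking, which is why the workload is so portable. A minimal pure-Python sketch of the same operation:

```python
import math

def cosine_distance(a, b):
    # Same metric as pgvector's `<=>` operator: 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def nearest(query, corpus):
    """corpus: list of (doc_id, embedding). Returns pairs ordered by
    distance to the query, i.e. what the SQL ORDER BY does server-side."""
    return sorted(corpus, key=lambda item: cosine_distance(query, item[1]))

docs = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.9, 0.1])]
ranked = nearest([1.0, 0.0], docs)
assert ranked[0][0] == "a" and ranked[-1][0] == "b"
```

A real deployment adds an index (pgvector's HNSW or IVFFlat) for scale, but nothing about the semantics ties your embeddings to one provider or one region.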

Model optimisation for relocation makes smaller, faster models that are easier to move and cheaper to serve. Quantisation (4-bit and 8-bit) reduces model size by 4-8x with minimal accuracy loss. Distillation trains smaller “student” models from larger “teacher” models. Pruning removes unnecessary weights. The result: models that can run on edge hardware, on modest cloud instances, or on-premises GPUs — dramatically increasing your deployment options.
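The core idea of 8-bit quantisation fits in a few lines: store one float scale plus int8 values instead of float32 weights. This is illustration only; real toolchains use per-channel scales, calibration data, and outlier handling.

```python
def quantize_8bit(weights):
    """Symmetric 8-bit quantisation sketch: one scale for the whole tensor."""
    scale = max(abs(w) for w in weights) / 127.0   # map the max weight to 127
    q = [round(w / scale) for w in weights]        # int8 values in [-127, 127]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.52, -1.27, 0.08, 0.91]
q, s = quantize_8bit(w)
restored = dequantize(q, s)
# 4x smaller than float32, with reconstruction error bounded by the scale:
assert all(abs(a - b) < s for a, b in zip(w, restored))
```

That size/accuracy trade is exactly what makes relocation practical: a quantised model fits on hardware you can actually obtain in a new region on short notice.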

The Hybrid AI Architecture

The winning pattern isn’t all-cloud or all-edge. It’s a hybrid:

Training stays in the cloud — where GPU clusters, large storage, and high-bandwidth interconnects are available. But training should happen in stable, geographically safe regions.

Inference moves to the edge or on-premises — where latency requirements, data sovereignty laws, or security concerns dictate. Quantised models served by vLLM on your own infrastructure give you full control.

Model synchronisation uses CI/CD pipelines to push updated models from training environments to inference endpoints. This is the same pattern as software deployment — just with model artifacts instead of code.

What You Should Do Now

Audit where your AI workloads physically run. Which region? Which provider? Which data centre? If you’re using a managed API (like calling an LLM provider), find out where they host their inference infrastructure.

Ensure model portability. Export your models to ONNX format. Use MLflow for versioning. If you can’t reproduce your model deployment from scratch in a new region within 24 hours, you have a portability problem.

For inference workloads, evaluate self-hosted options. vLLM on your own EC2 instance (in a stable region) gives you the same serving capability as a managed API, with full control over location and security.

Have a relocation playbook. If your AI provider’s region goes dark, can you serve models from elsewhere within 24 hours? Document the steps, test them, and keep them current.

Consider pgvector on India-hosted PostgreSQL for vector search workloads. It’s sovereign by default — your embeddings live on your infrastructure, in your jurisdiction, under your control.

At Innovatrix, we run AI automation pipelines on self-hosted n8n infrastructure with Anthropic API integrations, and our architecture for Pensiv — our cognitive continuity SaaS — uses pgvector on PostgreSQL precisely because portability and data sovereignty are non-negotiable for an AI-native product. We don’t just recommend sovereign AI infrastructure; we build on it.

What Happens Next

The AWS data centre attacks mark a permanent shift in how the world thinks about cloud infrastructure. The cloud was always physical. Now it’s geopolitical.

Here’s what’s coming:

Multi-cloud becomes the default, not the exception. Gartner’s projection of 75% multi-cloud adoption by 2026 was made before the attacks. Expect that number to accelerate. Single-provider architectures will be seen as reckless, not efficient.

Sovereign AI infrastructure becomes a national priority. India, Japan, France, and the EU are already investing billions. Expect every major economy to follow. Businesses that depend on foreign-hosted AI will face regulatory and competitive disadvantages.

Data centres get physical security upgrades. Air defence systems, reinforced construction, underground facilities — what was once the domain of military bunkers is becoming the standard for commercial data centres. The cost of cloud services will rise accordingly.

Edge computing accelerates from “nice-to-have” to “survival requirement.” The businesses that weather the next infrastructure attack will be the ones that don’t depend on a single geographic cluster.

Insurance and contracts get rewritten. War exclusion clauses, force majeure definitions, and SLA terms will all evolve to account for kinetic attacks on cloud infrastructure. If your contracts don’t address this, they’re already outdated.

You don’t need to hire five specialists tomorrow. But you need to start thinking in terms of these capabilities — resilience, sovereignty, distribution, security, and portability. The businesses that come out of this era strongest will be the ones who treated infrastructure as a strategic asset, not a commodity.

At Innovatrix Infotech, we help businesses build infrastructure that’s resilient by design — from multi-region cloud setups to self-hosted AI pipelines to compliance-aware deployments across India, UAE, UK, and beyond. If you’re unsure where your infrastructure stands, let’s talk.

Originally published at Innovatrix Infotech

I Built 23 Security Tools That AI Agents Can Use

I wanted a single interface where an AI agent could run WHOIS, pull SSL certs, enumerate subdomains, check CVEs, and query threat intel feeds — all from one prompt.

So I built 23 security tools as an MCP server. Any AI agent that speaks MCP can call them natively.

Here’s what I built, how to set it up, and what I learned.

Setup (2 minutes)

Let me start with the setup because it’s the simplest part.

Add this to your MCP client config:

{
  "mcpServers": {
    "contrast": {
      "command": "npx",
      "args": ["-y", "@anthropic-ai/mcp-remote", "https://api.contrastcyber.com/mcp/"]
    }
  }
}

Works with Claude Desktop, Cursor, Windsurf, Cline, VS Code — anything that speaks MCP.

No API key. No signup. 100 requests/hour free.

The 23 Tools

Recon — “What’s running on this domain?”

| Tool | What it does |
| --- | --- |
| domain_report | Full security report — DNS, WHOIS, SSL, subdomains, risk score |
| dns_lookup | A, AAAA, MX, NS, TXT, CNAME, SOA records |
| whois_lookup | Registrar, creation date, expiry, nameservers |
| ssl_check | Certificate chain, cipher suite, expiry, grade (A-F) |
| subdomain_enum | Brute-force + Certificate Transparency logs |
| tech_fingerprint | CMS, frameworks, CDN, analytics, server stack |
| scan_headers | Live HTTP security headers — CSP, HSTS, X-Frame-Options |
| email_mx | Mail provider, SPF/DMARC/DKIM validation |
| ip_lookup | PTR, open ports, hostnames, reputation |
| asn_lookup | AS number, holder, IP prefixes |

Real scenario: “Check if any of our subdomains have expiring SSL certs” — the agent calls subdomain_enum, loops through each result with ssl_check, and reports which ones expire within 30 days. Zero code.
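The local filtering step in that chain might look like the sketch below. The tool names come from the table above; the shape of the data the agent assembles from their results is an assumption for illustration.

```python
from datetime import datetime, timedelta, timezone

def expiring_soon(cert_expiries, days=30):
    """Given (subdomain, expiry_datetime) pairs -- e.g. assembled by an
    agent from subdomain_enum + ssl_check output -- return the ones
    whose certificates expire within `days`."""
    cutoff = datetime.now(timezone.utc) + timedelta(days=days)
    return [(sub, exp) for sub, exp in cert_expiries if exp <= cutoff]

now = datetime.now(timezone.utc)
results = [
    ("api.example.com", now + timedelta(days=12)),   # inside the window
    ("www.example.com", now + timedelta(days=200)),  # safely far out
]
assert [s for s, _ in expiring_soon(results)] == ["api.example.com"]
```

The point of MCP is that you never write this loop: the agent plans it from the prompt. But it helps to see how little glue logic the chaining actually requires.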

Vulnerability — “Is this CVE exploitable?”

| Tool | What it does |
| --- | --- |
| cve_lookup | CVE details, CVSS, EPSS score, KEV status |
| cve_search | Search by product, severity, or date range |
| exploit_lookup | Public exploits from GitHub Advisory + ExploitDB |

Real scenario: “Find all critical CVEs for Apache httpd from the last 6 months that have public exploits” — one sentence, three tool calls chained automatically.

Threat Intelligence — “Is this IOC malicious?”

| Tool | What it does |
| --- | --- |
| ioc_lookup | Auto-detect IP/domain/URL/hash → ThreatFox + URLhaus |
| hash_lookup | Malware hash reputation via MalwareBazaar |
| phishing_check | Known phishing/malware URL check |
| password_check | Breach check via HIBP (k-anonymity, password never sent) |
| email_disposable | Disposable/temporary email detection |

Real scenario: You get a suspicious URL in Slack. Paste it and ask “is this safe?” — the agent runs phishing_check + ioc_lookup and tells you if it’s a known threat.

Code Security — “Does my code have vulnerabilities?”

| Tool | What it does |
| --- | --- |
| check_secrets | Detect hardcoded AWS keys, tokens, passwords in source |
| check_injection | SQL injection, command injection, path traversal |
| check_headers | Validate security header configuration |

Real scenario: Before a PR merge, ask your agent to scan the diff for hardcoded secrets and injection vulnerabilities.

Phone & Email — “Is this contact legit?”

| Tool | What it does |
| --- | --- |
| phone_lookup | Validation, country, carrier, line type |

What It Looks Like

“Run a full security audit on example.com”

Domain: example.com
Risk Score: 32/100 (Low)

DNS: 6 records found
SSL: Grade A, expires 2027-01-15, TLS 1.3
Headers: 4/7 present (missing CSP, HSTS preload, Permissions-Policy)
Subdomains: 3 found
WHOIS: Registered 1995-08-14, ICANN
Tech: Akamai CDN, nginx

“Check if CVE-2024-3094 has public exploits”

CVE-2024-3094 (xz backdoor)
CVSS: 10.0 CRITICAL
EPSS: 0.947 (top 0.1%)
KEV: Yes — actively exploited
Exploits found: 3

“Is this password breached: hunter2”

EXPOSED in 17,043 breaches
Do NOT use this password.
(checked via k-anonymity — password was never transmitted)
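The k-anonymity scheme that check relies on is easy to see in code: hash the password locally with SHA-1, send only the first five hex characters to HIBP's range endpoint, and match the returned suffixes client-side. A minimal sketch:

```python
import hashlib

def hibp_query_parts(password: str):
    """Split a password's SHA-1 into the 5-char prefix sent to HIBP
    and the suffix matched locally. The service never sees the
    password, or even its full hash."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

prefix, suffix = hibp_query_parts("password")
assert prefix == "5BAA6"  # well-known SHA-1 prefix of "password"
# The client then fetches https://api.pwnedpasswords.com/range/5BAA6
# and checks whether `suffix` appears in the returned list.
```

Because hundreds of hashes share each five-character prefix, the server learns almost nothing about which password was checked.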

Why MCP?

ContrastAPI is also a REST API with a Node.js SDK. You can curl it from any language.

But MCP changes the workflow:

Without MCP: Call endpoint → parse JSON → decide next step → call another endpoint → parse again → format output.

With MCP: “Audit this domain.” Done.

The agent picks the right tools, chains them, and gives you a summary. You focus on decisions, not plumbing.

Architecture

  • FastAPI + official MCP Python SDK
  • 30 REST endpoints, 23 MCP tools (same backend)
  • 1,115 tests (912 API + 203 C scanner)
  • Domain scanner written in C — scores SSL, DNS, headers, email in under 2 seconds
  • All data from free, public sources — no paid feeds, no vendor lock-in

What I Learned

1. No API key = fastest adoption.
I removed the API key requirement and traffic jumped immediately. Zero friction wins. The free tier (100 req/hr) is generous enough that nobody has hit the limit yet.

2. MCP users are stickier.
MCP users make more requests per session than REST users. Once an agent has access to the tools, it chains them naturally — a single prompt can trigger 5-10 tool calls.

3. Get listed everywhere, early.
mcp.so, mcpservers.org, Smithery — these directories drive most of the discovery right now. The ecosystem is early and low-competition.

Limitations

Being transparent about what this isn’t:

  • Passive only — no port scanning, no active exploitation. This is OSINT and public data, not a pentest tool.
  • Rate limited — 100 req/hr free, 1000/hr on Pro ($19/mo). Enough for individual use, not bulk scanning.
  • Solo project — I’m one developer. Response times are fast, but I don’t have an SRE team on-call.
  • You don’t need API keys — we handle the integrations (Shodan, AbuseIPDB, ThreatFox, NVD, and more). No vendor accounts to set up on your end.

Try It

  • GitHub: github.com/UPinar/contrastapi
  • MCP setup: contrastcyber.com/mcp-setup
  • Web scanner: contrastcyber.com
  • API docs: api.contrastcyber.com

Free. Open source. No API key.

If you find it useful, a ⭐ on GitHub helps more than you think.

What security tools do you wish your AI agent could use? I’m always looking for what to build next.

Java Annotated Monthly – April 2026

It’s safe to say March was defined by one thing: Java 26. In this issue of Java Annotated Monthly, we’ve curated a rich selection of articles to help you get the full picture of the release. Marit van Dijk joins us as the featured guest author, bringing her expertise to help you navigate the changes with confidence. Alongside our Java 26 coverage, you’ll find our regular roundup of AI developments, Spring updates, Kotlin news, industry trends, and community reads that caught our eye.

Featured Content

Marit van Dijk

Marit van Dijk is a Java Champion and Developer Advocate at JetBrains with over 20 years of software development experience. She’s passionate about building great software with great people, and making developers’ lives easier.

Marit regularly presents at international conferences and shares her expertise through webinars, podcasts, blog posts, videos, and tutorials. She’s also a contributor to the book 97 Things Every Java Programmer Should Know (O’Reilly Media).

March brought a lot of interesting things for Java. First of all, there was the Java 26 release on March 17. You can read all about Java 26 in IntelliJ IDEA on the blog, and find more links on Java 26 in the Java sections below.

Also in March, JavaOne took place in Redwood Shores, USA. During the community keynote, our colleague Anton Arhipov talked about 25 years of IntelliJ IDEA. In case you missed it, we also did a Duke’s Corner podcast and a Foojay podcast on the same topic. And of course, the IntelliJ IDEA documentary was released this month. Also at JavaOne, we announced that Koog is coming to Java, if you want to try JetBrains’ Koog AI agent with Java instead of Kotlin.

IntelliJ IDEA 2026.1 was just released. Of course we have Java 26 support from day one, as well as improvements to the debugger for virtual threads, support for new Kotlin features, Spring Data and Spring Debugger features, new AI features, and more. You can read all about it on the blog or watch our release video.

The release of Java 26 also means that Piotr Przybył and I updated our talk, Learning modern Java the playful way, for Java 26. You can watch the recording from Voxxed Days Amsterdam, or catch us at multiple events around Europe. 

Java News

Check out all the Java news highlights in March: 

  • Java News Roundup 1, 2, 3, 4, 5
  • Java 26: What’s New?
  • HTTP Client Updates in Java 26
  • Java Performance Update: From JDK 21 to JDK 25
  • Quality Outreach Heads-up – JDK 27: Removal of ‘java.locale.useOldISOCodes’ System Property
  • Episode 51 “Unboxing Java 26 for Developers” 
  • Java 27 – Better Language, Better APIs, Better Runtime
  • Foojay Podcast #92: Java 26 Is Here: What’s New, What’s Gone, and Why It Matters in 2026
  • Java 26 in definitely UNDER 3 minutes
  • JDK 26 Security Enhancements

Java Tutorials and Tips

You can never have too many tips for getting more out of Java:

  • Java 26 for DevOps
  • Java 26 Is Here, And With It a Solid Foundation for the Future
  • Closed-world assumption in Java
  • JavaScript (No, Not That One): Modern Automation with Java
  • Redacting Sensitive Data from Java Flight Recorder Files
  • Foojay Podcast #91: 25 Years of IntelliJ IDEA: The IDE That Grew Up With Java
  • Vulnerable API usage: Is your Java code vulnerable?
  • Java 26 is boring, and that’s a good thing
  • Episode 49 “LazyConstants in JDK 26” 
  • Empty Should be Empty
  • Testing Elasticsearch. It just got simpler
  • A Bootiful Podcast: Cay Horstmann, legendary Java professor, author, lecturer 
  • Episode 50 “Towards Better Checked Exceptions” 
  • How is Leyden improving Java Performance? 1, 2, 3
  • Java Is Fast. Your Code Might Not Be.
  • Data Oriented Programming, Beyond Records 
  • Evolving the Java Language: An Inside Perspective
  • Hybrid search with Java: LangChain4j Elasticsearch integration
  • Secure Coding Guidelines for Java
  • Estimating value of pi (π) using Monte Carlo Simulation and Vector API
  • Javable: generate Java-friendly wrappers for Kotlin with KSP

Kotlin Corner

Stay sharp with the latest Kotlin news and practical tips:

  • Kotlin 2.3.20 Released 
  • Amper 0.10 – JDK Provisioning, a Maven Converter, Custom Compiler Plugins, and More 
  • The klibs.io source repository was made public.
  • Building a Deep Research Agent with Koog — Teaching Your Agent to Think in Phases 
  • Koog Comes to Java: The Enterprise AI Agent Framework From JetBrains
  • Introducing Tracy: The AI Observability Library for Kotlin 
  • KotlinConf’26 Speakers: In Conversation with Josh Long 

AI 

Plenty of AI reads this month. Pick what catches your eye:

  • Intelligent JVM Monitoring: Combining JDK Flight Recorder with AI
  • AI coding skills from the engineers who build the JVM ecosystem
  • Vibe Coding, But Production-Ready: A Specs-Driven Feedback Loop for AI-Assisted Development
  • Busting AI Myths and Embracing Realities in Privacy & Security
  • Shaping Jakarta Agentic AI Together – Watch the Open Conversation
  • how i automated my life with mcp servers
  • 10 things i hate about ai
  • Writing an agent skill 
  • Hacking AI – How to Survive the AI Uprising
  • Stop Fighting Your AI: Engineering Prompts That Actually Work
  • Four Patterns of AI Native Development
  • Interactive Rubber Ducking with GenAI 
  • The Oil and Water Moment in AI Architecture
  • Look Inside a Large Language Model to Become a Better Java Developer
  • A Senior Engineer Tries Vibe Coding
  • How We Built a Java AI Agent by Connecting the Dots the Ecosystem Already Had 

Languages, Frameworks, Libraries, and Technologies

Spring updates and more tech news, all in one place:

  • This Week in Spring 1, 2, 3, 4
  • Data Enrichment in MongoDB
  • Supercharge your JVM performance with Project Leyden and Spring Boot by Moritz Halbritter
  • A Typo Led to the Creation of Spring Cloud Contract • Marcin Grzejszczak & Jakub Pilimon • GOTO 2026
  • A Bootiful Podcast: Neo4j legend Jennifer Reif
  • A Bootiful Podcast: Spring Messaging Legend Soby Chacko
  • Blending Chat with Rich UIs with Spring AI and MCP Apps
  • Java Microservices (SCS) vs. Spring Modulith
  • Moving beyond Strings in Spring Data
  • Quarkus has great performance – and we have new evidence
  • Modeling One-to-Many Relationships in Java with MongoDB
  • Clean Architecture with Spring Boot and MongoDB

Conferences and Events

Pick your next events to attend:

  • Spring I/O – Barcelona, Spain, April 13–15; Come say hi at the JetBrains booth and join the community run! 
  • Java Day Istanbul – Istanbul, Türkiye, April 17–18; Anton Arhipov is a speaker.  
  • JCON EUROPE – Cologne, Germany, April 20–23; Marit van Dijk will talk about learning modern Java the playful way.
  • Great International Developer Summit – Bengaluru, India, April 21–24; Join Siva Katamreddy’s talk on Spring AI + MCP. 
  • Devoxx France – Paris, France, April 22–24; Check out the talks by Anton Arhipov and Marit van Dijk.  
  • Devoxx Greece – Athens, Greece, April 23–25; Marit van Dijk is a speaker. 
  • Voxxed Days Bucharest – Bucharest, Romania, April 28–29; And if you haven’t caught Marit van Dijk during this busy month of hers, here’s the last chance to hear her speak in April.

Culture and Community

Your go-to section to slow down and think about the industry, self-growth, and more:

  • Mindful Leadership in the Age of AI
  • Can we still make software that sparks joy?
  • Information Flow: The Hidden Driver of Engineering Culture
  • Beyond the Code: Hiring for Cultural Alignment
  • Build a Spaced Repetition Flashcard API with Spring Boot & MongoDB (Part 1)
  • Where Do Humans Fit in AI-Assisted Software Development?
  • Green IT: How to Reduce the Impact of AI on the Environment
  • Does Language Still Matter in the Age of AI? Yes — But the Tradeoff Has Changed
  • IntelliJ IDEA: The Documentary | An origin story 
  • The Software Architect Elevator 

And Finally…

Top picks from the IntelliJ IDEA blog:

  • What’s fixed in IntelliJ IDEA 2026.1
  • Java 26 in IntelliJ IDEA
  • IntelliJ IDEA’s New Kotlin Coroutine Inspections, Explained
  • Cursor Joined the ACP Registry and Is Now Live in Your JetBrains IDE
  • Sunsetting Code With Me
  • Koog Comes to Java: The Enterprise AI Agent Framework From JetBrains
  • AI-Assisted Java Application Development with Agent Skills
  • Core JavaScript and TypeScript Features Become Free in IntelliJ IDEA

That’s it for today! We’re always collecting ideas for the next Java Annotated Monthly – send us your suggestions via email or X by April 20. Don’t forget to check out our archive of past JAM issues for any articles you might have missed!