I’ve been building an Android app that overlays aircraft labels on a live camera feed when you point your phone at the sky. You fetch a plane’s position from an ADS-B API, you have your own GPS location, and the goal is to draw a label at the correct pixel on screen.
The problem sounds straightforward until you try to implement it. There are four distinct coordinate spaces between the aircraft's GPS position and that pixel, and confusing any two of them produces wrong output: no exception, no warning. The math compiles fine either way.
I looked for existing resources but couldn't find one that covers the whole chain; everything I found treats one of these coordinate spaces in isolation rather than integrating them all. So here is everything I implemented to get it working.
This post walks through the entire pipeline from first principles: every formula, every sign rule, and a full numerical walkthrough using real captured values so you can verify your own implementation against known results.
The Pipeline at a Glance

The full transformation chain looks like this:
Geodetic (lat, lon, alt)
↓ Stage 1 — flat-earth approximation
ENU — East, North, Up (metres)
↓ Stage 2 — R⊤ from the Android rotation vector sensor
Device frame (dX, dY, dZ)
↓ Stage 3 — one sign flip
Camera frame (Cx, Cy, Cz)
↓ Stage 4 — perspective divide + FOV normalisation
Screen pixels (xpx, ypx)
Each stage is a separate coordinate system with its own axis convention. Confusing any two of them produces a result that looks plausible, points in roughly the right direction, and is still wrong.
Stage 1: Geodetic to ENU
What ENU is
ENU stands for East-North-Up. It is a local Cartesian frame centred on your position. The aircraft’s ENU vector is its displacement from you in metres along three orthogonal axes: East, North, and Up. The user is always at the origin.
This frame is called a local tangent plane because it approximates Earth’s surface as flat in the immediate vicinity of the observer. That approximation is valid for distances well under 100 km, which covers the entire ADS-B reception range for any ground station or aircraft receiver.
The flat-earth conversion
Given:
- User position: (φ_u, λ_u, h_u)
- Aircraft position: (φ_a, λ_a, h_a)
- Mean Earth radius: R_E = 6,371,000 m
- Altitude from the API in feet, converted to metres: h_m = h_ft × 0.3048
The ENU components are:
E = (λ_a − λ_u) × (π·R_E/180) × cos(π·φ_u/180)
N = (φ_a − φ_u) × (π·R_E/180)
U = h_a − h_u
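In code this is only a few lines. A minimal Python sketch of the conversion (the app itself is Kotlin; the function and variable names here are my own):

```python
import math

R_E = 6_371_000.0                       # mean Earth radius, metres
M_PER_DEG = math.pi * R_E / 180.0       # ~111,195 m per degree of latitude

def geodetic_to_enu(lat_u, lon_u, alt_u, lat_a, lon_a, alt_a):
    """Flat-earth (local tangent plane) conversion.

    Angles in degrees, altitudes in metres. Returns (E, N, U) in metres,
    with the user at the origin.
    """
    e = (lon_a - lon_u) * M_PER_DEG * math.cos(math.radians(lat_u))
    n = (lat_a - lat_u) * M_PER_DEG
    u = alt_a - alt_u
    return e, n, u
```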
Why the cosine appears in E but not N
This is the single most common implementation mistake. The cos(φ_u) term exists because meridians (lines of constant longitude) are not parallel — they converge toward the poles.


At the equator, one degree of longitude spans:
π·R_E/180 ≈ 111,195 m/deg
At latitude φ, the same one degree of longitude spans:
(π·R_E/180) × cos φ m/deg
Lines of latitude, by contrast, are parallel: one degree of latitude spans the same arc length regardless of where you are. That is why the N component has no cosine correction.
At φ_u = 24.86° N (the reference location used in the walkthrough below), cos(24.86°) ≈ 0.907, so a one-degree longitude difference corresponds to about 9.3% fewer metres than the same latitude difference. An implementation that omits this factor produces East-West positions that are correct at the equator and increasingly wrong as latitude increases, with no visible error at low latitudes to alert you.

Derived quantities
Once you have (E, N, U), bearing, elevation, and range follow directly:
d_horiz = √(E² + N²)
d_3D = √(E² + N² + U²)
β = atan2(E, N) × 180/π (mod 360)
ε = atan2(U, d_horiz) × 180/π
Note the argument order in atan2(E, N) for bearing: North is the reference direction, so it goes in the second position. Swapping these gives bearings rotated 90 degrees.
These values are not consumed by the projection math directly, but they are invaluable for debugging. If the projected screen position is wrong, comparing computed bearing and elevation against a known map position immediately tells you whether the error is in Stage 1 or in a later stage.
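These derived quantities are just as short in code. A Python sketch (names mine):

```python
import math

def bearing_elevation_range(e, n, u):
    """Bearing (degrees clockwise from North), elevation (degrees),
    horizontal range, and 3D range from an ENU vector in metres."""
    d_horiz = math.hypot(e, n)
    d_3d = math.sqrt(e * e + n * n + u * u)
    bearing = math.degrees(math.atan2(e, n)) % 360.0   # North goes in the SECOND slot
    elevation = math.degrees(math.atan2(u, d_horiz))
    return bearing, elevation, d_horiz, d_3d
```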
Accuracy
At 50 nautical miles (approximately 93 km, the practical ADS-B range limit), the relative error is around 0.11%, corresponding to roughly 100 metres of positional error. At the projection level, this is under 2 pixels on a 1080-wide screen with a 66-degree horizontal FOV. The approximation is entirely adequate for this application.
For distances above 100 km, the Haversine formula gives the exact great-circle distance:
a = sin²(Δφ/2) + cos φ_u · cos φ_a · sin²(Δλ/2)
d = 2·R_E · atan2(√a, √(1−a))
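A direct transcription of the Haversine formula, should you need it (Python sketch):

```python
import math

R_E = 6_371_000.0  # mean Earth radius, metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * R_E * math.atan2(math.sqrt(a), math.sqrt(1 - a))
```

At short range it agrees with the flat-earth result to within a few metres, which is a useful cross-check.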
Stage 2: ENU to Device Frame via the Rotation Matrix
What the rotation matrix does
Android’s TYPE_ROTATION_VECTOR sensor fuses the accelerometer, gyroscope, and magnetometer to output a quaternion representing the phone’s orientation relative to Earth. Calling SensorManager.getRotationMatrixFromVector(R, values) converts that quaternion to a
3×3 rotation matrix stored in a FloatArray(9) in row-major order:

    R = | R[0]  R[1]  R[2] |
        | R[3]  R[4]  R[5] |
        | R[6]  R[7]  R[8] |

The column interpretation is the most important thing to understand about this matrix. Each column of R is a unit vector describing where one device axis points in the ENU world frame:
- Column 0: world direction of the device's +Xd axis (right edge of phone)
- Column 1: world direction of the device's +Yd axis (top edge of phone)
- Column 2: world direction of the device's +Zd axis (out of screen, toward your face)
In other words, R transforms vectors from device frame into ENU world frame:
v_ENU = R · v_device
Going the other direction: R transpose
The projection engine needs the inverse transform. Given an ENU vector (the aircraft’s position relative to the user), we want its representation in the device frame so we can reason about whether it is in front of or behind the camera.
Because R is orthonormal (it is a rotation matrix, so R⁻¹ = R⊤), the inverse is just the transpose:
v_device = R⊤ · v_ENU
Expanding this with the row-major index layout of Android's FloatArray(9) — the rows of R⊤ are the columns of R:
dX = R[0]·E + R[3]·N + R[6]·U
dY = R[1]·E + R[4]·N + R[7]·U
dZ = R[2]·E + R[5]·N + R[8]·U
Magnitude preservation as a sanity check
Because R⊤ is orthonormal, it preserves vector magnitudes exactly:
|R⊤ · v_ENU| = |v_ENU| = d_3D
This means after computing (dX, dY, dZ), you can immediately check:
√(dX² + dY² + dZ²) ≟ d_3D
If these do not match (accounting for floating-point rounding), there is a matrix indexing error somewhere. This check costs essentially nothing and catches the most common implementation mistake immediately.
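The transpose application and the sanity check together, as a Python sketch of what the Kotlin sensor code does (r is Android's row-major 9-element layout; function names mine):

```python
import math

def world_to_device(r, e, n, u):
    """Apply R-transpose to an ENU vector.

    r is the row-major 9-element rotation matrix from
    SensorManager.getRotationMatrixFromVector(); indexing by columns
    of R gives the rows of R-transpose.
    """
    dx = r[0] * e + r[3] * n + r[6] * u   # column 0 of R
    dy = r[1] * e + r[4] * n + r[7] * u   # column 1 of R
    dz = r[2] * e + r[5] * n + r[8] * u   # column 2 of R
    return dx, dy, dz

def magnitude_check(dx, dy, dz, d_3d, rel_tol=0.01):
    """True when |R^T v| matches |v| to within a relative tolerance;
    a failure here almost always means a matrix indexing bug."""
    return abs(math.sqrt(dx * dx + dy * dy + dz * dz) - d_3d) <= rel_tol * d_3d
```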
Stage 3: Device Frame to Camera Frame
Axis conventions
Android’s sensor frame and the camera frame have different axis conventions for the Z direction:
| Axis | Device frame (+Zd) | Camera frame (+Cz) |
|---|---|---|
| Meaning | Out of screen, toward your face | Into the scene, away from the lens |
These point in opposite directions. The full mapping for a phone held upright in portrait mode is:
| Device axis | Physical meaning | Camera axis |
|---|---|---|
| +Xd | Right edge of phone | +Cx (camera right) |
| +Yd | Top edge of phone | +Cy (camera up) |
| +Zd | Out of screen | −Cz (negate) |
So the camera frame components are:
Cx = dX, Cy = dY, Cz = −dZ
The negation on Cz is the only required correction for portrait mode. No matrix, no remap call, just one sign flip.
The occlusion test
An aircraft is behind the camera if and only if Cz ≤ 0. At that point, perspective division would be undefined or would produce a nonsensical result on the wrong side of the image plane. Any point with Cz ≤ 0 must be classified as off-screen before proceeding.
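Stage 3 is almost embarrassingly small in code; a Python sketch (names mine):

```python
def device_to_camera(dx, dy, dz):
    """Portrait-mode device frame to camera frame: one sign flip on Z."""
    return dx, dy, -dz

def is_behind_camera(cz):
    """A point at or behind the image plane cannot be perspective-projected."""
    return cz <= 0.0
```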
Stage 4: Perspective Projection to Screen Pixels
The pinhole camera model

A point p = (Cx, Cy, Cz) in camera space projects onto the image plane via similar triangles. Setting the focal length to 1 (normalised):
nx = Cx / Cz, ny = Cy / Cz
This is the perspective divide. An object twice as far away (Cz doubled) produces half the projected displacement from centre. This is what gives AR overlays their correct sense of depth and scale.
Field of view normalisation
The perspective divide alone gives values whose scale depends on the camera's focal length. Normalising by the half-FOV tangent maps the visible frustum to [−1, +1] (Normalised Device Coordinates, NDC):
NDCx = nx / tan(θH/2) = Cx / (Cz · tan(θH/2))
NDCy = ny / tan(θV/2) = Cy / (Cz · tan(θV/2))
where θH and θV are the horizontal and vertical field of view angles. An aircraft is inside the visible frustum when:
Cz > 0 and |NDCx| ≤ 1 and |NDCy| ≤ 1
For a typical Android rear camera in portrait mode: θH ≈ 66°, θV ≈ 50°.
NDC to screen pixels
Given screen dimensions (W, H) in pixels:
xpx = (NDCx + 1) / 2 × W
ypx = (1 − NDCy) / 2 × H
The 1 − NDCy in the second formula is the Y-axis flip. Screen coordinates have y = 0 at the top-left corner, increasing downward. Camera +Cy points up. These are opposite conventions, and the formula corrects for it. If you write (NDCy + 1) / 2 × H instead, aircraft above the horizon appear below screen centre and vice versa.
The full projection formula
Substituting NDC back in:
xpx = (Cx / (Cz · tan(θH/2)) + 1) × W/2
ypx = (1 − Cy / (Cz · tan(θV/2))) × H/2
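Stage 4 end to end, assuming FOV angles in degrees and a top-left pixel origin (Python sketch, names mine):

```python
import math

def project_to_screen(cx, cy, cz, w, h, fov_h_deg=66.0, fov_v_deg=50.0):
    """Camera-frame point -> (x_px, y_px, visible).

    Caller must reject cz <= 0 (behind the camera) before trusting
    the pixel values; they are still returned for off-screen arrows.
    """
    ndc_x = cx / (cz * math.tan(math.radians(fov_h_deg) / 2))
    ndc_y = cy / (cz * math.tan(math.radians(fov_v_deg) / 2))
    x_px = (ndc_x + 1) / 2 * w
    y_px = (1 - ndc_y) / 2 * h          # Y flip: screen y grows downward
    visible = cz > 0 and abs(ndc_x) <= 1 and abs(ndc_y) <= 1
    return x_px, y_px, visible
```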
Equivalence to the camera intrinsics matrix
This formula is identical to the standard camera intrinsics model K. The intrinsic matrix for this projection is:

    K = | fx  0   px |
        | 0   fy  py |
        | 0   0   1  |

where:
fx = W / (2·tan(θH/2)), fy = H / (2·tan(θV/2)), px = W/2, py = H/2
With W = 1080 and θH = 66°: fx = 1080 / (2·tan 33°) ≈ 831.5 px.
The scalar formula and the matrix formulation are the same computation. Understanding the equivalence matters when you want to incorporate lens distortion correction later, which requires working in the intrinsics framework.
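To see the equivalence concretely, here is a Python sketch that computes the intrinsics and projects through them; it produces the same pixels as the scalar formula (the negative sign on the fy term absorbs the Y-axis flip):

```python
import math

def intrinsics(w, h, fov_h_deg, fov_v_deg):
    """fx, fy, px, py for the pinhole model described above."""
    fx = w / (2 * math.tan(math.radians(fov_h_deg) / 2))
    fy = h / (2 * math.tan(math.radians(fov_v_deg) / 2))
    return fx, fy, w / 2, h / 2

def project_with_k(cx, cy, cz, fx, fy, px, py):
    """Standard pinhole projection; the minus on fy handles the
    Y-down screen origin, matching the (1 - NDCy) formula."""
    return px + fx * cx / cz, py - fy * cy / cz
```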
Off-Screen Direction Classification
When an aircraft is not in the frustum, a good AR overlay shows an edge indicator pointing toward it. The decision tree is:
- Cz ≤ 0: aircraft is behind the camera. Show a BEHIND indicator or suppress.
- Cz > 0 and NDCx < −1: off the left edge.
- Cz > 0 and NDCx > +1: off the right edge.
- Cz > 0 and NDCy > +1: above the top edge.
- Cz > 0 and NDCy < −1: below the bottom edge.
For the indicator screen position (with edge padding p):
LEFT: x = p, y = clamp(ypx, p, H-p)
RIGHT: x = W-p, y = clamp(ypx, p, H-p)
UP: x = clamp(xpx, p, W-p), y = p
DOWN: x = clamp(xpx, p, W-p), y = H-p
The xpx, ypx values from the projection formula (extrapolated even when off-screen) give the correct rotation angle for the indicator arrow, so compute them regardless of frustum status.
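The decision tree and the clamped indicator position, sketched in Python (the padding constant and names are mine):

```python
def classify_offscreen(cz, ndc_x, ndc_y):
    """Which edge indicator to show for a point outside the frustum."""
    if cz <= 0:
        return "BEHIND"
    if ndc_x < -1:
        return "LEFT"
    if ndc_x > 1:
        return "RIGHT"
    if ndc_y > 1:
        return "UP"
    if ndc_y < -1:
        return "DOWN"
    return "VISIBLE"

def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def indicator_pos(direction, x_px, y_px, w, h, pad=24.0):
    """Edge-padded screen position for the off-screen arrow."""
    if direction == "LEFT":
        return pad, clamp(y_px, pad, h - pad)
    if direction == "RIGHT":
        return w - pad, clamp(y_px, pad, h - pad)
    if direction == "UP":
        return clamp(x_px, pad, w - pad), pad
    return clamp(x_px, pad, w - pad), h - pad
```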
Full Numerical Walkthrough
You can skip this boring part
Real captured values from a live debugging session.
Given
| Symbol | Description | Value |
|---|---|---|
| φ_u, λ_u | User position | 24.8600° N, 80.9813° E |
| φ_a, λ_a | Aircraft position | 24.9321° N, 81.0353° E |
| h_a | Aircraft altitude | 18,000 ft = 5,486.4 m |
| W × H | Screen | 1080 × 1997 px |
| θH, θV | Camera FOV | 66°, 50° |
| Azimuth / Pitch / Roll | Phone orientation | 33.0°, −4.3°, −6.5° |
Stage 1: Geodetic to ENU
π·R_E/180 = 111,195 m/deg
cos(24.86°) = 0.9073 ⟹ (π·R_E/180)·cos φ_u = 100,891 m/deg
E = (81.0353 − 80.9813) × 100,891 = 0.0540 × 100,891 = 5,448 m
N = (24.9321 − 24.8600) × 111,195 = 0.0721 × 111,195 = 8,017 m
U = 5,486.4 − 0 = 5,486 m
Derived quantities: bearing and elevation
d_horiz = √(5448² + 8017²) = √(29,680,704 + 64,272,289) ≈ 9,693 m
d_3D = √(9693² + 5486²) = √(93,954,249 + 30,096,196) ≈ 11,138 m
β = atan2(5448, 8017) × 180/π ≈ 34.2° (NNE)
ε = atan2(5486, 9693) × 180/π ≈ 29.5°
Stages 2–3: Rotation matrix and camera frame
The rotation matrix captured from getRotationMatrixFromVector() at the session:

    R = |  0.829   0.544  −0.135 |
        | −0.549   0.836  −0.001 |
        |  0.112   0.075   0.991 |

Applying R⊤ · v_ENU via column indexing:
dX = 5448(0.829) + 8017(−0.549) + 5486(0.112) = 4,516 − 4,401 + 614 = 729
dY = 5448(0.544) + 8017(0.836) + 5486(0.075) = 2,964 + 6,702 + 411 = 10,077
dZ = 5448(−0.135) + 8017(−0.001) + 5486(0.991) = −735 − 8 + 5,437 = 4,694
Applying the sign fix:
(Cx, Cy, Cz) = (dX, dY, −dZ) = (729, 10,077, −4,694)
Magnitude check:
√(729² + 10,077² + 4,694²) = √(531,441 + 101,545,929 + 22,033,636) ≈ 11,140 m
Against d_3D = 11,138 m. The 2 m difference is rounding from truncating R to 3 decimal places. ✓
Stage 4: Perspective projection
nx = Cx/Cz = 729/−4,694 ≈ −0.155
ny = Cy/Cz = 10,077/−4,694 ≈ −2.147
Both values are negative because Cz < 0 — the aircraft is behind the camera in this captured frame (the phone was not pointing at the aircraft). The off-screen indicator fires: Cz ≤ 0, Cx > 0, so the direction is RIGHT.
This is the correct result. At azimuth 33.0° with the aircraft at bearing 34.2°, the aircraft is only about 1° away horizontally. The pitch of −4.3° with elevation 29.5° means the phone is pointing far too low: the net vertical angle to the aircraft is about 33.8°, well above the 25° half-FOV for the vertical axis. The phone needs to tilt up considerably to bring the aircraft into frame.
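The whole walkthrough can be reproduced with a short end-to-end script. A Python sketch using the captured inputs above (no Android APIs involved, so it runs anywhere; names mine):

```python
import math

R_E = 6_371_000.0
M_PER_DEG = math.pi * R_E / 180.0

def pipeline(lat_u, lon_u, alt_u, lat_a, lon_a, alt_a, r, fov_h=66.0, fov_v=50.0):
    """Geodetic -> ENU -> device -> camera -> NDC, end to end."""
    # Stage 1: flat-earth ENU
    e = (lon_a - lon_u) * M_PER_DEG * math.cos(math.radians(lat_u))
    n = (lat_a - lat_u) * M_PER_DEG
    u = alt_a - alt_u
    # Stage 2: R-transpose (column indexing of the row-major matrix)
    dx = r[0] * e + r[3] * n + r[6] * u
    dy = r[1] * e + r[4] * n + r[7] * u
    dz = r[2] * e + r[5] * n + r[8] * u
    # Stage 3: one sign flip
    cx, cy, cz = dx, dy, -dz
    # Stage 4: perspective divide + FOV normalisation
    ndc_x = cx / (cz * math.tan(math.radians(fov_h) / 2))
    ndc_y = cy / (cz * math.tan(math.radians(fov_v) / 2))
    visible = cz > 0 and abs(ndc_x) <= 1 and abs(ndc_y) <= 1
    return (e, n, u), (cx, cy, cz), (ndc_x, ndc_y), visible

# Rotation matrix captured during the session, row-major
R_SESSION = [0.829, 0.544, -0.135,
             -0.549, 0.836, -0.001,
             0.112, 0.075, 0.991]
```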
Error Budget
Every approximation in the pipeline introduces a bounded error. These are independent, so the total worst-case screen error is roughly their sum.
| Source | Max positional error | Screen error at 50 nm | Mitigation |
|---|---|---|---|
| Flat-earth vs. Haversine | ~130 m at 100 km | < 2 px | Acceptable; use Haversine beyond 100 km |
| Spherical vs. WGS-84 ellipsoid | ~55 m at 100 km | < 1 px | Negligible |
| User altitude = 0 | h_user metres | 1–5 px near mountains | Read GPS altitude |
| Sensor latency (normal pan) | ~0.1° | < 3 px | Acceptable |
| Sensor latency (fast pan, 60°/s) | ~0.6° | ~16 px | Low-pass filter |
| Lens distortion (not corrected) | < 10 px at corners | < 10 px | Camera2 distortion coefficients |
The flat-earth error derivation: expanding arcsin(x) ≈ x + x³/6 and retaining the leading correction term gives relative error ≈ d²/(12·R_E²). At 93 km this is 0.11%.
Quick Reference
ENU (flat-earth)
E = Δλ × (π·RE/180) × cos(φ_user)
N = Δφ × (π·RE/180)
U = h_aircraft - h_user
Distance and bearing from ENU
d_3D = sqrt(E² + N² + U²)
β = atan2(E, N) → [0, 360)
ε = atan2(U, sqrt(E² + N²))
World to device (R transpose — column indexing)
dX = R[0]·E + R[3]·N + R[6]·U
dY = R[1]·E + R[4]·N + R[7]·U
dZ = R[2]·E + R[5]·N + R[8]·U
Device to camera (portrait mode)
Cx = dX, Cy = dY, Cz = -dZ
Perspective projection
NDCx = Cx / (Cz · tan(θH/2))
NDCy = Cy / (Cz · tan(θV/2))
Screen pixels (origin top-left)
xpx = (NDCx + 1) / 2 × W
ypx = (1 - NDCy) / 2 × H
Visible when: Cz > 0 AND |NDCx| ≤ 1 AND |NDCy| ≤ 1
Default FOV (portrait, typical rear camera)
θH ≈ 66°, θV ≈ 50°
I've also put together a full math reference document: Math Reference Document
Questions about any stage of the pipeline or anything else are welcome.
Hope this was helpful.