Arrowjet is now a Cross-Database Sync Tool in Python (PG, MySQL, Redshift)

I’ve been building Arrowjet, an open-source Python library for fast bulk data movement. It started as a Redshift speed tool, but it now supports PostgreSQL, MySQL, and cross-database transfers.

The latest addition: stateful sync that keeps tables in sync across databases.

The problem

Moving data between databases usually means writing custom scripts per source/destination pair. Add incremental logic, schema drift handling, retry on failure, and you’re maintaining a mini-ETL framework.

What sync does

One function call:

import arrowjet
from arrowjet_pro import sync

pg = arrowjet.Engine(provider="postgresql")
my = arrowjet.Engine(provider="mysql")

# pg_conn and mysql_conn are DB-API connections opened beforehand
result = sync(
    source_engine=pg, source_conn=pg_conn,
    dest_engine=my, dest_conn=mysql_conn,
    table="orders",
    key_column="updated_at",  # incremental sync keyed on this column
)
# Sync SUCCESS: 12,000 rows (incremental)

It chooses full vs. incremental automatically based on state saved from the previous run. Full syncs truncate the destination first; row counts are validated afterward; failures are retried with backoff.
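For intuition, that decision plus the retry loop can be sketched in plain Python. This is not Arrowjet's actual implementation; `fetch_rows`, `write_rows`, and the `state` dict are illustrative stand-ins:

```python
import time

def sync_table(table, state, fetch_rows, write_rows, key_column=None,
               retries=2, backoff=1.5):
    """Illustrative auto full/incremental sync -- not Arrowjet's real code."""
    last_seen = state.get(table)  # high-water mark saved by the previous run
    incremental = key_column is not None and last_seen is not None

    for attempt in range(retries + 1):
        try:
            rows = fetch_rows(table, since=last_seen if incremental else None)
            write_rows(table, rows, truncate=not incremental)  # full sync truncates
            if key_column and rows:
                state[table] = max(r[key_column] for r in rows)  # advance the mark
            return {"mode": "incremental" if incremental else "full",
                    "rows": len(rows)}
        except Exception:
            if attempt == retries:
                raise
            time.sleep(backoff ** attempt)  # back off before retrying
```

The first run finds no saved state and does a full sync; every later run only fetches rows past the stored high-water mark.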

Schema-level sync

Sync an entire schema with filtering:

from arrowjet_pro import sync_schema

result = sync_schema(
    source_engine=pg, source_conn=pg_conn,
    dest_engine=my, dest_conn=mysql_conn,
    schema="public",
    exclude=["*_tmp", "*_backup"],
)
# Multi-table sync: ALL OK
#   Tables: 14/14 succeeded
#   Total rows: 2,340,000
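The `exclude` patterns are shell-style globs. A minimal sketch of that filtering using the stdlib's `fnmatch` (a stand-in, not Arrowjet's internals):

```python
from fnmatch import fnmatch

def filter_tables(tables, exclude=()):
    """Keep only tables matching none of the glob-style exclude patterns."""
    return [t for t in tables
            if not any(fnmatch(t, pat) for pat in exclude)]

filter_tables(["orders", "orders_tmp", "users", "users_backup"],
              exclude=["*_tmp", "*_backup"])
# → ["orders", "users"]
```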

YAML config for repeatable jobs

source:
  profile: my-postgres
destination:
  profile: my-mysql
defaults:
  mode: auto
  key_column: updated_at
  retry: 2
tables:
  - orders
  - users
  - name: products
    dest_table: product_catalog
    mode: full
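Per-table entries inherit from `defaults`, and explicit keys win (e.g. `products` overrides `mode`). Assuming the YAML above is parsed into plain dicts, the merge rule amounts to something like:

```python
def resolve_tables(config):
    """Expand `tables`: bare strings become dicts, `defaults` fill the gaps."""
    defaults = config.get("defaults", {})
    resolved = []
    for entry in config["tables"]:
        if isinstance(entry, str):              # bare name, e.g. "orders"
            entry = {"name": entry}
        resolved.append({**defaults, **entry})  # per-table keys win
    return resolved

# the `defaults` and `tables` sections of the YAML above, as parsed dicts
config = {
    "defaults": {"mode": "auto", "key_column": "updated_at", "retry": 2},
    "tables": [
        "orders",
        "users",
        {"name": "products", "dest_table": "product_catalog", "mode": "full"},
    ],
}
```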

CLI

arrowjet sync --table orders \
  --from-profile pg --to-profile mysql \
  --key-column updated_at

arrowjet sync --schema public \
  --from-profile pg --to-profile mysql \
  --exclude "*_tmp" --dry-run

Under the hood

All transfers use the fast path for each database:

  • PostgreSQL: COPY protocol (~850x faster than row-by-row INSERTs)
  • MySQL: LOAD DATA LOCAL INFILE (~6.6x faster)
  • Redshift: COPY/UNLOAD via S3

Arrow is the in-memory bridge between databases: no intermediate files, no per-row serialization overhead.
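The shape of a transfer is a streaming pipeline: batches read off the source's fast path flow straight to the destination writer. A toy sketch of that shape, with plain lists standing in for Arrow record batches:

```python
def read_batches(rows, batch_size=10_000):
    """Yield the source table in batches (stand-in for Arrow record batches
    streamed off the source's fast path)."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

def transfer(source_rows, write_batch, batch_size=10_000):
    """Pipe batches source -> destination; nothing is spooled to disk."""
    total = 0
    for batch in read_batches(source_rows, batch_size):
        write_batch(batch)  # e.g. COPY / LOAD DATA on the destination side
        total += len(batch)
    return total
```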

Packaging

  • pip install arrowjet – bulk read/write/transfer, CLI, 3 database providers
  • pip install arrowjet-pro – sync, drift detection, schema auto-fix, alerting, operation log

GitHub: https://github.com/arrowjet/arrowjet
PyPI: https://pypi.org/project/arrowjet/0.6.0/