The enterprise AI control that is still missing: code provenance

Enterprise AI governance keeps getting framed as a policy problem. Write acceptable-use rules. Turn on SSO. Add RBAC. Review risky PRs more carefully. That is all useful, but it still misses the one thing auditors, security teams, and incident responders actually need when AI-generated code reaches production: provenance.

Not “did someone use AI.” Not “did the vendor log usage.” Provenance.

When a critical bug lands in production, the question is not theoretical. Someone has to answer:

What was generated?
What was asked?
Which model produced it?
Which file did it land in?
Who accepted it?
Was it reviewed?
Can we trace that decision later?

Git blame does not answer those questions. Vendor audit logs usually do not either. In most enterprise setups, you end up with three separate blind spots:

A commit history that shows authorship, not generation.
A Copilot-style usage log that only covers one tool.
A pile of PR comments and inline code comments that depend on human discipline.

That is not an audit trail. It is a loose collection of hints.

The missing control is code provenance.

LineageLens is built around that gap. It records the prompt, the model, the tool, the target file, the inserted code, and whether the edit was accepted or rejected. It runs self-hosted, so the provenance stays inside your infrastructure instead of becoming another SaaS data trail.

This is also where most generic logging strategies break down. Datadog and Splunk are excellent when you already know what to instrument, but they are not purpose-built for AI provenance. If you want them to solve this problem, you have to build custom instrumentation, define your own schema, and keep that instrumentation working across multiple coding tools as their protocols change.
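To make "define your own schema" concrete, here is a minimal Python sketch of a single provenance record holding the fields listed above: prompt, model, tool, target file, inserted code, and the accept/reject decision. The class and function names, the `accepted_by` and `recorded_at` fields, and the SHA-256 content hash are assumptions for illustration only; this is not LineageLens's actual schema or API.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Iterable, List, Optional


class Decision(str, Enum):
    """Whether the developer kept or discarded the generated edit."""
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass(frozen=True)
class ProvenanceRecord:
    # Hypothetical field names: they mirror the questions an auditor asks,
    # not any real LineageLens schema.
    prompt: str                        # what was asked
    model: str                         # which model produced it
    tool: str                          # which coding tool issued the request
    target_file: str                   # which file it landed in
    inserted_code: str                 # what was generated
    decision: Decision                 # accepted or rejected
    accepted_by: Optional[str] = None  # who accepted it (assumed field)
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @property
    def code_sha256(self) -> str:
        # Content hash so a snippet found in the repo later can be matched back.
        return hashlib.sha256(self.inserted_code.encode("utf-8")).hexdigest()

    def to_json(self) -> str:
        # Serialize the record plus its content hash as a stable JSON string.
        data = asdict(self)
        data["decision"] = self.decision.value
        data["code_sha256"] = self.code_sha256
        return json.dumps(data, sort_keys=True)


def find_records(records: Iterable[ProvenanceRecord], snippet: str) -> List[ProvenanceRecord]:
    """Return records whose inserted code is byte-for-byte identical to snippet."""
    digest = hashlib.sha256(snippet.encode("utf-8")).hexdigest()
    return [r for r in records if r.code_sha256 == digest]


if __name__ == "__main__":
    rec = ProvenanceRecord(
        prompt="Add a retry loop around fetch_orders",
        model="example-model",
        tool="example-tool",
        target_file="src/orders.py",
        inserted_code="for attempt in range(3):\n    fetch_orders()\n",
        decision=Decision.ACCEPTED,
        accepted_by="alice",
    )
    print(rec.to_json())
    print(len(find_records([rec], rec.inserted_code)))
```

Exact hashing only matches code that has not been touched since it was inserted. Once someone reformats or edits the snippet, a lookup like this fails, which is one reason a hand-rolled schema on top of a log pipeline is harder to keep useful than it first looks.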

That is why I do not think the enterprise answer is “use your observability stack.” Observability tells you what happened at runtime. Provenance tells you how code entered the repository.

That distinction matters more as AI coding becomes normal.

If your team uses one tool, maybe you can tolerate a partial log. If your team uses Cursor in the morning, Claude Code for refactors, and Copilot in the editor, partial logging becomes a governance gap. The risk is not just productivity drift. It is that nobody can later say, with evidence, how the code got there.

LineageLens is not a static analysis scanner, and it is not a compliance certification product. It does not replace review, SAST, or policy enforcement. It does one narrower job: it records the provenance trail that those systems need but do not create.

The product ships in multiple deployment modes because different orgs need different operational weight. Base is local and offline. Lite is a single Docker container with SQLite. Plus adds PostgreSQL, semantic search, team visibility, and governance. Max adds graph lineage for teams that need ancestry across tools and sessions. The underlying question is the same at every tier: can you prove where AI-generated code came from?

For enterprise teams, I think this is the right way to frame the conversation:

If the code is not provenance-tagged, then your review process is partly guesswork.
If the prompt is missing, then your audit trail is incomplete.
If the record is not self-hosted, then your governance data lives somewhere else.
If you only track one vendor, then you are not tracking the team.
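Those four conditions are simple enough to turn into a checklist. The sketch below is a rough illustration under stated assumptions: `ProvenanceSetup` and `audit_gaps` are hypothetical names, not part of LineageLens, and the inputs are filled in by hand during a review. Each "if" becomes one check that produces a finding when it fails, and the vendor check is generalized to "any tool in use that is not tracked".

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ProvenanceSetup:
    # Hypothetical description of a team's current setup, filled in by hand
    # during a security review.
    tags_generated_code: bool   # is AI-generated code provenance-tagged?
    captures_prompts: bool      # is the prompt stored with each edit?
    self_hosted: bool           # does the record stay on your infrastructure?
    tracked_tools: Set[str] = field(default_factory=set)  # tools with provenance capture
    tools_in_use: Set[str] = field(default_factory=set)   # tools the team actually uses


def audit_gaps(setup: ProvenanceSetup) -> List[str]:
    """Apply the four checks and return one finding per failed check."""
    gaps: List[str] = []
    if not setup.tags_generated_code:
        gaps.append("Review is partly guesswork: generated code is not provenance-tagged.")
    if not setup.captures_prompts:
        gaps.append("Audit trail is incomplete: prompts are not recorded.")
    if not setup.self_hosted:
        gaps.append("Governance data lives outside your infrastructure.")
    # Any tool the team uses without provenance capture is a blind spot.
    untracked = setup.tools_in_use - setup.tracked_tools
    if untracked:
        gaps.append("Not tracking the team: no provenance for " + ", ".join(sorted(untracked)) + ".")
    return gaps


if __name__ == "__main__":
    setup = ProvenanceSetup(
        tags_generated_code=True,
        captures_prompts=False,
        self_hosted=True,
        tracked_tools={"Copilot"},
        tools_in_use={"Copilot", "Cursor", "Claude Code"},
    )
    for gap in audit_gaps(setup):
        print("-", gap)
```

Comparing the tools in use against the tools actually tracked is what catches the Cursor, Claude Code, and Copilot mix described earlier: a single-vendor log can pass the other checks and still fail this one.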

That is the argument I would want to make in a security review.

If you want the deeper technical breakdown, I wrote a longer companion post on Hashnode, and the product overview is at lineagelens-website.vercel.app.

Tags: ai, security, devops, opensource

What is your team using today to prove that AI-generated code is still traceable six months later?