End-to-End Neural Network Patent Coverage for Autonomous Driving
Executive summary
Two granted US patents (12,001,207 and 12,530,030) contain 33 claims covering a camera-based autonomous driving system. Claim 13 of US 12,530,030 explicitly covers deep learning via network topology for converting navigation instructions into directional and acceleration values. The patent specification names Nvidia's DAVE-2 end-to-end driving network as a suitable implementation.
The claims are implementation-agnostic. They cover the functional pipeline of taking camera images and navigation instructions in, and producing vehicle control outputs, regardless of whether the underlying network uses CNNs, transformers, vision-language-action models, or any other architecture. As model architectures evolve, the claims remain relevant.
This page is for companies building vision-first end-to-end learned driving systems: Wayve, comma.ai, Black Sesame, Momenta, Chinese OEMs developing foundation driving models, and any team training a neural network to convert camera images into vehicle control commands.
What Claim 13 covers
Claim 13 of US 12,530,030 covers "deep learning via network topology for converting navigation instructions into directional and acceleration values."
In plain terms: a trained neural network takes navigation intent (where to go) and produces motor commands (steering and throttle). That is the end-to-end formulation. The patent abstract puts it this way: the control module converts "the navigation instruction(s) and the camera images into control values and acceleration values." Camera images in, control values out.
Claim 13 is a dependent claim. It depends on Claim 1, which describes the full safety-gated control loop: the system computes a safety value by comparing live camera images to stored reference images, and only executes navigation instructions autonomously if that safety value exceeds a predetermined threshold. The deep learning conversion in Claim 13 operates within this safety framework.
This dependency is a feature, not a limitation. Commercial deployment of end-to-end driving requires safety constraints. No production system runs raw neural network output straight to the actuators without checks. Tesla FSD, Wayve's LINGO, and comma.ai's openpilot all include safety layers around their learned driving models. Claim 13 covers end-to-end neural network driving done within a safety-gated architecture, which is the version that actually gets deployed.
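For illustration, the control-loop shape that Claims 1 and 13 describe can be sketched in a few lines of Python. The helper names (`compare`, `policy`), the dataclass, and the threshold value are placeholders for this sketch, not claim language:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ControlOutput:
    directional: float   # steering value
    acceleration: float  # acceleration value

SAFETY_THRESHOLD = 0.8  # the "predetermined threshold" -- value is illustrative

def control_step(camera_images, reference_images, nav_instruction,
                 compare, policy) -> Optional[ControlOutput]:
    """One iteration of the safety-gated loop.

    `compare` scores live camera images against stored reference images
    (higher = more confident match); `policy` stands in for the trained
    network of Claim 13, converting the instruction and images into
    directional and acceleration values.
    """
    safety_value = compare(camera_images, reference_images)
    if safety_value <= SAFETY_THRESHOLD:
        return None  # safety value not exceeded: do not execute autonomously
    return policy(nav_instruction, camera_images)
```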
The specification names Nvidia DAVE-2
The patent specification identifies "Nvidia Dave 2 network topology" as a suitable implementation for the system. DAVE-2 is Nvidia's 2016 end-to-end driving network, described in the paper "End to End Learning for Self-Driving Cars" by Bojarski et al. DAVE-2 takes raw camera images as input and directly outputs steering commands through a convolutional neural network, with no intermediate perception steps.
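For readers who want the concrete reference point, the DAVE-2 topology is small enough to sketch in PyTorch. Layer sizes and the 66x200 input crop follow the Bojarski et al. paper, not the patent specification:

```python
import torch
import torch.nn as nn

class Dave2(nn.Module):
    """DAVE-2-style steering network: raw camera frames in, a single
    steering command out, no explicit perception stages in between."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),   # 66x200 -> 31x98
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),  # -> 14x47
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),  # -> 5x22
            nn.Conv2d(48, 64, 3), nn.ReLU(),            # -> 3x20
            nn.Conv2d(64, 64, 3), nn.ReLU(),            # -> 1x18
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 1 * 18, 100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 1),  # steering command
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Expects an (N, 3, 66, 200) batch in [0, 255]; normalization
        # happens in-network, as in the paper.
        return self.head(self.features(x / 127.5 - 1.0))
```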
This matters because it shows the patent contemplated end-to-end neural network driving from the start. The specification was written when end-to-end approaches were still considered experimental. By naming DAVE-2 as a target architecture, the inventors made clear that the claimed system was designed to work with learned end-to-end driving models.
DAVE-2 is an ancestor of the architectures now used in production. Tesla FSD v12+ replaced its modular perception stack with an end-to-end neural network that maps camera images to driving commands. Wayve's foundation driving model does the same thing. The functional pipeline is unchanged from DAVE-2: cameras in, controls out, neural network in between.
Imitation learning and behavioral cloning
The clear-passage-determining module (Claims 7-11) learns from watching human operators. The specification describes a concrete example: the system receives the instruction "turn left at the next junction," but oncoming traffic has right of way. During training, the system watched a human operator wait in this situation rather than turn. It learned to associate that visual pattern (oncoming vehicles at a left turn) with "do not proceed."
This is behavioral cloning, a form of imitation learning from demonstrations. The system observes expert behavior and learns a policy that maps visual inputs to actions. The same training paradigm is used by Tesla FSD (which trains on millions of hours of human driving clips), Wayve (which trains its foundation model on human driving demonstrations), and comma.ai (which collects driving data from its user fleet for model training).
The patent's approach fits the standard behavioral cloning pipeline: a human demonstrates correct behavior, the system records the visual observations and corresponding actions, and a model learns to reproduce those decisions when it encounters similar situations. The left-turn example is just one instance. The same mechanism generalizes to any situation where the system learns driving behavior from recorded human demonstrations.
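A minimal behavioral cloning loop, sketched in PyTorch for illustration (hyperparameters and tensor shapes are assumptions, not taken from the patent):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def behavioral_cloning(model: nn.Module,
                       frames: torch.Tensor,          # (N, 3, H, W) recorded camera frames
                       expert_actions: torch.Tensor,  # (N, A) recorded human actions
                       epochs: int = 10) -> None:
    """Regress the expert's recorded actions from the matching frames."""
    loader = DataLoader(TensorDataset(frames, expert_actions),
                        batch_size=64, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for obs, act in loader:
            opt.zero_grad()
            loss = loss_fn(model(obs), act)  # penalize deviation from the human
            loss.backward()
            opt.step()
```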
Training data feedback loop
Claim 1 describes a built-in mechanism for accumulating and updating training data. After the system executes a navigation instruction, it captures new images and compares them against expected visual outcomes to confirm the vehicle reached its target. These new images get stored alongside the existing reference data, growing the training dataset over time.
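Sketched as code, the update-and-verify cycle looks roughly like this. All names and the match threshold are illustrative placeholders, not the patent's terminology:

```python
def execute_and_verify(instruction, vehicle, compare, dataset,
                       match_threshold: float = 0.9) -> bool:
    """Run an instruction, then fold verified new images back into
    the stored reference data, growing the training set."""
    vehicle.execute(instruction)
    new_images = vehicle.capture_images()
    expected = dataset.expected_outcome(instruction)
    if compare(new_images, expected) >= match_threshold:
        # Target confirmed: the new images join the reference data.
        dataset.add_references(instruction, new_images)
        return True
    return False  # target not confirmed; images are not promoted
```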
This is the data flywheel concept that companies like Tesla and Wayve rely on: more driving produces more data, which trains better models, which enables more driving. The patent claims this feedback loop as part of the core control method, not as a separate training process. Every drive is also a data collection run.
For companies building end-to-end driving systems, the data flywheel is often the primary competitive advantage. The patent's claim coverage over this update-and-verify cycle is relevant to any system that improves its driving model by accumulating new visual data from real-world operation.
Implementation-agnostic scope
The claims describe a functional pipeline: camera images and navigation instructions go in, safety assessment happens, control values come out. They do not specify a particular network architecture.
When the patent was filed, convolutional neural networks were the standard approach for vision-based driving (as in DAVE-2). Since then, the field has moved to transformers, vision-language models, diffusion policies, and various hybrid architectures. The claims cover the pipeline regardless of what sits in the middle.
This means the patent's relevance does not depend on any single architectural trend. Whether a company uses:
- Convolutional neural networks (the original end-to-end approach)
- Transformer-based vision models
- Vision-language-action models that take text instructions and camera images as input
- Diffusion-based policy models
- Any future architecture that converts camera images and navigation instructions into control values
...the functional pipeline described in the claims applies. The claims track the problem being solved (camera-to-control conversion with safety gating), not the specific neural network solving it.
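In code terms, the claims describe an interface rather than an implementation. A hypothetical Python Protocol makes the point: any model that satisfies this signature sits inside the same claimed pipeline, whatever its internals:

```python
from typing import Protocol, Sequence

Image = bytes  # stand-in for a camera frame in any representation

class DrivingPolicy(Protocol):
    """Camera images and a navigation instruction in; directional and
    acceleration values out. CNN, transformer, VLA model, or diffusion
    policy -- the functional boundary is the same."""

    def __call__(self, images: Sequence[Image],
                 instruction: str) -> tuple[float, float]:
        ...
```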
Who this applies to
This patent coverage is relevant to companies that:
- Train neural networks on camera data to produce driving commands (the core end-to-end pipeline)
- Use behavioral cloning or imitation learning from human driving demonstrations to train driving models
- Build vision foundation models for autonomous driving that take images in and produce control outputs
- Operate data flywheels where production driving generates training data for model improvement
- Develop safety-gated architectures that wrap neural network driving in confidence-based checks
Specific categories include end-to-end driving startups (Wayve, comma.ai, Ghost Autonomy), Chinese OEMs and suppliers building learned driving systems (Momenta, Black Sesame, Horizon Robotics, Huawei), traditional OEMs adopting neural network driving (GM, Ford, Hyundai/Aptiv), and chip companies whose reference designs include end-to-end driving pipelines (Nvidia, Qualcomm, Mobileye).
Licensing
The patent portfolio (US 12,001,207 and US 12,530,030) is available for licensing. Both patents expire March 5, 2041, leaving over 15 years of protection. The portfolio includes 33 claims across method, system, and computer program product categories, plus a granted European patent (EP3786756B1) in the same family.
Licensing is available in exclusive and non-exclusive arrangements, with structures including per-vehicle royalties, field-of-use exclusives, and portfolio licenses.
For companies building end-to-end neural network driving systems, licensing provides freedom-to-operate coverage for the camera-to-control pipeline and the behavioral cloning training approach, backed by claims that explicitly name deep learning as a covered implementation.
Explore licensing options or contact us to discuss terms.
Related resources
- Patent portfolio overview -- full technical details for both US patents
- Continuation patent (US 12,530,030) -- Claim 13 deep learning coverage and clear-passage module
- Tesla FSD competitor patent licensing -- camera-first patent strategy for companies competing with Tesla
- Camera vs LiDAR patent landscape -- sensor-based patent risk comparison for startups
- Patent licensing options -- licensing structures and terms
- About the inventors -- research background and technology development
- Contact us -- discuss licensing for your specific use case
Ready to License This Patent Portfolio?
Contact us to discuss licensing opportunities for your autonomous vehicle or drone navigation projects.
GET IN TOUCH