A research‑driven exploration of how learning emerges from optimization, signal flow, and dynamical systems, implemented entirely from first principles using NumPy.
This project demonstrates mathematical maturity, engineering clarity, and the ability to reason about learning at a mechanistic level.
Modern ML frameworks hide the mechanics of learning. This repository rebuilds them from scratch to answer foundational questions about how learning works, starting with the core machinery:
- Construct a minimal Tensor class
- Build computation graphs
- Implement reverse‑mode autodiff
- Explore distributed credit assignment through population‑coded activations
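A minimal sketch of what such a Tensor and autograd engine might look like (the names and API here are illustrative, not the repo's exact interface):

```python
import numpy as np

class Tensor:
    """Minimal tensor: wraps a NumPy array and records how it was produced."""
    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=float)
        self.grad = np.zeros_like(self.data)
        self._parents = parents          # nodes feeding into this one
        self._backward = lambda: None    # propagates self.grad to parents

    def __add__(self, other):
        out = Tensor(self.data + other.data, parents=(self, other))
        def backward():
            self.grad += out.grad        # d(a+b)/da = 1
            other.grad += out.grad       # d(a+b)/db = 1
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Tensor(self.data * other.data, parents=(self, other))
        def backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = backward
        return out

    def backward(self):
        """Reverse-mode autodiff: topological sort, then chain rule in reverse."""
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for p in node._parents:
                    visit(p)
                order.append(node)
        visit(self)
        self.grad = np.ones_like(self.data)      # seed: dL/dL = 1
        for node in reversed(order):
            node._backward()

x, y = Tensor(2.0), Tensor(3.0)
z = x * y + x      # builds the computation graph
z.backward()
print(x.grad)      # dz/dx = y + 1 = 4.0
```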
- Learning in neural networks is not explained by gradients alone. It is governed by the dynamics of the update rule, which turns optimization into a discrete-time dynamical system. Correct gradients do not guarantee learning; stability, symmetry, and step size critically shape outcomes.
- We learned that gradients can be locally correct yet still fail to produce learning because optimization unfolds over time through repeated updates. Gradient descent defines a state evolution process, not a single optimization step. If the update dynamics are unstable, oscillatory, or poorly conditioned, learning fails despite valid gradients.
Key takeaway: Learning failure is often a dynamical failure, not a gradient computation error.
- Gradient descent is a difference equation:

  $$\theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_t)$$
This recursive rule evolves parameters over discrete time steps, approximating continuous gradient flow only when the step size is sufficiently small.
Key takeaway: Training trajectories must be analyzed using tools from dynamical systems, not just optimization theory.
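A few lines make the state-evolution view concrete (an illustrative sketch, not the repo's exact code):

```python
import numpy as np

def gd_trajectory(grad, theta0, eta, steps):
    """Iterate theta_{t+1} = theta_t - eta * grad(theta_t), recording each state."""
    thetas = [np.asarray(theta0, dtype=float)]
    for _ in range(steps):
        thetas.append(thetas[-1] - eta * grad(thetas[-1]))
    return np.array(thetas)

# L(theta) = 0.5 * theta^2, so the trajectory is theta_t = (1 - eta)^t * theta_0.
traj = gd_trajectory(lambda th: th, theta0=1.0, eta=0.1, steps=50)
print(traj[:3], traj[-1])   # geometric decay toward the minimum at 0
```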
- The learning rate determines whether updates:
- converge smoothly
- oscillate
- diverge
- explode
Large step sizes break the approximation to continuous gradient flow and can push the system into unstable regimes.
Key takeaway: Step size defines the stability regime of learning dynamics, not merely training speed.
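On a 1D quadratic all of these regimes are easy to reproduce (a sketch; the curvature and step sizes are arbitrary choices):

```python
import numpy as np

# Quadratic loss L(theta) = 0.5 * lam * theta^2 with curvature lam.
# The update theta <- (1 - eta*lam) * theta converges iff |1 - eta*lam| < 1,
# i.e. iff eta < 2/lam. Oscillation begins once eta > 1/lam.
lam, theta0, steps = 4.0, 1.0, 30
for eta in [0.1, 0.4, 0.49, 0.6]:   # smooth, oscillating, marginal, divergent
    theta = theta0
    for _ in range(steps):
        theta -= eta * lam * theta
    print(f"eta={eta:.2f}  |1-eta*lam|={abs(1 - eta*lam):.2f}  final theta={theta:.3g}")
```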
- Instability has a precise mathematical meaning: small perturbations grow over time. Formally, this happens when the Jacobian of the update map has an eigenvalue with magnitude greater than 1 (magnitude exactly 1 is the marginal case).
Key takeaway: Instability is diagnosable and predictable using linearized dynamics.
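For GD on a quadratic the update map is linear, so this check is a direct eigenvalue computation (a sketch with arbitrary curvature values):

```python
import numpy as np

# For L(theta) = 0.5 * theta^T H theta, the GD update map is
# f(theta) = theta - eta * H @ theta, with constant Jacobian J = I - eta * H.
H = np.diag([4.0, 0.5])                 # curvatures along the two axes
for eta in [0.45, 0.55]:                # threshold: 2 / lambda_max = 0.5
    J = np.eye(2) - eta * H
    rho = np.max(np.abs(np.linalg.eigvals(J)))   # spectral radius of the map
    print(f"eta={eta}: spectral radius {rho:.2f} ->",
          "stable" if rho < 1 else "unstable")
```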
- When units are identically initialized, governed by the same update rules, and architecturally interchangeable, the system becomes permutation-equivariant, causing units to evolve identically.
Key takeaway: Symmetry is a structural property of the model and its dynamics, not an accident.
- Symmetry breaks when gradients differ across units. This can arise from:
- random initialization
- noise
- architectural bottlenecks
- unstable Jacobian modes
Once symmetry breaks, units specialize and learning becomes expressive.
Key takeaway: Learning often requires symmetry breaking.
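Both sides of this story, permutation symmetry and its breaking, show up in a tiny two-unit network (an illustrative NumPy sketch; the architecture and values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

def train(W1, W2, eta=0.5, steps=2000):
    """Full-batch GD on a 2-hidden-unit tanh network with linear readout."""
    for _ in range(steps):
        h = np.tanh(X @ W1)                        # hidden activations (4, 2)
        err = (h @ W2) - y                         # output error (4, 1)
        dW2 = h.T @ err
        dW1 = X.T @ ((err @ W2.T) * (1 - h**2))    # backprop through tanh
        W1 -= eta * dW1 / len(X)
        W2 -= eta * dW2 / len(X)
    return W1

# Identical columns: permutation symmetry keeps the two units identical forever.
W_sym = train(np.ones((2, 2)) * 0.3, np.ones((2, 1)) * 0.3)
# Random init breaks the symmetry, letting the units specialize.
W_rnd = train(rng.normal(0, 0.3, (2, 2)), rng.normal(0, 0.3, (2, 1)))
print("symmetric init, unit difference:", np.abs(W_sym[:, 0] - W_sym[:, 1]).max())
print("random init, unit difference:   ", np.abs(W_rnd[:, 0] - W_rnd[:, 1]).max())
```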
- Parameters are state variables of the learning dynamical system.
- Tensors are intermediate values used for computation.
This separation clarifies why only parameters accumulate history and evolve across time.
Key takeaway: Learning dynamics act on parameters, not on transient computational values.
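A toy training loop makes the distinction concrete (an illustrative sketch, not the repo's code):

```python
import numpy as np

# Parameter: persistent state of the learning dynamical system.
W = np.array([[0.5, -0.3], [0.2, 0.1]])

def step(x, y, eta=0.1):
    global W
    h = np.tanh(x @ W)                          # transient tensor, rebuilt each call
    grad = np.outer(x, (h - y) * (1 - h**2))    # dL/dW for L = 0.5 * ||h - y||^2
    W -= eta * grad                             # only W evolves across time steps

for _ in range(100):
    step(np.ones(2), np.zeros(2))
print(W)   # W carries the accumulated history of every update
```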
- Training failures expose:
- architectural limitations
- unstable regimes
- symmetry traps
- poor dynamical conditioning
Rather than being discarded, failures provide diagnostic insight into why learning is impossible under certain conditions.
Key takeaway: Failure reveals the structure and constraints of the learning system.
By the end of Week 2, we shifted from viewing training as “gradient optimization” to understanding it as dynamical system evolution. This reframing explains instability, symmetry, failure modes, and the central role of step size—laying the foundation for deeper analysis of learning dynamics in neural and biologically inspired systems.
- Visualize 1D/2D loss landscapes
- Examine curvature, ridges, and basins
- Implement momentum, RMSProp, Adam (update rules sketched after this list)
- Show how geometry influences optimizer trajectories
- Treat learning as an ODE
- Simulate continuous‑time neural dynamics
- Connect optimization principles to biological learning
- Explore population‑based representations as dynamical systems
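As a preview of this roadmap, the optimizer update rules are only a few lines each (a sketch following the standard textbook formulas; the defaults are conventional, not the repo's final API):

```python
import numpy as np

def momentum_step(theta, grad, v, eta=0.01, beta=0.9):
    """Heavy-ball momentum: velocity is a decaying accumulation of gradients."""
    v = beta * v - eta * grad
    return theta + v, v

def rmsprop_step(theta, grad, s, eta=0.01, beta=0.9, eps=1e-8):
    """RMSProp: per-coordinate step sizes from a running squared-gradient average."""
    s = beta * s + (1 - beta) * grad**2
    return theta - eta * grad / (np.sqrt(s) + eps), s

def adam_step(theta, grad, m, s, t, eta=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: momentum plus RMSProp-style scaling, with bias correction."""
    m = b1 * m + (1 - b1) * grad
    s = b2 * s + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)          # correct the zero-initialization bias
    s_hat = s / (1 - b2**t)
    return theta - eta * m_hat / (np.sqrt(s_hat) + eps), m, s
```

And gradient flow, the continuous-time limit of GD, can be simulated with a forward-Euler integrator (again an illustrative sketch):

```python
import numpy as np

def gradient_flow(grad, theta0, T, dt=1e-3):
    """Integrate d(theta)/dt = -grad L(theta); GD is this scheme with dt = eta."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(int(T / dt)):
        theta = theta - dt * grad(theta)   # one Euler step of the ODE
    return theta

# L(theta) = 2 * theta^2 has exact flow theta(t) = theta_0 * exp(-4t).
print(gradient_flow(lambda th: 4.0 * th, 1.0, T=1.0))  # ~exp(-4) ≈ 0.018
```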
This project shows the ability to:
- derive learning rules mathematically
- reason about stability and convergence
- connect discrete optimization to continuous dynamics
- visualize and interpret loss geometry
- design clean, research‑grade software
- communicate complex ideas clearly and rigorously
These are core competencies for graduate‑level ML, robotics, and NeuroAI research.
- Minimal Tensor class
- Reverse‑mode autograd engine
- Backpropagation from first principles
- Gradient descent + momentum‑based optimizers
- Loss landscape visualization tools
- Continuous‑time gradient flow simulators
- Simple ODE‑based neural dynamics
Experiments:
- XOR from scratch
- Vanishing & exploding gradients
- Stability of training
- Gradient flow vs. gradient descent
- Population‑coded activations & credit assignment
Each experiment is designed to reveal a specific phenomenon in learning dynamics.
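As one illustration, the vanishing/exploding-gradient experiment fits in a few lines (a sketch; the depth, width, and init scales here are arbitrary choices, not the repo's settings):

```python
import numpy as np

# Backprop through L tanh layers multiplies the error signal by L Jacobians,
# so its norm shrinks or grows geometrically with depth.
rng = np.random.default_rng(0)
L, d = 50, 64
for scale in [0.5, 1.0, 2.0]:                      # weight init scales to compare
    Ws = [rng.normal(0, scale / np.sqrt(d), (d, d)) for _ in range(L)]
    acts, x = [], rng.normal(size=d)
    for W in Ws:                                   # forward pass, storing activations
        x = np.tanh(W @ x)
        acts.append(x)
    g = np.ones(d)                                 # seed gradient at the output
    for W, h in zip(reversed(Ws), reversed(acts)): # exact backward pass
        g = W.T @ (g * (1 - h**2))
    print(f"init scale {scale}: |grad at input| = {np.linalg.norm(g):.3e}")
```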
Foundational questions driving the project:

- Can I derive learning rules?
- Can I reason about stability?
- Can I connect discrete optimization to continuous dynamics?
- How does credit flow? → Autodiff + population signals
- How does learning unfold in time? → GD + stability

Each module addresses:

- Mathematical foundations
- Design decisions
- Key experiments
- What does this teach about learning systems?
- How does this connect to robotics and NeuroAI?
- Backprop from scratch
- Autograd engine (minimal)
- Gradient descent variants
- Visualization of loss landscapes
- Simple ODE-based neural dynamics
- NumPy only
- No PyTorch here
- A. Minimal Tensor + Autograd Engine
- B. Backpropagation From First Principles
- C. Optimization Algorithms
- D. Loss Landscapes & Geometry
- E. Continuous-Time Gradient Flow
1. XOR From Scratch
2. Vanishing/Exploding Gradients
3. Stability of Training
4. Gradient Flow vs GD