Law #6: Stability (Death by Implementation Details)

Tests should survive refactoring. If your test breaks when you rename a private function or change how you store data internally, it's testing the how, not the what.

TL;DR

Test what your code promises (the public API, the contract), not how it delivers (the implementation).
Don't mock your own functions. That's testing your test setup, not your code.
Refactoring shouldn't break tests. Only behavior changes should.

The Failure Mode

You refactor. You don't change any behavior. You run the tests. 40 tests fail.

None of them caught bugs. They just happened to be coupled to:

Internal method names
The order of function calls
How you store data internally (array vs map vs database table)
Variable names
The number of times a helper function gets called

You spend the next 3 hours "fixing" tests. You didn't fix bugs. You updated tests to match your refactor. This is waste.

Why It Kills You

You fear refactoring. You know changing internal code will break tests. So you don't refactor. Code rots.
You waste time. Instead of improving code, you're updating tests that caught zero bugs.
You lose trust. If tests fail for trivial reasons, you stop caring when they fail for real reasons.
New devs drown. Junior engineers see tests failing and don't know if they broke something or just need to "update the snapshots."

Examples: Test the Contract, Not the Implementation

Bad: Mocking Your Own Functions

// 😱 BAD: Testing that you call your own functions
test('creates user', () => {
  const userService = new UserService();
  const validateSpy = jest.spyOn(userService, 'validateEmail');
  const hashSpy = jest.spyOn(userService, 'hashPassword');
  
  userService.createUser('alice@example.com', 'password123');
  
  expect(validateSpy).toHaveBeenCalledWith('alice@example.com');
  expect(hashSpy).toHaveBeenCalledWith('password123');
});

// This test will break if you:
// - Rename validateEmail()
// - Move validation logic elsewhere
// - Combine validateEmail and hashPassword into one function
// - Inline validateEmail() into createUser()
// None of these are behavior changes!
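To make the failure concrete, here is a minimal Python sketch using the standard library's `unittest.mock`. `UserService`, `RefactoredUserService`, `spy_test`, and `outcome_test` are hypothetical names for illustration, and the unsalted SHA-256 "hash" is a placeholder, not a real password-hashing scheme. The refactored class inlines the helpers without changing behavior: the spy test fails on it, while the outcome test passes on both versions.

```python
import hashlib
from unittest import mock


class UserService:
    """Original version: validation and hashing live in helper methods."""

    def __init__(self):
        self._users = {}  # private storage; tests should not touch it

    def validate_email(self, email):
        if "@" not in email:
            raise ValueError("invalid email")

    def hash_password(self, password):
        # Unsalted SHA-256 keeps the sketch short; real code needs a salted KDF.
        return hashlib.sha256(password.encode()).hexdigest()

    def create_user(self, email, password):
        self.validate_email(email)
        self._users[email] = self.hash_password(password)

    def check_password(self, email, password):
        stored = self._users.get(email)
        return stored is not None and stored == self.hash_password(password)


class RefactoredUserService(UserService):
    """Pure refactor: the helpers are inlined, behavior is unchanged."""

    def create_user(self, email, password):
        if "@" not in email:
            raise ValueError("invalid email")
        self._users[email] = hashlib.sha256(password.encode()).hexdigest()


def spy_test(service_cls):
    """Implementation-coupled: passes only if validate_email() is called."""
    service = service_cls()
    with mock.patch.object(service, "validate_email", wraps=service.validate_email) as spy:
        service.create_user("alice@example.com", "password123")
    spy.assert_called_once_with("alice@example.com")


def outcome_test(service_cls):
    """Contract-based: checks only what callers can observe."""
    service = service_cls()
    service.create_user("alice@example.com", "password123")
    assert service.check_password("alice@example.com", "password123")
    assert not service.check_password("alice@example.com", "wrong")


for cls in (UserService, RefactoredUserService):
    outcome_test(cls)  # passes for both versions
    try:
        spy_test(cls)
        print(cls.__name__, "spy test: pass")
    except AssertionError:
        print(cls.__name__, "spy test: FAIL")
# UserService spy test: pass
# RefactoredUserService spy test: FAIL
```

Nothing a caller can observe changed between the two classes; only the spy noticed.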

Good: Test the Outcome

// ✅ GOOD: Test what actually matters
test('creates user with valid credentials', async () => {
  const result = await createUser('alice@example.com', 'password123');
  
  expect(result.success).toBe(true);
  expect(result.user.email).toBe('alice@example.com');
  
  // Verify the user exists and password is hashed
  const user = await findUserByEmail('alice@example.com');
  expect(user).toBeDefined();
  expect(user.password).not.toBe('password123'); // hashed
  expect(await verifyPassword(user.password, 'password123')).toBe(true);
});

// This test only breaks if behavior changes:
// - User isn't created
// - Password isn't hashed
// - Email is wrong
// Refactor all you want—this test doesn't care HOW you do it.
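The same outcome-based test translates directly to Python. In this sketch, `create_user`, `find_user_by_email`, `verify_password`, the `User` record, and the module-level dict standing in for a database are all hypothetical; hashing uses the standard library's `hashlib.pbkdf2_hmac` with a random salt, and the iteration count is an assumption you would tune.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass

_ITERATIONS = 100_000  # assumption: tune for your hardware and policy
_users = {}  # hypothetical in-memory store standing in for a database


@dataclass
class User:
    email: str
    password: str  # stores "salt$digest", never the plain text


def hash_password(password: str) -> str:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, _ITERATIONS)
    return salt.hex() + "$" + digest.hex()


def verify_password(stored: str, password: str) -> bool:
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), _ITERATIONS
    )
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate.hex(), digest_hex)


def create_user(email: str, password: str) -> dict:
    if "@" not in email:
        return {"success": False, "user": None}
    user = User(email=email, password=hash_password(password))
    _users[email] = user
    return {"success": True, "user": user}


def find_user_by_email(email: str):
    return _users.get(email)


def test_creates_user_with_valid_credentials():
    result = create_user("alice@example.com", "password123")
    assert result["success"] is True
    assert result["user"].email == "alice@example.com"

    user = find_user_by_email("alice@example.com")
    assert user is not None
    assert user.password != "password123"  # not stored in plain text
    assert verify_password(user.password, "password123")
    assert not verify_password(user.password, "wrong-password")


test_creates_user_with_valid_credentials()
```

The test never checks which hash function ran or which helpers were called, so swapping PBKDF2 for another KDF, or restructuring `create_user`, leaves it green as long as `verify_password` keeps its promise.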

Bad: Testing Internal Data Structures

# 😱 BAD: Coupled to internal implementation
def test_shopping_cart():
    cart = ShoppingCart()
    cart.add_item('apple', 1.50)
    
    # Testing internal storage format
    assert cart._items == {'apple': {'price': 1.50, 'quantity': 1}}
    assert len(cart._items) == 1
    assert cart._items['apple']['price'] == 1.50

# This breaks if you:
# - Change _items from dict to list
# - Rename _items to _products
# - Store items in a database
# - Change the internal data structure

Good: Test the Public API

# ✅ GOOD: Test through the public interface
def test_shopping_cart():
    cart = ShoppingCart()
    cart.add_item('apple', 1.50)
    
    assert cart.total() == 1.50
    assert cart.item_count() == 1
    assert cart.contains('apple')
    
    items = cart.get_items()
    assert len(items) == 1
    assert items[0].name == 'apple'
    assert items[0].price == 1.50

# Refactor the internal storage all you want.
# Use a dict, list, database, Redis—doesn't matter.
# This test only cares about the contract.
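One way to see this in practice: write the contract test once and run it against two carts with different internal layouts. The sketch below assumes a contract the original leaves open (adding the same product twice increments its quantity, and `item_count()` counts units); `Item`, `DictCart`, `ListCart`, and `check_cart_contract` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    price: float
    quantity: int = 1


class DictCart:
    """Stores items in a dict keyed by product name."""

    def __init__(self):
        self._items = {}

    def add_item(self, name, price):
        entry = self._items.setdefault(name, {"price": price, "quantity": 0})
        entry["quantity"] += 1

    def get_items(self):
        return [Item(n, e["price"], e["quantity"]) for n, e in self._items.items()]

    def total(self):
        return sum(i.price * i.quantity for i in self.get_items())

    def item_count(self):
        return sum(i.quantity for i in self.get_items())

    def contains(self, name):
        return name in self._items


class ListCart:
    """Same contract, but a list of Item records under a different name."""

    def __init__(self):
        self._products = []

    def add_item(self, name, price):
        for idx, item in enumerate(self._products):
            if item.name == name:
                self._products[idx] = Item(name, item.price, item.quantity + 1)
                return
        self._products.append(Item(name, price))

    def get_items(self):
        return list(self._products)

    def total(self):
        return sum(i.price * i.quantity for i in self._products)

    def item_count(self):
        return sum(i.quantity for i in self._products)

    def contains(self, name):
        return any(i.name == name for i in self._products)


def check_cart_contract(cart):
    """Only public methods; no reference to _items or _products."""
    cart.add_item("apple", 1.50)
    assert cart.total() == 1.50
    assert cart.item_count() == 1
    assert cart.contains("apple")
    items = cart.get_items()
    assert len(items) == 1
    assert items[0].name == "apple"
    assert items[0].price == 1.50

    cart.add_item("apple", 1.50)  # same product again: quantity goes up
    assert cart.item_count() == 2
    assert cart.total() == 3.00


for cart_cls in (DictCart, ListCart):
    check_cart_contract(cart_cls())  # passes for both layouts
```

The bad test from earlier would pass for `DictCart` and crash for `ListCart`, even though a shopper could not tell them apart.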

Bad: Testing Function Call Order

// 😱 BAD: Testing the exact sequence of operations
func TestProcessOrder(t *testing.T) {
    mockValidator := &MockValidator{}
    mockInventory := &MockInventory{}
    mockPayment := &MockPayment{}
    
    service := NewOrderService(mockValidator, mockInventory, mockPayment)
    order := Order{UserID: "user123", Total: 20.00}
    service.ProcessOrder(order)
    
    // Testing implementation details
    if mockValidator.CallCount != 1 {
        t.Error("validator should be called exactly once")
    }
    if !mockInventory.CalledBefore(mockPayment) {
        t.Error("inventory must be checked before payment")
    }
}

// This breaks if you:
// - Swap the order of the inventory check and the payment
// - Add caching so validator isn't always called
// - Parallelize inventory and payment checks

Good: Test the Behavior

// ✅ GOOD: Test outcomes, not implementation
func TestProcessOrder(t *testing.T) {
    inventory := NewInMemoryInventory()
    inventory.Add("widget", 10)
    
    payment := NewFakePaymentGateway()
    payment.AddBalance("user123", 100.00)
    
    service := NewOrderService(inventory, payment)
    
    result := service.ProcessOrder(Order{
        UserID: "user123",
        Items: []Item{{Name: "widget", Quantity: 2}},
        Total: 20.00,
    })
    
    assert.True(t, result.Success)
    assert.Equal(t, 8, inventory.Count("widget")) // inventory reduced
    assert.Equal(t, 80.00, payment.Balance("user123")) // payment charged
}

// Refactor all you want. Optimize the order of operations.
// This test doesn't care HOW you do it, just that:
// 1. The order succeeds
// 2. Inventory is updated
// 3. Payment is charged
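For readers working in Python, here is a rough port of the Go test. `InMemoryInventory`, `FakePaymentGateway`, `OrderService`, `Order`, and `OrderItem` are hypothetical stand-ins mirroring the Go names, and the rollback of reserved stock when payment fails is an assumption added so the fakes behave sensibly. Because the test only reads `inventory.count()` and `payment.balance()`, you could reorder the reserve and charge steps (with matching compensation logic) without touching it.

```python
from dataclasses import dataclass


class InMemoryInventory:
    def __init__(self):
        self._stock = {}

    def add(self, name, quantity):
        self._stock[name] = self._stock.get(name, 0) + quantity

    def count(self, name):
        return self._stock.get(name, 0)

    def reserve(self, name, quantity):
        if self.count(name) < quantity:
            raise ValueError(f"insufficient stock for {name}")
        self._stock[name] -= quantity

    def release(self, name, quantity):
        self.add(name, quantity)


class FakePaymentGateway:
    """Fake for an external payment provider: real logic, no network."""

    def __init__(self):
        self._balances = {}

    def add_balance(self, user_id, amount):
        self._balances[user_id] = self._balances.get(user_id, 0.0) + amount

    def balance(self, user_id):
        return self._balances.get(user_id, 0.0)

    def charge(self, user_id, amount):
        if self.balance(user_id) < amount:
            raise ValueError("insufficient funds")
        self._balances[user_id] -= amount


@dataclass
class OrderItem:
    name: str
    quantity: int


@dataclass
class Order:
    user_id: str
    items: list
    total: float


@dataclass
class OrderResult:
    success: bool
    error: str = ""


class OrderService:
    def __init__(self, inventory, payment):
        self._inventory = inventory
        self._payment = payment

    def process_order(self, order):
        reserved = []
        try:
            for item in order.items:
                self._inventory.reserve(item.name, item.quantity)
                reserved.append(item)
            self._payment.charge(order.user_id, order.total)
            return OrderResult(success=True)
        except ValueError as exc:
            for item in reserved:  # roll back partial reservations
                self._inventory.release(item.name, item.quantity)
            return OrderResult(success=False, error=str(exc))


def test_process_order():
    inventory = InMemoryInventory()
    inventory.add("widget", 10)
    payment = FakePaymentGateway()
    payment.add_balance("user123", 100.00)
    service = OrderService(inventory, payment)

    result = service.process_order(Order("user123", [OrderItem("widget", 2)], 20.00))

    assert result.success
    assert inventory.count("widget") == 8  # inventory reduced
    assert payment.balance("user123") == 80.00  # payment charged


test_process_order()
```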

When to Break This Rule

Rare Exceptions

Performance-critical code: If the how matters for performance (e.g., you must use a specific algorithm), document why and test the implementation. But be explicit about the tradeoff.
Security-critical code: If you must verify that secrets are zeroed from memory or specific crypto algorithms are used, test the implementation. But isolate these tests and mark them clearly.
Compliance requirements: If regulations mandate a specific mechanism or format (e.g., the structure of audit log entries), test it. But keep these tests separate from your main behavior tests.
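As a sketch of what an isolated, clearly marked implementation test might look like, the example below pins an audit-log format. The requirement itself (one JSON object per line, exactly these four keys, ISO-8601 UTC timestamps) is a hypothetical regulation invented for illustration, as are `format_audit_entry` and `REQUIRED_AUDIT_KEYS`. In pytest you could tag such tests with a custom marker (registered in your config) and run them separately; plain functions keep this sketch dependency-free.

```python
import json
from datetime import datetime, timezone

# Hypothetical regulatory requirement (assumption for this sketch): every audit
# record is one JSON object per line with exactly these keys, and the timestamp
# is ISO-8601 in UTC.
REQUIRED_AUDIT_KEYS = {"timestamp", "actor", "action", "resource"}


def format_audit_entry(actor, action, resource, when):
    record = {
        "timestamp": when.astimezone(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "resource": resource,
    }
    return json.dumps(record, sort_keys=True)


# --- COMPLIANCE TESTS ---------------------------------------------------
# These deliberately pin the format, because here the format IS the
# requirement. Keep them in their own module or group so nobody mistakes
# them for ordinary behavior tests.

def compliance_test_audit_entry_format():
    when = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
    line = format_audit_entry("alice", "DELETE", "invoice/42", when)

    assert "\n" not in line  # one record per line
    record = json.loads(line)
    assert set(record) == REQUIRED_AUDIT_KEYS
    assert record["timestamp"] == "2024-01-02T03:04:05+00:00"


compliance_test_audit_entry_format()
```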

Key Principles

✓ Do This

Test through public APIs
Assert on observable outcomes
Use real dependencies when possible
Mock external systems, not your own code
Test state changes, not how they're achieved

✗ Don't Do This

Spy on your own methods
Assert on private fields
Test function call counts/order
Mock everything "for speed"
Assert on internal data structures
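"Mock external systems, not your own code" might look like this in Python. `SmtpEmailClient`, `order_confirmation`, and `OrderNotifier` are hypothetical names: the email client stands in for a third-party service and is the only thing mocked, while the internal formatting helper runs for real. Asserting that exactly one email goes out is fine here, because a duplicate or missing email is observable behavior, not an internal detail.

```python
from unittest import mock


class SmtpEmailClient:
    """Stand-in for a third-party client that would talk to a real server."""

    def send(self, to, subject, body):
        raise RuntimeError("would open a network connection")


def order_confirmation(order_id, total):
    # Internal helper: deliberately NOT mocked in the test below.
    return f"Order {order_id} confirmed", f"Total charged: ${total:.2f}"


class OrderNotifier:
    def __init__(self, email_client):
        self._email = email_client

    def notify(self, customer_email, order_id, total):
        subject, body = order_confirmation(order_id, total)
        self._email.send(to=customer_email, subject=subject, body=body)


def test_notify_sends_confirmation():
    # Mock only the external boundary; spec= rejects attributes the real client lacks.
    email_client = mock.Mock(spec=SmtpEmailClient)
    OrderNotifier(email_client).notify("alice@example.com", "A-17", 20)

    email_client.send.assert_called_once_with(
        to="alice@example.com",
        subject="Order A-17 confirmed",
        body="Total charged: $20.00",
    )


test_notify_sends_confirmation()
```

If you later inline `order_confirmation` or rename it, the test keeps passing; it only fails if the email a customer would receive changes.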

The Bottom Line

Your test should break when behavior changes, not when code changes.

If you refactor and tests fail—but the feature still works—those tests were bad. They were testing how you did it, not what you promised.

"A good test is indifferent to refactoring. It only cares if you broke your promise."