Skip to content

Conversation

@samliok
Copy link
Collaborator

@samliok samliok commented Jan 14, 2026

No description provided.

@samliok samliok marked this pull request as ready for review January 21, 2026 20:05
@samliok samliok linked an issue Jan 22, 2026 that may be closed by this pull request
@yacovm yacovm mentioned this pull request Feb 2, 2026

n.BasicInMemoryNetwork.StartInstances()
go n.UpdateTime(n.config.TimeUpdateFrequency, n.config.TimeUpdateAmount, stop)
go n.startTransactionIssuance(stop)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this makes the test reproducible.

We are crashing nodes in parallel to sending transactions to them and in parallel to advancing their time.

We should unify the three goroutines to have a single reproducable scenario.

var crashedNodes []uint64
crashPhase := true

ticker := time.NewTicker(n.config.CrashInterval)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is the way we should be approaching this.

We're doing uncontrolled crash and restart every 800ms.

Instead, we should make a random "plan" of which nodes will be crashing / restarted at a given time interval, and how many transactions should be issued at that time interval.

Then, we don't change anything until these transactions are committed.


# tmp folder

/tmp No newline at end of file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use of adding /tmp?

TimeUpdateFrequency: 100 * time.Millisecond,
TimeUpdateAmount: 500 * time.Millisecond,
CrashInterval: 800 * time.Millisecond,
LogDirectory: "tmp",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to create a temporary folder and if the test succeeds, delete the logs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way you don't need to delete the directory at the beginning of the test

@@ -0,0 +1,150 @@
package random_network
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license header

@@ -0,0 +1,140 @@
package random_network
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license header

@@ -0,0 +1,59 @@
package random_network
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license header

@@ -0,0 +1,4 @@
// Copyright (C) 2019-2025, Ava Labs, Inc. All rights reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this file?

}

// Remove each file (but not subdirectories)
for _, entry := range entries {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we just remove everything?

os.Remove() should remove the top directory logDir and it should be enough, no?

}

func (n *Network) StartInstances() {
panic("Call Run() Instead")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method isn't used. Why is it needed?

}

availableNodes := n.getAvailableNodesLocked(crashedNodes)
if len(availableNodes) == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can we have 0 non crashed nodes if we only crash f or less?

numToCrash := n.randomness.Intn(maxAdditionalCrashes) + 1
numToCrash = min(numToCrash, len(availableNodes))

n.randomness.Shuffle(len(availableNodes), func(i, j int) {
Copy link
Collaborator

@yacovm yacovm Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using n.randomness.Perm will fit here better, because you can abort early, but up to you

}
}

func (m *Mempool) PackBlock(ctx context.Context, maxTxs int) []*TX {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only call PackBlock from BuildBlock. I think we should move the call to WaitForPendingTxs to BuildBlock as it's more natural there.

}

func (m *Mempool) isTxInChain(txID txID, parentDigest simplex.Digest) bool {
block, exists := m.verifiedButNotAcceptedTXs[parentDigest]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this works for a case where we have a leader failover that creates a sibling block.

Suppose block a is notarized and then a block b is built on top of it and broadcast, and verified by a node but then the majority notarize the empty block.

In the next round, block b' is proposed on top of block a.
We need a way to rollback the verified but not accepted map for the parent a because both blocks b and b' are built on top of it.

One way of doing it, is move this book-keeping to the blocks themselves.
Specifically, have the instances of the blocks maintain pointers to blocks they are built upon, and then just have the transaction list inside a block a map, where when we encode the bytes of the block, we do it across some order that is set at the time we build the block.

// Create a copy of the message to avoid mutating shared state in the in-memory network
msgCopy := *msg

switch {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we extract this switch into its own function, or make some kind of copy() function for the *simplex.Message ?

@yacovm
Copy link
Collaborator

yacovm commented Feb 2, 2026

Can we rebase on top of main?

Base automatically changed from refactor-testutil to main February 3, 2026 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create a testing framework that generates random test cases

4 participants