-
Notifications
You must be signed in to change notification settings - Fork 4
Fuzz Network Test Framework #326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
* fix: test hang * test: restore double verification fail
|
|
||
| n.BasicInMemoryNetwork.StartInstances() | ||
| go n.UpdateTime(n.config.TimeUpdateFrequency, n.config.TimeUpdateAmount, stop) | ||
| go n.startTransactionIssuance(stop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this makes the test reproducible.
We are crashing nodes in parallel to sending transactions to them and in parallel to advancing their time.
We should unify the three goroutines to have a single reproducable scenario.
| var crashedNodes []uint64 | ||
| crashPhase := true | ||
|
|
||
| ticker := time.NewTicker(n.config.CrashInterval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this is the way we should be approaching this.
We're doing uncontrolled crash and restart every 800ms.
Instead, we should make a random "plan" of which nodes will be crashing / restarted at a given time interval, and how many transactions should be issued at that time interval.
Then, we don't change anything until these transactions are committed.
|
|
||
| # tmp folder | ||
|
|
||
| /tmp No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the use of adding /tmp?
| TimeUpdateFrequency: 100 * time.Millisecond, | ||
| TimeUpdateAmount: 500 * time.Millisecond, | ||
| CrashInterval: 800 * time.Millisecond, | ||
| LogDirectory: "tmp", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to create a temporary folder and if the test succeeds, delete the logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way you don't need to delete the directory at the beginning of the test
| @@ -0,0 +1,150 @@ | |||
| package random_network | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license header
| @@ -0,0 +1,140 @@ | |||
| package random_network | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license header
| @@ -0,0 +1,59 @@ | |||
| package random_network | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license header
| @@ -0,0 +1,4 @@ | |||
| // Copyright (C) 2019-2025, Ava Labs, Inc. All rights reserved. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this file?
| } | ||
|
|
||
| // Remove each file (but not subdirectories) | ||
| for _, entry := range entries { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't we just remove everything?
os.Remove() should remove the top directory logDir and it should be enough, no?
| } | ||
|
|
||
| func (n *Network) StartInstances() { | ||
| panic("Call Run() Instead") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method isn't used. Why is it needed?
| } | ||
|
|
||
| availableNodes := n.getAvailableNodesLocked(crashedNodes) | ||
| if len(availableNodes) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can we have 0 non crashed nodes if we only crash f or less?
| numToCrash := n.randomness.Intn(maxAdditionalCrashes) + 1 | ||
| numToCrash = min(numToCrash, len(availableNodes)) | ||
|
|
||
| n.randomness.Shuffle(len(availableNodes), func(i, j int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using n.randomness.Perm will fit here better, because you can abort early, but up to you
| } | ||
| } | ||
|
|
||
| func (m *Mempool) PackBlock(ctx context.Context, maxTxs int) []*TX { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we only call PackBlock from BuildBlock. I think we should move the call to WaitForPendingTxs to BuildBlock as it's more natural there.
| } | ||
|
|
||
| func (m *Mempool) isTxInChain(txID txID, parentDigest simplex.Digest) bool { | ||
| block, exists := m.verifiedButNotAcceptedTXs[parentDigest] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this works for a case where we have a leader failover that creates a sibling block.
Suppose block a is notarized and then a block b is built on top of it and broadcast, and verified by a node but then the majority notarize the empty block.
In the next round, block b' is proposed on top of block a.
We need a way to rollback the verified but not accepted map for the parent a because both blocks b and b' are built on top of it.
One way of doing it, is move this book-keeping to the blocks themselves.
Specifically, have the instances of the blocks maintain pointers to blocks they are built upon, and then just have the transaction list inside a block a map, where when we encode the bytes of the block, we do it across some order that is set at the time we build the block.
| // Create a copy of the message to avoid mutating shared state in the in-memory network | ||
| msgCopy := *msg | ||
|
|
||
| switch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we extract this switch into its own function, or make some kind of copy() function for the *simplex.Message ?
|
Can we rebase on top of main? |
No description provided.