Skip to content

Conversation

@Ulthran
Copy link
Contributor

@Ulthran Ulthran commented Jan 30, 2026

Motivation

  • Make machine type resolution simpler and deterministic by fetching the TSV once at import time and falling back to a bundled mapping if the fetch fails.
  • Avoid in-memory cache/Mapping indirection and keep the public MACHINE_TYPES as a plain dictionary for simpler access.

Description

  • Add MACHINE_TYPES_FALLBACK and MACHINE_TYPES_URL, and replace the previous MachineTypesMapping/load_machine_types() flow with an import-time urlopen() attempt that parses the TSV into MACHINE_TYPES.
  • Emit a RuntimeWarning and assign MACHINE_TYPES = MACHINE_TYPES_FALLBACK when the network fetch fails, times out, or yields no usable rows.
  • Update test/test_illumina.py to monkeypatch urllib.request.urlopen, reload the seqBackupLib.illumina module with importlib.reload() via an illumina_module fixture, and reference illumina_module.MACHINE_TYPES and classes directly.
  • Remove the previous loader function and mapping class and streamline TSV parsing into a dict comprehension with header handling and validation.

Testing

  • Ran black . which reformatted the modified files successfully.
  • Ran pytest test and all tests passed: 23 passed.

Codex Task

Copilot AI review requested due to automatic review settings January 30, 2026 22:20
@Ulthran Ulthran merged commit d3276bd into master Jan 30, 2026
9 of 10 checks passed
@Ulthran Ulthran deleted the codex/retrieve-machine-mapping-from-external-source-h6mpk7 branch January 30, 2026 22:21
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR simplifies machine type loading by fetching machine type mappings from a TSV file at module import time, with a fallback to a bundled dictionary if the fetch fails. This replaces the previous MachineTypesMapping class and lazy-loading approach with a more direct, import-time solution.

Changes:

  • Added import-time TSV fetching with urlopen and fallback to MACHINE_TYPES_FALLBACK when network requests fail
  • Updated tests to use importlib.reload() with monkeypatched urlopen to control module loading behavior
  • Added tests for successful TSV loading and fallback scenarios with warnings

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
seqBackupLib/illumina.py Replaced lazy-loading mechanism with import-time TSV fetch, added fallback constants and exception handling with runtime warnings
test/test_illumina.py Added illumina_module fixture using module reload for test isolation, updated parametrized tests to use fixture, added new tests for TSV loading and fallback behavior

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +22 to +54
def illumina_module(monkeypatch):
tsv_rows = ["instrument_code\tmachine_type"]
fallback = {
"VH": "Illumina-NextSeq",
"D": "Illumina-HiSeq",
"M": "Illumina-MiSeq",
"A": "Illumina-NovaSeq",
"NB": "Illumina-MiniSeq",
"LH": "Illumina-NovaSeqX",
"SH": "Illumina-MiSeq",
}
tsv_rows.extend(f"{code}\t{machine}" for code, machine in fallback.items())
tsv = "\n".join(tsv_rows) + "\n"

class FakeResponse:
def __init__(self, data: str):
self._data = data

def read(self):
return self._data.encode("utf-8")

def __enter__(self):
return self

def __exit__(self, exc_type, exc, tb):
return False

monkeypatch.setattr(
"urllib.request.urlopen", lambda *args, **kwargs: FakeResponse(tsv)
)
import seqBackupLib.illumina as illumina

return importlib.reload(illumina)
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using importlib.reload() modifies global module state and can cause test isolation issues if tests run in parallel. Consider using a more isolated approach such as extracting the TSV loading logic into a testable function, or document that these tests must run serially. The current approach where illumina_module fixture reloads the module means any test using this fixture will affect the global MACHINE_TYPES state.

Copilot uses AI. Check for mistakes.
Comment on lines +104 to +115
class FakeResponse:
def __init__(self, data: str):
self._data = data

def read(self):
return self._data.encode("utf-8")

def __enter__(self):
return self

def __exit__(self, exc_type, exc, tb):
return False
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FakeResponse class is duplicated in both illumina_module fixture and test_load_machine_types_from_tsv. Consider extracting this into a helper function or a separate fixture to reduce duplication and improve maintainability.

Copilot uses AI. Check for mistakes.
Comment on lines +37 to +42
except (URLError, TimeoutError, ValueError) as exc:
warnings.warn(
f"Falling back to bundled machine types; unable to load {MACHINE_TYPES_URL}: {exc}",
RuntimeWarning,
)
MACHINE_TYPES = MACHINE_TYPES_FALLBACK
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching ValueError in the exception handler (line 37) will catch not only the "machine_types.tsv contained no usable rows" error but also any ValueError raised by csv.reader or during string decoding. While this is likely the intended behavior, it may mask unexpected errors. Consider being more specific about which ValueError instances should trigger the fallback, or add logging to distinguish between different failure modes.

Suggested change
except (URLError, TimeoutError, ValueError) as exc:
warnings.warn(
f"Falling back to bundled machine types; unable to load {MACHINE_TYPES_URL}: {exc}",
RuntimeWarning,
)
MACHINE_TYPES = MACHINE_TYPES_FALLBACK
except (URLError, TimeoutError) as exc:
warnings.warn(
f"Falling back to bundled machine types; unable to load {MACHINE_TYPES_URL}: {exc}",
RuntimeWarning,
)
MACHINE_TYPES = MACHINE_TYPES_FALLBACK
except ValueError as exc:
if str(exc) == "machine_types.tsv contained no usable rows":
warnings.warn(
f"Falling back to bundled machine types; unable to load {MACHINE_TYPES_URL}: {exc}",
RuntimeWarning,
)
MACHINE_TYPES = MACHINE_TYPES_FALLBACK
else:
raise

Copilot uses AI. Check for mistakes.
Comment on lines +23 to +42
try:
with urlopen(MACHINE_TYPES_URL, timeout=10) as response:
rows = list(
csv.reader(response.read().decode("utf-8").splitlines(), delimiter="\t")
)
if rows and rows[0] and rows[0][0].lower() in {"instrument_code", "code"}:
rows = rows[1:]
MACHINE_TYPES = {
row[0].strip(): row[1].strip()
for row in rows
if len(row) >= 2 and row[0].strip() and row[1].strip()
}
if not MACHINE_TYPES:
raise ValueError("machine_types.tsv contained no usable rows")
except (URLError, TimeoutError, ValueError) as exc:
warnings.warn(
f"Falling back to bundled machine types; unable to load {MACHINE_TYPES_URL}: {exc}",
RuntimeWarning,
)
MACHINE_TYPES = MACHINE_TYPES_FALLBACK
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import-time network fetch is the stated design goal, but be aware of operational implications: (1) Python caches modules after first import, so subsequent imports in the same process won't make new requests; (2) However, every new Python process will make a request, which could cause issues in high-frequency scenarios like AWS Lambda cold starts or parallel test execution; (3) The 10-second timeout means slow networks could delay application startup. These tradeoffs are acceptable for many use cases but worth documenting.

Copilot uses AI. Check for mistakes.
}
if not MACHINE_TYPES:
raise ValueError("machine_types.tsv contained no usable rows")
except (URLError, TimeoutError, ValueError) as exc:
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception handler should catch OSError instead of just URLError and TimeoutError. While URLError is a subclass of OSError and will catch many network errors, other OSError subclasses like ConnectionRefusedError, ConnectionResetError, and SSL-related errors can also be raised by urlopen. Catching OSError would provide more comprehensive error handling and ensure the fallback is used for all network-related failures. Alternatively, catch Exception if you want to ensure any failure triggers the fallback.

Suggested change
except (URLError, TimeoutError, ValueError) as exc:
except (OSError, ValueError) as exc:

Copilot uses AI. Check for mistakes.
return importlib.reload(illumina)


@pytest.mark.parametrize("machine_type", machine_fixtures.keys())
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parametrization was changed from MACHINE_TYPES.keys() to machine_fixtures.keys(). This means if new machine types are added to MACHINE_TYPES_FALLBACK but not to machine_fixtures, they won't be tested. Consider adding a separate test that verifies all keys in MACHINE_TYPES_FALLBACK have corresponding fixtures in machine_fixtures, or document that machine_fixtures must be kept in sync with MACHINE_TYPES_FALLBACK.

Copilot uses AI. Check for mistakes.
fp = request.getfixturevalue(fixture_name)

d = IlluminaDir(fp.name)
d = illumina_module.IlluminaDir(fp.name)
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable d is not used.

Suggested change
d = illumina_module.IlluminaDir(fp.name)
illumina_module.IlluminaDir(fp.name)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants