-
Notifications
You must be signed in to change notification settings - Fork 5
Simplify machine type loading with TSV fetch and fallback #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify machine type loading with TSV fetch and fallback #21
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR simplifies machine type loading by fetching machine type mappings from a TSV file at module import time, with a fallback to a bundled dictionary if the fetch fails. This replaces the previous MachineTypesMapping class and lazy-loading approach with a more direct, import-time solution.
Changes:
- Added import-time TSV fetching with
urlopenand fallback toMACHINE_TYPES_FALLBACKwhen network requests fail - Updated tests to use
importlib.reload()with monkeypatchedurlopento control module loading behavior - Added tests for successful TSV loading and fallback scenarios with warnings
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| seqBackupLib/illumina.py | Replaced lazy-loading mechanism with import-time TSV fetch, added fallback constants and exception handling with runtime warnings |
| test/test_illumina.py | Added illumina_module fixture using module reload for test isolation, updated parametrized tests to use fixture, added new tests for TSV loading and fallback behavior |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def illumina_module(monkeypatch): | ||
| tsv_rows = ["instrument_code\tmachine_type"] | ||
| fallback = { | ||
| "VH": "Illumina-NextSeq", | ||
| "D": "Illumina-HiSeq", | ||
| "M": "Illumina-MiSeq", | ||
| "A": "Illumina-NovaSeq", | ||
| "NB": "Illumina-MiniSeq", | ||
| "LH": "Illumina-NovaSeqX", | ||
| "SH": "Illumina-MiSeq", | ||
| } | ||
| tsv_rows.extend(f"{code}\t{machine}" for code, machine in fallback.items()) | ||
| tsv = "\n".join(tsv_rows) + "\n" | ||
|
|
||
| class FakeResponse: | ||
| def __init__(self, data: str): | ||
| self._data = data | ||
|
|
||
| def read(self): | ||
| return self._data.encode("utf-8") | ||
|
|
||
| def __enter__(self): | ||
| return self | ||
|
|
||
| def __exit__(self, exc_type, exc, tb): | ||
| return False | ||
|
|
||
| monkeypatch.setattr( | ||
| "urllib.request.urlopen", lambda *args, **kwargs: FakeResponse(tsv) | ||
| ) | ||
| import seqBackupLib.illumina as illumina | ||
|
|
||
| return importlib.reload(illumina) |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using importlib.reload() modifies global module state and can cause test isolation issues if tests run in parallel. Consider using a more isolated approach such as extracting the TSV loading logic into a testable function, or document that these tests must run serially. The current approach where illumina_module fixture reloads the module means any test using this fixture will affect the global MACHINE_TYPES state.
| class FakeResponse: | ||
| def __init__(self, data: str): | ||
| self._data = data | ||
|
|
||
| def read(self): | ||
| return self._data.encode("utf-8") | ||
|
|
||
| def __enter__(self): | ||
| return self | ||
|
|
||
| def __exit__(self, exc_type, exc, tb): | ||
| return False |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FakeResponse class is duplicated in both illumina_module fixture and test_load_machine_types_from_tsv. Consider extracting this into a helper function or a separate fixture to reduce duplication and improve maintainability.
| except (URLError, TimeoutError, ValueError) as exc: | ||
| warnings.warn( | ||
| f"Falling back to bundled machine types; unable to load {MACHINE_TYPES_URL}: {exc}", | ||
| RuntimeWarning, | ||
| ) | ||
| MACHINE_TYPES = MACHINE_TYPES_FALLBACK |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catching ValueError in the exception handler (line 37) will catch not only the "machine_types.tsv contained no usable rows" error but also any ValueError raised by csv.reader or during string decoding. While this is likely the intended behavior, it may mask unexpected errors. Consider being more specific about which ValueError instances should trigger the fallback, or add logging to distinguish between different failure modes.
| except (URLError, TimeoutError, ValueError) as exc: | |
| warnings.warn( | |
| f"Falling back to bundled machine types; unable to load {MACHINE_TYPES_URL}: {exc}", | |
| RuntimeWarning, | |
| ) | |
| MACHINE_TYPES = MACHINE_TYPES_FALLBACK | |
| except (URLError, TimeoutError) as exc: | |
| warnings.warn( | |
| f"Falling back to bundled machine types; unable to load {MACHINE_TYPES_URL}: {exc}", | |
| RuntimeWarning, | |
| ) | |
| MACHINE_TYPES = MACHINE_TYPES_FALLBACK | |
| except ValueError as exc: | |
| if str(exc) == "machine_types.tsv contained no usable rows": | |
| warnings.warn( | |
| f"Falling back to bundled machine types; unable to load {MACHINE_TYPES_URL}: {exc}", | |
| RuntimeWarning, | |
| ) | |
| MACHINE_TYPES = MACHINE_TYPES_FALLBACK | |
| else: | |
| raise |
| try: | ||
| with urlopen(MACHINE_TYPES_URL, timeout=10) as response: | ||
| rows = list( | ||
| csv.reader(response.read().decode("utf-8").splitlines(), delimiter="\t") | ||
| ) | ||
| if rows and rows[0] and rows[0][0].lower() in {"instrument_code", "code"}: | ||
| rows = rows[1:] | ||
| MACHINE_TYPES = { | ||
| row[0].strip(): row[1].strip() | ||
| for row in rows | ||
| if len(row) >= 2 and row[0].strip() and row[1].strip() | ||
| } | ||
| if not MACHINE_TYPES: | ||
| raise ValueError("machine_types.tsv contained no usable rows") | ||
| except (URLError, TimeoutError, ValueError) as exc: | ||
| warnings.warn( | ||
| f"Falling back to bundled machine types; unable to load {MACHINE_TYPES_URL}: {exc}", | ||
| RuntimeWarning, | ||
| ) | ||
| MACHINE_TYPES = MACHINE_TYPES_FALLBACK |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import-time network fetch is the stated design goal, but be aware of operational implications: (1) Python caches modules after first import, so subsequent imports in the same process won't make new requests; (2) However, every new Python process will make a request, which could cause issues in high-frequency scenarios like AWS Lambda cold starts or parallel test execution; (3) The 10-second timeout means slow networks could delay application startup. These tradeoffs are acceptable for many use cases but worth documenting.
| } | ||
| if not MACHINE_TYPES: | ||
| raise ValueError("machine_types.tsv contained no usable rows") | ||
| except (URLError, TimeoutError, ValueError) as exc: |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception handler should catch OSError instead of just URLError and TimeoutError. While URLError is a subclass of OSError and will catch many network errors, other OSError subclasses like ConnectionRefusedError, ConnectionResetError, and SSL-related errors can also be raised by urlopen. Catching OSError would provide more comprehensive error handling and ensure the fallback is used for all network-related failures. Alternatively, catch Exception if you want to ensure any failure triggers the fallback.
| except (URLError, TimeoutError, ValueError) as exc: | |
| except (OSError, ValueError) as exc: |
| return importlib.reload(illumina) | ||
|
|
||
|
|
||
| @pytest.mark.parametrize("machine_type", machine_fixtures.keys()) |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parametrization was changed from MACHINE_TYPES.keys() to machine_fixtures.keys(). This means if new machine types are added to MACHINE_TYPES_FALLBACK but not to machine_fixtures, they won't be tested. Consider adding a separate test that verifies all keys in MACHINE_TYPES_FALLBACK have corresponding fixtures in machine_fixtures, or document that machine_fixtures must be kept in sync with MACHINE_TYPES_FALLBACK.
| fp = request.getfixturevalue(fixture_name) | ||
|
|
||
| d = IlluminaDir(fp.name) | ||
| d = illumina_module.IlluminaDir(fp.name) |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable d is not used.
| d = illumina_module.IlluminaDir(fp.name) | |
| illumina_module.IlluminaDir(fp.name) |
Motivation
MACHINE_TYPESas a plain dictionary for simpler access.Description
MACHINE_TYPES_FALLBACKandMACHINE_TYPES_URL, and replace the previousMachineTypesMapping/load_machine_types()flow with an import-timeurlopen()attempt that parses the TSV intoMACHINE_TYPES.RuntimeWarningand assignMACHINE_TYPES = MACHINE_TYPES_FALLBACKwhen the network fetch fails, times out, or yields no usable rows.test/test_illumina.pyto monkeypatchurllib.request.urlopen, reload theseqBackupLib.illuminamodule withimportlib.reload()via anillumina_modulefixture, and referenceillumina_module.MACHINE_TYPESand classes directly.Testing
black .which reformatted the modified files successfully.pytest testand all tests passed:23 passed.Codex Task