Support for Archives(zip,tar) #34
base: main
Conversation
Pull request overview
This pull request aims to add support for ZIP compression format to the pipeline's compression task. However, the implementation uses the compress/flate package (raw DEFLATE algorithm) rather than the archive/zip package (ZIP file format), which are different formats. The PR also includes test files and test data for validating the new compression format.
Changes:
- Added "zip" format handler in `formats.go` using `compress/flate` for DEFLATE compression/decompression
- Changed snappy writer from `NewWriter` to `NewBufferedWriter` (unrelated to ZIP support)
- Added `birds.txt` test data file containing 124 bird names
- Added `zip_compress_test.yaml` and `zip_decompress_test.yaml` test pipeline configurations
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| internal/pkg/pipeline/task/compress/formats.go | Adds "zip" format handler using compress/flate package and modifies snappy writer implementation |
| test/pipelines/birds.txt | Provides test data file with list of bird names for compression/decompression testing |
| test/pipelines/zip_compress_test.yaml | Test pipeline configuration to compress birds.txt into compressed_birds.zip |
| test/pipelines/zip_decompress_test.yaml | Test pipeline configuration to decompress compressed_birds.zip back to text |
test/pipelines/zip_unpack_test.yaml
Outdated
| format: zip | ||
| action: decompress | ||
| - name: store_compressed_file | ||
| type: file |
Copilot AI · Jan 28, 2026
There is trailing whitespace after "file" on this line. This is inconsistent with the formatting on line 3. Please remove the trailing whitespace.
| type: file | |
| type: file |
| Evening Grosbeak | ||
| Eared Quetzal | ||
| Eurasian Wigeon | ||
| Eurpean Starling |
Copilot AI · Jan 28, 2026
There is a spelling error: "Eurpean" should be "European".
| Eurpean Starling | |
| European Starling |
| `zip`: { | ||
| NewReader: func(r io.Reader) (io.ReadCloser, error) { | ||
| return flate.NewReader(r), nil | ||
| }, | ||
| NewWriter: func(w io.Writer) io.WriteCloser { | ||
| writer, _ := flate.NewWriter(w, flate.DefaultCompression) | ||
| return writer | ||
| }}, |
Copilot AI · Jan 28, 2026
The implementation uses compress/flate (DEFLATE) instead of archive/zip (ZIP format). These are different compression formats:
- DEFLATE is the raw compression algorithm
- ZIP is a file archive format that uses DEFLATE but includes file metadata, directory structure, and file headers
The PR description mentions "ZIP file compression" but the implementation only provides raw DEFLATE compression/decompression. If the intent is to support ZIP files (with .zip extension), you should use the archive/zip package instead. If raw DEFLATE compression is the goal, the format name should be changed from "zip" to "deflate" to avoid confusion.
Note that archive/zip requires a different API pattern (working with zip.Writer and zip.Reader which handle multiple files), so the implementation would need adjustment to work with the streaming pattern used in this codebase.
| return flate.NewReader(r), nil | ||
| }, | ||
| NewWriter: func(w io.Writer) io.WriteCloser { | ||
| writer, _ := flate.NewWriter(w, flate.DefaultCompression) |
Copilot AI · Jan 28, 2026
The error returned by flate.NewWriter is being silently ignored with the blank identifier _. This error should be handled properly to maintain consistency with other format handlers and to catch potential issues. While flate.NewWriter only returns an error for invalid compression levels, proper error handling is important for code robustness.
Consider handling the error appropriately. If the error should be propagated, the NewWriter function signature would need to be modified to return an error, similar to how NewReader returns an error.
| writer, _ := flate.NewWriter(w, flate.DefaultCompression) | |
| writer, err := flate.NewWriter(w, flate.DefaultCompression) | |
| if err != nil { | |
| panic("compress: failed to create flate writer with default compression: " + err.Error()) | |
| } |
| }, | ||
| NewWriter: func(w io.Writer) io.WriteCloser { | ||
| return snappy.NewWriter(w) | ||
| return snappy.NewBufferedWriter(w) |
Copilot AI · Jan 28, 2026
This change from snappy.NewWriter(w) to snappy.NewBufferedWriter(w) appears to be unrelated to adding ZIP support. While NewBufferedWriter may provide better performance by buffering writes, this change should either:
- Be mentioned in the PR description as an additional improvement
- Be separated into its own PR
- Be reverted if it was changed unintentionally
Please clarify if this change is intentional and document the reason for it.
| return snappy.NewBufferedWriter(w) | |
| return snappy.NewWriter(w) |
| tasks: | ||
| - name: pull_names_from_s3 | ||
| type: file | ||
| path: test/pipelines/compressed_birds.zip |
Copilot AI · Jan 28, 2026
The test attempts to decompress compressed_birds.zip which doesn't exist in the repository. The test file zip_compress_test.yaml generates this file, but the decompression test assumes it already exists at test time.
This creates a dependency where zip_compress_test.yaml must be executed before zip_decompress_test.yaml. Consider:
- Adding a comment similar to the one in `decompress_test.yaml` that states "please execute zip_compress_test.yaml before zip_decompress_test.yaml"
- Or including a pre-compressed `compressed_birds.zip` file in the repository to make the test self-contained
| - name: compress_file | ||
| type: compress | ||
| format: zip | ||
| action: decompress | ||
| - name: store_compressed_file |
Copilot AI · Jan 28, 2026
The task is named "compress_file" but it's performing decompression (action: decompress). This is inconsistent and misleading. Consider renaming it to "decompress_file" to accurately reflect what the task does.
Similarly, the final task is named "store_compressed_file" but it's storing the decompressed result. Consider renaming it to "store_decompressed_file" or "store_uncompressed_file" for clarity.
| - name: compress_file | |
| type: compress | |
| format: zip | |
| action: decompress | |
| - name: store_compressed_file | |
| - name: decompress_file | |
| type: compress | |
| format: zip | |
| action: decompress | |
| - name: store_decompressed_file |
test/pipelines/zip_pack_test.yaml
Outdated
| type: compress | ||
| format: zip | ||
| - name: store_compressed_file | ||
| type: file |
Copilot AI · Jan 28, 2026
There is trailing whitespace after "file" on this line. This is inconsistent with the formatting on line 3. Please remove the trailing whitespace.
| type: file | |
| type: file |
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 14 comments.
| w, _ := zipWriter.Create(z.FileName) | ||
| w.Write(b) | ||
|
|
||
| zipWriter.Close() | ||
|
|
Copilot AI · Jan 29, 2026
Multiple error handling issues: (1) The error from zipWriter.Create is being silently ignored with the blank identifier. (2) The error from w.Write is being ignored. (3) The error from zipWriter.Close is being ignored. All of these errors should be checked and handled appropriately, similar to how the compress task handles errors.
| w, _ := zipWriter.Create(z.FileName) | |
| w.Write(b) | |
| zipWriter.Close() | |
| w, err := zipWriter.Create(z.FileName) | |
| if err != nil { | |
| log.Fatal(err) | |
| } | |
| if _, err := w.Write(b); err != nil { | |
| log.Fatal(err) | |
| } | |
| if err := zipWriter.Close(); err != nil { | |
| log.Fatal(err) | |
| } |
| } | ||
|
|
||
| if obj.Action != actionCompress && obj.Action != actionDecompress { | ||
| return fmt.Errorf("invalid action: %s (must be 'compress' or 'decompress')", obj.Action) |
Copilot AI · Jan 29, 2026
The error message mentions 'compress' or 'decompress' but the actual valid action values are 'pack' and 'unpack' as defined in the constants at lines 13-14. This inconsistency between the error message and the actual values will confuse users.
| return fmt.Errorf("invalid action: %s (must be 'compress' or 'decompress')", obj.Action) | |
| return fmt.Errorf("invalid action: %s (must be 'pack' or 'unpack')", obj.Action) |
| log.Fatal("file name is required to create tar archive") | ||
| } | ||
|
|
||
| tw := tar.NewWriter(bytes.NewBuffer(b)) |
Copilot AI · Jan 29, 2026
The tar.NewWriter is incorrectly initialized with bytes.NewBuffer(b), which creates a new buffer containing a copy of the input data. This should be an empty buffer like bytes.NewBuffer(nil) or new(bytes.Buffer). The current implementation would prepend the raw input data before the tar archive data, resulting in a corrupted output.
| _, err := tw.Write(b) | ||
| if err != nil { | ||
| log.Fatal(err) | ||
| } | ||
|
|
||
| err = tw.Close() | ||
| if err != nil { | ||
| log.Fatal(err) | ||
| } |
Copilot AI · Jan 29, 2026
Using log.Fatal will terminate the entire program. These errors should be returned instead for proper pipeline error handling.
| log.Fatal(err) | ||
| } | ||
|
|
||
| t.SendData(t.Record.Context, b, t.OutputChan) |
Copilot AI · Jan 29, 2026
After writing to the tar archive, this sends the original input data (b) instead of the tar archive buffer. This should send the tar writer's buffer contents (e.g., tarBuf.Bytes()) instead. The current implementation would send the uncompressed data rather than the tar archive.
| type archiver interface { | ||
| Read(b []byte) | ||
| Write(b []byte) | ||
| } |
Copilot AI · Jan 29, 2026
The archiver interface methods Read and Write don't return errors, which prevents proper error handling. These methods should return error values so that errors can be propagated back through the Run method to the caller. Compare with the compress task where compress/decompress methods return errors.
| switch c.Action { | ||
| case actionCompress: | ||
| archiv.Write(r.Data) | ||
| case actionDecompress: | ||
| archiv.Read(r.Data) | ||
| } |
Copilot AI · Jan 29, 2026
The Read and Write calls don't capture or handle any errors because the archiver interface methods don't return errors. If an error occurs in Read or Write, it will either terminate the program via log.Fatal or be silently ignored, but cannot be properly handled by the Run method. This needs to be addressed by updating the archiver interface to return errors.
| if _, err := io.ReadFull(r, buf); err != nil && err != io.EOF { | ||
| log.Fatal(err) |
Copilot AI · Jan 29, 2026
Using log.Fatal will terminate the entire program. This should return an error instead so the pipeline can handle it according to the FailOnError configuration. The Read and Write methods should return errors.
| if t.FileName == "" { | ||
| log.Fatal("file name is required to create tar archive") | ||
| } | ||
|
|
Copilot AI · Jan 29, 2026
This error check should not use log.Fatal as it terminates the entire program. The validation should occur in UnmarshalYAML during configuration parsing, not during runtime execution. This check is also redundant, since validation already happens in archive.go at lines 55-58.
| if t.FileName == "" { | |
| log.Fatal("file name is required to create tar archive") | |
| } |
| "gopkg.in/yaml.v3" | ||
|
|
||
| "github.com/patterninc/caterpillar/internal/pkg/pipeline/task" | ||
| "github.com/patterninc/caterpillar/internal/pkg/pipeline/task/archive" |
Copilot AI · Jan 29, 2026
The PR description states "Support for ZIP file compression and decompression" and mentions only ZIP/DEFLATE format, but the code actually implements support for both ZIP and TAR archive formats. The description should be updated to accurately reflect that both archive formats are being added, or the TAR implementation should be removed if it's out of scope for this PR.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 16 comments.
| func (t *tarArchive) Read(b []byte) { | ||
| r := tar.NewReader(bytes.NewReader(b)) | ||
|
|
||
| for { | ||
| header, err := r.Next() | ||
| if err != nil || err != io.EOF { | ||
| break | ||
| } | ||
|
|
||
| // check the file type is regular file | ||
| if header.Typeflag == tar.TypeReg { | ||
| buf := make([]byte, header.Size) | ||
| if _, err := io.ReadFull(r, buf); err != nil && err != io.EOF { | ||
| log.Fatal(err) | ||
| } |
Copilot AI · Jan 30, 2026
The archive task has no protection against ZIP bomb attacks or excessively large archives. When unpacking, the code allocates memory based on the file size reported in the archive header without any validation. A malicious archive could claim extremely large file sizes, causing out-of-memory errors or system crashes. Consider adding limits on individual file sizes and total archive size, or at least document the risk more explicitly with recommended mitigations.
| func (t *tarArchive) Read(b []byte) { | |
| r := tar.NewReader(bytes.NewReader(b)) | |
| for { | |
| header, err := r.Next() | |
| if err != nil || err != io.EOF { | |
| break | |
| } | |
| // check the file type is regular file | |
| if header.Typeflag == tar.TypeReg { | |
| buf := make([]byte, header.Size) | |
| if _, err := io.ReadFull(r, buf); err != nil && err != io.EOF { | |
| log.Fatal(err) | |
| } | |
| const ( | |
| // maxTarEntrySize limits the size of a single file extracted from a tar archive. | |
| // This helps prevent excessive memory allocation from a single malicious entry. | |
| maxTarEntrySize int64 = 100 * 1024 * 1024 // 100 MiB | |
| // maxTarTotalSize limits the total size of all files extracted from a tar archive. | |
| // This helps mitigate "zip bomb" style attacks using many smaller entries. | |
| maxTarTotalSize int64 = 1 * 1024 * 1024 * 1024 // 1 GiB | |
| // maxAllocSize is the largest value that can safely be converted to int for allocations. | |
| maxAllocSize int64 = int64(^uint(0) >> 1) | |
| ) | |
| func (t *tarArchive) Read(b []byte) { | |
| r := tar.NewReader(bytes.NewReader(b)) | |
| var totalSize int64 | |
| for { | |
| header, err := r.Next() | |
| if err == io.EOF { | |
| break | |
| } | |
| if err != nil { | |
| log.Fatal(err) | |
| } | |
| // check the file type is regular file | |
| if header.Typeflag == tar.TypeReg { | |
| // Validate header size to prevent excessive memory allocation. | |
| if header.Size < 0 { | |
| log.Fatal("tar entry has negative size") | |
| } | |
| if header.Size > maxTarEntrySize { | |
| log.Fatalf("tar entry too large: %d bytes (limit %d)", header.Size, maxTarEntrySize) | |
| } | |
| if totalSize+header.Size > maxTarTotalSize { | |
| log.Fatalf("tar archive too large: %d bytes processed/extracted would exceed limit %d", totalSize+header.Size, maxTarTotalSize) | |
| } | |
| if header.Size > maxAllocSize { | |
| log.Fatalf("tar entry size %d exceeds maximum allocatable size", header.Size) | |
| } | |
| buf := make([]byte, header.Size) | |
| if _, err := io.ReadFull(r, buf); err != nil { | |
| log.Fatal(err) | |
| } | |
| totalSize += header.Size |
| } | ||
|
|
||
| if _, err := tw.Write(b); err != nil { | ||
| log.Fatal(err) |
Copilot AI · Jan 30, 2026
Using log.Fatal in library/task code will abruptly terminate the entire application, which is inappropriate for a pipeline task. Errors should be propagated back through the Run method's return value to allow the calling code to handle them gracefully. The archiver interface methods should return errors, and the Run method should handle those errors appropriately.
|
|
||
| _, err = rc.Read(buf) | ||
| if err != nil && !errors.Is(err, io.EOF) { | ||
| log.Fatal(err) |
Copilot AI · Jan 30, 2026
Using log.Fatal in library/task code will abruptly terminate the entire application, which is inappropriate for a pipeline task. Errors should be propagated back through the Run method's return value to allow the calling code to handle them gracefully. The archiver interface methods should return errors, and the Run method should handle those errors appropriately.
| package archive | ||
|
|
||
| import ( | ||
| "fmt" | ||
|
|
||
| "github.com/patterninc/caterpillar/internal/pkg/pipeline/record" | ||
| "github.com/patterninc/caterpillar/internal/pkg/pipeline/task" | ||
| ) | ||
|
|
||
| type actionType string | ||
|
|
||
| const ( | ||
| actionPack actionType = `pack` | ||
| actionUnpack actionType = `unpack` | ||
| ) | ||
|
|
||
| const ( | ||
| defaultFormat = `zip` | ||
| defaultAction = `pack` | ||
| ) | ||
|
|
||
| type archiver interface { | ||
| Read(b []byte) | ||
| Write(b []byte) | ||
| } | ||
|
|
||
| type core struct { | ||
| task.Base `yaml:",inline" json:",inline"` | ||
| Format string `yaml:"format,omitempty" json:"format,omitempty"` | ||
| Action actionType `yaml:"action,omitempty" json:"action,omitempty"` | ||
| FileName string `yaml:"file_name,omitempty" json:"file_name,omitempty"` | ||
| } | ||
|
|
||
| func New() (task.Task, error) { | ||
| return &core{ | ||
| Format: defaultFormat, | ||
| Action: defaultAction, | ||
| }, nil | ||
| } | ||
|
|
||
| func (c *core) UnmarshalYAML(unmarshal func(interface{}) error) error { | ||
| type raw core | ||
| obj := raw{ | ||
| Format: defaultFormat, | ||
| Action: defaultAction, | ||
| } | ||
| if err := unmarshal(&obj); err != nil { | ||
| return err | ||
| } | ||
|
|
||
| if obj.Action != actionPack && obj.Action != actionUnpack { | ||
| return fmt.Errorf("invalid action: %s (must be 'pack' or 'unpack')", obj.Action) | ||
| } | ||
|
|
||
| if obj.Action == actionPack { | ||
| if obj.FileName == "" { | ||
| return fmt.Errorf("file_name must be specified when action is 'pack'") | ||
| } | ||
| } | ||
|
|
||
| *c = core(obj) | ||
|
|
||
| return nil | ||
| } | ||
|
|
||
| func (c *core) Run(input <-chan *record.Record, output chan<- *record.Record) (err error) { | ||
|
|
||
| if input == nil { | ||
| return task.ErrNilInput | ||
| } | ||
|
|
||
| for { | ||
| r, ok := c.GetRecord(input) | ||
| if !ok { | ||
| break | ||
| } | ||
|
|
||
| if len(r.Data) == 0 { | ||
| continue | ||
| } | ||
|
|
||
| var archiv archiver | ||
|
|
||
| switch c.Format { | ||
| case "tar": | ||
| archiv = &tarArchive{ | ||
| Base: &c.Base, | ||
| FileName: c.FileName, | ||
| Record: r, | ||
| OutputChan: output, | ||
| } | ||
| case "zip": | ||
| archiv = &zipArchive{ | ||
| Base: &c.Base, | ||
| FileName: c.FileName, | ||
| Record: r, | ||
| OutputChan: output, | ||
| } | ||
| default: | ||
| return fmt.Errorf("unsupported format: %s", c.Format) | ||
| } | ||
|
|
||
| switch c.Action { | ||
| case actionPack: | ||
| archiv.Write(r.Data) | ||
| case actionUnpack: | ||
| archiv.Read(r.Data) | ||
| } | ||
| } | ||
| return nil | ||
| } |
Copilot AI · Jan 30, 2026
The new archive task implementation lacks unit tests. Given that the repository has comprehensive test coverage for other components (e.g., internal/pkg/pipeline/dag_test.go), unit tests should be added to verify the pack and unpack operations for both ZIP and TAR formats, error handling paths, and edge cases such as empty archives, corrupted archives, and files with various sizes.
| Albatross | ||
| Acorn Woodpecker | ||
| American Kestrel | ||
| Anna's Hummingbird | ||
| Bald Eagle | ||
| Baltimore Oriole | ||
| Barn Swallow | ||
| Belted Kingfisher | ||
| Bicolored Antbird | ||
| Black Capped Chickadee | ||
| Black Skimmer | ||
| Blue Jay | ||
| Bluebird | ||
| Bobolink | ||
| Bohemian Waxwing | ||
| Brown Creeper | ||
| Brown Pelican | ||
| Burrowing Owl | ||
| California Condor | ||
| California Quail | ||
| Canada Goose | ||
| Cardinal | ||
| Caspian Tern | ||
| Cedar Waxwing | ||
| Chestnut Sided Warbler | ||
| Chimney Swift | ||
| Chipping Sparrow | ||
| Clark's Nutcracker | ||
| Clay Colored Sparrow | ||
| Cliff Swallow | ||
| Columbiformes | ||
| Common Eider | ||
| Common Goldeneye | ||
| Common Grackle | ||
| Common Loon | ||
| Common Merganser | ||
| Common Raven | ||
| Common Tern | ||
| Common Yellowthroat | ||
| Coopers Hawk | ||
| Cory's Shearwater | ||
| Crested Flycatcher | ||
| Curve Billed Thrasher | ||
| Dark Eyed Junco | ||
| Dickcissel | ||
| Dovekie | ||
| Downy Woodpecker | ||
| Drab Seedeater | ||
| Dunnock | ||
| Eastern Bluebird | ||
| Eastern Meadowlark | ||
| Eastern Phoebe | ||
| Eastern Screech Owl | ||
| Eastern Towhee | ||
| Eastern Wood Pewee | ||
| Eared Grebe | ||
| Egyptian Plover | ||
| Elanus leucurus | ||
| Evening Grosbeak | ||
| Eared Quetzal | ||
| Eurasian Wigeon | ||
| European Starling | ||
| Fabulous Flamingo | ||
| Ferruginous Hawk | ||
| Fiscal Flycatcher | ||
| Flammulated Owl | ||
| Flatbill | ||
| Flesh Footed Shearwater | ||
| Florida Jay | ||
| Fringilla coelebs | ||
| Fulmar | ||
| Gadwall | ||
| Gambel's Quail | ||
| Gannet | ||
| Garden Warbler | ||
| Gnatcatcher | ||
| Godwit | ||
| Golden Eagle | ||
| Golden Winged Warbler | ||
| Goldeneye | ||
| Goldfinch | ||
| Goosander | ||
| Goshawk | ||
| Grace's Warbler | ||
| Grasshopper Sparrow | ||
| Gray Catbird | ||
| Great Black Backed Gull | ||
| Great Blue Heron | ||
| Great Crested Flycatcher | ||
| Great Horned Owl | ||
| Great Kiskadee | ||
| Great Spotted Woodpecker | ||
| Great Tit | ||
| Grebe | ||
| Greenbul | ||
| Green Heron | ||
| Green Tailed Towhee | ||
| Green Winged Teal | ||
| Greenlet | ||
| Grey Kingbird | ||
| Grey Owl | ||
| Grosbeaks | ||
| Grouse | ||
| Gull | ||
| Hairy Woodpecker | ||
| Hammond's Flycatcher | ||
| Harris Hawk | ||
| Harris Sparrow | ||
| Hawaiian Creeper | ||
| Hawaiian Goose | ||
| Hawfinch | ||
| Heathland Francolin | ||
| Herring Gull | ||
| Hoary Puffleg | ||
| Hooded Merganser | ||
| Hooded Oriole | ||
| Hooded Warbler | ||
| Hoopoe | ||
| Horned Auk | ||
| Horned Grebe | ||
| Horned Lark | ||
| House Finch | ||
| House Sparrow | ||
| House Wren | ||
Copilot AI · Jan 30, 2026
This file appears to be a duplicate of test/pipelines/birds.txt and is located at the repository root, which is likely unintentional. Test data files should be placed in the test/pipelines directory along with the related test YAML files. This file should either be removed or moved to the appropriate test directory.
| Albatross | |
| Acorn Woodpecker | |
| American Kestrel | |
| Anna's Hummingbird | |
| Bald Eagle | |
| Baltimore Oriole | |
| Barn Swallow | |
| Belted Kingfisher | |
| Bicolored Antbird | |
| Black Capped Chickadee | |
| Black Skimmer | |
| Blue Jay | |
| Bluebird | |
| Bobolink | |
| Bohemian Waxwing | |
| Brown Creeper | |
| Brown Pelican | |
| Burrowing Owl | |
| California Condor | |
| California Quail | |
| Canada Goose | |
| Cardinal | |
| Caspian Tern | |
| Cedar Waxwing | |
| Chestnut Sided Warbler | |
| Chimney Swift | |
| Chipping Sparrow | |
| Clark's Nutcracker | |
| Clay Colored Sparrow | |
| Cliff Swallow | |
| Columbiformes | |
| Common Eider | |
| Common Goldeneye | |
| Common Grackle | |
| Common Loon | |
| Common Merganser | |
| Common Raven | |
| Common Tern | |
| Common Yellowthroat | |
| Coopers Hawk | |
| Cory's Shearwater | |
| Crested Flycatcher | |
| Curve Billed Thrasher | |
| Dark Eyed Junco | |
| Dickcissel | |
| Dovekie | |
| Downy Woodpecker | |
| Drab Seedeater | |
| Dunnock | |
| Eastern Bluebird | |
| Eastern Meadowlark | |
| Eastern Phoebe | |
| Eastern Screech Owl | |
| Eastern Towhee | |
| Eastern Wood Pewee | |
| Eared Grebe | |
| Egyptian Plover | |
| Elanus leucurus | |
| Evening Grosbeak | |
| Eared Quetzal | |
| Eurasian Wigeon | |
| European Starling | |
| Fabulous Flamingo | |
| Ferruginous Hawk | |
| Fiscal Flycatcher | |
| Flammulated Owl | |
| Flatbill | |
| Flesh Footed Shearwater | |
| Florida Jay | |
| Fringilla coelebs | |
| Fulmar | |
| Gadwall | |
| Gambel's Quail | |
| Gannet | |
| Garden Warbler | |
| Gnatcatcher | |
| Godwit | |
| Golden Eagle | |
| Golden Winged Warbler | |
| Goldeneye | |
| Goldfinch | |
| Goosander | |
| Goshawk | |
| Grace's Warbler | |
| Grasshopper Sparrow | |
| Gray Catbird | |
| Great Black Backed Gull | |
| Great Blue Heron | |
| Great Crested Flycatcher | |
| Great Horned Owl | |
| Great Kiskadee | |
| Great Spotted Woodpecker | |
| Great Tit | |
| Grebe | |
| Greenbul | |
| Green Heron | |
| Green Tailed Towhee | |
| Green Winged Teal | |
| Greenlet | |
| Grey Kingbird | |
| Grey Owl | |
| Grosbeaks | |
| Grouse | |
| Gull | |
| Hairy Woodpecker | |
| Hammond's Flycatcher | |
| Harris Hawk | |
| Harris Sparrow | |
| Hawaiian Creeper | |
| Hawaiian Goose | |
| Hawfinch | |
| Heathland Francolin | |
| Herring Gull | |
| Hoary Puffleg | |
| Hooded Merganser | |
| Hooded Oriole | |
| Hooded Warbler | |
| Hoopoe | |
| Horned Auk | |
| Horned Grebe | |
| Horned Lark | |
| House Finch | |
| House Sparrow | |
| House Wren | |
| # This file previously contained duplicate bird test data. | |
| # The authoritative test data file now lives at: test/pipelines/birds.txt | |
| # This stub is kept only to avoid reintroducing the duplicate by accident. |
| w, _ := zipWriter.Create(z.FileName) | ||
| w.Write(b) | ||
|
|
||
| zipWriter.Close() | ||
|
|
Copilot AI · Jan 30, 2026
The error returned from w.Write is silently ignored. This could result in incomplete data being written to the zip archive without any indication of failure. The error should be checked and handled appropriately.
| w, _ := zipWriter.Create(z.FileName) | |
| w.Write(b) | |
| zipWriter.Close() | |
| w, err := zipWriter.Create(z.FileName) | |
| if err != nil { | |
| log.Fatal(err) | |
| } | |
| _, err = w.Write(b) | |
| if err != nil { | |
| log.Fatal(err) | |
| } | |
| if err := zipWriter.Close(); err != nil { | |
| log.Fatal(err) | |
| } |
| @@ -0,0 +1,239 @@ | |||
| # Archive Task | |||
|
|
|||
| The `archive` task pack and unpack file data in various archive formats (TAR, ZIP), enabling efficient data packaging and extraction within pipelines. | |||
Copilot AI · Jan 30, 2026
Grammar error: "pack and unpack" should be "packs and unpacks" to agree with the singular subject "task".
| The `archive` task pack and unpack file data in various archive formats (TAR, ZIP), enabling efficient data packaging and extraction within pipelines. | |
| The `archive` task packs and unpacks file data in various archive formats (TAR, ZIP), enabling efficient data packaging and extraction within pipelines. |
|
|
||
| The archive task handles two primary operations: | ||
| - **Pack**: Creates archives from input data (e.g., create a ZIP or TAR file) | ||
| - **Unpack**: Extract archives to retrieve individual files |
Copilot AI · Jan 30, 2026
Grammar error: "Extract archives" should be "Extracts archives" to match the verb form used in the "Pack" bullet point.
| - **Unpack**: Extract archives to retrieve individual files | |
| - **Unpack**: Extracts archives to retrieve individual files |
| if err != nil || err != io.EOF { | ||
| break | ||
| } |
Copilot AI · Jan 30, 2026
The condition if err != nil || err != io.EOF is logically incorrect and will always be true when err is not nil, causing the loop to break immediately on any error including io.EOF (which is the expected end-of-file condition). This means the tar unpacking will fail to extract any files. The condition should be if err == io.EOF to break on EOF, or if err != nil && err != io.EOF to break on actual errors while continuing on EOF.
| if err != nil || err != io.EOF { | |
| break | |
| } | |
| if err == io.EOF { | |
| break | |
| } | |
| if err != nil { | |
| log.Fatal(err) | |
| } |
| } | ||
|
|
||
| if err := tw.Close(); err != nil { | ||
| log.Fatal(err) |
Copilot AI · Jan 30, 2026
Using log.Fatal in library/task code will abruptly terminate the entire application, which is inappropriate for a pipeline task. Errors should be propagated back through the Run method's return value to allow the calling code to handle them gracefully. The archiver interface methods should return errors, and the Run method should handle those errors appropriately.
Description

This pull request introduces a new `archive` pipeline task that supports packing and unpacking files in ZIP and TAR formats. It adds the core implementation, supporting code, documentation, and test pipelines to enable efficient data packaging and extraction within pipelines.

Key changes:

Archive Task Implementation
- Added the `archive` task, supporting both `pack` (archive creation) and `unpack` (archive extraction) actions, with configurable format (`zip` or `tar`) and file naming. (internal/pkg/pipeline/task/archive/archive.go)
- Added the ZIP and TAR format implementations. (internal/pkg/pipeline/task/archive/zip.go, internal/pkg/pipeline/task/archive/tar.go) [1] [2]

Pipeline Integration
- Registered the `archive` task in the supported tasks map, enabling its use in YAML pipeline definitions. (internal/pkg/pipeline/tasks.go) [1] [2]
- Added documentation for the `archive` task. (internal/pkg/pipeline/task/archive/README.md)

Testing and Examples
- Added test pipelines and test data (`birds.txt`). (test/pipelines/zip_pack_test.yaml, test/pipelines/zip_unpack_test.yaml, test/pipelines/tar_unpack_multifile_test.yaml, test/pipelines/birds.txt) [1] [2] [3] [4]

Other
- Related changes to the compress task's formats. (internal/pkg/pipeline/task/compress/formats.go)

These changes provide robust archive handling within pipelines, allowing users to compress, extract, and manage files efficiently as part of their data workflows.