Conversation

@ma-gk ma-gk commented Jan 28, 2026

Description

This pull request introduces a new archive pipeline task that packs and unpacks files in ZIP and TAR formats. It adds the core implementation, supporting code, documentation, and test pipelines to enable efficient data packaging and extraction.

Key changes:

Archive Task Implementation

  • Added the archive task, supporting both pack (archive creation) and unpack (archive extraction) actions, with configurable format (zip or tar) and file naming. (internal/pkg/pipeline/task/archive/archive.go)
  • Implemented ZIP and TAR archiver logic for reading from and writing to archives, handling regular files and file metadata. (internal/pkg/pipeline/task/archive/zip.go, internal/pkg/pipeline/task/archive/tar.go) [1] [2]

Pipeline Integration

  • Registered the new archive task in the supported tasks map, enabling its use in YAML pipeline definitions. (internal/pkg/pipeline/tasks.go) [1] [2]
  • Added comprehensive documentation describing usage, configuration, formats, error handling, and security considerations for the archive task. (internal/pkg/pipeline/task/archive/README.md)

Testing and Examples

  • Added sample pipelines and data to test ZIP and TAR packing/unpacking, including real-world scenarios and a sample file (birds.txt). (test/pipelines/zip_pack_test.yaml, test/pipelines/zip_unpack_test.yaml, test/pipelines/tar_unpack_multifile_test.yaml, test/pipelines/birds.txt) [1] [2] [3] [4]

Other

  • Updated the Snappy compression writer to use a buffered writer for improved performance. (internal/pkg/pipeline/task/compress/formats.go)

These changes provide robust archive handling within pipelines, allowing users to pack, extract, and manage files efficiently as part of their data workflows.

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation and I have updated the documentation accordingly.
  • I have added tests to cover my changes.

@ma-gk ma-gk requested a review from a team as a code owner January 28, 2026 07:38
Copilot AI review requested due to automatic review settings January 28, 2026 07:38

Copilot AI left a comment


Pull request overview

This pull request aims to add support for the ZIP compression format to the pipeline's compression task. However, the implementation uses the compress/flate package (raw DEFLATE algorithm) rather than the archive/zip package (ZIP file format), which are different formats. The PR also includes test files and test data for validating the new compression format.

Changes:

  • Added "zip" format handler in formats.go using compress/flate for DEFLATE compression/decompression
  • Changed snappy writer from NewWriter to NewBufferedWriter (unrelated to ZIP support)
  • Added birds.txt test data file containing 124 bird names
  • Added zip_compress_test.yaml and zip_decompress_test.yaml test pipeline configurations

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

File Description
internal/pkg/pipeline/task/compress/formats.go Adds "zip" format handler using compress/flate package and modifies snappy writer implementation
test/pipelines/birds.txt Provides test data file with list of bird names for compression/decompression testing
test/pipelines/zip_compress_test.yaml Test pipeline configuration to compress birds.txt into compressed_birds.zip
test/pipelines/zip_decompress_test.yaml Test pipeline configuration to decompress compressed_birds.zip back to text


format: zip
action: decompress
- name: store_compressed_file
type: file

Copilot AI Jan 28, 2026


There is trailing whitespace after "file" on this line. This is inconsistent with the formatting on line 3. Please remove the trailing whitespace.

Suggested change
type: file
type: file

Evening Grosbeak
Eared Quetzal
Eurasian Wigeon
Eurpean Starling

Copilot AI Jan 28, 2026


There is a spelling error: "Eurpean" should be "European".

Suggested change
Eurpean Starling
European Starling

Comment on lines 32 to 39
`zip`: {
NewReader: func(r io.Reader) (io.ReadCloser, error) {
return flate.NewReader(r), nil
},
NewWriter: func(w io.Writer) io.WriteCloser {
writer, _ := flate.NewWriter(w, flate.DefaultCompression)
return writer
}},

Copilot AI Jan 28, 2026


The implementation uses compress/flate (DEFLATE) instead of archive/zip (ZIP format). These are different compression formats:

  • DEFLATE is the raw compression algorithm
  • ZIP is a file archive format that uses DEFLATE but includes file metadata, directory structure, and file headers

The PR description mentions "ZIP file compression" but the implementation only provides raw DEFLATE compression/decompression. If the intent is to support ZIP files (with .zip extension), you should use the archive/zip package instead. If raw DEFLATE compression is the goal, the format name should be changed from "zip" to "deflate" to avoid confusion.

Note that archive/zip requires a different API pattern (working with zip.Writer and zip.Reader, which handle multiple files), so the implementation would need adjustment to work with the streaming pattern used in this codebase.

return flate.NewReader(r), nil
},
NewWriter: func(w io.Writer) io.WriteCloser {
writer, _ := flate.NewWriter(w, flate.DefaultCompression)

Copilot AI Jan 28, 2026


The error returned by flate.NewWriter is being silently ignored with the blank identifier _. This error should be handled properly to maintain consistency with other format handlers and to catch potential issues. While flate.NewWriter only returns an error for invalid compression levels, proper error handling is important for code robustness.

Consider handling the error appropriately. If the error should be propagated, the NewWriter function signature would need to be modified to return an error, similar to how NewReader returns an error.

Suggested change
writer, _ := flate.NewWriter(w, flate.DefaultCompression)
writer, err := flate.NewWriter(w, flate.DefaultCompression)
if err != nil {
panic("compress: failed to create flate writer with default compression: " + err.Error())
}

},
NewWriter: func(w io.Writer) io.WriteCloser {
return snappy.NewWriter(w)
return snappy.NewBufferedWriter(w)

Copilot AI Jan 28, 2026


This change from snappy.NewWriter(w) to snappy.NewBufferedWriter(w) appears to be unrelated to adding ZIP support. While NewBufferedWriter may provide better performance by buffering writes, this change should either:

  1. Be mentioned in the PR description as an additional improvement
  2. Be separated into its own PR
  3. Be reverted if it was changed unintentionally

Please clarify if this change is intentional and document the reason for it.

Suggested change
return snappy.NewBufferedWriter(w)
return snappy.NewWriter(w)

tasks:
- name: pull_names_from_s3
type: file
path: test/pipelines/compressed_birds.zip

Copilot AI Jan 28, 2026


The test attempts to decompress compressed_birds.zip which doesn't exist in the repository. The test file zip_compress_test.yaml generates this file, but the decompression test assumes it already exists at test time.

This creates a dependency where zip_compress_test.yaml must be executed before zip_decompress_test.yaml. Consider:

  1. Adding a comment similar to the one in decompress_test.yaml that states "please execute zip_compress_test.yaml before zip_decompress_test.yaml"
  2. Or including a pre-compressed compressed_birds.zip file in the repository to make the test self-contained

Comment on lines 5 to 9
- name: compress_file
type: compress
format: zip
action: decompress
- name: store_compressed_file

Copilot AI Jan 28, 2026


The task is named "compress_file" but it's performing decompression (action: decompress). This is inconsistent and misleading. Consider renaming it to "decompress_file" to accurately reflect what the task does.

Similarly, the final task is named "store_compressed_file" but it's storing the decompressed result. Consider renaming it to "store_decompressed_file" or "store_uncompressed_file" for clarity.

Suggested change
- name: compress_file
type: compress
format: zip
action: decompress
- name: store_compressed_file
- name: decompress_file
type: compress
format: zip
action: decompress
- name: store_decompressed_file

type: compress
format: zip
- name: store_compressed_file
type: file

Copilot AI Jan 28, 2026


There is trailing whitespace after "file" on this line. This is inconsistent with the formatting on line 3. Please remove the trailing whitespace.

Suggested change
type: file
type: file


Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 14 comments.



Comment on lines +59 to +63
w, _ := zipWriter.Create(z.FileName)
w.Write(b)

zipWriter.Close()


Copilot AI Jan 29, 2026


Multiple error handling issues: (1) The error from zipWriter.Create is being silently ignored with the blank identifier. (2) The error from w.Write is being ignored. (3) The error from zipWriter.Close is being ignored. All of these errors should be checked and handled appropriately, similar to how the compress task handles errors.

Suggested change
w, _ := zipWriter.Create(z.FileName)
w.Write(b)
zipWriter.Close()
w, err := zipWriter.Create(z.FileName)
if err != nil {
log.Fatal(err)
}
if _, err := w.Write(b); err != nil {
log.Fatal(err)
}
if err := zipWriter.Close(); err != nil {
log.Fatal(err)
}

}

if obj.Action != actionCompress && obj.Action != actionDecompress {
return fmt.Errorf("invalid action: %s (must be 'compress' or 'decompress')", obj.Action)

Copilot AI Jan 29, 2026


The error message mentions 'compress' or 'decompress' but the actual valid action values are 'pack' and 'unpack' as defined in the constants at lines 13-14. This inconsistency between the error message and the actual values will confuse users.

Suggested change
return fmt.Errorf("invalid action: %s (must be 'compress' or 'decompress')", obj.Action)
return fmt.Errorf("invalid action: %s (must be 'pack' or 'unpack')", obj.Action)

log.Fatal("file name is required to create tar archive")
}

tw := tar.NewWriter(bytes.NewBuffer(b))

Copilot AI Jan 29, 2026


The tar.NewWriter is incorrectly initialized with bytes.NewBuffer(b), which creates a new buffer containing a copy of the input data. This should be an empty buffer like bytes.NewBuffer(nil) or new(bytes.Buffer). The current implementation would prepend the raw input data before the tar archive data, resulting in a corrupted output.

Comment on lines 54 to 62
_, err := tw.Write(b)
if err != nil {
log.Fatal(err)
}

err = tw.Close()
if err != nil {
log.Fatal(err)
}

Copilot AI Jan 29, 2026


Using log.Fatal will terminate the entire program. These errors should be returned instead for proper pipeline error handling.

log.Fatal(err)
}

t.SendData(t.Record.Context, b, t.OutputChan)

Copilot AI Jan 29, 2026


After writing to the tar archive, this sends the original input data (b) instead of the tar archive buffer. This should send the tar writer's buffer contents (e.g., tarBuf.Bytes()) instead. The current implementation would send the uncompressed data rather than the tar archive.

Comment on lines +22 to +25
type archiver interface {
Read(b []byte)
Write(b []byte)
}

Copilot AI Jan 29, 2026


The archiver interface methods Read and Write don't return errors, which prevents proper error handling. These methods should return error values so that errors can be propagated back through the Run method to the caller. Compare with the compress task where compress/decompress methods return errors.
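A sketch of how the interface could return errors so Run can propagate them; the stand-in implementation and `run` dispatcher here are illustrative, not the task's actual code:

```go
package main

import (
	"errors"
	"fmt"
)

// archiver with error returns added, so failures flow back through Run
// instead of terminating the process via log.Fatal.
type archiver interface {
	Read(b []byte) error
	Write(b []byte) error
}

// brokenArchive is a hypothetical implementation used only to show propagation.
type brokenArchive struct{}

func (brokenArchive) Read(b []byte) error  { return errors.New("corrupt archive") }
func (brokenArchive) Write(b []byte) error { return nil }

// run mirrors the action dispatch in Run: the caller now receives the error
// and can apply the pipeline's FailOnError policy instead of exiting.
func run(a archiver, action string, data []byte) error {
	switch action {
	case "pack":
		return a.Write(data)
	case "unpack":
		return a.Read(data)
	}
	return fmt.Errorf("invalid action: %s", action)
}

func main() {
	if err := run(brokenArchive{}, "unpack", nil); err != nil {
		fmt.Println("pipeline sees:", err)
	}
}
```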

Comment on lines 103 to 108
switch c.Action {
case actionCompress:
archiv.Write(r.Data)
case actionDecompress:
archiv.Read(r.Data)
}

Copilot AI Jan 29, 2026


The Read and Write calls don't capture or handle any errors because the archiver interface methods don't return errors. If an error occurs in Read or Write, it will either terminate the program via log.Fatal or be silently ignored, but cannot be properly handled by the Run method. This needs to be addressed by updating the archiver interface to return errors.

Comment on lines +32 to +33
if _, err := io.ReadFull(r, buf); err != nil && err != io.EOF {
log.Fatal(err)

Copilot AI Jan 29, 2026


Using log.Fatal will terminate the entire program. This should return an error instead so the pipeline can handle it according to the FailOnError configuration. The Read and Write methods should return errors.

Comment on lines +43 to +46
if t.FileName == "" {
log.Fatal("file name is required to create tar archive")
}


Copilot AI Jan 29, 2026


This error check should not use log.Fatal as it terminates the entire program. The validation should occur in UnmarshalYAML during configuration parsing, not during runtime execution. This check is redundant since validation already happens in archive.go at lines 55-58.

Suggested change
if t.FileName == "" {
log.Fatal("file name is required to create tar archive")
}

"gopkg.in/yaml.v3"

"github.com/patterninc/caterpillar/internal/pkg/pipeline/task"
"github.com/patterninc/caterpillar/internal/pkg/pipeline/task/archive"

Copilot AI Jan 29, 2026


The PR description states "Support for ZIP file compression and decompression" and mentions only ZIP/DEFLATE format, but the code actually implements support for both ZIP and TAR archive formats. The description should be updated to accurately reflect that both archive formats are being added, or the TAR implementation should be removed if it's out of scope for this PR.

@ma-gk ma-gk changed the title from "Support for ZIP file compression and decompression." to "Support for Archives(zip,tar)" on Jan 30, 2026

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 16 comments.



Comment on lines 20 to 34
func (t *tarArchive) Read(b []byte) {
r := tar.NewReader(bytes.NewReader(b))

for {
header, err := r.Next()
if err != nil || err != io.EOF {
break
}

// check the file type is regular file
if header.Typeflag == tar.TypeReg {
buf := make([]byte, header.Size)
if _, err := io.ReadFull(r, buf); err != nil && err != io.EOF {
log.Fatal(err)
}

Copilot AI Jan 30, 2026


The archive task has no protection against ZIP bomb attacks or excessively large archives. When unpacking, the code allocates memory based on the file size reported in the archive header without any validation. A malicious archive could claim extremely large file sizes, causing out-of-memory errors or system crashes. Consider adding limits on individual file sizes and total archive size, or at least document the risk more explicitly with recommended mitigations.

Suggested change
func (t *tarArchive) Read(b []byte) {
r := tar.NewReader(bytes.NewReader(b))
for {
header, err := r.Next()
if err != nil || err != io.EOF {
break
}
// check the file type is regular file
if header.Typeflag == tar.TypeReg {
buf := make([]byte, header.Size)
if _, err := io.ReadFull(r, buf); err != nil && err != io.EOF {
log.Fatal(err)
}
const (
// maxTarEntrySize limits the size of a single file extracted from a tar archive.
// This helps prevent excessive memory allocation from a single malicious entry.
maxTarEntrySize int64 = 100 * 1024 * 1024 // 100 MiB
// maxTarTotalSize limits the total size of all files extracted from a tar archive.
// This helps mitigate "zip bomb" style attacks using many smaller entries.
maxTarTotalSize int64 = 1 * 1024 * 1024 * 1024 // 1 GiB
// maxAllocSize is the largest value that can safely be converted to int for allocations.
maxAllocSize int64 = int64(^uint(0) >> 1)
)
func (t *tarArchive) Read(b []byte) {
r := tar.NewReader(bytes.NewReader(b))
var totalSize int64
for {
header, err := r.Next()
if err == io.EOF {
break
}
if err != nil {
log.Fatal(err)
}
// check the file type is regular file
if header.Typeflag == tar.TypeReg {
// Validate header size to prevent excessive memory allocation.
if header.Size < 0 {
log.Fatal("tar entry has negative size")
}
if header.Size > maxTarEntrySize {
log.Fatalf("tar entry too large: %d bytes (limit %d)", header.Size, maxTarEntrySize)
}
if totalSize+header.Size > maxTarTotalSize {
log.Fatalf("tar archive too large: %d bytes processed/extracted would exceed limit %d", totalSize+header.Size, maxTarTotalSize)
}
if header.Size > maxAllocSize {
log.Fatalf("tar entry size %d exceeds maximum allocatable size", header.Size)
}
buf := make([]byte, header.Size)
if _, err := io.ReadFull(r, buf); err != nil {
log.Fatal(err)
}
totalSize += header.Size

}

if _, err := tw.Write(b); err != nil {
log.Fatal(err)

Copilot AI Jan 30, 2026


Using log.Fatal in library/task code will abruptly terminate the entire application, which is inappropriate for a pipeline task. Errors should be propagated back through the Run method's return value to allow the calling code to handle them gracefully. The archiver interface methods should return errors, and the Run method should handle those errors appropriately.


_, err = rc.Read(buf)
if err != nil && !errors.Is(err, io.EOF) {
log.Fatal(err)

Copilot AI Jan 30, 2026


Using log.Fatal in library/task code will abruptly terminate the entire application, which is inappropriate for a pipeline task. Errors should be propagated back through the Run method's return value to allow the calling code to handle them gracefully. The archiver interface methods should return errors, and the Run method should handle those errors appropriately.

Comment on lines +1 to +111
package archive

import (
"fmt"

"github.com/patterninc/caterpillar/internal/pkg/pipeline/record"
"github.com/patterninc/caterpillar/internal/pkg/pipeline/task"
)

type actionType string

const (
actionPack actionType = `pack`
actionUnpack actionType = `unpack`
)

const (
defaultFormat = `zip`
defaultAction = `pack`
)

type archiver interface {
Read(b []byte)
Write(b []byte)
}

type core struct {
task.Base `yaml:",inline" json:",inline"`
Format string `yaml:"format,omitempty" json:"format,omitempty"`
Action actionType `yaml:"action,omitempty" json:"action,omitempty"`
FileName string `yaml:"file_name,omitempty" json:"file_name,omitempty"`
}

func New() (task.Task, error) {
return &core{
Format: defaultFormat,
Action: defaultAction,
}, nil
}

func (c *core) UnmarshalYAML(unmarshal func(interface{}) error) error {
type raw core
obj := raw{
Format: defaultFormat,
Action: defaultAction,
}
if err := unmarshal(&obj); err != nil {
return err
}

if obj.Action != actionPack && obj.Action != actionUnpack {
return fmt.Errorf("invalid action: %s (must be 'pack' or 'unpack')", obj.Action)
}

if obj.Action == actionPack {
if obj.FileName == "" {
return fmt.Errorf("file_name must be specified when action is 'pack'")
}
}

*c = core(obj)

return nil
}

func (c *core) Run(input <-chan *record.Record, output chan<- *record.Record) (err error) {

if input == nil {
return task.ErrNilInput
}

for {
r, ok := c.GetRecord(input)
if !ok {
break
}

if len(r.Data) == 0 {
continue
}

var archiv archiver

switch c.Format {
case "tar":
archiv = &tarArchive{
Base: &c.Base,
FileName: c.FileName,
Record: r,
OutputChan: output,
}
case "zip":
archiv = &zipArchive{
Base: &c.Base,
FileName: c.FileName,
Record: r,
OutputChan: output,
}
default:
return fmt.Errorf("unsupported format: %s", c.Format)
}

switch c.Action {
case actionPack:
archiv.Write(r.Data)
case actionUnpack:
archiv.Read(r.Data)
}
}
return nil
}

Copilot AI Jan 30, 2026


The new archive task implementation lacks unit tests. Given that the repository has comprehensive test coverage for other components (e.g., internal/pkg/pipeline/dag_test.go), unit tests should be added to verify the pack and unpack operations for both ZIP and TAR formats, error handling paths, and edge cases such as empty archives, corrupted archives, and files with various sizes.
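One such test could assert a pack/unpack roundtrip invariant. A standalone sketch using archive/zip directly (the helper name is illustrative; a real unit test would exercise the task's own Run method and cover TAR as well):

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// zipRoundtrip packs data under name, then unpacks it again and returns
// the recovered bytes. A roundtrip test asserts output == input.
func zipRoundtrip(name string, data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create(name)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}

	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return nil, err
	}
	rc, err := zr.File[0].Open()
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	return io.ReadAll(rc)
}

func main() {
	got, err := zipRoundtrip("birds.txt", []byte("Canada Goose\n"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(got) == "Canada Goose\n")
}
```

Edge cases worth adding on top of the roundtrip: empty input, a truncated archive, and a multi-file TAR.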

Comment on lines +1 to +124
Albatross
Acorn Woodpecker
American Kestrel
Anna's Hummingbird
Bald Eagle
Baltimore Oriole
Barn Swallow
Belted Kingfisher
Bicolored Antbird
Black Capped Chickadee
Black Skimmer
Blue Jay
Bluebird
Bobolink
Bohemian Waxwing
Brown Creeper
Brown Pelican
Burrowing Owl
California Condor
California Quail
Canada Goose
Cardinal
Caspian Tern
Cedar Waxwing
Chestnut Sided Warbler
Chimney Swift
Chipping Sparrow
Clark's Nutcracker
Clay Colored Sparrow
Cliff Swallow
Columbiformes
Common Eider
Common Goldeneye
Common Grackle
Common Loon
Common Merganser
Common Raven
Common Tern
Common Yellowthroat
Coopers Hawk
Cory's Shearwater
Crested Flycatcher
Curve Billed Thrasher
Dark Eyed Junco
Dickcissel
Dovekie
Downy Woodpecker
Drab Seedeater
Dunnock
Eastern Bluebird
Eastern Meadowlark
Eastern Phoebe
Eastern Screech Owl
Eastern Towhee
Eastern Wood Pewee
Eared Grebe
Egyptian Plover
Elanus leucurus
Evening Grosbeak
Eared Quetzal
Eurasian Wigeon
European Starling
Fabulous Flamingo
Ferruginous Hawk
Fiscal Flycatcher
Flammulated Owl
Flatbill
Flesh Footed Shearwater
Florida Jay
Fringilla coelebs
Fulmar
Gadwall
Gambel's Quail
Gannet
Garden Warbler
Gnatcatcher
Godwit
Golden Eagle
Golden Winged Warbler
Goldeneye
Goldfinch
Goosander
Goshawk
Grace's Warbler
Grasshopper Sparrow
Gray Catbird
Great Black Backed Gull
Great Blue Heron
Great Crested Flycatcher
Great Horned Owl
Great Kiskadee
Great Spotted Woodpecker
Great Tit
Grebe
Greenbul
Green Heron
Green Tailed Towhee
Green Winged Teal
Greenlet
Grey Kingbird
Grey Owl
Grosbeaks
Grouse
Gull
Hairy Woodpecker
Hammond's Flycatcher
Harris Hawk
Harris Sparrow
Hawaiian Creeper
Hawaiian Goose
Hawfinch
Heathland Francolin
Herring Gull
Hoary Puffleg
Hooded Merganser
Hooded Oriole
Hooded Warbler
Hoopoe
Horned Auk
Horned Grebe
Horned Lark
House Finch
House Sparrow
House Wren
(no newline at end of file)

Copilot AI Jan 30, 2026


This file appears to be a duplicate of test/pipelines/birds.txt and is located at the repository root, which is likely unintentional. Test data files should be placed in the test/pipelines directory along with the related test YAML files. This file should either be removed or moved to the appropriate test directory.

Suggested change
# This file previously contained duplicate bird test data.
# The authoritative test data file now lives at: test/pipelines/birds.txt
# This stub is kept only to avoid reintroducing the duplicate by accident.

Comment on lines +59 to +63
w, _ := zipWriter.Create(z.FileName)
w.Write(b)

zipWriter.Close()


Copilot AI Jan 30, 2026


The error returned from w.Write is silently ignored. This could result in incomplete data being written to the zip archive without any indication of failure. The error should be checked and handled appropriately.

Suggested change
w, _ := zipWriter.Create(z.FileName)
w.Write(b)
zipWriter.Close()
w, err := zipWriter.Create(z.FileName)
if err != nil {
log.Fatal(err)
}
_, err = w.Write(b)
if err != nil {
log.Fatal(err)
}
if err := zipWriter.Close(); err != nil {
log.Fatal(err)
}

@@ -0,0 +1,239 @@
# Archive Task

The `archive` task pack and unpack file data in various archive formats (TAR, ZIP), enabling efficient data packaging and extraction within pipelines.

Copilot AI Jan 30, 2026


Grammar error: "pack and unpack" should be "packs and unpacks" to agree with the singular subject "task".

Suggested change
The `archive` task pack and unpack file data in various archive formats (TAR, ZIP), enabling efficient data packaging and extraction within pipelines.
The `archive` task packs and unpacks file data in various archive formats (TAR, ZIP), enabling efficient data packaging and extraction within pipelines.


The archive task handles two primary operations:
- **Pack**: Creates archives from input data (e.g., create a ZIP or TAR file)
- **Unpack**: Extract archives to retrieve individual files

Copilot AI Jan 30, 2026


Grammar error: "Extract archives" should be "Extracts archives" to match the verb form used in the "Pack" bullet point.

Suggested change
- **Unpack**: Extract archives to retrieve individual files
- **Unpack**: Extracts archives to retrieve individual files

Comment on lines 25 to 27
if err != nil || err != io.EOF {
break
}

Copilot AI Jan 30, 2026


The condition if err != nil || err != io.EOF is logically incorrect: it is always true (when err is nil the second clause holds; when err is non-nil the first does), so the loop breaks on its very first iteration and the tar unpacking never extracts any files. The condition should be if err == io.EOF to break on EOF, or if err != nil && err != io.EOF to break on actual errors while continuing on EOF.

Suggested change
if err != nil || err != io.EOF {
break
}
if err == io.EOF {
break
}
if err != nil {
log.Fatal(err)
}

}

if err := tw.Close(); err != nil {
log.Fatal(err)

Copilot AI Jan 30, 2026


Using log.Fatal in library/task code will abruptly terminate the entire application, which is inappropriate for a pipeline task. Errors should be propagated back through the Run method's return value to allow the calling code to handle them gracefully. The archiver interface methods should return errors, and the Run method should handle those errors appropriately.
