@shashbha14 shashbha14 commented Jan 14, 2026

Rationale for this change

Fixes a crash when writing zero-length POSIXct vectors with an empty tzone attribute to Parquet in R 4.5.2.

What changes are included in this PR?

  • Added length check for tzone attribute in InferArrowTypeFromVector (both INTSXP and REALSXP specializations)
  • Added regression test in test-issue-48832.R
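
To illustrate the kind of length check described above, here is a minimal, self-contained sketch in plain C++. It does not use R's C API or Arrow: `TzoneAttr` and `ResolveTimezone` are hypothetical names invented for this example, and the fallback value is an assumption, since the description does not say which timezone is used when the attribute is empty.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an R character-vector attribute.
// std::nullopt models a missing (NULL) tzone attribute.
using TzoneAttr = std::optional<std::vector<std::string>>;

// Picks the timezone string to record on the inferred timestamp type.
// `fallback` is an assumption for illustration; the actual PR may choose
// a different default (for example the system timezone).
std::string ResolveTimezone(const TzoneAttr& tzone, const std::string& fallback) {
  // Attribute missing entirely: nothing to read.
  if (!tzone.has_value()) {
    return fallback;
  }
  // Attribute present but zero-length: reading element 0 would be an
  // out-of-bounds access, so check the length first.
  if (tzone->empty()) {
    return fallback;
  }
  // Safe: at least one element exists.
  return tzone->front();
}
```

The point of the sketch is the ordering: the emptiness check comes before any access to the first element, so a zero-length attribute is treated like a missing one rather than being indexed.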

Are these changes tested?

Yes. The new test case reproduces the issue and verifies the fix.

Closes #48832

Add safety check for empty tzone attribute in InferArrowTypeFromVector
to prevent out-of-bounds access when handling zero-length POSIXct vectors.
…t/Ubuntu

Added Development Tool Requirements section to cpp/development.rst
documenting the required versions:
- clang-format 14.0.6+
- pre-commit 2.17.0+
- Ubuntu 22.04 LTS+
Member

@jonkeane jonkeane left a comment

Thanks for submitting the change. The devtooling documentation should go in a separate PR IMO.

Would you mind moving the tests to where we test other examples of POSIXct? The approach to solve it looks solid, but CI will determine if we are good.

Comment on lines +18 to +35
test_that("zero-length POSIXct with empty tzone attribute handled safely", {
  x <- as.POSIXct(character(0))
  attr(x, "tzone") <- character(0)

  # Should not crash or error
  expect_error(type(x), NA)

  # Should default to no timezone (or empty string which effectively means local/no-tz behavior in arrow)
  # When sys.timezone is picked up it might vary, but we just check it doesn't crash.
  # If it picks up Sys.timezone(), checking exact equality might be flaky across environments if not mocked.
  # So we primarily check for no error.

  # Also check write_parquet survival
  tf <- tempfile()
  on.exit(unlink(tf))
  expect_error(write_parquet(data.frame(x = x), tf), NA)
  expect_true(file.exists(tf))
})

Would you mind putting this test somewhere around

test_that("array supports POSIXct (ARROW-3340)", {
  times <- lubridate::ymd_hms("2018-10-07 19:04:05") + 1:10
  expect_array_roundtrip(times, timestamp("us", "UTC"))
  times[5] <- NA
  expect_array_roundtrip(times, timestamp("us", "UTC"))
  times2 <- lubridate::ymd_hms("2018-10-07 19:04:05", tz = "America/New_York") + 1:10
  expect_array_roundtrip(times2, timestamp("us", "America/New_York"))
})
test_that("array uses local timezone for POSIXct without timezone", {
  withr::with_envvar(c(TZ = ""), {
    times <- strptime("2019-02-03 12:34:56", format = "%Y-%m-%d %H:%M:%S") + 1:10
    expect_equal(attr(times, "tzone"), NULL)
    expect_array_roundtrip(times, timestamp("us", Sys.timezone()))
    # Also test the INTSXP code path
    skip("Ingest_POSIXct only implemented for REALSXP")
    times_int <- as.integer(times)
    attributes(times_int) <- attributes(times)
    expect_array_roundtrip(times_int, timestamp("us", ""))
  })
  # If there is a timezone set, we record that
  withr::with_timezone("Pacific/Marquesas", {
    times <- strptime("2019-02-03 12:34:56", format = "%Y-%m-%d %H:%M:%S") + 1:10
    expect_equal(attr(times, "tzone"), "Pacific/Marquesas")
    expect_array_roundtrip(times, timestamp("us", "Pacific/Marquesas"))
    times_with_tz <- strptime(
      "2019-02-03 12:34:56",
      format = "%Y-%m-%d %H:%M:%S",
      tz = "Asia/Katmandu"
    ) +
      1:10
    expect_equal(attr(times, "tzone"), "Asia/Katmandu")
    expect_array_roundtrip(times, timestamp("us", "Asia/Katmandu"))
  })
  # and although the TZ is NULL in R, we set it to the Sys.timezone()
  withr::with_timezone(NA, {
    times <- strptime("2019-02-03 12:34:56", format = "%Y-%m-%d %H:%M:%S") + 1:10
    expect_equal(attr(times, "tzone"), NULL)
    expect_array_roundtrip(times, timestamp("us", Sys.timezone()))
  })
})

and possibly for the parquet section, put it somewhere near

test_that("write_parquet() can truncate timestamps", {
  tab <- Table$create(x1 = as.POSIXct("2020/06/03 18:00:00", tz = "UTC"))
  expect_type_equal(tab$x1, timestamp("us", "UTC"))
  tf <- tempfile()
  on.exit(unlink(tf))
  write_parquet(tab, tf, coerce_timestamps = "ms", allow_truncated_timestamps = TRUE)
  new <- read_parquet(tf, as_data_frame = FALSE)
  expect_type_equal(new$x1, timestamp("ms", "UTC"))
  expect_equal(as.data.frame(tab), as.data.frame(new))
})
test_that("make_valid_parquet_version()", {
  expect_equal(
    make_valid_parquet_version("1.0"),
    ParquetVersionType$PARQUET_1_0
  )
  expect_equal(
    make_valid_parquet_version("2.4"),
    ParquetVersionType$PARQUET_2_4
  )
  expect_equal(
    make_valid_parquet_version("2.6"),
    ParquetVersionType$PARQUET_2_6
  )
  expect_equal(
    make_valid_parquet_version("latest"),
    ParquetVersionType$PARQUET_2_6
  )
  expect_equal(make_valid_parquet_version(1), ParquetVersionType$PARQUET_1_0)
  expect_equal(make_valid_parquet_version(1.0), ParquetVersionType$PARQUET_1_0)
  expect_equal(make_valid_parquet_version(2.4), ParquetVersionType$PARQUET_2_4)
})

Development

Successfully merging this pull request may close these issues.

[R] arrow::write_parquet error with zero-length datetimes in R 4.5.2