
Conversation

@gustavocidornelas (Contributor)

Pull Request

Summary

Introduces:

  1. Multimodal trace support for the OpenAI Chat Completions and Responses APIs (images, audio, and files).
  2. "Attachment" support for every trace step.

Changes

Attachments

  • Introduces the Attachment abstraction, which represents media data uploaded to a blob store. The Attachment object stores metadata and the storageUri, which can be used by the Openlayer platform to fetch the media.
  • Every step can have an attachments field, which is an array of Attachment objects. This allows users to log arbitrary media to a step. For example:
from openlayer.lib import trace
from openlayer.lib.tracing import log_attachment

@trace()
def my_function():
    # Do something

    # Log attachment to the `my_function` step
    log_attachment("/path/to/file")
    return
  • When streaming the trace to the Openlayer platform, we now scan the trace for attachments. If any are found, they are uploaded first and then the trace is streamed. The upload happens via the typical presigned URL flow.

OpenAI multimodal

  • Besides adding attachments, this PR also instruments the trace_openai wrapper to parse images/audio/files in the inputs or outputs of OpenAI LLM calls.
  • To support media in OpenAI traces, we extend how inputs/outputs are represented. In summary, the schema is:
  # -------------- Inputs --------------------
  {
      "prompt": [
          # Old format. We still use it when the prompt only has strings
          {
              "role": "user",
              "content": "Simple text message"  # String for text-only (backwards compatibility)
          },
           # New format. Content as a list of objects with `type` (one of `text`, `image`, `audio`, or `file`)
          {
              "role": "user",
              "content": [  # List for multimodal
                  {"type": "text", "text": "What's in this image?"},
                  {"type": "image", "attachment": {"id": "...", "storageUri": "...", ...}}
              ]
          }
      ]
  }

# -------------- Output --------------------
  # Old format. We still use it if the output is a simple string
  "Simple text response"  # String for text-only (backwards compatibility)

  # or
  {"type": "text", "text": "Text response"}

  # or
  {"type": "audio", "attachment": {"id": "...", "storageUri": "...", ...}}

  # or mixed
  [
      {"type": "text", "text": "Here's the image you requested:"},
      {"type": "image", "attachment": {...}}
  ]

Note that if the type is one of image, audio, or file, the object's other field is attachment, which holds a serialized Attachment object.
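The input-side normalization implied by this schema can be sketched as below. This is a simplified stand-in (the hypothetical `normalize_message` helper is not the SDK's real function, and the real tracer also converts media parts into serialized Attachment objects rather than passing dicts through):

```python
def normalize_message(message: dict) -> dict:
    """Keep plain-string content as-is (old format); otherwise emit a
    list of typed parts per the schema above. Sketch only."""
    content = message.get("content")
    if isinstance(content, str):
        return message  # old format: text-only content stays a string
    parts = []
    for item in content or []:
        kind = item.get("type", "")
        if kind == "text":
            parts.append({"type": "text", "text": item.get("text", "")})
        elif kind in ("image", "audio", "file"):
            # media parts carry a serialized Attachment instead of raw data
            parts.append({"type": kind, "attachment": item.get("attachment", {})})
        else:
            parts.append(item)  # unknown type: pass through unchanged
    return {"role": message.get("role"), "content": parts}
```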

Context

OPEN-8683: Multimodal attachment support for the Python SDK and OPEN-8684: Enhance OpenAI tracer to support multimodal inputs/outputs

Testing

  • Manual testing

if name:
    processed_msg["name"] = name
processed_messages.append(processed_msg)
else:
@shah-siddd (Contributor), Jan 22, 2026:

Will this cause issues when the content type is unknown?

Contributor Author (@gustavocidornelas):

When we encounter an unknown content type, we preserve the original message unchanged. If OpenAI introduces a new content type we don't recognize, we don't break the trace; we just pass it through as-is. The trace will still work, we just won't have special handling for that new type.


Should it log a warning or emit an alert so we can learn about this new type and add it later?
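One lightweight way to do what this comment suggests (a hypothetical helper; the logger name and `_KNOWN_CONTENT_TYPES` set are illustrative, not the SDK's actual code):

```python
import logging

logger = logging.getLogger("openlayer.tracing")

_KNOWN_CONTENT_TYPES = {"text", "image", "audio", "file"}

def check_content_type(item: dict) -> dict:
    """Pass unknown content items through unchanged, but warn so new
    OpenAI content types surface in logs instead of disappearing
    silently without special handling."""
    item_type = item.get("type")
    if item_type not in _KNOWN_CONTENT_TYPES:
        logger.warning(
            "Unknown content type %r; passing item through as-is", item_type
        )
    return item
```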

elif item_type == "file":
    file_data = item.get("file", {})
    file_id = file_data.get("file_id")
    file_data_b64 = file_data.get("file_data")
Contributor:

What if the base64 is too big? I don't know whether it will cause issues while sending, for example if the request size limit is exceeded.

Contributor Author:

Good question! The base64 data is stored temporarily in the Attachment object, but it's not sent in the trace payload. The flow is:

  1. Base64 data is extracted and stored in Attachment.data_base64
  2. Before sending the trace, upload_trace_attachments() uploads the data to our blob storage and gets a storage_uri
  3. After upload, data_base64 is cleared (set to None) in attachment_uploader.py
  4. Only the storage_uri is serialized into the trace payload

So the actual trace request only contains the URI reference, not the heavy base64 data.

elif file_id:
    # Just reference the file ID (can't download without API call)
    attachment = Attachment(name=filename)
    attachment.metadata["openai_file_id"] = file_id
Contributor:

If we just have openai_file_id, what will we show in the UI?

Contributor Author:

When we only have openai_file_id, we create an attachment with just the filename and store the file_id in metadata. This is a design decision; in the UI, I thought we could show:

  • The filename (if provided, otherwise "file")
  • The openai_file_id reference in metadata
  • No preview/content (since we don't have access to the actual data)

This is a limitation. We could potentially add an option to fetch the file content using the client, but that adds latency and complexity. For now, I thought we could just capture the reference.
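That branch could look roughly like this. The `Attachment` shape and the `attachment_from_file_id` helper are hypothetical stand-ins for the SDK's real code:

```python
from typing import Optional

class Attachment:
    def __init__(self, name: str):
        self.name = name
        self.metadata: dict = {}
        self.data_base64 = None  # no content available for a bare file_id

def attachment_from_file_id(file_id: str, filename: Optional[str] = None) -> Attachment:
    """When only an OpenAI file_id is present, keep the reference but
    no content: the UI can show the name and the id, with no preview."""
    attachment = Attachment(name=filename or "file")
    attachment.metadata["openai_file_id"] = file_id
    return attachment
```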

Contributor:

Let's add this as a comment somewhere so @eluzzi5 can come up with a good design.


Maybe a Linear issue?

    The attachment with storage_uri populated (if upload was needed).
    """
    # Skip if already uploaded
    if attachment.is_uploaded():
Contributor:

Should we still upload data from OpenAI (or anywhere else) to our storage? They might remove that data after some time.

Contributor Author:

Currently, if an attachment has an external URL (like a hosted image URL), we keep that URL and don't re-upload to our storage.

Pros of current approach:

  • Faster (no download + re-upload)
  • Less storage cost

Cons:

  • External URLs may expire
  • Data could become unavailable

I think this is worth discussing as a follow-up improvement. We could add an option like persist_external_urls=True that would download and re-upload external content to ensure long-term availability. What do you think?

Contributor:

I think storage is cheap. We can do it asynchronously on our backend later, but we should surely store it with us. That particular file may be important, and if it expires we miss critical information. We can do it later as a follow-up task, but make it the default.
Thoughts @gustavocidornelas @whoseoyster ?


Is there a compliance risk if we copy data to Openlayer?

Contributor Author:

Good point! The attachment upload feature is opt-in. The user needs to explicitly set attachment_upload_enabled=True when they configure the tracer.

However, we should document this clearly. Worth discussing further.

Contributor:

Agree with copying files over to our storage wherever possible, while allowing opt-out.

try:
    from .attachment_uploader import upload_trace_attachments

    upload_count = upload_trace_attachments(trace)
Contributor:

This _upload_and_publish_trace will be called when we do have an attachment to upload, right? Because if there's nothing to upload, upload_count will be 0.

Contributor Author:

_upload_and_publish_trace is called for every trace, regardless of whether there are attachments. It handles both uploading attachments (if any exist and attachment_upload_enabled=True) and publishing the trace data to Openlayer.

If there are no attachments, upload_count will be 0, but the function still proceeds to publish the trace. The attachment upload is just one step in the trace completion process. The naming could be clearer: it's really "process and publish trace", with attachment upload being an optional first step.
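The control flow the author describes can be sketched with a dependency-injected stand-in. This is not the real function's signature (the real one presumably talks to the uploader and the Openlayer client directly); the callables here just make the "upload if any, publish always" shape explicit:

```python
def upload_and_publish_trace(trace, upload_attachments, publish,
                             attachment_upload_enabled=True):
    """Called for every completed trace: optionally upload attachments
    first, then always publish. Sketch of the described flow."""
    upload_count = 0
    if attachment_upload_enabled:
        upload_count = upload_attachments(trace)  # 0 when nothing to upload
    publish(trace)  # publishing happens regardless of upload_count
    return upload_count
```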

@matheusportela left a comment:

@gustavocidornelas I left a bunch of comments but none is blocking (some are literally just because I'm curious)! Let me know if you have any questions

Comment on lines +118 to +119
upload_fields = dict(fields) if fields else {}
upload_fields["file"] = (object_name, data, content_type)


If you change the fields type signature to fields: Dict = {}, you can simplify this:

upload_fields = {
  "file": (object_name, data, content_type),
  **fields
}

Comment on lines +152 to +154
headers = {"Content-Type": content_type}
if extra_headers:
headers.update(extra_headers)


Similar to the previous comment, this can be simplified by changing extra_headers to a default {}:

extra_headers: Dict[str, str] = {}
headers = {
  "Content-Type": content_type,
  **extra_headers
}

Comment on lines +233 to +235
collected_function_call["arguments"] += delta.tool_calls[
0
].function.arguments


Weird formatting choice by the linter lol

Comment on lines +403 to +406
has_audio = (
    hasattr(response.choices[0].message, "audio")
    and response.choices[0].message.audio is not None
)


Slightly simpler check:

Suggested change:

- has_audio = (
-     hasattr(response.choices[0].message, "audio")
-     and response.choices[0].message.audio is not None
- )
+ has_audio = bool(getattr(response.choices[0].message, "audio", None))

Comment on lines +485 to +487
for item in content:
    content_item = _normalize_content_item(item)
    normalized_content.append(content_item)


This could be a list comprehension:

normalized_content = [_normalize_content_item(item) for item in content]

return Path(self.file_path).exists()
return False

def is_valid(self) -> bool:


What should we do if an attachment is not valid?

    decoded = base64.b64decode(data_base64)
    size_bytes = len(decoded)
    checksum_md5 = hashlib.md5(decoded).hexdigest()
except Exception:


When would this happen?

Comment on lines +365 to +366
# TODO: Implement metadata extraction from file
pass


Is this still relevant?

Comment on lines +97 to +98
# File-like object
file_bytes = data.read()


When would this happen? Can we explicitly check for the classes (e.g. BinaryIO) and have an else clause that raises an error if an unexpected type is passed in? IMO it would be a better developer experience than breaking on data.read(), for instance.

# Reset attachment uploader
from .attachment_uploader import reset_uploader

reset_uploader()


Why do we need to reset the uploader here?

Contributor Author:

The _uploader holds a reference to the Openlayer client. When configure() is called with new credentials/settings, we need to reset the uploader so it creates a new client with the updated configuration. Otherwise, it would continue using the old client.
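The described pattern is a resettable module-level singleton, which can be sketched like this (hypothetical names; the real `attachment_uploader` module presumably builds an actual Openlayer client rather than this stub):

```python
from typing import Optional

class AttachmentUploader:
    def __init__(self, api_key: str):
        self.api_key = api_key  # stands in for the wrapped Openlayer client

_uploader: Optional[AttachmentUploader] = None

def get_uploader(api_key: str) -> AttachmentUploader:
    """Lazily create and cache the module-level uploader."""
    global _uploader
    if _uploader is None:
        _uploader = AttachmentUploader(api_key)
    return _uploader

def reset_uploader() -> None:
    """Drop the cached uploader so the next get_uploader() call builds
    a client with whatever credentials configure() just set."""
    global _uploader
    _uploader = None
```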
