[FEATURE]: Add machine-readable JSON output for -out=report #2020
ab2cda5 to ca55c86
I'm reverting the last commit ( |
1ac3c21 to 64355f0
Follow-up: I’m continuing to investigate the Sample Platform failures separately. At this point, they don’t appear to be directly caused by the changes in this PR, but I’m still digging to be sure. I’ll update here once I have a clearer conclusion.
Thanks for this feature! The JSON output format looks well-designed and works correctly. However, please rebase this PR on master. The branch is missing the fix from #2025 (merged Jan 17), which causes a segfault when using After rebasing:
Once rebased, this should be ready to merge.
64355f0 to b0d6205
CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 9d921de...:
Your PR breaks these cases:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you). Check the result page for more info.
CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit b8019bd...:
Your PR breaks these cases:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you). Check the result page for more info.
cfsmp3
left a comment
Deep Review Results - Issues Found
I tested the JSON output feature against 172 media files from our test suite. While the feature works well in many cases (166 files produced valid JSON), I found several issues that should be addressed.
Issue 1: Program Count Mismatch (25 files affected)
The JSON reports fewer programs than actually exist in multi-program streams. The program_count and program_numbers fields don't match what ffprobe reports.
Examples:
| File | JSON Reports | FFprobe Shows |
|---|---|---|
| 96efd279cfa1dddcb1d7d38ecc5ebd6d870a661452c6480804c30a9896037336.ts | 4 programs (0,155,192,193) | 6 programs (155,156,157,158,192,193) |
| 36d5eca53c56ac18e727badec449ac0f10096369f8a7eda5f7108f7170c5cc8c.mpg | 1 program (2030) | 10 programs (82,2000,2005,2010,2015,2020,2025,2030,2035,2090) |
| c6407fb294bf0f97a84e6a70aa2787dc4b13688645d9f2f2db50c754b5855bb6.mpg | 1 program (819) | 8 programs (817,818,819,820,821,830,831,832) |
| e92a1d4d2aabdca2f1a2cb7854316a6fdc539bc05d26c5a5aae89f21b697c780.mpg | 1 program (1346) | 7 programs (1344,1345,1346,1347,1348,1351,1352) |
To reproduce:
./ccextractor 96efd279cfa1dddcb1d7d38ecc5ebd6d870a661452c6480804c30a9896037336.ts -out=report --report-format json | jq '.stream.program_count, .stream.program_numbers'
# Returns: 4, [0,155,192,193]
ffprobe -v quiet -print_format json -show_programs 96efd279cfa1dddcb1d7d38ecc5ebd6d870a661452c6480804c30a9896037336.ts | jq '[.programs[].program_num]'
# Returns: [155,156,157,158,192,193]
Suggestion: Either report ALL programs in the stream, or rename the field to caption_program_count to clarify it only includes programs with detected caption streams.
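For checking many files at once, the comparison above could be scripted. The following is only a sketch: the helper name compare_program_numbers is hypothetical, and it assumes both tools' JSON output has already been parsed into dictionaries with the shapes the two jq queries above rely on (.stream.program_numbers and .programs[].program_num).

```python
def compare_program_numbers(report, ffprobe):
    """Compare program numbers in a CCExtractor JSON report with ffprobe's.

    Both arguments are already-parsed JSON documents (dicts). The key names
    are the ones used by the jq queries in the reproduction steps.
    """
    reported = set(report["stream"]["program_numbers"])
    probed = {p["program_num"] for p in ffprobe.get("programs", [])}
    return {
        "missing_from_report": sorted(probed - reported),
        "only_in_report": sorted(reported - probed),
    }


# Values taken from the first row of the table above.
report = {"stream": {"program_count": 4, "program_numbers": [0, 155, 192, 193]}}
ffprobe = {"programs": [{"program_num": n} for n in (155, 156, 157, 158, 192, 193)]}

print(compare_program_numbers(report, ffprobe))
# {'missing_from_report': [156, 157, 158], 'only_in_report': [0]}
```

Reporting both directions makes the stray program 0 visible as well as the missing programs.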
Issue 2: has_any_captions Excludes DVB/Teletext
The field has_any_captions only considers EIA-608/CEA-708, not DVB subtitles or Teletext:
// src/lib_ccx/params_dump.c:464
bool has_any_captions = has_608 || has_708;
This produces confusing output:
{
"has_any_captions": false,
"teletext": true,
"dvb_subtitles": true
}
Files demonstrating this issue:
- 006fdc391aab432f9e379f6e55fa9fec3dc9b2fad67d4b284fc7f28f3984238f.mpg - has teletext but has_any_captions: false
- 1020459a866fab62d0adc5c5518e1ffcc7b9f313d3af6a18ecd33d73d2eb8e05.ts - has DVB subtitles but has_any_captions: false
- 36d5eca53c56ac18e727badec449ac0f10096369f8a7eda5f7108f7170c5cc8c.mpg - has BOTH teletext AND DVB but has_any_captions: false
Suggestion: Either:
- Rename to has_608_708 to be explicit, OR
- Include DVB/Teletext:
bool has_any_captions = has_608 || has_708 || has_teletext || has_dvb;
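Until one of these options lands, a consumer of the JSON could derive the broader flag itself. This is a rough sketch on the consumer side, not part of the PR: the helper name has_any_subtitles is hypothetical, and the field layout (summary.has_any_captions, services.teletext, services.dvb_subtitles) is assumed from the v1.0 example in the PR description.

```python
def has_any_subtitles(program):
    """Return True if the program carries 608/708 captions, Teletext, or DVB subtitles.

    `program` is one entry of the report's "programs" array, already parsed
    into a dict. Missing keys are treated as "not present".
    """
    summary = program.get("summary", {})
    services = program.get("services", {})
    return bool(
        summary.get("has_any_captions")
        or services.get("teletext")
        or services.get("dvb_subtitles")
    )


# Illustrative entry mirroring the confusing output shown above.
program = {
    "summary": {"has_any_captions": False, "has_608": False, "has_708": False},
    "services": {"dvb_subtitles": True, "teletext": True, "atsc_closed_caption": False},
}

print(has_any_subtitles(program))  # True
```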
Issue 3: Video Dimensions Detection Failure (1 file)
One file reports 0x0 for video dimensions when ffprobe shows 1920x1080:
File: af446fc78afeb80bbf1f329f93f205ca44cbbe635d547061932b3d1431806473.ts
./ccextractor af446fc78afeb80bbf1f329f93f205ca44cbbe635d547061932b3d1431806473.ts -out=report --report-format json | jq '.programs[0].video'
# Returns: {"width": 0, "height": 0, ...}
ffprobe -v quiet -print_format json -show_streams af446fc78afeb80bbf1f329f93f205ca44cbbe635d547061932b3d1431806473.ts | jq '.streams[] | select(.codec_type=="video") | {width, height}'
# Returns: {"width": 1920, "height": 1080}
What Works Well
- JSON syntax is 100% valid across all 166 files
- EIA-608/CEA-708 caption detection is accurate
- Teletext and DVB subtitle stream detection works correctly
- Stream mode detection (TS, PS, MP4, etc.) is accurate
- Video codec identification is correct
Please address these issues. Happy to re-test once updates are made.
Summary
This PR implements machine-readable JSON output for the -out=report feature, addressing issue #1399. Users can now generate structured reports that can be parsed with tools like jq, enabling integration with automated workflows.
Background
Currently, CCExtractor’s report output is human-readable text that requires custom parsing for automation. While other media analysis tools such as ffprobe and mediainfo provide JSON output, structured closed-caption reporting is not consistently available across tools or versions. This feature enables CCExtractor to expose its existing report data in a structured JSON format.
Use case: Users running CCExtractor in automated environments (e.g., CI/CD pipelines, media processing workflows) need to programmatically determine if streams contain closed captions without writing custom parsers.
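As a rough illustration of this use case, a pipeline step might parse the report and list the programs flagged as having captions. This is a sketch only: the function name programs_with_captions is hypothetical, the sample text is made up, and the field layout is assumed from the JSON Output Structure section of this description.

```python
import json


def programs_with_captions(report_text):
    """Return program numbers whose summary flags EIA-608/CEA-708 captions."""
    report = json.loads(report_text)
    return [
        program["program_number"]
        for program in report.get("programs", [])
        if program.get("summary", {}).get("has_any_captions")
    ]


# Trimmed, made-up report text for illustration; a real report would come from
# `ccextractor <file> -out=report --report-format json`.
sample = """
{
  "schema": {"name": "ccextractor-report", "version": "1.0"},
  "programs": [
    {"program_number": 1, "summary": {"has_any_captions": true}},
    {"program_number": 2, "summary": {"has_any_captions": false}}
  ]
}
"""

print(programs_with_captions(sample))  # [1]
```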
Changes
-out=report Option
Existing Text Output (-out=report)
JSON Output Structure (v1.0)
The output follows a versioned JSON report structure:
JSON output via --report-format json
{
  "schema": { "name": "ccextractor-report", "version": "1.0" },
  "input": { "source": "file", "path": "../20251206ch29FullTS.ts" },
  "stream": {
    "mode": "Transport Stream",
    "program_count": 5,
    "program_numbers": [ 1, 2, 3, 4, 5 ],
    "pids": [
      { "pid": 49, "program_number": 1, "codec": "MPEG-2 video" },
      { "pid": 52, "program_number": 1, "codec": "AC3 audio" },
      { "pid": 53, "program_number": 1, "codec": "AC3 audio" },
      { "pid": 65, "program_number": 2, "codec": "MPEG-2 video" },
      { "pid": 68, "program_number": 2, "codec": "AC3 audio" },
      { "pid": 81, "program_number": 3, "codec": "MPEG-2 video" },
      { "pid": 84, "program_number": 3, "codec": "AC3 audio" },
      { "pid": 97, "program_number": 4, "codec": "MPEG-2 video" },
      { "pid": 100, "program_number": 4, "codec": "AC3 audio" },
      { "pid": 113, "program_number": 5, "codec": "MPEG-2 video" },
      { "pid": 116, "program_number": 5, "codec": "AC3 audio" }
    ]
  },
  "programs": [
    {
      "program_number": 1,
      "summary": { "has_any_captions": true, "has_608": true, "has_708": true },
      "services": { "dvb_subtitles": false, "teletext": false, "atsc_closed_caption": true },
      "captions": {
        "present": true,
        "eia_608": {
          "present": true,
          "xds": false,
          "channels": { "cc1": true, "cc2": false, "cc3": false, "cc4": false }
        },
        "cea_708": { "present": true, "services": [ 1, 2, 3, 4, 5, 6, 9 ] }
      },
      "video": { "width": 1920, "height": 1080, "aspect_ratio": "03 - 16:9", "frame_rate": "04 - 29.97" }
    },
    (More programs omitted for brevity)
Schema Notes
programs[] indicates which captioning systems are present (DVB, Teletext, ATSC), while captions.cea_708.services[] lists active CEA-708 caption service numbers.
Program Ordering:
- input.path
- stream.mode
- stream.program_count
- stream.program_numbers[]
- stream.pids[]
- programs[].services.dvb_subtitles
- programs[].services.teletext
- programs[].services.atsc_closed_caption
- programs[].captions.eia_608.present
- programs[].captions.eia_608.xds
- programs[].captions.eia_608.channels.*
- programs[].captions.cea_708.present
- programs[].captions.cea_708.services[]
- programs[].video.width / height
- programs[].video.aspect_ratio
- programs[].video.frame_rate
- container.mp4.timed_text_tracks
- schema.*
- programs[].summary.*
- programs[].captions.present
Key Features:
-out=report
v1.0) for future extensibility
has_any_captions summary field reflects EIA-608 / CEA-708 only.)
Technical Approach
Example Testing Commands
Field Value Formats:
aspect_ratio and frame_rate preserve CCExtractor's internal enum formatting (e.g., "03 - 16:9", "04 - 29.97")
jq '.programs[].video.aspect_ratio | split(" - ")[1]'
Benefits
- has_any_captions summary field for fast EIA-608 / CEA-708 closed-caption checks
Notes
strcasecmp on POSIX systems and maps to _stricmp on Windows via platform-specific preprocessor guards.
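Returning to the Field Value Formats note: consumers that are not using jq may want to split the enum-style strings themselves. A minimal sketch, assuming values always use " - " between the numeric code and the label as in the two examples given; the helper name split_enum_value is hypothetical.

```python
def split_enum_value(value):
    """Split a value such as "03 - 16:9" into (code, label).

    If the " - " separator is absent, the whole string is returned as the
    label and the code is None.
    """
    code, sep, label = value.partition(" - ")
    if not sep:
        return None, value
    return code, label


print(split_enum_value("03 - 16:9"))   # ('03', '16:9')
print(split_enum_value("04 - 29.97"))  # ('04', '29.97')
```

Splitting on the first separator only keeps labels intact if they ever contain " - " themselves.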