feat(blocks): Add Bright Data tools #2813
@meirk-brd is attempting to deploy a commit to the Sim Team on Vercel. A member of the Team first needs to authorize it.
Greptile Summary
Important Files Changed
Confidence score: 4/5
Sequence Diagram
sequenceDiagram
participant User
participant BrightDataBlock
participant API
participant BrightDataDatasetAPI
participant BrightDataAPI
participant DatasetPoller
User->>BrightDataBlock: "Select operation and input parameters"
BrightDataBlock->>API: "Route request based on operation type"
alt Dataset Operations
API->>BrightDataDatasetAPI: "POST /api/tools/brightdata/dataset"
BrightDataDatasetAPI->>BrightDataAPI: "Trigger dataset with datasetId"
BrightDataAPI-->>BrightDataDatasetAPI: "Return snapshot_id"
BrightDataDatasetAPI->>DatasetPoller: "Poll snapshot status every 1s"
DatasetPoller->>BrightDataAPI: "GET snapshot status"
BrightDataAPI-->>DatasetPoller: "Status: running/building/starting"
loop Until Complete or 10min timeout
DatasetPoller->>BrightDataAPI: "Check snapshot status"
BrightDataAPI-->>DatasetPoller: "Status update"
end
DatasetPoller-->>BrightDataDatasetAPI: "Final dataset results"
BrightDataDatasetAPI-->>API: "Dataset response with data"
else Scrape Markdown
API->>BrightDataAPI: "POST /api/tools/brightdata/scrape-markdown"
BrightDataAPI-->>API: "Markdown content and metadata"
else Search Engine
API->>BrightDataAPI: "POST /api/tools/brightdata/search-engine"
BrightDataAPI-->>API: "Search results array"
end
API-->>BrightDataBlock: "Processed response"
BrightDataBlock-->>User: "Results with data/markdown/search results"
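The polling loop in the diagram (check the snapshot status on an interval until it completes or a timeout elapses) could be sketched roughly as follows. This is an illustration only, not the PR's implementation: `pollSnapshot`, `checkStatus`, and the terminal statuses `ready` and `failed` are assumed names; only `starting`, `building`, and `running` appear in the diagram.

```typescript
// Status values: the in-progress ones come from the diagram; 'ready' and 'failed' are assumptions.
type SnapshotStatus = 'starting' | 'building' | 'running' | 'ready' | 'failed'

interface PollOptions {
  intervalMs: number // delay between status checks
  timeoutMs: number // overall budget, e.g. 10 * 60 * 1000 for ten minutes
}

const IN_PROGRESS = new Set<SnapshotStatus>(['starting', 'building', 'running'])

// `checkStatus` stands in for the "GET snapshot status" call (hypothetical);
// injecting it keeps the loop testable without network access.
async function pollSnapshot(
  checkStatus: () => Promise<SnapshotStatus>,
  { intervalMs, timeoutMs }: PollOptions
): Promise<SnapshotStatus> {
  const deadline = Date.now() + timeoutMs
  while (true) {
    const status = await checkStatus()
    // Any status outside the in-progress set ends the loop.
    if (!IN_PROGRESS.has(status)) return status
    // Give up if the next check would start after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Snapshot polling timed out after ${timeoutMs}ms`)
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
  }
}
```

With the values named in the diagram, a caller would pass `{ intervalMs: 1000, timeoutMs: 600000 }`.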
55 files reviewed, 6 comments
return NextResponse.json({
  markdown,
  url,
  title: title || undefined,
style: Redundant check - title is already undefined if falsy, so || undefined is unnecessary
- title: title || undefined,
+ title,
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Path: apps/sim/app/api/tools/brightdata/scrape-markdown/route.ts
Line: 80:80
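Whether `title || undefined` is redundant depends on the type of `title`, which this excerpt does not show. A small sketch of the difference, assuming the response body is serialized the way `JSON.stringify` does it (the helper names and the example URL are made up for illustration):

```typescript
// Hypothetical helpers mirroring the two variants under discussion.
function withFallback(title: string | null | undefined): string {
  // Any falsy title ('' or null) is turned into undefined and dropped from the JSON.
  return JSON.stringify({ url: 'https://example.com', title: title || undefined })
}

function withoutFallback(title: string | null | undefined): string {
  // Only an undefined title is dropped; '' and null are serialized as-is.
  return JSON.stringify({ url: 'https://example.com', title })
}

console.log(withFallback('')) // {"url":"https://example.com"}
console.log(withoutFallback('')) // {"url":"https://example.com","title":""}
console.log(withoutFallback(null)) // {"url":"https://example.com","title":null}
console.log(withoutFallback(undefined)) // {"url":"https://example.com"}
```

If `title` can only be a non-empty string or `undefined`, the two forms produce the same output and the suggestion is safe; if it can be `''` or `null`, they differ.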
const maxCount = Number.isFinite(maxResults) ? Number(maxResults) : undefined
const results = maxCount ? normalizedResults.slice(0, maxCount) : normalizedResults
style: Redundant slice when maxResults is undefined - normalizedResults already contains all results
Path: apps/sim/app/api/tools/brightdata/search-engine/route.ts
Line: 92:93
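One way to make the capping step explicit is a small helper, sketched below. The name `capResults` is hypothetical, and the handling of numeric strings is an assumption about how `maxResults` might arrive (for example from a form field). Note that `Number.isFinite` returns `false` for strings, so the quoted code would ignore a value such as `'5'`.

```typescript
// Hypothetical helper: return at most `maxResults` items, or all items when no usable cap is given.
function capResults<T>(items: T[], maxResults: unknown): T[] {
  // Assumption: accept numeric strings as well as numbers.
  const parsed =
    typeof maxResults === 'string' && maxResults.trim() !== '' ? Number(maxResults) : maxResults
  // Missing, non-numeric, or non-positive caps are treated as "no cap" (a design choice of this sketch).
  if (typeof parsed !== 'number' || !Number.isFinite(parsed) || parsed <= 0) return items
  return items.slice(0, Math.floor(parsed))
}

console.log(capResults(['a', 'b', 'c'], 2)) // [ 'a', 'b' ]
console.log(capResults(['a', 'b', 'c'], '2')) // [ 'a', 'b' ]
console.log(capResults(['a', 'b', 'c'], undefined)) // [ 'a', 'b', 'c' ]
```

The early return means `slice` is called only when a cap actually applies.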
num_of_comments: {
  type: 'string',
  required: false,
  visibility: 'user-or-llm',
  description: 'Number of comments to fetch (default: 10)',
},
style: Using string type for num_of_comments parameter is inconsistent - this should be number type since it represents a count
- num_of_comments: {
-   type: 'string',
-   required: false,
-   visibility: 'user-or-llm',
-   description: 'Number of comments to fetch (default: 10)',
- },
+ num_of_comments: {
+   type: 'number',
+   required: false,
+   visibility: 'user-or-llm',
+   description: 'Number of comments to fetch (default: 10)',
+ },
Is there a specific reason this numeric parameter needs to be a string rather than a number?
Path: apps/sim/tools/brightdata/dataset_youtube_comments.ts
Line: 21:26
if (body.num_of_comments === undefined) {
  body.num_of_comments = '10'
}
logic: This logic is flawed - params.num_of_comments is undefined when not provided, but body.num_of_comments was just assigned that undefined value on line 46
- if (body.num_of_comments === undefined) {
-   body.num_of_comments = '10'
- }
+ if (params.num_of_comments === undefined) {
+   body.num_of_comments = '10'
+ }
Path: apps/sim/tools/brightdata/dataset_youtube_comments.ts
Line: 49:51
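An alternative that avoids the assign-then-check pattern altogether is to apply the default at the point of assignment with the nullish coalescing operator. The sketch below is an illustration under assumptions: the `YoutubeCommentsParams` shape and the `buildBody` name are hypothetical, since the surrounding file is not shown here.

```typescript
// Hypothetical parameter shape; the real tool's params may carry more fields.
interface YoutubeCommentsParams {
  url: string
  num_of_comments?: string
}

// Build the request body, defaulting num_of_comments to '10' when it is null or undefined.
function buildBody(params: YoutubeCommentsParams): Record<string, string> {
  return {
    url: params.url,
    num_of_comments: params.num_of_comments ?? '10',
  }
}

console.log(buildBody({ url: 'https://example.com/watch' }).num_of_comments) // 10
console.log(buildBody({ url: 'https://example.com/watch', num_of_comments: '25' }).num_of_comments) // 25
```

Unlike `||`, the `??` operator keeps an explicitly provided empty string, so whether that is desirable depends on how the form field reports "not set".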
if (body.days_limit === undefined) {
  body.days_limit = '3'
}
style: the conditional check for undefined occurs after the property is already assigned to the body object - consider checking params.days_limit directly before assignment
- if (body.days_limit === undefined) {
-   body.days_limit = '3'
- }
+ if (params.days_limit === undefined) {
+   body.days_limit = '3'
+ } else {
+   body.days_limit = params.days_limit
+ }
Path: apps/sim/tools/brightdata/dataset_google_maps_reviews.ts
Line: 49:51
{
  id: 'url',
  title: 'Dataset URL',
  type: 'short-input',
  placeholder: 'https://example.com',
  condition: {
    field: 'operation',
    value: [
      'dataset_amazon_product',
      'dataset_amazon_product_reviews',
      'dataset_amazon_product_search',
      'dataset_walmart_product',
      'dataset_walmart_seller',
      'dataset_ebay_product',
      'dataset_homedepot_products',
      'dataset_zara_products',
      'dataset_etsy_products',
      'dataset_bestbuy_products',
      'dataset_linkedin_person_profile',
      'dataset_linkedin_company_profile',
      'dataset_linkedin_job_listings',
      'dataset_linkedin_posts',
      'dataset_linkedin_people_search',
      'dataset_crunchbase_company',
      'dataset_zoominfo_company_profile',
      'dataset_instagram_profiles',
      'dataset_instagram_posts',
      'dataset_instagram_reels',
      'dataset_instagram_comments',
      'dataset_facebook_posts',
      'dataset_facebook_marketplace_listings',
      'dataset_facebook_company_reviews',
      'dataset_facebook_events',
      'dataset_tiktok_profiles',
      'dataset_tiktok_posts',
      'dataset_tiktok_shop',
      'dataset_tiktok_comments',
      'dataset_google_maps_reviews',
      'dataset_google_shopping',
      'dataset_google_play_store',
      'dataset_apple_app_store',
      'dataset_reuter_news',
      'dataset_github_repository_file',
      'dataset_yahoo_finance_business',
      'dataset_x_posts',
      'dataset_zillow_properties_listing',
      'dataset_booking_hotel_listings',
      'dataset_youtube_profiles',
      'dataset_youtube_comments',
      'dataset_reddit_posts',
      'dataset_youtube_videos',
    ],
logic: This large condition array excludes the 'dataset_npm_package' and 'dataset_pypi_package' operations, yet both are included in DATASET_TOOL_MAP. Should npm and PyPI package datasets also require a URL input, or do they only need the package_name field?
Path: apps/sim/blocks/blocks/brightdata.ts
Line: 149:200
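A mismatch like this is easy to miss by eye in a long hand-maintained list. The sketch below shows one way to surface it, by comparing the keys of the tool map against the operations that show the URL field. The map excerpt, its tool-id values, and the function name are all made up for illustration; only the operation names come from the comment above.

```typescript
// Hypothetical excerpt of DATASET_TOOL_MAP; the tool-id values are placeholders, not the PR's real ids.
const DATASET_TOOL_MAP: Record<string, string> = {
  dataset_amazon_product: 'tool_a',
  dataset_youtube_videos: 'tool_b',
  dataset_npm_package: 'tool_c',
  dataset_pypi_package: 'tool_d',
}

// Hypothetical excerpt of the operations listed in the URL field's condition.
const URL_FIELD_OPERATIONS = ['dataset_amazon_product', 'dataset_youtube_videos']

// List every operation in the tool map that the URL field's condition does not cover.
function operationsWithoutUrlField(
  toolMap: Record<string, string>,
  urlOperations: string[]
): string[] {
  const covered = new Set(urlOperations)
  return Object.keys(toolMap)
    .filter((operation) => !covered.has(operation))
    .sort()
}

console.log(operationsWithoutUrlField(DATASET_TOOL_MAP, URL_FIELD_OPERATIONS))
// [ 'dataset_npm_package', 'dataset_pypi_package' ]
```

If the package datasets are meant to take only `package_name`, the same comparison can serve as a test that documents the exclusion as intentional.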
Summary
Integrate Bright Data into SIM with tools, block UI, and API routes. Adds all dataset tools from Bright Data, registers them in the tool registry and Bright Data block, and adds a scoped Bright Data. Includes Bright Data icon updates and dataset polling with a 10-minute timeout.
Fixes #N/A
Type of Change
Testing
cd apps/sim && bun x tsc --noEmit -p tsconfig.brightdata.json
bun x biome check --write --unsafe apps/sim/app/api/tools/brightdata apps/sim/tools/brightdata apps/sim/blocks/blocks/brightdata.ts apps/sim/tools/registry.ts apps/sim/blocks/registry.ts
Checklist
Screenshots/Videos
Example of Dataset usage (there are 40+ datasets, but ):
Search:

Scrape:
