
`pip install brightdata` → one import away from grabbing JSON/HTML data
from Amazon, Instagram, LinkedIn, TikTok, YouTube, X, Reddit, and the wider web in a production-grade way.
Abstract away scraping entirely and enjoy your data.
Note: This is an unofficial SDK. Please visit https://brightdata.com/products/ for official information.
## Supported Services
| Service | Description |
|---------|-------------|
| Web Scraper API | Ready-made scrapers for popular websites (Amazon, LinkedIn, Instagram, TikTok, Reddit, etc.) |
| Web Unlocker | Proxy service to bypass anti-bot protection; returns raw HTML from any URL |
| Browser API | Headless browser automation with Playwright; full JavaScript rendering and interaction support |
| SERP (soon) | SERP results from Google, Bing, Yandex, and many more search engines |
## Features

- The `scrape_url` method provides the simplest yet most production-ready scraping experience. It auto-recognizes URL types, so there is no need for a separate import for each scraper/domain combination.
- `scrape_url` accepts a `fallback_to_browser_api` boolean parameter: when set, if no specialized scraper is found, the Bright Data Browser API is used to scrape the website.
- `scrape_url` returns a `ScrapeResult` that carries all information about the scraping job, including key timings, to allow extensive debugging.
- The `scrape_urls` method handles multiple links. It is built with native asyncio support, so all URLs can be scraped concurrently, and the `fallback_to_browser_api` parameter is available here too.
- Supports Bright Data discovery and search APIs as well.
- To enable agentic workflows, the package ships a JSON file describing all scrapers and their methods.
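The exact fields of `ScrapeResult` are not listed in this README. As a rough illustration of the kind of object described above, here is a minimal sketch; every field name below is an assumption for illustration, not the SDK's actual API:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class ScrapeResultSketch:
    """Illustrative stand-in for the SDK's ScrapeResult (field names assumed)."""

    status: str                                # e.g. "ready" or "error"
    data: Optional[Any] = None                 # scraped rows, once available
    error: Optional[str] = None                # error message if the job failed
    triggered_at: Optional[datetime] = None    # when the job was submitted
    data_ready_at: Optional[datetime] = None   # when the snapshot became ready

    @property
    def elapsed_seconds(self) -> Optional[float]:
        """Trigger-to-ready time, the kind of timing useful for debugging."""
        if self.triggered_at and self.data_ready_at:
            return (self.data_ready_at - self.triggered_at).total_seconds()
        return None
```

The real object's fields may differ; the point is that job status, payload, and timestamps travel together in one result.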
## Quick start

1. Obtain a `BRIGHTDATA_TOKEN` from brightdata.com.
2. Create a `.env` file and paste the token like this:

   ```bash
   BRIGHTDATA_TOKEN=AJKSHKKJHKAJ…
   ```

3. Install the `brightdata` package from PyPI:

   ```bash
   pip install brightdata
   ```
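A quick sanity check on the token is cheap insurance before triggering jobs. This helper uses only the stdlib (`load_token` is an illustrative name, not part of the SDK); `python-dotenv`, used later in this README, populates `os.environ` the same way:

```python
import os


def load_token(env=os.environ) -> str:
    """Return the Bright Data token or fail with a clear error message."""
    token = env.get("BRIGHTDATA_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set BRIGHTDATA_TOKEN before using the SDK")
    return token
```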
## 1. Usage
### 1.1 Auto URL scraping mode

`brightdata.auto.scrape_url` looks at the domain of a URL and
returns the scraper class that has declared itself responsible for that domain.
All you have to do is feed it the URL.
```python
from brightdata import trigger_scrape_url, scrape_url

# blocks until the data is ready
rows = scrape_url("https://www.amazon.com/dp/B0CRMZHDG8")

# returns a snapshot id immediately; poll for the result yourself
snap = trigger_scrape_url("https://www.amazon.com/dp/B0CRMZHDG8")
```
It also works for sites where Bright Data exposes several distinct "collect" endpoints.
`LinkedInScraper` is a good example:

| Endpoint | Method |
|----------|--------|
| people profile – collect by URL | `collect_people_by_url()` |
| company page – collect by URL | `collect_company_by_url()` |
| job post – collect by URL | `collect_jobs_by_url()` |

Each scraper includes a smart dispatcher method that calls the right endpoint based on the link structure.
```python
from brightdata import scrape_url

links_with_different_types = [
    "https://www.linkedin.com/in/enes-kuzucu/",
    "https://www.linkedin.com/company/105448508/",
    "https://www.linkedin.com/jobs/view/4231516747/",
]

for link in links_with_different_types:
    rows = scrape_url(link, bearer_token=TOKEN)
    print(rows)
```
Note: the `trigger_scrape_url` and `scrape_url` methods cover only the "collect by URL" use-case.
Discovery endpoints (keyword, category, …) are still called directly on a
specific scraper class.
### 1.2 Access Scrapers Directly
```python
import os
import sys

from dotenv import load_dotenv
from brightdata.ready_scrapers.amazon import AmazonScraper
from brightdata.utils.poll import poll_until_ready

load_dotenv()
TOKEN = os.getenv("BRIGHTDATA_TOKEN")
if not TOKEN:
    sys.exit("Set BRIGHTDATA_TOKEN environment variable first")

scraper = AmazonScraper(bearer_token=TOKEN)

snap = scraper.collect_by_url([
    "https://www.amazon.com/dp/B0CRMZHDG8",
    "https://www.amazon.com/dp/B07PZF3QS3",
])

rows = poll_until_ready(scraper, snap).data
print(rows[0]["title"])
```
### 1.3 Async example

- With `fetch_snapshot_async` you can trigger 1000 snapshots, and each polling task yields control whenever it's waiting.
- All polls share one `aiohttp.ClientSession` (connection pool), so you're not tearing down TCP connections for every check.
- `fetch_snapshots_async` is a convenience helper that wraps all the boilerplate needed when you fire off hundreds or thousands of scraping jobs, so you don't have to manually spawn tasks and gather their results. It preserves the order of your snapshot list and surfaces all `ScrapeResult`s in a single list, so you can correlate inputs → outputs easily.
```python
import asyncio

from brightdata.ready_scrapers.amazon import AmazonScraper
from brightdata.utils.async_poll import fetch_snapshots_async

scraper = AmazonScraper(bearer_token=TOKEN)
keywords = ["dog food", "ssd", ...]

snapshots = [scraper.discover_by_keyword([kw]) for kw in keywords]

results = asyncio.run(
    fetch_snapshots_async(scraper, snapshots, poll=15, timeout=600)
)

ready = [r.data for r in results if r.status == "ready"]
errors = [r for r in results if r.status != "ready"]
print("ready :", len(ready))
print("errors:", len(errors))
```
Memory footprint: a few kB per job → thousands of parallel polls on a single VM.
### 1.4 Thread-based PollWorker pattern

- Runs multiple scrape jobs (up to a couple of hundred) with zero changes to your sync code.
- Takes either a callback invoked with your `ScrapeResult` when it's ready, or a file path/directory to dump the JSON to disk.
- Easy to drop into any script, web app, or desktop app.
- One OS thread per worker.
- Ideal when your codebase is synchronous and you just want a background helper.

Need fire-and-forget?
`brightdata.utils.thread_poll.PollWorker` (one line to start) runs in a
daemon thread, writes the JSON to disk or fires a callback, and never blocks
your main code.
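`PollWorker`'s constructor isn't documented above, so as a rough sketch of the underlying pattern (stdlib only, not the SDK's actual API): a daemon thread repeatedly checks a job and fires a callback when the result is ready, assuming the status check is just a callable that returns `None` until the data arrives:

```python
import threading
import time
from typing import Any, Callable, Optional


def start_poll_worker(
    check: Callable[[], Optional[Any]],   # returns the result once ready, else None
    on_ready: Callable[[Any], None],      # callback fired with the result
    poll: float = 1.0,                    # seconds between checks
    timeout: float = 60.0,                # give up after this many seconds
) -> threading.Thread:
    """Fire-and-forget polling in a daemon thread, mirroring the PollWorker idea."""

    def run() -> None:
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            result = check()
            if result is not None:
                on_ready(result)
                return
            time.sleep(poll)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

In the real SDK, `check` would hit Bright Data's snapshot-status endpoint; because the thread is a daemon, it never keeps your process alive on exit.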
### 1.5 Triggering In Batches

Bright Data supports batch triggering, which means you can submit many inputs in a single call.
Use it when you don't need a "one keyword → one snapshot-id" mapping:
```python
payload = [{"keyword": kw} for kw in keywords]
snap_id = scraper.discover_by_keyword(payload)   # one snapshot id for the whole batch

results = asyncio.run(
    fetch_snapshot_async(scraper, snap_id, poll=15, timeout=600)
)
rows = results.data
```
### 1.6 Concurrent triggering with a thread pool

This keeps the one-keyword → one-snapshot behaviour but removes the serial wait between HTTP calls.
```python
from brightdata.utils.concurrent_trigger import trigger_keywords_concurrently
from brightdata.utils.async_poll import fetch_snapshots_async

scraper = AmazonScraper(bearer_token=TOKEN)

# {keyword: snapshot_id}, triggered from a thread pool
snapshot_map = trigger_keywords_concurrently(scraper, keywords, max_workers=64)

results = asyncio.run(
    fetch_snapshots_async(scraper,
                          list(snapshot_map.values()),
                          poll=15, timeout=600)
)

# match each keyword back to its result via the snapshot id
kw_to_result = {
    kw: res
    for kw, sid in snapshot_map.items()
    for res in results
    if res.input_snapshot_id == sid
}
```
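The internals of `trigger_keywords_concurrently` aren't shown in this README, but the pattern it names is a plain `ThreadPoolExecutor` fan-out. A sketch, under the assumption that a trigger is simply a blocking call per keyword (function name and signature here are illustrative, not the SDK's):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def trigger_concurrently(
    trigger: Callable[[str], str],     # blocking call: keyword -> snapshot id
    keywords: Iterable[str],
    max_workers: int = 64,
) -> Dict[str, str]:
    """Fire one trigger per keyword from a thread pool, keeping the kw -> id map."""
    kws = list(keywords)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so zip reconstructs the mapping.
        ids = list(pool.map(trigger, kws))
    return dict(zip(kws, ids))
```

Each trigger call still blocks its own thread on the HTTP round-trip, but with 64 workers the wall-clock cost is roughly that of the slowest batch of 64, not the sum of all calls.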
## 2. What's included

| Dataset | Scraper class | Methods |
|---------|---------------|---------|
| Amazon products / search | `AmazonScraper` | `collect_by_url`, `discover_by_keyword`, `discover_by_category`, `search_products` |
| Digi-Key parts | `DigiKeyScraper` | `collect_by_url`, `discover_by_category` |
| Mouser parts | `MouserScraper` | `collect_by_url` |
| LinkedIn | `LinkedInScraper` | `collect_people_by_url`, `discover_people_by_name`, `collect_company_by_url`, `collect_jobs_by_url`, `discover_jobs_by_keyword` |
Each call returns a `snapshot_id` string (`sync_mode=async`).
Use one of the helpers to fetch the final data:

- `brightdata.utils.poll.poll_until_ready()` – blocking, linear
- `brightdata.utils.async_poll.wait_ready()` – single coroutine
- `brightdata.utils.async_poll.monitor_snapshots()` – fan-out hundreds using asyncio + aiohttp
## 3. ToDos

- Make Web Unlocker return a `ScrapeResult` object.
- Add a Web Unlocker fallback mechanism for `scrape_url`.
## 4. Contributing

- Fork and create a feature branch.
- Keep the surface minimal – one scraper class per dataset family.
- Run the smoke tests under `ready_scrapers/<dataset>/tests.py`.
- Open a PR.