GitHub Scraper
Search GitHub repos or users and export clean rows: stars, forks, language, topics, license, plus user bio, company, location and follower count.
How it works
1. Open it on Apify. Hit Run on Apify; the tool opens in the cloud, with nothing to install.
2. Set the inputs. Adjust query, type and sort (sensible defaults are pre-filled).
3. Click Run. The tool runs on Apify’s cloud and collects the data for you.
4. Export the results. Download as JSON, CSV or Excel, or pipe the data straight into your app, Google Sheets, or an AI agent.
Inputs
| Field | What it does | Type |
|---|---|---|
query | GitHub search syntax. For repositories: "language:python stars:>1000 machine learning", "topic:cli created:>2023-01-01". For users: "location:berlin followers:>500". | string |
type | What to search for: repositories or users/organizations. Users are additionally enriched with profile details (name, bio, company, location, followers, public repos). | string |
sort | How to sort repository results (applies to repository searches; user searches use GitHub's relevance ranking). Best match is GitHub's default relevance score. | string |
maxItems | Maximum number of repositories/users to return. GitHub's Search API caps at 1000 results per query (10 pages of 100), so split large jobs by qualifier (e.g. star bands or creation-date windows). | integer |
githubToken | Optional GitHub personal access token. Strongly recommended: without it you get only 60 requests/hr and 10 searches/min; with it you get 5000 requests/hr and 30 searches/min. | string |
notionConnector | Optional. Write each item as a page into your Notion when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors, … | string |
notionParentId | Optional. The Notion data source ID of the database to write into (only used if a Notion connector is set). Leave empty to create the pages privately in your workspace. | string |
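If you start runs programmatically, it can help to check an input before sending it. The sketch below is a hypothetical helper (`build_run_input` is our own name, not part of the actor) that assembles the main fields and rejects values outside the ones listed on this page. The allowed `sort` values come from the Input table further down, and the lower bound of 1 for maxItems is an assumption.

```python
from typing import Optional

# Allowed values as listed in the Input table on this page (assumed complete).
ALLOWED_TYPES = {"repositories", "users"}
ALLOWED_SORTS = {"stars", "forks", "updated", "best-match"}
SEARCH_API_CAP = 1000  # GitHub's Search API returns at most 1000 results per query


def build_run_input(
    query: str,
    type_: str = "repositories",
    sort: str = "stars",
    max_items: int = 100,
    github_token: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build an actor input dict, rejecting values the
    documentation lists as invalid. Defaults mirror the documented ones."""
    if not query.strip():
        # The actor would answer with a BAD_INPUT row; fail before starting a run.
        raise ValueError("query must be a non-empty GitHub search string")
    if type_ not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")
    if not 1 <= max_items <= SEARCH_API_CAP:  # lower bound of 1 is an assumption
        raise ValueError(f"maxItems must be between 1 and {SEARCH_API_CAP}")

    run_input = {"query": query, "type": type_, "sort": sort, "maxItems": max_items}
    if github_token:
        # The token is optional, so only include the key when one is given.
        run_input["githubToken"] = github_token
    return run_input


print(build_run_input("language:python stars:>5000 web framework"))
```

Failing fast on an empty query mirrors the actor's own BAD_INPUT behaviour, but without starting a run at all.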
What you get
A structured dataset in which each result includes fields like:
createdAt, defaultBranch, description, details, forks, fullName, homepage, id, language, license, login, name, openIssues, owner
Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.
18 ready-to-run use cases
Trending AI Agent GitHub Repos Created in 2026
New AI-agent and LLM-tooling GitHub repos created in 2026 with 100+ stars, ranked by stars. A sourcing feed for VCs and dev newsletter writers.
Abandoned npm Repos: No Commits Since 2023
Popular npm GitHub repos with no commits since 2023, flagging zombie dependencies by stars and last-push date for supply-chain risk audits.
Most-Starred Self-Hosted Alternatives to SaaS
Top self-hosted open-source apps on GitHub ranked by stars, archived projects excluded. A research feed for awesome-list curators and SaaS-replacement hunters.
React Native GitHub Repos for DevRel Tracking
Actively maintained React Native repos with 1000+ stars, updated in 2026, listing stars and last-commit date. DevRel teams use it to monitor the ecosystem.
Top Solidity Smart-Contract GitHub Repos by Stars
Most-starred Solidity smart-contract repos on GitHub with stars, forks, and license. Web3 auditors and crypto-VC analysts map the on-chain ecosystem with it.
MIT-Licensed Go Repos for LLM Training Data
MIT-licensed, non-fork Go repositories ranked by stars, with license and metadata. A license-safe corpus for code-LLM training and RAG dataset building.
Newly Created MCP Server Repos on GitHub (2026)
MCP (Model Context Protocol) server repos created on GitHub in 2026, with stars and dates. Agent-tool directories and AI newsletters track them here.
Top Open-Source Repos by Org (org:stripe) on GitHub
A company's most-starred public GitHub repos via org: search like org:stripe, with stars and language. Tech-footprint research for sales and partnerships.
Fast-Growing Data Engineering Repos (1k+ Stars)
Data-engineering GitHub repos created since 2025 that already passed 1000+ stars. VCs and DevRel scouts spot breakout modern-data-stack tools early.
Senior Rust Engineers in San Francisco on GitHub
Active Rust developers in San Francisco from GitHub, filtered by language, location, and follower count. Technical recruiters mine it for sourcing leads.
Popular FastAPI Repos and Their Maintainers
Most-starred FastAPI projects on GitHub with stars, maintainers, and metadata. Map the Python API ecosystem, find integrations, and spot companies hiring.
Beginner Python Repos with Good First Issues
Active, well-maintained Python repositories that label good first issues and welcome new contributors. A starting point for Hacktoberfest and open-source PRs.
Prolific TypeScript Developers on GitHub to Recruit
Well-followed, prolific TypeScript developers on GitHub ranked by repo count and followers. Startup recruiters and dev-tool teams use it for outreach lists.
JavaScript Security Repos to Triage on GitHub
Recently updated JavaScript repos focused on security and vulnerabilities, with last-commit dates. Security researchers and dependency auditors triage them.
Top Python Web Frameworks on GitHub by Stars
Most-starred Python web frameworks on GitHub with stars, forks, last-updated date, and license, sorted by stars. A ranked overview for picking a stack.
Recently Updated LLM Repos on GitHub
Which large-language-model projects on GitHub shipped commits most recently, sorted by last update. Spot active forks and LLM repos worth following.
Most-Forked Machine Learning Repos on GitHub
Machine-learning GitHub repos ranked by fork count, surfacing the projects people actually build on rather than just star. Includes forks, stars, and language.
Go Developers in Berlin on GitHub for Recruiting
Go developers based in Berlin on GitHub with name, company, bio, and follower count. A ready-made sourcing list for recruiters running outreach campaigns.
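Adapting one of these use cases mostly comes down to writing the right query. The inputs below are illustrative guesses built from standard GitHub search qualifiers (org:, language:, license:, topic:, archived:, location:). They are not the presets' actual configurations, and the maxItems values are arbitrary.

```python
import json

# Illustrative inputs for a few of the use cases above, built from standard
# GitHub search qualifiers. Not the presets' actual configurations.
EXAMPLE_INPUTS = {
    "Top Open-Source Repos by Org": {
        "query": "org:stripe", "type": "repositories", "sort": "stars", "maxItems": 100,
    },
    "MIT-Licensed Go Repos": {
        # GitHub excludes forks from repository search results by default.
        "query": "language:go license:mit", "type": "repositories", "sort": "stars", "maxItems": 1000,
    },
    "Most-Starred Self-Hosted Alternatives": {
        "query": "topic:self-hosted archived:false", "type": "repositories", "sort": "stars", "maxItems": 200,
    },
    "Most-Forked Machine Learning Repos": {
        "query": "topic:machine-learning", "type": "repositories", "sort": "forks", "maxItems": 100,
    },
    "Go Developers in Berlin": {
        # No sort: user searches use GitHub's relevance ranking.
        "query": "language:go location:berlin", "type": "users", "maxItems": 100,
    },
}


def as_json(name: str) -> str:
    """Return one example input as pretty-printed JSON, e.g. to paste into the run input."""
    return json.dumps(EXAMPLE_INPUTS[name], indent=2)


print(as_json("Go Developers in Berlin"))
```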
GitHub Scraper
Search GitHub repositories or users via the public GitHub REST API and get back clean, structured rows. No API key is required, but adding a free GitHub token raises your rate limit from 60 to 5000 requests/hr, which matters for larger jobs.
What you get
Repositories (type: repositories): fullName, name, owner, url, description, stars, forks, openIssues, language, topics[], license (SPDX id), homepage, defaultBranch, createdAt, updatedAt, pushedAt.
Users / organizations (type: users) — each result is enriched with profile details: login, url, type, id, name, bio, company, location, blog, followers, publicRepos, createdAt.
Every successful row also carries ok: true. Diagnostic rows (no results, bad input, rate limit, network) carry ok: false plus an errorCode and error message, and are never charged.
Nullable fields: GitHub only returns what a repo/user actually sets, so optional fields are null when absent — e.g. repo description, language, license, homepage; user name, bio, company, location, blog. These nulls are normal and still count as complete rows.
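When you consume the dataset (for example a JSON export loaded as a list of dicts), it is worth separating data rows from diagnostic rows and treating nulls as normal values. A minimal sketch, assuming each row is a plain dict with the field names listed above; the function names and the sample rows are made up for illustration.

```python
from typing import Iterable


def split_rows(rows: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Separate data rows (ok: true) from diagnostic rows (ok: false)."""
    data, diagnostics = [], []
    for row in rows:
        (data if row.get("ok") else diagnostics).append(row)
    return data, diagnostics


def describe_repo(row: dict) -> str:
    """One-line summary of a repository row. Nullable fields such as
    language and license may be None, so fall back to a placeholder."""
    language = row.get("language") or "unknown language"
    license_id = row.get("license") or "no license"
    return f"{row['fullName']} ({language}, {license_id}, {row['stars']} stars)"


# Made-up sample rows: one repository with a null language, one diagnostic row.
rows = [
    {"ok": True, "fullName": "octo/app", "language": None, "license": "MIT", "stars": 42},
    {"ok": False, "errorCode": "NO_RESULTS"},
]
data, diagnostics = split_rows(rows)
for row in data:
    print(describe_repo(row))  # octo/app (unknown language, MIT, 42 stars)
print([d["errorCode"] for d in diagnostics])  # ['NO_RESULTS']
```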
Input
| Field | Notes |
|---|---|
query | GitHub search syntax. Repos: language:python stars:>1000 machine learning, topic:cli. Users: location:berlin followers:>500. |
type | repositories (default) or users. |
sort | stars (default), forks, updated, or best-match. Applies to repository searches. |
maxItems | Default 100, max 1000 (GitHub's Search API cap). |
githubToken | Optional but recommended — raises limits to 5000 req/hr and 30 searches/min. No scopes needed for public data. Kept private. |
Output
One dataset row per repository or user, deduplicated by fullName / login. Queries with no matches return a single NO_RESULTS row and are not charged. An empty/missing query returns a single BAD_INPUT row (also not charged) instead of failing the run.
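Deduplication happens within a single run. If you combine several runs, for instance split jobs that work around the 1000-result cap, the same repository or user can appear more than once. The sketch below merges run outputs client-side using the same keys (fullName for repositories, login for users); `merge_runs` is a hypothetical name, and keeping the first occurrence is our own choice.

```python
from typing import Iterable


def merge_runs(batches: Iterable[Iterable[dict]], type_: str = "repositories") -> list[dict]:
    """Combine rows from several runs, keeping the first row seen per key."""
    key = "fullName" if type_ == "repositories" else "login"
    seen: set[str] = set()
    merged: list[dict] = []
    for batch in batches:
        for row in batch:
            if not row.get("ok"):
                continue  # diagnostic rows (NO_RESULTS, BAD_INPUT, ...) carry no data key
            if row[key] in seen:
                continue  # already kept from an earlier batch
            seen.add(row[key])
            merged.append(row)
    return merged
```

Keeping the first row means earlier runs win; if you would rather keep the freshest star counts, pass the batches newest first.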
Rate limits
GitHub allows 60 requests/hr unauthenticated (10 searches/min) and 5000 requests/hr with a token (30 searches/min). The Search API also caps each query at 1000 results. If you hit a limit, the actor does not fail silently: it returns a clear RATE_LIMITED row with the reset time and a suggestion to add a githubToken. User searches are the most exposed, because the profile-enrichment step makes one request per result, so a tokenless user search can exhaust the 60/hr budget on its own. If that happens mid-enrichment, the actor still surfaces a RATE_LIMITED row instead of silently returning zero rows.
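A rough way to check whether a run fits the hourly budget before starting it is sketched below. It assumes one search call per page of 100 results and, for user searches, one profile call per result, as described above. It also assumes, as GitHub documents, that search calls count against the separate per-minute search limit rather than the hourly one, and it ignores any budget already used this hour. `request_plan` is a hypothetical name.

```python
import math

SEARCH_CAP = 1000  # GitHub Search API: at most 1000 results per query
PAGE_SIZE = 100    # assumed page size (the cap is described as 10 pages of 100)


def request_plan(max_items: int, type_: str, has_token: bool) -> dict:
    """Rough estimate of the GitHub calls one run needs. Assumes one search call
    per page and, for user searches, one profile call per result; only the
    profile calls are checked against the hourly limit."""
    n = min(max_items, SEARCH_CAP)
    search_calls = math.ceil(n / PAGE_SIZE)
    profile_calls = n if type_ == "users" else 0
    hourly_limit = 5000 if has_token else 60
    return {
        "searchCalls": search_calls,
        "profileCalls": profile_calls,
        "fitsHourlyLimit": profile_calls <= hourly_limit,
    }


print(request_plan(100, "users", has_token=False))
# {'searchCalls': 1, 'profileCalls': 100, 'fitsHourlyLimit': False}
```

A tokenless user search with the default maxItems of 100 already needs more profile calls than the 60/hr budget allows, which is the mid-enrichment RATE_LIMITED case.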
Troubleshooting
- Got a RATE_LIMITED row: add a free githubToken (60 → 5000 req/hr) and re-run. User searches benefit most, because each result triggers a profile-detail request. The row includes rateLimitResetsAt, so you know when to retry.
- Got a NO_RESULTS row: the query ran but matched nothing; broaden it or check your GitHub search qualifiers.
- Got a BAD_INPUT row: query was empty; provide a search string.
- These diagnostic rows have ok: false and are never charged.
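The checklist above can be turned into a small triage helper for automated pipelines. This is a sketch under the assumption that rateLimitResetsAt is an ISO 8601 timestamp, treated as UTC when it has no offset; this page does not specify the format. `next_step` and its messages are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def next_step(row: dict, now: Optional[datetime] = None) -> str:
    """Suggest what to do about a diagnostic row, following the checklist.
    Assumes rateLimitResetsAt is ISO 8601 (UTC when no offset is given)."""
    code = row.get("errorCode")
    if code == "RATE_LIMITED":
        resets_at = row.get("rateLimitResetsAt")
        if not resets_at:
            return "add a githubToken and re-run"
        # fromisoformat only accepts a trailing "Z" from Python 3.11, so normalise it.
        reset = datetime.fromisoformat(resets_at.replace("Z", "+00:00"))
        if reset.tzinfo is None:
            reset = reset.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        wait = max(0, int((reset - now).total_seconds()))
        return f"add a githubToken, or retry in {wait} s"
    if code == "NO_RESULTS":
        return "broaden the query or check its qualifiers"
    if code == "BAD_INPUT":
        return "provide a non-empty query"
    return "check the row's error message"  # e.g. network problems
```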
Example
{ "query": "language:python stars:>5000 web framework", "type": "repositories", "sort": "stars", "maxItems": 100 }
Notes
To pull more than 1000 results, split the job by a qualifier — e.g. star bands (stars:1000..5000, stars:5001..20000) or creation date windows (created:2022-01-01..2022-12-31).
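Generating the split queries by hand gets tedious, so here is a minimal sketch that produces non-overlapping star bands and yearly creation windows in the same shape as the examples above. The function names are hypothetical, and choosing bounds so that each band matches at most 1000 results is still up to you (use narrower bands where repositories are dense).

```python
def star_band_queries(base: str, bounds: list[int]) -> list[str]:
    """Split one query into non-overlapping star bands, e.g. bounds
    [1000, 5000, 20000] -> stars:1000..5000, stars:5001..20000, stars:>20000."""
    queries = []
    for i, (lo, hi) in enumerate(zip(bounds, bounds[1:])):
        # Start each band after the previous upper bound so bands don't overlap.
        start = lo if i == 0 else lo + 1
        queries.append(f"{base} stars:{start}..{hi}")
    queries.append(f"{base} stars:>{bounds[-1]}")  # open-ended top band
    return queries


def yearly_created_queries(base: str, first_year: int, last_year: int) -> list[str]:
    """One creation-date window per calendar year, inclusive of both ends."""
    return [f"{base} created:{y}-01-01..{y}-12-31" for y in range(first_year, last_year + 1)]


for q in star_band_queries("language:python", [1000, 5000, 20000]):
    print(q)
```

Each query becomes its own run; the outputs can then be combined and deduplicated by fullName.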