
GitHub Scraper

Search GitHub repos or users and export clean rows: stars, forks, language, topics, license, plus user bio, company, location and follower count.


Developer & Research Tools

How it works

  1. Open it on Apify

    Hit Run on Apify: it opens the tool in the cloud, no install.

  2. Set the inputs

    Adjust query, type and sort (sensible defaults are pre-filled).

  3. Click Run

    The tool runs on Apify’s cloud and collects the data for you.

  4. Export the results

    Download as JSON, CSV or Excel, or pipe straight into your app, Google Sheets, or an AI agent.

Inputs

| Field | What it does | Type |
|---|---|---|
| query | GitHub search syntax. For repositories: "language:python stars:>1000 machine learning", "topic:cli created:>2023-01-01". For users: "location:berlin followers:>… | string |
| type | What to search for: repositories or users/organizations. Users are additionally enriched with profile details (name, bio, company, location, followers, public r… | string |
| sort | How to sort repository results (applies to repository searches; user searches use GitHub's relevance ranking). Best match is GitHub's default relevance score. | string |
| maxItems | Maximum number of repositories/users to return. GitHub's Search API caps at 1000 results per query (10 pages of 100), so split large jobs by qualifier (e.g. sta… | integer |
| githubToken | Optional GitHub personal access token. Strongly recommended: without it you get only 60 requests/hr and 10 searches/min; with it you get 5000 requests/hr and 30… | string |
| notionConnector | Optional. Write each item as a page into your Notion when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors,… | string |
| notionParentId | Optional. The Notion data source ID of the database to write into (only used if a Notion connector is set). Leave empty to create the pages privately in your wo… | string |
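The query field takes ordinary GitHub search syntax, so it can be assembled programmatically before a run. Below is a minimal sketch, assuming you prepare inputs in Python; build_query is a hypothetical helper, not part of the actor. Multi-word values are wrapped in double quotes, which is how GitHub search expects values containing spaces.

```python
def build_query(terms: str = "", **qualifiers: str) -> str:
    """Join qualifier:value pairs and free-text terms into one GitHub search string.

    Hypothetical helper for illustration; the actor itself only receives the
    final string in its "query" input field.
    """
    parts = []
    for key, value in qualifiers.items():
        # Values containing spaces must be quoted, e.g. location:"San Francisco".
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{key}:{value}")
    if terms:
        # Free-text search terms go after the qualifiers.
        parts.append(terms)
    return " ".join(parts)


# Repository search matching the example in the query row above.
print(build_query("machine learning", language="python", stars=">1000"))
# → language:python stars:>1000 machine learning

# User search for people in a multi-word location.
print(build_query(location="San Francisco", language="rust", followers=">100"))
# → location:"San Francisco" language:rust followers:>100
```

Keyword arguments keep their order (Python 3.7+), so qualifiers appear in the order you pass them.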

What you get

A structured dataset — each result includes fields like:

createdAt, defaultBranch, description, details, forks, fullName, homepage, id, language, license, login, name, openIssues, owner

Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.
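If you download the JSON export and build your own CSV rather than using the built-in CSV export, list-valued fields such as topics[] need flattening first. A minimal standard-library sketch; rows_to_csv and the ";" separator are illustrative choices, not the actor's own export logic, and the sample row is made up.

```python
import csv
import io


def rows_to_csv(rows: list[dict], fields: list[str]) -> str:
    """Write dataset rows to CSV text, joining list-valued fields with ';'."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys not listed in fields; missing keys become "".
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        flat = {
            key: ";".join(value) if isinstance(value, list) else value
            for key, value in row.items()
        }
        writer.writerow(flat)  # None values are written as empty cells
    return buffer.getvalue()


# Hypothetical sample row shaped like a repository result.
rows = [{"fullName": "example/repo", "stars": 42, "topics": ["cli", "go"], "license": None}]
print(rows_to_csv(rows, ["fullName", "stars", "topics", "license"]))
```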

18 ready-to-run use cases

Trending AI Agent GitHub Repos Created in 2026

New AI-agent and LLM-tooling GitHub repos created in 2026 with 100+ stars, ranked by stars. A sourcing feed for VCs and dev newsletter writers.

Abandoned npm Repos: No Commits Since 2023

Popular npm GitHub repos with no commits since 2023, flagging zombie dependencies by stars and last-push date for supply-chain risk audits.

Most-Starred Self-Hosted Alternatives to SaaS

Top self-hosted open-source apps on GitHub ranked by stars, archived projects excluded. A research feed for awesome-list curators and SaaS-replacement hunters.

React Native GitHub Repos for DevRel Tracking

Actively maintained React Native repos with 1000+ stars updated in 2026, with stars and last-commit date. DevRel teams use it to monitor the ecosystem.

Top Solidity Smart-Contract GitHub Repos by Stars

Most-starred Solidity smart-contract repos on GitHub with stars, forks, and license. Web3 auditors and crypto-VC analysts map the on-chain ecosystem with it.

MIT-Licensed Go Repos for LLM Training Data

MIT-licensed, non-fork Go repositories ranked by stars, with license and metadata. A license-safe corpus for code-LLM training and RAG dataset building.

Newly-Created MCP Server Repos on GitHub (2026)

Recently created MCP (Model Context Protocol) server repos on GitHub from 2026, with stars and dates. Agent-tool directories and AI newsletters track them here.

Top Open-Source Repos by Org (org:stripe) on GitHub

A company's most-starred public GitHub repos via org: search like org:stripe, with stars and language. Tech-footprint research for sales and partnerships.

Fast-Growing Data Engineering Repos (1k+ Stars)

Data-engineering GitHub repos created since 2025 that already passed 1000+ stars. VCs and DevRel scouts spot breakout modern-data-stack tools early.

Senior Rust Engineers in San Francisco on GitHub

Active Rust developers in San Francisco from GitHub, filtered by language, location, and follower count. Technical recruiters mine it for sourcing leads.

Popular FastAPI Repos and Their Maintainers

Most-starred FastAPI projects on GitHub with stars, maintainers, and metadata. Map the Python API ecosystem, find integrations, and spot companies hiring.

Beginner Python Repos with Good First Issues

Active, well-maintained Python repositories that label good first issues and welcome new contributors. A starting point for Hacktoberfest and open-source PRs.

Prolific TypeScript Developers on GitHub to Recruit

Well-followed, prolific TypeScript developers on GitHub ranked by repo count and followers. Startup recruiters and dev-tool teams use it for outreach lists.

JavaScript Security Repos to Triage on GitHub

Recently updated JavaScript repos tagged around security and vulnerabilities, with last-commit dates. Security researchers and dependency auditors triage them.

Top Python Web Frameworks on GitHub by Stars

Most-starred Python web frameworks on GitHub with stars, forks, last-updated date, and license, sorted by stars. A ranked overview for picking a stack.

Recently Updated LLM Repos on GitHub

Large-language-model projects on GitHub that shipped commits most recently, sorted by last update. Spot active forks and LLM repos worth following.

Most-Forked Machine Learning Repos on GitHub

Machine-learning GitHub repos ranked by fork count, surfacing the projects people actually build on rather than just star. Includes forks, stars, and language.

Go Developers in Berlin on GitHub for Recruiting

Go developers based in Berlin on GitHub with name, company, bio, and follower count. A ready-made sourcing list for recruiters running outreach campaigns.

GitHub Scraper

Search GitHub repositories or users via the public GitHub REST API and get back clean, structured rows. No API key required — but adding a free GitHub token raises your rate limit dramatically (60 → 5000 requests/hr), which matters for larger jobs.

What you get

Repositories (type: repositories): fullName, name, owner, url, description, stars, forks, openIssues, language, topics[], license (SPDX id), homepage, defaultBranch, createdAt, updatedAt, pushedAt.

Users / organizations (type: users) — each result is enriched with profile details: login, url, type, id, name, bio, company, location, blog, followers, publicRepos, createdAt.

Every successful row also carries ok: true. Diagnostic rows (no results, bad input, rate limit, network) carry ok: false plus an errorCode and error message, and are never charged.
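Because successful and diagnostic rows share one dataset, downstream code usually separates them first. A minimal sketch, assuming the rows have been loaded from the JSON export as Python dicts; split_rows is a hypothetical helper and the sample rows are made up.

```python
from collections import Counter


def split_rows(rows: list[dict]) -> tuple[list[dict], Counter]:
    """Separate successful rows from diagnostics and count diagnostic error codes."""
    results = [row for row in rows if row.get("ok") is True]
    error_codes = Counter(
        row.get("errorCode", "UNKNOWN") for row in rows if row.get("ok") is not True
    )
    return results, error_codes


# Hypothetical sample: one repository row and one diagnostic row.
rows = [
    {"ok": True, "fullName": "example/a", "stars": 10},
    {"ok": False, "errorCode": "RATE_LIMITED", "error": "example message"},
]
results, codes = split_rows(rows)
print(len(results), dict(codes))  # 1 {'RATE_LIMITED': 1}
```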

Nullable fields: GitHub only returns what a repo/user actually sets, so optional fields are null when absent — e.g. repo description, language, license, homepage; user name, bio, company, location, blog. These nulls are normal and still count as complete rows.
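Code that consumes these rows should therefore treat the optional fields as possibly null. A sketch, assuming the repository row shape listed under What you get; the Repository dataclass and its summary method are hypothetical and model only a subset of the fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repository:
    full_name: str
    stars: int
    forks: int
    # Optional fields: GitHub leaves these null when the repo does not set them.
    description: Optional[str]
    language: Optional[str]
    license: Optional[str]
    homepage: Optional[str]

    @classmethod
    def from_row(cls, row: dict) -> "Repository":
        """Build from a dataset row; .get() tolerates null or absent optional fields."""
        return cls(
            full_name=row["fullName"],
            stars=row["stars"],
            forks=row["forks"],
            description=row.get("description"),
            language=row.get("language"),
            license=row.get("license"),
            homepage=row.get("homepage"),
        )

    def summary(self) -> str:
        """One-line description with readable fallbacks for null fields."""
        lang = self.language or "unknown language"
        lic = self.license or "no license"
        return f"{self.full_name} ({lang}, {lic}): {self.stars} stars"


repo = Repository.from_row(
    {"ok": True, "fullName": "example/repo", "stars": 5, "forks": 1,
     "description": None, "language": None, "license": "MIT", "homepage": None}
)
print(repo.summary())  # example/repo (unknown language, MIT): 5 stars
```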

Input

| Field | Notes |
|---|---|
| query | GitHub search syntax. Repos: "language:python stars:>1000 machine learning", "topic:cli". Users: "location:berlin followers:>500". |
| type | repositories (default) or users. |
| sort | stars (default), forks, updated, or best-match. Applies to repository searches. |
| maxItems | Default 100, max 1000 (GitHub's Search API cap). |
| githubToken | Optional but recommended: raises limits to 5000 req/hr and 30 searches/min. No scopes needed for public data. Kept private. |
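These constraints are easy to check before starting a run. A minimal sketch, assuming run inputs are assembled in Python; make_run_input is hypothetical, and the lower bound of 1 for maxItems is an assumption (the table documents only the default and the maximum). The actor itself reports an empty query as a BAD_INPUT row rather than failing, so this check only surfaces the mistake earlier.

```python
from typing import Optional

VALID_TYPES = {"repositories", "users"}
VALID_SORTS = {"stars", "forks", "updated", "best-match"}
SEARCH_API_CAP = 1000  # GitHub's Search API returns at most 1000 results per query


def make_run_input(
    query: str,
    search_type: str = "repositories",
    sort: str = "stars",
    max_items: int = 100,
    github_token: Optional[str] = None,
) -> dict:
    """Build and validate an input object using the documented defaults."""
    if not query.strip():
        raise ValueError("query must be a non-empty GitHub search string")
    if search_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    if sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}")
    # Assumed lower bound of 1; the upper bound is the documented Search API cap.
    if not 1 <= max_items <= SEARCH_API_CAP:
        raise ValueError(f"maxItems must be between 1 and {SEARCH_API_CAP}")
    run_input = {"query": query, "type": search_type, "sort": sort, "maxItems": max_items}
    if github_token:
        # Only include the token when one is supplied.
        run_input["githubToken"] = github_token
    return run_input


print(make_run_input("language:python stars:>5000 web framework"))
```

With the defaults, the call at the end yields the same object as the one shown under Example.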

Output

One dataset row per repository or user, deduplicated by fullName / login. Queries with no matches return a single NO_RESULTS row and are not charged. An empty/missing query returns a single BAD_INPUT row (also not charged) instead of failing the run.
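Deduplication happens within a single run. When a large job is split across several runs (see Notes), the combined results can still overlap, for example if two date windows share a boundary day. A sketch of merging several runs with the same keys, assuming each run's rows are a list of dicts; merge_runs is hypothetical and the sample rows are made up.

```python
def merge_runs(*runs: list[dict]) -> list[dict]:
    """Combine rows from several runs, keeping only ok rows; first occurrence wins."""
    seen: set[str] = set()
    merged: list[dict] = []
    for rows in runs:
        for row in rows:
            if row.get("ok") is not True:
                continue  # skip NO_RESULTS, BAD_INPUT, RATE_LIMITED and similar rows
            # Repository rows are keyed by fullName, user rows by login.
            key = row.get("fullName") or row.get("login")
            if key is None or key in seen:
                continue
            seen.add(key)
            merged.append(row)
    return merged


run_a = [{"ok": True, "fullName": "example/a"}, {"ok": True, "fullName": "example/b"}]
run_b = [{"ok": True, "fullName": "example/b"}, {"ok": False, "errorCode": "NO_RESULTS"}]
print(len(merge_runs(run_a, run_b)))  # 2
```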

Rate limits

GitHub allows 60 requests/hr unauthenticated (10 searches/min) and 5000 requests/hr with a token (30 searches/min). The Search API also caps results at 1000 per query. If you hit a limit, the actor returns a clear RATE_LIMITED row (with the reset time) suggesting you add a githubToken; it does not fail silently. This applies to user searches too. The profile-enrichment step makes one request per user result, so a tokenless user search can exhaust the 60/hr budget on its own (100 users means 100 profile requests). If that happens mid-enrichment, the actor surfaces a RATE_LIMITED row rather than silently returning zero rows.
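The enrichment cost can be estimated before a run. A back-of-the-envelope sketch, assuming search results are fetched in pages of 100 (the page size cited in the maxItems description) and exactly one profile request is made per user; request_estimate is hypothetical. Whether search requests also count toward the 60/hr budget is not stated here, so the sketch compares only the profile lookups against it.

```python
import math

PAGE_SIZE = 100  # GitHub search pages hold up to 100 results (10 pages of 100 = 1000)
TOKENLESS_HOURLY = 60  # requests/hr without a githubToken


def request_estimate(n_results: int, search_type: str = "repositories") -> tuple[int, int]:
    """Return (search_requests, profile_requests) for a job of n_results rows."""
    search_requests = math.ceil(n_results / PAGE_SIZE)
    # Only user searches trigger a profile-detail request per result.
    profile_requests = n_results if search_type == "users" else 0
    return search_requests, profile_requests


for n, kind in [(1000, "repositories"), (50, "users"), (100, "users")]:
    searches, profiles = request_estimate(n, kind)
    verdict = "fits" if profiles <= TOKENLESS_HOURLY else "exceeds"
    print(f"{n} {kind}: {searches} search + {profiles} profile requests ({verdict} 60/hr)")
```

A 1000-row repository search needs only 10 search pages, while a 100-row user search needs 100 profile lookups, which is why user searches benefit most from a token.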

Troubleshooting

  • Got a RATE_LIMITED row — add a free githubToken (60 → 5000 req/hr) and re-run; user searches especially benefit because each result triggers a profile-detail request. The row includes rateLimitResetsAt so you know when to retry.
  • Got a NO_RESULTS row — the query ran but matched nothing; broaden it or check GitHub search qualifiers.
  • Got a BAD_INPUT row: the query was empty; provide a search string.
  • These diagnostic rows have ok: false and are never charged.
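A script that retries automatically can read the wait time from the RATE_LIMITED row. A minimal sketch, assuming rateLimitResetsAt is either an ISO 8601 timestamp or a Unix epoch in seconds; the exact format is not documented here, and seconds_until_reset is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def seconds_until_reset(row: dict, now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait before retrying, or None if the row is not RATE_LIMITED."""
    if row.get("errorCode") != "RATE_LIMITED":
        return None
    resets_at = row["rateLimitResetsAt"]
    if isinstance(resets_at, (int, float)):
        # Assumed Unix epoch seconds.
        reset = datetime.fromtimestamp(resets_at, tz=timezone.utc)
    else:
        # Assumed ISO 8601; fromisoformat accepts a trailing "Z" only from Python 3.11.
        reset = datetime.fromisoformat(resets_at.replace("Z", "+00:00"))
        if reset.tzinfo is None:
            reset = reset.replace(tzinfo=timezone.utc)  # treat naive values as UTC
    if now is None:
        now = datetime.now(timezone.utc)
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, (reset - now).total_seconds())


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
row = {"ok": False, "errorCode": "RATE_LIMITED", "rateLimitResetsAt": "2026-01-01T12:15:00Z"}
print(seconds_until_reset(row, now))  # 900.0
```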

Example

{ "query": "language:python stars:>5000 web framework", "type": "repositories", "sort": "stars", "maxItems": 100 }

Notes

To pull more than 1000 results, split the job by a qualifier — e.g. star bands (stars:1000..5000, stars:5001..20000) or creation date windows (created:2022-01-01..2022-12-31).
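Writing the split queries by hand gets tedious for many bands. A minimal sketch, assuming inclusive ranges as in the examples above (GitHub's low..high range syntax includes both ends); star_band_queries and yearly_created_queries are hypothetical helpers. Choose edges so each band stays under the 1000-result cap; the sketch does not check that.

```python
def star_band_queries(base: str, edges: list[int]) -> list[str]:
    """Turn band edges like [1000, 5000, 20000] into non-overlapping star ranges."""
    queries = []
    low = edges[0]
    for high in edges[1:]:
        queries.append(f"{base} stars:{low}..{high}")
        low = high + 1  # next band starts just above this one, so bands never overlap
    return queries


def yearly_created_queries(base: str, first_year: int, last_year: int) -> list[str]:
    """One created: window per calendar year, first_year through last_year inclusive."""
    return [
        f"{base} created:{year}-01-01..{year}-12-31"
        for year in range(first_year, last_year + 1)
    ]


for q in star_band_queries("language:python", [1000, 5000, 20000]):
    print(q)
# language:python stars:1000..5000
# language:python stars:5001..20000

for q in yearly_created_queries("topic:cli", 2022, 2023):
    print(q)
# topic:cli created:2022-01-01..2022-12-31
# topic:cli created:2023-01-01..2023-12-31
```

Each query can then be run separately and the results combined, dropping any repository that appears in more than one run.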