Lemmy Scraper
Scrape public Lemmy posts from any instance by feed, community, or keyword. Get title, author, score, comments and more as JSON. No login needed.
How it works
1. Open it on Apify: hit Run and the tool opens in the cloud, with nothing to install.
2. Set the inputs: adjust instance, mode, and query (sensible defaults are pre-filled).
3. Click Run: the tool runs on Apify’s cloud and collects the data for you.
4. Export the results: download as JSON, CSV, or Excel, or pipe them straight into your app, Google Sheets, or an AI agent.
Inputs
| Field | What it does | Type |
|---|---|---|
| instance | The Lemmy instance host to scrape (bare domain, no https://). Examples: lemmy.world, lemmy.ml, beehaw.org, sh.itjust.works. | string |
| mode | What to scrape: "feed" = the instance front-page feed; "community" = a single community (put its name in Query); "search" = search posts by keyword (put the term in Query). | string |
| query | For mode "community": the community name, e.g. "technology" or the cross-instance form "community@instance". For mode "search": the keywords to search for, e.g. "linux". Ignored in mode "feed". | string |
| sort | How to sort posts. Hot/Active rank by recent engagement; New is chronological; the Top* options rank by score within a time window. | string |
| maxItems | Maximum number of posts to return. The actor paginates (50 per request) until it reaches this many or runs out of posts. | integer |
| notionConnector | Optional. Write each post as a page into your Notion when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors, … | string |
| notionParentId | Optional. The Notion data source ID of the database to write into (only used if a Notion connector is set). Leave empty to create the pages privately in your workspace. | string |
What you get
A structured dataset — each result includes fields like:
author, authorActor, body, comments, community, communityTitle, downvotes, id, nsfw, postUrl, published, score, thumbnail, title.
Export every run as JSON, CSV, or Excel, or send it to your app, a database, Google Sheets, or an AI agent.
2 ready-to-run use cases
Scrape lemmy.world Front Page: Hot Posts + Scores
The hottest lemmy.world front-page posts as structured data: title, link, author, score, and comment count. Sort by Hot, New, or Top; no login needed.
Lemmy Keyword Search: Find Posts on Any Topic
Track a brand or topic on Lemmy by pulling the posts that mention your keyword on lemmy.world, sorted newest first, with author and score.
Scrape public posts from any Lemmy instance — the federated, Reddit-style link aggregator. Browse an instance's front-page feed, pull a single community, or search by keyword. No account, no API key, no login.
Uses the public Lemmy v3 REST API, so reads are fast and clean (structured JSON, not HTML scraping).
Modes
- Feed: the instance front page (/api/v3/post/list). Just set instance and sort.
- Community: one community's posts. Set mode: "community" and put the community name in query (e.g. technology, or the cross-instance form community@instance).
- Search: search posts by keyword. Set mode: "search" and put the term in query (e.g. linux).
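A minimal sketch of how the three modes might translate into Lemmy v3 requests, assuming 50 posts per page as described under maxItems. Only /api/v3/post/list is named in this README; the /api/v3/search endpoint and the parameter names (sort, page, limit, community_name, q, type_) are assumptions based on the public Lemmy v3 API, not taken from the actor's source, and build_request_url is a hypothetical helper.

```python
from urllib.parse import urlencode

PAGE_SIZE = 50  # the README says results are paginated 50 per request


def build_request_url(instance: str, mode: str, query: str = "",
                      sort: str = "Hot", page: int = 1) -> str:
    """Return the URL for one page of results (sketch; parameter names assumed)."""
    params: dict[str, object] = {"sort": sort, "page": page, "limit": PAGE_SIZE}
    if mode == "feed":
        path = "/api/v3/post/list"  # instance front page
    elif mode == "community":
        path = "/api/v3/post/list"
        params["community_name"] = query  # assumed to accept "name" or "name@instance"
    elif mode == "search":
        path = "/api/v3/search"  # assumed search endpoint
        params["q"] = query
        params["type_"] = "Posts"  # assumed filter: return posts only
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"https://{instance}{path}?{urlencode(params)}"
```

For example, build_request_url("lemmy.world", "search", "linux") targets the search endpoint with type_=Posts, so comments, users, and communities would be filtered out of the results.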
What you get per post
id, title, url (the external link the post points to, if any), body (post text; markdown, with stray HTML stripped), author, authorActor (the creator's federated actor URL), community, communityTitle, score, comments, upvotes, downvotes, nsfw, thumbnail, published (ISO), and postUrl (the permalink on the instance, e.g. https://lemmy.world/post/123).
Fields that can be null
- url / thumbnail: many posts are pure text discussions with no external link or image.
- body: link posts often have no body text.
- Any field Lemmy omits for a given post comes back as null rather than being dropped.
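The field list and null rules above can be sketched as a flattening step. This assumes Lemmy's usual post-view shape (post, creator, community, and counts objects, with raw keys such as name, actor_id, and thumbnail_url); those raw key names, the to_row helper, and the regex-based HTML stripping are illustrative assumptions rather than the actor's actual code. Using dict.get means any key Lemmy omits comes back as None (JSON null) instead of disappearing.

```python
import re
from typing import Any, Optional

TAG_RE = re.compile(r"<[^>]+>")  # naive: drops anything that looks like an HTML tag


def strip_html(text: Optional[str]) -> Optional[str]:
    return TAG_RE.sub("", text) if text is not None else None


def to_row(view: dict[str, Any], instance: str) -> dict[str, Any]:
    """Flatten one raw post view (assumed layout) into an output row."""
    post = view.get("post") or {}
    creator = view.get("creator") or {}
    community = view.get("community") or {}
    counts = view.get("counts") or {}
    post_id = post.get("id")
    return {
        "ok": True,  # marks a genuine post row
        "id": post_id,
        "title": post.get("name"),  # assumed: Lemmy stores the title as "name"
        "url": post.get("url"),  # None for text-only posts
        "body": strip_html(post.get("body")),  # None for link-only posts
        "author": creator.get("name"),
        "authorActor": creator.get("actor_id"),
        "community": community.get("name"),
        "communityTitle": community.get("title"),
        "score": counts.get("score"),
        "comments": counts.get("comments"),
        "upvotes": counts.get("upvotes"),
        "downvotes": counts.get("downvotes"),
        "nsfw": post.get("nsfw"),
        "thumbnail": post.get("thumbnail_url"),
        "published": post.get("published"),
        # Permalink on the instance that was scraped.
        "postUrl": f"https://{instance}/post/{post_id}" if post_id is not None else None,
    }
```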
Input
| Field | Notes |
|---|---|
| instance | Lemmy instance host (bare domain). Default lemmy.world. |
| mode | feed, community, or search. Default feed. |
| query | Community name (community mode) or search term (search mode). |
| sort | Hot, Active, New, TopDay, TopWeek, TopMonth, TopAll. Default Hot. |
| maxItems | Max posts to return (paginated 50 at a time). Default 100. |
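A hypothetical sketch of applying the defaults above and producing the BAD_INPUT diagnostic row described under Output. The bare-domain regex is a rough approximation, and the "error" message key is an assumption; the actor's real validation may be stricter or looser.

```python
import re
from typing import Any

VALID_MODES = {"feed", "community", "search"}
VALID_SORTS = {"Hot", "Active", "New", "TopDay", "TopWeek", "TopMonth", "TopAll"}
# Rough bare-domain check: dot-separated labels, no scheme, path, or port.
DOMAIN_RE = re.compile(r"(?:[a-z0-9-]+\.)+[a-z]{2,}", re.IGNORECASE)


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Fill documented defaults, or return a BAD_INPUT diagnostic row."""
    cfg = {
        "instance": (raw.get("instance") or "lemmy.world").strip(),
        "mode": raw.get("mode") or "feed",
        "query": (raw.get("query") or "").strip(),
        "sort": raw.get("sort") or "Hot",
        "maxItems": raw.get("maxItems", 100),
    }
    problems: list[str] = []
    if not DOMAIN_RE.fullmatch(cfg["instance"]):
        problems.append("instance must be a bare domain like lemmy.world")
    if cfg["mode"] not in VALID_MODES:
        problems.append(f"unknown mode {cfg['mode']!r}")
    elif cfg["mode"] != "feed" and not cfg["query"]:
        problems.append(f"mode {cfg['mode']!r} requires query")
    if cfg["sort"] not in VALID_SORTS:
        problems.append(f"unknown sort {cfg['sort']!r}")
    if not isinstance(cfg["maxItems"], int) or cfg["maxItems"] < 1:
        problems.append("maxItems must be a positive integer")
    if problems:
        # One uncharged diagnostic row instead of any post rows.
        return {"ok": False, "errorCode": "BAD_INPUT", "error": "; ".join(problems)}
    return cfg
```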
Output
One dataset row per post, deduplicated by post id. Pricing is pay-per-result: you are charged only for genuine post rows (ok: true). Rows we couldn't deliver are never charged:
- invalid input: a single ok: false diagnostic row with errorCode: "BAD_INPUT" (bad instance, bad mode, or a missing community name / search term),
- no posts for this feed/community/search (NO_RESULTS),
- a missing community or non-Lemmy host (NOT_FOUND),
- rate limits or network errors (RATE_LIMITED / NETWORK).
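A sketch of the pagination and dedup behavior described above, written against a hypothetical fetch_page(page) callable that returns already-flattened rows for one page. It stops at maxItems, on a short or empty page, or when a page adds nothing new, and returns a single NO_RESULTS diagnostic row if nothing came back. These stopping rules are assumptions, not the actor's confirmed logic.

```python
from typing import Any, Callable

Row = dict[str, Any]


def collect_posts(fetch_page: Callable[[int], list[Row]], max_items: int,
                  page_size: int = 50) -> list[Row]:
    """Gather up to max_items unique posts, page by page (sketch)."""
    seen: set[Any] = set()
    rows: list[Row] = []
    page = 1
    while len(rows) < max_items:
        batch = fetch_page(page)
        added = 0
        for row in batch:
            if row["id"] in seen:
                continue  # dedupe by post id; rankings can shift between pages
            seen.add(row["id"])
            rows.append(row)
            added += 1
            if len(rows) >= max_items:
                break
        if len(batch) < page_size or added == 0:
            break  # short/empty page, or nothing new: assume we've run out
        page += 1
    # No posts at all: emit one uncharged diagnostic row instead.
    return rows or [{"ok": False, "errorCode": "NO_RESULTS"}]
```

The added == 0 check is a safety valve: without it, an instance that kept returning the same full page would loop forever.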
Proxy
The Lemmy v3 REST API is public and has no anti-bot protection, so no proxy is required and the default runs without one (saving proxy credits). Enable Apify Proxy only if an instance rate-limits your IP at very high volume.
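If you call the API yourself, one way to tell a rate limit apart from other failures is by HTTP status. The mapping below (429 to RATE_LIMITED, 404 to NOT_FOUND, anything else to NETWORK) is a hypothetical illustration of the error codes listed under Output; it assumes Lemmy answers a missing community or a non-Lemmy path with 404, which may not hold for every Lemmy version.

```python
import urllib.error


def classify_error(exc: Exception) -> str:
    """Map a failed request to a documented error code (hypothetical mapping)."""
    # HTTPError is a subclass of URLError, so check it first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 429:
            return "RATE_LIMITED"
        if exc.code == 404:
            return "NOT_FOUND"  # assumed: missing community or non-Lemmy host
        return "NETWORK"
    if isinstance(exc, (urllib.error.URLError, TimeoutError, ConnectionError)):
        return "NETWORK"
    raise exc  # not a network problem: let it surface
```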
Troubleshooting
- NOT_FOUND in community mode? Check the community name. If it lives on another instance, use the cross-instance form community@instance, or set instance to that instance directly.
- NO_RESULTS? The feed/community/search genuinely returned nothing on this instance; try a different sort, a broader search term, or a larger instance.
- BAD_INPUT? The community and search modes both require query, and instance must be a bare Lemmy domain like lemmy.world.
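For the second NOT_FOUND fix (pointing instance at the community's home instance), a hypothetical helper like split_community can turn the cross-instance form into a separate name and host. Stripping a leading "!" is an assumption based on how Lemmy users commonly write community references; the actor itself may not accept that prefix.

```python
def split_community(query: str, default_instance: str) -> tuple[str, str]:
    """Split 'name@instance' into (name, instance); bare names use default_instance."""
    # Communities are often written "!name@instance"; drop a leading "!".
    name, sep, host = query.strip().lstrip("!").partition("@")
    if sep and host:
        return name, host.lower()
    return name, default_instance
```

You could then run with instance set to the returned host and query set to the bare name.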
Example
{ "instance": "lemmy.world", "mode": "community", "query": "technology", "sort": "Hot", "maxItems": 50 }
Notes
Lemmy is federated: a large instance like lemmy.world also relays content from communities hosted elsewhere. The postUrl permalink points to the instance you scraped; authorActor and the community's federated identity tell you where the content originates.
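A small sketch, assuming authorActor holds a full actor URL such as https://lemmy.ml/u/alice (a made-up example value): the host of that URL is the author's home instance. This tells you where the author's account lives, which can differ from where the community is hosted. origin_host and author_is_remote are hypothetical helpers.

```python
from typing import Any, Optional
from urllib.parse import urlparse


def origin_host(actor_url: Optional[str]) -> Optional[str]:
    """Host part of a federated actor URL, e.g. the author's home instance."""
    if not actor_url:
        return None
    return urlparse(actor_url).hostname  # urlparse lower-cases the hostname


def author_is_remote(row: dict[str, Any], scraped_instance: str) -> bool:
    """True if the post's author account lives on a different instance."""
    host = origin_host(row.get("authorActor"))
    return host is not None and host != scraped_instance.lower()
```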