Show HN: MCP server for searching and downloading documents from Anna's Archive

I was looking around for an MCP server that could connect Anna's Archive to Claude Desktop, as I wanted to be able to search and download books directly through the interface.

I couldn't find any public implementations, so I ended up building one myself.

What does it do?

- It searches Anna's Archive by keywords.
- It downloads books from search results.
- It works directly in Claude Desktop through MCP.

Check out the repository's README for detailed installation and configuration instructions.

The code is fully open source and builds run on GitHub Actions for transparency.

I figured I'd share, since I couldn't be the only one wanting this functionality!

URL: github.com
7 comments

What advantage do you get from this being an MCP server rather than simply a command line tool? Genuinely curious, as I'm trying to develop my mental model of when to use one or the other.

Lovely project!

Cheers, glad you like it!

I justified the hours I invested by thinking I could search, download, and explore books directly from Claude Desktop. While the initial steps are achievable with a CLI tool, the integration opens up new possibilities.

Some general thoughts:

- You’ll find the MCP mental model similar to the API one.
- MCP integrations make it easier for non-technical users to access tools that were previously too technical.
- An MCP integration implicitly respects a contract, unlike CLIs and GUIs which involve human aspects (aesthetics, information organisation, etc.).
- MCP is an excuse for people to democratize data access. I wrote about this aspect here: https://x.com/iosifache/status/1941049600162574676?s=46

And BTW, that’s a good idea! The functionality should probably also be exposed via CLI.

https://neon.com/blog/building-a-cli-client-for-model-contex... might be of interest.

An MCP server provides enough metadata and self-documentation that it's quite straightforward to make an MCP-agnostic CLI client that adapts an arbitrary MCP server into a set of flags that allow you to call its explicit tools with explicit arguments - without ever needing to involve an LLM in the mix! You could even have that CLI tool launch the MCP server as a local subprocess, if you wanted - again, all deterministically.
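To make that concrete, here is a rough sketch of the idea, not taken from the linked post or from this project. It assumes a tool description shaped like an entry in an MCP `tools/list` response (`name`, `description`, `inputSchema`), turns each schema property into a command-line flag, and builds the `tools/call` request deterministically. The `search` tool, its `query` and `limit` parameters, and the `mcp-cli` program name are made up for illustration.

```python
import argparse

# Hypothetical tool description, shaped like one entry of an MCP `tools/list`
# response. The tool name and its parameters are assumptions for this sketch.
SEARCH_TOOL = {
    "name": "search",
    "description": "Search the archive by keywords.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords to search for"},
            "limit": {"type": "integer", "description": "Maximum number of results"},
        },
        "required": ["query"],
    },
}

# Map JSON Schema scalar types to Python converters; anything else stays a string.
JSON_TO_PY = {"string": str, "integer": int, "number": float}


def build_parser(tools):
    """Create one subcommand per tool and one flag per schema property."""
    parser = argparse.ArgumentParser(prog="mcp-cli")
    subparsers = parser.add_subparsers(dest="tool", required=True)
    for tool in tools:
        sub = subparsers.add_parser(tool["name"], help=tool.get("description"))
        schema = tool.get("inputSchema", {})
        required = set(schema.get("required", []))
        for name, prop in schema.get("properties", {}).items():
            if prop.get("type") == "boolean":
                sub.add_argument(f"--{name}", action="store_true",
                                 help=prop.get("description"))
            else:
                sub.add_argument(f"--{name}",
                                 type=JSON_TO_PY.get(prop.get("type"), str),
                                 required=name in required,
                                 help=prop.get("description"))
    return parser


def to_call(args, request_id=1):
    """Build a JSON-RPC `tools/call` request from parsed flags (no LLM involved)."""
    arguments = {k: v for k, v in vars(args).items()
                 if k != "tool" and v is not None}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": args.tool, "arguments": arguments},
    }
```

With this in place, `build_parser([SEARCH_TOOL]).parse_args(["search", "--query", "moby dick"])` followed by `to_call(...)` yields a request that could be written to the server's stdin or sent over HTTP. The transport and the initialization handshake are left out here.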

And if you want to have an SDK in any language under the sun, once you have an MCP outputting reasonable tool descriptions, any LLM could make a best-in-class SDK for you in a heartbeat following that language's best practices.

So it's not unreasonable for someone working on a greenfield project to make an MCP server first nowadays!

Agreed on all of this! I'm expecting MCP server creation to be natively supported by API libraries. The abstractions are very similar.

My understanding of Anna's Archive is that one has to download large zip files (>10 GB) containing thousands of books even if one wants only a single book.

Am I correct here?

Does this MCP server allow one to download just a single book?

I remember once using an IPFS-based tool to download a single 200-year-old, out-of-copyright copy of "Last of the Mohicans" from Anna's Archive. It worked, but it was very complicated to figure out how to make it work.

I've downloaded single books several times recently (annas-archive.org in the browser):

  - search for book
  - tap a result
  - see a list of links to download mirrors (under 'slow downloads'), tap a link
  - get a countdown timer
  - timer expires, download links appear
  - click a link, book downloads just like any other download

The waiting part is nonexistent if you have an active donation (which is also required by this MCP server for API access). The fast downloads mean you request a book and start downloading it immediately.

You are incorrect in your assumption. Though I would also like you to search for "IRC books reddit". Unlike Anna's Archive, you get high quality books with fast download speeds.

love this. god bless anna's archive

Cheers!

Edit: Would you accept a PR to override the search and download endpoint hostnames with env vars? For someone who has their own copy and ES index, it might be helpful to support overriding the public endpoint hostnames (/internal/anna/anna.go#L22-L23).
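For what it's worth, the pattern I have in mind is roughly the following. The project itself is written in Go; this is only a language-neutral sketch in Python, and the environment variable names and default hostnames are placeholders I made up, not what the code currently uses.

```python
import os

# Placeholder defaults; the real hostnames live in the project's source.
DEFAULT_SEARCH_HOST = "search.example.org"
DEFAULT_DOWNLOAD_HOST = "download.example.org"


def resolve_endpoints(env=None):
    """Return the hostnames to use, preferring env var overrides when set.

    ANNAS_SEARCH_HOST and ANNAS_DOWNLOAD_HOST are hypothetical names.
    An empty value is treated the same as an unset variable.
    """
    if env is None:
        env = os.environ
    return {
        "search": env.get("ANNAS_SEARCH_HOST") or DEFAULT_SEARCH_HOST,
        "download": env.get("ANNAS_DOWNLOAD_HOST") or DEFAULT_DOWNLOAD_HOST,
    }
```

Someone with their own copy and index would then only need to set the two variables before launching the server, and everyone else keeps the public defaults.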

I'm an LLM noob, but how feasible is it to make a research agent that can not only download articles, but also read and reference them in its process?

Firecrawl -> RAG -> MCP is the general path.
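To give a feel for the middle (RAG) step, here is a toy sketch of retrieval with source references. It is an assumption-heavy simplification: real setups typically use embeddings and a vector store, whereas this one scores chunks by plain word overlap so it runs with only the standard library. The function names and the chunk size are made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(doc_id, text, size=40):
    """Split a document into (doc_id, chunk_index, chunk_text) pieces of `size` words."""
    words = text.split()
    return [(doc_id, i // size, " ".join(words[i:i + size]))
            for i in range(0, len(words), size)]


def retrieve(question, chunks, k=2):
    """Return up to k chunks ranked by word overlap with the question.

    Each result is (score, doc_id, chunk_index, chunk_text), so the agent can
    cite which document and which part of it an answer came from.
    """
    q = Counter(tokenize(question))
    scored = []
    for doc_id, idx, body in chunks:
        c = Counter(tokenize(body))
        score = sum(min(q[t], c[t]) for t in q)
        if score:
            scored.append((score, doc_id, idx, body))
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored[:k]
```

The retrieved chunks, together with their `doc_id` and chunk index, would be pasted into the model's prompt, which is what lets the agent both read the downloaded text and reference it.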

> This software does not endorse unauthorized acquisition of copyrighted content and should be regarded solely as a utility. Users are urged to respect the intellectual property rights of authors and acknowledge the considerable effort invested in document creation.

How sincere is that statement?

I just provide a hammer. Users decide whether they're hitting their own nail or the metal one.

The comparison might be loose, but the problem is similar to releasing a browser. Do you prevent users from accessing websites you think are malicious or illegal? Or do you delegate that responsibility?

I was hesitant about releasing the MCP server as open source software, but I hope (1) it proves useful for others and (2) people understand that the authors of the books they're reading need money to eat, live, and support their families.

> The comparison might be loose, but the problem is similar to releasing a browser. Do you prevent users from accessing websites you think are malicious or illegal? Or do you delegate that responsibility?

I might liken the situation more to releasing a browser and setting thepiratebay as the homepage.

That would imply constantly reminding users of an available action, which isn't the case since the MCP server is just a dormant capability that needs to be triggered.

As sincere as LLM providers not wanting to get sued for the copyrighted content they used.

I'd bet you won't find a single string containing "acquire copyrighted content arrr!", so pretty sincere. The software doesn't endorse it.

As sincere as the user takes it.

Interesting project! I’m a little surprised that Claude is willing to call these functions. The demo screenshot shows it downloading a public domain work; I wonder if it would also happily go along with requests for Harry Potter or other copyrighted material?

The ironic universe theory would dictate that LLMs should tell us that downloading and consuming pirated copies of copyrighted books is wrong.

Just give the AI something worse that would happen if it doesn’t call these functions.

Aren’t they all trained on copyrighted material, and lobbying governments to make that legal? Should copyright law only apply to the plebs?

Last I checked downloading isn't the issue. It's distributing. Not an expert though.

> I wonder if it would also happily go along with requests for Harry Potter or other copyrighted material?

There's no way to protect against this. Anna's Archive doesn't include licence information in their data fields. It would be helpful to integrate with another data source that could warn MCP server users when they're attempting potentially risky actions. Please let me know if you have ideas on how to achieve this.

On a related note, please see this reply:

https://news.ycombinator.com/reply?id=44515205

Thanks, good context.