I'd like to point out that Sci-Hub not only stores scientific papers, but will also retrieve official PDFs through an institution's proxy if it can and if necessary. It could perhaps store stuff there, but it could not provide that service there (yet).
Anyone can torrent these and then host them on IPFS without Sci-Hub's involvement. However, I suspect you'll have a hard time getting any redundancy for all of this. It's over 50 million files!
Thank you so much. My hope is that IPFS can serve as the underlying distributed object storage, and that we can then work up from there to an indexed, distributed search system on top of it (Elasticsearch within a Docker container, using versioned ES index backups in IPFS? With documents referenced by their IPFS content hash for de-duplication?).
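To make the de-duplication idea concrete, here's a rough sketch of what "referenced by content hash" buys you. Big caveat: this uses a plain SHA-256 hex digest as a stand-in for an IPFS content hash (real IPFS identifiers are computed differently), and `DedupStore`, `content_key`, and the example DOIs are made-up names for illustration, not anything from IPFS or Elasticsearch.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Stand-in for a content hash: SHA-256 of the raw bytes (not a real IPFS hash)."""
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """Toy in-memory content-addressed store with a separate document index."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}   # content key -> file bytes
        self.index: dict[str, str] = {}     # document id (e.g. a DOI) -> content key

    def add(self, doc_id: str, data: bytes) -> str:
        key = content_key(data)
        # Identical bytes always map to the same key, so they are stored once.
        self.blobs.setdefault(key, data)
        self.index[doc_id] = key
        return key

    def get(self, doc_id: str) -> bytes:
        return self.blobs[self.index[doc_id]]


store = DedupStore()
k1 = store.add("10.1000/example.1", b"%PDF-1.4 same paper bytes")
k2 = store.add("10.1000/example.1-mirror", b"%PDF-1.4 same paper bytes")
k3 = store.add("10.1000/example.2", b"%PDF-1.4 a different paper")
```

The search index would then only need to hold the document id and the key; two index entries pointing at the same bytes cost one stored copy.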
Not AFAIK, but you could get a good idea by downloading all the torrent files and extracting the total size from each one's metadata. Or maybe just download one torrent (since there are over 500 of them, one per 100k files) and multiply its average size per file by the total number of files.
I agree such metadata should be captured, but if the papers are converted to IPFS, it'll be easier to copy, ship, and then re-serve the data at endpoints.
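The back-of-the-envelope version of that second approach, as a sketch. The 100k files per torrent and ~50 million files total come from this thread; the 100 GB sample torrent size is a placeholder I made up to show the arithmetic, not a measured number, and `estimate_total_bytes` is just an illustrative helper.

```python
def estimate_total_bytes(sample_torrent_bytes: int, files_in_sample: int, total_files: int) -> int:
    """Extrapolate archive size from one torrent: (average bytes per file) * total files."""
    if files_in_sample <= 0:
        raise ValueError("files_in_sample must be positive")
    # Integer arithmetic: multiply first, then divide, to avoid float rounding.
    return sample_torrent_bytes * total_files // files_in_sample


# Placeholder input: pretend the one torrent we downloaded is 100 GB.
sample_bytes = 100 * 10**9
estimate = estimate_total_bytes(sample_bytes, files_in_sample=100_000, total_files=50_000_000)
print(f"Estimated total: {estimate / 10**12:.1f} TB")
```

One torrent may not be representative (file sizes can vary by era or publisher), so averaging a handful of torrents would give a steadier estimate.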
Think of it as a [sneaker|dark]net CDN enabled by content addressable storage.
[1] https://ipfs.io/