©2025 Poal.co

This is why I have been stopping in at old and small local bookstores for years, looking for early editions of many books. You can't edit what is already printed on the page. I'm even considering buying multiple editions, building one of those book-scanner bots, hooking it up to OCR, and then running a diff between the various print versions to show exactly what changed between editions.

What do you think? Good idea or waste of time?
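A minimal sketch of the diff step, assuming each edition has already been OCR'd to plain text. Different print runs rarely break lines in the same places, so this compares word sequences rather than lines, using Python's standard `difflib.SequenceMatcher`. The function names are made up for illustration.

```python
import difflib
import re

def words(text: str) -> list[str]:
    """Normalise OCR text to a word list so differing line breaks don't count as changes."""
    text = re.sub(r"-\n(\w)", r"\1", text)  # rejoin words hyphenated across a line break
    return text.split()

def word_changes(old_text: str, new_text: str) -> list[tuple[str, str, str]]:
    """Return (operation, old words, new words) for every non-matching run of words."""
    old_words, new_words = words(old_text), words(new_text)
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is 'replace', 'delete' or 'insert'
            changes.append((tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes

if __name__ == "__main__":
    first = "The quick brown fox\njumps over the lazy dog."
    second = "The quick red fox jumps\nover the dog."
    for change in word_changes(first, second):
        print(change)
```

Rejoining line-end hyphens will also merge genuine compounds that happen to break at the hyphen, and OCR noise will show up as spurious "changes", so every reported difference still needs checking against the page images.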

[–] 4 pts (edited )

Piracy is usually the answer, whenever whiny (((media niggers))) complain about something.

[–] 3 pts

I guess I'm a Luddite.

[–] 3 pts

Nothing wrong with physical media. I have it for everything. I have a very large local storage server for my movies/music/TV/etc., but I also keep the physical media I ripped and converted them from. I back up the digital copies off-site in case the physical ones are destroyed by a natural disaster or something.

It's a side effect of wanting to make sure I never lose important pictures, videos, etc. from my life over the coming decades. Having my entertainment media backed up too is just a bonus. I can also stream it anywhere I have an internet connection, without relying on any "cloud" service.
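One way to check that an off-site copy actually matches the local one is a checksum manifest. This is a minimal sketch using only the standard library (`hashlib`, `pathlib`); the function names are hypothetical, and it assumes you can run the same script on the backup host to get the remote manifest.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large video files don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, with forward slashes) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def compare_manifests(local: dict[str, str], remote: dict[str, str]) -> dict[str, list[str]]:
    """List files absent from the backup and files whose contents differ."""
    return {
        "missing": sorted(local.keys() - remote.keys()),
        "changed": sorted(k for k in local.keys() & remote.keys() if local[k] != remote[k]),
    }
```

Files deleted locally but still present off-site aren't reported; swap the operands of the set difference if you want that direction too.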

[–] 1 pt

That sounds awesome, but I'd worry about burning out if you decide to do too many books at once without anticipating the potential workload.

If I were going to do something like this, here's what I'd do:

  • Get a high-quality, computer-controlled camera (focus, f-stop, exposure, maybe zoom). If it's an off-the-shelf camera, make sure it can be easily removed from the frame, since someone will see it and want to use it elsewhere.
  • Mark out the maximum available frame on the capture plate. You don't want to capture an entire book only to realize that the top 10% of every page is cut off.
  • Make a set of weights to hold down the book. Left and right weights are essential; a middle weight may be redundant unless you're capturing newer books with less flexible spines. Some sort of gripping surface on the outer weights is essential, since you don't want to be fiddling with weight placement to keep the pages in place (which would increase the time per page). Silicone might work, or, if you set up metal plates hinged along the long axis on an adjustable rack, you could use felt.
  • Set up a dedicated computer for image capture, with an interface for entering the book's metadata and a dedicated button to start a capture. Multitasking an existing computer would increase the chance of failure. You could get away with building a RasPi into the frame for this, but its OCR performance would be limited, so the processing may need to be done on another machine.
  • Set up dedicated storage with encrypted offsite backups. You don't want a fire to destroy all your work.
  • Write programs to automate every possible aspect of processing each page. Python would be the language of choice due to its flexibility and ease of writing. E.g.:
    • A capture program on the dedicated computer, specifically for accurate image capture and storage. It detects out-of-focus images and recaptures them, strips EXIF data, splits images into left and right pages, stores the images in a hierarchical folder structure for future retrieval, and records information about each page in a database.
    • A processing program that monitors the database for new images, runs them through OCR, runs the result through a spelling and grammar checker for consistency, and stores both the raw output and a restructured output (i.e. with non-paragraph line breaks, headers, footers, and page numbers removed) as text files within the folder structure. It also updates the database to mark the page as OCR'ed and adds any spelling/grammar inconsistencies to a separate table that references the book and page number.
  • Create some sort of interface for verifying every page. Have each page image and its OCR output available for side-by-side viewing, and keep a task queue of all spelling/grammar errors so each one can be opened directly for correction.
  • (Optional) Make the above interface available to the public, with appropriate protections. Have all user submissions added to a new table for manual verification by a trusted user. This will be costly, either because you maintain a self-hosted server (risk of DDoS, doxing, and hackers gaining a foothold) or because you run it in the cloud (massive server and storage costs).
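The focus check and page split in the capture program could look roughly like this. It's a pure-Python sketch of the common variance-of-the-Laplacian sharpness measure on a grayscale image stored as nested lists; on real captures you'd use a library such as OpenCV (`cv2.Laplacian(gray, cv2.CV_64F).var()`) instead. The 100.0 threshold and the assumption that the spine sits exactly in the centre of the frame are placeholders to calibrate against your own rig.

```python
from statistics import pvariance

# A grayscale image as rows of 0-255 intensities (stand-in for a real camera frame).
Image = list[list[int]]

def laplacian_variance(img: Image) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels; sharp edges give high values."""
    h, w = len(img), len(img[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)

def is_in_focus(img: Image, threshold: float = 100.0) -> bool:
    # Assumed starting threshold; a blurry frame would trigger a recapture.
    return laplacian_variance(img) >= threshold

def split_spread(img: Image) -> tuple[Image, Image]:
    """Split a two-page spread down the middle (assumes the spine is centred in the frame)."""
    mid = len(img[0]) // 2
    return [row[:mid] for row in img], [row[mid:] for row in img]
```

A blurry photo has soft edges, so the Laplacian responses stay close together and their variance is low; a crisp page of text produces large swings at every letter edge.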
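For the hierarchical folder structure, one option is to derive every image path from the book's metadata, so files can be found even without the database. A small sketch with `pathlib`; the `root/title/edition/page_NNNN_L.png` layout and the helper names are assumptions for illustration, not part of the list.

```python
import re
from pathlib import Path

def slugify(text: str) -> str:
    """Lower-case and collapse every run of non-alphanumeric characters into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def page_image_path(root: Path, title: str, edition: str, page: int, side: str) -> Path:
    """Build root/<title>/<edition>/page_<NNNN>_<L|R>.png for one half of a spread."""
    if side not in ("L", "R"):
        raise ValueError("side must be 'L' or 'R'")
    # Zero-padding keeps pages in order in a plain directory listing.
    return root / slugify(title) / slugify(edition) / f"page_{page:04d}_{side}.png"
```

Keeping the edition as its own folder level makes it easy to line up the same page across editions when diffing.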
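For the database side, a minimal SQLite schema (Python's built-in `sqlite3`) could track books, pages, OCR status, and the spelling/grammar issues table described in the list. The table and column names are made up for illustration, and the OCR call itself is left out: the processing program would poll `pending_pages`, OCR each image, then call `mark_ocr_done`, while `open_issues` feeds the verification task queue.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS books (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    edition TEXT
);
CREATE TABLE IF NOT EXISTS pages (
    id          INTEGER PRIMARY KEY,
    book_id     INTEGER NOT NULL REFERENCES books(id),
    page_number INTEGER NOT NULL,
    image_path  TEXT NOT NULL,
    ocr_done    INTEGER NOT NULL DEFAULT 0,
    UNIQUE (book_id, page_number)
);
CREATE TABLE IF NOT EXISTS issues (
    id       INTEGER PRIMARY KEY,
    page_id  INTEGER NOT NULL REFERENCES pages(id),
    kind     TEXT NOT NULL,  -- e.g. 'spelling' or 'grammar'
    detail   TEXT NOT NULL,
    resolved INTEGER NOT NULL DEFAULT 0
);
"""

def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn

def pending_pages(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    """Pages captured but not yet OCR'd, oldest first."""
    return conn.execute(
        "SELECT id, image_path FROM pages WHERE ocr_done = 0 ORDER BY id"
    ).fetchall()

def mark_ocr_done(conn: sqlite3.Connection, page_id: int, issues: list[tuple[str, str]]) -> None:
    """Flag a page as OCR'd and record its (kind, detail) issues in one transaction."""
    with conn:
        conn.execute("UPDATE pages SET ocr_done = 1 WHERE id = ?", (page_id,))
        conn.executemany(
            "INSERT INTO issues (page_id, kind, detail) VALUES (?, ?, ?)",
            [(page_id, kind, detail) for kind, detail in issues],
        )

def open_issues(conn: sqlite3.Connection) -> list[tuple[str, int, str, str]]:
    """Unresolved issues with their book title and page number."""
    return conn.execute(
        """SELECT b.title, p.page_number, i.kind, i.detail
           FROM issues i
           JOIN pages p ON p.id = i.page_id
           JOIN books b ON b.id = p.book_id
           WHERE i.resolved = 0
           ORDER BY b.title, p.page_number, i.id"""
    ).fetchall()
```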
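The "restructured output" step might be sketched like this, assuming the OCR engine emits plain text with blank lines between paragraphs and that you supply the book's running headers/footers yourself. It drops lines that are only a page number or a known running head, rejoins words hyphenated at line ends, and joins the remaining lines of each paragraph with spaces. `restructure_page` is a hypothetical name.

```python
import re

# A line that contains nothing but a page number, e.g. "  42 ".
PAGE_NUMBER = re.compile(r"^\s*\d+\s*$")

def restructure_page(raw: str, running_heads: set[str]) -> str:
    """Turn raw OCR text for one page into clean paragraphs separated by blank lines."""
    lines = [line.strip() for line in raw.splitlines()]
    # Drop bare page numbers and known running headers/footers.
    lines = [ln for ln in lines if not PAGE_NUMBER.match(ln) and ln not in running_heads]

    paragraphs: list[str] = []
    current: list[str] = []
    for ln in lines:
        if not ln:  # a blank line ends the current paragraph
            if current:
                paragraphs.append(" ".join(current))
                current = []
        elif current and current[-1].endswith("-"):
            # Rejoin a word hyphenated across the line break: "exam-" + "ple" -> "example".
            current[-1] = current[-1][:-1] + ln
        else:
            current.append(ln)
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

This is deliberately naive: a line that legitimately ends in a hyphen (say, "well-" from "well-known") gets glued to the next line, and a header that OCR misreads won't match the set, so those cases are worth surfacing in the verification queue.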
[–] 1 pt

This has been going on for at least 10 years.

[–] 1 pt

!

[–] 1 pt

Good idea. We goys need to do that too. Only jews are taking great interest at owning/controlling informations, especially the historical ones.

[–] 1 pt

I have eyesight issues, so ebooks have been my go-to forever. I have been stripping the DRM from every Kindle book I buy for years and years. I've found that they do just delete them, but I strip them as I buy them. I can get around this latest round of crap, but I will never buy another if it is completely blocked. I have spent thousands on books, and I will simply pirate the rest or just reread the old ones. I read over 150 books a year according to their app. If they stop me from keeping a legit copy, then they won't care that they lose me as a customer. I know they won't. I will find some other way to keep reading.

[–] 0 pt

Kindle sucks. Every update makes it slower and gives it a worse UI, though the hardware is decent.

I got fed up and got a Kobo. I never update it and get all my books from LibGen or Anna's Archive. I haven't bought an ebook in years unless it's like $0.99, and that's only if I really like the author. There's no way in hell I'm spending more on an ebook than on a physical copy.

[–] 0 pt

Agreed, I read so much my friends think I am crazy. Maybe I am?

I will not be held back by this bullshit though. I don't have enough storage for the "hidden library" but when I do.....