fort_burp@feddit.nl · 5 months ago

You seem like the kind of people that read. How do you convert to epub?

stupid_asshole69 [none/use name]@hexbear.net · 5 months ago

First you have to accept epub into your heart

fort_burp@feddit.nl · 5 months ago

OK I swallowed the Kobo, what next?

miz [any, any]@hexbear.net · 5 months ago

rookie mistake. you have to liquefy the Kobo and inject it like the documentary Pulp Fiction

starkillerfish [she/her]@hexbear.net · 5 months ago

pdf is a printing format. epub is essentially a type of html, so you want to turn a book into a webpage. it is practically impossible unless you do it manually or the pdf is basically blank and single column without footnotes.

TLDR: pdf and epub are very different formats. you cannot easily convert pdf to epub (but epub to pdf is much easier).

fort_burp@feddit.nl · 5 months ago

it is practically impossible unless you do it manually or the pdf is basically blank and single column without footnotes.

Yea, seems like it :/ thanks

thefunkycomitatus [comrade/them, they/them]@hexbear.net · 5 months ago

I would just find the books in .epub to begin with. PDF is an evil format. Just in case anyone needs book sources:

https://fmhy.net/reading

fort_burp@feddit.nl · 4 months ago

woah, cool site!

Edie [it/its, she/her]@hexbear.net · edit-2 5 months ago

PDFs are basically text with styling and positioning. The footnotes are usually just plain text with no link to where they’re referenced, no different from the rest of the text. In EPUBs, by contrast, footnotes are usually connected to their reference through anchors (bonus if they have epub:type), and the footnote text is usually kept away from the rest of the chapter text. AFAIK there is no good way of automatically converting from PDF to EPUB. So to answer the question in the title: manually.
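
To make the EPUB side concrete, here is a minimal Python sketch of a footnote reference linked to its note through anchors, using the EPUB 3 `epub:type` values `noteref` and `footnote`. The `footnote_markup` helper and the `fn{n}`/`fnref{n}` id scheme are hypothetical, and the XHTML file that contains these snippets is assumed to declare `xmlns:epub="http://www.idpf.org/2007/ops"`.

```python
from html import escape


def footnote_markup(n: int, note_text: str) -> tuple[str, str]:
    """Return (reference, note) XHTML snippets for footnote number n.

    Hypothetical id scheme: the reference gets id "fnref{n}" and the note
    gets id "fn{n}", so each one can link to the other.
    """
    # In-text marker: epub:type="noteref" tells reading systems this is a
    # footnote reference (some reading systems show the note as a pop-up).
    ref = f'<a epub:type="noteref" href="#fn{n}" id="fnref{n}"><sup>{n}</sup></a>'
    # The note itself, kept out of the running text in an <aside>,
    # with a link back to the reference.
    note = (
        f'<aside epub:type="footnote" id="fn{n}">'
        f'<p><a href="#fnref{n}">{n}.</a> {escape(note_text)}</p>'
        "</aside>"
    )
    return ref, note


ref, note = footnote_markup(1, "Text of the first footnote.")
print(ref)
print(note)
```

In a PDF, none of this linkage exists: the "1" in the text and the "1." at the bottom of the page are just two unrelated pieces of positioned text.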

ⓘ This user is suspected of being a cat. Please report any suspicious behavior.

fort_burp@feddit.nl · 5 months ago

Bah, thanks. It’s so annoying bc highlighting the page and doing copy paste also mixes the text of the two columns.

dead [he/him]@hexbear.net · 5 months ago

Epub is a zip file with html files inside of it. You can rename epub to zip and extract it with any archive tool.
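
A minimal sketch using only the Python standard library: because an EPUB is a ZIP archive, `zipfile` can open it directly, without renaming it first. The helper names `list_epub` and `read_member` are made up for illustration.

```python
import zipfile


def list_epub(path):
    """Return the names of the files inside an EPUB (a ZIP archive).

    path can be a file name or an open binary file object.
    """
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


def read_member(path, name):
    """Read one file out of the EPUB, e.g. an XHTML chapter, as text."""
    with zipfile.ZipFile(path) as zf:
        return zf.read(name).decode("utf-8")


# Usage (hypothetical file name):
# for name in list_epub("book.epub"):
#     print(name)
```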

PDF is a fixed-layout document format: it records where text and images sit on each page, not how the text is structured.

Book PDFs usually contain actual text, or sometimes just pictures of text if the book was scanned. Images of text can be converted into text using OCR software.

If you have some basic programming knowledge, you could write a script to convert your specific book into the EPUB structure you want.

You could see if the book is already available in epub form on LibGen.

https://en.wikipedia.org/wiki/Library_Genesis

techpeakedin1991@lemmy.ml · 5 months ago

Like others have said, you probably have to do it manually. If the pdf has a lot of pages though, and they’re all in a similar format, it might be easier to script it using something like https://github.com/jsvine/pdfplumber
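
A rough sketch of that idea for the two-column problem, assuming the PDF has a real text layer (not scanned images), equal-width columns, and nothing that spans both columns such as full-width footnotes: crop each page into vertical strips with pdfplumber's `Page.crop` and extract the text of each strip in turn. The helper names are hypothetical.

```python
def column_bboxes(bbox, n_cols=2):
    """Split a page bounding box (x0, top, x1, bottom) into n_cols
    equal-width vertical strips, ordered left to right."""
    x0, top, x1, bottom = bbox
    # Compute the edges, then pin the last one to x1 exactly so float
    # rounding never pushes a strip outside the page (crop() rejects that).
    edges = [x0 + (x1 - x0) * i / n_cols for i in range(n_cols)] + [x1]
    return [(left, top, right, bottom) for left, right in zip(edges, edges[1:])]


def extract_columns(pdf_path, n_cols=2):
    """Yield each page's text, reading one column at a time instead of
    straight across the page."""
    # Third-party (pip install pdfplumber); imported here so that
    # column_bboxes() can be used without it.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            columns = [
                page.crop(box).extract_text() or ""
                for box in column_bboxes(page.bbox, n_cols)
            ]
            yield "\n".join(columns)


# Usage (hypothetical file name):
# for text in extract_columns("book.pdf"):
#     print(text)
```

Splitting at the exact midpoint is the simplest choice; a real book may need a manually measured gutter, or separate handling for a footnote area at the bottom of each page.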

oscardejarjayes [comrade/them]@hexbear.net · edit-2 5 months ago

ePUB is basically zipped HTML, so while it’s easy to convert from, it’s hard to convert to. You might want to try finding your book in another format from somewhere like Anna’s Archive instead. I think azw3 and mobi files can be converted to ePUB more easily.
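
If you do find an azw3 or mobi copy, Calibre’s command-line tool `ebook-convert` takes an input file and an output file and guesses the output format from the output file’s extension. A minimal Python wrapper, assuming Calibre is installed with `ebook-convert` on your PATH (the function names are hypothetical):

```python
import shutil
import subprocess
from pathlib import Path


def convert_cmd(src: str, dest_suffix: str = ".epub") -> list[str]:
    """Build an ebook-convert command: input file, then output file.
    Calibre picks the output format from the output file's extension."""
    out = str(Path(src).with_suffix(dest_suffix))
    return ["ebook-convert", src, out]


def convert(src: str) -> str:
    """Run the conversion and return the output path."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    cmd = convert_cmd(src)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd[-1]


# Usage (hypothetical file name):
# convert("book.azw3")  # writes book.epub next to the input
```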

Really the only good way is to manually recreate the book; there’s no good automatic PDF to EPUB converter. You might be able to hire someone on Fiverr or similar to do it for you, which is the closest to automatic I can think of.

bobs_guns@lemmygrad.ml · 5 months ago

Use KOReader in two-column mode if you can. It’s kinda funky, but it will let you read the text at a more appropriate size if that’s your issue.

fort_burp@feddit.nl · 4 months ago

lol yea, size is the issue and it’s just so awkward to read

Beaver [he/him]@hexbear.net · 5 months ago

I haven’t tried this tool, but it claims to be able to re-flow PDF text: https://www.willus.com/k2pdfopt/

Edie [it/its, she/her]@hexbear.net · edit-2 5 months ago

The PDF Conversion Tips page is interesting.

fort_burp@feddit.nl · 4 months ago

From that link:

I’ve been on mobileread.com since 2011, regularly reading the PDF forum, and probably the most common question from new members regarding PDFs is about the best way to view them on e-readers such as the Kindle, Kobo, Nook, etc.

How can you be so helpful, Edie? Thanks!

fort_burp@feddit.nl · 5 months ago

Cool, thank you. I’ll give it a try.

Edamamebean [she/her]@hexbear.net · edit-2 5 months ago

Instead of doing any converting you could probably find the epub on Anna’s Archive. I’ve never had any problems finding books on there, even pretty obscure stuff. They also seem to have everything in both epub and pdf. Good luck friend!

https://annas-archive.org/

fort_burp@feddit.nl · 4 months ago

Good advice, thanks! Actually I got the PDF from Anna, there was no epub available :/

ClathrateG [none/use name]@hexbear.net · 5 months ago

https://cloudconvert.com/pdf-to-epub

It was the first Google result for ‘pdf to epub online converter’. I just tried it myself on a random PDF and the converted EPUB opened fine in Calibre.

fort_burp@feddit.nl · 5 months ago

Thanks I will give it a try when I get back. Did the text come out ok for you, like were all the words in the same order?

ClathrateG [none/use name]@hexbear.net · edit-2 5 months ago

From a glance at the first paragraph, yes. Even the font was the same.

Is that an issue you’ve encountered with other converters?

Edie [it/its, she/her]@hexbear.net · edit-2 5 months ago

I tried https://redstarpublishers.org/adoratsky.pdf in the converter you shared. It’s good compared to all the PDF conversions I’ve seen, and if I had to read it without making any changes, it’ll certainly do. But it could use some manual intervention: there are random line breaks, blockquotes are not blockquotes, and footnotes are just… in the text. That’s at least what I see at a glance.

Edit: Wait, hang on, cloudconvert is just using Calibre! It’s the exact same output. Every CSS class is calibre[number], and the OPF contains Calibre metadata: <dc:contributor opf:role="bkp">calibre (8.4.0) [https://calibre-ebook.com/]</dc:contributor>
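
The same check can be scripted. A minimal sketch using only the Python standard library: read `META-INF/container.xml` to find the OPF file, then list its `dc:contributor` entries. Treating any contributor containing "calibre" as a sign of a Calibre conversion is an assumption based on the metadata quoted in the comment; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def opf_path(zf: zipfile.ZipFile) -> str:
    """META-INF/container.xml points at the package (OPF) file."""
    root = ET.fromstring(zf.read("META-INF/container.xml"))
    return root.find("c:rootfiles/c:rootfile", NS).attrib["full-path"]


def contributors(epub) -> list[str]:
    """Return the text of every dc:contributor in the OPF metadata."""
    with zipfile.ZipFile(epub) as zf:
        opf = ET.fromstring(zf.read(opf_path(zf)))
    return [el.text or "" for el in opf.iterfind("opf:metadata/dc:contributor", NS)]


def made_by_calibre(epub) -> bool:
    """Heuristic: does any contributor entry mention calibre?"""
    return any("calibre" in c.lower() for c in contributors(epub))


# Usage (hypothetical file name):
# print(made_by_calibre("converted.epub"))
```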

fort_burp@feddit.nl · 4 months ago

Yes, the same column fuckery persists :/

Monk3brain3 [any, he/him]@hexbear.net · 5 months ago

Calibre is your friend

Edie [it/its, she/her]@hexbear.net · 5 months ago

Please read the post before commenting /gen

stupid_asshole69 [none/use name]@hexbear.net · 5 months ago

PDFs can be set up in a lot of different ways.

One way is where the text is encoded into the document as actual characters, like a page typed out on one of those typewriters with the white-out ribbon, aligned and sized just right. Text encoded into the PDF this way can be selected, edited and copied just like in any other kind of document.

Another way is where the text is embedded into the document as an image, like a picture of a newspaper article pasted onto a piece of paper. Text in a PDF like this can’t be selected or manipulated, and it’s the kind you’re having problems with.
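
One way to tell which kind you have, as a sketch: count the selectable characters on each page with pdfplumber (`page.chars`). Pages with almost none are probably images of text and need OCR. The 20-character threshold and the helper names are arbitrary assumptions.

```python
def pages_needing_ocr(char_counts, min_chars=20):
    """Return 1-based page numbers whose selectable character count is below
    min_chars. The threshold is an arbitrary guess meant to tolerate a page
    whose text layer only holds a stray page number or running header."""
    return [page for page, n in enumerate(char_counts, start=1) if n < min_chars]


def char_counts(pdf_path):
    """Count the selectable (text-layer) characters on each page."""
    # Third-party (pip install pdfplumber); imported here so that
    # pages_needing_ocr() can be used without it.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        return [len(page.chars) for page in pdf.pages]


# Usage (hypothetical file name):
# print(pages_needing_ocr(char_counts("book.pdf")))
```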

The way to get around that kind of text is optical character recognition. OCR software analyzes images of text and figures out what characters it corresponds to. Just chase down some free ocr package and input your pdf.

fort_burp@feddit.nl · 4 months ago

Cool, thank you very much. I got k2pdf (courtesy of another dope-ass bear) to get the two columns + footnotes in the original pdf into a pdf that is just one column with footnotes clearly distinguishable. Now I need just what you’re saying because the result of the k2pdf conversion is an image that I can’t select text from (but the words are all in the right order, which is good).

Tesseract seems like a popular choice, I’ll give that a try.

Edie [it/its, she/her]@hexbear.net · edit-2 4 months ago

Tesseract doesn’t support PDF input, so you’ll need some other program like ocrmypdf (which I have used; it uses Tesseract), or you can extract each page to its own image (which I have also done, but I forget how right now).
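
Both routes can be scripted. A minimal Python sketch that only builds and runs the commands, assuming `ocrmypdf`, Poppler’s `pdftoppm` and `tesseract` are installed and on PATH; the file names and helper functions are hypothetical. OCRmyPDF’s `--sidecar` option also writes the recognized text to a plain text file, which could save the copy-paste step.

```python
import subprocess
from pathlib import Path


def ocrmypdf_cmd(src, dest, text_out, lang="eng"):
    """Route 1: OCRmyPDF (which runs Tesseract) writes dest with a text
    layer added, and --sidecar also writes the recognized text to text_out."""
    return ["ocrmypdf", "-l", lang, "--sidecar", text_out, src, dest]


def pdftoppm_cmd(src, prefix, dpi=300):
    """Route 2, step 1: render every page to a PNG with Poppler's pdftoppm.
    Output files are named from prefix plus a page number."""
    return ["pdftoppm", "-r", str(dpi), "-png", src, prefix]


def tesseract_cmd(image, lang="eng"):
    """Route 2, step 2: OCR one page image; Tesseract writes <base>.txt."""
    base = str(Path(image).with_suffix(""))
    return ["tesseract", image, base, "-l", lang]


def run(cmd):
    """Run a command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


# Usage (hypothetical file names):
# run(ocrmypdf_cmd("reflowed.pdf", "reflowed_ocr.pdf", "reflowed.txt"))
# or:
# run(pdftoppm_cmd("reflowed.pdf", "page"))
# for png in sorted(Path(".").glob("page-*.png")):
#     run(tesseract_cmd(str(png)))
```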

fort_burp@feddit.nl · 4 months ago

Thanks again! You’re the best :)

This looks like exactly what I need. After getting the formatting right with k2pdf, I can use ocrmypdf to get it back into text form, then just Ctrl+A, copy it into Writer and export as EPUB, since the PDF is like 15x the size of the EPUB.
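
As an alternative to the Writer step, a minimal Python sketch that packages plain-text paragraphs (for example the OCR text, split on blank lines) into a single-chapter EPUB 3 using only the standard library. It assumes one chapter, no images and no footnote linking; `build_epub` and the file names inside the archive are hypothetical choices.

```python
import html
import uuid
import zipfile
from datetime import datetime, timezone

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def xhtml(title, body):
    """Wrap body markup in a minimal XHTML document."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{html.escape(title)}</title></head>
<body>{body}</body>
</html>"""


def build_epub(path, title, paragraphs, lang="en"):
    """Write a single-chapter EPUB 3 file from plain-text paragraphs."""
    t = html.escape(title)
    chapter = xhtml(title, f"<h1>{t}</h1>" + "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs))
    # EPUB 3 requires a navigation document with a toc nav.
    nav = xhtml(title, f'<nav epub:type="toc"><ol><li><a href="chapter.xhtml">{t}</a></li></ol></nav>')
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:{uuid.uuid4()}</dc:identifier>
    <dc:title>{t}</dc:title>
    <dc:language>{lang}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine><itemref idref="ch1"/></spine>
</package>"""
    with zipfile.ZipFile(path, "w") as zf:
        # The mimetype entry must come first and be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        for name, data in [
            ("META-INF/container.xml", CONTAINER),
            ("OEBPS/content.opf", opf),
            ("OEBPS/nav.xhtml", nav),
            ("OEBPS/chapter.xhtml", chapter),
        ]:
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

For example, read the OCR text file, split it on blank lines, drop empty pieces, and pass the list to `build_epub("book.epub", "Book title", paragraphs)`. Footnotes will still end up as ordinary paragraphs unless you link them by hand.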