pdf to txt
How to convert scanned PDF book pages (images) to text with OCR. Results are very poor and will need lots of corrections.
Search for 'tesseract english' (or whatever language) to find the language data package.
Arch: tesseract-data-eng and poppler (provides pdftoppm; the package is called poppler-utils on Debian/Ubuntu)
pdftoppm -png file.pdf test
for x in *.png; do tesseract -l eng "$x" - >> out.txt; done
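The two commands above could be wrapped into one reusable function. This is only a sketch: pdf_to_txt is a made-up helper name, the temporary directory and the "page" prefix are arbitrary choices, and -r 300 is an assumption (pdftoppm renders at 150 DPI by default, and a higher resolution may help the OCR a little).

```shell
#!/bin/sh
# Sketch: OCR every page of a PDF into a single text file.
# pdf_to_txt is a hypothetical helper, not part of tesseract or poppler.
pdf_to_txt() {
    pdf=$1            # input PDF
    out=$2            # output text file
    lang=${3:-eng}    # tesseract language code, defaults to English

    # Scratch directory for the page images, removed at the end.
    tmp=$(mktemp -d) || return 1

    # One PNG per page (page-1.png, page-2.png, ...), rendered at 300 DPI.
    # The 300 DPI value is an assumption, not something from the note above.
    pdftoppm -r 300 -png "$pdf" "$tmp/page" || { rm -rf "$tmp"; return 1; }

    : > "$out"        # start with an empty output file
    for img in "$tmp"/page-*.png; do
        [ -e "$img" ] || continue   # skip if no pages were produced
        # "-" as the output base makes tesseract write to stdout
        tesseract -l "$lang" "$img" - >> "$out"
    done

    rm -rf "$tmp"
}
```

Example call: pdf_to_txt file.pdf out.txt (add a third argument such as deu for another language, provided its data package is installed).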