PDF Notes
References
- https://medium.com/@kaerumy/cleaning-up-scanned-documents-with-open-source-tools-9d87e15305b
- https://github.com/scantailor/scantailor/wiki/Split-Pages
- http://www.tobias-elze.de/pdfsandwich/
- https://www.howtogeek.com/197195/how-to-remove-a-password-from-a-pdf-file-in-linux/
- https://stackoverflow.com/questions/36270555/open-a-pdf-with-blank-password-with-pdftk
- https://www.howtogeek.com/228796/how-to-extract-and-save-images-from-a-pdf-file-in-linux/
- https://ocrmypdf.readthedocs.io/en/latest/cookbook.html
- https://www.onetransistor.eu/2015/12/ocr-searchable-pdf-linux.html
Bash script for converting magazine PDFs into combined multi-page TIFF files
# Make sure the scratch directory exists before pdfimages writes into it
mkdir -p tmp
for fn in *.pdf ; do
    echo "$fn"
    # Clean up from prior runs
    rm -f tmp/images*tif
    # Split PDF pages into individual TIFF files
    pdfimages -tiff "$fn" ./tmp/images
    # Combine into a single multi-page TIFF file
    tiffcp tmp/images*tif "$(basename "$fn" .pdf).tif"
    #### Put combined TIFF into docker folder for OCRMyPDF and wait for output
    # tiff2pdf -o "../../../OCRMyPDF/Input/$fn" "$(basename "$fn" .pdf).tif"
done
rm -f tmp/images*tif
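The script above hands the OCR step to a Docker folder that OCRmyPDF watches. As a rough sketch of the alternative, the loop below assumes ocrmypdf is installed locally and runs it on each PDF directly. The function names `ocr_target` and `ocr_all`, the `ocr/` output directory and the `-l eng` language choice are illustrative assumptions, not part of the original notes.

```shell
#!/usr/bin/env bash
# Sketch: OCR every PDF in the current directory into ./ocr/ with a local ocrmypdf.
# Assumptions: the ocr/ directory name and the English language pack are placeholders.

# Hypothetical helper: print the output path for a given input PDF.
ocr_target() {
    printf 'ocr/%s\n' "$(basename "$1")"
}

# Hypothetical helper: run ocrmypdf over all PDFs in the current directory.
ocr_all() {
    # Stop early if the tool is not on PATH
    command -v ocrmypdf >/dev/null 2>&1 || { echo "ocrmypdf not found" >&2; return 1; }
    mkdir -p ocr
    local fn
    for fn in *.pdf; do
        # Skip the literal pattern when no PDF matches
        [ -e "$fn" ] || continue
        # --deskew straightens pages; -l selects the Tesseract language
        ocrmypdf --deskew -l eng "$fn" "$(ocr_target "$fn")"
    done
}
```

Calling `ocr_all` from the directory that holds the PDFs would write the searchable copies to `ocr/` and leave the originals untouched.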
Bash command to rename yyyymmdd_hhmmss to yyyy-mm-dd_hh.mm.ss
find . -regextype posix-extended -regex ".*/[0-9]{8}_[0-9]{6}.*" -exec rename -v 's/(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})/$1-$2-$3_$4.$5.$6/' {} \;
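Before renaming anything, it can help to preview what the substitution produces. The sketch below applies the same pattern with `sed -E` to a name passed as an argument. The function name `reformat_timestamp` is a hypothetical helper introduced here for illustration, and it only prints the new name without touching any file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name a file would get after the rename.
# Uses the same capture groups as the rename expression, written in POSIX ERE.
reformat_timestamp() {
    printf '%s\n' "$1" |
        sed -E 's/([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})([0-9]{2})/\1-\2-\3_\4.\5.\6/'
}

reformat_timestamp "20190314_152907.jpg"   # prints 2019-03-14_15.29.07.jpg
```

If the installed `rename` is the Perl variant, its `-n` option should give a similar dry run on the real files.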