PDF Notes

From Pikes' Wiki
Revision as of 15:28, 8 August 2020 by DocGyver


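Bash script for converting magazine PDFs

For each PDF in the current directory, this extracts the page images, merges them into one multi-page TIFF, and converts that back to a PDF in the OCRmyPDF Docker input folder for OCR. It needs pdfimages (poppler-utils) and tiffcp/tiff2pdf (libtiff tools).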

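shopt -s nullglob	# skip the loop entirely when there are no PDFs
mkdir -p tmp
for fn in *.pdf ; do
	echo "$fn"
	base=$(basename "$fn" .pdf)
	# Clean up from prior runs
	rm -f tmp/images*tif
	# Extract the embedded page images (normally one per page for a scan) as TIFF files
	pdfimages -tiff "$fn" ./tmp/images

	# Combine them into a single multi-page TIFF
	tiffcp tmp/images*tif "$base.tif"

	# Convert the combined TIFF to a PDF in the OCRmyPDF Docker input folder.
	# OCRmyPDF picks it up from there; this script does not wait for the output.
	tiff2pdf -o "../../../OCRMyPDF/Input/$fn" "$base.tif"
done
rm -f tmp/images*tif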
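The script hands each file to OCRmyPDF but does not wait for the result. A minimal polling sketch follows, assuming the OCR'd file eventually appears under the same name in some output folder; the function name wait_for_file and the folder ../../../OCRMyPDF/Output are hypothetical, not part of the original setup. It only checks that the file exists and is non-empty, so a file that OCRmyPDF is still writing could be picked up too early.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait_for_file PATH [TIMEOUT_SECONDS]
# Polls once per second until PATH exists and is non-empty.
# Returns 0 once it does, 1 if TIMEOUT_SECONDS (default 600) pass first.
wait_for_file() {
	local path=$1 timeout=${2:-600} waited=0
	until [[ -s $path ]]; do
		if (( waited >= timeout )); then
			return 1
		fi
		sleep 1
		waited=$((waited + 1))
	done
	return 0
}

# Example, inside the loop right after tiff2pdf (the Output folder is an assumption):
# wait_for_file "../../../OCRMyPDF/Output/$fn" 600 || echo "Timed out waiting for $fn" >&2
```

Polling keeps the sketch dependency-free; inotifywait (inotify-tools) would avoid the one-second granularity if it is installed.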

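Bash command to rename yyyymmdd_hhmmss to yyyy-mm-dd_hh.mm.ss

This needs the Perl rename (file-rename or prename on Debian and Ubuntu); the util-linux rename uses a different syntax. Add -n to the rename options to preview the changes first. The -regex only matches paths whose last component starts with the timestamp, -execdir passes rename just the file name so digits in parent directory names are never rewritten, and -depth renames a directory's contents before the directory itself.

find . -depth -regextype posix-extended -regex '.*/[0-9]{8}_[0-9]{6}[^/]*' -execdir rename -v 's/(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})/$1-$2-$3_$4.$5.$6/' {} \;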
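Where only the util-linux rename is installed, the same renaming can be sketched in plain Bash using the =~ operator and BASH_REMATCH. This is a minimal sketch, not a drop-in replacement for the find command: new_name and rename_in_dir are hypothetical helper names, it is not recursive, it only touches regular files whose names start with yyyymmdd_hhmmss, and it skips a file rather than overwrite an existing target.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the new name for a name that starts with
# yyyymmdd_hhmmss (keeping whatever follows, e.g. ".jpg"); return 1 otherwise.
new_name() {
	local re='^([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})([0-9]{2})(.*)$'
	[[ $1 =~ $re ]] || return 1
	# Capture groups 1-7: year, month, day, hour, minute, second, remainder
	printf '%s-%s-%s_%s.%s.%s%s\n' "${BASH_REMATCH[@]:1:7}"
}

# Hypothetical helper: rename matching regular files in DIR (default: .).
# Not recursive; skips a file if its target name already exists.
rename_in_dir() {
	local dir=${1:-.} f target
	for f in "$dir"/*; do
		[[ -f $f ]] || continue        # also skips the literal glob when DIR is empty
		target=$(new_name "${f##*/}") || continue
		if [[ -e $dir/$target ]]; then
			echo "skip: $dir/$target already exists" >&2
			continue
		fi
		mv -v -- "$f" "$dir/$target"
	done
}

# Usage: rename_in_dir /path/to/photos
```

For a quick check before renaming anything, new_name 20200808_152800.jpg prints 2020-08-08_15.28.00.jpg.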