PDF Notes

From Pikes' Wiki
Revision as of 14:04, 10 December 2021 by DocGyver


References

Bash Script for converting magazines

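#!/usr/bin/env bash
shopt -s nullglob    # skip the loop entirely if there are no PDFs
mkdir -p tmp
for fn in *.pdf ; do
	echo "$fn"
	base=$(basename "$fn" .pdf)
	# Clean up images left over from prior runs
	rm -f tmp/images*tif
	# Extract the page images (usually one per page for scanned issues) as individual TIFF files
	pdfimages -tiff "$fn" ./tmp/images

	# Combine them into a single multi-page TIFF
	tiffcp tmp/images*tif "$base.tif"

	# Convert the combined TIFF back to PDF in the OCRMyPDF (Docker) input folder,
	# then wait for OCRMyPDF to write its output
	tiff2pdf -o "../../../OCRMyPDF/Input/$fn" "$base.tif"
done
rm -f tmp/images*tif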
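The script drops each file into the OCRMyPDF input folder but leaves the waiting to you. If you want the loop itself to block until the result appears, a small polling helper could look like the sketch below. `wait_for_file` is a hypothetical helper, not part of OCRMyPDF. It assumes OCRMyPDF writes its result under the same file name into an Output folder next to Input (as the scan workflow on this page suggests), and treating a file whose size stops changing as finished is a heuristic, not a guarantee.

```shell
#!/usr/bin/env bash
# wait_for_file PATH [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Hypothetical helper: polls until PATH exists, is non-empty and its size is
# unchanged between two consecutive polls. Returns 0 then, or 1 on timeout.
wait_for_file() {
	local path=$1 timeout=${2:-600} interval=${3:-5}
	local waited=0 last=-1 size
	while :; do
		if [ -s "$path" ]; then
			# Normalise wc output (some wc implementations pad the number with spaces)
			size=$(( $(wc -c < "$path") ))
			# Same size as on the previous poll: assume the writer has finished
			if [ "$size" -eq "$last" ]; then
				return 0
			fi
			last=$size
		fi
		if [ "$waited" -ge "$timeout" ]; then
			echo "timed out waiting for $path" >&2
			return 1
		fi
		sleep "$interval"
		waited=$((waited + interval))
	done
}
```

For example, right after the `tiff2pdf` line: `wait_for_file "../../../OCRMyPDF/Output/$fn" 1800 10 || continue` (the Output path is an assumption).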

Bash command to rename yyyymmdd_hhmmss to yyyy-mm-dd_hh.mm.ss

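find . -regextype posix-extended -regex ".*/[0-9]{8}_[0-9]{6}.*" -exec rename -v 's/(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})/$1-$2-$3_$4.$5.$6/' {} \;

This needs the Perl-based rename (sometimes installed as file-rename, prename or perl-rename); the util-linux rename uses a different syntax.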
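Where the Perl rename is not available, the same substitution can be done in plain bash with `[[ =~ ]]` and `BASH_REMATCH`. This is a swapped-in technique, not the original one-liner: the hypothetical `to_dashed_name` function below only computes the new name for a single file name, rewriting the first timestamp it finds, as `s///` without `/g` does.

```shell
#!/usr/bin/env bash
# to_dashed_name NAME
# Hypothetical helper: prints NAME with its first yyyymmdd_hhmmss timestamp
# rewritten as yyyy-mm-dd_hh.mm.ss, or NAME unchanged if it has none.
to_dashed_name() {
	local name=$1
	local re='([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})([0-9]{2})'
	if [[ $name =~ $re ]]; then
		local m=("${BASH_REMATCH[@]}")
		local new="${m[1]}-${m[2]}-${m[3]}_${m[4]}.${m[5]}.${m[6]}"
		# BASH_REMATCH[0] is the leftmost match; replace only that occurrence
		name=${name/"${m[0]}"/$new}
	fi
	printf '%s\n' "$name"
}
```

To apply it in one directory, something like `for f in *_*; do new=$(to_dashed_name "$f"); [ "$new" = "$f" ] || mv -n -- "$f" "$new"; done` would rename only the files that actually contain a timestamp.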

Bash Script for processing scans

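cd <sourcefolder>
# Combine the scanned pages into one multi-page TIFF, then convert it to PDF
tiffcp <list of tif files> <output.tif>
tiff2pdf -o <output.pdf> <output.tif>
# Hand the PDF to OCRMyPDF via its input folder
cp <output.pdf> //wormhole/Media/OCRMyPDF/Input
# Once OCRMyPDF has written its result, copy it to the destination
cp //wormhole/Media/OCRMyPDF/Output/<output.pdf> <destination folder>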
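The order of `<list of tif files>` becomes the page order of the PDF, and a plain glob such as `*.tif` sorts lexicographically, so `page10.tif` lands before `page2.tif`. A minimal sketch that lists the scans in natural order instead, assuming GNU `sort -V` and file names without newlines (`list_tifs_in_page_order` is a hypothetical name):

```shell
#!/usr/bin/env bash
# list_tifs_in_page_order [DIR]
# Hypothetical helper: prints the .tif files in DIR one per line in natural
# (version) order, so page2.tif comes before page10.tif. Needs GNU sort -V.
list_tifs_in_page_order() {
	local dir=${1:-.}
	local f
	for f in "$dir"/*.tif; do
		[ -e "$f" ] || continue   # no matches: the glob stays literal, skip it
		printf '%s\n' "$f"
	done | sort -V
}
```

Then `mapfile -t pages < <(list_tifs_in_page_order .)` followed by `tiffcp "${pages[@]}" <output.tif>` would combine the scans in page order.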