PDF Notes
Latest revision as of 14:04, 10 December 2021
References
- https://medium.com/@kaerumy/cleaning-up-scanned-documents-with-open-source-tools-9d87e15305b
- https://github.com/scantailor/scantailor/wiki/Split-Pages
- http://www.tobias-elze.de/pdfsandwich/
- https://www.howtogeek.com/197195/how-to-remove-a-password-from-a-pdf-file-in-linux/
- https://stackoverflow.com/questions/36270555/open-a-pdf-with-blank-password-with-pdftk
- https://www.howtogeek.com/228796/how-to-extract-and-save-images-from-a-pdf-file-in-linux/
- https://ocrmypdf.readthedocs.io/en/latest/cookbook.html
- https://www.onetransistor.eu/2015/12/ocr-searchable-pdf-linux.html
Bash Script for converting Magazine
mkdir -p tmp
for fn in *.pdf ; do
    echo "$fn"
    # Clean up from prior runs
    rm -f tmp/images*tif
    # Split PDF pages into individual tif files
    pdfimages -tiff "$fn" ./tmp/images
    # Combine into a single multi-page tif file
    tiffcp tmp/images*tif "$(basename "$fn" .pdf).tif"
    #### Put combined TIF into docker folder for OCRMyPDF and wait for output
    # tiff2pdf -o "../../../OCRMyPDF/Input/$fn" "$(basename "$fn" .pdf).tif"
done
rm -f tmp/images*tif
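The output name in the loop above comes from `basename "$fn" .pdf`. A minimal sketch of that step on its own, to show why the quoting matters for magazine files with spaces in their names. The helper name `tif_name` and the sample file name are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script):
# strip any directory and the .pdf suffix, then append .tif.
tif_name() {
    printf '%s.tif\n' "$(basename "$1" .pdf)"
}

# The quotes keep a name with spaces as a single argument.
tif_name "Some Magazine 2021-12.pdf"
```

Without the quotes, `basename` would receive the name split into several words and return the wrong result.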
Bash command to rename yyyymmdd_hhmmss to yyyy-mm-dd_hh.mm.ss
find . -regextype posix-extended -regex ".*/[0-9]{8}_[0-9]{6}.*" -exec rename -v 's/(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})/$1-$2-$3_$4.$5.$6/' {} \;
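To check the substitution before touching any files, the same pattern can be tried on a plain string. This is a sketch using `sed -E` in place of the Perl `rename` command; the function name `iso_name` and the sample file name are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name a yyyymmdd_hhmmss file would get.
# Mirrors the rename expression, with [0-9] instead of \d because
# sed -E does not support the Perl \d shorthand.
iso_name() {
    printf '%s\n' "$1" |
        sed -E 's/([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})([0-9]{2})/\1-\2-\3_\4.\5.\6/'
}

iso_name "20211210_140455.jpg"   # prints 2021-12-10_14.04.55.jpg
```

The Perl version of `rename` also accepts `-n` to print what it would do without renaming anything, which is worth running first on a real folder.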
Bash Script for processing scans
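cd <sourcefolder>
# tiffcp writes a multi-page TIFF, so convert it to PDF with tiff2pdf before OCR
tiffcp <list of tif files> <combined.tif>
tiff2pdf -o <output.pdf> <combined.tif>
cp <output.pdf> //wormhole/Media/OCRMyPDF/Input
# wait for OCRMyPDF to write the result, then collect it
cp //wormhole/Media/OCRMyPDF/Output/<output.pdf> <destination folder>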
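The last copy only works once OCRMyPDF has finished writing to the Output folder. A minimal sketch of a polling helper for that gap follows. The name `wait_for_file`, the one-second interval and the 600-second default timeout are assumptions, not part of the original notes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until a file exists, or give up after
# a timeout in seconds (default 600). Returns 0 if found, 1 if not.
wait_for_file() {
    local path=$1 timeout=${2:-600} waited=0
    until [ -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Usage with the placeholders from the steps above:
# wait_for_file "//wormhole/Media/OCRMyPDF/Output/<output.pdf>" &&
#     cp "//wormhole/Media/OCRMyPDF/Output/<output.pdf>" "<destination folder>"
```

This only checks that the file exists, so a large PDF could still be partly written when the copy starts. Comparing the file size across two polls would be a stricter check.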