Home | Search


2026-04-01

Very simple automated redaction script

Disclaimer

What?

About script

Probably don't use this in production

Script

(tested on 2026-03-31 on macOS)

#!/usr/bin/env bash

{
  printf '%s\n\n' 'Only output those entries that whose text contains the name of a person or organisation. Each line of output should contain exactly 12 tab-separated fields.'
  tesseract "$1" - tsv quiet
} |
tee /dev/stderr |
jq -Rs '{model:"gpt-oss-20b",input:.}' |
curl -s -H 'Content-Type: application/json' -H 'Authorization: Bearer no-key' --data-binary @- http://127.0.0.1:8080/v1/responses |
jq -r '.output_text // ([.output[] | select(.type == "message") | .content[] | select(.type == "output_text") | .text] | join(""))' |
tee /dev/stderr |
awk -F '\t' '
BEGIN {
  print "fill black"
  print "stroke none"
}
$1 == 5 && $7 != "" && $8 != "" && $9 != "" && $10 != "" {
  x1 = $7
  y1 = $8
  x2 = $7 + $9 - 1
  y2 = $8 + $10 - 1
  print "rectangle " x1 "," y1 " " x2 "," y2
}
' | magick "$1" -draw @- png:- > "$2"

Future directions

Subscribe

Enter email or phone number to subscribe. You will receive atmost one update per month

Comment

Enter comment