We show that our model can be fine-tuned for various downstream image-modality tasks, including printed and handwritten text-line images. Our fine-tuned model achieves character error rates (CER) of 0.73%, 1.32%, and 3.14% on scanned, camera-captured, and handwritten text recognition, respectively. The accuracy of our OCR system, measured as the ratio of correct predictions to the total number of predictions, is 94.10%, 93.82%, and 86.74%, respectively. The model can also be fine-tuned on a smaller dataset while preserving the CER. We believe that our model and evaluation mark the next milestone in the development of Thai-English OCR.