Overview
- Aim for perfect text and formatting (markdown instructions here).
- Quality over quantity. Don't hurry; stay within the agreed daily work limit. Doing more work per day sometimes means more errors.
Workflow
Our proofreaders follow various workflows to achieve perfect text, depending on what suits them.
Standard workflow (साधारण-परीक्षण-रीतिः)
Machine correction (यान्त्रिक-शोधनम्): First, use an AI LLM to correct the text in chunks of 10-20 lines (instructions here).
- Note that, depending on the prompt, this may yield Sanskrit text with many hyphens and spaces (e.g. दीर्घ-सिद्धिस् साध्ये सताम् अस्त्व् इत्य् अवदत्). Let them remain, as long as they follow the pada-separation rules.
Then proceed to manual proofreading (ततो मानुष-शोधनम्॥, "then, human correction"). In each of the two steps below, do the following:
- Manually detect and correct errors, provided both the original image and your language skills support the correction.
- Mark doubtful places (e.g. where your language intuition and the original image conflict) and ask for clarification as described here.
- While doing the above, it is important to try to understand the text to the extent possible.
- Beware of some common errors.
- Note: there should be no hyphens between padas (inflected words), and no spaces within padas.
The steps:
Source-led correction (मूल-प्रधान-शोधनम्): First, go through a page or block of the text by repeatedly reading the next few words in the image and then fixing or marking the same words in the typed text.
Typed-text-led correction (टङ्कित-पाठ-प्रधान-शोधनम्): Then read the text corrected in the previous step on its own (preferably aloud, and preferably in a different script as described here), consulting the original image only where you suspect an error.
Follow these steps in sequence. Never blindly trust the AI output, even when it looks near-perfect.
If, in our experience, omitting one of the steps (usually the last) does not lead to many errors or a loss of speed, you will be allowed to omit it; be sure to ask first.
Fully manual workflow
Within Sanskrit text, insert spaces and hyphens to the maximum possible extent, subject to the pada-separation rules. So you get text like सिद्धिस् साध्ये सताम् अस्त्व् इत्य् अवदत् instead of सिद्धिस्साध्ये सतामस्त्वित्यवदत्.
Contribution levels
Our expectation: reading the corrected text should be at least as good an experience as reading the original PDF, if not better. Even short of that, leaving the text in a significantly better state than before is valuable.
- Top level: Perfect text and formatting.
- Next level: Perfect text, with basic formatting. Most of the time, the reader won't feel a particular urge to consult the source.
- Next level: Almost perfect text (possibly missing diacritics and accents), with basic formatting (contiguous paragraphs, footnotes etc.).
- And so on.
We generally expect top-level contributions from paid proofreaders.
If the print is good, OCR often makes very few mistakes on a page (fewer than 5). Adding structure probably takes more human hours than proofreading, so we should take structure seriously.
Typing correct symbols
- Please use the correct symbols. Common mistakes: | (pipe) instead of । (daNDa), : (colon) instead of visarga (ः), and ० (zero, शून्यम्) instead of ॰ (devanAgarI abbreviation sign).
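If you want a quick mechanical check before submitting, the mistakes above can be flagged with a few regular expressions. This is a minimal sketch of a hypothetical helper (`find_suspect_symbols` is our illustration, not part of any existing tool). It only reports suspects for you to confirm against the image, since a colon or a zero can be legitimate, e.g. in a numeral such as १०.

```python
import re

# Hypothetical helper: it only *reports* suspects; confirm each one
# against the page image before changing anything.

# Devanagari independent vowels and consonants (U+0905-U+0939) plus
# dependent vowel signs (U+093E-U+094C).
_LETTER = "[\u0905-\u0939\u093e-\u094c]"

_RULES = [
    # An ASCII pipe is almost always a mistyped daNDa.
    (re.compile(r"\|"), "। (daNDa)"),
    # An ASCII colon directly after a letter is probably a visarga.
    (re.compile(f"(?<={_LETTER}):"), "ः (visarga)"),
    # A Devanagari zero directly after a letter (not after a digit)
    # is probably the abbreviation sign.
    (re.compile(f"(?<={_LETTER})\u0966"), "॰ (abbreviation sign)"),
]


def find_suspect_symbols(text: str) -> list[tuple[int, str, str]]:
    """Return (offset, found symbol, suggested symbol) tuples, sorted by offset."""
    hits = []
    for pattern, suggestion in _RULES:
        for match in pattern.finditer(text):
            hits.append((match.start(), match.group(), suggestion))
    return sorted(hits)


if __name__ == "__main__":
    for offset, found, suggestion in find_suspect_symbols("राम: अ० १०|"):
        print(offset, repr(found), "->", suggestion)
```

On the sample string, it flags the colon at offset 3, the zero after अ at offset 6 and the pipe at offset 10, but not the zero inside the numeral १०.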
Special characters
If you cannot type unusual Unicode characters, copy and paste them from here.
IAST diacritics
- ā Ā ī Ī ū Ū ṛ Ṛ ṝ Ṝ ḷ Ḷ ḹ Ḹ
- ṃ Ṃ ḥ Ḥ
- ṅ Ṅ ñ Ñ
- ṭ Ṭ ḍ Ḍ
- ś Ś ṣ Ṣ
ISO 15919
- ē ō r̥ r̥̄ l̥ l̥̄ ṁ
Vedic Svaras
- ॒ ॑
- There is no harm in using ISO instead of IAST; we can fix it later.
- There is no harm in ignoring initial-letter capitalization (i.e. ṣ instead of Ṣ and so on).
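Since ISO input can be fixed later, here is a rough sketch of what that later fix could look like for the ISO letters listed here. It is our own illustration (`iso_to_iast` is a hypothetical name) and assumes Sanskrit text (where e and o are always long, so ē and ō become plain e and o), lowercase letters only, and that ṁ always stands for anusvara.

```python
import unicodedata

# ISO 15919 -> IAST for the lowercase letters listed above.
# Keys are in NFD (base letter + combining marks); longer keys come first
# so that r̥̄ is not half-converted by the r̥ rule.
_ISO_TO_IAST = [
    ("r\u0325\u0304", "\u1e5d"),  # r̥̄ -> ṝ
    ("l\u0325\u0304", "\u1e39"),  # l̥̄ -> ḹ
    ("r\u0325", "\u1e5b"),        # r̥ -> ṛ
    ("l\u0325", "\u1e37"),        # l̥ -> ḷ
    ("m\u0307", "\u1e43"),        # ṁ -> ṃ (assumed to be anusvara)
    ("e\u0304", "e"),             # ē -> e (always long in Sanskrit)
    ("o\u0304", "o"),             # ō -> o
]


def iso_to_iast(text: str) -> str:
    # Decompose first, so precomposed and decomposed input hit the same keys.
    text = unicodedata.normalize("NFD", text)
    for iso, iast in _ISO_TO_IAST:
        text = text.replace(iso, iast)
    # Recompose the remaining letters, e.g. s + dot below -> ṣ.
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    print(iso_to_iast("kr\u0325\u1e63\u1e47a dēvaṁ"))
```

Text that is already in IAST passes through unchanged, so the function can be run over mixed input.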
Telugu
Certain defects are common in old Sanskrit texts printed in Telugu script. Please use your knowledge of Sanskrit to detect and correct them. If in doubt, ask, and attach a screenshot.
Remove spurious spaces. For example, type ఆర్యమిశ్రాః instead of ఆర్యమి శ్రాః, and బహూనామాస్తికానామహాత్మనాం instead of బహూనా మా స్తికానామహాత్మనాం.
Old Telugu and kannaDa texts use a symbol resembling ౯ (with extra curls) for n with virama (న్). So అథచాస్యాద ర్శా౯ should be typed as అథచాస్యాదర్శాన్.
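Because this glyph is so regular, it could be pre-fixed mechanically. This is a hypothetical sketch (`fix_final_n` is our illustration) that assumes the OCR output contains the Telugu digit nine (౯, U+0C6F) for the curled n. It replaces ౯ with న్ only when it directly follows a Telugu letter, sign or vowel sign, so genuine numerals are left alone. It does not remove spurious spaces; those still need manual, Sanskrit-aware correction.

```python
import re

# Assumption: OCR renders the curled "n" glyph as Telugu digit nine (U+0C6F).
# Only a ౯ directly after a Telugu sign, letter or vowel sign
# (U+0C01-U+0C4C) is treated as n + virama; numerals such as ౧౯ are skipped.
_FAKE_N = re.compile("(?<=[\u0c01-\u0c4c])\u0c6f")
_NA_VIRAMA = "\u0c28\u0c4d"  # న్


def fix_final_n(text: str) -> str:
    """Replace a ౯ written after a Telugu letter or vowel sign with న్."""
    return _FAKE_N.sub(_NA_VIRAMA, text)


if __name__ == "__main__":
    print(fix_final_n("అథచాస్యాద ర్శా౯"))  # the space is still there
```

On the example above, the ౯ becomes న్, but the stray space before ర్శా still has to be removed by hand.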
Sometimes a symbol resembling వేృ is used instead of వో; recognize it and type it as వో.
Telugu-script Sanskrit books often print dh (ध्) where th (थ्) is meant (and, rarely, vice versa). For example, type గ్రంథోయం where the book has గ్రంధోయం.
Proofreaders often confuse the following, so beware:
- na, sa (న, స)
- n-maatraa, m-maatraa
- v-maatraa, p-maatraa
- ड, द, ध (డ, ద, ధ): both the letters and their maatraas
ఙ and ఞ, though not used in ordinary Telugu, do occur in these texts, so beware of misreading them too.
In tamiL or maNipravALa texts printed in Telugu script, printing ऴ् and ऱ् (Telugu ఴ and ఱ) would have been complicated.