AI Advanced Read
The Advanced Read model is tuned to extract all text from any document presented to it in a readable format. It handles text laid out in multiple columns on a page (e.g. a newspaper layout) and can recognize handwriting. The model returns a single metadata field containing all the text found in the document. Using Scan2x metadata fields, you can then apply regular expressions (regex) and other methods to extract the data you need.
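As a rough illustration of the regex step described above, the Python sketch below pulls a few values out of one block of extracted text. It is not Scan2x configuration: the sample text, the field names, and the patterns are all hypothetical and only show the general idea of matching labelled values inside a single text field.

```python
import re

# Hypothetical sample of the single text field returned for a document.
extracted_text = """ACME Ltd
Invoice No: INV-20231
Date: 14/03/2024
Total due: EUR 1,250.00
"""

# Hypothetical patterns; adjust them to the layout of your own documents.
PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s+No[.:]?\s*([A-Z]+-\d+)", re.IGNORECASE),
    "invoice_date": re.compile(r"Date[.:]?\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE),
    "total": re.compile(r"Total\s+due[.:]?\s*[A-Z]{3}\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(text):
    """Return the first captured value for each pattern, or None if it does not match."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


fields = extract_fields(extracted_text)
print(fields)
```

Each pattern captures only the value that follows its label, so the same approach carries over to any field that appears next to a predictable label in the extracted text. A pattern that finds nothing yields `None` rather than raising an error, which makes missing fields easy to spot.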
Handwriting language support
Language | Language
English | Japanese
Chinese Simplified | Korean
French | Portuguese
German | Spanish
Italian | Russian
Thai | Arabic
Printed language support
Language
Abaza
Abkhazian
Achinese
Acoli
Adangme
Adyghe
Afar
Afrikaans
Akan
Albanian
Algonquin
Angika (Devanagari)
Arabic
Asturian
Asu (Tanzania)
Avaric
Awadhi-Hindi (Devanagari)
Aymara
Azerbaijani (Latin)
Bafia
Bagheli
Bambara
Bashkir
Basque
Belarusian (Cyrillic)
Belarusian (Latin)
Bemba (Zambia)
Bena (Tanzania)
Bhojpuri-Hindi (Devanagari)
Bikol
Bini
Bislama
Bodo (Devanagari)
Bosnian (Latin)
Brajbha
Breton
Bulgarian
Bundeli
Buryat (Cyrillic)
Catalan
Cebuano
Chamling
Chamorro
Chechen
Chhattisgarhi (Devanagari)
Chiga
Chinese Simplified
Chinese Traditional
Choctaw
Chukot
Chuvash
Cornish
Corsican
Cree
Creek
Crimean Tatar (Latin)
Croatian
Crow
Czech
Danish
Dargwa
Dari
Dhimal (Devanagari)
Dogri (Devanagari)
Duala
Dungan
Dutch
Efik
English
Erzya (Cyrillic)
Estonian
Faroese
Fijian
Filipino
Finnish
Fon
French
Friulian
Ga
Gagauz (Latin)
Galician
Ganda
Gayo
German
Gilbertese
Gondi (Devanagari)
Greek
Greenlandic
Guarani
Gurung (Devanagari)
Gusii
Haitian Creole
Halbi (Devanagari)
Hani
Haryanvi
Hawaiian
Hebrew
Herero
Hiligaynon
Hindi
Hmong Daw (Latin)
Ho (Devanagari)
Hungarian
Iban
Icelandic
Igbo
Iloko
Inari Sami
Indonesian
Ingush
Interlingua
Inuktitut (Latin)
Irish
Italian
Japanese
Jaunsari (Devanagari)
Javanese
Jola-Fonyi
Kabardian
Kabuverdianu
Kachin (Latin)
Kalenjin
Kalmyk
Kangri (Devanagari)
Kanuri
Karachay-Balkar
Kara-Kalpak (Cyrillic)
Kara-Kalpak (Latin)
Kashubian
Kazakh (Cyrillic)
Kazakh (Latin)
Khakas
Khaling
Khasi
K'iche'
Kikuyu
Kildin Sami
Kinyarwanda
Komi
Kongo
Korean
Korku
Koryak
Kosraean
Kpelle
Kuanyama
Kumyk (Cyrillic)
Kurdish (Arabic)
Kurdish (Latin)
Kurukh (Devanagari)
Kyrgyz (Cyrillic)
Lak
Lakota
Latin
Latvian
Lezghian
Lingala
Lithuanian
Lower Sorbian
Lozi
Lule Sami
Luo (Kenya and Tanzania)
Luxembourgish
Luyia
Macedonian
Machame
Madurese
Mahasu Pahari (Devanagari)
Makhuwa-Meetto
Makonde
Malagasy
Malay (Latin)
Maltese
Malto (Devanagari)
Mandinka
Manx
Maori
Mapudungun
Marathi
Mari (Russia)
Masai
Mende (Sierra Leone)
Meru
Meta'
Minangkabau
Mohawk
Mongolian (Cyrillic)
Mongondow
Montenegrin (Cyrillic)
Montenegrin (Latin)
Morisyen
Mundang
Nahuatl
Navajo
Ndonga
Neapolitan
Nepali
Ngomba
Niuean
Nogay
North Ndebele
Northern Sami (Latin)
Norwegian
Nyanja
Nyankole
Nzima
Occitan
Ojibwa
Oromo
Ossetic
Pampanga
Pangasinan
Papiamento
Pashto
Pedi
Persian
Polish
Portuguese
Punjabi (Arabic)
Quechua
Ripuarian
Romanian
Romansh
Rundi
Russian
Rwa
Sadri (Devanagari)
Sakha
Samburu
Samoan (Latin)
Sango
Sangu (Gabon)
Sanskrit (Devanagari)
Santali (Devanagari)
Scots
Scottish Gaelic
Sena
Serbian (Cyrillic)
Serbian (Latin)
Shambala
Shona
Siksika
Sirmauri (Devanagari)
Skolt Sami
Slovak
Slovenian
Soga
Somali (Arabic)
Somali (Latin)
Songhai
South Ndebele
Southern Altai
Southern Sami
Southern Sotho
Spanish
Sundanese
Swahili (Latin)
Swati
Swedish
Tabassaran
Tachelhit
Tahitian
Taita
Tajik (Cyrillic)
Tamil
Tatar (Cyrillic)
Tatar (Latin)
Teso
Tetum
Thai
Thangmi
Tok Pisin
Tongan
Tsonga
Tswana
Turkish
Turkmen (Latin)
Tuvan
Udmurt
Uighur (Cyrillic)
Ukrainian
Upper Sorbian
Urdu
Uyghur (Arabic)
Uzbek (Arabic)
Uzbek (Cyrillic)
Uzbek (Latin)
Vietnamese
Volapük
Vunjo
Walser
Welsh
Western Frisian
Wolof
Xhosa
Yucatec Maya
Zapotec
Zarma
Zhuang
Zulu