OCR SYSTEM FOR Mongolian
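Author: 巴雅尔
Advisor: 智敏
School: Inner Mongolia Normal University (内蒙古师范大学)
Major: Computer Application Technology
Keywords: Optical Character Recognition; OCR; Traditional Mongolian script; histogram model
Classification number: TP391.43
Type: Master's thesis
Year: 2012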
Abstract
Countries around the world produce documents in many scripts and languages. For most scripts printed in standard fonts, methods for converting typewritten material into computer text have been studied extensively and many digitization programs have been developed, so the recognition of typed and typewritten material in standard fonts is considered a solved problem. By contrast, little research has addressed the recognition of the Traditional Mongolian script; its recognition problem has not been fully solved, and research is ongoing.

A large volume of printed Mongolian documents needs to be digitized for digital libraries and other applications. The Traditional Mongolian script has a unique writing style and many font-type variations, which pose challenges for Mongolian OCR research. Because of characteristics of the script, for example that one character may be part of another character, we define the character set for recognition in terms of the segmented components, and a rule-based post-processing module combines the components into characters. For character recognition, a method based on projection profile analysis, line segmentation and word segmentation is presented. For character segmentation, a scheme locates the segmentation points by analyzing the properties of the projection and the connected components. Mongolian font types fall into two major groups, so the segmentation parameters are adjusted for each group, and a font-type classification method for the two groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective: the text recognition rate is 90% on test samples taken from practical documents with multiple font types and mixed scripts.
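The projection-profile idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes an already binarized page (1 = ink, 0 = background), relies on the fact that Traditional Mongolian is written in vertical lines, and uses hypothetical names (`projection_profile`, `find_runs`, `segment_lines_and_words`) and a hypothetical `min_ink` threshold. Columns of the image are summed to separate the vertical text lines, and rows are then summed inside each line to separate words.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed input)


def projection_profile(image: Image, axis: int) -> List[int]:
    """Count ink pixels per column (axis=0) or per row (axis=1)."""
    if axis == 0:
        return [sum(column) for column in zip(*image)]
    return [sum(row) for row in image]


def find_runs(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where the profile is >= min_ink."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a run of ink begins
        elif value < min_ink and start is not None:
            runs.append((start, i))        # a gap closes the current run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines_and_words(image: Image):
    """Split a page into vertical text lines, then into words within each line."""
    layout = []
    for x0, x1 in find_runs(projection_profile(image, axis=0)):
        strip = [row[x0:x1] for row in image]               # one vertical line
        words = find_runs(projection_profile(strip, axis=1))
        layout.append(((x0, x1), words))
    return layout


# Toy 5x5 page with two vertical lines of text separated by blank columns.
page = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
]
layout = segment_lines_and_words(page)
```

On real scans a gap threshold larger than a single blank row or column would likely be needed, and the abstract indicates that such segmentation parameters are tuned per font-type group.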
Table of Contents
- ABSTRACT
- LIST OF FIGURES
- Content
- Chapter 1 Introduction
  - 1.1 Theory and research work
    - 1.1.1 Purpose of work
    - 1.1.2 Set objectives in the work scope
    - 1.1.3 Novelty of the work
    - 1.1.4 Result of work
  - 1.2 Description
  - 1.3 Domestic and foreign research
    - 1.3.1 Foreign research
    - 1.3.2 Domestic research status
- Chapter 2 Optical Character Recognition (OCR) technology
  - 2.1 About OCR
    - 2.1.1 History of the development
    - 2.1.2 The rise of OCR technology
    - 2.1.3 The development of OCR technology in China
  - 2.2 Classification of OCR technology
  - 2.3 Working sequence
    - 2.3.1 Image Acquisition
    - 2.3.2 Pre-Processing
    - 2.3.3 Document Page Analysis
    - 2.3.4 Applying ways
    - 2.3.5 Post-Processing
- Chapter 3 Realization of the OCR system
  - 3.1 Feature of the Mongolian script
  - 3.2 Knowing the Mongolian script
  - 3.3 Processing
  - 3.4 Finding the location of "nuruu" of word
- CONCLUSION
- APPENDIX
- BIBLIOGRAPHY
- ACKNOWLEDGEMENTS
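Chinese Library Classification: Industrial Technology > Automation Technology, Computer Technology > Computing Technology, Computer Technology > Computer Applications > Information Processing > Pattern Recognition and Devices > Character Recognition and Devices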