Files
miami1165610803.pdf (339.37 KB)
Recognizing Table Formatting From Text Files
Author Info
Rajendran, Venkatprabhu
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=miami1165610803
Abstract Details
Year and Degree
2006, Master of Computer Science, Miami University, Computer Science and Systems Analysis.
Abstract
Some text documents, legacy documents in particular, do not format sections of text that contain tables. Such documents are less readable than they could be because the columns are not aligned, and unaligned columns prevent the reader from seeing important patterns in the text. This thesis presents algorithms that help insert table formatting into free text. The approach parses the text to identify common syntactic patterns such as dates, dollar amounts, and times. Pattern matching techniques are then used to build models, called templates, of the type of data each column should contain. Ambiguities often exist, so the alternative templates and their associated tables must be ranked. This thesis focuses on evaluating the candidate templates and their associated tables in order to rank the alternatives. The scoring function attempts to mimic the process a human might follow when performing the same task. The effectiveness of the scoring is evaluated on a set of tables that have appeared in real electronic feeds.
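The pipeline outlined in the abstract (tag entities, derive a template per line, rank the candidates) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the regular expressions, the `NUMBER` and `WORD` fallback types, the function names, and the simple frequency-based score are all assumptions chosen for illustration. The thesis's scoring function is meant to mimic human judgment and is likely richer than this.

```python
import re
from collections import Counter

# Hypothetical entity patterns; the abstract names dates, dollar amounts,
# and times but does not give the exact patterns the thesis uses.
ENTITY_PATTERNS = [
    ("DATE", re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")),
    ("TIME", re.compile(r"\d{1,2}:\d{2}")),
    ("MONEY", re.compile(r"\$\d[\d,]*(\.\d{2})?")),
    ("NUMBER", re.compile(r"\d+(\.\d+)?")),
]


def classify(token):
    """Return the first entity type whose pattern matches the whole token."""
    for name, pattern in ENTITY_PATTERNS:
        if pattern.fullmatch(token):
            return name
    return "WORD"  # assumed fallback for anything unrecognized


def line_template(line):
    """A template here is the sequence of entity types on one line."""
    return tuple(classify(token) for token in line.split())


def rank_templates(lines):
    """Rank candidate templates by the fraction of non-blank lines they fit.

    This frequency score is a stand-in for the thesis's scoring function.
    """
    templates = [line_template(line) for line in lines if line.strip()]
    counts = Counter(templates)
    return [(tpl, n / len(templates)) for tpl, n in counts.most_common()]


def align(lines, template):
    """Pad the columns of every line that fits the chosen template."""
    rows = [line.split() for line in lines if line_template(line) == template]
    widths = [max(len(row[i]) for row in rows) for i in range(len(template))]
    return [
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    ]


if __name__ == "__main__":
    sample = [
        "01/02/2006 9:30 $1,200.00 ACME",
        "11/12/2006 14:05 $35.50 Widgets",
        "Totals follow",
    ]
    best_template, score = rank_templates(sample)[0]
    print(best_template, round(score, 2))
    print("\n".join(align(sample, best_template)))
```

Splitting on whitespace keeps the sketch short, but it treats every word as its own column, which is one source of the ambiguities the abstract mentions: a multi-word description could be one column or several, and each reading yields a different candidate template to be ranked.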
Committee
Michael Zmuda (Advisor)
Pages
51 p.
Subject Headings
Computer Science
Keywords
Pattern Matching Techniques; Entities; Templates; Evaluation; Scoring Function
Recommended Citations
Citations
APA Style (7th edition): Rajendran, V. (2006). Recognizing Table Formatting From Text Files [Master's thesis, Miami University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=miami1165610803
MLA Style (8th edition): Rajendran, Venkatprabhu. Recognizing Table Formatting From Text Files. 2006. Miami University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=miami1165610803.
Chicago Manual of Style (17th edition): Rajendran, Venkatprabhu. "Recognizing Table Formatting From Text Files." Master's thesis, Miami University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=miami1165610803
Document number:
miami1165610803
Download Count:
499
Copyright Info
© 2006, all rights reserved.
This open access ETD is published by Miami University and OhioLINK.