Recognizing Table Formatting From Text Files

Rajendran, Venkatprabhu

2006, Master of Computer Science, Miami University, Computer Science and Systems Analysis.
Some text documents, legacy documents in particular, contain tabular data that is not formatted as a table. Such documents are harder to read than they need to be because the columns are not aligned, which prevents the reader from seeing important patterns in the data. This thesis presents algorithms that insert table formatting into free text. The text is first parsed to identify common syntactic patterns such as dates, dollar amounts, and times. Pattern-matching techniques are then used to build models, called templates, of the type of data each column should contain. Because the text is often ambiguous, several candidate templates, each with its associated table, may fit the same text, so the alternatives must be ranked. This thesis focuses on evaluating the candidate templates and their tables in order to rank them. The scoring function attempts to mimic the process a human might follow when performing the same task. Its effectiveness is evaluated on a set of tables that appeared in real electronic feeds.
Michael Zmuda (Advisor)
51 p.
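The pipeline described in the abstract (classify tokens by syntactic pattern, infer per-column templates, rank the candidate tables) can be illustrated with a minimal sketch. It is not the thesis's algorithm: the three regular expressions, the rule that adjacent unrecognized words merge into one text cell, and the scoring function (the fraction of rows whose type sequence exactly matches a template) are all hypothetical stand-ins, since the abstract does not specify the patterns or the human-like scoring function the thesis actually uses.

```python
import re
from collections import Counter

# Hypothetical syntactic classes; the abstract names dates, dollar amounts,
# and times but does not give the exact patterns the thesis uses.
PATTERNS = [
    ("date", re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")),
    ("time", re.compile(r"\d{1,2}:\d{2}")),
    ("dollar", re.compile(r"\$\d[\d,]*(?:\.\d{2})?")),
]


def classify(token):
    """Label a token with the first class whose regex matches all of it, else 'text'."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return name
    return "text"


def to_cells(line):
    """Split a line into (type, cell) pairs; adjacent 'text' tokens merge into one cell."""
    cells = []
    for token in line.split():
        kind = classify(token)
        if kind == "text" and cells and cells[-1][0] == "text":
            cells[-1] = ("text", cells[-1][1] + " " + token)
        else:
            cells.append((kind, token))
    return cells


def rank_templates(lines):
    """Use each row's type sequence as a candidate template.

    Hypothetical score: the fraction of non-blank rows the template matches exactly.
    Returns (score, template, table) triples, best first.
    """
    rows = [to_cells(line) for line in lines if line.strip()]
    signatures = [tuple(kind for kind, _ in row) for row in rows]
    ranked = []
    for template, hits in Counter(signatures).most_common():
        table = [[cell for _, cell in row]
                 for row, sig in zip(rows, signatures) if sig == template]
        ranked.append((hits / len(rows), template, table))
    return ranked


def format_table(table):
    """Left-justify every cell to its column's widest entry so the columns line up."""
    widths = [max(len(row[i]) for row in table) for i in range(len(table[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    )


if __name__ == "__main__":
    feed = [
        "3/14/2006 9:30 Opening bell $0.00",
        "3/14/2006 12:15 Lunch meeting $45.50",
        "3/15/2006 16:05 Taxi to airport $1,210.00",
        "Notes follow below",
    ]
    score, template, table = rank_templates(feed)[0]
    print(score, template)
    print(format_table(table))
```

On the four sample lines, the top-ranked template is ('date', 'time', 'text', 'dollar') with a score of 0.75, and its three rows print with aligned columns. The sketch sidesteps ambiguity by fixing a single segmentation (every run of unrecognized words becomes one cell); a fuller approach would generate several segmentations per line and let the scoring function choose among them.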

Recommended Citations

  • APA Style (7th edition): Rajendran, V. (2006). Recognizing Table Formatting From Text Files [Master's thesis, Miami University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=miami1165610803

  • MLA Style (8th edition): Rajendran, Venkatprabhu. Recognizing Table Formatting From Text Files. 2006. Miami University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=miami1165610803.

  • Chicago Manual of Style (17th edition): Rajendran, Venkatprabhu. "Recognizing Table Formatting From Text Files." Master's thesis, Miami University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=miami1165610803