Create a program to determine which transcription has the fewest errors.
When text is transcribed from other media such as audio or images, errors are occasionally introduced. To determine which algorithm or service produces the most faithful reproduction, we compute the edit distance between the correct version and each transcribed version. The best service is the one that requires the fewest edits. The following edit operations are allowed.
Each dataset is a text file containing several lines of ASCII text, each at most 30,000 characters. The first line is the correct text; all other lines contain errors. The following shows a trivial example of this format with areas near errors highlighted.
If debugging is the process of removing software bugs, then programming must be the process of putting them in. -- Edsger Dijkstra
If debuging is the process of removing software bugs, then programing must be the process of puting them in. -- Edsger Dijkstra
If debugging is the procedure of removing software bugs, then programming must be the procedure of putting them in. -- Edsger Dijkstra
If debugging is the process of removing software bugs, then programming must be the process that puts them in. -- Edsger Dijkstra
In the dataset above, the second line requires 3 insertions (g, m, and t) to match the correct (first) line. The third line requires 4 splits (s to du twice, and s to re twice) to be equivalent. Finally, the last line requires 5 edits (2 merges, 1 split, and 2 insertions). Thus the second line requires the fewest edits (3).
Your program should determine the number of the first line that requires the fewest edits, together with the number of edits it requires. Report this as a JSON object with the members line and edits, as shown below.
{ "answer" : { "line" : 2, "edits" : 3 } }
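One possible approach is a dynamic-programming alignment, sketched below in Python. The full list of allowed operations is not reproduced in this statement, so the sketch assumes four unit-cost operations inferred from the worked example: insert one character, delete one character, merge two adjacent characters into one arbitrary character, and split one character into two arbitrary characters. Deletion in particular is an assumption, and plain one-for-one substitution is deliberately left out. The names edit_distance and best_transcription are hypothetical helpers, not part of the task.

```python
import json


def edit_distance(transcribed: str, correct: str) -> int:
    """Minimum number of assumed edits aligning `transcribed` with `correct`.

    Assumed unit-cost operations (not confirmed by the problem statement):
    insert, delete, merge (2 chars -> 1 char), split (1 char -> 2 chars).
    """
    n = len(correct)
    prev2 = None                 # row for transcribed[:i-2]
    prev = list(range(n + 1))    # row for the empty prefix: j insertions
    for i in range(1, len(transcribed) + 1):
        cur = [i] + [0] * n      # reaching the empty target takes i deletions
        for j in range(1, n + 1):
            best = min(prev[j] + 1,      # delete transcribed[i-1]
                       cur[j - 1] + 1)   # insert correct[j-1]
            if transcribed[i - 1] == correct[j - 1]:
                best = min(best, prev[j - 1])        # characters match
            if i >= 2:
                best = min(best, prev2[j - 1] + 1)   # merge two into one
            if j >= 2:
                best = min(best, prev[j - 2] + 1)    # split one into two
            cur[j] = best
        prev2, prev = prev, cur  # keep only the last two rows
    return prev[n]


def best_transcription(lines):
    """Return the answer object for a dataset given as a list of lines."""
    correct = lines[0]
    best_line, best_edits = None, None
    # Transcriptions start at line 2 (line 1 is the correct text).
    for number, text in enumerate(lines[1:], start=2):
        edits = edit_distance(text, correct)
        if best_edits is None or edits < best_edits:  # strict: first wins ties
            best_line, best_edits = number, edits
    return {"answer": {"line": best_line, "edits": best_edits}}


if __name__ == "__main__":
    sample = [
        "If debugging is the process of removing software bugs, then programming must be the process of putting them in. -- Edsger Dijkstra",
        "If debuging is the process of removing software bugs, then programing must be the process of puting them in. -- Edsger Dijkstra",
        "If debugging is the procedure of removing software bugs, then programming must be the procedure of putting them in. -- Edsger Dijkstra",
        "If debugging is the process of removing software bugs, then programming must be the process that puts them in. -- Edsger Dijkstra",
    ]
    print(json.dumps(best_transcription(sample)))
```

The table is quadratic in the line length, so only the last two rows are kept in memory. For lines near the 30,000-character limit, a pure-Python loop like this is likely too slow, and a compiled or vectorized version of the same recurrence would be needed.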