LaTeX table reader in Julia

sswatson · October 26, 2020, 7:47pm

Is there a LaTeX table reader in Julia? I know there are several writers, but I haven’t been able to find something that takes the LaTeX source as input.

The reason I’m interested is that MathPix is really good at OCRing tables, but it outputs to LaTeX. (Example use case: I want to do some analysis with some data I find printed in a PDF.)

I know it would be really easy to throw together a basic version, but I didn’t want to reinvent the wheel if one is already out there.

tbeason · October 26, 2020, 8:05pm

Ha. This was not my first thought.

Not aware of anything. Is there an implementation in another language? If so, you could start with that via one of Julia’s interop packages.

dpsanders · October 26, 2020, 10:06pm

Can you give an example of how a table would look?

sswatson · October 26, 2020, 11:45pm

To make sure the example is representative, I went to Google image search and OCRed one of the first tables I saw. Here was the output from MathPix:

\begin{aligned}
&\text { Table } 1.1 . \text { Nonlinear Model Results }\\
&\begin{array}{cccc}
\hline \hline \text { Case } & \text { Method#1 } & \text { Method#2 } & \text { Method#3 } \\
\hline 1 & 50 & 837 & 970 \\
2 & 47 & 877 & 230 \\
3 & 31 & 25 & 415 \\
4 & 35 & 144 & 2356 \\
5 & 45 & 300 & 556 \\
\hline
\end{array}
\end{aligned}

Here’s how it renders: [image of the rendered table]

The part I’d be interested in is just the part between the \begin{array} and \end{array}.

@tbeason I’m starting to think you’re right about the difficulties here. At the most basic level, you’re just splitting into rows and then into entries. For the simplest examples, this would be trivial. But I’m thinking it’s likely to happen fairly often that MathPix outputs something that the naive algorithm isn’t prepared for. Handling that complexity in a graceful way is almost certainly more work than it’s worth.
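A minimal sketch of that naive split-into-rows-then-entries approach, assuming the input looks like the MathPix output above (a single `array` environment, cells wrapped in `\text { ... }`, `\hline` rules, no nested braces or `\multicolumn`). The function name `parse_latex_array` is made up for illustration and is not from any package:

```julia
# Hypothetical helper: turn the body of a LaTeX `array` into rows of strings.
# Assumes one array environment and no nested braces inside \text{...}.
function parse_latex_array(src::AbstractString)
    # Keep only what sits between \begin{array}{...} and \end{array}.
    m = match(r"\\begin\{array\}\{[^}]*\}(.*?)\\end\{array\}"s, src)
    m === nothing && error("no array environment found")
    body = m.captures[1]
    # Drop horizontal rules, then unwrap \text { ... } cells.
    body = replace(body, r"\\hline" => "")
    body = replace(body, r"\\text\s*\{([^}]*)\}" => s"\1")
    rows = Vector{Vector{String}}()
    # Rows end with a double backslash; entries are separated by `&`.
    for line in split(body, "\\\\")
        cells = [String(strip(c)) for c in split(line, '&')]
        all(isempty, cells) || push!(rows, cells)
    end
    return rows
end

sample = raw"""
\begin{array}{cc}
\hline \hline \text { Case } & \text { Method#1 } \\
\hline 1 & 50 \\
2 & 47 \\
\hline
\end{array}
"""

rows = parse_latex_array(sample)
header = rows[1]                                # column names as strings
data = [parse.(Int, r) for r in rows[2:end]]    # assumes every data cell is an integer
```

Anything outside those assumptions (merged cells, math inside entries, a `tabular` instead of an `array`) would need extra handling, which is exactly where the complexity creeps in.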

My understanding is that there are LaTeX table readers in Python. That’s probably the right solution.

affans · October 27, 2020, 12:20am

Considering the different codes/environments one can use to generate a LaTeX table, this is probably really difficult. I guess it would be somewhat easier to develop something for MathPix only, since you have some guarantee about what the output code will include.

ericphanson · October 27, 2020, 12:23am

You could try using Pandoc to convert to a simpler format (like markdown pipe tables) and then parse those (I think they’re basically CSVs).
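For the parsing half of that idea, here is a rough sketch that reads a markdown pipe table into a header and rows. It assumes the table is already in pipe-table form with outer pipes and a separator line as the second row; whether Pandoc produces that from a given MathPix `array` is not checked here. `parse_pipe_table` is a made-up name, and escaped pipes inside cells are not handled:

```julia
# Hypothetical helper: parse a markdown pipe table into (header, rows).
# Assumes line 1 is the header and line 2 is the |---|---| separator.
function parse_pipe_table(md::AbstractString)
    # Split one table line into trimmed cells, ignoring the outer pipes.
    cells(line) = [String(strip(c)) for c in split(strip(strip(line), '|'), '|')]
    # Keep only lines that look like table rows.
    lines = [l for l in split(md, '\n') if occursin('|', l)]
    header = cells(lines[1])
    rows = [cells(l) for l in lines[3:end]]   # skip the separator line
    return header, rows
end

md = """
| Case | Method#1 |
|------|----------|
| 1    | 50       |
| 2    | 47       |
"""

header, rows = parse_pipe_table(md)
```

From there the cells are plain strings, so they can be parsed to numbers or handed to a table type as needed.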
