About images contained in documents

lucian.pricop · 8 June 2021 07:24

For documents containing images when converting to markdown, we got a few options:

extract the images, remove them from the document and keep the text format/indentation
extract the images on top of each page
extract the images and try to place them somewhere in the page

Given that markdown does not contain styles we cannot place the text and the image side by side, how should we treat this case?

ionut.orzu · 8 June 2021 07:49

For this document the results for the first 2 options you’ve listed:
First option result:

Second option result:

This option fails to maintain the font sizes but keeps the text indentation and placement in the page.

lucian.pricop · 8 June 2021 09:54

It seems the aspect of the document changes quite a lot, even if we eventually match the font type and size, perfect imitation can not be achieved. Can we instead add a place holder where the image was with a label to reference the attached graphic that would be placed on the same page in footer? Similar to adding references as foot notes in s scientific paper.

ionut.orzu · 16 June 2021 10:19

Following your advice we have got this result:

Hovering on images will tell the ‘Fig.’ number.

We can also go with this approach where the images are aligned to the left and push the content down:

lucian.pricop · 16 June 2021 10:37

I like the former version, because it focuses attention on the content. However we might face difficulties with footnotes support depending on how we export these.

The later version seems straight forward without any complications. Let’s see more opinions.

putt1ck · 16 June 2021 12:47

Be interesting to know how OpenOffice does it with PDF import, and whether there’s any mileage in using an OpenOffice server to do a conversion just for the layout.

ionut.orzu · 16 June 2021 13:10

The problem is not related to the PDF, it is that we are using Markdown for the processed file (so any document is converted to Markdown in the end) and it does not support columns in the layout, so we have to change the layout of the original document. This issue is not limited to images either, same problem with 2 columns of text layouts.

putt1ck · 23 August 2021 13:27

Been thinking about this and ways to utilise the internals of a given “starting” doc to recreate the structures at the end. Requires a way to “hide” tags in the document view, which can be utilised as placemarkers to match with the original document’s skeleton.