@Zegnat Zegnat commented Mar 25, 2018

This seems to be an under-documented behaviour of DOMDocument::saveHTML: it sometimes appends a \n to its output. So if you just concatenate the strings this method returns, you may be introducing line breaks that weren’t in the original source.

I think adding a trim of some sort is wrong, as you might then also be trimming Text nodes that actually should contain the line break.

Instead, what I have found to fix this is to move all the nodes into a DocumentFragment and retrieve the HTML of this fragment in one go.
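A minimal sketch of the idea, not necessarily the code as merged: the helper name innerHtmlViaFragment is hypothetical, and cloning the child nodes (rather than moving them) is an assumption made here so the source tree stays untouched. The per-node loop is included only for comparison; whether it actually emits the extra \n may depend on the libxml version in use.

```php
<?php
// Hypothetical helper (the name is illustrative, not taken from the parser):
// serialise an element's children with a single saveHTML() call by first
// collecting them in a DocumentFragment.
function innerHtmlViaFragment(DOMElement $el): string
{
    $doc = $el->ownerDocument;
    $fragment = $doc->createDocumentFragment();
    foreach ($el->childNodes as $child) {
        // Assumption: clone instead of move, so the original tree is unchanged.
        $fragment->appendChild($child->cloneNode(true));
    }
    return (string) $doc->saveHTML($fragment);
}

$doc = new DOMDocument();
$doc->loadHTML('<div class="h-entry"><div class="e-content"><p>1</p><p>2</p></div></div>');
$content = (new DOMXPath($doc))->query('//div[@class="e-content"]')->item(0);

// For comparison: per-node concatenation, where each saveHTML() call
// may append its own "\n".
$concatenated = '';
foreach ($content->childNodes as $child) {
    $concatenated .= $doc->saveHTML($child);
}

$viaFragment = innerHtmlViaFragment($content);
var_dump($concatenated, $viaFragment);
```

Because no string trimming is involved, any line break that really is part of a Text node survives serialisation.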


Prior to this fix, the parser returned the following. Note the \n in ["properties"]["content"][0]["html"]:

<div class="h-entry"><div class="e-content"><p>1</p><p>2</p></div></div>
{
  "type": [ "h-entry" ],
  "properties": {
    "content": [ {
      "html": "<p>1</p>\n<p>2</p>",
      "value": "1 2"
    } ]
  }
}

@aaronpk aaronpk added this to the 0.4.2 milestone Mar 25, 2018
@aaronpk aaronpk merged commit 5eeef8b into microformats:master Mar 25, 2018