Need help parsing a standard nginx directory listing. Different results with ruby and jruby. #1888

@amo13

Dear Nokogiri community,
I want to parse the contents of my nginx directory listing. It works fine with MRI (CRuby), but under JRuby I can't get Nokogiri to behave the same way and parse the listing. Does anyone have an idea that could help me out?

To Reproduce

#! /usr/bin/env ruby

require 'nokogiri'
require 'open-uri'

# URI.open is the explicit open-uri entry point (available since Ruby 2.5)
Nokogiri::HTML(URI.open("https://archive.******.de/i9305"))
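One way to check whether both interpreters even hand the same bytes to the parser is to fingerprint the response body before it is parsed. The sketch below is only an illustration: `body_fingerprint` is a hypothetical helper, not part of Nokogiri or open-uri, and the sample string stands in for the real body, which would come from calling `.read` on the object that open-uri returns.

```ruby
require 'digest'

# Hypothetical helper: summarize a response body so that the values printed
# under MRI and under JRuby can be compared side by side.
def body_fingerprint(body)
  {
    encoding: body.encoding.name,                    # encoding tag on the String
    bytesize: body.bytesize,                         # raw length in bytes
    sha256:   Digest::SHA256.hexdigest(body),        # identical bytes => identical digest
    head_hex: body.byteslice(0, 16).unpack1('H*')    # first bytes, hex-encoded
  }
end

# Stand-in for the fetched body (an assumption, not the real response).
sample = "<html>\r\n<head>"
p body_fingerprint(sample)
```

If the digests match on both interpreters, the difference most likely lies in the parser rather than in the fetch. If they differ, the encoding and the leading bytes are a reasonable place to look first.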

Result with ruby

=> #(Document:0x2afe5afd8350 {
  name = "document",
  children = [
    #(DTD:0x2afe5b0055a8 { name = "html" }),
    #(Element:0x2afe5ab1767c {
      name = "html",
      children = [
        #(Text "\r\n"),
        #(Element:0x2afe5ac382f4 {
          name = "head",
          children = [
            #(Element:0x2afe5acd0f68 {
              name = "title",
              children = [ #(Text "Index of /i9305/")]
              })]
          }),
        #(Text "\r\n"),
        #(Element:0x2afe5aec8668 {
          name = "body",
          attributes = [
            #(Attr:0x2afe5aefca08 { name = "bgcolor", value = "white" })],
          children = [
            #(Text "\r\n"),
            #(Element:0x2afe5af3cf40 {
              name = "h1",
              children = [ #(Text "Index of /i9305/")]
              }),
            #(Element:0x2afe5aba4ff4 { name = "hr" }),
            #(Element:0x2afe5ab9822c {
              name = "pre",
              children = [
                #(Element:0x2afe5af9ee48 {
                  name = "a",
                  attributes = [
                    #(Attr:0x2afe5af9e3d0 { name = "href", value = "../" })],
                  children = [ #(Text "../")]
                  }),
                #(Text "\r\n"),
                #(Element:0x2afe5affdad8 {
                  name = "a",
                  attributes = [
                    #(Attr:0x2afe5aff6fe4 {
                      name = "href",
                      value = "ResurrectionRemix/"
                      })],
                  children = [ #(Text "ResurrectionRemix/")]
                  }),
                #(Text "                                 03-Mar-2019 10:12                   -\r\n"),
                #(Element:0x2afe5aaecea4 {
                  name = "a",
                  attributes = [
                    #(Attr:0x2afe5aaee858 { name = "href", value = "TWRP/" })],
                  children = [ #(Text "TWRP/")]
                  }),
                #(Text "                                              12-Mar-2019 19:27                   -\r\n"),
                #(Element:0x2afe5ac48690 {
                  name = "a",
                  attributes = [
                    #(Attr:0x2afe5ac67400 {
                      name = "href",
                      value = "override_TWRP/"
                      })],
                  children = [ #(Text "override_TWRP/")]
                  }),
                #(Text "                                     12-Mar-2019 19:27                   -\r\n")]
              }),
            #(Element:0x2afe5aa751c4 { name = "hr" })]
          }),
        #(Text "\r\n")]
      })]
  })

Result with jruby

=> #(Document:0x7e6 {
  name = "document",
  children = [
    #(Element:0x7e8 {
      name = "html",
      children = [
        #(Element:0x7ea { name = "head" }),
        #(Element:0x7ec {
          name = "body",
          children = [ #(Element:0x7ee { name = "h" })]
          })]
      })]
  })

Environment

ruby:

# Nokogiri (1.10.2)
    ---
    warnings: []
    nokogiri: 1.10.2
    ruby:
      version: 2.6.0
      platform: x86_64-linux
      description: ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-linux]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/home/amo/.rvm/gems/ruby-2.6.0/gems/nokogiri-1.10.2/ports/x86_64-pc-linux-gnu/libxml2/2.9.9"
      libxslt_path: "/home/amo/.rvm/gems/ruby-2.6.0/gems/nokogiri-1.10.2/ports/x86_64-pc-linux-gnu/libxslt/1.1.33"
      libxml2_patches:
      - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
      - 0002-Remove-script-macro-support.patch
      - 0003-Update-entities-to-remove-handling-of-ssi.patch
      libxslt_patches: []
      compiled: 2.9.9
      loaded: 2.9.9

jruby:

# Nokogiri (1.10.2)
    ---
    warnings: []
    nokogiri: 1.10.2
    ruby:
      version: 2.5.0
      platform: java
      description: jruby 9.2.5.0 (2.5.0) 2018-12-06 6d5a228 OpenJDK 64-Bit Server VM 25.212-b01
        on 1.8.0_212-b01 +jit [linux-x86_64]
      engine: jruby
      jruby: 9.2.5.0
    xerces: Xerces-J 2.12.0
    nekohtml: NekoHTML 1.9.21

Any advice is greatly appreciated.
