Description
Dear Nokogiri community,
I want to parse the contents of my nginx directory listing. It works fine with standard (MRI) Ruby, but under JRuby I can't get Nokogiri to parse the listing the same way. Does anyone have an idea how to fix this?
To Reproduce
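#!/usr/bin/env ruby
require 'nokogiri'
require 'open-uri'

# URI.open (available since Ruby 2.5) fetches the URL via open-uri; Kernel#open
# with a URL was deprecated in Ruby 2.7 and no longer works in Ruby 3.0.
Nokogiri::HTML(URI.open("https://archive.******.de/i9305"))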
Result with MRI Ruby 2.6.0:
=> #(Document:0x2afe5afd8350 {
name = "document",
children = [
#(DTD:0x2afe5b0055a8 { name = "html" }),
#(Element:0x2afe5ab1767c {
name = "html",
children = [
#(Text "\r\n"),
#(Element:0x2afe5ac382f4 {
name = "head",
children = [
#(Element:0x2afe5acd0f68 {
name = "title",
children = [ #(Text "Index of /i9305/")]
})]
}),
#(Text "\r\n"),
#(Element:0x2afe5aec8668 {
name = "body",
attributes = [
#(Attr:0x2afe5aefca08 { name = "bgcolor", value = "white" })],
children = [
#(Text "\r\n"),
#(Element:0x2afe5af3cf40 {
name = "h1",
children = [ #(Text "Index of /i9305/")]
}),
#(Element:0x2afe5aba4ff4 { name = "hr" }),
#(Element:0x2afe5ab9822c {
name = "pre",
children = [
#(Element:0x2afe5af9ee48 {
name = "a",
attributes = [
#(Attr:0x2afe5af9e3d0 { name = "href", value = "../" })],
children = [ #(Text "../")]
}),
#(Text "\r\n"),
#(Element:0x2afe5affdad8 {
name = "a",
attributes = [
#(Attr:0x2afe5aff6fe4 {
name = "href",
value = "ResurrectionRemix/"
})],
children = [ #(Text "ResurrectionRemix/")]
}),
#(Text " 03-Mar-2019 10:12 -\r\n"),
#(Element:0x2afe5aaecea4 {
name = "a",
attributes = [
#(Attr:0x2afe5aaee858 { name = "href", value = "TWRP/" })],
children = [ #(Text "TWRP/")]
}),
#(Text " 12-Mar-2019 19:27 -\r\n"),
#(Element:0x2afe5ac48690 {
name = "a",
attributes = [
#(Attr:0x2afe5ac67400 {
name = "href",
value = "override_TWRP/"
})],
children = [ #(Text "override_TWRP/")]
}),
#(Text " 12-Mar-2019 19:27 -\r\n")]
}),
#(Element:0x2afe5aa751c4 { name = "hr" })]
}),
#(Text "\r\n")]
})]
})
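The MRI tree shows the shape of each listing row: an `a` element whose `href` is the entry name, followed by a text node holding the modification time and the size (`-` for directories). As a sketch of the post-processing step, assuming nginx's default autoindex layout, the hypothetical helper `parse_entries` below turns `[href, trailing_text]` pairs into records. On MRI the pairs could be collected with something like `doc.css("pre a").map { |a| [a["href"], a.next_sibling&.text] }`; the helper itself is plain Ruby and does not depend on Nokogiri.

```ruby
# One row of an nginx autoindex listing (hypothetical record type).
Entry = Struct.new(:name, :modified, :size)

# Assumed row layout after stripping: "DD-Mon-YYYY HH:MM SIZE",
# where SIZE is "-" for directories.
ROW_PATTERN = /\A(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2})\s+(\S+)\z/

# Turns [href, trailing_text] pairs into Entry records.
# The parent-directory link and rows that do not match ROW_PATTERN are skipped.
def parse_entries(pairs)
  pairs.each_with_object([]) do |(href, trailing), entries|
    next if href == "../"

    match = ROW_PATTERN.match(trailing.to_s.strip)
    next unless match

    # Directories show "-" instead of a size; keep other sizes as raw strings.
    size = match[2] == "-" ? nil : match[2]
    entries << Entry.new(href, match[1], size)
  end
end

# Pairs copied from the MRI tree above.
pairs = [
  ["../", "\r\n"],
  ["ResurrectionRemix/", " 03-Mar-2019 10:12 -\r\n"],
  ["TWRP/", " 12-Mar-2019 19:27 -\r\n"],
  ["override_TWRP/", " 12-Mar-2019 19:27 -\r\n"]
]

parse_entries(pairs).each do |entry|
  puts "#{entry.name}\t#{entry.modified}\t#{entry.size || 'dir'}"
end
```

Keeping the date and size as strings avoids guessing the server's time zone or the size format (which depends on the nginx configuration).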
Result with JRuby 9.2.5.0:
=> #(Document:0x7e6 {
name = "document",
children = [
#(Element:0x7e8 {
name = "html",
children = [
#(Element:0x7ea { name = "head" }),
#(Element:0x7ec {
name = "body",
children = [ #(Element:0x7ee { name = "h" })]
})]
})]
})
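The JRuby tree above contains no `pre` element at all, so the listing is lost before any CSS or XPath query runs. One stopgap while the discrepancy is investigated is to pull the links out of the raw HTML without an HTML parser. The sketch below swaps in a regular expression for Nokogiri; it assumes nginx's default autoindex markup (`<a href="...">` tags inside a single `<pre>`, as in the MRI tree) and will break on any other layout. The helper name `autoindex_hrefs` is hypothetical.

```ruby
require 'cgi'

# Hypothetical helper: extracts link targets from an nginx autoindex page with a
# regular expression instead of an HTML parser. It assumes nginx's default
# markup (one <pre> block holding <a href="..."> tags) and is not a general
# HTML parser.
def autoindex_hrefs(html)
  pre = html[%r{<pre>(.*?)</pre>}m, 1]
  return [] unless pre

  hrefs = pre.scan(/<a href="([^"]*)">/).flatten
  # Decode HTML entities such as &amp; in case an href contains them.
  hrefs = hrefs.map { |href| CGI.unescapeHTML(href) }
  # Drop the parent-directory link.
  hrefs.reject { |href| href == "../" }
end
```

With open-uri, the input would be the response body, for example `autoindex_hrefs(URI.open(url).read)`, where `url` is the listing address.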
Environment
MRI Ruby:
# Nokogiri (1.10.2)
---
warnings: []
nokogiri: 1.10.2
ruby:
  version: 2.6.0
  platform: x86_64-linux
  description: ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-linux]
  engine: ruby
libxml:
  binding: extension
  source: packaged
  libxml2_path: "/home/amo/.rvm/gems/ruby-2.6.0/gems/nokogiri-1.10.2/ports/x86_64-pc-linux-gnu/libxml2/2.9.9"
  libxslt_path: "/home/amo/.rvm/gems/ruby-2.6.0/gems/nokogiri-1.10.2/ports/x86_64-pc-linux-gnu/libxslt/1.1.33"
  libxml2_patches:
  - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
  - 0002-Remove-script-macro-support.patch
  - 0003-Update-entities-to-remove-handling-of-ssi.patch
  libxslt_patches: []
  compiled: 2.9.9
  loaded: 2.9.9
JRuby:
# Nokogiri (1.10.2)
---
warnings: []
nokogiri: 1.10.2
ruby:
  version: 2.5.0
  platform: java
  description: jruby 9.2.5.0 (2.5.0) 2018-12-06 6d5a228 OpenJDK 64-Bit Server VM 25.212-b01 on 1.8.0_212-b01 +jit [linux-x86_64]
  engine: jruby
  jruby: 9.2.5.0
xerces: Xerces-J 2.12.0
nekohtml: NekoHTML 1.9.21
Any advice is greatly appreciated.