Although you’re strongly advised against writing your own XML parser, the standard ones don’t cope well with malformed XML input. The stuff I was trying to use wasn’t well-formed, so I had to write something that didn’t care. It’s only four lines; I can’t see what all the fuss is about. Encodings blah.
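    # Chris's recursive xml parser that doesn't handle quotes properly
    my $d = 0;    # current nesting depth
    while (<>) { p($_); }

    sub p {
        # Capture the tag name, the raw attribute string, and a self-closing slash.
        my ($t, $a, $c) = $_ =~ m:<([/\w]+) ?(\w+=".*")* ?(/)? ?>:;
        if ($t =~ m:^/:) {
            # Closing tag: print it and step back out one level.
            print "\t" x $d, "+++$t+++ \n\n\n";
            $d--;
            return;
        } else {
            print "\t" x $d, "+++$t+++\n";
        }
        if (defined($a)) {
            # Attributes are split on spaces, so values containing spaces are lost.
            foreach my $kv (split(/ /, $a)) {
                if (my ($k, $v) = $kv =~ m/^(\w+)="(.*)"/) {
                    print "\t" x $d, "$k = $v\n";
                }
            }
        }
        if (defined($c)) {
            # Self-closing tag: no children to read.
            print "\n\n";
        } else {
            # Opening tag: go one level deeper and keep reading input.
            print "\n\n";
            $d++;
            while (<>) { p($_); }
        }
    }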
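To see what the tag regex in the script above actually captures, here is a rough sketch that runs it over a few sample lines. The helpers `match_tag` and `attr_pairs` and the sample inputs are made up for illustration; they aren't part of the original script, they just wrap the same two regexes. The last sample shows the "doesn't handle quotes properly" caveat: an attribute value containing a space gets split in half and silently dropped.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): apply the same tag regex
# to one line and return the tag name, the raw attribute string, and the
# self-closing slash. Groups that did not take part in the match are undef.
sub match_tag {
    my ($line) = @_;
    my ($tag, $attrs, $close) =
        $line =~ m:<([/\w]+) ?(\w+=".*")* ?(/)? ?>:;
    return ($tag, $attrs, $close);
}

# Hypothetical helper: split the attribute string on spaces, as the script
# does, and keep only the pieces that still look like key="value".
sub attr_pairs {
    my ($attrs) = @_;
    return () unless defined $attrs;
    my @pairs;
    for my $kv (split / /, $attrs) {
        if (my ($k, $v) = $kv =~ m/^(\w+)="(.*)"/) {
            push @pairs, "$k=$v";
        }
    }
    return @pairs;
}

# Made-up sample lines: an opening tag, a closing tag, a self-closing tag,
# and an attribute value with a space in it.
for my $line ('<item id="1" name="x">', '</item>', '<br />',
              '<a title="hello world">') {
    my ($tag, $attrs, $close) = match_tag($line);
    print "$line\n";
    print "  tag:   ", $tag // '(none)', "\n";
    print "  attrs: ", join(', ', attr_pairs($attrs)) || '(none)', "\n";
    print "  self-closing: ", (defined $close ? 'yes' : 'no'), "\n";
}
```

The first sample yields both attributes, while the last one reports no attributes at all, because splitting `title="hello world"` on the space leaves two fragments that no longer match `key="value"`.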