SimplePie 1.2: ‘This XML document is invalid, likely due to invalid characters. XML error: SYSTEM or PUBLIC, the URI is missing at line 1, column…’

10 02 2011

Got busy with SimplePie today.

Some background: I discovered SimplePie this morning as an alternative to MagpieRSS (yeah, I acknowledge I’m late on this), which was giving me trouble with feeds transferred over HTTPS.

Unfortunately, SimplePie turned out to have issues with HTTPS too. During my tests, fetching a feed over HTTPS produced the error message quoted in the title of this post.

It appears that this is a real bug in SimplePie.

At one point, a hostname value is computed for use by fsockopen(). The PHP manual states that “ssl://” or “tls://” may be prepended to the hostname in order to use SSL/TLS with this function. In SimplePie, however, this newly computed hostname value is then reused in the HTTP Host header, so the server responds with an HTTP error, and that response is unlikely to be suitable input for the XML parser.
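To make the mechanism concrete, here is a minimal sketch of how the transport prefix leaks into the request. The helper `build_request_head()` and the URL `https://example.com/feed.xml` are made up for illustration; this is not SimplePie's actual request-building code.

```php
<?php
// Hypothetical helper, not SimplePie code: builds the first lines of an
// HTTP/1.1 GET request for the given host and path.
function build_request_head(string $host, string $path): string
{
    return "GET $path HTTP/1.1\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n\r\n";
}

// Example URL, chosen only for illustration.
$url_parts = parse_url('https://example.com/feed.xml');

// What the unpatched code effectively does: the "ssl://" prefix meant for
// fsockopen() is stored in the host value, so it ends up in the Host header.
$prefixed_host = "ssl://{$url_parts['host']}";
echo build_request_head($prefixed_host, $url_parts['path']);

// What the server expects: the bare hostname in the Host header.
echo build_request_head($url_parts['host'], $url_parts['path']);
```

The first request carries `Host: ssl://example.com`, which is not a valid host name; the second carries `Host: example.com`.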

The following patch corrects this bug. I double-checked it for side effects.

---	2011-02-10 12:53:00.000000000 +0100
+++	2011-02-10 15:45:31.000000000 +0100
@@ -7733,14 +7733,15 @@
 				$url_parts = parse_url($url);
 				if (isset($url_parts['scheme']) && strtolower($url_parts['scheme']) === 'https')
 				{
-					$url_parts['host'] = "ssl://$url_parts[host]";
+					$fsock_host = "ssl://$url_parts[host]";
 					$url_parts['port'] = 443;
 				}
 				if (!isset($url_parts['port']))
 				{
+					$fsock_host = $url_parts['host'];
 					$url_parts['port'] = 80;
 				}
-				$fp = @fsockopen($url_parts['host'], $url_parts['port'], $errno, $errstr, $timeout);
+				$fp = @fsockopen($fsock_host, $url_parts['port'], $errno, $errstr, $timeout);
 				if (!$fp)
 				{
 					$this->error = 'fsockopen error: ' . $errstr;