[Devel] Parsing dates of Xanga entries
Eliot Landrum
eliot at landrum.cx
Wed Jun 15 03:04:11 EST 2005
Minh -
Even though I already talked to you on Jabber about this, I wanted to
post this here for posterity.
I found this script by Michael Greene that cleans up the Xanga RSS a
bit. I added pubDate support and a more meaningful title taken from the
start of the post. So, instead of:
<title>6/13/2005 5:25:33 PM</title>
<description>Well, it's over. Michael Jackson was found not guilty
on all 10 counts.</description>
you'll get:
<pubDate>Mon, 13 Jun 2005 17:25:33 -0400</pubDate>
<title>Well, it's over. Michael...</title>
<description>Well, it's over. Michael Jackson was found not guilty on
all 10 counts.</description>
Planet seems to be pretty happy with that change!
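For anyone who only needs the date step, here is a minimal sketch of that conversion on its own. The function name xanga_pubdate is made up for illustration and is not part of the attached script. It assumes PHP 5.3 or later (for DateTime::createFromFormat) and that Xanga's timestamps are US Eastern time, as Minh describes below.

```php
<?php
// Hypothetical helper, not part of the attached script: convert a Xanga
// title timestamp such as "6/13/2005 5:25:33 PM" into an RFC-822 date.
// Assumption: the timestamp is in US Eastern time.
function xanga_pubdate($title, $zone = 'America/New_York')
{
    $tz = new DateTimeZone($zone);
    // n/j/Y = month/day/year, g:i:s A = 12-hour clock with AM/PM
    $dt = DateTime::createFromFormat('n/j/Y g:i:s A', trim($title), $tz);
    if ($dt === false) {
        return null; // the title is not a timestamp in the expected format
    }
    return $dt->format('r');
}

echo xanga_pubdate('6/13/2005 5:25:33 PM'), "\n";
// prints: Mon, 13 Jun 2005 17:25:33 -0400
```

Naming the zone explicitly means the result does not depend on how the web server's clock is configured, which the strtotime() call in the attached script does.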
So, just point Planet at the script on your web server, like this:
http://host/xanga.php?username=xxxx
Eliot
Minh Nguyen wrote:
> The Planet that I unofficially maintain for my high school, called
> Planet Xavier [1], syndicates a large number of blogs hosted by a
> service called Xanga [2]. Xanga only provides feeds in RSS 0.91 form,
> and includes post dates in the <title> element, using the following
> format, in Eastern Standard Time:
>
> mm/dd/yyyy hh:mm:ss AM
>
> I'd like for Planet to parse the provided date for each post, but I'm
> new to Python, and the sheer length of the feedparser.py script is
> preventing me from seeing how I'd be able to do this. From my cursory
> reading of the code, I'm pretty sure that the script can't parse the
> date format, but that's something I can figure out. I'm just not sure
> how to get Planet to read the <title> element instead of the <date>
> element. If anyone can help me out, I'd appreciate that very much.
>
> [1] http://mxn.f2o.org/planet/xavier/
> [2] http://www.xanga.com/
>
-------------- next part --------------
<?php
/*
Xanga Feed Converter 0.4
(C) 2004 Michael Greene, michael dot greene at gmail dot com
Published under the MIT License
Revision Date: 2004.10.19
*/
/*
Added pubDate support
Eliot Landrum <eliot at landrum.cx>
June 14, 2005
*/
// Some aggregators don't like it if we don't tell them
// *exactly* what we're sending
header('Content-Type: text/xml; charset=utf-8');
// Get the username and form the URL for the Xanga feed
$username = isset($_GET['username']) ? $_GET['username'] : '';
$feed = 'http://www.xanga.com/rss.aspx?user=' . urlencode($username);
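// Grab the Xanga feed. allow_url_fopen must be enabled in php.ini;
// it cannot be switched on with ini_set() at runtime.
$fp = fopen($feed, 'r');
if (!$fp) {
    die('Could not fetch the Xanga feed');
}
$xml = '';
while (!feof($fp)) {
    $xml .= fread($fp, 128);
}
fclose($fp);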
// Convert the malformed Xanga feed into a near-valid RSS feed
$xml = str_replace('</channel><item>','<item>',$xml);
$xml = str_replace('</rss>','</channel></rss>',$xml);
// Don't know why Xanga sends non-breaking spaces like this, but strip them out
$xml = str_replace('&amp;nbsp;',' ', $xml);
$xml = str_replace('&nbsp;',' ', $xml);
//$xml = str_replace('&lt;','<', $xml);
//$xml = str_replace('<br>','<br/>', $xml);
// Break the feed into <item> components
$items = explode('<item>', $xml);
// Add a description element to the channel for validity
$items[0] .= '<description>An RSS Feed of a Xanga.com Journal</description>';
// Make the titles a lot cooler: swap each date-only title for a pubDate
// plus a short excerpt of the post
for ($i = 1; $i < count($items); $i++) {
    // Grab the description part
    $description = strstr($items[$i], '<description>');
    // Get rid of any HTML
    $description = html_entity_decode($description);
    $description = strip_tags($description);
    // Extract the first 25 characters of that, escaped for use inside XML
    $extra = htmlspecialchars(substr($description, 0, 25));
    // Separate the title from the rest
    $itempieces = explode('</title>', $items[$i]);
    // Take out the title tag
    $itempieces[0] = str_replace('<title>', '', $itempieces[0]);
    // Take the contents of what was the title and convert it to an RFC-822 date
    // Then add it as a pubDate element
    $itempieces[0] = '<pubDate>' . date("r", strtotime($itempieces[0])) . '</pubDate>';
    // Add the pubDate plus the first 25 characters of the content as the title
    $itempieces[0] = $itempieces[0] . '<title>' . $extra . '...';
    // Put the pieces of the item back together
    $items[$i] = implode('</title>', $itempieces);
}
// Put the items back together
$xml = implode('<item>', $items);
// Output the final RSS feed
echo $xml;
?>
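The per-item rewrite in the loop above could also be pulled out into a function, so it can be tried on a sample item without fetching a live feed. What follows is only a sketch under a few assumptions: the name xanga_rewrite_item is hypothetical, each item is assumed to begin with a <title> holding the date, the server is assumed to run PHP 5.4 or later (for ENT_SUBSTITUTE), and the timezone is set to US Eastern by hand.

```php
<?php
// Assumption: Xanga timestamps are US Eastern time.
date_default_timezone_set('America/New_York');

// Hypothetical helper, not part of the original script: turn one item's
// date-only <title> into a <pubDate> plus an excerpt-based <title>.
function xanga_rewrite_item($item, $maxlen = 25)
{
    if (!preg_match('#<title>(.*?)</title>#s', $item, $m)) {
        return $item; // no title element: leave the item untouched
    }
    $stamp = strtotime(trim($m[1]));
    if ($stamp === false) {
        return $item; // the title is not a date: leave it alone
    }
    // Reduce the description to plain text
    $text = '';
    if (preg_match('#<description>(.*?)</description>#s', $item, $d)) {
        $text = trim(strip_tags(html_entity_decode($d[1], ENT_QUOTES, 'UTF-8')));
    }
    // Cut on character boundaries when the mbstring extension is available
    $excerpt = function_exists('mb_substr')
        ? mb_substr($text, 0, $maxlen, 'UTF-8')
        : substr($text, 0, $maxlen);
    // Escape the excerpt so a stray & or < cannot break the feed
    $title = htmlspecialchars(rtrim($excerpt), ENT_NOQUOTES | ENT_SUBSTITUTE, 'UTF-8');
    $new = '<pubDate>' . date('r', $stamp) . '</pubDate>'
         . '<title>' . $title . '...</title>';
    return str_replace($m[0], $new, $item);
}

$sample = '<title>6/13/2005 5:25:33 PM</title>'
        . "<description>Well, it's over. Michael Jackson was found not guilty "
        . 'on all 10 counts.</description></item>';
echo xanga_rewrite_item($sample), "\n";
```

Returning the item unchanged when the title is not a date means a feed that already carries real titles passes through untouched.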