Benjamin Juang (ibneko) wrote,

Xanga to LiveJournal...

Since I use LiveJournal more, that's my first priority. Then comes LiveJournal to Xanga, which should be a lot easier.
---
Notes to self:
Parsing the Xanga "friends" page [ http://www.xanga.com/Private/subs.aspx ].

  • The main date headers are in divs with class "blogheader". That's where I'll split the page, assuming I check daily for updates; this will separate entries by day, assuming I get it right.
  • Searching for a href=\"/home.aspx?user= would find each user's entry. No, that's not quite true. I should find class="blogbody" first, which marks the table that holds the username (why didn't they put it in the main entry table? Weird people...), followed by a href=\"/home.aspx?user=, from which I'll extract the username. Then I have to find the next class="blogbody", which marks the table holding the entry itself. (I really ought to be able to count the characters between the first and second class="blogbody" to verify where I am. I assume these ugly tables are script-generated, so apart from the username itself they shouldn't vary in length.)
  • Everything between <td width="5%">&nbsp;</td><td valign="top"> and </td></tr><tr><td width="5%">&nbsp;</td> should mark out the entry.
  • Everything from span class="smalltext" to the first </a> marks out the entry's read/post comments link.
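The notes above can be sketched as a plain string-search parser. This is a minimal Python sketch, assuming every subs.aspx page uses exactly the marker strings seen in the sample page below; the names `parse_subs_page` and `between` are hypothetical, and the entry HTML is passed through untouched rather than cleaned up.

```python
# Marker strings copied from the notes; they match the sample page,
# but are assumptions about the layout of every subs.aspx page.
DATE_MARK = '<div class="blogheader">'
BODY_MARK = 'class="blogbody"'
USER_MARK = 'a href="/home.aspx?user='
ENTRY_START = '<td width="5%">&nbsp;</td><td valign="top">'
ENTRY_END = '</td></tr><tr><td width="5%">&nbsp;</td>'
LINK_MARK = '<span class="smalltext"><a href="'


def between(text, start, end, pos):
    """Find the text between `start` and `end`, searching from `pos`.

    Returns (substring, index just past `end`), or (None, None) if not found.
    """
    i = text.find(start, pos)
    if i < 0:
        return None, None
    i += len(start)
    j = text.find(end, i)
    if j < 0:
        return None, None
    return text[i:j], j + len(end)


def parse_subs_page(html):
    """Group entries by date: {date: [(user, entry_html, link), ...]}."""
    posts = {}
    # Everything before the first blogheader div is navigation; skip it.
    for chunk in html.split(DATE_MARK)[1:]:
        date, _, rest = chunk.partition('</div>')
        day = posts.setdefault(date, [])
        pos = 0
        while True:
            # First blogbody table: holds only the username link.
            i = rest.find(BODY_MARK, pos)
            if i < 0:
                break
            user, pos = between(rest, USER_MARK, '"', i)
            if user is None:
                break
            # Second blogbody table: holds the entry text itself.
            i = rest.find(BODY_MARK, pos)
            if i < 0:
                break
            entry, pos = between(rest, ENTRY_START, ENTRY_END, i)
            if entry is None:
                break
            # First link in the smalltext span: the entry's permalink,
            # which is also where comments are read and posted.
            link, next_pos = between(rest, LINK_MARK, '"', pos)
            day.append((user, entry, link))
            if link is None:
                break
            pos = next_pos
    return posts
```

Splitting on the blogheader div first means each day's entries come out as one batch, which fits the check-daily plan; if Xanga reorders its tables, the `break`s simply stop parsing that day instead of misattributing entries.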


...On a side note, Xanga returns its HTML in one squished line, with no carriage returns or newlines. So the code looks like:
<table border="0" cellspacing="0" cellpadding="4" width="100%"><tr><td valign="top">&nbsp;</td><td align="right"><span class="smalltext">browse subscriptions: <a href="subs.aspx?nextdate=4%2f28%2f2004+20%3a39%3a30.810&direction=n">Next »</a></span></td></tr></table><table border="1" cellspacing="0" cellpadding="1" width="100%" class="tabs"><tr><td width="3">&nbsp;</td><td width="130" align="center" id="tabselected"><a href="subs.aspx" class="tabselected">Public Posts</a></td><td width="5">&nbsp;</td><td width="125" align="center" id="tab"><a href="subsprotected.aspx" class="tab">Protected Posts</a></td><td>&nbsp;</td></tr></table><div class="blogheader">Thursday, April 29, 2004</div><table border="0" cellspacing="0" cellpadding="1" width="100%" class="blogbody"><tr><td width="5%">&nbsp;</td><td valign="top"><a href="/home.aspx?user=whoa__now"><b>whoa__now</b></a></tr></table><table border="0" cellspacing="0" cellpadding="1" width="100%" class="blogbody"><tr><td width="5%">&nbsp;</td><td valign="top">someone tell me to ask him out.</td></tr><tr><td width="5%">&nbsp;</td><td><span class="smalltext"><a href="http://www.xanga.com/item.aspx?user=whoa__now&tab=weblogs&uid=84779473">4:20 PM</a> - <a href="http://www.xanga.com/item.aspx?user=whoa__now&tab=weblogs&uid=84779473">add eprops</a> - <a href="http://www.xanga.com/item.aspx?user=whoa__now&tab=weblogs&uid=84779473">add comments</a> - <a href="/send.aspx?uid=84779473&tab=weblogs">email it</a></span></td></tr></table...
Damn ugly. ::obsessively goes in and adds newlines so it's easier to read::
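Adding the newlines by hand gets old fast. A small Python sketch (the function name `add_newlines` is hypothetical) that breaks the line after every closing </tr>, </table>, and </div> tag, which is roughly where the page's rows, tables, and date headers end:

```python
import re


def add_newlines(html):
    """Put a newline after every closing </tr>, </table>, and </div> tag."""
    return re.sub(r'(</(?:tr|table|div)>)', r'\1\n', html)
```

This is only for reading the source; a string-search parser doesn't need it.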

---
It'll be opt-in, in a way. Xanga users will have to go to a page similar to this one [ http://darwin.servehttp.com/ibcorner-addafriend.bml ], where they can enter their Xanga username and be added as a subscriber to the account. 'Cept the difference is that ibcorner-addafriend.bml exists so people can see the friends-only posts... whereas in this case, it's to let their posts get sent over to LiveJournal... I might have to add a password system so people can control adding and deleting. A two-layer password system, similar to the college/university page: one password to edit, one personal password to make changes.
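A rough sketch of the two-layer idea, in Python. It assumes one shared page password (required for any add or delete) plus a personal password that each Xanga user picks when subscribing and needs again to unsubscribe; `XangaSubscriptions` and its methods are hypothetical names, and passwords are kept as salted PBKDF2 hashes rather than plain text.

```python
import hashlib
import hmac
import secrets


def hash_password(password, salt=None):
    """Return (salt, PBKDF2-SHA256 digest) for a password."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 100_000)
    return salt, digest


def check_password(password, salt, digest):
    """Constant-time comparison against a stored (salt, digest) pair."""
    return hmac.compare_digest(hash_password(password, salt)[1], digest)


class XangaSubscriptions:
    """Opt-in list of Xanga usernames whose posts get copied to LiveJournal."""

    def __init__(self, edit_password):
        # Layer 1: shared password that gates the page itself.
        self._edit = hash_password(edit_password)
        # Layer 2: Xanga username -> hashed personal password.
        self._users = {}

    def _require_edit(self, edit_password):
        if not check_password(edit_password, *self._edit):
            raise PermissionError('wrong edit password')

    def add(self, edit_password, xanga_user, personal_password):
        self._require_edit(edit_password)
        if xanga_user in self._users:
            raise ValueError('already subscribed')
        self._users[xanga_user] = hash_password(personal_password)

    def remove(self, edit_password, xanga_user, personal_password):
        self._require_edit(edit_password)
        record = self._users.get(xanga_user)
        if record is None or not check_password(personal_password, *record):
            raise PermissionError('wrong personal password')
        del self._users[xanga_user]

    def subscribers(self):
        return sorted(self._users)
```

Keeping the personal password per user means someone who knows the shared page password still can't unsubscribe somebody else.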

Right. Anyway, expected work time: probably a night or two. Parsing should be simple, and posting to LiveJournal I already do. The biggest difficulty is the testing...
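For completeness, a sketch of handing one parsed entry to LiveJournal. It assumes the standard LiveJournal XML-RPC interface (the `LJ.XMLRPC.postevent` method at `/interface/xmlrpc`) and a clear-text password; `postevent_params` and `post_entry` are hypothetical names, and the endpoint URL is an assumption that would need to point at whichever server is actually being posted to.

```python
import time
import xmlrpc.client

# Assumed endpoint; a self-hosted LiveJournal install has its own
# /interface/xmlrpc URL.
LJ_XMLRPC_URL = 'http://www.livejournal.com/interface/xmlrpc'


def postevent_params(username, password, subject, event, when=None):
    """Build the struct for LJ.XMLRPC.postevent from one parsed entry."""
    t = time.localtime() if when is None else when
    return {
        'username': username,
        'password': password,
        'ver': 1,
        'subject': subject,
        'event': event,
        'lineendings': 'unix',
        'year': t.tm_year,
        'mon': t.tm_mon,
        'day': t.tm_mday,
        'hour': t.tm_hour,
        'min': t.tm_min,
    }


def post_entry(params, url=LJ_XMLRPC_URL):
    """Send the entry over XML-RPC and return the server's response struct."""
    server = xmlrpc.client.ServerProxy(url)
    return server.LJ.XMLRPC.postevent(params)
```

Building the struct separately from sending it helps with the testing problem: `xmlrpc.client.dumps((params,), methodname='LJ.XMLRPC.postevent')` shows the exact request body without touching the server.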

And another strange thing... Does Xanga show all of the recent posts for that day? Because it feels like there should be a lot more entries there...