There are many tools and libraries you can use to parse HTML pages. In Java, one of the popular ones is HTML Parser, which is what I use. It is not an application but a Java library that you put on your classpath when compiling and running the application that uses it. Go over to their site and download it. The extracted archive contains the JAR library, samples, and documentation.

I mainly use HTML Parser for extraction, but you can also use it for transformation. One of its best features is filters, which make it easy to pick out only the HTML tags you need.

Here is sample code that uses the HasAttributeFilter class to keep only the tags that have a given attribute value (here, id="spoof"). In this example I use the FilterBean class to fetch the page's content. You can also use the Parser class to do the same thing; which one you use is up to your preference.

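try {
  // keep only the tags whose id attribute equals "spoof"
  NodeFilter[] nff = {new HasAttributeFilter("id", "spoof")};
  FilterBean fb = new FilterBean();
  fb.setFilters(nff);
  fb.setURL("http://www.example.com/"); // placeholder URL: replace with the page you want to parse
  NodeList pageNodeList = fb.getNodes();
  System.out.println(pageNodeList.toHtml());
} catch (Exception e) {
  e.printStackTrace();
}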

Suppose the page you are parsing contains the following HTML:

<p id="spoof">This is a sample paragraph</p>
<p id="officeid">Office id is 000123</p>

Once you execute that code and print pageNodeList.toHtml() with System.out, the output would be:

<p id="spoof">This is a sample paragraph</p>
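Since the Parser class was mentioned as an alternative to FilterBean, here is a minimal sketch that uses it on the two sample paragraphs above. It assumes the HTML Parser JAR is on the classpath and feeds the HTML in as a string through Parser.createParser, so it runs without a network connection. The class name SpoofFilterDemo, the SAMPLE constant, and the extractSpoof helper are hypothetical names made up for this example.

```java
import org.htmlparser.Parser;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class SpoofFilterDemo {
    // The two sample paragraphs from this post, supplied as a string instead of a URL.
    static final String SAMPLE =
        "<p id=\"spoof\">This is a sample paragraph</p>\n"
        + "<p id=\"officeid\">Office id is 000123</p>";

    // Hypothetical helper: returns the HTML of every tag whose id attribute is "spoof".
    static String extractSpoof(String html) {
        try {
            // createParser(html, charset) parses an in-memory string; "UTF-8" is the charset hint.
            Parser parser = Parser.createParser(html, "UTF-8");
            // extractAllNodesThatMatch walks the whole node tree and keeps the matching nodes.
            NodeList matches = parser.extractAllNodesThatMatch(new HasAttributeFilter("id", "spoof"));
            return matches.toHtml();
        } catch (ParserException e) {
            throw new IllegalArgumentException("could not parse the input HTML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractSpoof(SAMPLE));
    }
}
```

Because the filter is the same HasAttributeFilter, only the id="spoof" paragraph should come back; the officeid paragraph is dropped.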

The NodeList class is patterned after the Vector class, so you can loop over it to get each tag separately. The API documentation lists all the HTML Parser classes you can use for your parsing needs. Take another filter as an example, the TagNameFilter. If you replace the HasAttributeFilter in the code with this:

new TagNameFilter("p")

the printed output is a single string containing both paragraphs:
<p id="spoof">This is a sample paragraph</p>
<p id="officeid">Office id is 000123</p>

If you need to access each <p> tag separately, loop over the pageNodeList object like this:

for (int i = 0; i < pageNodeList.size(); i++) {
  // elementAt(i) returns the i-th matched node; toHtml() renders it back to HTML
  System.out.println(pageNodeList.elementAt(i).toHtml());
}
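The loop above prints each matched tag's HTML. When extraction is the goal, you often want the text inside each tag instead. The sketch below rests on a few assumptions: the HTML Parser JAR is on the classpath, the input is the same two-paragraph sample passed in as a string, and ParagraphLoopDemo and paragraphTexts are hypothetical names. It filters with TagNameFilter("p"), loops over the NodeList by index, and collects each paragraph's text with toPlainTextString().

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class ParagraphLoopDemo {
    // Same two sample paragraphs as in the post, supplied as a string.
    static final String SAMPLE =
        "<p id=\"spoof\">This is a sample paragraph</p>\n"
        + "<p id=\"officeid\">Office id is 000123</p>";

    // Hypothetical helper: returns the plain text inside every <p> tag, in document order.
    static List<String> paragraphTexts(String html) {
        List<String> texts = new ArrayList<>();
        try {
            Parser parser = Parser.createParser(html, "UTF-8");
            NodeList paragraphs = parser.extractAllNodesThatMatch(new TagNameFilter("p"));
            // NodeList is index-based like Vector, so a plain counting loop works.
            for (int i = 0; i < paragraphs.size(); i++) {
                Node p = paragraphs.elementAt(i);
                // toPlainTextString() drops the markup and keeps only the text content.
                texts.add(p.toPlainTextString());
            }
        } catch (ParserException e) {
            throw new IllegalArgumentException("could not parse the input HTML", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        for (String text : paragraphTexts(SAMPLE)) {
            System.out.println(text);
        }
    }
}
```

Collecting into a List keeps the parsing separate from the printing, so the same helper can feed whatever you do with the extracted text next.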

There you have it. HTML parsing is easy with this helper library, and it saves you the time and trouble of writing your own parser. Feel free to leave a comment if you have questions or problems.

Having an Apple gadget is pretty … wooooh. The elegance, the design, etc. ;) Just don't talk about its specs and price. Why? Apple has a nasty habit of selling new products with underwhelming hardware specs at a high price. Then, after a few months... whammo! You get a price drop.

Take the latest gadget as an example. The iPod touch was released late last year, in 2007 (October, I think). I bought one last November, a 16GB iPod touch. Then just this month (February 2008), they announced a 32GB version priced only slightly above the 16GB model. I know, I know... why did I buy one knowing that Apple does this all the time? For one, it's a cool gadget, and 16GB of space is good enough for me. I don't keep movies anyway, so it's alright. The problem comes if you want to store lots of movies on it without degrading the quality of your converted files: the file sizes get awfully big if you are a quality freak.

My worst experience was with the very first model of the iPod. Back then, it was only a 4GB model. I was so pissed when I got one, because after a few months they slashed the price of the bigger-capacity models to the same price as the 4GB model I had just bought. My advice to you if you want to buy an Apple gadget, accessory, or whatever: make sure you won't regret what you just bought and that your gadget will keep you content for many years. Check the specs and see if they will suit your needs for years to come. If not, make sure it can be upgraded (as with a laptop or desktop). I'm not regretting the 16GB iPod touch, that's for sure. I already learned my lesson…

Let's hope the MacBook Air won't disappoint a lot of people too. Its specs didn't impress me. Yes, it's attractive because it's sooooo thin, but it doesn't have a DVD writer. You'd have to buy one as an accessory, which means added cost. And thiiiin, as in so thin that it's vulnerable to damage. Not only that, its introductory price is very, very, very expensive; it's not worth it. If you have the cash, though, go get it.

What's in store for me next? Probably nothing... I am very content with my MacBook and iTouch… good for 10 years or more... so I'd probably get a new iPod after a decade, haha.

Whoa! My very first blog post. I am very lazy when it comes to these, but hey, with Google AdSense's popularity I think I'll give blogging a go and see what happens. What to blog? What to blog? I guess I can ask why consumer prices keep skyrocketing even though the U.S. dollar is declining. I thought that since the dollar has been weakening, prices would at least go down too. No way! They keep skyrocketing. Even pinaypay saging (banana fritters) just went up almost 100% in price?!? Are they nuts? You'd think they imported those bananas from abroad, hence the increase.

It's the same thing with Julie's Bakeshoppe. They probably raised the prices of all their bread products. I only eat their cheese bread and choco german. The cheese bread had a probably acceptable 50-cent increase, but the choco german?!? No way... it's the same as the pinaypay banana, almost a 100% increase. I did ask my uncle last year why it happens like that. He told me that's just the way it works: businesses are happy because consumer prices never get lowered, and they see it as extra profit.

Meanwhile, the dollar keeps declining (which is good news only for the government). I don't see it as good news for us. OFWs won't like it because their dollar earnings convert to fewer pesos, and neither will people who have invested in dollars. And since our salaries barely increase every year, they can't keep up with rising prices on almost everything. Life's getting harder each day... and it's probably going to get worse…
