retrieving info from website java

Discussion in 'Software' started by red death68, Jan 5, 2013.

  1. red death68

    red death68 Command Sergeant Major

    OK, so here is what I am trying to do. I found some code on Oracle's site for retrieving the HTML of a web address. The code works nicely, but I only need the info from two lines, or maybe even just one line if I preset the text. Here is the code:

    Code:
    import java.net.*;
    import java.io.*;
    
    public class URLReader {
        public static void main(String[] args) throws Exception {
    
            // Open a stream to the page and read its HTML source one line at a time
            URL oracle = new URL("http://services.runescape.com/m=itemdb_rs/viewitem.ws?obj=2363");
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(oracle.openStream()));
    
            String inputLine;
            // Print every line of the page's HTML
            while ((inputLine = in.readLine()) != null)
                System.out.println(inputLine);
            in.close();
        }
    }

    One thing I am trying to do is retrieve the information from lines 370 and 371 of the HTML at the URL above, though I can make do with just line 371. I need it stored in a string so I can remove the unneeded parts and make use of the rest.
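    Once a line is in a String, one common way to remove the unneeded parts is to strip anything that looks like a tag with a regular expression and trim the whitespace. This is a minimal sketch: the sample line is made up for illustration and is not taken from the real page, and a regex like this is only adequate for simple one-line fragments, not for parsing HTML in general.

```java
public class StripTags {
    // Removes anything that looks like an HTML tag, then trims surrounding whitespace.
    // Only suitable for simple, single-line fragments; this is not an HTML parser.
    static String stripTags(String htmlLine) {
        return htmlLine.replaceAll("<[^>]*>", "").trim();
    }

    public static void main(String[] args) {
        // Hypothetical input line; the real page's markup may differ.
        String line = "    <td><span>1,234</span></td>";
        System.out.println(stripTags(line)); // prints 1,234
    }
}
```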

    I tried using split to break the page at /n for each new line, but that didn't work. The idea came to me when I was quite tired, though, so it may not have been a good one.


    Anyway, enough rambling: can anyone help me with this?
     
  2. PC-XT

    PC-XT Master Sergeant

    Some thoughts:
    Are you talking about the lines a browser displays after rendering the HTML? Those are hard to count outside of a browser. I assume you mean the lines of the HTML source itself.

    /n is a typo. I know you mean \n... maybe you made the same typo in your code?
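    For what it's worth, the split idea does work once the separator is the newline escape rather than /n: read the whole page into one string, then split it. A minimal sketch, where `readAll` and `lineAt` are hypothetical helper names and a StringReader stands in for the InputStreamReader over the URL. Note that this keeps the whole page in memory, unlike the line-counting loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class SplitLines {
    // Reads everything from the reader into one String, with '\n' after each line.
    static String readAll(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: returns the 1-based line n, or null if the text is shorter.
    static String lineAt(String text, int n) {
        // Regex \r?\n: a newline, optionally preceded by a carriage return
        String[] lines = text.split("\\r?\\n");
        return (n >= 1 && n <= lines.length) ? lines[n - 1] : null;
    }

    public static void main(String[] args) throws IOException {
        String page = readAll(new StringReader("first\nsecond\nthird"));
        System.out.println(lineAt(page, 2)); // prints second
    }
}
```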

    If you use a variable to count the lines, you can check it on each pass and assign lines 370 and 371 to variables, so you keep only those two:

    Code:
    String inputLine, line370 = null, line371 = null; // stay null unless those lines exist
    int count = 0;
    while ((inputLine = in.readLine()) != null) {
        if (++count == 370) line370 = inputLine; // the first line read is count 1
        else if (count == 371) line371 = inputLine;
    }
    in.close();

    Depending on whether you start counting at 1 or 0, you might need to change the starting count to -1, or adjust the line numbers.

    You might also be able to stop the while loop after line 371, if there's enough of the page left after it to make that worth bothering with, and it doesn't break anything...
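    Stopping early comes down to a `break` once the last wanted line has been read. A minimal sketch of the counting loop with that added; the `readLines` helper and its parameters are hypothetical, and a StringReader stands in for the network stream so the example runs offline.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineRange {
    // Returns lines first..last (1-based, inclusive) and stops reading after line `last`.
    static List<String> readLines(Reader source, int first, int last) throws IOException {
        List<String> wanted = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String inputLine;
            int count = 0;
            while ((inputLine = in.readLine()) != null) {
                count++; // the first line read is line 1
                if (count >= first) {
                    wanted.add(inputLine);
                }
                if (count == last) {
                    break; // nothing after `last` is needed, so stop reading
                }
            }
        }
        return wanted;
    }

    public static void main(String[] args) throws IOException {
        Reader fake = new StringReader("a\nb\nc\nd\ne\n");
        System.out.println(readLines(fake, 2, 3)); // prints [b, c]
    }
}
```

    For the page in question this would be called as `readLines(..., 370, 371)`.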
     
  3. red death68

    red death68 Command Sergeant Major

    Thank you for the help; I will try that as soon as I get the time. I am still rather new to Java and am trying to learn by branching out, reading tutorials, and reading code and learning from it, so any help is much appreciated.

    And yes, I am talking about the HTML code itself, not what the browser displays to the user.
     
  4. red death68

    red death68 Command Sergeant Major

    Thank you for the suggestion. I had tried something like that to no avail, so I had the idea to rewrite the class as more of a callable component for my project, and I managed to get this:

    Code:
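    import java.net.*;
    import java.io.*;
    
    public class URLReader
    {
        private final String url;
    
        public URLReader(String url)
        {
            this.url = url;
        }
    
        // Returns line x of the page's HTML (the first line is 1),
        // or null if the page has fewer than x lines.
        public String getHTML(int x) throws IOException
        {
            int count = 0;
            URL oracle = new URL(url);
            // try-with-resources closes the stream on every path, including the early return
            try (BufferedReader in = new BufferedReader(new InputStreamReader(oracle.openStream())))
            {
                String inputLine;
                while ((inputLine = in.readLine()) != null)
                {
                    count++;
                    if (count == x)
                    {
                        return inputLine; // stop reading as soon as line x is found
                    }
                }
            }
            return null;
        }
    }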
    Now my only question is: is there a way to make it look for just one specific line, rather than having to scan the entire HTML in a loop as I have it?
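    As far as is known, there is no way to jump straight to a given line of an HTTP response: the server sends the page as a stream of characters, and line breaks can only be found by reading through them. What can be done is to stop as soon as the line has been found, which the `return` inside the loop above already does. On Java 8 or later, the same idea can be written with `BufferedReader.lines()`, which reads lazily, so `skip` followed by `findFirst` stops reading shortly after line n (the reader's buffer may read a little further ahead). A sketch, with a StringReader standing in for the URL stream; the `lineAt` name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Optional;

public class StreamLine {
    // Hypothetical helper: returns the 1-based line n, or an empty Optional if there are fewer lines.
    static Optional<String> lineAt(Reader source, int n) throws IOException {
        if (n < 1) {
            return Optional.empty();
        }
        try (BufferedReader in = new BufferedReader(source)) {
            // lines() is lazy and findFirst() short-circuits, so the rest of the page is not read
            return in.lines().skip(n - 1).findFirst();
        }
    }

    public static void main(String[] args) throws IOException {
        Reader fake = new StringReader("line 1\nline 2\nline 3\n");
        System.out.println(lineAt(fake, 2).orElse("(no such line)")); // prints line 2
    }
}
```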

    Currently it takes roughly 0.880 seconds, according to the JUnit test I ran to verify that it works.

    This will be quite intensive, as I plan to make multiple calls based on a drop-down box selection made by the user. If you can help me find any way to do this more quickly and less system-intensively, it would be much appreciated.
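    If the same items are likely to be selected more than once, one way to cut down on requests is to remember prices already fetched and only go to the network for items not looked up yet. A minimal sketch assuming Java 8; `PriceCache` and its fetcher function are hypothetical names, and in a real program the fetcher would wrap the URLReader call for the item's page. Cached values never expire here, so they would need clearing if prices can change while the program runs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class PriceCache {
    private final Map<Integer, String> cache = new HashMap<>();
    private final Function<Integer, String> fetcher; // hypothetical: item id -> price text

    PriceCache(Function<Integer, String> fetcher) {
        this.fetcher = fetcher;
    }

    // Returns the cached value for this item id, calling the fetcher only on the first request.
    String price(int itemId) {
        return cache.computeIfAbsent(itemId, fetcher);
    }

    // Forgets all cached prices, e.g. when they may have changed.
    void clear() {
        cache.clear();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        PriceCache prices = new PriceCache(id -> {
            calls[0]++; // count how often the "network" is hit
            return "price of item " + id;
        });
        prices.price(2363);
        prices.price(2363);
        System.out.println(calls[0]); // prints 1
    }
}
```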

    In the meantime, I will work on the program's other components and add the URL calling as late as possible.

    If you are wondering why it must fetch only that specific line, and that many times: it is to retrieve the prices of items in a game from the game's equivalent of a stock market. I mention this just so you have an idea of what I am doing, as it may make the search easier. A sample of the page it's calling is the URL in the first code I posted.
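    A value sits at a fixed line number only as long as the page layout doesn't change, so a somewhat sturdier option is to search for a piece of text that always appears on the line you want. A minimal sketch; the `findLine` helper and the `price-marker` text are made-up placeholders, since the actual markup around the price on that page isn't shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class MarkerSearch {
    // Returns the first line containing `marker`, or null if none does.
    // Reading stops as soon as a match is found.
    static String findLine(Reader source, String marker) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                if (inputLine.contains(marker)) {
                    return inputLine;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical page fragment and marker; check the real page source for the actual text.
        Reader fake = new StringReader("<tr>\n<td><span class=\"price-marker\">1,234</span></td>\n</tr>\n");
        System.out.println(findLine(fake, "price-marker")); // prints the <td> line
    }
}
```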
     
