Reading Content from a Webpage in Java

by Ismail Sirma


Posted on 22.6.2015 10:59:26


Here, we will connect to a web page and read its contents. The connection to the address is handled by a URLConnection object from the java.net package, and the page content is read through InputStream and BufferedInputStream objects from the java.io package.
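A request header set on a URLConnection only reaches the server if the content is read from that same connection object. URL.openStream() is documented as shorthand for openConnection().getInputStream(), so it opens a brand-new connection that does not carry headers set on another URLConnection. The sketch below is a minimal check of this against a throwaway local server built on the JDK's com.sun.net.httpserver.HttpServer. The class and method names (HeaderScopeDemo, startEchoServer, viaConnection, viaOpenStream) are hypothetical, made up for this illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class HeaderScopeDemo {

    static final String AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)";

    // Reads a whole stream as UTF-8 text and closes it.
    static String readAll(InputStream in) throws IOException {
        try (InputStream s = in) {
            return new String(s.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Hypothetical local test server: replies with the User-Agent header it received.
    static HttpServer startEchoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String agent = exchange.getRequestHeaders().getFirst("User-Agent");
            byte[] body = String.valueOf(agent).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    // Header set on uc, stream taken from uc: the server sees the custom agent.
    static String viaConnection(URL url, String agent) throws IOException {
        URLConnection uc = url.openConnection();
        uc.addRequestProperty("User-Agent", agent);
        return readAll(uc.getInputStream());
    }

    // Header set on uc, but the stream comes from url.openStream(), which opens
    // a new connection without the custom header. uc itself never sends a request.
    static String viaOpenStream(URL url, String agent) throws IOException {
        URLConnection uc = url.openConnection();
        uc.addRequestProperty("User-Agent", agent);
        return readAll(url.openStream());
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startEchoServer();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            System.out.println("uc.getInputStream(): " + viaConnection(url, AGENT));
            System.out.println("url.openStream():    " + viaOpenStream(url, AGENT));
        } finally {
            server.stop(0);
        }
    }
}
```

The first printed line should echo the custom User-Agent. The second should show whatever default agent the JDK sends, because the header set on the unused connection never leaves the program.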

Here is the complete program:
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

/**
 * Created by ismailsirma on 22.6.2015.
 */
public class ReadWeb {

    public static void main(String[] args) {

        try {
            // Create a URL object (Uniform Resource Locator).
            // https is used because HttpURLConnection does not follow redirects
            // from http to https, and most sites now redirect to https.
            URL url = new URL("https://www.theverge.com");

            // Create a connection object with the openConnection method of java.net.URL.
            URLConnection uc = url.openConnection();
            // Set the User-Agent request header; some web servers reject or alter
            // responses for requests that carry Java's default User-Agent.
            uc.addRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
            // Open a communications link to the resource referenced by this URL.
            uc.connect();

            // Read from the SAME connection. url.openStream() would open a second,
            // new connection that does not carry the User-Agent header set above.
            // try-with-resources closes the streams even if an exception is thrown.
            try (InputStream stream = uc.getInputStream();
                 // Buffer the stream so single-byte reads do not each hit the network.
                 BufferedInputStream buffer = new BufferedInputStream(stream)) {

                // StringBuilder lets you append content without creating
                // a new String object on every append.
                StringBuilder sb = new StringBuilder();

                // Loop through the response body.
                while (true) {
                    // read() returns a single byte (0-255), not a character,
                    // or -1 when the end of the stream is reached.
                    int data = buffer.read();
                    if (data == -1) {
                        break; // end of the stream reached
                    }
                    // Casting a byte to char only decodes single-byte text
                    // (ASCII / ISO-8859-1) correctly; multi-byte UTF-8
                    // characters will come out garbled.
                    sb.append((char) data);
                }

                System.out.println(sb);
            }
        } catch (MalformedURLException e) {
            // Thrown when no legal protocol could be found in the specification
            // string or the string could not be parsed.
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
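Because read() on a BufferedInputStream returns bytes, casting each one to char only decodes single-byte encodings correctly. Below is a minimal alternative sketch. It assumes the page is UTF-8 encoded, although a real page may declare a different charset in its Content-Type header. It wraps the connection's stream in an InputStreamReader and reads characters in chunks. The class name ReadWebText and the method readPage are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class ReadWebText {

    // Reads the resource at `url` as text, ASSUMING it is UTF-8 encoded.
    static String readPage(URL url) throws IOException {
        URLConnection uc = url.openConnection();
        uc.addRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
        StringBuilder sb = new StringBuilder();
        // InputStreamReader decodes bytes into characters; BufferedReader
        // buffers them, and read(char[]) copies up to 8192 characters per call.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(uc.getInputStream(), StandardCharsets.UTF_8))) {
            char[] chunk = new char[8192];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n); // append only the characters actually read
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create("https://www.theverge.com").toURL();
        System.out.println(readPage(url));
    }
}
```

Reading into a char[] rather than using readLine() keeps the line terminators exactly as the server sent them. Since readPage only relies on URLConnection, it also accepts file: URLs, which is handy for trying it out offline.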

 
