LEGO® Java (III): Apache Camel Routing and Testing
In the first part of this series we saw the basics of Apache Camel routing and contexts, and in the second part we learned about error handling and web services. Today, in the third part, we will add more functionality to our Camel extractor application and we will do it in a modular way. We will also have a look at how to effectively test our functionality.
Improving our HTML cleaner
In a perfect world, our application would never need bug fixes or improvements. But the world is not perfect, so we often discover limitations in our applications and have to correct them. If you try extracting a URL from DZone (“http://www.dzone.com”) with the application, you will see an exception in the console indicating that “TagSoup” cannot fix a problem related to the DOM structure (TagSoup is used by the “TidyMarkup” marshaller to clean our HTML). We could configure “TagSoup” differently, but instead, let’s assume that we have to use another tool to do the work. In this case, we are going to use “HtmlCleaner”, and we’ll integrate it into our application using a Java “bean”.
Add a new Java class to your project containing this code:
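import javax.xml.parsers.ParserConfigurationException;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.DomSerializer;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.w3c.dom.Document;

// Bean that turns raw (possibly malformed) HTML into a W3C DOM document.
public class HtmlCleanerBean {
    public Document cleanHtml(String html) {
        CleanerProperties properties = new CleanerProperties();
        properties.setNamespacesAware(false);
        HtmlCleaner cleaner = new HtmlCleaner(properties);
        TagNode node = cleaner.clean(html);
        try {
            return new DomSerializer(properties).createDOM(node);
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        }
    }
}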
This class uses “HtmlCleaner” and therefore requires the following Maven dependency:
<dependency>
    <groupId>net.sourceforge.htmlcleaner</groupId>
    <artifactId>htmlcleaner</artifactId>
    <version>2.2</version>
</dependency>
We could add more steps to our extractor route, but it is starting to become a little verbose and unclear. Instead, let’s add a new route for our improved HTML functionality:
public class HtmlImproverRoutes extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("direct:html_improver")
            .onException(CamelException.class)
                .continued(true)
                .log("Tidying failed, trying with Cleaning.")
                .bean(HtmlCleanerBean.class)
            .end()
            .unmarshal().tidyMarkup();
    }
}
Then replace the tidy markup step in the page extractor route with a detour through the newly created route:
from("direct:page_extractor")
    .streamCaching()
    .onException(HttpOperationFailedException.class)
        .onWhen(bean(HttpErrorHelperBean.class).isEqualTo(true))
        .log("Fetching URL failed: '${exception.message}', trying with relocation: '${exception.redirectLocation}'.")
        .handled(true)
        .setBody(simple("${exception.redirectLocation}"))
        .to("direct:page_extractor")
    .end()
    .setHeader(Exchange.HTTP_URI, body())
    .setBody(constant(null))
    .log("Extracting content from: '${header." + Exchange.HTTP_URI + "}'")
    .pipeline("http:extractor", "direct:html_improver")
    .log("Html from: '${body}'")
    .split(xpath("//body//p/text()"), new SplitterAggregationStrategy("(?s).*[A-Za-z0-9].*"))
        .log("Text chunk: '${body}'.")
    .end();
To get this route created in the Camel context, adjust the main method in the “main” class like this:
public static void main(String[] args) throws Exception {
    DefaultCamelContext camelContext = new DefaultCamelContext();
    camelContext.addRoutes(new PageExtractorRoutes());
    camelContext.addRoutes(new HtmlImproverRoutes());
    camelContext.start();
    ProducerTemplate template = camelContext.createProducerTemplate();
    String result = template.requestBody("direct:page_extractor", "http://www.dzone.com", String.class);
    System.out.printf("Extracted: '%s'.\n", result);
    Thread.sleep(100000);
    camelContext.stop();
}
As you can see, in the new “HtmlImproverRoutes” class we have applied an “onException” clause to detect when “TagSoup” HTML parsing fails. The difference this time is that we no longer use “handled” to end the processing of the message (as we did in the previous case, where we redelivered the message to the same endpoint). This time, we use “continued” to instruct Camel to send the message through the error handling steps and then continue processing from where the error occurred.
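To make the difference between “handled” and “continued” more tangible, here is a minimal plain-Java analogy. It does not use the Camel API: the class “ErrorFlowSketch”, the “Mode” enum and the three steps are hypothetical names invented for this illustration, and the model ignores redelivery and everything else the real Camel error handler does.

```java
import java.util.function.Function;

// Plain-Java analogy only; none of these names exist in Apache Camel.
final class ErrorFlowSketch {

    enum Mode { HANDLED, CONTINUED }

    // Runs failingStep and then nextStep. If failingStep throws, recovery is
    // applied to the original input; what happens next depends on the mode.
    static String run(String input,
                      Function<String, String> failingStep,
                      Function<String, String> recovery,
                      Function<String, String> nextStep,
                      Mode mode) {
        String body;
        try {
            body = failingStep.apply(input);
        } catch (RuntimeException e) {
            body = recovery.apply(input);
            if (mode == Mode.HANDLED) {
                // "handled": the recovery result is final, the remaining steps are skipped.
                return body;
            }
            // "continued": fall through and keep going after the failed step.
        }
        return nextStep.apply(body);
    }

    public static void main(String[] args) {
        Function<String, String> tidy = html -> { throw new IllegalStateException("cannot tidy"); };
        Function<String, String> clean = html -> "cleaned(" + html + ")";
        Function<String, String> split = dom -> dom + " -> split";

        System.out.println(run("<h1>Bad", tidy, clean, split, Mode.HANDLED));
        System.out.println(run("<h1>Bad", tidy, clean, split, Mode.CONTINUED));
    }
}
```

With “HANDLED” the call returns right after the recovery step; with “CONTINUED” the recovered result also passes through the next step, which is the behaviour we rely on in the “html improver” route.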
We have also made some changes in the page extractor route. First, we have activated stream caching (“streamCaching”) to avoid exceptions when the input stream returned by the “http” endpoint is consumed several times. Also, after setting the URL to extract in the “CamelHttpUri” header, we now reset the body of the message sent to the “http” endpoint. Otherwise, the body causes problems with certain web servers, because the endpoint uses the HTTP POST method (used when the body is not empty) instead of the more suitable GET. The last change is to route the result of the “http” endpoint to the newly created route. To make the sequential character of this process more explicit, we have applied the “pipeline” enterprise integration pattern instead of the default “to” construct of the Java DSL.
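The reason for resetting the body can be sketched as a small decision function. This is only a simplified model of the selection order as we understand it from the HTTP component documentation (explicit method header first, then a query string, then the body), not the component's actual code; “HttpMethodSketch” and “chooseMethod” are hypothetical names.

```java
// Simplified model of how the HTTP method is picked; not Camel source code.
final class HttpMethodSketch {

    // methodHeader: explicit method set on the message, or null if absent
    // queryString:  query string set on the message or endpoint, or null
    // body:         message body, or null when it has been reset
    static String chooseMethod(String methodHeader, String queryString, Object body) {
        if (methodHeader != null) {
            return methodHeader;   // an explicit choice always wins
        }
        if (queryString != null) {
            return "GET";          // a query string implies GET
        }
        return body != null ? "POST" : "GET";
    }

    public static void main(String[] args) {
        // Body still holds the URL string: the request would go out as POST.
        System.out.println(chooseMethod(null, null, "http://www.dzone.com"));
        // Body reset with setBody(constant(null)): the request goes out as GET.
        System.out.println(chooseMethod(null, null, null));
    }
}
```

In our route the body initially carries the URL itself, so without the reset the endpoint would try to POST that string to the server.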
If you run the application now, you should see how it detects the parsing error and continues the processing of the content using “HtmlCleaner” instead of “TagSoup”.
This functionality (the “html improver”) has been moved from the initial route builder to a dedicated one, which contributes to better modularization and reusability of the code in Apache Camel. Because the endpoints are named with strings, and the strings “direct:page_extractor” and “direct:html_improver” now appear in different files, it is usually good practice to extract these values as constants in the route builders. This makes it harder to break the code when an endpoint name or type changes, and the constants play the role of an interface.
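A minimal sketch of that idea follows. In the project the constants would live in the route builders themselves (for example “HtmlImproverRoutes.HTML_IMPROVER_EP”); to keep the sketch free of Camel dependencies, it uses a stand-alone holder class instead. The class name “EndpointNames” and the constant “PAGE_EXTRACTOR_EP” are assumptions made for this illustration.

```java
// Hypothetical holder class; in the real project each constant would sit
// in the route builder that owns the endpoint.
final class EndpointNames {

    static final String PAGE_EXTRACTOR_EP = "direct:page_extractor";
    static final String HTML_IMPROVER_EP = "direct:html_improver";

    private EndpointNames() {
        // constants only, no instances
    }

    public static void main(String[] args) {
        // Routes and tests refer to the constant, never to the literal string.
        System.out.println("Routing to " + HTML_IMPROVER_EP);
    }
}
```

A route would then read from(HTML_IMPROVER_EP) and a caller would send to the same constant, so renaming the endpoint or switching it to another component type is a one-line change.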
Testing the routes
Apache Camel offers mock testing support for testing routes effectively. While most of the examples out there add a mock endpoint explicitly to the route, let’s see how to do it programmatically, so that we avoid altering the route that we want to test.
First, add these two dependencies to your Maven pom file:
...
<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-test</artifactId>
    <version>${camel.version}</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.8.1</version>
    <scope>test</scope>
</dependency>
...
Second, add this class with some run-time support for mock testing:
public class CamelMockTestSupport extends CamelTestSupport {
    private MockEndpoint fMockEndpoint;

    public MockEndpoint getMockEndpoint() {
        if (null == fMockEndpoint) {
            fMockEndpoint = new MockEndpoint("mock:_mock_");
        }
        return fMockEndpoint;
    }

    protected void sendBodyToMock(String endpoint, final Object body) {
        Exchange exchange = template.request(endpoint, new Processor() {
            public void process(Exchange exchange) throws Exception {
                exchange.getIn().setBody(body);
            }
        });
        exchange.getIn().copyFrom(exchange.getOut());
        template.send(getMockEndpoint(), exchange);
    }
}
Third, create a test based on this approach:
...
import static com.canoo.camel.routes.HtmlImproverRoutes.HTML_IMPROVER_EP;

public class HtmlImproverRoutesTest extends CamelMockTestSupport {
    @Override
    protected RouteBuilder createRouteBuilder() throws Exception {
        return new HtmlImproverRoutes();
    }

    @Test
    public void testHtmlImproverRoute_withBadHtml() throws InterruptedException {
        MockEndpoint mock = getMockEndpoint();
        mock.expectedMessageCount(1);
        mock.expectedBodiesReceived("<html xmlns:html=\"http://www.w3.org/1999/xhtml\">\n" +
            "<body>\n" +
            "<h1>Bad Html</h1>\n" +
            "</body>\n" +
            "</html>\n");
        sendBodyToMock(HTML_IMPROVER_EP, "<html><h1>Bad Html</html>");
        mock.assertIsSatisfied();
    }
}
If you run this test, you should see a nice green bar (or the like) indicating that the test passed. Please note the static import for “HTML_IMPROVER_EP”, which contains the name of the endpoint, extracted as a constant (as explained before).
As of Apache Camel 2.7, some advice functionality has been added to mock testing, making it possible to introduce the mock endpoints in an aspect-oriented way.
To summarize this third part, we have shown how to organize the project routes using several route builders and how you can use mocks to test and, therefore, improve the quality and stability of your Apache Camel project.
In the next Camel “ride”, we will introduce the Spring framework and unveil what to do with the extracted data. Hope to see you there!
The source code of this third part is available for download. To run the application, just expand the file, change to the folder where the pom file is and execute: ‘mvn compile exec:java -Dexec.mainClass="com.canoo.camel.Part3"’










