Thursday, September 10, 2009

Scala scala.io.Source fromURL blocks / hangs forever without timeout value

I am working on a Scala project that data-mines a forum. Because I fetch the web content with multiple threads, I reached a point where some of the threads block forever at various calls, such as Source.getLine, hasNext, or even fromURL. Here is the stack from one thread dump:


"pool-1-thread-194" prio=6 tid=0x0b57d000 nid=0x1710 runnable [0x0f4af000..0x0f4afa94]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
- locked <0x031a5488> (a java.io.BufferedInputStream)
at sun.net.www.MeteredStream.read(MeteredStream.java:116)
- locked <0x031efdc8> (a sun.net.www.http.KeepAliveStream)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2446)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
- locked <0x031efe48> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.BufferedReader.fill(BufferedReader.java:136)
at java.io.BufferedReader.read(BufferedReader.java:157)
- locked <0x031efe48> (a java.io.InputStreamReader)
at scala.io.BufferedSource$$anonfun$1$$anonfun$apply$1.apply(BufferedSource.scala:29)
at scala.io.BufferedSource$$anonfun$1$$anonfun$apply$1.apply(BufferedSource.scala:29)
at scala.io.Codec.wrap(Codec.scala:65)
at scala.io.BufferedSource$$anonfun$1.apply(BufferedSource.scala:29)
at scala.io.BufferedSource$$anonfun$1.apply(BufferedSource.scala:29)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:146)
at scala.collection.Iterator$$anon$1.next(Iterator.scala:712)
at scala.collection.Iterator$$anon$1.head(Iterator.scala:699)
at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:374)
at scala.collection.Iterator$$anon$17.hasNext(Iterator.scala:319)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:706)
at scala.io.Source$LineIterator.getc(Source.scala:182)
at scala.io.Source$LineIterator.next(Source.scala:195)
at scala.io.Source$LineIterator.next(Source.scala:165)
at scala.io.Source.getLine(Source.scala:163)


With a debugger attached, you can see that the timeout argument passed to socketRead0 is actually zero, which means no timeout at all. That is why fromURL can block forever.

Looking at the scala.io.Source source (2.8), fromURL is just a convenience method for fromInputStream(url.openStream())(codec).

The fix is easy: forget about fromURL. Use fromInputStream with a java.net.URLConnection instead, and set the timeouts on the connection.


import java.net.URL
import scala.io.Source

val timeout = 60000 // milliseconds, for both connect and read
val conn = (new URL(url)).openConnection() // url: the address to fetch, as a String
conn.setConnectTimeout(timeout)
conn.setReadTimeout(timeout)
val inputStream = conn.getInputStream()

val src = Source.fromInputStream(inputStream,
                                 Source.DefaultBufSize,
                                 null,
                                 () => inputStream.close()) // close function for the stream


EDIT: forgot to close the stream!
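
For comparison, here is a minimal sketch of the same idea in plain Java: open the connection yourself, set both timeouts, and always close the stream. The class name TimeoutFetch, the fetch helper and the choice of UTF-8 are my own assumptions for illustration, not part of the Scala code above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a URL with explicit timeouts instead of the
// default of zero (wait forever).
final class TimeoutFetch {

    static String fetch(URL url, int timeoutMillis) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(timeoutMillis); // limit on establishing the connection
        conn.setReadTimeout(timeoutMillis);    // limit on each blocking read

        StringBuilder body = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream.
        // UTF-8 is an assumption; use the charset the server really sends.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }
}
```

With the timeouts set, a server that accepts the connection but never answers makes fetch throw java.net.SocketTimeoutException, so the worker thread can give up or retry instead of hanging.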

Saturday, August 29, 2009

Perl Unicode

Recently I was struggling with Unicode in Perl. Here are a few things I found tricky:

The key is understanding the encode/decode functions. If my Perl program is a command-line program that accepts cp950 arguments on Chinese Windows, the arguments need to be "decoded" into Perl's internal form.

use Encode;

for (my $i = 0; $i < scalar(@ARGV); $i++) {
    $ARGV[$i] = Encode::decode("cp950", $ARGV[$i]);
}

When you send the arguments to a server that accepts UTF-8, you need to escape the non-ASCII characters first:

$data =~ s/(\P{IsASCII})/sprintf('%2x;',ord($1))/eg;

After getting the server response, if it is in UTF-8, you need to decode it once more into Perl's internal form.

$response = Encode::decode_utf8($response);

The last step is to convert Perl's internal form to cp950 for STDOUT.

binmode STDOUT, ":encoding(cp950)";

This is all from memory, so there may be mistakes.

Tuesday, August 04, 2009

Processing EDI, XML, CSV and more with Smooks

I saw this headline on TSS and remembered that handling these formats used to be a tedious task at my previous company. Posting it here for future reference.

http://www.theserverside.com/news/thread.tss?thread_id=55339

Monday, August 03, 2009

Windows Gadget experience

I recently finished a Windows gadget that uses Ajax to fetch its data. The examples I followed were the official gadgets from Microsoft: Stocks and Feed Headlines.

As several sites and books say, a Windows gadget is composed of HTML, CSS and JavaScript. The Stocks gadget was not that easy to follow at first glance, though. Drilling down into the code to see how the data is parsed from the web led me to an ActiveX object backed by a DLL. I googled this and confirmed that it really is the case: http://www.cnblogs.com/yayx/archive/2007/09/04/881879.html (a Chinese web page, however). Another problem with the Stocks JavaScript is that it is all structured in a static-class style, which differs from the JavaScript you have normally seen in the past. The amount of code is daunting too: more than 4000 lines, mainly because the author put all the logic, INCLUDING the creation of HTML elements such as tables and TDs, in the JavaScript instead of in the HTML layer.

The very first task was to move the data retrieval into something that can be tested easily. I replaced the ActiveX object with Ajax, with help from these pages:

http://developer.novell.com/wiki/index.php/Using_the_XMLHttpRequest_object
http://www.ibm.com/developerworks/web/library/wa-ajaxintro2/
http://www.jibbering.com/2002/4/httprequest.html

The server code is linked to the UI code with the listener pattern. You register the functions you want to run as listeners on the server code. Once the data is retrieved OR the connection status changes, the listener functions are triggered and your UI is updated.
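
The shape of that listener wiring can be sketched as follows. The gadget itself is JavaScript; this is a Java illustration only, and the names FetchListener, DataFetcher, onData, onStatus and deliver are hypothetical. Treating 200 as the success status is also an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Callbacks the UI layer registers with the data-retrieval layer.
interface FetchListener {
    void onData(String payload);     // called when data has been retrieved
    void onStatus(int statusCode);   // called whenever the connection status changes
}

// The "server code": it knows nothing about the UI, only about its listeners.
final class DataFetcher {
    private final List<FetchListener> listeners = new ArrayList<>();

    void addListener(FetchListener listener) {
        listeners.add(listener);
    }

    // Invoked by the transport code when a request completes or fails.
    void deliver(int statusCode, String payload) {
        for (FetchListener listener : listeners) {
            listener.onStatus(statusCode);
            if (statusCode == 200) {
                listener.onData(payload);
            }
        }
    }
}
```

Because the fetcher only talks to the listener interface, it can be exercised from a test with a recording listener and no UI at all, which was the point of separating the two layers.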

12029 is the status code that indicates "unable to establish connection".
setTimeout() runs a function once after a delay, and setInterval() runs a function repeatedly at a fixed interval.
Reusing the XMLHTTP Object is described here.

Once the server communication code was ready, I tried to code the UI the same way as in Stocks but failed with the positioning of the elements. By contrast, Feed Headlines is a much better example to work from. Its author defined the tables in the HTML and defined a number of methods that are triggered by HTML events such as onmouseout, onmousewheel, etc.

There is a function on the web that converts milliseconds into a human-readable form in JavaScript. Watch out for the line "var day = Math.floor(hr/60)": the hours should be divided by 24, not 60.
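
The corrected arithmetic, sketched in Java for illustration. The original function is JavaScript and is not reproduced here; the class name Elapsed and the output format are my own, and I am assuming the value is an elapsed duration rather than a calendar date.

```java
// Hypothetical helper: breaks a duration in milliseconds into days, hours,
// minutes and seconds.
final class Elapsed {

    static String format(long millis) {
        long totalSeconds = millis / 1000;
        long totalMinutes = totalSeconds / 60;
        long totalHours = totalMinutes / 60;
        long days = totalHours / 24; // 24 hours per day; dividing by 60 is the bug

        return days + "d "
                + (totalHours % 24) + "h "
                + (totalMinutes % 60) + "m "
                + (totalSeconds % 60) + "s";
    }
}
```

For example, Elapsed.format(90061000L) gives "1d 1h 1m 1s", since 90061 seconds is one day, one hour, one minute and one second.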

Finally, two more tricky spots that cost me a few hours of debugging:
1. The gadget worked against only one server. When pointed at another server it did not connect: the XMLHttpRequest status code was zero and the XML response was empty. The only way to make this problem go away was to kill the sidebar.exe process and start it again.

2. In setting.js of the Feed Headlines gadget, the loadSettings(); call in load() looked redundant to me at first. After several hours of struggling with why the saved settings could not be passed to the settings UI, this line turned out to be the missing piece.

Overall, the experience was great, and I am really satisfied with the results.

Saturday, July 18, 2009

Simple 2.1

I just found http://simple.sourceforge.net/, an XML/object framework. It is quite similar to the in-house custom library built by my CTO. Writing this down as a reference in case I need it in the future.

Wednesday, July 08, 2009

Unicode in Java

Today I found out that the JVM system property file.encoding needs to be set to UTF-8 on non-English Windows to work properly with a UTF-8 configured MySQL (that is, with DEFAULT CHARSET=utf8 in CREATE TABLE).

-Dfile.encoding=UTF-8


On Traditional Chinese Windows the default code page is ms950, while it is windows-1252 on my local English Windows setup.
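
A quick way to see which charset the JVM picked up, and why the default matters, is a sketch like the one below. The class name CharsetCheck and the roundTrip helper are hypothetical, and ISO-8859-1 stands in here for a single-byte Western code page such as windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetCheck {

    // Encodes and decodes with an explicit charset, so the result does not
    // depend on file.encoding.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Shows the charset the JVM uses when none is passed explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());

        String sample = "\u4e2d\u6587"; // two Chinese characters
        // UTF-8 can represent the sample, so the round trip is lossless.
        System.out.println(sample.equals(roundTrip(sample, StandardCharsets.UTF_8)));
        // A single-byte Western charset cannot, so the characters are replaced.
        System.out.println(sample.equals(roundTrip(sample, StandardCharsets.ISO_8859_1)));
    }
}
```

Passing the charset explicitly wherever bytes and strings are converted makes the code behave the same on both Windows setups, whatever file.encoding happens to be.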

Monday, June 29, 2009

PHP 500 Internal Server Error

Today an integration test failed with "500 Internal Server Error". I wasted an hour searching Google for "Client-Warning: Redirect loop detected" and playing around with the require statement.

The easiest way would have been to turn on error logging in php.ini with log_errors = On. The error message in the log told me that the require statement could not find the target PHP file. That's it...

PHP Fatal error:  require_once() [function.require]: Failed opening required 'utilities.php' (include_path='/opt/ecloud/i686_Linux/php/lib/php') in /opt/ecloud/i686_Linux/apache/htdocs/accelerator/evalPhp.php(18) : eval()'d code on line 2

Friday, June 26, 2009

Jetty Handler with NIO and Continuation

After an HTTP or HTTPS request arrives through SelectChannelConnector or SslSelectChannelConnector, the handle(...) method of your custom handler, which extends AbstractHandler, will run.
public void handle(String target,
                   HttpServletRequest request,
                   HttpServletResponse response,
                   int dispatch)
        throws IOException, ServletException {

    // Obtain the Jetty continuation for this request
    Continuation continuation = ContinuationSupport.getContinuation(request, null);

    // Create a custom callback and store it in the HttpServletRequest.
    // The callback wraps the continuation so that another thread can
    // call continuation.resume() when the response can be generated.
    // The "callback" attribute name should be a static final constant
    // in production code.
    Callback callback = (Callback) request.getAttribute("callback");
    if (callback == null) {
        callback = new Callback(continuation);
        request.setAttribute("callback", callback);
    }

    // Synchronize on callback to prevent continuation.resume() from
    // happening before continuation.suspend().
    synchronized (callback) {
        if (continuation.isNew()) {
            // Dispatch the request to another thread here. That thread must
            // keep a reference to callback so it can later call
            // continuation.setObject() and continuation.resume().
        }
        // A timeout of zero is used here for simplicity
        continuation.suspend(0);
    }

    // At this point the continuation has been resumed and the result
    // object is ready for the response.
    PrintWriter out = null;
    try {
        out = response.getWriter();
        Object obj = continuation.getObject();
        // process obj further and write it to "out"
    } finally {
        if (out != null) {
            out.close();
        }

        // Reset the continuation
        continuation.reset();
        continuation.setObject(null);
    }
}
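
The Callback class itself is not shown above. A minimal sketch of what it might look like follows; this is a guess at its shape, not the real class. ResumableContinuation is a stand-in interface so that the sketch compiles without Jetty, and it declares only the two Continuation methods used here (setObject and resume). The complete method name is hypothetical too.

```java
// Stand-in for the two Jetty Continuation methods this sketch needs.
// In a real handler, use Jetty's own Continuation type instead.
interface ResumableContinuation {
    void setObject(Object result);
    void resume();
}

// Hypothetical Callback: handed to the worker thread so it can wake up the
// suspended request once the result is ready.
final class Callback {
    private final ResumableContinuation continuation;

    Callback(ResumableContinuation continuation) {
        this.continuation = continuation;
    }

    // Synchronized on this Callback, the same monitor the handler holds
    // around suspend(), so resume() cannot run before suspend().
    synchronized void complete(Object result) {
        continuation.setObject(result); // store the result for the handler
        continuation.resume();          // let the request be processed again
    }
}
```

The worker thread would call callback.complete(result) when it has finished, after which handle(...) runs again and reads the object from the continuation.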

Wednesday, June 24, 2009

MySQL alter table with multiple indexes

Recently I needed to add and drop multiple indexes on the same table. I had not noticed that I can chain the ADD INDEX and DROP INDEX clauses in a single "ALTER TABLE", which copies the table only once instead of once per change.

http://brian.moonspot.net/mysql-alter-multiple-things
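
As an illustration, here is a small Java sketch that joins several index changes into one statement. The class name AlterTableBuilder and the table, index and column names in the usage note are all hypothetical.

```java
import java.util.List;

// Hypothetical helper: combines several ALTER TABLE clauses into a single
// statement so that MySQL rebuilds the table only once.
final class AlterTableBuilder {

    static String alter(String table, List<String> clauses) {
        // Clauses are comma-separated after a single ALTER TABLE <name>.
        return "ALTER TABLE " + table + " " + String.join(", ", clauses);
    }
}
```

Calling alter("posts", Arrays.asList("DROP INDEX idx_old", "ADD INDEX idx_author (author_id)", "ADD INDEX idx_created (created_at)")) produces the single statement ALTER TABLE posts DROP INDEX idx_old, ADD INDEX idx_author (author_id), ADD INDEX idx_created (created_at), which can then be executed through JDBC or pasted into the mysql client.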

In the past, I tried copying the huge dataset over to a temp table created with the new indexes, but I ran into a problem dropping a FK in the child table that I could not solve: http://bugs.mysql.com/bug.php?id=14347