Matthew Kwong's Blog

Friday, December 18, 2009

C input routine for single character and string

I am trying to relearn some C/C++ by answering beginner questions in a programming forum. As always, I learned something new along the way; this is a page I referred to when trying to clear the stdin buffer.

The first example captures a single character and checks whether it is a number between 1 and 4:


/* needs <stdio.h> */
char option[2] = "";   /* one character plus the terminating '\0' */
int number = 0;
do {
    puts("Please enter your choice:");
    fflush(stdout);
    if (fgets(option, sizeof option, stdin) == NULL) {
        break;  /* EOF or read error: stop instead of looping forever */
    }
    if (option[0] != '\n') {
        scanf("%*[^\n]"); /* get rid of the non-newline characters */
        scanf("%*c");     /* get rid of the newline character */
    }
} while (!(sscanf(option, "%d", &number) == 1 &&
           number >= 1 && number <= 4));

The second example captures a string with trimming, overflow protection and emptiness checking:


/* needs <stdio.h>, <string.h> and <stdbool.h>; trimwhitespace() is a helper
   not shown here, assumed to trim in place and return the trimmed start */
char fileStr[20];
char *pointer = fileStr;
bool tooLong = false;

do {
    printf("\nPlease input a file name: ");
    fflush(stdout);
    pointer = fileStr;   /* reset: a previous pass may have moved it */
    tooLong = false;
    if (fgets(fileStr, sizeof fileStr, stdin) == NULL) {
        fileStr[0] = '\0';   /* EOF or read error: give up with an empty name */
        break;
    }
    if (*fileStr != '\n') {
        /* search for newline character */
        char *newline = strchr(fileStr, '\n');
        if (newline != NULL) {
            *newline = '\0'; /* overwrite trailing newline */
            pointer = trimwhitespace(fileStr); /* trim the line */
        } else {
            /* clear the stdin since user input too much */
            tooLong = true;
            scanf("%*[^\n]");
            scanf("%*c");
        }
    }
} while (tooLong || *pointer == '\0' || *pointer == '\n');

printf("file name = \"%s\"\n", pointer);

Pay special attention to the variable you use at the end: it is "pointer", not "fileStr".

Tuesday, December 08, 2009

Birt and java.lang.NoClassDefFoundError: org/w3c/tidy/Tidy Tidy.jar

Lately I have been debugging a customer issue with our BIRT integration. The exception message is java.lang.NoClassDefFoundError: org/w3c/tidy/Tidy. We found a lot of posts on Google about this error.

Most of the Google results, and this one, are related to file/folder permissions and the location of the jars in Tomcat/WebSphere. Our integration does not involve any webapp container, and we double-checked several times that the file/folder permissions were okay.

Then I remembered that lsof can show which jars the java process has open. It showed all the jars marked as "deleted", which was a big hint to me. Something was wrong with the java process, and this time we found that the same java process had been launched twice. This is a good lsof tutorial, by the way.

Friday, December 04, 2009

Javascript window.event.keyCode in firefox

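This is usually used for capturing the ENTER key being pressed in an HTML text box:

function onkeypressed(e) {
    // IE exposes the event as window.event; Firefox passes it as the argument
    var evt = e || window.event;
    // fall back to which when keyCode is not set
    var keyCode = evt.keyCode || evt.which;
    if (keyCode == 13) {
        // ENTER was pressed: do something
        return true;
    } else {
        return false;
    }
}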

Monday, November 02, 2009

Use regular expression to extract / parse a string into a collection of matched results recursively

While helping someone with an assignment, I almost forgot a routine I love for parsing a string into an ArrayList of matched results with a regular expression. Here it is:


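import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile(...);
Matcher m = p.matcher(inputString);
List<String> l = new ArrayList<String>();
while (m.find()) {            // each call resumes after the previous match
    l.add(m.group(index));    // depends on your regular expression grouping
}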

Thursday, October 01, 2009

PHP 5.2.10 session ID (sessionId) in URL problem in Solaris (all ZEROes 0000 as the year of the set-cookies)

Wow, I actually spent 4 hours looking into what was wrong with this session-ID-in-URL problem on a Solaris box.

My journey started with turning on all the PHP logging. No luck: nothing appeared in error.log when the problem happened.

Then I played around with all the 0/1 settings in session.* (mainly use_only_cookies and use_trans_sid); no luck either.

Then I compared a working Windows setup with this Solaris setup in phpinfo()... nothing special there.

I was starting to believe something was wrong with the cookie, so I tried using PHP's setcookie() with a timeout. Using Firefox, I found that the cookie from the Solaris Apache was not received by Firefox. So I started searching Google for "apache cookie" and, no, I wasted another hour.

Hmm, it seemed the cookie was actually set in the header. Then I used "curl -i" to dump the headers from the Solaris box and finally noticed the difference: the year of the expiration date was ALL ZEROes. I confirmed it with a cookie WITHOUT a timeout: the cookie started to appear in Firefox for both the Windows and the Solaris Apache/PHP setups.

I Googled some more and finally got the right query: "set-cookie solaris 0000 year"; the first result is the answer. Wow, I almost gave up along the way... I created this post so there are more Google result matches.

Thursday, September 24, 2009

Two links for future reference - Spring & inner class and access target from proxy in aop

The first link is here: you can create a bean from an inner class by giving it a constructor argument that refers back to the parent class bean; otherwise you will get an exception along the lines of "no default constructor".

The second link is here. It is basic, I know, but it is something I will most likely forget if I need it again.

Scala 2.8 scala.io.Source throws java.nio.charset.UnmappableCharacterException at an unmappable sequence of bytes by default

Using Scala 2.8 to grab a web page encoded in Big5, Source.fromURL throws UnmappableCharacterException at a Chinese character (a byte sequence) that cannot be mapped to a Unicode character. The default behavior of the Scala Codec is to report this exception.

From reading the Codec source, you can see that Codec is actually built on java.nio.charset.CharsetDecoder. According to the javadoc, there is a method onUnmappableCharacter, and there are three different CodingErrorAction values you can choose from (IGNORE, REPLACE and REPORT).

In scala.io.Codec source:


def onUnmappableCharacter(newAction: Action): this.type = { _onUnmappableCharacter = newAction ; this }

So that's easy enough:


import java.nio.charset.CodingErrorAction.REPLACE
import scala.io.Codec
implicit def codec = Codec("big5").onUnmappableCharacter(REPLACE)
scala.io.Source.fromURL(...) // a big5 encoded page with unmappable characters

Then everything should run quietly without any error, since each unmappable sequence will be replaced by the decoder's default replacement value.

Wednesday, September 16, 2009

Python in cygwin with puttycyg or mintty - interactive mode without prompt

I finally found the answer here, in the last post of that thread: start Python with -i, which forces a prompt even when stdin does not look like a terminal.


$ python -i
Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

Yay!

Tuesday, September 15, 2009

Python 3 Unicode - print() in a putty cygwin terminal with UTF8 enabled

While rewriting my hkgolden forum stat program in Python 3, Unicode was the first issue I had to deal with. Launching the program from a putty cygwin terminal (UTF8 enabled) and printing the web page content to the terminal, I immediately hit two problems: 1. the Chinese was not Chinese anymore, and 2. "UnicodeEncodeError: 'gbk' codec can't encode character '\u2022' in position 188: illegal multibyte sequence".

The Chinese cannot be shown correctly because print() automatically picks up a default encoding from the terminal/OS, whatever encoding you had in mind when writing the print() call. As you could guess from problem #2, the default on my system is "gbk", because I picked "Simplified Chinese" as my non-Unicode encoding in Windows (print(sys.stdout.encoding) returns 'cp936' for me).
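Problem #2 can be reproduced without a terminal. The sketch below is only an illustration: it encodes the bullet character from the error message by hand with "gbk", which is roughly what print() has to do when sys.stdout.encoding is cp936. The helper name can_encode is made up for this example.

```python
def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

bullet = "\u2022"  # the character named in the error message above

gbk_ok = can_encode(bullet, "gbk")      # gbk has no mapping for the bullet
utf8_ok = can_encode(bullet, "utf-8")   # utf-8 has an encoding for it

# errors="replace" avoids the exception but loses the character.
lossy = bullet.encode("gbk", errors="replace")
```

So the failure comes from the target encoding, not from the page content itself.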

After three hours of reading the Python reference docs and googling, I figured out how to bypass print() with sys.stdout.buffer.write(), which outputs bytes directly to stdout.


sys.stdout.buffer.write(line.decode("big5").encode())

'line' holds big5-encoded bytes
line.decode("big5") turns the bytes into a Unicode string in Python 3
line.decode("big5").encode() turns the Unicode string into utf8-encoded bytes (utf8 is the default for encode())

More research on sys.stdout.buffer.write led me to better answers at "Setting the correct encoding when piping stdout in python" and here.


import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)
print(line.decode("big5"))  # automatically using utf8 to output the unicode string

According to http://www.python.org/doc/3.1/library/codecs.html#codecs.StreamWriter, stream must be a file-like object open for writing binary data, and that is our "sys.stdout.buffer".
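To see what the StreamWriter does without touching the real stdout, here is a small sketch that wraps an in-memory binary stream instead. The io.BytesIO object is just a stand-in for sys.stdout.buffer, and the sample text is made up.

```python
import codecs
import io

# Stand-in for sys.stdout.buffer: a binary, writable, file-like object.
raw = io.BytesIO()

# Same call as above, only with a different underlying stream.
writer = codecs.getwriter("utf8")(raw)

# The writer accepts str and pushes utf8-encoded bytes into the stream.
writer.write("\u4e2d\u6587 \u2022\n")

utf8_bytes = raw.getvalue()
```

Whatever is written as a string ends up in the underlying stream as utf8 bytes, which is why print() works again after the sys.stdout replacement.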

The best way to deal with Unicode is to treat every input as bytes and decode it right after receiving it (Big5 in our example), then send every output as bytes by encoding the internal string representation (the same as in Perl); utf8 is the best choice here.
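A minimal sketch of that rule, assuming Big5 input and utf8 output. The function name transcode and the in-memory streams are made up for illustration; in the real program the source would be the HTTP response and the destination sys.stdout.buffer.

```python
import io

def transcode(src, dst, src_encoding="big5", dst_encoding="utf-8"):
    """Decode bytes at the input edge, encode again at the output edge."""
    for raw_line in src:
        # Input edge: bytes -> str. Undecodable bytes are replaced, not fatal.
        text = raw_line.decode(src_encoding, errors="replace")
        # ... any real processing would work on `text` (a str) here ...
        # Output edge: str -> bytes.
        dst.write(text.encode(dst_encoding))

# Two Chinese characters encoded as Big5, standing in for a fetched page.
src = io.BytesIO("\u4e2d\u6587\n".encode("big5"))
dst = io.BytesIO()
transcode(src, dst)
result = dst.getvalue()
```

Everything between the two edges only ever sees str, so the rest of the program does not need to care about encodings.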

Thursday, September 10, 2009

Scala scala.io.Source fromURL blocks / hangs forever without timeout value

Recently I have been working on a Scala project that data mines a forum. Since I use multiple threads to fetch the web content, I reached a point where some of my threads block/hang forever at various calls, such as Source.getLine, hasNext or even fromURL. Here is one example stack from the thread dump:


"pool-1-thread-194" prio=6 tid=0x0b57d000 nid=0x1710 runnable [0x0f4af000..0x0f4afa94]
   java.lang.Thread.State: RUNNABLE
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.read(SocketInputStream.java:129)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
 - locked <0x031a5488> (a java.io.BufferedInputStream)
 at sun.net.www.MeteredStream.read(MeteredStream.java:116)
 - locked <0x031efdc8> (a sun.net.www.http.KeepAliveStream)
 at java.io.FilterInputStream.read(FilterInputStream.java:116)
 at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2446)
 at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
 at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
 at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
 - locked <0x031efe48> (a java.io.InputStreamReader)
 at java.io.InputStreamReader.read(InputStreamReader.java:167)
 at java.io.BufferedReader.fill(BufferedReader.java:136)
 at java.io.BufferedReader.read(BufferedReader.java:157)
 - locked <0x031efe48> (a java.io.InputStreamReader)
 at scala.io.BufferedSource$$anonfun$1$$anonfun$apply$1.apply(BufferedSource.scala:29)
 at scala.io.BufferedSource$$anonfun$1$$anonfun$apply$1.apply(BufferedSource.scala:29)
 at scala.io.Codec.wrap(Codec.scala:65)
 at scala.io.BufferedSource$$anonfun$1.apply(BufferedSource.scala:29)
 at scala.io.BufferedSource$$anonfun$1.apply(BufferedSource.scala:29)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:146)
 at scala.collection.Iterator$$anon$1.next(Iterator.scala:712)
 at scala.collection.Iterator$$anon$1.head(Iterator.scala:699)
 at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:374)
 at scala.collection.Iterator$$anon$17.hasNext(Iterator.scala:319)
 at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:706)
 at scala.io.Source$LineIterator.getc(Source.scala:182)
 at scala.io.Source$LineIterator.next(Source.scala:195)
 at scala.io.Source$LineIterator.next(Source.scala:165)
 at scala.io.Source.getLine(Source.scala:163)

With a debugger attached, you should be able to see that the timeout parameter of the socketRead0 method is actually ZERO. That is why fromURL can block forever.

Open up the scala.io.Source source (2.8): fromURL is actually a convenience method for fromInputStream(url.openStream())(codec).

Now it's easy: forget about fromURL and use fromInputStream with java.net.URLConnection instead.


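import java.net.URL
import scala.io.Source

val timeout = 60000 // milliseconds, used for both connecting and reading
val conn = (new URL(url)).openConnection() // url: the address to fetch, as a String
conn.setConnectTimeout(timeout)
conn.setReadTimeout(timeout)
val inputStream = conn.getInputStream()

// the last argument is the close callback, so the stream is closed with the source
val src = Source.fromInputStream(inputStream,
                                 Source.DefaultBufSize,
                                 null,
                                 () => inputStream.close())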

EDIT: forgot to close the stream!