Phillip Pearson - web + electronics notes

tech notes and web hackery from a new zealander who was vaguely useful on the web back in 2002 (see: python community server, the blogging ecosystem, the new zealand coffee review, the internet topic exchange).


Integers in Java

Java question: I want to extract an integer from the first element of a Vector. Is there a better way than the following?

int foo = ((Integer) v.get(0)).intValue();

I'm assuming I'm doing something silly here, because it's so much easier in other languages. In C# it's:

int foo = (int) v[0];

and in Python, just:

foo = v[0]
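
For comparison, here is a minimal sketch of the shorter form, assuming Java 5 or later, where generics and auto-unboxing are available (the snippet above uses neither). The class and method names are made up for illustration.

```java
import java.util.Vector;

public class FirstInt {
    // With a declared element type, get(0) returns an Integer,
    // and the compiler unboxes it to int on return.
    static int firstGeneric(Vector<Integer> v) {
        return v.get(0);
    }

    public static void main(String[] args) {
        Vector<Integer> v = new Vector<Integer>();
        v.add(42); // auto-boxed to Integer
        int foo = firstGeneric(v);
        System.out.println(foo); // prints 42
    }
}
```

The cast and the intValue() call are still happening, the compiler just writes them for you. An empty Vector still throws at get(0), and a null element throws a NullPointerException when it is unboxed.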

Back on this search engine

OK, I think I've cracked the problem of how to do a proper search engine for Radio weblogs. The problem with using an existing search engine (like ht://Dig, which I've wasted[1] far too much time getting to understand) is that it indexes whole HTML pages. This is fine for most sites, but Radio blogs[2] put many posts on a single page, and all blogs put heaps of junk (blogrolls, etc) around the outside of the posts, making it easy for a search engine to get sidetracked in its search for real content.

So now I'm hacking away on a standalone search engine that will integrate with Radio at a much deeper level. The concept is that Radio will inform the engine directly of the content of posts, and it'll index that rather than trying to extract the info from the HTML directly.
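
To make the idea concrete, here is a rough sketch of per-post indexing in plain Java. It is not Lucene and not the actual engine: PostIndex, addPost and search are hypothetical names, and a simple in-memory map of terms to post ids stands in for the real index. The point is only that the engine is handed the text of each post, so the surrounding page never gets indexed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PostIndex {
    // term -> ids of the posts that contain it
    private final Map<String, Set<String>> postings =
            new HashMap<String, Set<String>>();

    // Called once per post with the post text only (no blogroll, no template).
    public void addPost(String postId, String text) {
        for (String term : text.toLowerCase().split("[^a-z0-9]+")) {
            if (term.isEmpty()) continue;
            Set<String> ids = postings.get(term);
            if (ids == null) {
                ids = new TreeSet<String>();
                postings.put(term, ids);
            }
            ids.add(postId);
        }
    }

    // Returns the ids of posts containing the term, in sorted order.
    public List<String> search(String term) {
        Set<String> ids = postings.get(term.toLowerCase());
        if (ids == null) return Collections.emptyList();
        return new ArrayList<String>(ids);
    }

    public static void main(String[] args) {
        PostIndex index = new PostIndex();
        index.addPost("post-1", "Integers in Java");
        index.addPost("post-2", "Backing up with tar and ssh");
        System.out.println(index.search("java")); // prints [post-1]
    }
}
```

In this sketch a search result is a post id rather than a page URL, which is what lets a hit point at a single post even when many posts share one HTML page.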

This will result in search capabilities pretty much equivalent to Blosxom's, which are the best I've seen so far.

I'm doing it in Java, because that means I can use Lucene (Lupy is good, but not quite complete yet). Other useful bits are Tomcat for the web serving, HSQLDB for data storage, and Apache XML-RPC for communications. This is my first proper Java project (not counting the one I started a few months back, which never went anywhere in the end), so I'm going through the learning curve at the same time, which makes things interesting. Let's see how this goes ...


1. At least I now know a fair bit about ht://Dig's htsearch module and its build process ...

2. (or blogs made with anything other than Movable Type, which defaults to archiving each post on a new page)


Hmm ... Eclipse is a nice IDE, but it seems painfully slow at times.

Backing up with tar and ssh

bbum: I found myself in a situation where I really needed to copy some files from a remote OS X box to my local system, but the only access I had was via SSH. Unfortunately, the files all have resource forks that contain pertinent information.

You can probably do this all at once:

  ssh remotehost 'gnutar -c -v -p -f - file1 file1/rsrc file2 file2/rsrc file3 file3/rsrc' | gnutar -x -v -p --overwrite -f -

Here are some other handy ones:

- back up a remote directory to a local tarfile over a slow link

  ssh remotehost 'tar -czf - /path/to/dir/to/back/up' > backup.tar.gz

- back up a remote directory to a local tarfile over a fast link, where the remote PC is very slow

  ssh remotehost 'tar -cf - /path/to/dir/to/back/up' | gzip > backup.tar.gz

Also rsync is great for this sort of thing:

  rsync -vr remotehost:/path/to/dir/to/back/up localbackupdir