Wednesday, August 14, 2013

Searching JARs for String (Linux)

I recently thought about writing a Groovy script to search JARs for a specific string, but decided to first look for an alternative rather than writing a script. The alternative needed to be easy to use, freely available, and not bogged down with a load of dependencies. I was glad I looked first, because the Linux-based approach provided by "jan61" satisfied my need nicely. In this blog post, I look at that example more closely.

As documented in the LinuxQuestions.org thread "searching contents of .class file within jar (for strings)", a single-line Linux command does the job of recursively searching JARs under a given directory for a given string. The search is not limited to the names of the entries in the JAR: it covers the contents of each file in the JAR as well. Here is that one-liner (the token <stringWithOrWithoutSpacesToFind> represents the string to search for):

Linux Command to Search Contents of .jar Files for Specific String
find . -iname '*.jar' -printf "unzip -c %p | grep -q '<stringWithOrWithoutSpacesToFind>' && echo %p\n" | sh

I like to have this in script form (or as an alias) because a script name is easier to remember and type than the entire command. Here is an example script that could be used.

Linux Script Form of Above Command
#!/bin/sh
# Usage: pass the string to search for as the first argument.
printf "Searching JARs for string '%s'...\n" "${1}"
find . -iname '*.jar' -printf "unzip -c %p | grep -q '${1}' && echo %p\n" | sh

The two versions of the command shown immediately above will work as-is and the rest of this blog post focuses on how the command works. I start analyzing the command from the inside and move outward.

The Linux unzip command "list[s], test[s] and extract[s] compressed files in a ZIP archive." The -c passed to the unzip command "extract[s] files to stdout" and includes the name of each extracted file in that standard output. The %p belongs to the Linux find command. More specifically, %p is a directive of find's -printf action that expands to the found file's name, including the path from the starting directory.
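To see how %p is expanded, the trailing | sh can be dropped so that find only prints the command lines it builds. The following is a minimal sketch, assuming GNU find (for -printf); the directory layout, the file names, and the search string 'needle' are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
mkdir -p "$workdir/lib"
: > "$workdir/lib/a.jar"       # empty placeholder files, not real JARs
: > "$workdir/lib/B.JAR"
: > "$workdir/lib/notes.txt"

# Same find expression as in the post, but without the trailing "| sh":
# the generated command lines are captured and printed, not executed.
generated=$(cd "$workdir" && find . -iname '*.jar' \
  -printf "unzip -c %p | grep -q 'needle' && echo %p\n")
printf '%s\n' "$generated"

rm -rf "$workdir"
```

One command line is printed per JAR (both a.jar and B.JAR, because -iname ignores case), and notes.txt is skipped.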

Each found file is unzipped and its content is written to standard output, where it is piped to a grep command that searches for the provided string. The -q parameter specifies "quiet" mode, in which nothing is written to standard output and grep exits immediately with a zero status code upon detecting a match. The && operator means that the echo command, which prints the name of the JAR whose content matched, runs if (and only if) the grep command returns a successful status (0).
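The interplay of grep -q and && can be tried in isolation, without any JARs involved. This is a small sketch; the function name report_if_found and the sample strings are hypothetical and only stand in for the unzip output and the JAR name.

```shell
# Prints the label only if the text on stdin contains the given string.
report_if_found() {
  # $1 = string to look for, $2 = label to print on a match
  grep -q -- "$1" && echo "$2"
}

# A match: grep -q exits with status 0, so echo runs and prints the label.
printf 'alpha\nbeta\n' | report_if_found 'beta' 'first.jar'

# No match: grep -q exits non-zero, so echo is skipped and the fallback runs.
printf 'alpha\nbeta\n' | report_if_found 'gamma' 'second.jar' || echo 'no match'
```

The first pipeline prints first.jar and the second prints no match, which mirrors how the one-liner only echoes the names of JARs that contain the string.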

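All of the above runs only against files with a .jar extension (matched case-insensitively) thanks to find . -iname '*.jar'. Note that find does not run unzip or grep itself: -printf writes one complete command line per JAR to standard output, and the final | sh pipes those lines to a shell, which executes them one at a time.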
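Because the one-liner builds shell text out of file names, a path containing spaces or quote characters would be split or misparsed by the receiving shell. One possible variant, shown here as a sketch rather than as jan61's command, hands the path and the search string to an inner shell as arguments through find's -exec, so neither is re-parsed as shell syntax. The function name search_jars is made up for illustration.

```shell
# search_jars STRING
# Prints the path of every JAR below the current directory whose
# "unzip -c" output contains STRING (treated as a grep pattern).
search_jars() {
  # Inside the inner shell: $1 is the JAR path supplied by find ({}),
  # and $2 is the search string passed in from this function.
  find . -iname '*.jar' -exec sh -c '
    unzip -c "$1" 2>/dev/null | grep -q -- "$2" && printf "%s\n" "$1"
  ' sh {} "$1" \;
}
```

It would be called as search_jars 'some string' from the directory to search. This starts one extra shell per JAR, so it trades a little speed for robustness with unusual file names.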

Thanks to jan61 for the elegant Linux command covered in this post that makes it easy to search contents of files in JARs for a given string.

3 comments:

Cd-MaN said...

I guess that wouldn't work with pack200 jars. Probably the most compatible solution would be to use the jar command + a temporary directory.

Anonymous said...

I've also had good luck grep'ing the binary contents of the jar directly, prior to unjar'ing, as sort of a quick check filter.

Excerpt from my bash function:

for jar in $jars
do
  # Cheap pre-filter: grep the raw bytes of the JAR before listing it
  if grep -q -- "$searchterm" "$jar"
  then
    match=$(jar tvf "$jar" | grep -- "$searchterm")
    ...
  fi
done

This initial grep directly against the jar is much faster than unjar'ing every jar found, and you only unjar and display the contents if you get a hit.

@DustinMarx said...

Attila-Mihaly,

Thanks for the feedback. I had not considered JARs compressed with pack200.

Jon,

I like the idea of running the grep of the binary content of the JAR as a faster pre-filter before bothering to unzip the JAR. I thought the script that jan61 provided was pretty fast already, but that pre-filter should make it even faster, especially for numerous and large JARs.

Dustin