learning java
Mark J. Dulcey
mark at buttery.org
Thu Apr 29 17:38:01 EDT 2004
Derek Martin wrote:
>>* I have a list of several million strings, but I know that there are only
>>about 100k unique ones. I want to make myself a frequency table that
>>tells me how often each string occurred. Write me a data structure to do
>>it. Everybody makes a hashmap, which is fine, but most people end up
>>creating several million Integer objects when it can be done by only
>>creating 100k.
>
>
> Not knowing Java, I couldn't write the code to do it; but the obvious
> solution seems to be to create a hash with the string as a key, and
> the frequency would be the value. I'm having trouble imagining an
> approach which would produce 1M hash objects and provide a correct
> answer... What's the deal?
What makes it tricky is two things about Java and its standard libraries:
1. You can only put objects in Java collections, not primitive data
types. Java provides "wrapper" object types for the primitive types
(Boolean, Byte, Character, Double, Float, Integer, Long, Short, String),
largely for that reason.
In current versions of Java, this is a pain, because you have to
explicitly box the primitive types when you put them in the collection
(using a constructor), and cast (since Java has only generic
collections) and unbox (using a method of Integer or whatever) when you
take them out. Java 1.5 introduces templated collections, like C++, and
automatic boxing and unboxing of primitive types; that will simplify the
code, but the efficiency cost remains.
2. The wrapper objects provided by Java are immutable. You can set their
values only in their constructors.
If you put Integers in your hash table, you will have to create a new
object each time you change the value. If you want to avoid that
overhead, you instead have to create a new class of object, say
MutableInteger, that has a set() method or has the int as a public
instance variable. In a real application, this comes more naturally; you
will usually want to store other data in addition to the count of each
string, so you create a class to hold all of it.
Immutability of objects is sometimes a good thing. For instance, if you
have a class, all of the members of which are immutable objects, you can
safely do shallow rather than deep copies of that object. If the member
objects can be modified, you have to do a deep copy if you clone your
object. Here, however, immutability gets in your way, forcing you to
create an extra class rather than using one from the library.
More information about the Discuss
mailing list