Friday, September 22, 2006

Annotations in Groovy

Annotations are meant to provide metadata.

Such metadata can be used by the compiler or by other processing tools, working on the source, on the bytecode or at runtime. Annotations are not meant to have a semantic effect... I always have to remind myself of this... Somehow I really have a problem with that. Anyway, Groovy doesn't have them, and Groovy won't get annotations in 1.0. But as soon as 1.0 is out we want to add them. Tools working with annotations will then most likely work for Groovy too.
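To make the "metadata read at runtime" case concrete, here is a minimal sketch in plain Java (Groovy has no annotations yet, as said above). The @Skip annotation and the persistentFields helper are made up for this illustration; they are not part of Hibernate or any other tool. The annotation changes nothing about how the class behaves, it is only visible to a tool that asks for it via reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation. RUNTIME retention keeps it visible to reflection;
// SOURCE or CLASS retention would limit it to the compiler or to bytecode tools.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Skip {}

class AnnotationDemo {
    static class Customer {
        @Skip Object metaClass;   // metadata only: the field itself behaves as usual
        String firstName;
        String lastName;
    }

    // A tiny "tool" that reads the metadata: it lists all fields not marked with @Skip.
    static List<String> persistentFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic()) continue;              // ignore compiler-generated fields
            if (!field.isAnnotationPresent(Skip.class)) {
                names.add(field.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Field order is not guaranteed by the reflection API.
        System.out.println(persistentFields(Customer.class));
    }
}
```

A persistence tool would do essentially the same thing, just with its own annotations and a lot more bookkeeping.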

The new JUnit is an example of why Groovy should have annotations. It is not yet clear which set of default annotations Groovy will support, but people will be able to write annotations in Groovy. How many of Groovy's enhancements can be used in annotations is open; we need to look at what the bytecode allows, whether we can work around it... and whether we want to do that.

And there is Hibernate... not persisting the metaClass field is sometimes very important. We had several problems with serialization of that property, because the MetaClass is not serializable and so the default serialization might fail. Using bean serialization might make things even worse. And Hibernate...


@Entity
class Customer implements Serializable {

    @Transient
    MetaClass metaClass

    @Id
    Long id

    String firstName
    String lastName
    Date birthday

    @Transient
    Integer age

    @Embedded
    private Address homeAddress

    @OneToMany(cascade=CascadeType.ALL)
    @JoinColumn(name="CUSTOMER_ID")
    Set orders
}

I "stole" that example from the Hibernate page and added the transient property for metaClass. Of course you don't have to add the getter and setter methods, as they are already generated by Groovy. Just add business code and be happy ;)

And of course there are more tools using annotations, and I am sure there will be many more of them in the future. I am very curious whether other dynamic languages on the JVM will go that way too. But as most scripting languages on the JVM do not generate compatible class objects, it might be out of scope for many of them.

Monday, September 11, 2006

Groovy on Speed - a "fast mode" for Groovy

Fast mode? What's that?

The Problem:
Did you ever try to measure the performance of Groovy? It is quite a difficult task. Because of the tight integration of Groovy and Java, you are constantly calling back and forth between Java and Groovy. For example, think of some time-consuming operations in a Java library. The Groovy part would probably just call the library and then return the result. The overhead added by Groovy is minimal.

Calling a Groovy method that does some calculations, on the other hand, might result in bad performance. The solution is quite simple: identify these parts and write them in Java, either by delegating to a Java library or by subclassing the Groovy class in Java and overriding the methods there.

Of course this is a possibility. But it means you have to use Java for these parts, so you can't use Groovy syntax for the operations. That sure is no problem for people who see Groovy as a frontend... somewhere... occasionally used, but not an essential part. It does indeed become a problem for Groovy evangelists.

What exactly is the Problem?
What makes Groovy slower than Java is its ability to add/intercept methods at runtime and its slightly different type conversion model. A method invocation in Groovy usually consists of several normal method calls: the arguments are stored in an array, the classes of the arguments must be retrieved, a key is generated out of them, and a hash map is used to look up the method. If that fails, we have to test the available methods for compatible ones, select one of them based on the runtime types, create a key for the hash map and then, in the end, do a reflection-like call on the method.

Benchmarks usually use the same types of arguments, so the hash map kicks in and allows a "fast" method selection. Counting only the method calls, you get around 4 calls before you are even able to look in the hash map. Then you create the key, do the lookup in the map... given all that, I think it is quite a wonder that Groovy performs as well as it does!
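The call path described above can be sketched in plain Java. This is not Groovy's actual MetaClass code, only a simplified model under my own assumptions: the class DynamicCallDemo, the key format and the selection rule are invented for illustration, and null arguments and primitive parameters are not handled.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

class DynamicCallDemo {
    // Cache from "receiverClass#name(argType,argType,)" to the selected method.
    private static final Map<String, Method> CACHE = new HashMap<>();
    static int slowSelections = 0;   // counts how often the slow path was taken

    static Object invoke(Object receiver, String name, Object... args) {
        // 1. The arguments arrive in an Object[] (here: the varargs array).
        // 2. Get the runtime classes of the arguments and build a String key from them.
        StringBuilder key = new StringBuilder(receiver.getClass().getName())
                .append('#').append(name).append('(');
        for (Object arg : args) {
            key.append(arg.getClass().getName()).append(',');
        }
        key.append(')');
        try {
            // 3. Look the method up in the hash map; on a miss, do the expensive selection.
            Method method = CACHE.get(key.toString());
            if (method == null) {
                method = select(receiver.getClass(), name, args);
                CACHE.put(key.toString(), method);
            }
            // 4. Finally, the reflective call.
            return method.invoke(receiver, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Slow path: test the available methods for one whose parameters accept the arguments.
    private static Method select(Class<?> type, String name, Object[] args)
            throws NoSuchMethodException {
        slowSelections++;
        for (Method candidate : type.getMethods()) {
            if (!candidate.getName().equals(name)) continue;
            Class<?>[] params = candidate.getParameterTypes();
            if (params.length != args.length) continue;
            boolean compatible = true;
            for (int i = 0; i < params.length; i++) {
                if (!params[i].isInstance(args[i])) { compatible = false; break; }
            }
            if (compatible) return candidate;
        }
        throw new NoSuchMethodException(name);
    }

    public static void main(String[] args) {
        System.out.println(invoke("abc", "concat", "def"));   // slow path, fills the cache
        System.out.println(invoke("x", "concat", "yz"));      // same types: cache hit
        System.out.println("slow selections: " + slowSelections);
    }
}
```

Even on a cache hit, every call still pays for the array, the class lookups and the key string, which is the constant cost discussed next.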

How to make it even faster?
The way to make something faster is either to find an algorithm with a better runtime complexity, or to reduce the constants. If we did not have to create the Object[], we might be able to reduce the costs a lot. I made some measurements, and it seems that invoking a method taking 3 arguments is 3 times slower than invoking a method taking no arguments. That's because we not only have to create the Object[] for the 3 arguments, but we also have three objects whose types we need to get, and the key becomes longer: as the key is a String, the length of the string grows with the number of arguments.

Java avoids that. Java selects the method at compile time, so there is no Object[], no type extraction and so on. Of course the VM checks the types of the parameters, but in the case of Groovy the VM has to do that more than once. The end of the call in Groovy is the same as in Java, but a lot has happened before it. So if we were able to select a method at compile time in Groovy, we could avoid that problem completely.

What does it mean?
It means Groovy would be as fast as Java! But that is no general solution. Dynamic typing doesn't allow you to choose the methods at compile time, and the MetaClass facilities won't help here either. So you are changing the semantics of a method call when changing the typing from dynamic to static.

So what is the Fast Mode then?
The idea is not to do dynamic typing everywhere. In fact there was already a proposal going in that direction. It was even in the codebase, but it was a global switch. I think that is the wrong solution. My solution is to let the programmer decide where he wants to have it. For simplicity I will talk about an annotation here, but there has been no decision on what to use. Anyway, I would use this annotation for a class, for a method, for a block of code or for a single statement. Each part marked this way would lose dynamic typing and let the compiler use static typing to choose the method. This means that the MetaClass is ignored. It does not mean that we lose any of the additions to the JDK we made. It will still be possible to use "each" or the native syntax for BigDecimals, lists and maps. It doesn't mean that you get back checked exceptions or such. I am not sure yet if the compiler should forbid the mixed usage of statically typed and dynamically typed arguments, or if it should simply fall back to the normal invocation mode. Anyway, instead of asking the MetaClass to transform the types and make the call, the compiler will add code that does the type transformation and the method call based on the static types. We lose multimethods then, and we lose the ability to catch unknown methods and properties! So we lose one of the most appealing features of Groovy. And given the fact that builders depend on them, we lose builders too. But as you can limit the usage of the fast mode to very small parts, it is no problem at all.
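What "losing multimethods" means can be shown with two overloads in plain Java. This is only a sketch: dynamicCall imitates runtime-type selection with reflection and is not how Groovy's MetaClass is implemented; all names are made up for the example.

```java
import java.lang.reflect.Method;

class DispatchDemo {
    static String describe(Object o) { return "Object"; }
    static String describe(String s) { return "String"; }

    // Static selection: the compiler picks the overload from the declared type,
    // so this always ends up in describe(Object), whatever is passed at runtime.
    static String staticCall(Object value) {
        return describe(value);
    }

    // Dynamic selection (sketch): pick the most specific overload for the runtime type.
    static String dynamicCall(Object value) {
        try {
            Method best = DispatchDemo.class.getDeclaredMethod("describe", Object.class);
            for (Method m : DispatchDemo.class.getDeclaredMethods()) {
                if (m.getName().equals("describe")
                        && m.getParameterTypes()[0].isInstance(value)
                        && best.getParameterTypes()[0].isAssignableFrom(m.getParameterTypes()[0])) {
                    best = m;   // a more specific parameter type wins
                }
            }
            return (String) best.invoke(null, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(staticCall("text"));    // prints Object
        System.out.println(dynamicCall("text"));   // prints String
    }
}
```

Code marked for the fast mode would behave like staticCall: cheaper, but blind to the runtime type of the argument.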

Of course this adds much to the complexity of the language.

What about invokedynamic?
invokedynamic is the great hope of all dynamic languages on the JVM. If it helps us to avoid creating the Object[], creating the hash map key and doing the type transformations, then it might be a much better solution than my fast mode. But even if it does all this, it won't be in Java 6. It is Java 7 at best. And it is not yet defined. The work on this is still in the early stages, and while I hope the result will be very good, it is no solution for now.

The Goal
The idea of the "fast mode" is to provide a way to get the maximum performance at certain places. Then you can do calculations in Groovy. I am no fan of using one language for all purposes, but on the other hand I think that such a mode would be very good for Groovy. And it would tell all those benchmark writers out there not to measure a language with calculations. That is a very stupid thing to do, I would say.

The implementation is surely not done in a few minutes and this "fast mode" will not go into Groovy 1.0. But 2.0 or even 1.1 may have that.

Other ideas?
There were other ideas, yes. For example, not simply making the call through the MetaClass, but letting the compiler guess a method and check with the MetaClass whether the guess is right. Sadly that doesn't help, because the method selection still has to be done, and that is the expensive part here. Writing a HotSpot engine for Groovy might be a solution. Java is making HotSpot open, so it is theoretically an option, and possibly a much better option than anything else. But there are many unclear details... no one has the knowledge to write something like that, no one knows the API, and how would the HotSpot engine be added to the normal Java SDK?

We always tried to avoid the need for a special Groovy JVM, even if it is just the JVM with additional bytecodes. If the HotSpot engine can't be added to any normal JDK, then I don't see why this should be any different.

Reality!
I think there is a lot of room for optimizations. For example, in the "fast mode" we probably don't always need to create a closure. Instead we can inline the closure handling method and the closure itself. But I think only a concrete implementation of all the possibilities would be a fair way to tell if something is better. And if the normal method invocation in Groovy gets a boost from a better HotSpot engine or from new bytecode, it only means that the "fast mode" is no longer needed. I would always recommend using the normal Groovy method invocation and identifying the expensive parts. The "fast mode" will neither solve all problems nor free you from using profilers.

Saturday, September 09, 2006

Remote Closures

A better RMI?

The discussion about closure support in Java is going on. Well, Groovy does have closures and I like them a lot. But I asked myself a question: "What happens when I want to execute a closure on a remote VM?"

I answered myself that this would possibly be a problem. If we use the normal serialization mechanisms, then we are out of luck: a Closure is not Serializable at the moment. Well, ok, that could be changed, but does it make sense? Usually a closure is a bit like an anonymous inner class. That means that in the case of RMI you don't have a stub, as RMI would require. So maybe the real question is how to make Groovy's closures Serializable?

I had some nasty thoughts about somehow serializing the AST into the class, but then I realized that the original source might be good enough for that. And it doesn't bloat the class files, which would have a bad effect on class loading time.

So I thought, maybe give them custom read and write methods for serialization, where we transfer the source as well as the environment the closure closes over. This could do the job. But then the MetaClass needs to be serializable as well.

No, I think if we are able to get the source for the closure, then why not send this to the remote machine? We would put the source and the state information of the closure in a wrapper object, and when deserializing we would reproduce our closure. We would use ObjectStreams that know what to do with the closures, and we would somehow have to foist our custom streams upon RMI.

Or we don't use RMI at all. If we use the ObjectStreams directly, we don't have any problems. I may have time in the future to take a look at Spring and its ability to use different remoting services. But thinking of how RMI is done, I always have the feeling that it is completely superfluous. How would I implement remote method invocation? I need some kind of channel transferring the method message, that is, the name of the method, some kind of id for the object the call is made on, and a serialized version of the parameters. On the other end, I need to use the id to look up the target object, I need to deserialize the arguments, and then I just make the call using a MetaClass. It is easy!!
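As a rough sketch of that idea, here is the message and the receiving end in plain Java. Everything in it is an assumption made for illustration: the class names are invented, the "channel" is just a byte array instead of a socket, and plain reflection stands in for the MetaClass call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

class RemoteCallDemo {
    // The method message: id of the target object, method name, arguments.
    static class Call implements Serializable {
        private static final long serialVersionUID = 1L;
        final String objectId;
        final String method;
        final Object[] args;
        Call(String objectId, String method, Object[] args) {
            this.objectId = objectId;
            this.method = method;
            this.args = args;
        }
    }

    // Sender side: serialize the message; the bytes would go over some channel.
    static byte[] encode(String objectId, String method, Object... args) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new Call(objectId, method, args));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiver side: look up the target by id, deserialize the arguments, make the call.
    static class Receiver {
        private final Map<String, Object> exported = new HashMap<>();

        void export(String id, Object target) { exported.put(id, target); }

        Object dispatch(byte[] message) {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(message))) {
                Call call = (Call) in.readObject();
                Object target = exported.get(call.objectId);
                // Naive selection by name and argument count; a MetaClass would do better.
                for (Method m : target.getClass().getMethods()) {
                    if (m.getName().equals(call.method)
                            && m.getParameterCount() == call.args.length) {
                        return m.invoke(target, call.args);
                    }
                }
                throw new IllegalArgumentException("no method " + call.method);
            } catch (IOException | ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        Receiver receiver = new Receiver();
        receiver.export("list-1", new ArrayList<String>());
        receiver.dispatch(encode("list-1", "add", "hello"));
        System.out.println(receiver.dispatch(encode("list-1", "size")));   // prints 1
    }
}
```

Return values and exceptions would have to travel back the same way, which the sketch leaves out.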

I think I will try an example implementation in the next few days and keep you informed about the result. The nice thing is, we can use that from Java too, thanks to the good integration of Groovy with Java!

Friday, September 08, 2006

Getting a Groovy S.O.D.A

For all who don't know S.O.D.A.: it is a query language for object databases. SODA is used by the successful db4o project, an object-oriented database for Java and .NET.

Imagine an object as a graph. The object itself is a node, and every field is a connection to another node, another object. You build up a query by defining an entry point into that graph and setting constraints on the connections to other nodes. I borrow a class from their tutorial; I hope that will be no problem.

class Pilot {
    String name
    int points
}
The above defines a class named Pilot with two properties; in Groovy this means a Java-like class with getters and setters for name and points. Now look at a query:
Query query=db.query();
query.constrain(Pilot.class);
query.descend("name").constrain("Michael Schumacher");
With this I build a query pattern using Pilot as the entry point, looking at the connection "name" and constraining it to objects equal to "Michael Schumacher". Given this query, the database will return all objects of class Pilot with the value "Michael Schumacher" stored in "name".

Pretty neat, isn't it? It is typesafe and such... but a bit long... Well Groovy supports operators, why not try something else:
Query query=db.query();
query.constrain(Pilot.class);
query.name == "Michael Schumacher"
We save the descend and the constrain method calls. But it gets much better when it comes to connecting query parts with logical operations:
Query query=db.query();
query.constrain(Pilot.class);
Constraint constr=query.descend("name")
.constrain("Michael Schumacher");
query.descend("points")
.constrain(new Integer(99)).and(constr);
This query asks for all Michael Schumacher with 99 points. I think we can do this much better in Groovy:
Query query=db.query();
query.constrain(Pilot.class);
query.name == "Michael Schumacher" && query.points == 99
Hail operators! This is clear and simple, we don't need the temporary variable any longer and we saved some lines of code. And of course operations such as greater, smaller, equal, not, and, or are all possible.

Is it possible to improve that even more?
Query query=db.query();
query.constrain(Pilot.class);
with (query) {
    name == "Michael Schumacher" && points == 99
}
Maybe it is a matter of taste whether that version is better, but imagine complex queries. I guess you will be happy not to have to write "query" everywhere.

Another important part of SODA are evaluations. When we use evaluations, every object in the result of our query is tested against an evaluation instance we provide, which tells the database whether to include that object in the result or not.
class NameLengthEvaluation implements Evaluation {
    public void evaluate(Candidate candidate) {
        Pilot p = (Pilot) candidate.getObject()
        candidate.include( p.getName().length() < 5 )
    }
}
Query query=db.query();
query.constrain(Pilot.class);
query.constrain(new NameLengthEvaluation());
This would get us all Pilots whose name is shorter than 5 characters. A small helper class gives us a lot of power in Groovy:
class ClosureEvaluation implements Evaluation {
    def closure
    ClosureEvaluation(c) {this.closure = c}
    public void evaluate(Candidate candidate) {
        candidate.include(closure(candidate.object))
    }
}
Using this class we can use a closure as an evaluation:
Query query=db.query()
query.constrain(Pilot.class)
query.constrain(new ClosureEvaluation({
    it.name.length() < 5
}))
By overloading the constrain method we can even shorten this:
Query query=db.query()
query.constrain(Pilot.class)
query.constrain { it.name.length() < 5 }
Now it is very short, isn't it? We could also use a variant of the closure with two parameters to allow access to the object container, but given the fact that we use a closure, we can simply reuse variables that are already declared outside the closure.
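For comparison, the same adapter trick can be sketched in Java with a predicate in place of the closure. Note that Candidate and Evaluation below are stand-ins shaped like the types used in this post, declared locally so the example is self-contained; they are not the real db4o classes, and the query method is only a toy replacement for a database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class EvaluationDemo {
    // Stand-ins for the types used in the post; NOT the real db4o API.
    interface Candidate { Object getObject(); void include(boolean flag); }
    interface Evaluation { void evaluate(Candidate candidate); }

    static class Pilot {
        final String name;
        final int points;
        Pilot(String name, int points) { this.name = name; this.points = points; }
    }

    // The adapter: wraps any predicate so it can be used where an Evaluation is expected.
    static class PredicateEvaluation<T> implements Evaluation {
        private final Class<T> type;
        private final Predicate<T> test;
        PredicateEvaluation(Class<T> type, Predicate<T> test) {
            this.type = type;
            this.test = test;
        }
        public void evaluate(Candidate candidate) {
            Object o = candidate.getObject();
            candidate.include(type.isInstance(o) && test.test(type.cast(o)));
        }
    }

    // Toy "database": runs the evaluation against every stored object.
    static <T> List<T> query(List<T> stored, Evaluation evaluation) {
        List<T> result = new ArrayList<>();
        for (T item : stored) {
            boolean[] included = {false};
            evaluation.evaluate(new Candidate() {
                public Object getObject() { return item; }
                public void include(boolean flag) { included[0] = flag; }
            });
            if (included[0]) result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Pilot> pilots = Arrays.asList(
                new Pilot("Michael Schumacher", 99), new Pilot("Kimi", 100));
        List<Pilot> shortNames = query(pilots,
                new PredicateEvaluation<Pilot>(Pilot.class, p -> p.name.length() < 5));
        System.out.println(shortNames.get(0).name);   // prints Kimi
    }
}
```

The Groovy version needs neither the class token nor the cast, which is part of why it ends up so much shorter.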

Conclusion:
Yes, db4o provides a solution in Java too: a shorter form of SODA queries named native queries. Basically the bytecode of the class is loaded, analyzed and transformed into a set of SODA queries and evaluations. Given these short forms in Groovy, I am not sure I "need" native queries.

Friday, September 01, 2006

Non Local Transfers in Groovy: A 90% Solution

The discussion about supporting break/continue and return in closures is an old one.

I was always looking for a 100% solution that just works. Sadly, I still haven't found one. But ok, let us look at my new proposal, which is inspired by a blog entry from John Rose about the ongoing discussion of closure support in Java. I think my proposal should cover 90% of all cases.

First let me introduce the term "appended closure". With that I mean closures appended to a method call. An example would be

list.each {println it}
or
if (atHome) {doHomework()}
Ok, the last one could have been a closure, but it is not ;)

My solution covers only these appended closures. Closures passed around as variables are not covered. Appended closures are the ones that look most like the normal control structures we have, for example the while loop.

Now what happens when declaring a while loop? The compiler will create a label marking the entry point of the loop, a label marking the exit point of the loop, and some gotos for break/continue. In Groovy, closures are inner classes, so they can't jump to such a label in the bytecode. So we must transfer the exit state from the closure to the calling point and then do our goto there. There was much discussion about how to do that. We discussed return values, which won't work, as we must exit the loop method, which may be void. We discussed exceptions, which would work, but we were unsure whether we would really catch them all at the right places, and having to rewrite all the closure handling code out there is not nice either. We discussed additional fields, with the problem of using the same closure multiple times.

If you only look at appended closures, then some of the problems go away. We get a defined point at which to catch the exception, look at the fields or whatever. But we need the exception, because we need to exit the method calling the closure. We need it in the case where we want to do a break.

And if we limit break/continue to just this type of closure, we have no real problems. We can even do a labeled break/continue and jump not only out of the loop method, but out of the surrounding loop too. So what would it look like?
def list = [1,10,100,10,1]
list.each {
    if (it>10) break
    println it
}
The intention is to print the values 1 and 10 and then stop processing. My suggestion now is to transform this code into:
def list = [1,10,100,10,1]
int id = createLabelID()
Closure c = {
    if (it>10) throw new ClosureBreakException(id)
    println it
}
try {
    list.each(c)
} catch (ClosureBreakException ce) {
    if (ce.id != id) throw ce
}
Well, yes, looks like much code. But the compiler does the work for us. What about continue?
def list = [1,10,100,10,1]
list.each {
    if (it>10) continue
    println it
}
Printing 1,10,10,1. It would be transformed into something like
def list = [1,10,100,10,1]
list.each {
    if (it>10) return
    println it
}
Ehm.. yes.. easy ;) All code in the method calling the closure is still always executed then. Compared with a for loop, this is the part where the increment happens and the comparison is done. If we want to jump to a different point, like a surrounding for loop, we need exceptions too, because we break the inner loop then:
outer: while (foo) {
    list.each {
        if (it>10) continue outer
        println it
    }
}

is basically the same as
while (foo) {
    list.each {
        if (it>10) {doContinue = true; break}
        println it
    }
    if (doContinue) continue
}
But as we use exceptions here, we don't need that "doContinue":
int id = createLabelID()
Closure c = {
    if (it>10) throw new ClosureBreakException(id)
    println it
}
while (foo) {
    try {
        list.each(c)
    } catch (ClosureBreakException ce) {
        if (ce.id != id) throw ce
        continue
    }
}
What about combinations of continue and break?
To enable this I suggest the usage of more than one id
outer: while (foo) {
    list.each {
        if (it<0) continue outer
        if (it>10) break outer
    }
}
is transformed to
int idBreak = createLabelID()
int idContinue = createLabelID()
Closure c = {
    if (it<0) throw new ClosureBreakException(idContinue)
    if (it>10) throw new ClosureBreakException(idBreak)
}
outer: while (foo) {
    try {
        list.each(c)
    } catch (ClosureBreakException ce) {
        if (ce.id == idBreak ) break outer
        if (ce.id == idContinue) continue outer
        throw ce
    }
}
The implementation is really straightforward.
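To check that the transformation holds together, here is a hand-written Java version of what the compiler would generate. It is a sketch under assumptions: a lambda plays the closure, the helper each stands in for Groovy's list.each, the outer loop runs over a list of rows instead of "while (foo)", and the exception is created without a stack trace to keep it cheap. ClosureBreakException and createLabelID are the names proposed above; nothing like them exists yet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class ClosureBreakDemo {
    static class ClosureBreakException extends RuntimeException {
        final int id;
        ClosureBreakException(int id) {
            super(null, null, false, false);   // pure control flow: skip the stack trace
            this.id = id;
        }
    }

    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    static int createLabelID() { return NEXT_ID.incrementAndGet(); }

    // Stand-in for Groovy's list.each: it knows nothing about break or continue.
    static <T> void each(List<T> list, Consumer<T> body) {
        for (T item : list) body.accept(item);
    }

    // Equivalent of:
    //   outer: for (row in rows) { row.each { if (it<0) continue outer
    //                                         if (it>10) break outer
    //                                         println it } }
    // The values that would be printed are collected in a list instead.
    static List<Integer> run(List<List<Integer>> rows) {
        List<Integer> printed = new ArrayList<>();
        final int idBreak = createLabelID();
        final int idContinue = createLabelID();
        Consumer<Integer> c = it -> {
            if (it < 0) throw new ClosureBreakException(idContinue);
            if (it > 10) throw new ClosureBreakException(idBreak);
            printed.add(it);
        };
        outer:
        for (List<Integer> row : rows) {
            try {
                each(row, c);
            } catch (ClosureBreakException ce) {
                if (ce.id == idBreak) break outer;
                if (ce.id == idContinue) continue outer;
                throw ce;   // belongs to some other loop further out
            }
        }
        return printed;
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = Arrays.asList(
                Arrays.asList(1, 2, -1, 3), Arrays.asList(4, 50, 5), Arrays.asList(6));
        System.out.println(run(rows));   // prints [1, 2, 4]
    }
}
```

The loop method each is untouched, which is the whole point: only the call site and the closure body are rewritten.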

Speed Issues?
We can tell people that using break/continue might slow things down.

Danger of doing break/continue on the wrong loop?
I think we eliminated that problem with our createLabelID function, which produces a new unique id.

Why is it only a 90% solution?
What I can't do is:
def c = {if(it>10) break; println it}
list.each(c)
and expect the break to work. That is because we have no defined point where we can catch the exceptions. A normal continue can work.

What is the return value in case of continue?
The same as for break: nothing. People will have to know this. We can't have both a return value and an exception, unless the exception does the job of transporting the return value. In the case of a "collect" I expect to get at least a partial list. To enable this, we must catch the exception and then store the return value of the loop method in it. The changes to the code I showed above are trivial. The changes to the loop method are trivial too, but they must be done. The case of a continue is a bit more difficult. And to tell the truth, I don't know of a solution here. That is because

... I want to avoid people having to rewrite their closure handling code.
Respecting continue would mean catching an exception in the closure handling method. I don't think that is nice. If I don't do it in the case of a "break", my code still works, but might not return correct data. If I don't do it in the case of a "continue", my code doesn't behave right and doesn't return correct data. And while supporting break is not needed in every method, supporting continue would be.
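The "store the partial result in the exception" idea for break could look roughly like this in Java. Again this is only a sketch with invented names: the partialResult field, the cooperating collect method and the fixed id are my assumptions, not an existing API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class PartialCollectDemo {
    static class ClosureBreakException extends RuntimeException {
        final int id;
        Object partialResult;   // filled in by the loop method on the way out
        ClosureBreakException(int id) {
            super(null, null, false, false);   // control flow only, no stack trace
            this.id = id;
        }
    }

    // A "collect" that cooperates with break: it stores what it has so far
    // in the exception and lets the exception continue to the call site.
    static <T, R> List<R> collect(List<T> list, Function<T, R> body) {
        List<R> result = new ArrayList<>();
        try {
            for (T item : list) result.add(body.apply(item));
        } catch (ClosureBreakException ce) {
            ce.partialResult = result;
            throw ce;
        }
        return result;
    }

    // Call site as the compiler might generate it for
    //   list.collect { if (it>10) break; it*2 }
    @SuppressWarnings("unchecked")
    static List<Integer> doubledUntilBig(List<Integer> list) {
        final int id = 1;   // would come from createLabelID()
        try {
            return collect(list, it -> {
                if (it > 10) throw new ClosureBreakException(id);
                return it * 2;
            });
        } catch (ClosureBreakException ce) {
            if (ce.id != id) throw ce;
            return (List<Integer>) ce.partialResult;
        }
    }

    public static void main(String[] args) {
        System.out.println(doubledUntilBig(Arrays.asList(1, 10, 100, 10, 1)));   // prints [2, 20]
    }
}
```

A loop method that does not cooperate would still be exited correctly by the break; the call site would just find no partial result in the exception.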

What about "return"?
We could support it the same way we support "break". No real problem. In fact that is the easiest version, since we don't have to handle incomplete data.

My Suggestion:
Don't allow "continue" in closures. It might look strange, but it has semantic problems. People can always simulate it by returning from the closure with a special value or the like and respecting that value in a loop. But to avoid new users getting into trouble here, and to avoid confronting them with even more black magic than we already do, I suggest replacing "continue" with a "closure return". The advantage is that it returns a value and thus is no problem with respect to incomplete data. For Java, ^ has been suggested as an additional return; I haven't thought about a name or pattern yet. break/return can be implemented as described above for appended closures. Other closures won't have break/return; the compiler would forbid them there. A bit of a problem is the new return, because using return in current Groovy means returning from the closure if it is in a closure, and returning from the method if it is not in a closure. When seeing a closure as a method this is ok, but we don't want to see them as methods. So "return" would change its semantics compared to older Groovy versions.

All in all, this suggestion would allow people to use break and return in closures. It would allow them to use these in appended closures, the most used form of closure when doing loops. A "closure return", or I should better say "block return", would replace the old continue statement. If people really wish to skip a single step, they have to write their own loop.

The advantage is that I don't have to write additional closure handling code for break/return, except if I want to avoid the incomplete data in the case where I return something. The next advantage is that I don't have to identify the closure somehow in the loop method; I don't care if the current closure caused the problem or something deeper inside. I have a defined point where I can catch the exception and don't have to be afraid that it might interfere with other loop methods.

The other possibility I see is to forbid them all ;)