Thursday, July 7, 2011

Classes and dynamic typing

Designing PineDL's class system is proving to be harder than I expected.

There are a few things to consider when designing a class system: Inheritance, access modifiers, scopes, nested classes, virtual members, etc.

Inheritance can be fairly straightforward, at least in a statically typed programming language. In fact, inheritance can be easy to implement even in dynamically typed languages. Multiple inheritance can be harder, but it's still not that hard. Provided that you can figure out at compile time what the base class(es) is/are, things are simple enough.

Now, let's add access modifiers and scopes to the mix.
Initially, let's consider only two access modifiers - public and private.
Public is very easy to implement. Private can be more complex. First of all, we need to define what private means. One way to define it: a private member can only be accessed from a scope that is a descendant of the scope of the class it is declared in.
In statically typed languages, this is easy enough.
However, consider this hypothetical situation in pseudo-code:
class Foo {
private var x;
public function bar(othervar) {
this.x = othervar.x;
}
}
Can we do this?
Let's say this language is statically typed and othervar is of type Foo. Then sure we can. The scope of bar is inside the scope of Foo, which is the type of othervar.
Now, let's say this is a dynamically typed language. We cannot know, at compile time, whether othervar is a Foo or not. In addition, we don't know whether othervar.x is private or public.
This means that, at runtime, we have to find out whether othervar.x is private and, if it is, whether Foo has permission to access the private members of othervar. This may not be trivial.
But it gets worse.
Let's add protected to the mix. Protected members also allow access from derived classes, so it gets harder to check.
Let's say we define that protected members of a class can also be accessed from nested classes of derived classes. This makes sense, but only makes things even harder.
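To make the runtime cost concrete, here is a minimal sketch of such a check, written in Python rather than PineDL. Everything in it is an assumption made for illustration: the `Member` record, the `can_access` and `read_member` helpers, and the idea that each call site knows which class the running code belongs to. Nested classes and nested scopes are left out, so the real rules would be more involved than this.

```python
class Member:
    """A member slot: its value, its access modifier and the class that declared it."""
    def __init__(self, value, access, owner):
        self.value = value
        self.access = access      # "public", "private" or "protected"
        self.owner = owner        # class in which the member was declared


# Stand-ins for user-defined classes (hypothetical).
class Foo: pass
class DerivedFoo(Foo): pass
class Unrelated: pass


def can_access(member, caller_class):
    """Decide at runtime whether code running inside caller_class may touch member."""
    if member.access == "public":
        return True
    if caller_class is None:      # code outside of any class
        return False
    if member.access == "private":
        return caller_class is member.owner
    if member.access == "protected":
        return issubclass(caller_class, member.owner)
    raise ValueError("unknown access modifier: " + member.access)


def read_member(members, name, caller_class):
    """What 'othervar.x' would have to do on every single access."""
    member = members[name]
    if not can_access(member, caller_class):
        raise PermissionError(name + " is not accessible from here")
    return member.value


# An instance of Foo, represented as a table of members.
othervar = {
    "x": Member(42, "private", Foo),
    "y": Member(7, "protected", Foo),
}
```

With this layout, `read_member(othervar, "x", Foo)` succeeds, while the same read from `DerivedFoo` or `Unrelated` raises `PermissionError`. The point is that the lookup, the modifier test and the class comparison all happen on every access, which is exactly the overhead a statically typed compiler avoids.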

Finally, virtual members. Class Derived extends Base. Base defines a virtual member virtualfoo and Derived overrides it.
Let's consider the following code in a hypothetical statically typed language:
Base bar = new Derived();
bar.virtualfoo();
In the above case, because virtualfoo is virtual, Derived.virtualfoo is called.
In a dynamically typed language, the exact same thing happens, without any problem or ambiguity.
Also, even in this case, Base.virtualfoo must still exist somewhere, in case Derived does something like "base.virtualfoo()" (super.virtualfoo() in Java).
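As an illustration in an existing dynamically typed language (Python, not PineDL), the sketch below reuses the names from the example above. It shows both halves of the argument: the call dispatches on the runtime class of the object, and the base implementation has to stay reachable for the `super()` call.

```python
class Base:
    def virtualfoo(self):
        return "Base.virtualfoo"


class Derived(Base):
    def virtualfoo(self):
        # The overridden version must still exist so that it can be called from here.
        return "Derived.virtualfoo -> " + super().virtualfoo()


bar = Derived()             # no declared type: only the runtime class matters
result = bar.virtualfoo()   # dispatches to Derived.virtualfoo
```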

Now, what about members which aren't virtual?
Base bar = new Derived();
bar.nonvirtualfoo(); //Where nonvirtualfoo is defined in both Base and Derived.
Some languages may not allow the above to begin with. C# allows it, but includes a keyword "new" to ensure it is intentional.
In a dynamically typed language, this can be implemented, but even if it works, it wouldn't be intuitive.
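A small Python sketch shows why this is awkward. Python is used here only as an example of a dynamically typed language, and it has no non-virtual methods, so the static choice is emulated by naming the class explicitly at the call site. Without a declared type for bar, there is nothing else for the language to base the choice on.

```python
class Base:
    def nonvirtualfoo(self):
        return "Base.nonvirtualfoo"


class Derived(Base):
    def nonvirtualfoo(self):
        return "Derived.nonvirtualfoo"


bar = Derived()

# An ordinary call can only look at the runtime class of bar.
dynamic_result = bar.nonvirtualfoo()

# To get the Base version, the call site has to name the class itself.
explicit_result = Base.nonvirtualfoo(bar)
```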

Now, if both Base and Derived include a nonvirtualfoo() member, but Base's is private and the one in Derived is public, then it would make more sense, but still be problematic.

Here's what I have concluded so far:
1. Classes are important to have and so is inheritance
2. The public access modifier and virtual members are in.
3. The private access modifier and non-virtual members are at risk.
4. The protected access modifier is out.

One thing I thought, inspired by some JavaScript programming, would be to have - instead of private - something even more private. Some sort of instance-private, that could only be accessed with the "this" keyword. And perhaps some sort of inherited-private, that could be called also with the "super"/"base" keyword.
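The JavaScript idiom behind this idea is the closure: state that lives in the scope that created the object, and that only the object's own methods can reach. Here is a rough Python rendition of that idiom. The `make_counter` name and the dictionary-of-functions representation are hypothetical, and this is not a proposal for how PineDL would implement it.

```python
def make_counter():
    count = 0   # "instance-private": only the closures below can see it

    def increment():
        nonlocal count
        count += 1
        return count

    def current():
        return count

    # The returned "object" exposes only its public members.
    return {"increment": increment, "current": current}


counter = make_counter()
counter["increment"]()
counter["increment"]()
```

Another counter created with `make_counter()` gets its own `count`, and no code holding a counter can reach the `count` of a different one. That is stricter than conventional private, which is the whole point: no runtime check of "who is asking" is needed.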

I haven't decided this yet, but it's clear that a conventional class/access modifier system would be neither intuitive nor efficient in PineDL. These are just some ideas on how to solve the problem.

Monday, February 28, 2011

Introducing PineDL 1.0a3

It has been a while since the last alpha, but today I am proud to announce a new alpha release of the PineDL compiler.

Besides some source code cleanup and code quality improvements (including fixes for some Gendarme warnings and basic unit testing), here are some of the new features:

1. The result of ++a/--a is now stored to a. Previously, this was evaluated as being pretty much the same as (a+1).
2. Added support for a[0]++ and ++a[0].
3. Implemented "for" statements
4. CompilerUtils, a new library designed to simplify compilation and execution of PineDL code.
5. The ability to execute a PineDL program and actually read its return code. Previously, one could only call the main function and there was no way to check the return code.
6. Added support for named while/for statements. This is a feature of languages such as Java and is notably missing from C++. Of course, this is pretty much useless for now, since there are no break/continue statements yet.
7. Added support for assignments like a=b, or even a,b=b,a. The second example would swap the contents of a and b, something that's really simple in PineDL.
8. Improved compatibility with Microsoft .NET(rather than just being Mono compatible).
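Items 1 and 7 come down to evaluation order. The sketch below models them in Python over a dictionary of variables. The helper names are hypothetical and this is not the code the PineDL compiler generates, only one way to get the behavior described: a prefix increment writes back before yielding its value, and a multiple assignment evaluates every right-hand side before storing anything.

```python
def pre_increment(env, name):
    """++a: store the new value in a, then yield it (the old bug behaved like a+1)."""
    env[name] = env[name] + 1
    return env[name]


def assign_all(env, names, exprs):
    """a, b = b, a: evaluate all right-hand sides first, then assign left to right."""
    values = [expr(env) for expr in exprs]
    for name, value in zip(names, values):
        env[name] = value


env = {"a": 1, "b": 2}
assign_all(env, ["a", "b"], [lambda e: e["b"], lambda e: e["a"]])
swapped = dict(env)            # snapshot after the swap

incremented = pre_increment(env, "a")
```

If `assign_all` stored each value as soon as it was computed, the swap would end with both variables holding the old value of b.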

There is still a lot of work to do. Here are some of the features I'll try to have implemented in alpha4.
1. do...while statements
2. break/continue, including named break/continue.
3. a,b = function_call();
4. a.b.c.d = foo;
5. class bar {}, including perhaps inheritance
6. Character constants('x', '\n', etc.)

Binary and Source code available.

Saturday, December 25, 2010

Introducing PineDL 1.0a2

I am proud to announce the release of the second alpha release of PineDL.
This release is mostly incremental. However, it is already a significant improvement over the last one.

Here are some of the changes:

1. The parser is now Antlr-based. I am hoping this will help maintainability and simplify development of the compiler.
2. Variable declarations
3. "While" loops
4. Function expressions
5. Postfix Increment expressions

Regarding variable declarations, the following syntax is used:

var x, y, z = 1, 2, 3;

Function expressions take the following syntax:

function(args) { return 123; }

Note that function expressions MUST have blocks as statements. function() return 1; is not supported.

Finally, postfix increment/decrement expressions are currently the only way to change the value of a variable. Prefix increment/decrement is not yet supported.

You can download the binary or (if you're curious) the source code. If you find bugs, feel free to report them to the bug tracker.

Wednesday, December 8, 2010

PineDL Alpha 1

Today, I have released an initial preview version of PineDL.
Note that when I say "initial preview", I mean that it is horribly incomplete and not particularly useful.
Notably, there are no variable declarations or loops, so you'll have to use good old function arguments and a functional programming style. Except without lambdas. And with no data structures besides immutable strings.

That said, both the binaries and source code are available.
I would commit the source code to the repositories, but Google Code is down for maintenance.

Source Code
Binary

Feel free to report bugs on the tracker.

Monday, December 6, 2010

System.Reflection.Emit and PineDL

Even if one implements a lexer and a parser, that alone is not enough to design a programming language.

The lexer is a tool that receives text and outputs a list of tokens.
The parser is a tool that receives a list of tokens and exports a tree.

What we need now is a tool that receives a tree and either runs it or exports it in some format that can be run.
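The three stages can be sketched end to end for a toy language of integer arithmetic. This is a Python illustration with made-up names (`lex`, `parse`, `evaluate`), not PineDL's actual lexer or parser. The third stage here simply walks the tree and runs it, which is the simplest of the options discussed below.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[-+*()])")


def lex(text):
    """Lexer: text in, flat list of token strings out."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Parser: list of tokens in, tree of nested tuples out."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return token

    def primary():
        token = take()
        if token == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if not token.isdigit():
            raise SyntaxError("unexpected token %r" % token)
        return ("num", int(token))

    def term():
        node = primary()
        while peek() == "*":
            take()
            node = ("*", node, primary())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError("unexpected token %r" % peek())
    return tree


def evaluate(node):
    """Third stage: tree in, result out (a tree-walking interpreter)."""
    if node[0] == "num":
        return node[1]
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    return a * b
```

For example, `lex("1 + 2 * 3")` yields five tokens, `parse` turns them into `("+", ("num", 1), ("*", ("num", 2), ("num", 3)))`, and `evaluate` walks that tree. A compiler would replace `evaluate` with a function that emits code for each node instead of computing a value.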

There are many ways to achieve this. One possible way is to use LLVM. LLVM is a C/C++ library that allows JIT and AOT compiling, as well as executing the result.
LLVM is a relatively low-level API. Although it allows the programmer to abstract away from concepts such as executable file formats, it is basically a portable assembler.

LLVM does not come with garbage collection, object-oriented programming support or exceptions built in. Instead, LLVM provides the tools to create and integrate these features (all of them entirely optional).

Contrast this with virtual machines like JVM and .NET.
Both Java and .NET use an intermediate format which comes with the features mentioned above built in. In fact, not only are they built in, they are mandatory.
These are high-level virtual machines.

By "high-level", I do not mean "better" or "worse". Just different.
They are also not entirely incompatible. For instance, Mono has an LLVM backend for .NET JIT compilation.

While LLVM grants much programming freedom, it comes at a cost: additional development complexity. This may or may not be acceptable.

Among the multiple libraries .NET provides is System.Reflection.Emit. Like LLVM, this namespace provides support for creating, executing and exporting .NET assemblies at runtime.
For .NET language development, it is ideal.

System.Reflection.Emit integrates extremely well in the .NET stack. To generate a language standard library, one just creates that assembly in a .NET language(like C#), and then uses normal Reflection APIs combined with Emit to achieve the desired output.

I will not cover System.Reflection.Emit usage here (at least not now), but I will discuss how it impacts language development.

First, as with LLVM, there is very little API difference between executing and exporting a dynamically generated assembly.

Second, MSIL is more programmer-friendly than the LLVM intermediate language in many ways. In particular, object-oriented programming support is built in, so emitting a call to a virtual method takes just this:

il.Emit(OpCodes.Ldloc, var1); // push the receiver (local var1)
il.Emit(OpCodes.Ldstr, "xyz"); // push the string argument
il.Emit(OpCodes.Callvirt, typeof(Foo).GetMethod("bar")); // virtual call to Foo.bar
il.Emit(OpCodes.Stloc, var2); // store the return value in local var2

This is roughly equivalent to:
var2 = var1.bar("xyz");
Here, var1 extends or implements Foo and "bar" is a virtual method.

Note that Foo could be a core API type. System.Reflection.Emit appears to be a first-class .NET citizen. There is also almost no difference between using some API method and using a dynamically generated method, as MethodBuilder extends MethodInfo and can be used in the Emit APIs in any place where MethodInfo could be used.

For PineDL, this is very practical. I just create the core language classes in C#, and then seamlessly integrate them into the PineDL application. I can do this without having to concern myself with details like garbage collection and the internal representation of virtual methods and interfaces. It just works, which is a big plus.
PineDL in particular is a hard language to "quickly" design a garbage collector for, since there is a greater-than-usual risk of cyclic references, making options such as simple reference counters and Boehm's conservative GC bad choices.
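The problem that cycles pose for simple reference counting can be observed directly in CPython, which combines reference counts with a separate cycle collector. The sketch below is specific to CPython and only covers the reference counting half of the argument. The `Node` class and `make_cycle` helper are made up for the demonstration.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def make_cycle():
    """Build two objects that reference each other and return a weak probe to one."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)


gc.disable()   # leave only plain reference counting active
try:
    probe = make_cycle()
    # Nothing outside the cycle refers to the nodes, yet each still has a count of 1.
    alive_with_refcounts_only = probe() is not None
    gc.collect()   # run the tracing cycle collector explicitly
    alive_after_cycle_collection = probe() is not None
finally:
    gc.enable()
```

With counting alone the two nodes keep each other alive forever. They are only reclaimed once a collector that can trace cycles runs, which is the kind of machinery the .NET runtime provides for free.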

Right now, I have a very simple working PineDL prototype, written entirely in C#, which I have not yet committed to the PineDL repositories.
It is also very simple and incomplete, so I will not be releasing binaries nor source code at this point. I want to at least have variable declarations, loops and most useful operations working before releasing something. In other words, I want to have something to show before showing it.

The code is currently divided into a few projects: Pine.Lexer, Pine.Parser, Pine.Core, Pine.CodeGen and Pine.IO. There is also a small testing project.
Pine.IO is part of the PineDL standard API, although it has very few functions right now.

I'm hoping to have something to show before the end of the year.

Saturday, July 10, 2010

Regarding function ambiguity

Yet another ambiguity can be seen in PineDL: the function type vs. the function declaration.

Consider the following (valid) expression:
function() return 0;
The above is a function that takes no arguments and returns 0.

Now, consider this in context:
var foo = function() return 0;;
Note that the double ';' is needed (the first ends the function statement, the second ends the variable declaration).

The 'function' keyword is, however, a type as well.
const mytype = function;
Being a type constant, operations can be applied to function, including calling.
const x = function();

Now, consider this piece of code:
x = function();;
What can this be?
1. Function constant calling, assigning the result to variable x, and finally an empty statement.
2. Function declaration, assigning the function to x.

But which?
Of course, some cases are obviously one of the two:
{var x = function();} //Obviously 1
var x = function(){}; //Obviously 2

But the problem is in edge cases.
There are a few solutions:
1. Demand functions to be declared with {}
2. Prefer one of the cases(problematic)
3. Demand functions to be declared with {}, but allow functions to be declared using the \foo() syntax, and allow those to have no {}. In addition, make sure \ is not a valid type constant.
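Solution 1 amounts to a one-token lookahead rule: once the parenthesized list after 'function' is closed, a '{' means a function declaration and anything else means a call on the type constant. Here is a hedged Python sketch of that rule over an already-lexed token list. The `classify` function and its result labels are invented for illustration and are not part of the PineDL parser.

```python
def classify(tokens):
    """Classify an expression whose first token is 'function' (solution 1)."""
    if not tokens or tokens[0] != "function":
        raise ValueError("expected the 'function' keyword")
    if len(tokens) < 2 or tokens[1] != "(":
        return "type-constant"            # const mytype = function;
    depth = 0
    for index in range(1, len(tokens)):
        token = tokens[index]
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
            if depth == 0:
                # Look one token past the closing parenthesis.
                following = tokens[index + 1] if index + 1 < len(tokens) else None
                if following == "{":
                    return "function-declaration"
                return "type-call"
    raise SyntaxError("unbalanced parentheses")


ambiguous = classify(["function", "(", ")", ";", ";"])     # x = function();;
braced = classify(["function", "(", ")", "{", "}", ";"])   # var x = function(){};
bare = classify(["function", ";"])                         # const mytype = function;
```

Under this rule the troublesome `x = function();;` is always read as a call followed by an empty statement, at the price of making `function() return 0;` invalid.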

I haven't decided which yet.

Saturday, July 3, 2010

Getting LLVM to work with MinGW

Today, I tried to get LLVM to work on Windows with MinGW.
When trying to use LLVM, it is very clear that it isn't a "pure" Windows project. For instance, plain MinGW won't work. While I expected to have to install mingw32-make, I also had to install MSYS, which I particularly hate.

After downloading the source code, there were a few problems I had to face:
1. Could not rename: File exists
c:\MinGW\bin\ranlib.exe: unable to rename 'c:/Users/HP530/Downloads/llvm-2.7/llvm-2.7/llvm-2.7/Release/lib/libLLVMX86CodeGen.a'; reason: File exists
make[3]: *** [/c/Users/HP530/Downloads/llvm-2.7/llvm-2.7/llvm-2.7/Release/lib/libLLVMX86CodeGen.a] Error 1

These ones were annoying. To solve them, I first had to go to the Release/lib folder and see if the indicated file existed. If so, I had to delete it and try again. If not, just running make again would make it work.

2. Figuring out the correct order of arguments when compiling a sample "hello world!" project. Here is what worked for me:
g++ `llvm-config --cxxflags` -o hworld.o -c hworld.cpp
g++ `llvm-config --ldflags` -o hworld.exe hworld.o `llvm-config --libs` -limagehlp -lpsapi


The -lpsapi and -limagehlp flags weren't obvious, even when looking at LLVM's documentation. I happened to find them mentioned in a mailing list post when googling for the problem.

3. Threading
For this one, Stack Overflow helped me. I was getting lots of undefined references to __imp__pthreads functions. To solve it, I had to run ./configure with --disable-threads

Now that I've got this out of the way, I hope I'll finally be able to learn LLVM.

References:
http://stackoverflow.com/questions/2129263/how-to-build-llvm-using-gcc-4-on-windows
http://osdir.com/ml/compilers.llvm.devel/2005-09/msg00067.html
http://llvm.org/docs/CommandGuide/html/llvm-config.html