PlayBasic To DLL Work In Progress Update
By: Kevin Picone Added: October 7th, 2013
Category: All,Update,Tools,Machine Code
PlayBasic To DLL (Convert PlayBasic to Machine Code DLLs)
I've been quietly working on two sides of the project in parallel: the translator tool itself, and updates to PB so the two can integrate side by side. It's pretty slow going, as some things are added quicker than others, in particular operations that aren't actually commands in the legacy VM, which need to be wrapped in order to create an external interface. Most of the core operations work like this.
The current edition of the translator supports around 160 to 170 core commands, as well as Integer, Float, String and Pointer variables, arrays (Integer, Float, String and user-defined types) and core logic. The great thing about using the PlayBasic compiler to build the byte code is that the resulting code already has lots of redundancies removed for us, enabling us to export cleaner, faster-executing assembly.
The assembly generation routines are a cross between cookie-cutter and smart at this point. It all depends upon the operation: the current generator can only optimize the output code for execution speed when it notices friendly sequences. The exciting thing about that is that the machine code is already routinely faster than what our competitors produce.
Here are a few tidbits from the Work In Progress blogs.
PlayBasic To Dll - One Stop Shop - (August 20, 2013)
It's 5:09am and the tool finally has its first taste of automation. Previously, I had to compile the PlayBasic code to byte code, copy the output object code to the test folder, run the convertor, cut'n'paste the resulting source code fragments into the DLL template, assemble, and repeat... over and over, which gets old real fast! So the goal tonight has been to get the program to the point where I can aim it at a PlayBasic source file (PBA file) and it'll call the compiler, build the byte code and do all the conversions itself. There are a couple of hacks for the time being, since the compiler will need a specific mode for this stuff, but all in all, it's pretty painless.
The build speed is pretty good: although the test sources are very simple, the conversion and assembly stage consistently executes in about 50 milliseconds or less. The conversion engine only supports a single PlayBasic source file at this time; it doesn't support includes (couldn't be bothered at this stage). The resulting DLL will contain all the code in the source, even code that isn't in use, so ideally you'd use it to compile stand-alone include files that perform some brute-force task. The tool expects the functions you want to be made visible (exported) to have "DLL_" at the start of their names. The tool strips this prefix off; it's just a way to identify which functions you wish to export without having to change the PlayBasic compiler dramatically.
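To make the "DLL_" convention concrete, here's a small Python sketch of how a tool could pick the exported functions out of a PlayBasic source and strip the prefix. It is not PB2DLL's actual code: the `find_exports` name, the regular expression, and the choice to match keywords case-insensitively (the listings below mix `EndFunction` and `EndFUnction`) are all assumptions made for illustration.

```python
import re

# Hypothetical matcher: a Function or Psub declaration whose name starts
# with "DLL_". Case-insensitive, since PlayBasic keywords are.
EXPORT_DECL = re.compile(r"^\s*(?:Function|Psub)\s+(DLL_\w+)\s*\(",
                         re.IGNORECASE | re.MULTILINE)

def find_exports(source: str) -> dict[str, str]:
    """Map each exported name (prefix removed) to its name in the source."""
    exports = {}
    for match in EXPORT_DECL.finditer(source):
        full_name = match.group(1)
        exports[full_name[len("DLL_"):]] = full_name
    return exports

sample = """
Function DLL_JumpTableTest(ptr)
EndFunction
Psub AddTest(A,B)
EndPsub result
"""
print(find_exports(sample))  # {'JumpTableTest': 'DLL_JumpTableTest'}
```

Anything without the prefix, like `AddTest` in the sample, stays internal to the DLL.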
Once the DLL is built, you can not only link it to your application but also bind it so it executes from memory (it doesn't need to be written out to disc). Size-wise, the resulting DLLs are pretty small, weighing in at about 5K for the output of the following.
Function DLL_JumpTableTest(ptr)
    ; integer for next step loop
    For lp =0 to 100 step 10
        pokeint ptr,lp
        ptr+=4
    next
    print lp
EndFunction
Function DLL_ForLoopTest(ptr)
    StepY=10
    StepX=10
    Counter=0
    ; integer for next step loop
    For ylp =0 to 100 step StepY
        For xlp =0 to 100 step StepX
            Counter++
        next
    next
    For ylp =100 to 0 step -StepY
        For xlp =100 to 0 step -StepX
            Counter++
        next
    next
    a=100
    b=200
    swap a,b
    print Counter
EndFunction
Function DLL_JumpTableTest_BIG(a)
    ; Integer for next with literal end value
    for lp=0 to 100
        a=b
    next
    ; Integer for next with variable/register end value
    for lp=0 to B
        a=b
    next
    ; Nested integer for / next loop
    For xlp=A*B to 100
        For ylp =0 to xlp*100
            Stuff=xlp
        next
    next
    on a goto label0,label1,label2,label3
    a=999
    goto done
Label0:
    a=0
    goto done
Label1:
    a=111
    goto done
Label2:
    a=222
    goto done
Label3:
    a=333
    goto done
Label4:
    a=444
Done:
    a=addtest(a,40000)
EndFunction
Psub AddTest(A,B)
    result=a+B
EndPsub result
PlayBasic To Dll - Fractals anyone? - (August 24, 2013)
Attached you'll find a nice shiny example of what the DLL convertor is able to do today, in the form of a fractal render. I've modified the code slightly to work around a few functions that aren't supported as yet (the original code is on the source code board), but the result is a pleasant 18 to 19 times performance improvement. The demo draws 640*480 pixels with potentially 200 square roots per pixel, so yeah... that's a lot of work for the runtime to try and brute force.
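For a sense of scale, here is the worst case for one full frame of that render, taking the "potentially 200 square roots per pixel" as an upper bound:

```latex
640 \times 480 = 307{,}200 \ \text{pixels}, \qquad
307{,}200 \times 200 = 61{,}440{,}000 \ \text{square roots per frame (worst case)}
```

Pixels that finish early need fewer, so this is a ceiling rather than the real count.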
The convertor currently has a handful of optimizations it can make when translating/exporting code, but it has no real awareness of register management, so it's hitting the variable heap a lot more than it should (extra memory accesses). Even without busting a gut, though, it's working pretty well for now on my 8-year-old Athlon system.
I made a few register tweaks before tea tonight, then ported the benchmark code to a couple of competitors. No surprises: the PlayBasic version runs 2.8 times faster than one and 4.5 times faster than the other... Oh dear, how embarrassing that must be for them...
PlayBasic To Dll - Strings - (August 28, 2013)
I've been working on getting strings going most of the afternoon, only to run into a strange crash when two strings are added together. On inspection, one string was legal ("Hello World") and the other was null for some reason. It's funny how your mind focuses on the segment of code you think is the problem, only to find the issue a few lines above it. It turned out the function initialization was killing the string buffer. Once that was corrected, it worked as expected.
So far I've only got a handful of core operators hooked up; the rest is just a matter of joining the dots. Something that's interesting is that when we start talking string management, we actually get less benefit from translating to machine code than you might think. Unfortunately, there's this idea out there that machine language is a silver bullet, but really it isn't and never was. In regards to strings... well, there's no instruction in your CPU that performs high-level string operations for you; everything comes down to reading and writing arrays of bytes. So if we add two big strings together, then regardless of how this operation was called, be it from machine code or the VM, the join itself takes up virtually all of the execution time.
Strings are a notorious bottleneck in programming languages. Knowing that, every effort has gone into making the PlayBasic string engine as quick as possible. That's why I write string-processing apps in it.
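A rough way to see why copying dominates: in a naive model where every `b$+=s$` builds a fresh buffer and copies both the old contents and the appended text, the number of characters moved grows quadratically with the number of appends. The Python sketch below counts them for the "Hello World" test that follows (101 appends per call). It's a model under that stated assumption, not a description of PlayBasic's string engine, which may well avoid some of these copies.

```python
def naive_concat_cost(piece: str, joins: int) -> tuple[int, int]:
    """Return (final_length, chars_copied) for appending `piece` `joins` times.

    Assumed model: each append allocates a new string and copies every
    character of the old string plus `piece` into it.
    """
    length = 0
    copied = 0
    for _ in range(joins):
        length += len(piece)
        copied += length  # old contents + new piece written to the new buffer
    return length, copied

final_len, copied = naive_concat_cost("Hello World", 101)
print(final_len, copied)  # 1111 56661
```

None of that work involves dispatching byte code, so the cost is the same whether the loop is driven by the VM or by machine code.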
Here's a little something to bend your reality. The function joins "Hello World" together 101 times. There are two versions of the test: the exported DLL version and the VM function. Below is a clip from the current test code. Basically, PB2DLL is pointed at this source and we get a nice shiny DLL version a couple of seconds later.
LinkDLL "StringTest.dll"
    ; String tests: machine code version exported by PB2DLL
    DLL_StringTest() alias "stringtest" as string
EndLinkDLL
; Time 101 calls to the DLL (machine code) version
StartInterval(0)
for test=0 to 100
    Result$=DLL_StringTest()
next
print EndInterval(0)
print Len(Result$)
; Time 101 calls to the VM version below
StartInterval(0)
for test=0 to 100
    result$=DLL_StringTest2()
next
print EndInterval(0)
print Len(Result$)
Sync
waitkey
; VM version: appends "Hello World" to b$ 101 times (lp = 0 to 100)
Function DLL_StringTest2()
    s$="Hello World"
    for lp =0 to 100
        b$+=s$
    next
    result$=b$
EndFunction result$
Yep, obviously the machine code version is going to be faster, but the interesting question is by how much. It'd be easy to assume it's going to be 5 to 10 times faster, when it's actually only around 35% faster. Why? Because it's spending most of its time copying characters, not executing VM opcodes.
To put that in some real-world perspective, the DLL version is 12 times faster than one competitor and 18 times faster than another.
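One way to reason about the 35% figure is Amdahl's law: if only a fraction p of the run time is VM overhead that native code removes, the overall speedup is capped at 1 / (1 - p), no matter how fast the native code is. This sketch is an illustration, not a measurement. It assumes "35% faster" means a 1.35x speedup and that machine code removes the overhead entirely; under those assumptions only about a quarter of the VM's time was going on opcode execution, and even a native version 10x faster at that part would give only about 1.3x overall.

```python
def overall_speedup(p: float, s: float) -> float:
    """Amdahl's law: a fraction p of the run time is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

def implied_fraction(observed: float) -> float:
    """Fraction p needed to reach `observed` if that part became free (s -> inf)."""
    return 1.0 - 1.0 / observed

p = implied_fraction(1.35)
print(round(p, 3))                                  # 0.259
print(round(overall_speedup(p, float("inf")), 2))   # 1.35
print(round(overall_speedup(p, 10.0), 2))           # 1.3
```

The remaining three quarters is the character copying, which runs at the same speed whichever side calls it.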
PlayBasic To Dll - Typed Pointers Array Fields - (September 01, 2013)
Getting this working has been a roundabout chain of events. The first problem was that the disassembler didn't support most of the pointer opcodes I needed, so the first port of call was adding that functionality, just so I could translate the byte code back into assembly. After adding the decoder, I noticed there were extra additions in the array field writes with typed pointers. I could have ignored this, as an extra opcode in the output might not sound like much waste here and there, but if that code is sitting in some brute-force loop, then it's throwing away a third of the operation's performance for nothing (one wasted instruction out of three). This wastefulness would then be carried over into the machine code DLL too, adding unwanted cycles whenever the code is within a loop.
The only way to solve such problems is to fire up PlayBasic and take a look at what it generates in particular situations. For some reason, it was adding the structure's displacement offset to the temp pointer register, adding the array offset, then doing the write, when all it should need is to add the array offset and then do the write, since the write opcodes support a displacement. That makes the structure offset virtually free. Moreover, it didn't support literal array indexes, which can be pre-computed at compile time and folded into that single displacement. So that was yesterday's little chore. The results are as expected: it's 30% quicker. I suspect there might be a few more situations like that hidden away, too.
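The displacement trick can be sketched in a few lines. The Python below emits pseudo-ops (not real PlayBasic byte code or x86) for a write like `me.intarray(i) = value`: the original three-step sequence, the two-step sequence that leans on the write's displacement, and the single write you get when a literal index is pre-computed at compile time. The field offset of 28 and the 4-byte element size are made-up values, not PlayBasic's real CoolType layout.

```python
ELEM_SIZE = 4           # assumption: 4-byte Integer slots
INTARRAY_OFFSET = 28    # hypothetical offset of IntArray inside CoolType

def unfolded_ops(index: str) -> list[str]:
    """Original sequence: structure offset added to the temp register first."""
    return [
        f"tmp = me + {INTARRAY_OFFSET}",
        f"tmp += {index} * {ELEM_SIZE}",
        "write [tmp], value",
    ]

def folded_ops(index) -> list[str]:
    """Structure offset (and any literal index) folded into the displacement."""
    if isinstance(index, int):
        # Literal index: the whole offset is known at compile time.
        disp = INTARRAY_OFFSET + index * ELEM_SIZE
        return [f"write [me + {disp}], value"]
    # Variable index: only the array offset is computed at run time.
    return [
        f"tmp = me + {index} * {ELEM_SIZE}",
        f"write [tmp + {INTARRAY_OFFSET}], value",
    ]

for ops in (unfolded_ops("a"), folded_ops("a"), folded_ops(3)):
    print(len(ops), ops)
```

Going from three operations to two per write is where the one-third saving comes from; a literal index like `me.intarray(3)` shrinks it to a single write.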
Tonight's mini session has been all about hooking up the assembly generation side, which has gone relatively well, really. It can now produce working code from the following:
Type CoolType
    x,y,z
    a#,b#,c#
    ThisString$
    IntArray(10)
    FltArray#(10)
    StrArray$(10)
    cool
EndType
Function DLL_Pointer_FillCoolType(Address)
    if Address
        Dim Me as CoolType pointer
        Me=Address
        me.x=111
        me.y=222
        me.z=333
        me.a#=111.111
        me.b#=222.222
        me.c#=333.333
        ; me.ThisSTring="String"
        for a=0 to 9
            me.intarray(a) = 2000+a
            me.fltarray(a) = 2222.34+a
            me.Strarray(a) = "storm"+str$(a)
        next
        me.intarray(1) = 1001
        me.intarray(2) = 1002
        me.intarray(3) = 1003
        me.fltarray(0) = 1000.111
        me.fltarray(1) = 1001.111
        me.fltarray(2) = 1002.111
        me.fltarray(3) = 1003.111
        me.Strarray(0) = "cool1"
        me.Strarray(1) = "cool2"
        me.Strarray(2) = "cool3"
        me.Strarray(3) = "cool4"
        me.cool=123456
    endif
EndFunction result$
So it's getting to a pretty familiar level of functionality, but there are of course plenty of no-nos just waiting to trap you. The main one that comes to mind is the lack of auto-casting in some translated operations. For example, FOR/NEXT loops currently only support Integer loop counters, whereas in PB you can have Integer or floating point loop counters. The same goes for function parameters: if a function expects an integer and you pass it a float, you can get away with this in PB, since the runtime recasts the parameter on demand, but the translator tool doesn't currently support this. It will... just not today.
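The on-demand recasting PB does for parameters can be pictured as a small conversion step that runs before each call. This Python sketch is hypothetical: the `recast_args` name and the "int"/"float" type tags are made up. The post doesn't say which rounding rule PlayBasic uses for float-to-integer conversion, so truncation toward zero (Python's `int()`) stands in as a placeholder.

```python
def recast_args(args: list, param_types: list[str]) -> list:
    """Coerce each argument to its declared parameter type ("int" or "float")."""
    if len(args) != len(param_types):
        raise TypeError("argument count does not match the declaration")
    converted = []
    for value, kind in zip(args, param_types):
        if kind == "int":
            converted.append(int(value))    # placeholder rule: truncate toward zero
        elif kind == "float":
            converted.append(float(value))
        else:
            raise ValueError(f"unsupported parameter type: {kind}")
    return converted

print(recast_args([2.7, 5], ["int", "float"]))  # [2, 5.0]
```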
(To get the most up-to-date info, you'll have to read the complete blog on our forums)