Acorn Arcade forums: Programming: Are you bored?
|
Are you bored? |
|
ninjah |
Message #56974, posted by ninj at 16:01, 14/7/2004, in reply to message #56971 |
Member
Posts: 288
|
What's lcc? For some reason I don't understand, GCC doesn't yet do modules, but I've written a simple module (years ago) in the BASIC assembler, and it wasn't too tricky. Mind you, it didn't use any data structures more complex than a single lookup table. I think I figured it out by looking at the labels Zap assigned to each address when viewing a module. Since then I've got the Norcroft compiler, but I've not used it to make modules (in truth, I've not used it to do very much at all).
16 registers (in reality 13, because you can't use PC, and you don't want to corrupt the stack register or the return pointer) are usually enough for the scope of a subroutine. When you enter a subroutine, you push all the existing registers (well, all the ones you're going to corrupt) onto the stack, and pull them off before you return. For any larger data structures, I'm afraid you need to start thinking pointers. You basically set aside the equivalent of a large char* in your assembler listing, and use offsets of that pointer to tweak bytes or words to store, in my case, a lookup table.
I think that in practice I find it harder to keep track of loops and conditions than data structures. Breaking everything down into subroutines makes this much easier to keep track of (though all those branch instructions are hardly efficient on an ARM processor). |
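A minimal sketch of the pointer-plus-offset approach described above, written in C rather than assembler. The table contents and the names `squares` and `table_lookup` are made up for illustration; the table from the original module is not shown in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_ENTRIES 16

/* Hypothetical lookup table, standing in for the block of words an
   assembler listing would reserve with data directives. */
static const uint32_t squares[TABLE_ENTRIES] = {
    0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225
};

/* Fetch one entry from a table given only its base address. */
uint32_t table_lookup(const uint32_t *base, size_t index)
{
    assert(index < TABLE_ENTRIES);
    /* base + index is scaled by sizeof(uint32_t), much like
       LDR Rd,[Rbase,Rindex,LSL #2] in ARM assembler. */
    return *(base + index);
}
```

Calling `table_lookup(squares, 5)` reads the word 20 bytes past the start of the table.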
|
Simon Wilson |
Message #56976, posted by ksattic at 16:17, 14/7/2004, in reply to message #56974 |
Finally, an avatar!
Posts: 1291
|
16 registers (in reality 13, because you can't use PC, and you don't want to corrupt the stack register or the return pointer) REAL programmers use the PC between instructions for data storage.
Did I say real? I meant bad.
I think that in practice I find it harder to keep track of loops and conditions than data structures. I just find it hard to believe I can write decent assembler, when an optimising compiler can do tricks that I might not think of.
Breaking everything down into subroutines makes this much easier to keep track of (though all those branch instructions are hardly efficient on an ARM processor) By predicting taken for reverse-jumps and not-taken for forward-jumps, >95% branch prediction can be achieved. The XScale has branch prediction hardware that I won't go into because I don't have the specs here to quote. Branch prediction aims to keep the pipeline full - important in a superscalar architecture.
What's lcc? Little C Compiler?
It's a small C compiler that can compile code quickly, but doesn't optimise like gcc can. Jeffrey Lee made a test version that can compile code for modules and link with module headers created with cmhg.
[Edited by ksattic at 17:19, 14/7/2004] |
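A rough sketch of the static rule mentioned above (backward branches predicted taken, forward branches predicted not taken), applied to the branch that closes a simple loop. The addresses and function names are hypothetical, and this does not model the XScale hardware or reproduce the >95% figure; it only shows why a loop-closing branch is mispredicted just once under this rule.

```c
#include <assert.h>

/* Static heuristic: a branch to a lower address (backward) is predicted
   taken, a branch to a higher address (forward) is predicted not taken. */
int predict_taken(long branch_addr, long target_addr)
{
    return target_addr < branch_addr;
}

/* Count correct predictions for the backward branch at the bottom of a
   loop that executes `iterations` times. The branch is taken on every
   pass except the last, where execution falls through. */
unsigned correct_predictions(unsigned iterations)
{
    const long branch_addr = 0x8020; /* hypothetical address of the branch */
    const long target_addr = 0x8000; /* hypothetical address of the loop top */
    unsigned correct = 0;
    unsigned i;

    assert(iterations > 0);
    for (i = 0; i < iterations; i++) {
        int actually_taken = (i + 1 < iterations);
        if (predict_taken(branch_addr, target_addr) == actually_taken)
            correct++;
    }
    return correct;
}
```

For a loop of 100 iterations, `correct_predictions(100)` counts 99 correct guesses; the single miss is the final fall-through.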
|
ninjah |
Message #56979, posted by ninj at 16:46, 14/7/2004, in reply to message #56976 |
Member
Posts: 288
|
I think that in practice I find it harder to keep track of loops and conditions than data structures. I just find it hard to believe I can write decent assembler, when an optimising compiler can do tricks that I might not think of. Optimising compilers are pussies! Most of the 'optimisations' they do would be classified by a human as just 'not writing daft code'. Writing optimising compilers is only so tricky because it's hard to do the sort of things that the pattern matching human brain does in its sleep.
All that said, the sort of optimisations you'd need for the critical sections of Aemulor, well, I'll leave those to Adrian! |
|
Phil Mellor |
Message #56980, posted by monkeyson2 at 16:59, 14/7/2004, in reply to message #56979 |
Please don't let them make me be a monkey butler
Posts: 12380
|
Most of the 'optimisations' they do would be classified by a human as just 'not writing daft code'. Hiya! |
|
Jeffrey Lee |
Message #56982, posted by Phlamethrower at 17:17, 14/7/2004, in reply to message #56974 |
Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot stuff
Posts: 15100
|
What's lcc? "A retargetable compiler for ANSI C"
http://www.cs.princeton.edu/software/lcc/
(Or http://www.riscos.info/lcc/ for the RISC OS port, which I believe is a tad broken ATM. Once I've sorted the module stuff out, I'll be sending all my changes back to Chocky so he can update his download)
Jeffrey Lee made a test version that can compile code for modules and link with module headers created with cmhg. Providing cmhg produces correct code, of course
The version I and Simon have is 5.32, which I found in a free update somewhere on the Internet (Since neither of us own Castle C/C++). Unfortunately, as far as I can tell there seems to be a bug in the 32bit version of the service call handler. Anyone care to help a fellow coder out by offering us their copy of cmhg? |
|
Simon Wilson |
Message #56987, posted by ksattic at 19:26, 14/7/2004, in reply to message #56979 |
Finally, an avatar!
Posts: 1291
|
Optimising compilers are puppies! Aw, sweet!
Most of the 'optimisations' they do would be classified by a human as just 'not writing daft code'. My point exactly. |
|
Mark Scholes |
Message #56991, posted by mavhc at 20:35, 14/7/2004, in reply to message #56982 |
Member
Posts: 660
|
Can't you use CMunge? |
|
Simon Wilson |
Message #56994, posted by ksattic at 21:38, 14/7/2004, in reply to message #56991 |
Finally, an avatar!
Posts: 1291
|
Can't you use CMunge? I don't think CMunge is 32 bit compatible. At least, when I tried it, it didn't work. |
|
JMB |
Message #56998, posted by jmb at 23:59, 14/7/2004, in reply to message #56982 |
Member
Posts: 467
|
Providing cmhg produces correct code, of course
The version I and Simon have is 5.32, which I found in a free update somewhere on the Internet (Since neither of us own Castle C/C++). Unfortunately, as far as I can tell there seems to be a bug in the 32bit version of the service call handler. Anyone care to help a fellow coder out by offering us their copy of cmhg? From the changelog:
Changes from 5.30 to 5.37
=========================
Minor tweaks to the veneers, including a bug fix to avoid hitting a bug in StrongARM in the 32-bit veneers. Changed default APCS variant to 32-bit.
Changes from 5.37 to 5.42
=========================
1) Bugfix to APCS-32 generic-veneers and irq-handlers' despatch code: the lr on entry had no PSR bits in it, even when running in 26-bit mode. However, APCS-32 code running on a 26-bit system is permitted to return using MOVS PC,R14 - a notable case where this may happen is where the generic-veneer code has a tail-call optimised branch into the C library, which on a 26-bit system will always enforce lr's flags on exit, resulting in an unintended mode switch.
2) Added comment to the auto-generated header file to warn that the command handler arg_string may not be null-terminated (eg when called from BASIC's OSCLI statement).
3) Added ability to replace the C library module finalisation routine: keyword 'library-finalisation-code:' works much like 'library-enter-code:' and 'library-initialisation-code:'.
So you could well be right. Latest version is 5.42 and produces something along the lines of http://moose.mine.nu:6888/cmhgsvcv.txt |
|
Adrian Lees |
Message #57002, posted by adrianl at 01:35, 15/7/2004, in reply to message #56971 |
Member
Posts: 1637
|
One thing I've never "got" is assembly. I guess it's because I've never looked much into it, but I don't understand how to keep track of multiple variables, loops, etc, when all you have is 16 registers to play with. I think all programmers start out writing pretty poor assembler, to be honest. It's probably about the same as the output code of a compiler. With time & practice they learn more about how the processor works, they pick up a bagful of tricks, they become better at register allocation etc.
An experienced assembler programmer will always beat a compiler for at least 2 reasons - (i) you know more about the code than the compiler can safely assume (it /has/ to produce code that works under all circumstances, you can make assumptions that you know hold true). (ii) you can start with the compiler's output and improve it
As a rule of thumb, I reckon on the assembler version of an inner loop being at least twice as fast as the compiler's effort. However, your greatest gains come from improving the algorithm, then optimising the C source as far as possible, before converting to assembler.
Those are just general rules, of course. If you know the instruction set available, you may want to express the code so that, for example, it maps well onto the DSP or MMX instructions that you have at your disposal.
Is Aemulor all assembly? No, about 2/3 assembler, 1/3 C. However, I don't write C modules in the 'normal' way - I use different compiler settings to strip out a lot of overhead from function calls (frame pointers and stack checking - you can't extend your stack anyway in module code!)
And I use different linker settings & my own relocation code so that I can remove the constraints on static data - you can't define the following as global data with the normal build settings - "int b; int *a = &b;" Sometimes that's a pain, but - more importantly - with my altered compiler settings, you don't even get a warning when you do it without realising; then your code starts failing in unpredictable ways and working again when you put in trace statements etc etc!
The only 'downside' is that I can't call any SharedCLibrary routines (but with most of my modules there are other constraints that prevent this anyway!) so I write my own implementations of the few that I need, plus I can tune them for SA/XScale at the same time
How did you learn? By writing modules in assembler. But ARM code was deeply mysterious & wonderful to me in those days, and I was only 15 years old
I didn't have a C compiler until my first commercial project & like you couldn't afford it. Now there are free alternatives, so I guess your best option is to hassle until you've got a working lcc/CMunge setup... and, if you want/need any help/testing let me know.
PS. I actually learned 6502 assembler first, on the Electron/BBC, where you had only 3 general purpose registers! |
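On the static data constraint mentioned in the post above: the initial value of `a` in `int b; int *a = &b;` is an address, which is only known once the data has been placed in memory, so something has to patch it at load time. The sketch below shows one generic way to do that with a table of fix-ups. It is an assumption for illustration only, not the relocation code used in Aemulor; `apply_relocations` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The data image is built as if loaded at address 0, so every word that
   holds a pointer really holds an offset from the start of the image.
   reloc_index lists which words those are; adding the real load address
   to each one turns the offsets into usable pointers. */
void apply_relocations(uintptr_t *image, size_t image_words,
                       const size_t *reloc_index, size_t reloc_count,
                       uintptr_t load_base)
{
    size_t i;

    for (i = 0; i < reloc_count; i++) {
        assert(reloc_index[i] < image_words);
        image[reloc_index[i]] += load_base;
    }
}
```

With a two-word image where word 0 plays the part of `b` and word 1 plays the part of `a` (initially holding offset 0), one fix-up entry for word 1 is enough to make it point at word 0 after loading.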
|
Adrian Lees |
Message #57003, posted by adrianl at 01:46, 15/7/2004, in reply to message #56974 |
Member
Posts: 1637
|
16 registers (in reality 13, because you can't use PC, and you don't want to corrupt the stack register or the return pointer) are usually enough for the scope of a subroutine. Unless it's something complicated like an image processing transform, IDCT, image scaling... then the register set looks woefully small, especially given the (current) lack of MMX-like instructions, forcing you to unpack your bytes/hwords into registers to operate on them independently.
though all those branch instructions are hardly efficient on an ARM processor). On the latest processors, unconditional B and BL instructions are very cheap (single cycle) because the instruction prefetch stage has its own adder and can calculate the target address. The return instruction, however, (MOV PC,R14 for example) has to percolate through the pipeline until it's executed, causing the pipeline flush that you're familiar with.
As Simon says, the Branch Target Buffer can help with conditional branches too. Unfortunately nothing mitigates the cost of loading PC from a register, as MOV PC,R14. Presumably the designers decided this wasn't worth the hassle; technically it's possible, but you need extra logic to check that the already-pipelined instructions don't alter the source register (R14) before the MOV PC,Rm is actually executed. Also you still can't improve the common case of loading the return address from the stack (LDM Rn,{....pc} ) |
|
Adrian Lees |
Message #57004, posted by adrianl at 01:50, 15/7/2004, in reply to message #56976 |
Member
Posts: 1637
|
REAL programmers use the PC between instructions for data storage. I use it to set booleans to true
  STR PC,bool_value ;store non-zero value to set flag
  ...
.bool_value DCD 0
One of my favourite tricks was zeroing a register for free by combining it with an earlier load/store, eg.
  STR a2,[a1],-a1
  MOV pc,lr
when I need to return a1=0. Unfortunately this has been outlawed now (Rn == Rm unpredictable)
[Edited by adrianl at 02:56, 15/7/2004] |
|
Adrian Lees |
Message #57005, posted by adrianl at 02:07, 15/7/2004, in reply to message #56998 |
Member
Posts: 1637
|
Latest version is 5.42 and produces something along the lines of http://moose.mine.nu:6888/cmhgsvcv.txt There you see the reason that I don't like the standard 'module in C' build settings!
By just handwriting your own assembler stub that calls your C code, you'll have a lot less mess; for most of your service calls you'll know they're only called in SVC mode, you won't have the [sl,#-5xx] ugliness, you'll know which 1-3 registers carry useful info for your function and you'll be left with about 6-8 instructions instead of 35!
I'm not saying every programmer should, BTW, but I am saying that RISC OS modules in C are very messy and need rethinking. Maybe we should be using the 'sb' register as for later calling standards. |
|
Tony Haines |
Message #57011, posted by Loris at 10:39, 15/7/2004, in reply to message #56971 |
Ha ha, me mine, mwahahahaha
Posts: 1025
|
One thing I've never "got" is assembly. I guess it's because I've never looked much into it, but I don't understand how to keep track of multiple variables, loops, etc, when all you have is 16 registers to play with. Imagine that you are coding in C. Create 15 variables, R0-14. Call these 'registers' You can give them other names, if you want.
You are only allowed to perform operations on R0-14. You can also move variables or array data etc. to or from these registers. But if you want to access an array the array-variable(s) must be held in registers as well.
You can use a register (traditionally R13) as a stack to store other variables, if you want. You can use R14, but it gets corrupted if you call a subroutine, and you mustn't use it inside a subroutine unless you preserve its initial value.
Whenever you want to manipulate another variable, first transfer it into a register. When you need more variables, you must free up space by, for example, stacking another register temporarily. Try to avoid storing and recovering R0-14 where possible. Sometimes you might find that you are juggling intermediate results and 'variables' are held in different registers at different points in your code.
Does that help?
[Edited by Loris at 11:42, 15/7/2004] |
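As a small illustration of the "registers as C variables" exercise described above, here is a sketch that sums an array using only a handful of register-style locals and gotos. The function name `sum_words` is made up, and the ARM instructions in the comments are only a suggestion of what each line might correspond to, not compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Sum `count` words starting at `array`, pretending that the only
   working storage is a few registers. */
unsigned sum_words(const unsigned *array, size_t count)
{
    const unsigned *r0 = array; /* R0: address of the next element */
    size_t r1 = count;          /* R1: elements still to add */
    unsigned r2 = 0;            /* R2: running total */
    unsigned r3;                /* R3: scratch for the loaded value */

    assert(array != NULL || count == 0);
loop:
    if (r1 == 0) goto done;     /* CMP R1,#0 : BEQ done */
    r3 = *r0++;                 /* LDR R3,[R0],#4 */
    r2 += r3;                   /* ADD R2,R2,R3 */
    r1--;                       /* SUB R1,R1,#1 */
    goto loop;                  /* B loop */
done:
    return r2;                  /* result handed back in a register */
}
```

The array itself never lives in a register; only its address does, which is the point made above about array variables needing to be held in registers.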
|
John D |
Message #57018, posted by john at 13:02, 15/7/2004, in reply to message #56994 |
Member
Posts: 261
|
Can't you use CMunge? I don't think CMunge is 32 bit compatible. At least, when I tried it, it didn't work. :| CMunge 0.44 works here to generate fine 32bit code AFAICT. However, I can't find anywhere you can get it from; version 0.42 is available from RISC OS Ltd's support web site. |
|
Adrian Lees |
Message #57027, posted by adrianl at 16:34, 15/7/2004, in reply to message #57011 |
Member
Posts: 1637
|
Imagine that you are coding in C. Create 15 variables, R0-14. Call these 'registers'
typedef unsigned uint;

/* r0 = divisor, r1 = dividend; returns the quotient */
uint __rt_udiv(uint r0, uint r1)
{
  uint r2, r3 = 0, r12; /* r3 collects the quotient bits, so it must start at zero */

  r2 = r0;
  if (!r2) return r0;

  r12 = 0x80000000U;
  if (r1 < r12) r12 = r1;

  /* scale the divisor up, eight bits at a time, until it reaches the dividend */
l00:
  if (r12 <= r2) goto l07;
  if (r12 <= (r2 << 1)) goto l06;
  if (r12 <= (r2 << 2)) goto l05;
  if (r12 <= (r2 << 3)) goto l04;
  if (r12 <= (r2 << 4)) goto l03;
  if (r12 <= (r2 << 5)) goto l02;
  if (r12 <= (r2 << 6)) goto l01;
  if (r12 > (r2 << 7)) { r2 <<= 8; goto l00; }

  /* unrolled shift-and-subtract, one quotient bit per line */
l08: if (r1 >= (r2 << 7)) { r1 -= (r2 << 7); r3 += r3 + 1; } else r3 += r3;
l01: if (r1 >= (r2 << 6)) { r1 -= (r2 << 6); r3 += r3 + 1; } else r3 += r3;
l02: if (r1 >= (r2 << 5)) { r1 -= (r2 << 5); r3 += r3 + 1; } else r3 += r3;
l03: if (r1 >= (r2 << 4)) { r1 -= (r2 << 4); r3 += r3 + 1; } else r3 += r3;
l04: if (r1 >= (r2 << 3)) { r1 -= (r2 << 3); r3 += r3 + 1; } else r3 += r3;
l05: if (r1 >= (r2 << 2)) { r1 -= (r2 << 2); r3 += r3 + 1; } else r3 += r3;
l06: if (r1 >= (r2 << 1)) { r1 -= (r2 << 1); r3 += r3 + 1; } else r3 += r3;
l07: if (r1 >= r2) { r1 -= r2; r3 += r3 + 1; } else r3 += r3;

  /* if the divisor is still scaled up, drop it by eight bits and go round again */
  if (r0 <= (r2 >> 1)) { r2 >>= 8; goto l08; }

  r0 = r3;
  return r0;
}
This is hardly stretching the compiler's 'brains' but at least Norcroft doesn't bog up this code. The C language forces you to duplicate the r3 = r3 + r3 calculations in the latter half (unless you want to define a carry flag as c = (a >= b) I suppose) but Norcroft does at least manage to recombine these to form:
  MOV r3,r3,LSL #1
  ADDCS r3,r3,#1
It unfortunately fails to then spot that those two instructions can be replaced by ADC r3,r3,r3. |
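To make the last point concrete, here is a small C model of the two instruction sequences, with the carry flag passed in as an explicit 0/1 argument. The function names are invented for this sketch, and it only models the value left in r3, not the processor flags.

```c
#include <assert.h>

/* Models: MOV r3,r3,LSL #1 followed by ADDCS r3,r3,#1 */
unsigned shift_then_add(unsigned r3, unsigned carry)
{
    assert(carry <= 1);
    r3 <<= 1;          /* MOV r3,r3,LSL #1 */
    if (carry)         /* ADDCS only executes when the carry flag is set */
        r3 += 1;
    return r3;
}

/* Models: ADC r3,r3,r3, which computes r3 + r3 + carry in one instruction */
unsigned add_with_carry(unsigned r3, unsigned carry)
{
    assert(carry <= 1);
    return r3 + r3 + carry;
}
```

Both functions return the same value for any input, since doubling and shifting left by one are the same operation on an unsigned word, including when the top bit is shifted out.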
|
Jeffrey Lee |
Message #57041, posted by Phlamethrower at 21:37, 15/7/2004, in reply to message #56998 |
Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot stuff
Posts: 15100
|
Changes from 5.30 to 5.37 ...
So you could well be right. Latest version is 5.42 and produces something along the lines of http://moose.mine.nu:6888/cmhgsvcv.txt Cheers for that
Could you email me that cmhgsvcv.txt or something? Your site seems to be down at the moment.
However, your greatest gains come from improving the algorithm, then optimising the C source as far as possible, before converting to assembler. Yup, that's what Michael Abrash says so it must be true
If you've got broadband then downloading his (now free) Graphics Programming Black Book from somewhere might be a good idea. Of course all the assembler stuff is x86-based, and the VGA programming stuff is complete madness, but there's plenty of other great stuff there - especially if you're writing a 3D game
REAL programmers use the PC between instructions for data storage. I use it to set booleans to true Bah, *real* programmers would use the PC to store data all the time. PSR flags in 26-bit mode, anyone?
Can't you use CMunge? I don't think CMunge is 32 bit compatible. At least, when I tried it, it didn't work. CMunge 0.44 works here to generate fine 32bit code AFAICT. However I can't find anywhere you can get it from, version 0.42 is available from RISC OS Ltd's support web site. Ah - I only have 0.32
I'm not sure which version Simon was using, but it didn't work properly on his Iyonix when he tried it. Unfortunately newer versions of CMunge aren't likely to appear, because Justin Fletcher removed all of his software from his site after some people sent him nasty emails following the Castle-GPL dispute (Plus movspclr.co.uk seems to have disappeared entirely now)
If CMunge doesn't work then I'll write my own alternative.
[Edited by Phlamethrower at 22:44, 15/7/2004] |
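For anyone who hasn't met the 26-bit arrangement joked about above: on 26-bit ARM, R15 holds the program counter and the processor status in one word. The sketch below pulls the fields apart in C, assuming the usual layout (mode in bits 0-1, PC in bits 2-25, F in bit 26, I in bit 27, V/C/Z/N in bits 28-31). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t pc;           /* word-aligned address, bits 2-25 */
    unsigned mode;         /* 0 = USR, 1 = FIQ, 2 = IRQ, 3 = SVC */
    unsigned fiq_disabled; /* F bit, bit 26 */
    unsigned irq_disabled; /* I bit, bit 27 */
    unsigned nzcv;         /* condition flags: N = 8, Z = 4, C = 2, V = 1 */
} r15_fields;

/* Split a 26-bit-mode R15 value into its PC and PSR parts. */
r15_fields decode_r15_26bit(uint32_t r15)
{
    r15_fields f;

    f.pc = r15 & 0x03FFFFFCu;
    f.mode = r15 & 0x3u;
    f.fiq_disabled = (r15 >> 26) & 1u;
    f.irq_disabled = (r15 >> 27) & 1u;
    f.nzcv = (r15 >> 28) & 0xFu;
    assert((f.pc & 3u) == 0); /* the mask above always leaves a word-aligned PC */
    return f;
}
```

For example, the value 0x6C008003 decodes as PC 0x8000 in SVC mode with both interrupt types disabled and the Z and C flags set.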
|
JMB |
Message #57043, posted by jmb at 21:54, 15/7/2004, in reply to message #57041 |
Member
Posts: 467
|
Changes from 5.30 to 5.37 ...
So you could well be right. Latest version is 5.42 and produces something along the lines of http://moose.mine.nu:6888/cmhgsvcv.txt Cheers for that
Could you email me that cmhgsvcv.txt or something? Your site seems to be down at the moment. Try http://81.86.244.131:6888/cmhgsvcv.txt then. /me slaps dyndns' naff dns propagation. |
|
Jeffrey Lee |
Message #57044, posted by Phlamethrower at 22:04, 15/7/2004, in reply to message #57043 |
Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot stuff
Posts: 15100
|
Try http://81.86.244.131:6888/cmhgsvcv.txt then. /me slaps dyndns' naff dns propagation. Nope - still not working. There's a chance it's to do with my uni's proxy server, but I've never had problems with it before.
CMunge 0.42 can be found here, in StubsG. |
|
JMB |
Message #57045, posted by jmb at 22:25, 15/7/2004, in reply to message #57044 |
Member
Posts: 467
|
Try http://81.86.244.131:6888/cmhgsvcv.txt then. /me slaps dyndns' naff dns propagation. Nope - still not working. There's a chance it's to do with my uni's proxy server, but I've never had problems with it before. Probably doesn't like the port number http://www.ecs.soton.ac.uk/~jmb202/riscos/cmhgsvcv.txt |
|
Jeffrey Lee |
Message #57046, posted by Phlamethrower at 22:49, 15/7/2004, in reply to message #57045 |
Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot stuff
Posts: 15100
|
Probably doesn't like the port number Probably not, but sites using port 8080 work fine
http://www.ecs.soton.ac.uk/~jmb202/riscos/cmhgsvcv.txt Yay! |
|
Adrian Lees |
Message #57049, posted by adrianl at 02:21, 16/7/2004, in reply to message #57041 |
Member
Posts: 1637
|
Yup, that's what Michael Abrash says so it must be true
A few times I've handtuned some assembler code only to prototype a new idea in C and found that it blows the asm code away!
If you've got broadband then downloading his (now free) Graphics Programming Black Book.. Hey, I bought a copy of that a few years ago. Great doorstop, er, I mean book
There's some interesting stuff in there, you're right (well, I wouldn't have bought it otherwise, would I?!)... but the Pentium U-V pipeline stuff makes me want to run straight to Intel's office and beat the designers over the head with Abrash's book! They must get paid per transistor used.
It almost turns code optimisation into a trial-and-error process. At least with ARM you can still, more or less, count instructions. |
|