Malloc damnation

VinceH
Message #107064, posted by VincceH at 20:12, 16/4/2008
Lowering the tone since the dawn of time
Posts: 1600

So.
WebChange's script interpreter was working fine last night.
Tonight, I linked in the source file for the search and replace code. Nothing in this source file is being called yet. No other part of the sources has been touched, other than to add:
#include "CM_Replacer.c"
Now the script interpreter is complaining that there's an unrecoverable error in run time system. Not enough memory, malloc failed (heap overwritten). Oh joy.
The original error was happening on an fgets, where the script file was read in and processed one line at a time. That's now been rewritten to load the whole script into memory and extract a line at a time, and it now works. Yay.
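A minimal sketch of the load-then-split approach, assuming the script is a plain text file with LF line endings. `load_file` and `next_line` are hypothetical names, not WebChange's actual functions. The whole file goes into one allocation with a spare byte for the terminator, and each line is NUL-terminated in place rather than copied.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Read a whole file into one NUL-terminated buffer (hypothetical helper).
   Returns NULL on failure; the caller frees the result. */
static char *load_file(const char *path, size_t *len_out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long size = ftell(f);
    if (size < 0) {
        fclose(f);
        return NULL;
    }
    rewind(f);

    char *buf = malloc((size_t)size + 1);      /* +1 for the terminator */
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    buf[got] = '\0';                           /* got <= size, so still in bounds */
    if (len_out != NULL)
        *len_out = got;
    return buf;
}

/* Return the line starting at *cursor, terminating it in place and moving
   *cursor past its '\n'. Returns NULL once the buffer is exhausted. */
static char *next_line(char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    char *start = *cursor;
    if (*start == '\0')
        return NULL;
    char *nl = strchr(start, '\n');
    if (nl != NULL) {
        *nl = '\0';
        *cursor = nl + 1;
    } else {
        *cursor = start + strlen(start);       /* last line had no '\n' */
    }
    return start;
}
```

A caller would loop `while ((line = next_line(&cur)) != NULL)` with `cur` initialised to the buffer, then free the buffer once at the end, so the whole script costs a single malloc/free pair.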
Except that something *else* is now broken, with the same problem. I've tracked it down to a calloc in another function. (All of which, incidentally, are called via a veneer that should report a more programmer-friendly error if they fail, ie which calloc is failing.)
Tracing through the functions as they're called up to that point, AFAICS for every malloc or calloc there's a free; they're all correctly matched and nested.
So basically, it's BROKEN and ATM I'm hitting my head against a brick wall because I can't find the problem.
GAAAAAAH!

Jeffrey Lee
Message #107065, posted by Phlamethrower at 21:10, 16/4/2008, in reply to message #107064
Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot Hot stuff
Posts: 15100

Do your wrappers add sentinels to the allocations? If you store all your allocations in a linked list you could check for heap corruption yourself and narrow the search area until you find the problem.
I've recently written some malloc/free/etc. wrappers myself that can be used to track allocations and look for buffer overruns. I've attached a copy of the code below, so you could try using that if you don't feel like writing your own.
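The sentinel-plus-list idea can be sketched roughly as follows. This is not the code Jeffrey attached (it isn't reproduced here); `dbg_malloc`, `dbg_free`, `dbg_check` and `dbg_live` are hypothetical names, and the guard sizes and fill byte are arbitrary choices. Each block carries a header linking it into a list of live allocations, with guard bytes either side of the caller's data, so scanning the list reports any block whose guards have been overwritten.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <assert.h>

#define GUARD_BYTE 0xA5u   /* arbitrary fill value for the sentinels */
#define GUARD_LEN  8       /* sentinel bytes after each block */

typedef struct block {
    struct block *next;    /* singly linked list of live allocations */
    size_t size;           /* bytes the caller asked for */
    const char *file;      /* where the allocation was made */
    int line;
} block;

/* Padding the header to max_align_t keeps the caller's data aligned. */
typedef union { block b; max_align_t align; } header;

#define FRONT_LEN sizeof(header)   /* front sentinel, also alignment-sized */

static block *live = NULL;

static unsigned char *user_of(block *b)
{
    return (unsigned char *)b + sizeof(header) + FRONT_LEN;
}

#define dbg_malloc(n) dbg_malloc_at((n), __FILE__, __LINE__)

static void *dbg_malloc_at(size_t size, const char *file, int line)
{
    if (size > (size_t)-1 - sizeof(header) - FRONT_LEN - GUARD_LEN)
        return NULL;                               /* total would overflow */
    unsigned char *raw = malloc(sizeof(header) + FRONT_LEN + size + GUARD_LEN);
    if (raw == NULL)
        return NULL;
    block *b = &((header *)raw)->b;
    b->size = size;
    b->file = file;
    b->line = line;
    b->next = live;
    live = b;
    memset(raw + sizeof(header), GUARD_BYTE, FRONT_LEN);
    memset(user_of(b) + size, GUARD_BYTE, GUARD_LEN);
    return user_of(b);
}

static int guards_ok(block *b)
{
    assert(b != NULL);
    const unsigned char *front = (unsigned char *)b + sizeof(header);
    const unsigned char *back = user_of(b) + b->size;
    for (size_t i = 0; i < FRONT_LEN; i++)
        if (front[i] != GUARD_BYTE) return 0;
    for (size_t i = 0; i < GUARD_LEN; i++)
        if (back[i] != GUARD_BYTE) return 0;
    return 1;
}

/* Scan every live block; report and count those with damaged sentinels. */
static int dbg_check(void)
{
    int bad = 0;
    for (block *b = live; b != NULL; b = b->next)
        if (!guards_ok(b)) {
            fprintf(stderr, "overrun: %lu-byte block from %s:%d\n",
                    (unsigned long)b->size, b->file, b->line);
            bad++;
        }
    return bad;
}

static size_t dbg_live(void)
{
    size_t n = 0;
    for (block *b = live; b != NULL; b = b->next) n++;
    return n;
}

static void dbg_free(void *p)
{
    if (p == NULL) return;
    block **link = &live;
    while (*link != NULL && (void *)user_of(*link) != p)
        link = &(*link)->next;
    if (*link == NULL) {
        fprintf(stderr, "dbg_free: %p was not allocated by dbg_malloc\n", p);
        return;
    }
    block *b = *link;
    if (!guards_ok(b))
        fprintf(stderr, "overrun found freeing block from %s:%d\n", b->file, b->line);
    *link = b->next;       /* unlink, then release the whole raw allocation */
    free(b);
}
```

Calling `dbg_check()` at a few points along the failing path, then moving those calls closer together, is what narrows the search area; `dbg_live()` also gives a quick way to confirm that every allocation has been freed.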

VinceH
Message #107066, posted by VincceH at 22:46, 16/4/2008, in reply to message #107065

No, it's just a very simple function that's called with the same arguments as calloc, but with one extra: the error message to report if the calloc fails. Just founded on lazyitis.
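A veneer of that shape might look like this minimal sketch. `safe_calloc` is a hypothetical name, and printing to stderr then exiting stands in for however the real program reports the error (a desktop application would more likely report through the Wimp).

```c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

/* Same arguments as calloc, plus the message to show if it fails.
   Hypothetical veneer: stderr + exit stand in for real error reporting. */
static void *safe_calloc(size_t nmemb, size_t size, const char *failmsg)
{
    assert(failmsg != NULL);             /* every call site names its allocation */
    void *p = calloc(nmemb, size);
    if (p == NULL && nmemb != 0 && size != 0) {
        fprintf(stderr, "Not enough memory: %s\n", failmsg);
        exit(EXIT_FAILURE);
    }
    return p;   /* calloc has already zeroed all nmemb * size bytes */
}
```

A call site then reads like `char *line = safe_calloc(256, 1, "script line buffer");`. The veneer deliberately writes nothing into the block after allocating it: calloc has already zeroed every byte, and any extra write is just another chance to step outside the block.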
The script interpreter as it stands works on pretty much the same principles as before[1], complete with an utterly ugly script language. My intention is to rewrite the script language itself, for a few reasons, and the changes I have in mind[2] mean a fundamental rewrite of the interpreter - which will in turn affect the front end.
So I was going to wait until after Wakefield - the idea being to have it working as is for the show. However, perhaps I should concentrate on that fundamental change now. Given this bug is proving to be a major headache, why bother to track it down only to start a complete rewrite of the problem code in a couple of weeks?
Provided I can spend some time over the next couple of days to spec up the new language, I can probably thrash up the new interpreter over the weekend easily enough - but doing so limits the time I have to properly test the replace code (which is what I added tonight - and remains untested because this problem appeared) and then bolt on the extended replace functions.
The upshot is that it's going to be slightly less complete than I'd hoped for demonstrating at Wakefield this year. (I don't think I'd have been selling it anyway just yet, although it was remotely possible - but now it's definite that I won't.)
[1] On this one, I cheated in order to save time, and dug out a backup of my ancient source code to base it on[3].
[2] Some of which come from my thinking a little more about a C rewrite of Trellis, funnily enough.
[3] Ironically, doing this meant I spotted the root cause of an ancient, long-standing bug and fixed it. How pointless was that, then?

VinceH
Message #107067, posted by VincceH at 23:48, 16/4/2008, in reply to message #107066

Hmm.
In the interests of science, I've just recompiled the front end... and running it now results in an NSFTH.
That's intriguing.
Why?
The datestamp on the last compiled front end is newer than the datestamp on any of the front end source files. So the code hasn't changed. The previously compiled front end runs. The newly compiled front end doesn't.
This would appear to indicate that I've broken my library code, which has changed - and my first instinct would be to suspect the change I made to it tonight, given that the problem first occurred tonight.
However, there are two flaws with that theory:
Firstly, the change I made tonight was after the problem reared its ugly head, and was a part of the change to run the script from memory instead of disk (ie the solution to the initial occurrence).
Secondly, the only changes before that also predate the last successful compilation of the front end.
Which combined makes me want to start panicking. Plus I've a long day tomorrow so I should already be in bed.

Jeffrey Lee
Message #107068, posted by Phlamethrower at 23:53, 16/4/2008, in reply to message #107067

Are you sure it's not something silly like using an uninitialised variable/buffer?

VinceH
Message #107072, posted by VincceH at 10:05, 17/4/2008, in reply to message #107068

With the front end, I'm pretty damned sure it must be - it looks like there's something up with the global vars I use for window and icon handles. I've tracked the NSFTH down to the choices loading, the point where the options are selected in the main window. However, if I fudge it so that the first choice is ignored, the second one triggers it... fudge it to ignore that, the third one triggers it... and so on.
Which suggests some/many/all of the globals are either unset or being trampled on - but following it through, no new code is called on the way to the NSFTH. AFAICS it's all old, previously working code.
On the bright side, the fact that both the front and back end are affected (though slightly differently) points at the library code, so I can concentrate on that. Perhaps I've made a minor mod somewhere that I've forgotten about and which is brokened.
Now let's watch as my PDA won't post to the forums...

VinceH
Message #107073, posted by VincceH at 10:09, 17/4/2008, in reply to message #107072

> Now let's watch as my PDA won't post to the forums...
It did, though, so at least something's working today. I might have been thinking of my old N70.

VinceH
Message #107074, posted by VincceH at 10:14, 17/4/2008, in reply to message #107072

> Now let's watch as my PDA won't post to the forums...
It did, though, so at least something's working today. I might have been thinking of my old N70.

VinceH
Message #107077, posted by VincceH at 10:25, 17/4/2008, in reply to message #107073

I've just realised that the datestamps on my source files are a red herring - when I was testing the 'updater' code, I was changing the system date to random ones in the past as a quick and easy way to give files old dates - so if I modified or added any functions at the same time... bah!
(I didn't touch the front end at all while doing that, so it still looks like broken library code to me. I'll have a careful plod through everything that's called up to the NSFTH tonight and see if I can spot a brokened bit which looks like a trampler.)

Jeffrey Lee
Message #107079, posted by Phlamethrower at 11:02, 17/4/2008, in reply to message #107077

> I've just realised that the datestamps on my source files are a red herring - when I was testing the 'updater' code, I was changing the system date to random ones in the past as a quick and easy way to give files old dates - so if I modified or added any functions at the same time... bah!
In that case... are you sure your makefile isn't getting confused and not rebuilding some of the source files?

VinceH
Message #107080, posted by VincceH at 12:16, 17/4/2008, in reply to message #107079

I (still) don't use makefiles - I've always just compiled using the !cc front end by dragging in the first/main file which #includes any others that are needed.
However, I wonder if the compiler itself looks at any date stamps (in O?) and gets confused?
Another oddity I noticed now I'm thinking about it is that the newer (brokened) front end binary is slightly smaller than the older (unbrokened) one. If anything, it should be a bit bigger. A stray // or ten somewhere, perhaps.
Hmm, or a whole chunk of code enclosed in /* */, which I've accidentally made StrongEd do a couple of times - but I usually spot it when it happens and correct it. (It's actually a mouse problem: a click often 'follows through', and a second click that never happened gets acted upon.)

VinceH
Message #107081, posted by VincceH at 17:03, 17/4/2008, in reply to message #107080

Well it's definitely not a datestamp issue, anyway. I've just re-stamped everything with today's date and recompiled (library first, then the front end, then the back end) and all problems remain.
So it's plod through the code time.

VinceH
Message #107082, posted by VincceH at 18:39, 17/4/2008, in reply to message #107081

I think I've cracked it.
Once upon a time, there were three little pointers, initialised via a calloc; as certain files were read, some information from them was parsed into the space those pointers referenced.
Somewhere over the last few days, I decided the information they held would be useful in a number of other places (not least, in both the front and back ends) so I moved the relevant functions into the shared code, and moved the pointers into the global variable space.
Hey presto - that brokened both the front and back ends in bizarre and unpredictable ways.
Of course, in the process of finding it, I've changed a few other things as I went (eg tidying up function names such that I can see which file they're in from the prefix, instead of saying "that one is probably in this file...") - which means I'll have brokened lots of other stuff in the process. But hey, at least these are breakendages that will be highlighted at the compilation stage...
So, yay!

VinceH
Message #107083, posted by VincceH at 23:10, 17/4/2008, in reply to message #107082

> I think I've cracked it.
Sort of, anyway. The front end now works as it did before - but the back end is still falling over. I suppose I shouldn't be surprised that only one is fixed, given that they were failing in different ways (NSFTH in the front end versus "malloc failed, heap overwritten" in the back end).
But having the front end up and running again is A Good Thing™ - it gets me out of panic mode.

Adam
Message #107090, posted by adamr at 08:47, 19/4/2008, in reply to message #107080
Member
Posts: 112

> I (still) don't use makefiles - I've always just compiled using the !cc front end by dragging in the first/main file which #includes any others that are needed.
Doesn't this mean that every time you make a change to one file in your app you have to sit around waiting for every single source file to be compiled? Can anyone say if CC provides a better way of managing a project?
Also, for malloc/free checking, you could use: http://www.geocities.com/siliconvalley/horizon/8596/fortify.html
...which works a treat for me.
Finally, what's NSFTH?
Adam

VinceH
Message #107094, posted by VincceH at 13:11, 19/4/2008, in reply to message #107090

> I (still) don't use makefiles - I've always just compiled using the !cc front end by dragging in the first/main file which #includes any others that are needed.
> Doesn't this mean that every time you make a change to one file in your app you have to sit around waiting for every single source file to be compiled?
Yes, but sitting around waiting is hardly arduous for an application the size of WebChange. It's not as if having started it compiling there's time to make a cup of tea - it's more a case that having started it compiling, there's time to think it's time to make a cup of tea. And once that thought has been thought, the job is done.
If it was something the size of Firefox, it would be a different fettle of kish.
> Can anyone say if CC provides a better way of managing a project?
Simply making appropriate use of libraries helps a great deal - once you have a function that you know works (er... but see below), and might be useful elsewhere, you can build it into a library and out of the program sources. The library only needs recompiling when you make a change to it.
> Also, for malloc/free checking, you could use: http://www.geocities.com/siliconvalley/horizon/8596/fortify.html
> ...which works a treat for me.
Maybe I should - but I've found the problem now, and it was incredibly stupid - and has cost me several days of programming work, with barely a week until the show and with the program not only unfinished, but now further from being finished than it was when the problem hit.
For some reason, in my standard library calloc function, I was doing the following (everything but the important bits left out here because I'm lazy):

    // function declaration, variables, etc snipped
    address=calloc(space,of);
    // reaction to the allocation failing snipped
    // here's the incredibly stupid bit:
    address[space]=0;
That last line is a recent addition AFAICS (my backup from a couple of weeks ago doesn't include it) and breaks even the simplest of programs - and this one is far from simple. If the function allocates space for a character array (my most common use) then the last line is writing a zero to the first byte after that array.
eg: with space=1000 and of=1, it allocates address[0] to address[999], then writes a zero to address[space], which is address[1000].
Doh!
And in my code trawling, until 20 minutes ago, the one function I wasn't checking was the library function that actually performed the calloc, because I didn't expect to find a line which actually wrote to the allocated memory (well, just above it) in that function.
I can't think why I added it - assuming I meant to zero the last byte, it serves no purpose, because calloc zeros the allocated space anyway. I can only guess that I might have added it after a tipple or three and was perhaps not as clear headed as I should have been.
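If the intention was to guarantee a terminating zero after `space` characters, that terminator needs a byte of its own. A minimal sketch, assuming a character buffer with `of` fixed at 1; `calloc_string` is a hypothetical helper, not the library's actual function.

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* calloc(space, 1) makes indices 0 .. space-1 valid, so address[space] is the
   first byte after the block. Hypothetical helper: reserve one extra byte so
   a string of exactly 'space' characters still has room for its '\0'. */
static char *calloc_string(size_t space)
{
    assert(space < (size_t)-1);            /* space + 1 must not wrap to 0 */
    char *address = calloc(space + 1, 1);  /* indices 0 .. space are valid */
    if (address == NULL)
        return NULL;
    address[space] = '\0';   /* in bounds now, and redundant: calloc zeroed it */
    return address;
}
```

With space=1000, indices 0 to 999 hold text and index 1000 holds the terminator, all inside the 1001 bytes allocated.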
> Finally, what's NSFTH?
"No stack for trap handler" - it's one of RISC OS's less friendly and helpful error messages, and clearly means "PROGRAMMER = TWONK".

VinceH
Message #107096, posted by VincceH at 21:56, 19/4/2008, in reply to message #107094

Woohoo!
I've now pretty much undone everything that came of my fast cure for excess hair (removed things I'd added in my efforts to find the problem, replaced things I'd removed for the same reason - and spotted a couple of minor improvements here and there) and it's now back to where it was when my idiotic addition to my calloc function reared up and bit me.
At that point, I had just added the basic search and replace code, and had hit the button (actually, run the obey file) to give it its first test.
That was approximately three days and one hour ago - so today, three days and one hour later, I've been able to hit that same button (actually, run that same obey file) again to give that code its (same) first test.
And given that everything went pear shaped on the 16th - I'm pleased to say the first test of the replace code was a complete and utter success. It worked faultlessly.
The downside, of course, is that I'm three and a half days behind on where I should be with it, given that the show is next week. And tomorrow is the only full day I have.
Expect me to be very twitchy and probably completely bald by Saturday.
Ho hum.

VinceH
Message #107097, posted by VincceH at 22:01, 19/4/2008, in reply to message #107096

And sod it if I'm not going to lose a chunk of tomorrow as well, because I've now decided that I'm going to have a hangover tomorrow morning.
I deserve one!
Or two.
Or three.
Or however many it takes.
Cheers!

Jeffrey Lee
Message #107098, posted by Phlamethrower at 00:18, 20/4/2008, in reply to message #107097

Wasn't being drunk the cause of your problem anyway?

VinceH
Message #107099, posted by VincceH at 08:34, 20/4/2008, in reply to message #107098

Probably.
Although I suspect more a case of 'slightly tipsy' than drunk. I'm sure that if I was just drunk, it would have been more like
adrdessp[ace]=-0';
rather than address[space]=0;
And then it just wouldn't have compiled and I'd have found the problem.
(That said, I have found that when my programming is done under the influence, I just don't make the typos I do when typing under the same influence in other contexts. So maybe drunk is the right word.)
Anyway, no hangover, so I can do stuff this morning - although I might pretend I have one just to get away from the screen for a few hours. TBH, after the last few days I'm sick of looking at WebChange's source code, which was the point of opening a bottle last night!

VinceH
Message #107100, posted by VincceH at 10:18, 20/4/2008, in reply to message #107099

> Anyway, no hangover, so I can do stuff this morning - although I might pretend I have one just to get away from the screen for a few hours. TBH, after the last few days I'm sick of looking at WebChange's source code, which was the point of opening a bottle last night!
And then I promptly went on to apply some wildcard tests to the search and replace function, discovered that they did something nasty to the HTML files, and spent the last hour or so trying to track down the bug.
It's taken me just over an hour to realise that the wildcard replace code was actually working faultlessly the whole time. The 'problem' was my search term - and is actually something I've warned users about since I first added wildcards all those years ago.
Consider:
My trading name is Soft Rock Software.
My test data was a copy of my site.
So the words "Soft Rock Software" appear in numerous places on the pages, as do links and references to "softrock".
My wildcard search term was "Sof*ock"
DOH!
The first match was fine: "Soft Rock" in the title tag.
The second match had wiped a chunk out - everything from just after the first match to the tail end of a link.
It had (quite correctly) matched the "Sof" in "Software" to the "ock" in a link to "softrock.co.uk", with the wildcard matching (and therefore wiping out) everything in between.
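The behaviour can be reproduced with a small sketch. It assumes, from the description above rather than from WebChange's source, that a `prefix*suffix` term matches case-insensitively, taking the leftmost prefix and then the nearest suffix after it, and that each search resumes where the previous match ended. `find_ci` and `wild_match` are hypothetical names.

```c
#include <ctype.h>
#include <string.h>
#include <assert.h>

/* Case-insensitive search for needle in hay, starting at index from.
   Returns the index of the first occurrence, or -1 if there is none. */
static long find_ci(const char *hay, size_t from, const char *needle)
{
    size_t n = strlen(needle), h = strlen(hay);
    for (size_t i = from; i + n <= h; i++) {
        size_t k = 0;
        while (k < n && tolower((unsigned char)hay[i + k]) ==
                        tolower((unsigned char)needle[k]))
            k++;
        if (k == n)
            return (long)i;
    }
    return -1;
}

/* Match "prefix*suffix": the leftmost prefix at or after from, then the
   nearest suffix after it. Everything in between is swallowed by the '*'.
   On success stores the half-open span [*start, *end) and returns 1. */
static int wild_match(const char *text, size_t from, const char *prefix,
                      const char *suffix, size_t *start, size_t *end)
{
    assert(text != NULL && prefix != NULL && suffix != NULL);
    long p = find_ci(text, from, prefix);
    if (p < 0)
        return 0;
    long s = find_ci(text, (size_t)p + strlen(prefix), suffix);
    if (s < 0)
        return 0;
    *start = (size_t)p;
    *end = (size_t)s + strlen(suffix);
    return 1;
}
```

Run against a made-up fragment such as `<title>Soft Rock Software</title><a href="http://www.softrock.co.uk/">`, the first call in this sketch returns "Soft Rock", while the second, resuming just after it, starts at "Software" and runs through the end of "www.softrock": the same kind of chunk the replace wiped out.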
I definitely need a break.