#91

Old Wolf wrote:
> Richard Heathfield <r...@see.sig.invalid> wrote:
> .... snip ...
>
>> Let me give you an example from ordinary English, where
>> whitespace delimiters are not sufficient:
>>
>> "What did he say?", said Albert.
>> "He just said, 'I'll be there', I think", replied the captain.
>>
>
> Now, consider the whitespace-separated tokens:
>
> A bit sidetracked from the original thread, but is
> there actually any problem here besides identifying
> whether a ' symbol is a quote mark or an apostrophe?

And I gather you consider that a trivial problem? Please describe
your algorithm.

#92

jameskuyper wrote:
> CBFalconer wrote:
>
> His question was basically about how to translate the C++ algorithm to
> C. So what you're saying is that he must answer his own question
> before he can ask it here? I'm curious, where do you think he should
> go to get help with the translation, since you've ruled out coming
> here for help with it; and C++BFalconer would presumably rule out
> going to clc++ for such a question? And when he finally does ask it,
> according to you, his question is required to take the form "How do I
> translate this algorithm {algorithm already translated into C}, into
> C?". That's patently ridiculous.

You certainly make a good point.

#93

Keith Thompson wrote:
> CBFalconer <cbfalconer> writes:
>
> Yes, given the definition above, this string:
>
> " "
>
> contains two "words". Are you suggesting that that's a problem?

I didn't specify a string. I meant those characters contiguous (i.e.
one strictly following the other) in the input stream. The detection
I specified above can be done with one char look ahead. The presence
(and necessity) of such a look ahead scheme may not be obvious to the
casual reader. In C it revolves around the ungetc() function.

>
> Obviously a program that's intended to recognize C identifiers would
> have to use a different rule. But the OP didn't say anything about C
> identifiers, so I'm not sure why you're bringing them up.
>
> Incidentally, on my initial reading of your followup, I thought your
> use of the word "contiguous" was meant to be related to the use in
> arnuld's definition of "word" (the one I had suggested earlier). In
> fact, they're quite different; in the definition of "word" it refers
> to the characters being adjacent in the input, not to their numeric
> representations. A more careful reading of what you wrote indicates
> that you just meant that the notation 'a'..'z' doesn't make sense
> unless the representations of those characters are numerically
> contiguous. I thought I should point this out in case anyone else is
> confused.

Right. I should have specified 'the values of the chars are
contiguous'. The point being that ASCII works fine, but EBCDIC
doesn't. The C lexer will be a good example, because what it has to
detect is well defined.

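For readers who have not met the one-character lookahead mentioned in #93, here is a minimal sketch of the idea. The function name read_letters and its interface are assumptions made for illustration; nothing like it appears in the thread. It only relies on the standard guarantee that one character of pushback via ungetc() is available.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper, not from the thread: reads a run of letters from
   fp into buf (at most size - 1 characters, always NUL-terminated).
   The first character that is not a letter has to be read in order to
   detect the end of the run, so it is pushed back with ungetc() and
   stays available to whoever reads the stream next.
   Returns the number of letters stored. */
size_t read_letters(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int ch;

    if (size == 0)
        return 0;

    while (n + 1 < size) {
        ch = getc(fp);
        if (ch == EOF)
            break;
        if (!isalpha((unsigned char)ch)) {
            ungetc(ch, fp);   /* lookahead: give the delimiter back */
            break;
        }
        buf[n++] = (char)ch;
    }

    buf[n] = '\0';
    return n;
}
```

Given the input "abc123", a call stores "abc" and the next getc() on the same stream still returns '1'. Without the ungetc() call that '1' would have been silently consumed.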
#94

> On Mon, 15 Sep 2008 14:43:36 +0100, Ben Bacarisse wrote:

> I was being a bit vague. Let's leave actual array pointers out of
> this. I mean that Richard was talking about changing the char ** as
> seen from the calling function. The thing you are intending to pass,
> a char **, is in some sense a pointer to the whole array: from it all
> of the array's data is accessible. The trouble is you can't
> change this char ** inside the function -- not in a way that has any
> effect outside. All you can do is change the various things it points
> to.

> ...SNIP....

> If a function needs to change an int, you pass an int *. If it needs
> to change int *, you pass an int **. If it needs to change an int **
> you must pass an int ***.

> ... SNIP....

> Typo! I meant you *can't* write any value into **ppc! Sorry. There
> are two typos, I now see. It should have read: "*ppc is NULL -- you
> set it to be NULL before the call. You can't write any value into
> **ppc."

see my new post titled "pointers passed by copying ?"

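The rule quoted in #94 (to change an int pass an int *, to change an int * pass an int **) can be sketched in a few lines. The function names below are made up for illustration and are not from the thread.

```c
/* Hypothetical illustration of "pass a pointer to the thing you want
   to change". None of these names come from the thread. */

/* Changes the caller's int: receives its address. */
void set_int(int *p, int value)
{
    *p = value;
}

/* Tries to change the caller's int *, but only receives a copy of it.
   The assignment affects the local copy and is lost on return. */
void retarget_copy(int *p, int *target)
{
    p = target;
    (void)p;   /* silence "set but not used" warnings */
}

/* Changes the caller's int *: receives the address of that pointer. */
void retarget(int **pp, int *target)
{
    *pp = target;
}
```

With int a = 1, b = 2 and int *p = &a, the call set_int(p, 5) changes a, retarget_copy(p, &b) leaves p pointing at a, and retarget(&p, &b) makes p point at b.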
#95

Old Wolf said:

> On Sep 15, 4:50 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
>
> A bit sidetracked from the original thread, but is
> there actually any problem here besides identifying
> whether a ' symbol is a quote mark or an apostrophe?

I think it's about here that I like to pretend I'm from Missouri.

Show me.

#96

On Sep 16, 3:43 pm, CBFalconer <cbfalco> wrote:
> Old Wolf wrote:
> > Richard Heathfield <r...@see.sig.invalid> wrote:
>
> >> "He just said, 'I'll be there', I think", replied the captain.
>
> > A bit sidetracked from the original thread, but is
> > there actually any problem here besides identifying
> > whether a ' symbol is a quote mark or an apostrophe?
>
> And I gather you consider that a trivial problem? Please describe
> your algorithm.

Not at all, I was just checking that there
wasn't some other problem besides this one,
that I hadn't seen.

#97

Old Wolf said:

> On Sep 16, 3:43 pm, CBFalconer <cbfalco> wrote:
>
> Not at all, I was just checking that there
> wasn't some other problem besides this one,
> that I hadn't seen.

Hyphens are another issue: "will-o'-the-wisp" illustrates where both the
hyphen and the apostrophe are part of the word, but there are situ-
ations where the hyphen (and newline) are not part of the word, just as
there are situations where 'apostrophes' are not part of the word.

Then there's the whole issue of "what is an alphabetic character"? If we
simply say A-Za-z, we exclude a vast range of words from languages such as
French, German, Spanish, Polish, and Russian. I'm not saying we shouldn't
do that, but we should be aware that the decision is costly in terms of
internationalisation.

Is 'C++' a word? How about 'G#m'? You might or might not consider that to
be a word, but a musician might. And yet they may have a very different
opinion about 'H#m'.

What about numbers? Is 42 a word? How about 3Com?

Is the copyright symbol a word? What about the trademark and registered
trademark symbols? Can they be part of a word? Consider, for example,
Microsoft(R).

How about full stops (or 'periods' as some people call them)? Consider:
"U.S.A.", "B.B.C.", "etc.", etc.

What about &? Is that a word?

To any one of these questions, you may say, "yes, that's allowable as part
of a word", or you may say, "no, it's not allowable". But your decision may
well differ from someone else's decision. And having decided, how do you
design your algorithm so that it accepts "fo'c'sle" as one word rather than
three? A dictionary? If you're going to do /that/, the algorithm is indeed
trivial (modulo bugs):

1. start with s = "" and an empty word list
2. c = getch
3. if EOF continue from 8.
4. s += c
5. if s in dictionary continue from 2.
6. else
     s -= c.
     if s != "" add s to word list
     s = c
7. continue from 2.
8. if s != "" add s to word list
9. stop

but now you have to list in your dictionary every single character
combination that you consider to be a word. Big dictionary. (For a start,
every word will need at least three entries: "word", "Word", "WORD".)

The dictionary approach is clumsy in the extreme, and the algorithmic
approach gets more and more difficult as you get pickier and pickier about
what does and what does not constitute a word.

#98

On Sep 17, 10:02 am, Richard Heathfield <r...@see.sig.invalid> wrote:
> but now you have to list in your dictionary every single character
> combination that you consider to be a word. Big dictionary. (For a start,
> every word will need at least three entries: "word", "Word", "WORD".)
>
> The dictionary approach is clumsy in the extreme, and the algorithmic
> approach gets more and more difficult as you get pickier and pickier about
> what does and what does not constitute a word.

Surely there is no approach other than using
a sophisticated dictionary. For example:

  'Tis the season to be playin'

there is no rule to deduce whether we have quote marks
or apostrophes, besides knowing that 'Tis is a word.

The dictionary can include rules such as the fact
that if "abcd" is a word, then so is "Abcd"; it can
know that acronyms can be written with periods, and so on.

Now where it gets harder is if you have to accept text
from people who make spelling mistakes and typoes :)

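A rough sketch of the kind of dictionary rule suggested in #98, where a capitalised form is accepted whenever the lower-case form is listed. The word list, the 64-byte scratch buffer, and the names in_dictionary and is_word are all assumptions made for illustration. A real dictionary would be far larger and would not use a linear search.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Tiny illustrative word list; the entries are taken from the example
   sentence in #98. */
static const char *const dictionary[] = {
    "'Tis", "the", "season", "to", "be", "playin'"
};

/* Exact, case-sensitive lookup by linear search. */
static int in_dictionary(const char *word)
{
    size_t i;

    for (i = 0; i < sizeof dictionary / sizeof dictionary[0]; ++i)
        if (strcmp(dictionary[i], word) == 0)
            return 1;
    return 0;
}

/* Hypothetical rule: a word is accepted if it is listed exactly, or if
   lowering its first character gives a listed word ("Season" is
   accepted because "season" is listed). Returns 1 if accepted. */
int is_word(const char *word)
{
    char copy[64];
    size_t len = strlen(word);

    if (in_dictionary(word))
        return 1;
    if (len == 0 || len >= sizeof copy)
        return 0;

    memcpy(copy, word, len + 1);
    copy[0] = (char)tolower((unsigned char)copy[0]);
    return in_dictionary(copy);
}
```

Note that "'Tis" is accepted only because it is listed verbatim, which is the point made in the post: no general rule tells the leading apostrophe apart from an opening quote mark.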
#99

Old Wolf said:

> On Sep 17, 10:02 am, Richard Heathfield <r...@see.sig.invalid> wrote:
>> but now you have to list in your dictionary every single character
>> combination that you consider to be a word. Big dictionary. (For a
>> start, every word will need at least three entries: "word", "Word",
>> "WORD".)
>>
>> The dictionary approach is clumsy in the extreme, and the algorithmic
>> approach gets more and more difficult as you get pickier and pickier
>> about what does and what does not constitute a word.
>
> Surely there is no approach other than using
> a sophisticated dictionary.

Yes, there is. There is the "good enough for Professor Jenkins[1]"
approach, in which we define "word" as a non-empty contiguous sequence of
non-whitespace characters delimited on the left by SOF or whitespace and on
the right by EOF or whitespace.

This is not only good enough for Professor Jenkins[1] but frequently good
enough in the Real World, too. Not that the Real World has any bearing, but
I just thought I'd mention it.

[1] cf Gary Larson (the one with the duck)

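The whitespace-delimited definition of "word" from #99 is easy to state in code. The sketch below counts such words in a string. The name count_words is an assumption, and "whitespace" is taken to mean whatever isspace() reports in the current locale.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts maximal runs of non-whitespace
   characters in text, i.e. "words" in the sense of #99. */
size_t count_words(const char *text)
{
    size_t count = 0;
    int in_word = 0;   /* 1 while we are inside a run of non-whitespace */

    for (; *text != '\0'; ++text) {
        if (isspace((unsigned char)*text)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            ++count;
        }
    }
    return count;
}
```

Under this definition the captain's sentence from earlier in the thread has eleven words, and punctuation simply stays attached to its neighbours (for example there', and think",).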
#100

> On Mon, 15 Sep 2008 09:28:36 +0000, Richard Heathfield wrote:

> ...SNIP...

> But what if something goes wrong? You'll need to be able to report an
> error. The natural way to do this is via a return value, which means we
> can't use that value for either the list or the count, and that leads us
> to:

what we will do with that return value ? If something wrong occurs I can
simply exit the program telling the user that he did some thing stupid
and he is responsible for that.

> int get_words(char ***, size_t *);
>
> Since they don't need to modify the caller's status, sort_words and
> print_words can be of type int(char **, size_t).

I think there is qsort in std. lib. , hence we can use that but I don't
know whether it modifies the original array or not.

> Up to you, but I wouldn't bother setting a limit (or, if I did, I'd set it
> at a million or so, and treat any string longer than that as a reportable
> error). With dynamic allocation, you don't /need/ to set a limit; you
> simply allocate as you go, and reallocate if necessary.

okay, I will write the program in parts. First we will write a simple
program that will ask the user to input and we will store that word
dynamically using calloc in some array. It will be called get_single_word
and it will form the basis of get_words function which will store all
words in an array.

get_single_word returns an int because I want to use
get_single_word in get_words like this:

  while( get_single_word )
    {
      /* code for get_words */
    }

Here is my code for get_single_word. PROBLEM: it does not print anything
I entered:

  /* a program to get a single word from stdin */

  #include <stdio.h>
  #include <stdlib.h>

  enum { AVERAGE_SIZE = 28 };

  int get_single_word( char* );

  int main( void )
  {
    char* pw; /* pw means pointer to word */

    get_single_word( pw );

    printf("word you entered is: %s\n", pw);

    return 0;
  }

  int get_single_word( char* pc )
  {
    int idx;
    int ch;
    char *pc_begin;

    pc = calloc(AVERAGE_SIZE-1, sizeof(char));
    pc_begin = pc;

    if( (! pc) )
      {
        perror("can not allocate memory, sorry babe!");
        return 1;
      }

    for( idx = 0; ( (ch = getchar()) != EOF ); ++idx, ++pc )
      {
        if( AVERAGE_SIZE == idx )
          {
            /* use realloc here which I have no idea how to write */
          }

        *pc = ch;
      }

    *++pc = '\0';

    free(pc_begin);

    return 0;
  }

=================== OUTPUT ==================
[arnuld@dune ztest]$ gcc -ansi -pedantic -Wall -Wextra test.c
[arnuld@dune ztest]$ ./a.out
like
word you entered is:
[arnuld@dune ztest]$

#101

> On Wed, 17 Sep 2008 10:07:59 +0500, arnuld wrote:

> .... SNIP...

> Here is my code for get_single_word. PROBLEM: it does not print anything
> I entered:

> .... SNIP...

I have even tried using pointer to pointer but that still leaves me with
the same problem:

  int main( void )
  {
    char* pw; /* pw means pointer to word */

    get_single_word( &pw );

    printf("word you entered is: %s\n", pw);

    return 0;
  }

  int get_single_word( char** pc )
  {
    int idx;
    int ch;
    char *pc_begin;

    *pc = calloc(AVERAGE_SIZE-1, sizeof(char));
    pc_begin = *pc;

    if( (! *pc) )
      {
        perror("can not allocate memory, sorry babe!");
        return 1;
      }

    for( idx = 0; ( (ch = getchar()) != EOF ); ++idx, ++*pc )
      {
        if( AVERAGE_SIZE == idx )
          {
            /* use realloc here which I have no idea how to write */
          }

        **pc = ch;
      }

    *++pc = '\0';

    free(pc_begin);

    return 0;
  }

#102

On Tue, 16 Sep 2008 06:28:30 +0000, Richard Heathfield posted:

> Old Wolf said:
>> I think it's about here that I like to pretend I'm from Missouri.
>
> Show me.

As it polls redder with the Palin nomination, Huck sighed, 'Ashcroft sucks."

#103

arnuld said:

>> On Mon, 15 Sep 2008 09:28:36 +0000, Richard Heathfield wrote:
>>> ...SNIP...
>
>> But what if something goes wrong? You'll need to be able to report an
>> error. The natural way to do this is via a return value, which means we
>> can't use that value for either the list or the count, and that leads us
>> to:
>
> what we will do with that return value ? If something wrong occurs I can
> simply exit the program telling the user that he did some thing stupid
> and he is responsible for that.

Yes, you could do that, except that (a) it might not be the user's stupid
fault (it may simply be that your machine is low on memory), and (b) there
may be a way to recover. If this is a mere learning exercise and the
learning task is not error recovery, then yes, by all means bomb out.
That's the "student solution" and, like cryptosporidium, is very common.

>> int get_words(char ***, size_t *);
>>
>> Since they don't need to modify the caller's status, sort_words and
>> print_words can be of type int(char **, size_t).
>
> I think there is qsort in std. lib. , hence we can use that but I don't
> know whether it modifies the original array or not.

It does modify the original array (by sorting it, would you believe?), but
it won't modify the *pointer*, the one that indicates the location of the
first element of the array.

>> Up to you, but I wouldn't bother setting a limit (or, if I did, I'd set
>> it at a million or so, and treat any string longer than that as a
>> reportable error). With dynamic allocation, you don't /need/ to set a
>> limit; you simply allocate as you go, and reallocate if necessary.
>
> okay, I will write the program in parts. First we will write a simple
> program that will ask the user to input and we will store that word
> dynamically using calloc in some array. It will be called get_single_word
> and it will form the basis of get_words function which will store all
> words in an array.

Good. This sounds like functional decomposition - always a good way to
start off.

> get_single_word returns an int because I want to use
> get_single_word in get_words like this:
>
> while( get_single_word )
> {
> /* code for get_words */
> }

Presumably that's pseudocode, and you intend get_single_word to be a
function call, and the "code for get_words" consists of inserting into an
array the word retrieved by get_single_word(). Yes, that's reasonable.

[..]

> int get_single_word( char* );
>
> int main( void )
> {
> char* pw; /* pw means pointer to word */
>
> get_single_word( pw );

As the program prepares to call get_single_word, it evaluates pw - but the
value of pw is indeterminate, so evaluating it results in undefined
behaviour.

In get_single_word, you intend to modify the pointer (by calloc and
possibly realloc), and that change needs to 'stick' in the caller, so it's
no good just passing the value. You must pass the /address/ of pw, and make
other necessary modifications to the function interface.

This is why, on this occasion, your program didn't output what you expected
it to output.

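On the qsort question answered in #103: a minimal sketch of sorting an array of strings in place. The prototype int(char **, size_t) for sort_words comes from the thread. The body, the comparison function compare_words, and the choice of return values are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* qsort hands the comparison function pointers to the array ELEMENTS.
   The elements here are char *, so each argument is really a
   pointer to a char *. */
static int compare_words(const void *a, const void *b)
{
    const char *const *wa = a;
    const char *const *wb = b;

    return strcmp(*wa, *wb);
}

/* Sorts words[0..count-1] in place. The array contents are reordered,
   but the caller's pointer to the first element is untouched, which is
   why a char ** (rather than a char ***) is enough here.
   Returns 0 on success, 1 if handed a null array with a non-zero count
   (an assumed convention, not one stated in the thread). */
int sort_words(char **words, size_t count)
{
    if (words == NULL && count > 0)
        return 1;
    if (count > 0)
        qsort(words, count, sizeof *words, compare_words);
    return 0;
}
```

Only the char * values inside the array are moved around. The strings they point to are neither copied nor modified.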
#104

arnuld said:

> I have even tried using pointer to pointer but that still leaves me with
> the same problem:

No, it leaves you with a different problem. The symptoms may or may not be
the same, but the problem is different.

> int main( void )
> {
>   char* pw; /* pw means pointer to word */
>
>   get_single_word( &pw );
>
>   printf("word you entered is: %s\n", pw);

You need <stdio.h> if you wish to call printf.

>   return 0;
> }
>
> int get_single_word( char** pc )
> {
>   int idx;
>   int ch;
>   char *pc_begin;
>
>   *pc = calloc(AVERAGE_SIZE-1, sizeof(char));

You need <stdlib.h> if you wish to call calloc. Also, why nail the call to
the type? This is better:

  *pc = calloc(AVERAGE_SIZE - 1, sizeof **pc);

>   pc_begin = *pc;
>
>   if( (! *pc) )
>     {
>       perror("can not allocate memory, sorry babe!");
>       return 1;
>     }

Okay - although it's better not to embed messages like this in library
functions if you can avoid it. Don't forget to check that return value in
the caller.

>   for( idx = 0; ( (ch = getchar()) != EOF ); ++idx, ++*pc )

I thought you wanted to stop at whitespace? Also, it's better to move
pc_begin than *pc, if you must move either of them. Given that you have idx
keeping track of things, I see no reason to modify *pc (and plenty of
reasons not to), and no reason for pc_begin to exist at all. You can simply
do

  (*pc)[idx] = ch;

If you don't like the (), you could keep pc_begin, point it to *pc as you
have done, and just do:

  pc_begin[idx] = ch;

instead. No need to increment any pointers.

>     {
>       if( AVERAGE_SIZE == idx )
>         {
>           /* use realloc here which I have no idea how to write */

I'll show you how shortly. In the meantime, let's continue to look at what
you've got.

>   *++pc = '\0';

pc is char **, so ++pc is char ** (and utterly invalid), and *++pc is
char *, so you're setting a wild pointer to 0. Not good. Could be worse, but
not good.

>   free(pc_begin);

Why allocate it at all, if you're going to throw it away before you've even
used it?

Here's a better way to do this - still not a great way, but a better way. I
haven't tested it, by the way, but I'd be mildly surprised if it doesn't
work perfectly first time.

  #include <stdio.h>
  #include <stdlib.h>
  #include <ctype.h>

  #define AVERAGE_SIZE 16

  #define GSW_OK 0        /* success */
  #define GSW_ENOMEM 1    /* can't allocate buffer - no word fetched */
  #define GSW_ENORESIZE 2 /* can't resize buffer - partial word fetched */

  int get_single_word( char** pc )
  {
    int rc = GSW_ENOMEM; /* if we succeed, we'll update the status */
    size_t idx = 0;
    int ch;
    char *pc_begin = NULL;
    size_t cursize = AVERAGE_SIZE;
    char *new = NULL;

    *pc = calloc(cursize, sizeof **pc);
    if(*pc != NULL)
    {
      rc = GSW_OK; /* so far so good */
      pc_begin = *pc;

      while((ch = getchar()) != EOF && isspace((unsigned char)ch))
      {
        continue; /* skipping leading whitespace */
      }

      /* ch now holds the first character of the word (or EOF), so it
         must be tested and stored before anything else is read */
      while(GSW_OK == rc && ch != EOF && !isspace((unsigned char)ch))
      {
        if(cursize == idx + 1)
        {
          new = realloc(*pc, 2 * cursize * sizeof *new);
          if(new == NULL)
          {
            rc = GSW_ENORESIZE; /* error - couldn't enlarge */
            pc_begin[idx] = '\0';
          }
          else
          {
            *pc = new;
            pc_begin = new; /* the block may have moved */
            cursize *= 2;   /* remember the new capacity */
          }
        }
        if(GSW_OK == rc)
        {
          pc_begin[idx++] = ch;
          ch = getchar();
        }
      }
    }

    if(*pc != NULL)
    {
      pc_begin[idx] = '\0';
    }

    return rc;
  }

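The realloc step that arnuld asked about in #101 boils down to one pattern: grow through a temporary pointer, so that the old block is not lost if realloc fails. Below is a small sketch of that pattern on its own, separated from the input loop. The struct word_buf and the function word_buf_append are hypothetical names, and the starting capacity of 16 and the doubling policy are assumptions.

```c
#include <stdlib.h>

/* Hypothetical growable string buffer, not from the thread.
   Initialise with { NULL, 0, 0 }. */
struct word_buf {
    char *data;   /* NUL-terminated contents, or NULL when empty */
    size_t len;   /* characters stored, not counting the NUL */
    size_t cap;   /* bytes allocated */
};

/* Appends ch and keeps the buffer NUL-terminated.
   Returns 0 on success. Returns 1 if memory could not be obtained, in
   which case the buffer is left exactly as it was and still valid. */
int word_buf_append(struct word_buf *wb, char ch)
{
    if (wb->len + 1 >= wb->cap) {
        size_t new_cap = wb->cap ? 2 * wb->cap : 16;
        /* Never write wb->data = realloc(wb->data, ...): on failure
           that overwrites the only pointer to the old block. */
        char *tmp = realloc(wb->data, new_cap);

        if (tmp == NULL)
            return 1;
        wb->data = tmp;
        wb->cap = new_cap;
    }

    wb->data[wb->len++] = ch;
    wb->data[wb->len] = '\0';
    return 0;
}
```

The caller owns the memory and is expected to free(wb.data) when finished. Because realloc(NULL, n) behaves like malloc(n), no separate initial allocation is needed.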
#105

> On Wed, 17 Sep 2008 06:29:37 +0000, Richard Heathfield wrote:

> Yes, you could do that, except that (a) it might not be the user's stupid
> fault (it may simply be that your machine is low on memory), and (b) there
> may be a way to recover. If this is a mere learning exercise and the
> learning task is not error recovery, then yes, by all means bomb out.
> That's the "student solution" and, like cryptosporidium, is very common.

http://en.wikipedia.org/wiki/Cryptosporidium

...aye..., so let's learn the practical aspects like error-recovery too. I
don't like academic solutions BTW