 #91  
09-16-08, 04:43 AM
CBFalconer
Old Wolf wrote:
> Richard Heathfield <r...@see.sig.invalid> wrote:
>

.... snip ...
>
>> Let me give you an example from ordinary English, where
>> whitespace delimiters are not sufficient:
>>
>> "What did he say?", said Albert.
>> "He just said, 'I'll be there', I think", replied the captain.
>>
>> Now, consider the whitespace-separated tokens:

>
> A bit sidetracked from the original thread, but is
> there actually any problem here besides identifying
> whether a ' symbol is a quote mark or an apostrophe?


And I gather you consider that a trivial problem? Please describe
your algorithm.
 #92  
09-16-08, 04:46 AM
CBFalconer
jameskuyper wrote:
> CBFalconer wrote:
>
> His question was basically about how to translate the C++ algorithm to
> C. So what you're saying is that he must answer his own question
> before he can ask it here? I'm curious, where do you think he should
> go to get help with the translation, since you've ruled out coming
> here for help with it; and C++BFalconer would presumably rule out
> going to clc++ for such a question? And when he finally does ask it,
> according to you, his question is required to take the form "How do I
> translate this algorithm {algorithm already translated into C}, into
> C?". That's patently ridiculous.


You certainly make a good point.
 #93  
09-16-08, 05:01 AM
CBFalconer
Keith Thompson wrote:
> CBFalconer <cbfalconer> writes:
>
> Yes, given the definition above, this string:
>
> " "
>
> contains two "words". Are you suggesting that that's a problem?


I didn't specify a string. I meant those characters contiguous
(i.e. one strictly following the other) in the input stream. The
detection I specified above can be done with one char look ahead.
The presence (and necessity) of such a look ahead scheme may not be
obvious to the casual reader. In C it revolves around the ungetc()
function.
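
For illustration, a minimal sketch of that one-character look-ahead. The helper name read_letters, its buffer handling, and the choice of isalpha() as the test are assumptions made for the example, not something posted in the thread. It reads a run of letters and pushes the first non-letter back with ungetc() so the next reader still sees it:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: copy a run of letters from fp into buf.
   Letters that do not fit in buf are consumed and dropped.
   Returns the number of letters stored. */
size_t read_letters(FILE *fp, char *buf, size_t bufsize)
{
    size_t n = 0;
    int ch;

    assert(fp != NULL && buf != NULL && bufsize > 0);

    while ((ch = getc(fp)) != EOF && isalpha((unsigned char)ch)) {
        if (n + 1 < bufsize) {
            buf[n++] = (char)ch;
        }
    }
    if (ch != EOF) {
        ungetc(ch, fp); /* push back the one char we looked ahead */
    }
    buf[n] = '\0';
    return n;
}
```

Only one character of push-back is guaranteed by the standard, which is all this scheme needs.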

>
> Obviously a program that's intended to recognize C identifiers would
> have to use a different rule. But the OP didn't say anything about C
> identifiers, so I'm not sure why you're bringing them up.
>
> Incidentally, on my initial reading of your followup, I thought your
> use of the word "contiguous" was meant to be related to the use in
> arnuld's definition of "word" (the one I had suggested earlier). In
> fact, they're quite different; in the definition of "word" it refers
> to the characters being adjacent in the input, not to their numeric
> representations. A more careful reading of what you wrote indicates
> that you just meant that the notation 'a'..'z' doesn't make sense
> unless the representations of those characters are numerically
> contiguous. I thought I should point this out in case anyone else is
> confused.


Right. I should have specified 'the values of the chars are
contiguous'. The point being that ASCII works fine, but EBCDIC
doesn't. The C lexer will be a good example, because what it has
to detect is well defined.
 #94  
09-16-08, 05:37 AM
arnuld
> On Mon, 15 Sep 2008 14:43:36 +0100, Ben Bacarisse wrote:

> I was being a bit vague. Let's leave actual array pointers out of
> this. I mean that Richard was talking about changing the char ** as
> seen from the calling function. The thing you are intending to pass,
> a char **, is in some sense a pointer to the whole array: from it all
> of the array's data is accessible. The trouble is you can't
> change this char ** inside the function -- not in a way that has any
> effect outside. All you can do is change the various things it points
> to.


> ...SNIP....


> If a function needs to change an int, you pass an int *. If it needs
> to change an int *, you pass an int **. If it needs to change an int **
> you must pass an int ***.


> ... SNIP....


> Typo! I meant you *can't* write any value into **ppc! Sorry. There
> are two typos, I now see. It should have read: "*ppc is NULL -- you
> set it to be NULL before the call. You can't write any value into
> **ppc."



see my new post titled "pointers passed by copying ?"
 #95  
09-16-08, 07:28 AM
Richard Heathfield
Old Wolf said:

> On Sep 15, 4:50 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
>
> A bit sidetracked from the original thread, but is
> there actually any problem here besides identifying
> whether a ' symbol is a quote mark or an apostrophe?


I think it's about here that I like to pretend I'm from Missouri.

Show me.
 #96  
09-16-08, 10:24 PM
Old Wolf
On Sep 16, 3:43 pm, CBFalconer <cbfalco> wrote:
> Old Wolf wrote:
> > Richard Heathfield <r...@see.sig.invalid> wrote:

>
> >> "He just said, 'I'll be there', I think", replied the captain.

>
> > A bit sidetracked from the original thread, but is
> > there actually any problem here besides identifying
> > whether a ' symbol is a quote mark or an apostrophe?

>
> And I gather you consider that a trivial problem?  Please describe
> your algorithm.


Not at all, I was just checking that there
wasn't some other problem besides this one,
that I hadn't seen.
 #97  
09-16-08, 11:02 PM
Richard Heathfield
Old Wolf said:

> On Sep 16, 3:43 pm, CBFalconer <cbfalco> wrote:
>
> Not at all, I was just checking that there
> wasn't some other problem besides this one,
> that I hadn't seen.


Hyphens are another issue: "will-o'-the-wisp" illustrates where both the
hyphen and the apostrophe are part of the word, but there are situ-
ations where the hyphen (and newline) are not part of the word, just as
there are situations where 'apostrophes' are not part of the word.

Then there's the whole issue of "what is an alphabetic character"? If we
simply say A-Za-z, we exclude a vast range of words from languages such as
French, German, Spanish, Polish, and Russian. I'm not saying we shouldn't
do that, but we should be aware that the decision is costly in terms of
internationalisation.

Is 'C++' a word? How about 'G#m'? You might or might not consider that to
be a word, but a musician might. And yet they may have a very different
opinion about 'H#m'.

What about numbers? Is 42 a word? How about 3Com?

Is the copyright symbol a word? What about the trademark and registered
trademark symbols? Can they be part of a word? Consider, for example,
Microsoft<sup>(R)</sup>.

How about full stops (or 'periods' as some people call them)? Consider:
"U.S.A.", "B.B.C.", "etc.", etc.

What about &? Is that a word?

To any one of these questions, you may say, "yes, that's allowable as part
of a word", or you may say, "no, it's not allowable". But your decision
may well differ from someone else's decision.

And having decided, how do you design your algorithm so that it accepts
"fo'c'sle" as one word rather than three? A dictionary? If you're going to
do /that/, the algorithm is indeed trivial (modulo bugs):

1. start with s = "" and an empty word list
2. c = getch
3. if EOF continue from 8.
4. s += c
5. if s in dictionary
     continue from 2.
6. else
     s -= c.
     if s != ""
       add s to word list
     s = c
7. continue from 2.
8. if s != ""
     add s to word list
9. stop

but now you have to list in your dictionary every single character
combination that you consider to be a word. Big dictionary. (For a start,
every word will need at least three entries: "word", "Word", "WORD".)

The dictionary approach is clumsy in the extreme, and the algorithmic
approach gets more and more difficult as you get pickier and pickier about
what does and what does not constitute a word.
 #98  
09-17-08, 05:04 AM
Old Wolf
On Sep 17, 10:02 am, Richard Heathfield <r...@see.sig.invalid> wrote:
> but now you have to list in your dictionary every single character
> combination that you consider to be a word. Big dictionary. (For a start,
> every word will need at least three entries: "word", "Word", "WORD".)
>
> The dictionary approach is clumsy in the extreme, and the algorithmic
> approach gets more and more difficult as you get pickier and pickier about
> what does and what does not constitute a word.


Surely there is no approach other than using
a sophisticated dictionary. For example:

'Tis the season to be playin'

there is no rule to deduce whether we have
quote marks or apostrophes, besides knowing
that 'Tis is a word.

The dictionary can include rules such as
the fact that if "abcd" is a word, then
so is "Abcd"; it can know that acronyms
can be written with periods, and so on.

Now where it gets harder is if you have to
accept text from people who make spelling
mistakes and typoes :)
 #99  
09-17-08, 05:37 AM
Richard Heathfield
Old Wolf said:

> On Sep 17, 10:02 am, Richard Heathfield <r...@see.sig.invalid> wrote:
>> but now you have to list in your dictionary every single character
>> combination that you consider to be a word. Big dictionary. (For a
>> start, every word will need at least three entries: "word", "Word",
>> "WORD".)
>>
>> The dictionary approach is clumsy in the extreme, and the algorithmic
>> approach gets more and more difficult as you get pickier and pickier
>> about what does and what does not constitute a word.

>
> Surely there is no approach other than using
> a sophisticated dictionary.


Yes, there is. There is the "good enough for Professor Jenkins[1]"
approach, in which we define "word" as a non-empty contiguous sequence of
non-whitespace characters delimited on the left by SOF or whitespace and
on the right by EOF or whitespace.

This is not only good enough for Professor Jenkins[1] but frequently good
enough in the Real World, too.

Not that the Real World has any bearing, but I just thought I'd mention it.

[1] cf Gary Larson (the one with the duck)
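
A minimal sketch of that whitespace-delimited definition, assuming the text is already in a string rather than arriving on a stream. The function name count_words is hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count maximal runs of non-whitespace
   characters in a NUL-terminated string. */
size_t count_words(const char *text)
{
    size_t count = 0;
    int in_word = 0;

    assert(text != NULL);

    for (; *text != '\0'; ++text) {
        if (isspace((unsigned char)*text)) {
            in_word = 0;      /* whitespace ends the current word */
        } else if (!in_word) {
            in_word = 1;      /* first character of a new word */
            ++count;
        }
    }
    return count;
}
```

Under this rule the punctuation simply travels with its neighbour, so "'I'll" and "there'," each count as one word.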
 #100  
09-17-08, 06:07 AM
arnuld
> On Mon, 15 Sep 2008 09:28:36 +0000, Richard Heathfield wrote:


> ...SNIP...


> But what if something goes wrong? You'll need to be able to report an
> error. The natural way to do this is via a return value, which means we
> can't use that value for either the list or the count, and that leads us
> to:


What will we do with that return value? If something goes wrong I can
simply exit the program, telling the user that he did something stupid and
he is responsible for that.



> int get_words(char ***, size_t *);
>
> Since they don't need to modify the caller's status, sort_words and
> print_words can be of type int(char **, size_t).



I think there is qsort in the standard library, so we can use that, but I
don't know whether it modifies the original array or not.




> Up to you, but I wouldn't bother setting a limit (or, if I did, I'd set it
> at a million or so, and treat any string longer than that as a reportable
> error). With dynamic allocation, you don't /need/ to set a limit; you
> simply allocate as you go, and reallocate if necessary.




okay, I will write the program in parts. First we will write a simple
program that will ask the user to input and we will store that word
dynamically using calloc in some array. It will be called get_single_word
and it will form the basis of get_words function which will store all
words in an array. get_single_word returns an int because I want to use
get_single_word in get_words like this:

while( get_single_word )
{
/* code for get_words */
}


Here is my code for get_single_word. PROBLEM: it does not print anything
I entered:


/* a program to get a single word from stdin */


#include <stdio.h>
#include <stdlib.h>

enum { AVERAGE_SIZE = 28 };


int get_single_word( char* );

int main( void )
{
  char* pw; /* pw means pointer to word */

  get_single_word( pw );

  printf("word you entered is: %s\n", pw);

  return 0;
}


int get_single_word( char* pc )
{
  int idx;
  int ch;
  char *pc_begin;

  pc = calloc(AVERAGE_SIZE-1, sizeof(char));
  pc_begin = pc;

  if( (! pc) )
  {
    perror("can not allocate memory, sorry babe!");
    return 1;
  }

  for( idx = 0; ( (ch = getchar()) != EOF ); ++idx, ++pc )
  {
    if( AVERAGE_SIZE == idx )
    {
      /* use realloc here which I have no idea how to write */
    }

    *pc = ch;
  }

  *++pc = '\0';
  free(pc_begin);

  return 0;
}

=================== OUTPUT ==================
[arnuld@dune ztest]$ gcc -ansi -pedantic -Wall -Wextra test.c
[arnuld@dune ztest]$ ./a.out
like
word you entered is:
[arnuld@dune ztest]$
 #101  
09-17-08, 06:33 AM
arnuld
> On Wed, 17 Sep 2008 10:07:59 +0500, arnuld wrote:

> .... SNIP...


> Here is my code for get_single_word. PROBLEM: it does not print anything
> I entered:


> .... SNIP...



I have even tried using pointer to pointer but that still leaves me with
the same problem:


int main( void )
{
  char* pw; /* pw means pointer to word */

  get_single_word( &pw );

  printf("word you entered is: %s\n", pw);

  return 0;
}


int get_single_word( char** pc )
{
  int idx;
  int ch;
  char *pc_begin;

  *pc = calloc(AVERAGE_SIZE-1, sizeof(char));
  pc_begin = *pc;

  if( (! *pc) )
  {
    perror("can not allocate memory, sorry babe!");
    return 1;
  }

  for( idx = 0; ( (ch = getchar()) != EOF ); ++idx, ++*pc )
  {
    if( AVERAGE_SIZE == idx )
    {
      /* use realloc here which I have no idea how to write */
    }

    **pc = ch;
  }

  *++pc = '\0';
  free(pc_begin);

  return 0;
}
 #102  
09-17-08, 07:17 AM
Ron Ford
On Tue, 16 Sep 2008 06:28:30 +0000, Richard Heathfield posted:

> Old Wolf said:
>
> I think it's about here that I like to pretend I'm from Missouri.
>
> Show me.


As it polls redder with the Palin nomination, Huck sighed, 'Ashcroft
sucks."
 #103  
09-17-08, 07:29 AM
Richard Heathfield
arnuld said:

>> On Mon, 15 Sep 2008 09:28:36 +0000, Richard Heathfield wrote:
>>> ...SNIP...

>
>> But what if something goes wrong? You'll need to be able to report an
>> error. The natural way to do this is via a return value, which means we
>> can't use that value for either the list or the count, and that leads us
>> to:

>
> What will we do with that return value? If something goes wrong I can
> simply exit the program, telling the user that he did something stupid
> and he is responsible for that.


Yes, you could do that, except that (a) it might not be the user's stupid
fault (it may simply be that your machine is low on memory), and (b) there
may be a way to recover. If this is a mere learning exercise and the
learning task is not error recovery, then yes, by all means bomb out.
That's the "student solution" and, like cryptosporidium, is very common.

>> int get_words(char ***, size_t *);
>>
>> Since they don't need to modify the caller's status, sort_words and
>> print_words can be of type int(char **, size_t).

>
> I think there is qsort in the standard library, so we can use that, but I
> don't know whether it modifies the original array or not.


It does modify the original array (by sorting it, would you believe?), but
it won't modify the *pointer*, the one that indicates the location of the
first element of the array.
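
A small sketch of that point, assuming the words are held as an array of char * and that plain strcmp() ordering is acceptable. The names compare_words and sort_words are illustrative only; the int return merely mirrors the int(char **, size_t) shape mentioned above:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort hands the comparator pointers to the array elements,
   so each argument is really a pointer to a char *. */
static int compare_words(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);
}

/* Hypothetical wrapper: sorts the pointers in place.
   The caller's char ** still points at the same array afterwards;
   only the order of the elements inside it changes. */
int sort_words(char **words, size_t count)
{
    assert(words != NULL);
    qsort(words, count, sizeof *words, compare_words);
    return 0;
}
```

Because nothing here needs to change where the array lives, a plain char ** parameter is enough; no extra level of indirection is required.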


>> Up to you, but I wouldn't bother setting a limit (or, if I did, I'd set
>> it at a million or so, and treat any string longer than that as a
>> reportable error). With dynamic allocation, you don't /need/ to set a
>> limit; you simply allocate as you go, and reallocate if necessary.

>
> okay, I will write the program in parts. First we will write a simple
> program that will ask the user to input and we will store that word
> dynamically using calloc in some array. It will be called get_single_word
> and it will form the basis of get_words function which will store all
> words in an array.


Good. This sounds like functional decomposition - always a good way to
start off.

> get_single_word returns an int because I want to use
> get_single_word in get_words like this:
>
> while( get_single_word )
> {
> /* code for get_words */
> }


Presumably that's pseudocode, and you intend get_single_word to be a
function call, and the "code for get_words" consists of inserting into an
array the word retrieved by get_single_word(). Yes, that's reasonable.

[..]
> int get_single_word( char* );
>
> int main( void )
> {
> char* pw; /* pw means pointer to word */
>
> get_single_word( pw );


As the program prepares to call get_single_word, it evaluates pw - but the
value of pw is indeterminate, so evaluating it results in undefined
behaviour. In get_single_word, you intend to modify the pointer (by calloc
and possibly realloc), and that change needs to 'stick' in the caller, so
it's no good just passing the value. You must pass the /address/ of pw,
and make other necessary modifications to the function interface.

This is why, on this occasion, your program didn't output what you expected
it to output.
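
To make the "pass the address" point concrete, here is a minimal sketch that is separate from the get_single_word code under discussion. The helper make_copy and its 0/1 return convention are assumptions for the example:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a copy of src and hand it back
   through out. Returns 0 on success, 1 if allocation fails. */
int make_copy(char **out, const char *src)
{
    char *p;

    assert(out != NULL && src != NULL);

    p = malloc(strlen(src) + 1);
    if (p == NULL) {
        *out = NULL;
        return 1;
    }
    strcpy(p, src);
    *out = p; /* writes to the caller's pointer, so the change sticks */
    return 0;
}
```

A caller would declare char *pw = NULL, call make_copy(&pw, "like"), check the return value, use pw, and eventually free(pw). Had the parameter been a plain char *, the assignment would only have changed the function's local copy.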
 #104  
09-17-08, 07:57 AM
Richard Heathfield
arnuld said:

> I have even tried using pointer to pointer but that still leaves me with
> the same problem:


No, it leaves you with a different problem. The symptoms may or may not be
the same, but the problem is different.

> int main( void )
> {
> char* pw; /* pw means pointer to word */
>
> get_single_word( &pw );
>
> printf("word you entered is: %s\n", pw);


You need <stdio.h> if you wish to call printf.

> return 0;
> }
>
> int get_single_word( char** pc )
> {
> int idx;
> int ch;
> char *pc_begin;
>
> *pc = calloc(AVERAGE_SIZE-1, sizeof(char));


You need <stdlib.h> if you wish to call calloc. Also, why nail the call to
the type? This is better:

*pc = calloc(AVERAGE_SIZE - 1, sizeof **pc);

> pc_begin = *pc;
>
> if( (! *pc) )
> {
> perror("can not allocate memory, sorry babe!");
> return 1;
> }


Okay - although it's better not to embed messages like this in library
functions if you can avoid it.

Don't forget to check that return value in the caller.

> for( idx = 0; ( (ch = getchar()) != EOF ); ++idx, ++*pc )


I thought you wanted to stop at whitespace?

Also, it's better to move pc_begin than *pc, if you must move either of
them. Given that you have idx keeping track of things, I see no reason to
modify *pc (and plenty of reasons not to), and no reason for pc_begin to
exist at all. You can simply do (*pc)[idx] = ch;

If you don't like the (), you could keep pc_begin, point it to *pc as you
have done, and just do: pc_begin[idx] = ch; instead. No need to increment
any pointers.

> {
> if( AVERAGE_SIZE == idx )
> {
> /* use realloc here which I have no idea how to write */


I'll show you how shortly. In the meantime, let's continue to look at what
you've got.

> *++pc = '\0';


pc is char **, so ++pc is char ** (and utterly invalid), and *++pc is char
*, so you're setting a wild pointer to 0. Not good. Could be worse, but
not good.

> free(pc_begin);


Why allocate it at all, if you're going to throw it away before you've even
used it?

Here's a better way to do this - still not a great way, but a better way. I
haven't tested it, by the way, but I'd be mildly surprised if it doesn't
work perfectly first time.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#define AVERAGE_SIZE 16

#define GSW_OK 0 /* success */
#define GSW_ENOMEM 1 /* can't allocate buffer - no word fetched */
#define GSW_ENORESIZE 2 /* can't resize buffer - partial word fetched */

int get_single_word( char** pc )
{
  int rc = GSW_ENOMEM; /* if we succeed, we'll update the status */
  size_t idx = 0;
  int ch;
  char *pc_begin = NULL;
  size_t cursize = AVERAGE_SIZE;
  char *new = NULL;

  *pc = calloc(cursize, sizeof **pc);
  if(*pc != NULL)
  {
    rc = GSW_OK; /* so far so good */
    pc_begin = *pc;

    while((ch = getchar()) != EOF && isspace((unsigned char)ch))
    {
      continue; /* skipping leading whitespace */
    }
    /* ch is now the first character of the word, or EOF */
    while(GSW_OK == rc &&
          ch != EOF &&
          !isspace((unsigned char)ch))
    {
      if(cursize == idx + 1) /* room left only for the terminator */
      {
        new = realloc(*pc, 2 * cursize * sizeof *new);
        if(new == NULL)
        {
          rc = GSW_ENORESIZE; /* error - couldn't enlarge */
          pc_begin[idx] = '\0';
        }
        else
        {
          *pc = new;
          pc_begin = new; /* the block may have moved */
          cursize *= 2;
        }
      }
      if(GSW_OK == rc)
      {
        pc_begin[idx++] = ch;
        ch = getchar(); /* fetch the next character of the word */
      }
    }
  }

  if(*pc != NULL)
  {
    pc_begin[idx] = '\0';
  }

  return rc;
}
 #105  
09-17-08, 07:59 AM
arnuld
> On Wed, 17 Sep 2008 06:29:37 +0000, Richard Heathfield wrote:

> Yes, you could do that, except that (a) it might not be the user's stupid
> fault (it may simply be that your machine is low on memory), and (b) there
> may be a way to recover. If this is a mere learning exercise and the
> learning task is not error recovery, then yes, by all means bomb out.
> That's the "student solution" and, like cryptosporidium, is very common.


http://en.wikipedia.org/wiki/Cryptosporidium

Aye... so let's learn the practical aspects like error recovery too. I
don't like academic solutions, BTW.
