keyongtech



 #1  
09-27-09, 04:46 PM
ThomasW
Hi,

First of all, I have to say I'm relatively inexperienced with Ruby and
also new to regular expressions, which is causing me some problems:

I'm parsing text files and am using a lot of regexps for this.
Initially I was doing something like this:

file.each_line { |line|
  if line =~ /^pattern[a]*/
    process_pattern_a(line)
  elsif line =~ /pat+e(rn)? b\s*$/
    process_pattern_b(line)
  # some more elsifs
  end
}

But this was really, really slow. My suspicion is that the regexp
objects are recreated and thrown away for every iteration. Storing
all patterns in a table and referencing them like

file.each_line { |line|
  if line =~ $line_patterns["pattern a"]
    process_pattern_a(line)
  elsif line =~ $line_patterns["pattern b"]
    process_pattern_b(line)
  # some more elsifs
  end
}

made things tremendously faster, but I'm not really keen on storing
every regular expression that occurs somewhere in my program in this
table or in a variable. It splits up code that I would like to keep in
one place and can create variable clutter.[*]

Is it the case that such "anonymous" objects like regexps (and maybe also
strings?) are re-created whenever the code snippet they are defined in
is executed? If so, is there a convenient way of preventing this? Does
this happen only with regexps, or also with strings and other objects?
(And why would that be the case at all? I can't make any sense of it.) I
would like to learn how to write Ruby code that is reasonably efficient
in this regard, because the impact on execution time in the described
situation was so large. (I'm currently using Ruby 1.9.1.)

Thanks!
Thomas W.

[*] I could perhaps also store the regexps and the functions to be
executed in a table, with the regexps as keys and the functions as
values, iterating through it until a matching regexp key is found and
then executing the function stored as its value. But this only works
in situations similar to the one described.
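The table-driven idea from the footnote could look roughly like this. `HANDLERS` and the lambdas are hypothetical stand-ins for `process_pattern_a` / `process_pattern_b`; the sketch relies on Ruby 1.9+ hashes preserving insertion order, so patterns are tried in the order they are listed.

```ruby
# Hypothetical handlers standing in for process_pattern_a/process_pattern_b.
# Hashes keep insertion order (Ruby 1.9+), so patterns are tried top to bottom.
HANDLERS = {
  /^pattern[a]*/     => ->(line) { "A: #{line}" },
  /pat+e(rn)? b\s*$/ => ->(line) { "B: #{line}" }
}.freeze

# Run the handler of the first pattern that matches; nil if none does.
def dispatch(line)
  HANDLERS.each do |pattern, handler|
    return handler.call(line) if pattern =~ line
  end
  nil
end

p dispatch("patterna")  # => "A: patterna"
p dispatch("patte b")   # => "B: patte b"
p dispatch("no match")  # => nil
```

Each regexp then sits right next to the code it triggers, and the patterns are compiled once when the constant is defined.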
 #2  
09-27-09, 05:12 PM
Ehsanul Hoque
> Is it the case that such "anonymous" objects like regexps (maybe also
> strings?) are re-created whenever the code snippet they are defined in
> is executed? If so, is there a convenient way of preventing this? Is
> this only the case for regexps or also for strings and other objects?
> (Why is it the case at all - I can't make any sense of it?) I would
> like to learn how I can write Ruby code that is reasonably efficient
> in this regard because the impact on execution time in the described
> situation was so immense. (I'm currently using Ruby 1.9.1.)


Yes, a new object is indeed created every time an anonymous object is
created. The only core object I know of for which this is not true is the
symbol, which is basically an immutable string. There may be others I'm not
aware of, though. I suppose your code shows that there just might be a need
for the symbol equivalent of a regexp.
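For strings and symbols, this claim can be checked by evaluating the same literal several times from one call site and counting distinct object ids. This assumes the file does not use the `frozen_string_literal` magic comment (Ruby 2.3+), which would make the string literals shared as well; regexp literals behave differently, as Caleb points out further down the thread.

```ruby
# The block re-evaluates the same literal on each of its three calls.
strings = Array.new(3) { "abc" }  # a new String object per evaluation
symbols = Array.new(3) { :abc }   # always the one interned Symbol

p strings.map(&:object_id).uniq.size  # => 3
p symbols.map(&:object_id).uniq.size  # => 1
```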
 #3  
09-27-09, 05:54 PM
Thairuby ->a, b {a + b}
Is this OK? It still uses variables, though :(

file.each_line { |line|
  if line =~ (a ||= $line_patterns["pattern a"])
    process_pattern_a(line)
  elsif line =~ (b ||= $line_patterns["pattern b"])
    process_pattern_b(line)
  # some more elsifs
  end
}
 #4  
09-27-09, 07:04 PM
Caleb Clausen
On 9/27/09, Ehsanul Hoque <ehsanul_g3> wrote:
>> Is it the case that such "anonymous" objects like regexps (maybe also
>> strings?) are re-created whenever the code snippet they are defined in
>> is executed? If so, is there a convenient way of preventing this? Is
>> this only the case for regexps or also for strings and other objects?
>> (Why is it the case at all - I can't make any sense of it?) I would
>> like to learn how I can write Ruby code that is reasonably efficient
>> in this regard because the impact on execution time in the described
>> situation was so immense. (I'm currently using Ruby 1.9.1.)

>
> Yes, a new object is indeed created every time an anonymous object is
> created. The only core object I know of for which this is not true is the
> symbol, which is basically an immutable string. There may be others I'm not
> aware of though. I suppose your code shows that there just might be a need
> for the symbol equivalent of a regexp.


Actually, I believe that regexp literals are created only once, even if
they're executed multiple times. The exception is when you use #{} within
a regexp: that forces Ruby not only to create a new object each time the
regexp literal is executed, but also to recompile the regexp each time,
and that is really slow. You can bypass this behavior with the o regexp
option, but that only works correctly if the value of the interpolation
(what's inside #{}) is guaranteed to be the same on each execution.
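This can be checked directly by comparing object identities across repeated evaluations of one call site. `suffix` is an illustrative variable; the behavior shown is that of the standard interpreter (YARV, Ruby 1.9 and later), and other implementations may differ.

```ruby
suffix = "b"

# Plain literal: built once, the same Regexp object comes back every time.
plain = Array.new(3) { /ab/ }
# Interpolated literal: a new Regexp is built and compiled on every evaluation.
interpolated = Array.new(3) { /a#{suffix}/ }
# Interpolated with /o: built on the first evaluation only, then reused.
once = Array.new(3) { /a#{suffix}/o }

p plain.map(&:object_id).uniq.size         # => 1
p interpolated.map(&:object_id).uniq.size  # => 3
p once.map(&:object_id).uniq.size          # => 1
```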

Thomas, are you using #{} within your regexps? If so, try sticking an o
on the end of each one; that will probably solve your performance problem.
For instance,
x =~ /foo#{bar}/o
instead of
x =~ /foo#{bar}/
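The caveat about the interpolated value staying the same matters because /o caches per call site, not per value. In this sketch `starts_with_o?` and `starts_with?` are hypothetical helpers: the first shows the pitfall, and the second shows one common alternative, compiling the Regexp once outside the hot loop and passing it in.

```ruby
# Hypothetical helper: /o builds the Regexp on the FIRST call only, so the
# first `word` ever passed is baked into this call site for good.
def starts_with_o?(line, word)
  (line =~ /\A#{word}/o) ? true : false
end

# Alternative: compile the pattern once, outside any hot loop,
# and pass the compiled Regexp in.
def starts_with?(line, pattern)
  (line =~ pattern) ? true : false
end

first  = starts_with_o?("apple pie", "apple")      # builds /\Aapple/
second = starts_with_o?("banana split", "banana")  # still uses /\Aapple/

banana = /\A#{Regexp.escape("banana")}/            # built once, reusable
third  = starts_with?("banana split", banana)

p first   # => true
p second  # => false
p third   # => true
```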
 #5  
09-27-09, 08:08 PM
ThomasW
On 27 Sep., 20:04, Caleb Clausen <vikk> wrote:
> On 9/27/09, Ehsanul Hoque <ehsanul> wrote:
>
> > Yes, a new object is indeed created every time an anonymous object is
> > created. The only core object I know of for which this is not true is the
> > symbol, which is basically an immutable string.


I think that's not quite what I meant. Of course, if I define the
same regular expression twice at different places, there would be two
regexp objects.


>
> Actually, I believe that regexp literals are created only once, even if
> they're executed multiple times. The exception is when you use #{} within
> a regexp: that forces Ruby not only to create a new object each time the
> regexp literal is executed, but also to recompile the regexp each time,
> and that is really slow. You can bypass this behavior with the o regexp
> option, but that only works correctly if the value of the interpolation
> (what's inside #{}) is guaranteed to be the same on each execution.
>



Thanks so much! Your suspicion was right: I am indeed using #{} in
some of the regular expressions, and the o option does fix the issue.
Now that you've explained it, it's obvious why the expressions were
otherwise being recompiled on every iteration.

Now my code is already a bit shorter :)!

Thomas W.
 #6  
09-27-09, 08:57 PM
Gary Wright
On Sep 27, 2009, at 11:50 AM, ThomasW wrote:
> I'm parsing text files and am using a lot of regexps for this.
> Initially I was doing something like this:
>
> file.each_line { |line|
> if line =~ /^pattern[a]*/
> process_pattern_a(line)
> elsif line =~ /pat+e(rn)? b\s*$/
> process_pattern_b(line)
> # some more elsifs
> end
> }


This example is perfect for Ruby's case statement:

file.each_line { |line|
  case line
  when /^pattern[a]*/o
    process_pattern_a(line)
  when /pat+e(rn)? b\s*$/o
    process_pattern_b(line)
  # more when clauses
  else
    # handle no match
  end
}



Gary Wright
 #7  
09-27-09, 09:20 PM
Thairuby ->a, b {a + b}
Thairuby ->a, b {a + b} wrote:
> Is this ok? But it still use variable :(
>
> file.each_line { |line|
> if line =~ (a ||= $line_patterns["pattern a"])
> process_pattern_a(line)
> elsif line =~ (b ||= $line_patterns["pattern b"])
> process_pattern_b(line)
> # some more elsifs
> end
> }


I made a typo. It should be:

file.each_line { |line|
  if line =~ (a ||= /^pattern[a]*/)
    process_pattern_a(line)
  elsif line =~ (b ||= /pat+e(rn)? b\s*$/)
    process_pattern_b(line)
  # some more elsifs
  end
}

Is there an o option for strings too? :)
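There is no o flag for strings, but in Ruby 2.1 and later calling `.freeze` directly on a string literal is optimized to return one shared, frozen object, and the `# frozen_string_literal: true` magic comment (Ruby 2.3+) applies the same idea to every literal in a file. Neither existed in the Ruby 1.9.1 used in this thread. A quick identity check, assuming the file does not already use that magic comment:

```ruby
# A plain literal allocates a new String each time it is evaluated.
fresh  = Array.new(3) { "pattern a" }
# "literal".freeze is compiled to return a single deduplicated, frozen String.
shared = Array.new(3) { "pattern a".freeze }

p fresh.map(&:object_id).uniq.size   # => 3
p shared.map(&:object_id).uniq.size  # => 1
p shared.first.frozen?               # => true
```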
 #8  
09-27-09, 09:21 PM
ThomasW
On 27 Sep., 21:57, Gary Wright <gwtm> wrote:
[..]
>    when /pat+e(rn)? b\s*$/o
>     process_pattern_b(line)
>    # more when clauses
>    else
>      # handle no match
>    end
>
> }
>
> Gary Wright


Thanks for that tip. I wasn't aware that this also works with regexp
matches. It's great that it does! By the way, is it substantially
different from an elsif chain in any way, apart from being slightly
less typing?

Thomas W.
 #9  
09-28-09, 12:59 AM
Gary Wright
On Sep 27, 2009, at 4:25 PM, ThomasW wrote:
> Thanks for that tip. I wasn't aware that this also works with regexp
> matches. It's great that it does! By the way, is there anything
> substantially different from an elsif chain, except for being slightly
> less typing?


The semantics are the same in this case but I think the
case statement highlights the fact that you are doing a
sequence of matches against a single object, whereas the
standard if/then/else is a more general construct.
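The equivalence can be made concrete: each `when` clause is tested as `pattern === line`, and `Regexp#===` is true when the regexp matches the string. In this sketch `line` is just an example input and the symbols stand in for the real processing.

```ruby
line = "patte b"

# case/when tests each clause as `pattern === line`.
via_case =
  case line
  when /^pattern[a]*/     then :a
  when /pat+e(rn)? b\s*$/ then :b
  else :none
  end

# The same dispatch written as an explicit if/elsif chain using ===.
via_if =
  if /^pattern[a]*/ === line then :a
  elsif /pat+e(rn)? b\s*$/ === line then :b
  else :none
  end

p via_case  # => :b
p via_if    # => :b
```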

Gary Wright
 #10  
09-28-09, 01:49 AM
Josh Cheek

On Sun, Sep 27, 2009 at 3:20 PM, Thairuby ->a, b {a + b} <kabkab> wrote:

> Thairuby ->a, b {a + b} wrote:
>
> I'm wrong typing. It would be
>
> file.each_line { |line|
> if line =~ (a ||= /^pattern[a]*/)
> process_pattern_a(line)
> elsif line =~ (b ||= /pat+e(rn)? b\s*$/)
> process_pattern_b(line)
> # some more elsifs
> end
> }
>
> Does it have o option for string? :)
Unfortunately, I don't think this does anything, because a and b are
declared within the block, so while the scope is the same, the extent is
not. Essentially, a and b are no longer bound after each iteration of the
loop, so they do not retain their previously assigned values when the
next iteration begins.

This can be illustrated:

"patterna\npatte b".each_line do |line|
  p line
  puts "defined?(a) => #{defined?(a).inspect}"
  puts "defined?(b) => #{defined?(b).inspect}"

  if line =~ (a ||= /^pattern[a]*/)
  elsif line =~ (b ||= /pat+e(rn)? b\s*$/)
  else
  end

  puts "defined?(a) => #{defined?(a).inspect}"
  puts "defined?(b) => #{defined?(b).inspect}", ''
end
__END__

Which has the following output:
"patterna\n"
defined?(a) => nil
defined?(b) => nil
defined?(a) => "local-variable(in-block)"
defined?(b) => "local-variable(in-block)"

"patte b"
defined?(a) => nil
defined?(b) => nil
defined?(a) => "local-variable(in-block)"
defined?(b) => "local-variable(in-block)"

You can see that a and b were defined after the if statement for "patterna",
but were no longer defined before the if statement for "patte b".
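This diagnosis also points to a fix: declare the variable before the block, so the block closes over a single binding that survives between iterations. The `inner_builds`/`outer_builds` counters are illustrative instrumentation only. For a plain regexp literal like these, memoizing gains nothing (per Caleb's reply above, Ruby already reuses the literal); the pattern only pays off for values that are costly to build.

```ruby
lines = "patterna\npatte b\npatternaa\n"

# Variable first assigned INSIDE the block: a fresh binding on every
# iteration, so ||= evaluates its right-hand side every time.
inner_builds = 0
lines.each_line do |line|
  a ||= (inner_builds += 1; /^pattern[a]*/)
  line =~ a
end

# Variable declared OUTSIDE the block: the block closes over it, so the
# value survives between iterations and ||= evaluates only once.
outer_builds = 0
b = nil
lines.each_line do |line|
  b ||= (outer_builds += 1; /^pattern[a]*/)
  line =~ b
end

p inner_builds  # => 3
p outer_builds  # => 1
```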
 #11  
09-28-09, 02:09 AM
Thairuby ->a, b {a + b}
Oh, I forgot about the scope of local variables :(
Thank you very much for your explanation.
 #12  
09-28-09, 02:52 AM
ThomasW
On 28 Sep., 03:09, "Thairuby ->a, b {a + b}" <kab> wrote:
> Oh, I forgot the scope of local variable :(
> Thank you very much for your explanation.


Thairuby, thanks anyway for your effort :).