#1
Hello,
I'm not sure whether this is a problem or not, or how to determine whether it is one.

Say memory access (read and write) happens in 64-bit chunks, and I'm looking at 32-bit variables. This would mean that either some other variable is also written when writing a 32-bit variable (which means that all access to 32-bit variables is of the read-modify-write type, affecting some other variable also), or that all 32-bit variables are stored in their own 64-bit chunk.

With single-threaded applications, that's a mere performance question. But with multi-threaded applications, there's no way I can imagine that would avoid the read-modify-write problems the first alternative would create, as it is nowhere defined what the other variable is that is also written -- so it can't be protected by a lock. Without it being protected by a lock, there's nothing that prevents a thread from altering it while it is in the middle of the read-modify-write cycle, which means that the end of it will overwrite the altered value with the old value.

However, there must be a way to deal with this, otherwise multi-threaded applications in C++ wouldn't be possible.

What am I missing?

Thanks,
Gerhard
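To make the hazard concrete, here is a small deterministic replay of one unlucky interleaving. It is only a sketch under stated assumptions: the machine that can access memory solely in 64-bit words is hypothetical, the helper names (`with_low_half`, `with_high_half`, `replay_lost_update`) are invented for illustration, and no real threads are involved; the two "threads" are just interleaved by hand.

```cpp
#include <cstdint>

// Hypothetical machine: memory is only accessible as whole 64-bit words, so a
// 32-bit store has to be emulated as a read-modify-write of the full word.

// Returns `word` with its low 32 bits replaced by `value`.
std::uint64_t with_low_half(std::uint64_t word, std::uint32_t value) {
    return (word & 0xFFFFFFFF00000000ULL) | value;
}

// Returns `word` with its high 32 bits replaced by `value`.
std::uint64_t with_high_half(std::uint64_t word, std::uint32_t value) {
    return (word & 0x00000000FFFFFFFFULL) |
           (static_cast<std::uint64_t>(value) << 32);
}

std::uint32_t low_half(std::uint64_t word) {
    return static_cast<std::uint32_t>(word & 0xFFFFFFFFULL);
}

std::uint32_t high_half(std::uint64_t word) {
    return static_cast<std::uint32_t>(word >> 32);
}

// Replays one interleaving of "thread" A (writes the low variable) and
// "thread" B (writes the high variable) and returns the final memory word.
std::uint64_t replay_lost_update() {
    std::uint64_t memory = 0;             // both 32-bit variables start at 0

    std::uint64_t a_copy = memory;        // A: read the whole word
    memory = with_high_half(memory, 7);   // B: complete update, high = 7
    a_copy = with_low_half(a_copy, 5);    // A: modify its private copy
    memory = a_copy;                      // A: write back, clobbering B's 7

    return memory;
}
```

In this replay the low variable ends up as 5, but the high variable is back at 0: B's store has been lost even though A never named that variable, which is exactly why a lock on A's variable alone could not help.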
|
#2
Gerhard Fiedler wrote:
[..]
> it is nowhere defined what the other variable is that is also written -- so it can't be protected by a lock. Without it being protected by a lock, there's nothing that prevents a thread from altering it while it is in the middle of the read-modify-write cycle, which means that the end of it will overwrite the altered value with the old value.
>
> However, there must be a way to deal with this, otherwise multi-threaded applications in C++ wouldn't be possible.
>
> What am I missing?

The fact that C++ does not specify any of that, maybe.

Try 'comp.programming.threads' as your starting point since it's the multi-threading that you're concerned about. The problem does not seem to be language-specific, and as such does not belong in a language newsgroup.

V
|
#3
On Jun 24, 3:59 pm, Victor Bazarov <vAbaza> wrote:
> Gerhard Fiedler wrote:
>> [..]
>
> The fact that C++ does not specify any of that, maybe.

But C++0x will. IIRC, according to the draft standard, an implementation is prohibited from doing many kinds of speculative writes (with the exception of bitfields) to locations that wouldn't be written unconditionally anyway (or something like that).

If a specific architecture didn't allow 32 bit load/stores to 32 bit objects, it would require the implementation to pad every object to the smallest load/store granularity. Pretty much all common architectures allow access to memory at least at 8/16/32 bit granularity (except for DSPs I guess), so it is not a problem.

Current compilers do not implement the rule above, but thread aware compilers approximate it well enough that, as long as you use correct locks, things work correctly *most of the time* (some compilers have been known to miscompile code which used trylocks, for example).

> Try 'comp.programming.threads' as your starting point since it's the multi-threading that you're concerned about. The problem does not seem to be language-specific, and as such does not belong to a language newsgroup.

Actually, discussing whether the next C++ standard prohibits speculative writes is language specific and definitely on topic.
|
#4
On 2008-06-24 11:50:26, gpderetta wrote:
> On Jun 24, 3:59 pm, Victor Bazarov <vAbaza> wrote:

Just for the record: I didn't really miss that. I just thought that how a very common problem present in a sizable part of C++ applications is being handled across compilers and platforms is actually on topic in a group about the C++ language.

> But C++0x will. IIRC, according to the draft standard, an implementation is prohibited from doing many kinds of speculative writes (with the exception of bitfields) to locations that wouldn't be written unconditionally anyway (or something like that).
>
> If a specific architecture didn't allow 32 bit load/stores to 32 bit objects, it would require the implementation to pad every object to the smallest load/store granularity. Pretty much all common architectures allow access to memory at least at 8/16/32 bit granularity (except for DSPs I guess), so it is not a problem.

Ah, I didn't know that. So on common hardware (maybe x86, x64, AMD, AMD64, IA-64, PowerPC, ARM, Alpha, PA-RISC, MIPS, SPARC), memory access is possible in byte granularity? Which then means that no common compiler would write to locations that are not the actual purpose of the write access?

> Current compilers do not implement the rule above, but thread aware compilers approximate it well enough that, as long as you use correct locks, things work correctly *most of the time* (some compilers have been known to miscompile code which used trylocks, for example).

Do you have any links about which compilers specifically don't create code that works correctly? One objective of mine is to be able to separate this "most of the time" into two clearly defined subsets, one of which works "all of the time" :)

> Actually, discussing whether the next C++ standard prohibits speculative writes is language specific and definitely on topic.

Is "speculative writes" the technical term for the situation I described?

Thanks,
Gerhard
|
#5
On Jun 24, 5:51 pm, Gerhard Fiedler <geli> wrote:
> On 2008-06-24 11:50:26, gpderetta wrote:
>
>> If a specific architecture didn't allow 32 bit load/stores to 32 bit objects, it would require the implementation to pad every object to the smallest load/store granularity. Pretty much all common architectures allow access to memory at least at 8/16/32 bit granularity (except for DSPs I guess), so it is not a problem.
>
> Ah, I didn't know that. So on common hardware (maybe x86, x64, AMD, AMD64, IA-64, PowerPC, ARM, Alpha, PA-RISC, MIPS, SPARC), memory access is possible in byte granularity? Which then means that no common compiler would write to locations that are not the actual purpose of the write access?

All x86 derivatives allow 8/16/32/64 bit access at any offset. I think both PowerPC and ARM allow access at any granularity as long as the access is properly aligned. IIRC very old Alphas only allowed aligned 32/64 bit accesses (no byte access), but it got fixed because it was extremely inconvenient. I do not know about IA-64, MIPS, SPARC and PA-RISC, but I would be extremely surprised if they didn't.

>> Current compilers do not implement the rule above, but thread aware compilers approximate it well enough that, as long as you use correct locks, things work correctly *most of the time* (some compilers have been known to miscompile code which used trylocks, for example).
>
> Do you have any links about which compilers specifically don't create code that works correctly? One objective of mine is to be able to separate this "most of the time" into two clearly defined subsets, one of which works "all of the time" :)

Many do, in corner cases. Usually these are considered bugs and are fixed when they are encountered. See for example http://www.airs.com/blog/archives/79

>> Actually, discussing whether the next C++ standard prohibits speculative writes is language specific and definitely on topic.
>
> Is "speculative writes" the technical term for the situation I described?

I'm not sure if it applies to this example. I think that "speculative store" is defined as the motion of a store outside of its position in program order (usually sinking it outside of loops or branches). It doesn't take much to generalize the concept to that of the *addition* of a store not present in the original program (i.e. overwrites of adjacent fields).

For details see "Concurrency memory model compiler consequences" by Hans Boehm: http://www.open-std.org/jtc1/sc22/wg...007/n2338.html

HTH,
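As an illustration of what "adding a store not present in the original program" can look like, the sketch below puts a conditional store next to a rewritten form that stores unconditionally and then undoes the store. The function names `store_if` and `store_if_speculated` are made up for this example, and the second function is written by hand to mimic such a transformation; it is not the output of any particular compiler.

```cpp
// Source form: `x` is written only when `cond` is true.
void store_if(bool cond, int& x) {
    if (cond) {
        x = 1;
    }
}

// Hand-written stand-in for a speculative-store transformation: write first,
// then restore the old value if the condition turns out to be false.
// A single thread cannot tell the two versions apart. With several threads,
// this version writes `x` even when `cond` is false, so it can overwrite a
// value another thread stored between the read of `saved` and the restore.
void store_if_speculated(bool cond, int& x) {
    int saved = x;
    x = 1;
    if (!cond) {
        x = saved;
    }
}
```

Both functions leave `x` in the same state when called from one thread, which is why such a rewrite looks harmless to an optimizer that knows nothing about threads.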
|
#6
On Jun 24, 7:50 am, gpderetta <gpdere> wrote:
> On Jun 24, 3:59 pm, Victor Bazarov <vAbaza> wrote:
>> The fact that C++ does not specify any of that, maybe.
>
> But C++0x will.

A search on "hans boehm c++ memory model" should bring up further information on that, including videos of Hans Boehm's presentations on the topic. Here is a start:

http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/

Ali
|
#7
On Jun 24, 3:48 pm, Gerhard Fiedler <geli> wrote:
> I'm not sure whether this is a problem or not, or how to determine whether it is one.

It's potentially one.

> Say memory access (read and write) happens in 64-bit chunks, and I'm looking at 32-bit variables. This would mean that either some other variable is also written when writing a 32-bit variable (which means that all access to 32-bit variables is of the read-modify-write type, affecting some other variable also), or that all 32-bit variables are stored in their own 64-bit chunk.
>
> With single-threaded applications, that's a mere performance question. But with multi-threaded applications, there's no way I can imagine that would avoid the read-modify-write problems the first alternative would create, as it is nowhere defined what the other variable is that is also written -- so it can't be protected by a lock. Without it being protected by a lock, there's nothing that prevents a thread from altering it while it is in the middle of the read-modify-write cycle, which means that the end of it will overwrite the altered value with the old value.
>
> However, there must be a way to deal with this, otherwise multi-threaded applications in C++ wouldn't be possible.

Most hardware provides for single byte writes (even when the read is always 64 bits), and takes care that it works correctly. From what I understand, this wasn't the case on some early DEC Alphas, and it certainly wasn't the case on many older platforms, where when you wrote a byte, the hardware would read a word, and rewrite it.

The upcoming version of the standard will address this problem; if nothing changes, it will require that *most* accesses to a single "object" work. (The major exception is bit fields. If you access an object that is declared as a bit field, and any other thread may modify any object in the containing class, you need to explicitly synchronize.)

Implementations for processors where the hardware doesn't support this have their work cut out for them (but better them than us), and byte accesses on such implementations are likely to be very slow.
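The bit field exception can be sketched as follows. This assumes the rule as it ended up in C++11, where adjacent bit fields share one memory location, so two threads that touch different bit fields of the same object still need a common lock. The type `Flags` and the function `update_bit_fields_concurrently` are hypothetical names used only for this example.

```cpp
#include <mutex>
#include <thread>

// Two adjacent bit fields: they occupy the same memory location, so writing
// one of them may be implemented as a read-modify-write of both.
struct Flags {
    unsigned ready : 1;
    unsigned count : 7;
};

// One thread sets `ready`, another increments `count` 100 times. Every access
// takes the same mutex because the two fields are not independent locations.
Flags update_bit_fields_concurrently() {
    Flags flags{};      // zero-initialises both bit fields
    std::mutex guard;

    std::thread setter([&] {
        std::lock_guard<std::mutex> lock(guard);
        flags.ready = 1;
    });
    std::thread counter([&] {
        for (int i = 0; i < 100; ++i) {
            std::lock_guard<std::mutex> lock(guard);
            flags.count = flags.count + 1u;   // stays below the 7-bit limit
        }
    });

    setter.join();
    counter.join();
    return flags;
}
```

If `ready` and `count` were ordinary `unsigned char` members instead of bit fields, each thread could write its own member without the mutex.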
|
#8
On 2008-06-24 18:17:52, James Kanze wrote:
> Most hardware provides for single byte writes (even when the read is always 64 bits), and takes care that it works correctly.

What I find a bit disconcerting is that it seems so difficult to find out whether a given piece of hardware actually does this. Reality seems to confirm that it actually is "most" (or otherwise "most" programs would probably crash a lot more than they do), but I haven't found any documentation about any specific guarantees of specific compilers on specific platforms. (I'm mainly interested in VC++ and gcc.) Does somebody have any pointers for me?

Thanks,
Gerhard
|
#9
In article <1om696gj5nba5$.dlg>, gelists says...

[ ... ]
> What I find a bit disconcerting is that it seems so difficult to find out whether a given hardware actually does this. Reality seems to confirm that it actually is "most" (or otherwise "most" programs would probably crash a lot more than they do), but I haven't found any documentation about any specific guarantees of specific compilers on specific platforms. (I'm mainly interested in VC++ and gcc.) Does somebody have any pointers for me?

There are a number of problems with that. The first is that when you get to exotic multiprocessors, a lot of ideas have been tried, and even though only a few have really gained much popularity, there are still some that bend almost any rule you'd like to make.

Another problem is that even on a given piece of hardware, the behavior can be less predictable than you'd generally like. For example, recent versions of the Intel x86 processors all have Memory Type Range Registers (MTRRs). Using an MTRR, one can adjust the behavior of memory writes individually for ranges of memory. You can get write-back caching, write-through caching, write combining, or no caching at all -- all on the same machine at the same time for different ranges of memory.

Also keep in mind that most modern computers use caching. In a typical case, any read from or write to main memory happens an entire cache line at a time. Bookkeeping is also done on the basis of entire cache lines, so the processor doesn't care how many bits in a cache line have been modified -- from its viewpoint, the cache line as a whole is either modified or not. If, for example, another processor attempts to read memory that falls in that cache line, the entire line is written to memory before the other processor can read it. Even if the two are entirely disjoint, if they fall in the same cache line, the processor treats them as a unit.
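One practical consequence of cache-line bookkeeping is that two unrelated variables in the same line can slow each other down when different cores write them. A common response is to give each variable its own line, as in the sketch below. The 64-byte line size (`kAssumedCacheLine`) is an assumption that holds on many current x86 parts but is not guaranteed, and `PaddedCounters` and `count_in_parallel` are names invented for this example.

```cpp
#include <cstddef>
#include <thread>

// Assumed cache line size; check the target hardware before relying on it.
constexpr std::size_t kAssumedCacheLine = 64;

// Each counter is aligned to its own (assumed) cache line, so the two writer
// threads below do not keep invalidating each other's line.
struct PaddedCounters {
    alignas(kAssumedCacheLine) long first = 0;
    alignas(kAssumedCacheLine) long second = 0;
};

static_assert(sizeof(PaddedCounters) >= 2 * kAssumedCacheLine,
              "each counter should occupy its own assumed cache line");

// Two threads increment disjoint counters; no lock is needed because each
// counter is a separate object written by exactly one thread.
PaddedCounters count_in_parallel(long iterations) {
    PaddedCounters counters;

    std::thread t1([&] {
        for (long i = 0; i < iterations; ++i) ++counters.first;
    });
    std::thread t2([&] {
        for (long i = 0; i < iterations; ++i) ++counters.second;
    });

    t1.join();
    t2.join();
    return counters;
}
```

The padding affects only speed. On cache-coherent hardware the final counts are the same with or without the `alignas`, since the two counters are distinct objects either way.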
|
#10
On Jun 25, 12:53 am, Jerry Coffin <jcof> wrote:
> In article <1om696gj5nba5>, geli...@gmail.com says...
> [ ... ]
>> What I find a bit disconcerting is that it seems so difficult to find out whether a given hardware actually does this. Reality seems to confirm that it actually is "most" (or otherwise "most" programs would probably crash a lot more than they do), but I haven't found any documentation about any specific guarantees of specific compilers on specific platforms. (I'm mainly interested in VC++ and gcc.) Does somebody have any pointers for me?

It depends mostly on the hardware architecture, not the compiler. The compiler will generate byte, half-word, etc. load and store machine instructions (assuming they exist, of course); the problem is what the hardware does with them.

For Sparc architecture, see http://www.sparc.org/specificationsDocuments.html. I presume that other architecture providers (e.g. Intel, AMD, etc.) have similar pages.

[...]
> Also keep in mind that most modern computers use caching. In a typical case, any read from or write to main memory happens an entire cache line at a time. Bookkeeping is also done on the basis of entire cache lines, so the processor doesn't care how many bits in a cache line have been modified -- from its viewpoint, the cache line as a whole is either modified or not. If, for example, another processor attempts to read memory that falls in that cache line, the entire line is written to memory before the other processor can read it. Even if the two are entirely disjoint, if they fall in the same cache line, the processor treats them as a unit.

That's true to a point. Most modern architectures also ensure cache coherence at the hardware level: if one thread writes to the first byte in a cache line, and a different thread (on a different core) writes to the second byte, the hardware will ensure that both writes eventually end up in main memory; that the write back of the cache line from one core won't overwrite the changes made by the other core.

This issue was discussed in detail by the committee; in the end, it was decided that given something like:

    struct S { char a; char b; } ;

or

    char a[2] ;

one thread could modify S::a or a[0], and the other S::b or a[1], without any explicit synchronization, and the compiler had to make it work. This was accepted because in fact, just emitting store byte instructions is sufficient for all of the current architectures.
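Written out with C++11 threads, the committee's decision looks like the sketch below. It assumes a C++11 (or later) implementation, where `S::a` and `S::b` are separate memory locations; the function name `write_adjacent_bytes` is invented for this example, and `std::thread` is simply the library facility that later shipped with that standard.

```cpp
#include <thread>

struct S {
    char a;
    char b;
};

// Each thread writes a different member of the same object with no lock.
// Under the C++11 memory model this is not a data race, because the two
// members are distinct memory locations. join() makes both writes visible
// to the caller before the object is read.
S write_adjacent_bytes() {
    S s{0, 0};

    std::thread first([&] { s.a = 'x'; });
    std::thread second([&] { s.b = 'y'; });

    first.join();
    second.join();
    return s;
}
```

The same would hold for `char a[2]` with one thread writing `a[0]` and the other `a[1]`.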
|
#11
On 2008-06-25 04:58:41, James Kanze wrote:
> It depends mostly on the hardware architecture, not the compiler. The compiler will generate byte, half-word, etc. load and store machine instructions (assuming they exist, of course); the problem is what the hardware does with them.
>
> For Sparc architecture, see [..]. I presume that other architecture providers (e.g. Intel, AMD, etc.) have similar pages.

Thanks. I thought that it would also depend on how the compiler generates the code, but I guess you're right in assuming that any (halfway decent) compiler will generate 8-bit writes for 8-bit variables if that is possible :)

> That's true to a point. Most modern architectures also ensure cache coherence at the hardware level: if one thread writes to the first byte in a cache line, and a different thread (on a different core) writes to the second byte, the hardware will ensure that both writes eventually end up in main memory; that the write back of the cache line from one core won't overwrite the changes made by the other core.

Taken all this together, it seems that on "most modern architectures" cache coherency is mostly guaranteed by the hardware, and for example it is not necessary to use memory barriers or locks for access to volatile boolean variables that are only read or written (never using a read-modify-write cycle). Is this correct? What is all this talk about different threads seeing values out of order about, if the cache coherency is maintained by the hardware in this way?

Gerhard
|
#12
On Jun 25, 3:44 pm, Gerhard Fiedler <geli> wrote:
<snip>
> Taken all this together, it seems that on "most modern architectures" cache coherency is mostly guaranteed by the hardware, and for example it is not necessary to use memory barriers or locks for access to volatile boolean variables that are only read or written (never using a read-modify-write cycle). Is this correct? What is all this talk about different threads seeing values out of order about, if the cache coherency is maintained by the hardware in this way?

Cache coherency is not the only part of a system that can reorder loads and stores. Write buffers and out-of-order (OoO) execution machinery are also responsible. Even x86, which has an otherwise fairly strong memory model, requires, for example, StoreLoad memory barriers (i.e. mfence or locked operations).

So, AFAIK the answer is no: in general, and for most compilers, even volatile is not enough.
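The StoreLoad case can be shown with the classic "store buffering" test: each thread stores to one variable and then loads the other. The sketch below uses the `std::atomic` facility from the later C++11 standard with its default sequentially consistent ordering, which forbids the outcome where both threads read 0. The function name `count_both_zero` is invented for this example; with plain or `volatile` variables, or with `memory_order_relaxed`, the both-zero outcome is permitted and has been observed on x86.

```cpp
#include <atomic>
#include <thread>

// Runs the store-buffering test `rounds` times and returns how many rounds
// ended with both threads reading 0. With sequentially consistent atomics
// (the default for store() and load()) that count must be 0.
int count_both_zero(int rounds) {
    int both_zero = 0;

    for (int round = 0; round < rounds; ++round) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;

        std::thread t1([&] {
            x.store(1);       // store, then load the other variable
            r1 = y.load();
        });
        std::thread t2([&] {
            y.store(1);
            r2 = x.load();
        });

        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

On x86, compilers typically implement the sequentially consistent store with a locked instruction or an mfence, which is the StoreLoad barrier mentioned above.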
|
#13
On Jun 25, 3:44 pm, Gerhard Fiedler <geli> wrote:
> On 2008-06-25 04:58:41, James Kanze wrote:
[...]
>> For Sparc architecture, see http://www.sparc.org/specificationsDocuments.html. I presume that other architecture providers (e.g. Intel, AMD, etc.) have similar pages.
>
> Thanks. I thought that it would also depend on how the compiler generates the code, but I guess you're right in assuming that any (halfway decent) compiler will generate 8-bit writes for 8-bit variables if that is possible :)

Well, it would be nice if they'd document it. But in practice, I don't worry too much about a compiler generating code to load a word, change one byte of it, and then store it, if the hardware has a single instruction byte store.

>>> Also keep in mind that most modern computers use caching. In a typical case, any read from or write to main memory happens an entire cache line at a time. Bookkeeping is also done on the basis of entire cache lines, so the processor doesn't care how many bits in a cache line have been modified -- from its viewpoint, the cache line as a whole is either modified or not. If, for example, another processor attempts to read memory that falls in that cache line, the entire line is written to memory before the other processor can read it. Even if the two are entirely disjoint, if they fall in the same cache line, the processor treats them as a unit.
>
>> That's true to a point. Most modern architectures also ensure cache coherence at the hardware level: if one thread writes to the first byte in a cache line, and a different thread (on a different core) writes to the second byte, the hardware will ensure that both writes eventually end up in main memory; that the write back of the cache line from one core won't overwrite the changes made by the other core.
>
> Taken all this together, it seems that on "most modern architectures" cache coherency is mostly guaranteed by the hardware, and for example it is not necessary to use memory barriers or locks for access to volatile boolean variables that are only read or written (never using a read-modify-write cycle). Is this correct? What is all this talk about different threads seeing values out of order about, if the cache coherency is maintained by the hardware in this way?

Several things. The first, of course, is that what we've just been talking about only concerns a single cache line; the hardware might not be so careful between cache lines (which results in multiple physical writes). But the real reason is that reads and writes, even to the cache, are pipelined in the processor itself, and can be reordered in the pipeline. Thus, for example, if we suppose two int's, i and j, both initially 0, and one processor executes:

    store #1, i
    store #1, j

a second processor can still see the condition i==0, j==1, because either the first processor has reordered the writes (because of pipeline considerations), or because the second recognized that it already had a read of the cache line with j in its pipeline, and used the results of that read for j.
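For the i/j example, the reordering can be ruled out with a pair of fences, one between the two stores and one between the two loads. The sketch below uses `std::atomic_thread_fence` from the later C++11 standard as a portable stand-in for the hardware barrier instructions; the function name `observed_j_without_i` is invented for this example.

```cpp
#include <atomic>
#include <thread>

// Repeats the i/j experiment `rounds` times and reports whether the reader
// ever saw j == 1 together with i == 0. With the release fence between the
// stores and the acquire fence between the loads, that must not happen.
bool observed_j_without_i(int rounds) {
    for (int round = 0; round < rounds; ++round) {
        std::atomic<int> i{0};
        std::atomic<int> j{0};
        bool violation = false;

        std::thread writer([&] {
            i.store(1, std::memory_order_relaxed);
            // Keeps the store to i from being ordered after the store to j.
            std::atomic_thread_fence(std::memory_order_release);
            j.store(1, std::memory_order_relaxed);
        });
        std::thread reader([&] {
            int seen_j = j.load(std::memory_order_relaxed);
            // Keeps the load of i from being ordered before the load of j.
            std::atomic_thread_fence(std::memory_order_acquire);
            int seen_i = i.load(std::memory_order_relaxed);
            if (seen_j == 1 && seen_i == 0) {
                violation = true;
            }
        });

        writer.join();
        reader.join();

        if (violation) {
            return true;
        }
    }
    return false;
}
```

Without the two fences, the relaxed operations alone would allow the reader to see j == 1 while still seeing i == 0.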
|
#14
In article <9usooapqayyx.dlg>, gelists says...

[ ... ]
> Taken all this together, it seems that on "most modern architectures" cache coherency is mostly guaranteed by the hardware, and for example it is not necessary to use memory barriers or locks for access to volatile boolean variables that are only read or written (never using a read-modify-write cycle). Is this correct? What is all this talk about different threads seeing values out of order about, if the cache coherency is maintained by the hardware in this way?

Yes and no. The hardware normally ensures coherency for a single variable -- but it doesn't know anything about the relationships you've established between variables. For example, assume a really simple situation where you have some data and a bool to tell when the data is valid:

    struct whatever {
        int data1;
        float data2;
        bool valid;
    public:
        whatever() : valid(false) {}
    } thing;

If you have code like:

    thing.data1 = 1;
    thing.data2 = 2.0f;
    thing.valid = true;

The hardware will assure that when a write has taken place to any of the variables, any other core looking at the memory location of that variable will see the value that was written.

Now, we don't care at all about the relative order in which data1 and data2 are written -- whichever way the hardware can do it the fastest is fine by us. BUT we need to assure that 'valid' is only seen as true AFTER the values have been written to both data1 and data2. The hardware doesn't know this on its own. It just sees three separate assignments to three separate variables. As such, the programmer needs to "inform" the hardware about the relationship involved.
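One way to "inform" the compiler and hardware about that relationship, using the atomics that arrived with the later C++11 standard, is to make the flag a `std::atomic<bool>` and pair a release store with an acquire load. This is a sketch under that assumption, not the only possible approach (a mutex around all three members would also work); `Whatever` mirrors the struct above and `consumer_saw_complete_data` is a name invented for this example.

```cpp
#include <atomic>
#include <thread>

// Same shape as the struct above, but the flag is atomic so that it can
// carry ordering information between threads.
struct Whatever {
    int data1 = 0;
    float data2 = 0.0f;
    std::atomic<bool> valid{false};
};

// The producer fills in the data and then publishes it with a release store.
// The consumer spins until an acquire load sees `valid == true`; from then on
// it is guaranteed to see both data members as written by the producer.
bool consumer_saw_complete_data() {
    Whatever thing;
    bool complete = false;

    std::thread producer([&] {
        thing.data1 = 1;       // order of these two does not matter
        thing.data2 = 2.0f;
        thing.valid.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        while (!thing.valid.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        complete = (thing.data1 == 1 && thing.data2 == 2.0f);
    });

    producer.join();
    consumer.join();
    return complete;
}
```

The two plain data writes may still be performed in either order relative to each other; only their ordering with respect to the flag is constrained, which matches the requirement described above.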
|
#15
On 2008-06-25 14:56:16, Jerry Coffin wrote:
>> Taken all this together, it seems that on "most modern architectures" cache coherency is mostly guaranteed by the hardware, and for example it is not necessary to use memory barriers or locks for access to volatile boolean variables that are only read or written (never using a read-modify-write cycle). Is this correct? What is all this talk about different threads seeing values out of order about, if the cache coherency is maintained by the hardware in this way?
>
> Yes and no. [Lots of useful stuff snipped.]

Thanks to all who responded in this thread. It has helped me a good deal in understanding what I can rely on and what not.

Gerhard