Quad-core processors became standard for desktop computers, while servers have 10- and 12-core processors. From Moore's law it can be predicted that the number of cores per processor will double every 18 to 24 months. This could mean that a typical processor will eventually have dozens or hundreds of cores. An operating system can ensure that different tasks and user programs run in parallel on the available cores. However, for a serial software program to take full advantage of the multi-core architecture, the programmer needs to restructure and parallelize the code.
Application runtime will no longer be sped up through frequency scaling; instead, programmers will need to parallelize their code to take advantage of the increasing computing power of multi-core architectures. Optimally, the speedup from parallelization would be linear: doubling the number of processing elements should halve the runtime, and doubling it a second time should again halve the runtime. However, very few parallel algorithms achieve optimal speedup. Most of them have a near-linear speedup for small numbers of processing elements, which flattens out into a constant value for large numbers of processing elements.
The potential speedup of an algorithm on a parallel computing platform is given by Amdahl's law. A program solving a large mathematical or engineering problem will typically consist of several parallelizable parts and several non-parallelizable (serial) parts. The serial parts put an upper limit on the usefulness of adding more parallel execution units: the bearing of a child takes nine months, no matter how many women are assigned. Amdahl's law only applies to cases where the problem size is fixed. In practice, as more computing resources become available, they tend to get used on larger problems (larger datasets), and the time spent in the parallelizable part often grows much faster than the inherently serial work. Gustafson's law describes this case.
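As a rough illustration of the upper limit described above, Amdahl's law is commonly written as S(n) = 1 / ((1 - p) + p / n), where p is the fraction of the runtime that can be parallelized and n is the number of processors. The symbols p and n and the function name `amdahl_speedup` are assumptions introduced here for the sketch; they do not come from the text.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed-size problem.

    parallel_fraction (p) and processors (n) are illustrative names.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many processors
    # are used; only the parallel part is divided among them.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # With p = 0.9 the speedup flattens out below 1 / (1 - p) = 10.
    for n in (1, 2, 4, 16, 256, 65536):
        print(n, round(amdahl_speedup(0.9, n), 3))
```

With 90% of the work parallelizable, the printed values level off below 10 however many processors are added, which is the flattening described earlier.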
Both Amdahl's law and Gustafson's law assume that the running time of the serial part of the program is independent of the number of processors. Amdahl's law assumes that the entire problem is of fixed size, so that the total amount of work to be done in parallel is also independent of the number of processors, whereas Gustafson's law assumes that the total amount of work to be done in parallel varies linearly with the number of processors.
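A minimal sketch of Gustafson's law under the assumption stated above (parallel work grows linearly with the processor count): the scaled speedup is usually written S(n) = (1 - p) + p n, where p is the parallel fraction of the runtime as measured on the parallel system. The symbols and the function name `gustafson_speedup` are illustrative assumptions, not taken from the text.

```python
def gustafson_speedup(parallel_fraction: float, processors: int) -> float:
    """Scaled speedup predicted by Gustafson's law.

    parallel_fraction is the share of the runtime spent in parallel work
    on the n-processor system (illustrative parameter names).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial time is fixed; the parallel work scales with the machine,
    # so a single processor would need `processors` times as long for it.
    return serial_fraction + parallel_fraction * processors


if __name__ == "__main__":
    for n in (1, 2, 4, 16, 256):
        print(n, round(gustafson_speedup(0.9, n), 3))
```

Unlike the fixed-size case, the scaled speedup keeps growing with the number of processors because the problem grows with the machine.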
Understanding data dependencies is fundamental in implementing parallel algorithms. No program can run more quickly than the longest chain of dependent calculations (known as the critical path), since calculations that depend upon prior calculations in the chain must be executed in order. However, most algorithms do not consist of just a long chain of dependent calculations; there are usually opportunities to execute independent calculations in parallel.
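The critical-path bound can be sketched on a small task graph. The task names, durations, and the helper `critical_path_length` below are hypothetical; the sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to visit each task after its prerequisites.

```python
from graphlib import TopologicalSorter


def critical_path_length(durations, depends_on):
    """Length of the longest chain of dependent tasks.

    durations:  task -> time the task takes
    depends_on: task -> iterable of tasks that must finish first
    """
    # Include every task, even those with no dependencies at all.
    graph = {task: tuple(depends_on.get(task, ())) for task in durations}
    finish = {}
    # static_order() yields a task only after all of its prerequisites.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[dep] for dep in graph[task]), default=0)
        finish[task] = start + durations[task]
    return max(finish.values(), default=0)


if __name__ == "__main__":
    durations = {"a": 3, "b": 2, "c": 4, "d": 1}
    depends_on = {"c": ["a"], "d": ["b", "c"]}
    total_work = sum(durations.values())
    longest_chain = critical_path_length(durations, depends_on)
    print("total work:", total_work, "critical path:", longest_chain)
```

In this made-up graph the chain a, c, d must run in order, so no number of processors can finish sooner than the length of that chain, even though task b can run alongside it.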
Let $P_i$ and $P_j$ be two program segments. Bernstein's conditions describe when the two are independent and can be executed in parallel. For $P_i$, let $I_i$ be all of the input variables and $O_i$ the output variables, and likewise for $P_j$. $P_i$ and $P_j$ are independent if they satisfy

$$I_j \cap O_i = \varnothing, \qquad I_i \cap O_j = \varnothing, \qquad O_i \cap O_j = \varnothing.$$

Violation of the first condition introduces a flow dependency, corresponding to the first segment producing a result used by the second segment.
The second condition represents an anti-dependency, when the second segment produces a variable needed by the first segment. The third and final condition represents an output dependency: when two segments write to the same location, the result comes from the logically last executed segment.
For example, suppose instruction 3 of a function uses a result computed by instruction 2. Instruction 3 then cannot be executed before, or even in parallel with, instruction 2. It violates condition 1, and thus introduces a flow dependency.
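Bernstein's conditions can be checked mechanically once the input and output variables of each segment are written down as sets. The sketch below is only an illustration: the function name `bernstein_violations` and the two sample statements (`c = a * b` followed by `d = 3 * c` or `d = 3 * b`) are assumptions chosen here, not taken from the text.

```python
def bernstein_violations(inputs_i, outputs_i, inputs_j, outputs_j):
    """Return the dependencies that prevent segments i and j from running in parallel.

    Segment i is assumed to come first in program order.
    """
    inputs_i, outputs_i = set(inputs_i), set(outputs_i)
    inputs_j, outputs_j = set(inputs_j), set(outputs_j)
    violations = []
    if inputs_j & outputs_i:      # j reads something i writes
        violations.append("flow")
    if inputs_i & outputs_j:      # j writes something i reads
        violations.append("anti")
    if outputs_i & outputs_j:     # both write the same variable
        violations.append("output")
    return violations


if __name__ == "__main__":
    # c = a * b  followed by  d = 3 * c : the second statement needs c.
    print(bernstein_violations({"a", "b"}, {"c"}, {"c"}, {"d"}))
    # c = a * b  followed by  d = 3 * b : no shared outputs, no reads of c.
    print(bernstein_violations({"a", "b"}, {"c"}, {"b"}, {"d"}))
```

An empty list means all three conditions hold and the two segments are independent.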
Parallel Scientific Computation: A Structured Approach using BSP and MPI - Oxford Scholarship
If, by contrast, no instruction uses a result of another and no two instructions write the same variable, there are no dependencies between the instructions, so they can all be run in parallel. Bernstein's conditions do not allow memory to be shared between different processes. For that, some means of enforcing an ordering between accesses is necessary, such as semaphores, barriers or some other synchronization method. Subtasks in a parallel program are often called threads.
Some parallel computer architectures use smaller, lightweight versions of threads known as fibers, while others use bigger versions known as processes. However, "threads" is generally accepted as a generic term for subtasks. Without synchronization, the instructions of two threads may be interleaved in any order. For example, consider a program in which two threads, A and B, each read a shared variable V (instructions 1A and 1B), update the value (2A and 2B), and write it back to V (3A and 3B). If instruction 1B is executed between 1A and 3A, or if instruction 1A is executed between 1B and 3B, the program will produce incorrect data.
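The problematic interleaving can be replayed deterministically in a single thread. The sketch below assumes that each thread reads V (step 1), adds 1 to its private copy (step 2), and writes the copy back (step 3); the "add 1" step, the helper `replay`, and the schedules are illustrative assumptions.

```python
def replay(schedule):
    """Replay the read / add / write-back steps of threads A and B in order.

    Each step is a two-character label such as "1A": step number, thread name.
    """
    shared = {"V": 0}
    private = {"A": 0, "B": 0}          # each thread's local copy of V
    for step in schedule:
        number, thread = step[0], step[1]
        if number == "1":               # read variable V
            private[thread] = shared["V"]
        elif number == "2":             # add 1 to the local copy
            private[thread] += 1
        elif number == "3":             # write back to variable V
            shared["V"] = private[thread]
        else:
            raise ValueError(f"unknown step {step!r}")
    return shared["V"]


SERIAL = ["1A", "2A", "3A", "1B", "2B", "3B"]
INTERLEAVED = ["1A", "1B", "2A", "2B", "3A", "3B"]   # 1B falls between 1A and 3A

if __name__ == "__main__":
    print("serial:", replay(SERIAL))
    print("interleaved:", replay(INTERLEAVED))
```

In the interleaved schedule both threads read the old value of V before either writes, so one of the two updates is lost.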
This is known as a race condition. The programmer must use a lock to provide mutual exclusion. A lock is a programming language construct that allows one thread to take control of a variable and prevent other threads from reading or writing it, until that variable is unlocked. The thread holding the lock is free to execute its critical section (the section of a program that requires exclusive access to some variable), and to unlock the data when it is finished.
Therefore, to guarantee correct program execution, the above program can be rewritten to use locks: each thread locks variable V before reading it and unlocks it after writing it back. One thread will successfully lock variable V, while the other thread will be locked out, unable to proceed until V is unlocked again. This guarantees correct execution of the program. Locks may be necessary to ensure correct program execution when threads must serialize access to resources, but their use can greatly slow a program and may affect its reliability.
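A minimal sketch of the locked version, using `threading.Lock` from the Python standard library. The shared variable V is modeled as a dictionary entry, and the function name, thread count, and iteration count are arbitrary assumptions.

```python
import threading


def locked_increments(num_threads=4, increments_per_thread=10_000):
    """Increment a shared variable from several threads under a lock."""
    shared = {"V": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments_per_thread):
            with lock:                  # lock variable V
                value = shared["V"]     # read variable V
                value += 1              # add 1 to the local copy
                shared["V"] = value     # write back to variable V
            # leaving the "with" block unlocks variable V

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared["V"]


if __name__ == "__main__":
    print(locked_increments())
```

Because the read, add, and write-back steps sit inside one critical section, no update is lost; the cost is that the threads execute that section one at a time.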
Locking multiple variables using non-atomic locks introduces the possibility of program deadlock. An atomic lock locks multiple variables all at once. If it cannot lock all of them, it does not lock any of them.
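The all-or-nothing behaviour described above can be approximated with non-blocking acquisition and back-off. This is a sketch under stated assumptions, not a truly atomic multi-variable lock: `try_lock_all` is a hypothetical helper built on `threading.Lock.acquire(blocking=False)`, and a caller that gets `False` is expected to retry later.

```python
import threading


def try_lock_all(locks):
    """Acquire every lock in `locks`, or none of them.

    Returns True if all locks are now held by the caller, False otherwise.
    """
    acquired = []
    for lock in locks:
        if lock.acquire(blocking=False):
            acquired.append(lock)
        else:
            # One lock is unavailable: release what was taken so far
            # instead of holding it while waiting for the rest.
            for held in reversed(acquired):
                held.release()
            return False
    return True


if __name__ == "__main__":
    first, second = threading.Lock(), threading.Lock()
    second.acquire()                       # someone else holds `second`
    print(try_lock_all([first, second]))   # cannot take both
    print(first.locked())                  # `first` was released again
    second.release()
    print(try_lock_all([first, second]))   # both are free now
```

Because a thread never keeps one lock while blocking on another, the circular wait that produces deadlock cannot form, although threads may have to retry.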
If two threads each need to lock the same two variables using non-atomic locks, it is possible that one thread will lock one of them and the second thread will lock the second variable. In such a case, neither thread can complete, and deadlock results. Many parallel programs require that their subtasks act in synchrony. This requires the use of a barrier.
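A barrier can be sketched with `threading.Barrier` from the Python standard library. The two-phase computation, the function name `two_phase`, and the worker count are made up for illustration: no worker may start the second phase until every worker has finished the first.

```python
import threading


def two_phase(num_workers=4):
    """Run a two-phase computation whose phases are separated by a barrier."""
    partial = [None] * num_workers
    totals = [None] * num_workers
    barrier = threading.Barrier(num_workers)

    def worker(rank):
        partial[rank] = (rank + 1) ** 2    # phase 1: compute own result
        barrier.wait()                     # wait until all workers reach this point
        totals[rank] = sum(partial)        # phase 2: read everyone's results

    threads = [threading.Thread(target=worker, args=(rank,))
               for rank in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return totals


if __name__ == "__main__":
    print(two_phase())
```

Without the barrier, a fast worker could reach the second phase while some entries of `partial` were still unset.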
Barriers are typically implemented using a lock or a semaphore. One class of algorithms, known as lock-free and wait-free algorithms, avoids the use of locks and barriers altogether. However, this approach is generally difficult to implement and requires correctly designed data structures. Not all parallelization results in speed-up. Generally, as a task is split up into more and more threads, those threads spend an ever-increasing portion of their time communicating with each other or waiting on each other for access to resources. This problem, known as parallel slowdown, can be improved in some cases by software analysis and redesign.
Applications are often classified according to how often their subtasks need to synchronize or communicate with each other. An application exhibits fine-grained parallelism if its subtasks must communicate many times per second; it exhibits coarse-grained parallelism if they do not communicate many times per second, and it exhibits embarrassing parallelism if they rarely or never have to communicate. Embarrassingly parallel applications are considered the easiest to parallelize.
Parallel programming languages and parallel computers must have a consistency model (also known as a memory model). The consistency model defines rules for how operations on computer memory occur and how results are produced. One of the first consistency models was Leslie Lamport's sequential consistency model. Sequential consistency is the property of a parallel program that its parallel execution produces the same results as a sequential program.
Specifically, a program is sequentially consistent if "the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program".
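The definition can be made concrete by enumerating every sequential order that keeps each processor's program order. The two-thread program below (thread A stores to x and then loads y; thread B stores to y and then loads x) is a commonly used illustration chosen here as an assumption; the names `sc_outcomes`, `r1`, and `r2` are hypothetical.

```python
from itertools import combinations


def sc_outcomes():
    """All (r1, r2) results allowed under sequential consistency."""
    thread_a = [("store", "x", 1), ("load", "y", "r1")]
    thread_b = [("store", "y", 1), ("load", "x", "r2")]
    total = len(thread_a) + len(thread_b)
    outcomes = set()
    # Pick which positions of the global order belong to thread A; the
    # operations of each thread are then consumed in program order.
    for a_slots in combinations(range(total), len(thread_a)):
        a_ops, b_ops = iter(thread_a), iter(thread_b)
        memory = {"x": 0, "y": 0}
        registers = {}
        for position in range(total):
            kind, variable, arg = next(a_ops) if position in a_slots else next(b_ops)
            if kind == "store":
                memory[variable] = arg
            else:
                registers[arg] = memory[variable]
        outcomes.add((registers["r1"], registers["r2"]))
    return outcomes


if __name__ == "__main__":
    print(sorted(sc_outcomes()))
```

No interleaving yields r1 = 0 and r2 = 0, because whichever load runs last must see the other thread's earlier store; a memory model that permits that outcome is therefore weaker than sequential consistency.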
Software transactional memory is a common type of consistency model. Software transactional memory borrows from database theory the concept of atomic transactions and applies them to memory accesses. Mathematically, these models can be represented in several ways.
Petri nets were an early attempt to codify the rules of consistency models. Dataflow theory later built upon these, and dataflow architectures were created to physically implement the ideas of dataflow theory. Process calculi such as the Calculus of Communicating Systems and Communicating Sequential Processes were subsequently developed to permit algebraic reasoning about systems composed of interacting components. Michael J. Flynn created one of the earliest classification systems for parallel and sequential computers and programs, now known as Flynn's taxonomy.