نموذج الهيكلة والتجميع

نسخة مفحوصة من هذه الصفحة اعتمدت في 14 أكتوبر 2020 بناء على هذه النسخة.

نموذج الهيكلة والتجميع (بالانجليزية: نموذج الهيكلة والتجميع)هو عبارة عن نموذج برمجي، تم تطويره من قبل شركة جوجل لإجراء عمليات حسابية بشكل متزامن على كمية كبيرة جداً من البيانات (عدة بيتابايت^[1]), وتتم هذه العمليات على عدة حواسيب تسمى بالعناقيد الحاسوبية. كما يطلق اسم MapReduce على اسم المكتبة البرمجية لهذا النموذج وليس فقط على النموذج نفسه. باستخدام طريقة الهيكلة والتجميع سيتم معالجة البيانات على ثلاثة مراحل (التجزئة، الخلط, التجميع) وبالانجليزية على التوالي (Map, shuttle ,reduce), اثنتان منهم تتم من قِبَل المستخدم وهما التجزئة والتجميع. -- وبالتالي يمكن إجراء العمليات الحسابية على عدة حواسيب وفي الوقت نفسه، أي بشكل متواز. تكمن أهمية هذه التقنية في الدور الذي تلعبه في توزيع العمليات على عدة حساب وإجرائها بشكل متزامن، ومن ثم جمع النتائج في المرحلة الثانية(الخلط). بصورة أوضح يمكننا القول أننا نقسم المشكلة الكبيرة إلى عدة أجزاء ونحل كل جزء منها على حدا ومن ثم نجمع هذه الحلول مع بعضها لتشكل في النهاية الحل للمشكلة الكبيرة. فكرة هذا النموذج مستوحاة من الدوال Map و Reduce, اللذان يستخدمان بكثرة في البرمجة الوظيفية.^[2]

في عام 2010 تم منح براءة اختراع أمريكية على هذه التقنية.^[3]

طريقة العمل

MapReduce
Zwischen Ergebnisse: نتائج غير نهائية

الصورة (MapReduce) توضح كيفية حساب البيانات في تقنية الهيكلة والتجميع وهي كالتالي:

في البداية لدينا البيانات (D,A,T,A) المُراد حسابها، ويتم توزيع هذه البيانات على مجموعة من عمليات الهيكلة Map, وكل عملية تقوم بوظيفة معينة يتم تحديدها من قِبَل المستخدم.
يتم تنفيذ هذه العمليات بشكل متوازي.
كل عملية من هذه العمليات تقوم بطرح مخرجاتها الأولية (هذه المخرجات ممثلة باللون الزهري), مع العلم أن العملية الواحدة يمكنها توزيع مخرجاتها على أكثر من مكان تخزين.
هنا نأتي إلى مرحلة الخلط، حيث يتم خلط هذه البيانات مع بعضها البعض، وهنا أيضاً يتم تبادل البيانات بشكل كبير بين الأجهزة (الأماكن التي تتواجد فيها النتائج الأولية).
عندما تنتهي مرحلة الخلط فإنه يمكننا القول أنها بداية مرحلة التجميع Reduce (باللون الأزرق).
يتم هنا إجراء عملية تجميع واحدة لكل مكان تخزين(اللون الزهري) وتتم أيضاً هذه العمليات بشكل متواز، ومن ثم تطرح كل عملية من هذه العمليات مخرجاتها على حدا، وهي (X,Y,Z).

تعريف اقتران الهيكلة والتجميع

يقوم اقتران الهيكلة والتجميع بتحويل قائمة من المدخلات إلى قائمة من المخرجات، وكل قائمة-المدخلات والمخرجات- تتكون من قيم ومفتاح لكل قيمة، كما هو الحال في دوال التجزئة أو ال دالة تجزئة.

{\begin{aligned}\mathrm {MapReduce} :(K\times V)^{*}&\to (L\times W)^{*}\\{\lbrack (k_{1},v_{1}),\ldots ,(k_{n},v_{n})\rbrack }&\mapsto \lbrack (l_{1},w_{1}),\ldots ,(l_{m},w_{m})\rbrack \end{aligned}}

شرح اقتران الهيكلة والتجميع

المجموعات $K$ و $L$ تحوي على المفاتيح، بينما تحوي المجموعات $V$ و $W$ على القيم.
كل المفاتيح $k\in K$ هي من نفس النوع، مثلاً: نص أو String.
كل المفاتيح $l\in L$ هي من نفس النوع، مثلاً: عدد صحيح.
كل القيم $v\in V$ هي من نفس النوع، مثلاً: عدد عشري.
كل القيم $w\in W$ هي من نفس النوع، مثلاً: عدد حقيقي.
إذا كانت $A$ و $B$ هي مجموعات، فإن $A\times B$ هي مجموعة كل الأزواج $(a,b)$ , بحيث أن $a\in A$ و $b\in B$ (الضرب الديكارتي).
إذا كانت $M$ هي مجموعة، فإن $M^{*}$ هي مجموعة كل المجموعات التي يمكن تكوينها من عناصر $M$ , أو ما تسمى بنجمة كلين.

تعريف اقتران الهيكلة Map واقتران التجميع Reduce

{\begin{aligned}\mathrm {Map} :K\times V&\to (L\times W)^{*}\\(k,v)&\mapsto \lbrack (l_{1},x_{1}),\ldots ,(l_{r_{k}},x_{r_{k}})\rbrack \end{aligned}}

أو

{\begin{aligned}\mathrm {Reduce} :L\times W^{*}&\to X^{*}\\\left(l,{\lbrack y_{1},\ldots ,y_{s_{l}}\rbrack }\right)&\mapsto \lbrack w_{1},\ldots ,w_{m_{l}}\rbrack \end{aligned}}

روابط خارجية

MapReduce-MPI MapReduce-MPI Library

صحف

"CloudSVM: Training an SVM Classifier in Cloud Computing Systems"-paper by F. Ozgur Catak, M. Erdal Balaban, Springer, Lecture Notes in Computer Science,Pervasive Computing and Networked World 2012 from مجلس البحوث العلمية والتكنولوجية في تركيا and جامعة إسطنبول
"A Hierarchical Framework for Cross-Domain MapReduce Execution" — paper by Yuan Luo, Zhenhua Guo, Yiming Sun, Beth Plale, Judy Qiu; from جامعة إنديانا and Wilfred Li; from جامعة كاليفورنيا (سان دييغو)
"Interpreting the Data: Parallel Analysis with Sawzall" — paper by Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan; from مختبرات جوجل
"Evaluating MapReduce for Multi-core and Multiprocessor Systems" — paper by Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, and Christos Kozyrakis; from جامعة ستانفورد
"Why MapReduce Matters to SQL Data Warehousing" — analysis related to the August, 2008 introduction of MapReduce/SQL integration by Aster Data Systems and Greenplum
"MapReduce for the Cell B.E. Architecture" — paper by Marc de Kruijf and Karthikeyan Sankaralingam; from جامعة ويسكونسن-ماديسون
"Mars: A MapReduce Framework on Graphics Processors" — paper by Bingsheng He, Wenbin Fang, Qiong Luo, Naga K. Govindaraju, Tuyong Wang; from جامعة هونغ كونغ للعلوم والتكنولوجيا; published in Proc. PACT 2008. It presents the design and implementation of MapReduce on graphics processors.
"A Peer-to-Peer Framework for Supporting MapReduce Applications in Dynamic Cloud Environments" — paper by Fabrizio Marozzo, Domenico Talia, Paolo Trunfio; from جامعة كالابريا; published in Cloud Computing: Principles, Systems and Applications, N. Antonopoulos, L. Gillam (Editors), chapt. 7, pp. 113–125, Springer, 2010, ISBN 978-1-84996-240-7.
"Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters" — paper by Hung-Chih Yang, Ali Dasdan, Ruey-Lung Hsiao, and D. Stott Parker; from ياهو! and جامعة كاليفورنيا; published in Proc. of ACM SIGMOD, pp. 1029–1040, 2007. (This paper shows how to extend MapReduce for relational data processing.)
FLuX: the Fault-tolerant, Load Balancing eXchange operator from جامعة كاليفورنيا (بركلي) provides an integration of partitioned parallelism with process pairs. This results in a more pipelined approach than Google's MapReduce with instantaneous failover, but with additional implementation cost.
"A New Computation Model for Rack-Based Computing" — paper by Foto N. Afrati; Jeffrey D. Ullman; from جامعة ستانفورد; Not published as of November 2009. This paper is an attempt to develop a general model in which one can compare algorithms for computing in an environment similar to what map-reduce expects.
FPMR: MapReduce framework on FPGA—paper by Yi Shan, Bo Wang, Jing Yan, Yu Wang, Ningyi Xu, Huazhong Yang (2010), in FPGA '10, Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays.
"Tiled-MapReduce: Optimizing Resource Usages of Data-parallel Applications on Multicore with Tiling"—paper by Rong Chen, Haibo Chen and Binyu Zang from جامعة فودان; published in Proc. PACT 2010. It presents the Tiled-MapReduce programming model which optimizes resource usages of MapReduce applications on multicore environment using tiling strategy.
"Tiled MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling"—paper by Rong Chen, and Haibo Chen from جامعة شانغهاي جياو تونغ; published in ACM TACO, 10(1), 2013. It extends the earlier version of Ostrich to support several usage scenarios such as online and incremental computing on multicore machines.
"Scheduling divisible MapReduce computations "—paper by Joanna Berlińska from جامعة آدم ميتسكيفيتش في بوزنان and Maciej Drozdowski from Poznan University of Technology; Journal of Parallel and Distributed Computing 71 (2011) 450-459, دُوِي:10.1016/j.jpdc.2010.12.004. It presents scheduling and performance model of MapReduce.
"Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing"—paper by D. Battré, S. Ewen, F. Hueske, O. Kao, V. Markl, and D. Warneke from TU Berlin published in Proc. of ACM SoCC 2010. The paper introduces the PACT programming model, a generalization of MapReduce, developed in the Stratosphere research project.
"MapReduce and PACT - Comparing Data Parallel Programming Models"—paper by A. Alexandrov, S. Ewen, M. Heimel, F. Hueske, O. Kao, V. Markl, E. Nijkamp, and D. Warneke from TU Berlin published in Proc. of BTW 2011.

كتب

Jimmy Lin and Chris Dyer. "Data-Intensive Text Processing with MapReduce" (manuscript)

مقررات تعليمية

Cluster Computing and MapReduce course from Google Code University contains video lectures and related course materials from a series of lectures that was taught to Google software engineering interns during the Summer of 2007.
MapReduce in a Week course from Google Code University contains a comprehensive introduction to MapReduce including lectures, reading material, and programming assignments.
MapReduce course, taught by engineers of جوجل Boston, part of 2008 Independent Activities Period at معهد ماساتشوستس للتقنية.

بيبلوغرافيا

MapReduce bibliography by A. Kamil, 2010

المصادر

^ Google spotlights data center inner workings | Tech news blog - CNET News.com نسخة محفوظة 19 أكتوبر 2013 على موقع واي باك مشين. ^{[وصلة مكسورة]}
^ "Our abstraction is inspired by the map and reduce primitives present in Lisp and many other functional languages." -"MapReduce: Simplified Data Processing on Large Clusters", von Jeffrey Dean und Sanjay Ghemawat, مختبرات جوجل نسخة محفوظة 11 ديسمبر 2017 على موقع واي باك مشين.
^ USP 7,650,331 (United States Patent and Trademark Office) نسخة محفوظة 03 يوليو 2017 على موقع واي باك مشين.

[1] Google spotlights data center inner workings | Tech news blog - CNET News.com نسخة محفوظة 19 أكتوبر 2013 على موقع واي باك مشين. ^{[وصلة مكسورة]}

[map-2] "Our abstraction is inspired by the map and reduce primitives present in Lisp and many other functional languages." -"MapReduce: Simplified Data Processing on Large Clusters", von Jeffrey Dean und Sanjay Ghemawat, مختبرات جوجل نسخة محفوظة 11 ديسمبر 2017 على موقع واي باك مشين.

[3] USP 7,650,331 (United States Patent and Trademark Office) نسخة محفوظة 03 يوليو 2017 على موقع واي باك مشين.

[1]

[2]

[3]

وصلة مكسورة

ضبط استنادي
دولية	ملف الضبط الاستنادي الافتراضي الدَّولي (VIAF)
وطنية	المكتبة القومية الإسرائيلية (J9U) مكتبة الكونغرس (LCNAF)