02 Mar 2018

THE AMBIENT SCRUTINIZE OF SCHEDULING ALGORITHMS IN BIG DATA TERRITORY.

  • Ph.D. (Computer Science & Engineering), M.Tech, Assistant Professor, Department of Information Technology, AL Baha University, AL Baha, Kingdom of Saudi Arabia (KSA).

We live in the data age, and a key metric of our time is the amount of data that originates ubiquitously around us. The number of Internet subscribers and connected devices is increasing sharply, as is the reach of the Internet of Things (IoT). As a result, large quantities of data (so-called Big Data) are generated, such as user data (structured, unstructured, or semi-structured), sensor data, and log files. Collecting and analyzing Big Data to provide insights to clients is a growing business for companies. In general, processing such large amounts of data in diverse formats can be time-consuming. Hadoop is an open-source framework used to process large amounts of data economically and efficiently, and job scheduling has become a significant factor in attaining high performance in a Hadoop cluster. Job scheduling algorithms are essential for using cluster resources efficiently and for executing jobs in a short time. The fundamental purpose of this paper is to present a classification of Hadoop schedulers along with the existing scheduling algorithms in the Hadoop territory. In addition, this paper summarizes the features, advantages, disadvantages, and limitations of several Hadoop scheduling algorithms.



[Yusuf Perwej. (2018); THE AMBIENT SCRUTINIZE OF SCHEDULING ALGORITHMS IN BIG DATA TERRITORY. Int. J. of Adv. Res. 6 (Mar). 241-258] (ISSN 2320-5407). www.journalijar.com


Dr. Yusuf Perwej
Assistant Professor, Department of Information Technology, AL Baha University, AL Baha, Kingdom of Saudi Arabia (KSA)

Article DOI: 10.21474/IJAR01/6672
DOI URL: http://dx.doi.org/10.21474/IJAR01/6672