Optimizing Data Processing Workflows Using Apache Spark on Cloud Platforms

Authors:

Santosh Vinnakota

Software Engineer Advisor

Tennessee, USA

Santosh2eee@gmail.com

Abstract—Apache Spark has emerged as a dominant framework for big data processing, offering scalability, fault tolerance, and ease of use. Cloud platforms provide on-demand scalability and flexibility for Spark workloads. This paper explores techniques for optimizing data processing workflows using Apache Spark on cloud platforms such as AWS, Azure, and Google Cloud. We discuss resource allocation, data partitioning, caching strategies, cost optimization, and performance tuning to achieve efficient data processing. Through real-world use cases and benchmarks, we highlight best practices for improving Spark performance in cloud environments.

Keywords—Apache Spark, Cloud Computing, Data Processing, Optimization, Distributed Computing.
