Analysis of Apache Logs Using Hadoop and Hive

Velinov, Aleksandar and Zdravev, Zoran (2018) Analysis of Apache Logs Using Hadoop and Hive. Tem Journal, 7 (3). pp. 645-650. ISSN 2217-8309 (Print)

TemJournalAugust2018_645_650.pdf - Published Version

Official URL: http://www.temjournal.com

Abstract

In this paper we analyze Apache web logs using the Cloudera Hadoop distribution, with Hive used to query the log data. We used publicly available web logs from the NASA Kennedy Space Center server. HDFS (Hadoop Distributed File System) served as the container for the logs; the Apache web logs were copied to HDFS from the local file system. We analyzed the total number of hits, the number of unique IPs, the most common hosts that made requests to the NASA server in Florida, and the most common types of errors. We also examined the relationship between the number of rows in the logs and the execution time.
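
The metrics named in the abstract (total hits, unique hosts, most common hosts, most common errors) can be illustrated on a small scale without a cluster. The following is a minimal sketch in plain Python, not the authors' Hive queries. It assumes the logs follow the Apache Common Log Format, and the sample lines, host names, and the `summarize` helper are hypothetical, made up purely for illustration.

```python
import re
from collections import Counter

# Assumed layout (Apache Common Log Format):
#   host ident user [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)$'
)


def summarize(lines):
    """Hypothetical helper: compute the abstract's metrics over log lines."""
    hosts = Counter()   # hits per requesting host
    errors = Counter()  # hits per HTTP error status (4xx and 5xx)
    hits = 0
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        hits += 1
        hosts[match.group("host")] += 1
        status = int(match.group("status"))
        if status >= 400:
            errors[status] += 1
    return {
        "total_hits": hits,
        "unique_hosts": len(hosts),
        "top_hosts": hosts.most_common(3),
        "top_errors": errors.most_common(3),
    }


# Made-up sample lines (not taken from the NASA data set).
SAMPLE_LINES = [
    'host-a.example.com - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 6245',
    'host-b.example.com - - [01/Jul/1995:00:00:06 -0400] "GET /missing.html HTTP/1.0" 404 -',
    'host-a.example.com - - [01/Jul/1995:00:00:09 -0400] "GET /images/logo.gif HTTP/1.0" 200 786',
    'host-a.example.com - - [01/Jul/1995:00:00:12 -0400] "GET /gone.html HTTP/1.0" 404 -',
    'this line is not a log entry',
]

summary = summarize(SAMPLE_LINES)
print(summary)
```

In a Hive setting the same counts would typically be expressed as aggregate queries over a table that maps these fields to columns; the regular expression here only stands in for that column mapping.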

Item Type: Article
Subjects: Natural sciences > Computer and information sciences
Divisions: Faculty of Computer Science
Depositing User: Aleksandar Velinov
Date Deposited: 30 Aug 2018 13:19
Last Modified: 30 Aug 2018 13:19
URI: http://eprints.ugd.edu.mk/id/eprint/20344
