Hadoop - Log file parsing in HCatalog: regex or SerDe?
I am pretty new to Hadoop.
I am trying to load a log file into HCatalog. The following is the format of the log file.
time: 2014-10-28 06:32:34z userid: arun groupid: admin page: welcome.aspx message: login successful
time: 2014-10-28 06:32:34z userid: arun groupid: admin page: main.aspx message: menu load
..
..
Do I need to write a SerDe to parse this, or can it be achieved via regex?
I believe you want to load external log files into Hive tables, where the Hive metastore is managed by the HCatalog service.
If so, first analyze whether the source log records have a fixed delimiter that Hive can use to split each record into the required number of columns; a tab (\t) character would help.
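A quick way to run that check is to split a sample record on the candidate delimiter and count the fields. The sketch below is only an illustration: the class name DelimiterCheck, the five-column target (time, userid, groupid, page, message) and the tab-separated variant of the record are my assumptions, not something taken from your log.

```java
final class DelimiterCheck {
    // Assumed number of target columns: time, userid, groupid, page, message.
    static final int EXPECTED_COLUMNS = 5;

    // Count the fields produced by splitting a record on a delimiter regex.
    // The -1 limit keeps trailing empty fields, so an empty last column still counts.
    static int fieldCount(String record, String delimiterRegex) {
        return record.split(delimiterRegex, -1).length;
    }

    public static void main(String[] args) {
        // The record as it appears in the question (single spaces everywhere).
        String asLogged = "time: 2014-10-28 06:32:34z userid: arun groupid: admin "
                + "page: welcome.aspx message: login successful";
        // Hypothetical tab-delimited form of the same record.
        String tabbed = "2014-10-28 06:32:34z\tarun\tadmin\twelcome.aspx\tlogin successful";

        System.out.println(fieldCount(asLogged, " ") == EXPECTED_COLUMNS);  // space does not work
        System.out.println(fieldCount(tabbed, "\t") == EXPECTED_COLUMNS);   // tab does
    }
}
```

Splitting the record as logged on a space gives more fields than columns, because the timestamp and the message contain spaces themselves. So a plain delimited table (ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t') would only fit if the logs can be produced or rewritten in a tab-delimited form.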
The next option is to parse the columns out of the source log records using the Hive RegexSerDe class with a suitable regular expression.
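RegexSerDe takes a Java regular expression (the "input.regex" SerDe property) and maps each capture group to one column, so it can help to try the pattern in plain Java first. The sketch below is a guess at a pattern for the sample record; the class name LogLineRegex is hypothetical, and the lowercase labels are copied from the sample as posted, so adjust them if the real log uses different casing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogLineRegex {
    // One capture group per intended column: time, userid, groupid, page, message.
    // Assumption: labels appear exactly as in the sample, separated by single spaces.
    static final Pattern LINE = Pattern.compile(
            "time: (\\S+ \\S+) userid: (\\S+) groupid: (\\S+) page: (\\S+) message: (.*)");

    // Returns the captured column values, or an empty list if the line does not match.
    static List<String> parse(String line) {
        List<String> columns = new ArrayList<>();
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {          // the whole line has to match the pattern
            return columns;
        }
        for (int i = 1; i <= m.groupCount(); i++) {
            columns.add(m.group(i));
        }
        return columns;
    }

    public static void main(String[] args) {
        String sample = "time: 2014-10-28 06:32:34z userid: arun groupid: admin "
                + "page: welcome.aspx message: login successful";
        System.out.println(parse(sample));
    }
}
```

If the pattern holds up on real records, the same expression would go into the table definition as the input.regex value; remember that backslashes usually need escaping again inside the Hive string literal.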
If regex parsing is not feasible, the other option is to create a custom Hive SerDe class to parse the source log file records. With the help of the custom SerDe class, Hive will be able to fit the parsed cells into the relevant columns of the Hive external table.
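The core of such a custom SerDe is just the record-parsing logic. Below is a rough sketch of that part alone, in plain Java with no Hive dependencies; the class name LogRecordParser and the label list are assumptions based on the sample record. A real SerDe would wrap something like this in a class extending Hive's AbstractSerDe and call it from its deserialize method.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LogRecordParser {
    // Labels assumed from the sample record, in the order they appear.
    static final String[] LABELS = {"time:", "userid:", "groupid:", "page:", "message:"};

    // Returns column name -> value in label order, or an empty map if any
    // label is missing or appears out of order.
    static Map<String, String> parse(String record) {
        int[] starts = new int[LABELS.length];
        int from = 0;
        for (int i = 0; i < LABELS.length; i++) {
            starts[i] = record.indexOf(LABELS[i], from);
            if (starts[i] < 0) {
                return new LinkedHashMap<>();
            }
            from = starts[i] + LABELS[i].length();
        }

        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < LABELS.length; i++) {
            // A value runs from the end of its label to the start of the next label.
            int valueStart = starts[i] + LABELS[i].length();
            int valueEnd = (i + 1 < LABELS.length) ? starts[i + 1] : record.length();
            String name = LABELS[i].substring(0, LABELS[i].length() - 1); // drop the colon
            fields.put(name, record.substring(valueStart, valueEnd).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "time: 2014-10-28 06:32:34z userid: arun groupid: admin "
                + "page: main.aspx message: menu load";
        System.out.println(parse(sample));
    }
}
```

Compared with a single regex, this label-based approach tolerates extra whitespace between fields, which is one situation where writing your own SerDe may be worth the effort.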
Please refer to:
http://docs.aws.amazon.com/gettingstarted/latest/emr/getting-started-emr-load-data.html
Apache Hive regex SerDe: data types
http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/