Hadoop - Log file parsing in HCatalog: regex or SerDe?


I am pretty new to Hadoop.

I am trying to load a log file into HCatalog. The following is the format of the log file.

time: 2014-10-28 06:32:34z userid: arun groupid: admin page: welcome.aspx message: login successful
time: 2014-10-28 06:32:34z userid: arun groupid: admin page: main.aspx message: menu load
.. ..

Do I need to write a SerDe to parse this, or can it be achieved via regex?

I believe you can load external log files into Hive tables, where the Hive metastore is managed by the HCatalog service.

If so, first analyze whether the source log records have a fixed delimiter that Hive can use to parse each record into the required n number of columns; a tab (\t) character would help.
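A quick way to do that analysis is to split a few sample records on a candidate delimiter and look at the resulting column counts. This is only a sketch: `delimiter_column_counts` is a hypothetical helper, and the two sample lines are the ones from the question.

```python
from collections import Counter


def delimiter_column_counts(lines, delimiter="\t"):
    """Map 'number of columns after splitting' to 'number of lines'."""
    return Counter(
        len(line.split(delimiter)) for line in lines if line.strip()
    )


# Sample records copied from the question.
sample = [
    "time: 2014-10-28 06:32:34z userid: arun groupid: admin "
    "page: welcome.aspx message: login successful",
    "time: 2014-10-28 06:32:34z userid: arun groupid: admin "
    "page: main.aspx message: menu load",
]

# No tabs in the sample, so every line collapses into a single column.
tab_counts = delimiter_column_counts(sample, "\t")

# Splitting on spaces also cuts the timestamp and the free-text message
# apart, so the token count does not correspond to the 5 logical fields.
space_counts = delimiter_column_counts(sample, " ")
```

If the counts for a delimiter do not match the number of columns you want, a plain delimited table is unlikely to work and one of the SerDe options is the better fit.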

The next option is to parse the source log records into columns using the Hive RegexSerDe class with a suitable regular expression.
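For the sample format in the question, a pattern with one capture group per column might look like the sketch below. It is tested here in Python for convenience; RegexSerDe uses Java regular expressions, and this pattern only uses constructs that behave the same in both. The lowercase labels are an assumption taken from the sample as shown, so adjust them to the real file. In Hive, a pattern like this would go into the `input.regex` SerDe property, with backslashes escaped as the DDL requires.

```python
import re

# One capture group per target column:
# time, userid, groupid, page, message.
LOG_PATTERN = re.compile(
    r"^time: (\S+ \S+) userid: (\S+) groupid: (\S+) "
    r"page: (\S+) message: (.*)$"
)

# First sample record from the question.
line = (
    "time: 2014-10-28 06:32:34z userid: arun groupid: admin "
    "page: welcome.aspx message: login successful"
)

match = LOG_PATTERN.match(line)
# fields is a tuple of 5 strings, or None if the line does not match.
fields = match.groups() if match else None
```

The message is captured with `(.*)` because it is free text and may contain spaces; the other fields are assumed to contain no spaces.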

If regex parsing is not feasible, the other option is to create a custom Hive SerDe class to parse the source log file records. With the help of a custom SerDe class, Hive will be able to fit the parsed cells into the relevant columns of the Hive external table.

Please refer to:
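A real custom SerDe is written in Java against the Hive SerDe interfaces, but the parsing logic it would need can be sketched independently. The example below is an assumption about what such logic might look like, not the Hive API: `parse_record` is a hypothetical function that locates the labels and tolerates a missing field, which a single fixed regex does not handle well.

```python
import re

# Column order of the target table (taken from the labels in the question).
KEYS = ("time", "userid", "groupid", "page", "message")
KEY_RE = re.compile(r"\b(" + "|".join(KEYS) + r"): ")


def parse_record(line):
    """Return one value per column in KEYS, None where a label is absent."""
    found = {}
    matches = list(KEY_RE.finditer(line))
    for i, m in enumerate(matches):
        # A value runs from the end of its label to the start of the next.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
        found[m.group(1)] = line[m.end():end].strip()
    return [found.get(key) for key in KEYS]


# Record from the question.
full = parse_record(
    "time: 2014-10-28 06:32:34z userid: arun groupid: admin "
    "page: welcome.aspx message: login successful"
)

# Hypothetical record with no groupid, to show the missing-field case.
partial = parse_record(
    "time: 2014-10-28 06:32:34z userid: arun "
    "page: main.aspx message: menu load"
)
```

One limitation of this sketch: a message that itself contains text such as `page: ` would be split at that point, so the real format needs to be checked for that case.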

http://docs.aws.amazon.com/gettingstarted/latest/emr/getting-started-emr-load-data.html

Apache Hive RegexSerDe: data types

http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/

