Monday, February 4, 2019

big data interview questions


  • Hadoop Interview Questions


What commands do you use to start Hadoop?
A: start-dfs.sh and start-yarn.sh
How do you copy a local file to HDFS?
A: hadoop fs -put filename /(hadoop directory)
What is MapReduce?
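A: MapReduce is a programming model that processes data in two phases. Map takes an input data file and transforms it into (key, value) pairs, tuples (a,b,c,d), or another iterable structure. The framework groups the pairs by key, and Reduce then iterates over the values for each key to provide one final result per key.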
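The two phases can be sketched in plain Python with the classic word count. This is only an in-memory illustration of the idea, not Hadoop's API: the function names map_phase, shuffle, reduce_phase, and word_count are made up for this example, and on a real cluster the framework performs the grouping step between the mappers and the reducers.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

Each mapper only ever sees one line and each reducer only ever sees one key, which is what lets Hadoop spread the work across many machines.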
What does safemode in Hadoop mean?
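A: It means the NameNode is in a read-only state: files can be read, but the file system cannot be changed and the cluster is not yet ready to receive data. This usually occurs on startup, while the NameNode waits for the datanodes to report their blocks, and it normally ends automatically.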
How do you take Hadoop out of safemode?
A: hdfs dfsadmin -safemode leave
How do you add a datanode?
A: You copy the whole Hadoop $HADOOP_HOME folder to a server. Then you set up ssh keys so that the Hadoop user can ssh to that server without having to enter a password. Then you add the name of that server to $HADOOP_HOME/etc/hadoop/slaves (this file is named workers in Hadoop 3). Then you run hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode on the new data node.
What is linear regression?
A: This is a technique used to find the straight line that most nearly matches a set of data points. For example, if you have one independent variable x and one dependent variable y, then linear regression will calculate y = mx + b, where m is the slope and b is the y-intercept. This is used in predictive data models and to measure the relationship between variables, for example whether studying more (x) increases student grades (y).
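As a rough illustration, the slope and intercept for a single independent variable can be computed with the ordinary least-squares formulas. The sketch below uses plain Python; the function name fit_line and the hours and grades figures are invented for the example, not real data.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalised,
    # since the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx            # slope
    b = mean_y - m * mean_x  # y-intercept
    return m, b


if __name__ == "__main__":
    hours = [1, 2, 3, 4]       # hypothetical hours studied (x)
    grades = [52, 54, 56, 58]  # hypothetical grades (y)
    m, b = fit_line(hours, grades)
    print(f"grade = {m} * hours + {b}")
```

With these made-up numbers every extra hour adds exactly two points, so the fitted line has slope 2 and y-intercept 50. Real data would scatter around the line rather than sit on it.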
What are the main Hadoop config files?
A: hdfs-site.xml and core-site.xml, along with yarn-site.xml and mapred-site.xml for the YARN and MapReduce settings
What file types can Hadoop use to store its data?
A: Avro, Parquet, Sequence Files, and Plain Text
How can you call an external program, such as a Python script, from Hive?
A: Use TRANSFORM, for example: SELECT TRANSFORM (fields) USING 'python programName.py' AS (fields) FROM table;
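On the Python side, Hive streams each row to the script's standard input as one line of tab-separated fields and reads tab-separated lines back from standard output. A minimal sketch of what programName.py might contain follows; the transformation itself (upper-casing the first column) and the function name transform are placeholders chosen for illustration.

```python
import sys


def transform(lines):
    """Yield one tab-separated output row for each tab-separated input row."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Placeholder logic: upper-case the first column, pass the rest through.
        fields[0] = fields[0].upper()
        yield "\t".join(fields)


# Only read standard input when something is piped in, as Hive does;
# this avoids blocking when the file is run interactively.
if __name__ == "__main__" and not sys.stdin.isatty():
    for row in transform(sys.stdin):
        print(row)
```

Keeping the logic in a function that takes any iterable of lines makes the script easy to try locally on a small list of strings before wiring it into a Hive query.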
What does the fsck command do?
A: It checks the health of the files in HDFS, reporting bad blocks (i.e., corrupt or missing ones) and problems with replication, such as under-replicated blocks.
http://www.bmcsoftware.com.tr/guides/hadoop-interview-questions.html