Hadoop is an open-source software framework developed by the Apache Software Foundation that lets a company’s data science team process, for analytical purposes, large data sets stored across distributed servers. The framework is mainly used by companies that want to be able to extract unstructured data to improve