Something new is said about Big Data every day. From large enterprises to new start-ups, Big Data is on everyone's list, and its usefulness across industries and domains is what makes it so popular. When something as big as Big Data is involved, some of the news doing the rounds is bound to be unusual. The latest report is that companies are using Big Data to keep track of their employees' pregnancies. Legally, monitoring employee pregnancy is not wrong, but the practice is being questioned on ethical grounds. Certain organizations gather their employees' medical information and then use that data to identify whether a worker is pregnant or trying to conceive. Companies such as Walmart and Time Warner are reported to be among the biggest users of employee medical data. The data collected can also reveal an employee's general health conditions, for example whether someone is diabetic or needs surgery. Ethical or not, such data is bound to influence an organization's decisions.
Another piece of news catching attention is a new open-source Big Data project, Apache Arrow. The significant influence of Hadoop, Spark and Kafka on the world of Big Data is well known, and now Apache Arrow aims to carry the baton further. A few points about this project stand out. First, because it is based on code from the related Apache Drill project, Apache Arrow can improve the performance of analytical workloads. Second, it enables multi-system workloads by eliminating cross-system communication overhead. Developers from earlier Apache big data projects such as Cassandra, Drill, Hadoop, HBase, Spark and Storm have committed to Apache Arrow. Jacques Nadeau, vice-president of both the new project and Apache Drill, said in a statement: "The open-source community has joined forces on Apache Arrow. We anticipate the majority of world's data will be processed through Arrow within the next few years."
There is more to note about Apache Arrow. By the project's estimate, in most workloads 70 to 80 per cent of CPU cycles are spent serializing and deserializing data. Arrow removes that burden by letting systems share data without serializing and deserializing it. Arrow can also handle complex data with dynamic schemas in addition to traditional data.
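To make the serialization point concrete, here is a minimal sketch in Python. It does not use Apache Arrow itself; it relies only on the standard library (`array`, `memoryview`, `pickle`) to contrast two ways of handing a column of numbers to another consumer. The first serializes and deserializes the data, which costs CPU time and produces a separate copy. The second hands over a view of the same memory buffer, which is the zero-copy idea Arrow applies between systems. The column values and function names are hypothetical, chosen only for illustration.

```python
import array
import pickle

# Hypothetical numeric column stored in one contiguous buffer.
# This is plain Python, NOT Apache Arrow's actual memory format.
column = array.array("d", [1.5, 2.5, 3.5, 4.5])


def share_by_serialization(col: array.array) -> array.array:
    """Producer encodes the column to bytes; consumer decodes a new object."""
    payload = pickle.dumps(col)   # serialize: CPU work plus a copy into bytes
    return pickle.loads(payload)  # deserialize: builds a brand-new array


def share_zero_copy(col: array.array) -> memoryview:
    """Consumer receives a view over the same memory; nothing is encoded or copied."""
    return memoryview(col)


copied = share_by_serialization(column)
view = share_zero_copy(column)

# Change the original in place to show which consumer shares its memory.
column[0] = 100.0
print(copied[0])  # the serialized copy still holds the old value
print(view[0])    # the zero-copy view sees the update
```

The zero-copy path here only works inside a single Python process. Arrow's broader contribution is a standardized columnar in-memory format that different systems and languages agree on, so that the same principle can extend across tools; the sketch illustrates the idea, not Arrow's implementation.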
Now that we have covered the latest Big Data news, let us look at some of the data science skills that will be in demand in 2016. It is widely assumed that any professional with data science skills can secure a well-paying job. That said, demand for certain skills is predicted to be especially high in 2016. An analysis of job postings on LinkedIn found that SQL specialists are particularly well placed right now. Besides SQL, the data science skills expected to be in demand this year include Hadoop, Python, Java and R. Professionals skilled in any of these technologies are predicted to have promising career prospects in 2016.
Big Data has been the buzzword in the information technology industry for a while now, which is why data science skills are also in high demand. If you are an information technology student or an experienced IT professional looking for a change, no other option is likely to prove as beneficial as Big Data Analytics and Data Science. With so much happening in this field, opportunities are plentiful. And the pay is very good.
At Cognixia, we offer the finest training programs in Big Data and its associated technologies. This is the best time to take the leap. For further information, contact us.