Data science friendly
Making Kaggle Submissions with DSS
As a non-data scientist, I was curious to see how DSS could help me with the data preparation (cleaning and combining data), feature engineering, and predictive modelling phases of a data analysis project.
My goal was to make two submissions to Kaggle challenges (the Titanic and Otto Product Classification datasets) in under one hour, without writing a single line of code, using Data Science Studio.
First, I was really impressed with the overall ease of use and ergonomics of the studio. Building "recipes" for data preparation relies mostly on visual processors, and each operation's effect is shown directly on a sample of the data, which makes it easy to validate the preparation steps.
In a train/test scenario, I especially enjoyed being able to replicate my recipes on both datasets very easily.
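DSS does this visually, but a code analogue shows why replaying one recipe on both datasets matters: any value a step learns (here, the fill value for missing ages) should come from the training data and then be reused unchanged on the test data. The sketch below is a minimal plain-Python illustration under that assumption, not how DSS implements recipes. The column names follow the Kaggle Titanic files, while the rows, the derived features, and the function names `fit_recipe` and `apply_recipe` are hypothetical.

```python
# Hypothetical sketch of a preparation "recipe" (not DSS internals):
# parameters are learned from the training rows only, then the same
# steps are replayed on both train and test.
from statistics import median


def fit_recipe(train_rows):
    """Learn the parameters the preparation steps need, from train only."""
    ages = [r["Age"] for r in train_rows if r["Age"] is not None]
    return {"age_fill": median(ages)}


def apply_recipe(rows, params):
    """Apply identical cleaning and feature steps to any dataset."""
    prepared = []
    for r in rows:
        # Cleaning step: fill missing ages with the value learned on train.
        age = r["Age"] if r["Age"] is not None else params["age_fill"]
        prepared.append({
            "PassengerId": r["PassengerId"],
            "Age": age,
            # Feature steps: encode sex and derive a family-size column.
            "IsFemale": 1 if r["Sex"] == "female" else 0,
            "FamilySize": r["SibSp"] + r["Parch"] + 1,
        })
    return prepared


# Made-up rows using Titanic-style column names.
train = [
    {"PassengerId": 1, "Age": 22.0, "Sex": "male", "SibSp": 1, "Parch": 0},
    {"PassengerId": 2, "Age": 38.0, "Sex": "female", "SibSp": 1, "Parch": 0},
    {"PassengerId": 3, "Age": None, "Sex": "female", "SibSp": 0, "Parch": 0},
]
test = [
    {"PassengerId": 892, "Age": None, "Sex": "male", "SibSp": 0, "Parch": 0},
]

params = fit_recipe(train)                    # learned on train only
train_prepared = apply_recipe(train, params)  # same steps...
test_prepared = apply_recipe(test, params)    # ...replayed on test
print(test_prepared)
```

Keeping "fit" and "apply" separate is what allows a single recipe to be replayed on new data without leaking information from the test set.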
I used the Data Visualization tool to build a few exploratory charts. This is quite easy, although the tool is not as powerful as specialized software such as Tableau or Qlik.
For the machine learning part, I restricted myself to visual machine learning in the studio, which already includes the most common algorithms (random forest, logistic regression, SVM, gradient boosting, etc.). Being able to benchmark and compare the algorithms' performance quickly was a great time saver, and it let me reach a first score in under half an hour on each dataset.
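For readers curious about what "benchmarking" several algorithms involves, here is a rough code equivalent of that visual step. It is a sketch assuming scikit-learn is installed, not a description of DSS internals. It scores each candidate with 5-fold cross-validated accuracy on a synthetic dataset that stands in for a prepared training set, then ranks the candidates. The model settings and the helper name `benchmark` are illustrative choices.

```python
# Hypothetical code analogue of comparing algorithms side by side;
# requires scikit-learn. The data are synthetic, not the Kaggle datasets.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for a prepared training set.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Scale-sensitive models get a StandardScaler in front of them.
candidates = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}


def benchmark(models, X, y, folds=5):
    """Return (name, mean cross-validated accuracy) pairs, best first."""
    scores = {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


results = benchmark(candidates, X, y)
for name, score in results:
    print(f"{name:20s} {score:.3f}")
```

Cross-validation scores every model on the same folds, so the comparison is like-for-like. The best-ranked model would then be refit on all the training data before scoring the test set.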
Once I had chosen the best model, it took only a few clicks to prepare and score the test dataset with it and make my submissions. Both times I ranked in the lower half of the leaderboard, but above Kaggle's algorithmic benchmarks.
For "real" data scientists and engineers, the Studio makes it possible to go much further by writing recipes and models in R, Python, SQL, Hive, Pig, and more. But even as a business analyst, I felt empowered by software that let me prepare and analyse my data and build simple predictive models with it.