
Is Data Science a Buzzword? AKA: My First Coursera Course

Data science and data scientists are all the rage right now in the information technology space. Every company wants one; every job candidate claims to be one. But what does that actually mean to companies and potential employees? I decided to take a course on data science to find out!

My co-worker, Gabriella Melki, recommended the Coursera Data Science specialization from the Johns Hopkins Bloomberg School of Public Health. The specialization consists of 9 courses, but you can take each one individually. I started with the first course, "The Data Scientist's Toolbox". Over the four weeks, I was able to view lectures and complete the assignments at my own pace. Below are my thoughts on the course and what I learned about data science.

Week 1: Introduction to Data Science
Data science is about data, specifically using it to answer questions, and about science, following a method to discover an answer. A data scientist is the person who uses data to answer questions. Data scientists are in such demand because people either can't find the data they need or have so much data that it is difficult to wade through it all. A variety of tools is available to help with this task. A common one is the R language, which the rest of the course used. The course teaches students to solve problems on their own, suggesting good places to research R functions (help.search, anyone?) and recommending online forums like Stack Overflow for help with confusing errors. The first week wrapped up with an overview of all the courses in the specialization and a few questions to make sure you were paying attention.

Week 2: Learning the Tools
I used to program about 10 years ago, but have focused more on SQL in recent years, so I had a bit of a learning curve with the tools needed to work with R programs. Week 2 walked students through installing each of these tools and the basics of using them. Specifically, I learned about Git Bash, Git, GitHub, R packages, and RStudio. Although it seemed a little overwhelming at first, the course went step by step through each installation and explained the commands needed to use each tool. At the end of Week 2, we ran some of those commands to show that we had learned them.

Week 3: Understanding Questions, Data, and Approach
As we learned in Week 1, data science is all about trying to answer questions with data. Depending on the data you have and the answer you need, you may ask different questions and use different approaches. For example, you may just want to describe the data, or you may want to make a prediction from it, or more. Data comes in all shapes and sizes: qualitative versus quantitative, large versus small, confounding versus predicting. The most important thing is to ask the right question first, design a logical experiment, and then investigate the data to find the answer. Be careful not to force your results through spurious correlations or data dredging! You don't want just any answer - you want the most accurate one. At the end of the third week, we completed a project to show our understanding of the process and tools we had learned over the past three weeks.
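The warning about data dredging can be made concrete with a small experiment. The sketch below is my own illustration, not material from the course: it builds a target series of pure random noise, then tests many equally random candidate series against it and reports the strongest correlation found. The function names, the sample sizes, and the seed are all assumptions chosen for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_candidates=200, n_points=20, seed=0):
    """Search many unrelated random series for the best match to a random target.

    Every series is independent noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


# Strongest absolute correlation found among 200 unrelated noise series.
print(round(strongest_chance_correlation(), 2))
```

With only 20 data points and 200 candidates to choose from, the best match typically looks like a fairly strong relationship even though none exists, which is why the question should come before the search.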

Week 4: Completing the Project
In Week 4, the last week of the course, we checked everyone else's work. This grading exercise was also interesting to me. Every student was asked to check the work of at least three other students, and grades were then assigned based on those results. I'm guessing they have some way to verify the information - maybe throw out the highest and lowest values and average the rest? Or throw out the results of the students with the most variation in their scores? Sounds like a good data science problem. I'm not sure of the answer, but I completed the course with a 100%!
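The first guess above, dropping the highest and lowest scores and averaging the rest, is a simple trimmed mean. Here is a minimal sketch of that idea; it is only an illustration of my guess, not Coursera's actual grading method, and the function name and sample scores are made up.

```python
def trimmed_mean(scores):
    """Drop one highest and one lowest score, then average what is left."""
    if len(scores) < 3:
        raise ValueError("need at least three scores to trim both ends")
    ordered = sorted(scores)
    kept = ordered[1:-1]  # discard the single lowest and single highest
    return sum(kept) / len(kept)


# Hypothetical peer grades out of 10; the outlying 4 is discarded.
peer_scores = [10, 9, 10, 4]
print(trimmed_mean(peer_scores))  # 9.5
```

Note that with exactly three graders this reduces to the median score, since only the middle value survives the trimming.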

If you're interested in taking the data science course described above, visit here: https://www.coursera.org/course/datascitoolbox

