
Is Data Science a Buzzword? aka: My first Coursera Course

Data science and data scientists are all the rage right now in the information technology space. Every company wants one; every job candidate claims to be one. But what does that actually mean to companies and potential employees? I decided to take a course on data science to see if I could find out!

My co-worker, Gabriella Melki, recommended the Coursera Data Science specialization from the Johns Hopkins Bloomberg School of Public Health. The specialization is a set of 9 courses, but you can take each one individually. I started with the first course, called "The Data Scientist's Toolbox". Over the four-week timeframe, I was able to view the lectures and complete the assignments at my own pace. Below are my thoughts on the course and what I learned about data science.

Week 1: Introduction to Data Science
Data science is about data, specifically using it to answer questions, and about science, following a method to discover an answer. A data scientist is the person who uses data to answer questions. Data scientists are in such demand because people either can't find the data they need or have so much data that it is difficult to wade through it all. There are a variety of tools available to help with this task. A common one is the R language, which the rest of the course used. The course also teaches students how to solve problems on their own, suggesting good places to research R functions (help.search, anyone?) and recommending online forums like StackOverflow for help with confusing errors. The first week wrapped up with an overview of all the courses in the specialization and a few questions to make sure you were paying attention.

Week 2: Learning the Tools
I used to program about 10 years ago, but have focused more on SQL in recent years, so I had a bit of a learning curve when it came to the tools needed to work with R programs. Week 2 of the course walked the students through installing each of these tools and the basics of using them. Specifically, I learned about Git Bash, Git, GitHub, R packages, and RStudio. Although it seemed a little overwhelming at first, the course went step-by-step through each installation and explained the commands needed to use each tool. At the end of Week 2, we ran some of those commands to show that we had mastered them.

Week 3: Understanding Questions, Data, and Approach
As we learned in Week 1, data science is all about trying to answer questions with data. Based on the data you have and the answer you need, you may ask different questions and use different approaches. For example, you may just want to describe the data, make a prediction from it, or more. Data comes in all shapes and sizes: qualitative versus quantitative, large versus small, confounding versus predicting. The most important thing is to ask the right question first, design a logical experiment, and then investigate the data to find the answer. Be careful not to force your results through spurious correlations or data dredging! You don't want just any answer - you want the most accurate one. At the end of the third week, we completed a project to show our full understanding of the process and tools that we had learned over the past three weeks.
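To see why data dredging is risky, here is a small sketch in Python (not something from the course, which used R). It compares a purely random "target" against many purely random "candidate" columns and reports the strongest correlation found by chance alone. The function names, row count, and candidate count are all made up for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def strongest_chance_correlation(n_rows, n_candidates, seed=0):
    """Largest |correlation| between a random target and random candidates.

    Every column is independent noise, so any correlation found here is
    spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(target, candidate)))
    return best


# Hypothetical sizes: 20 observations, testing 1 versus 200 unrelated columns.
one_test = strongest_chance_correlation(n_rows=20, n_candidates=1)
many_tests = strongest_chance_correlation(n_rows=20, n_candidates=200)
print(f"best correlation after 1 test:    {one_test:.2f}")
print(f"best correlation after 200 tests: {many_tests:.2f}")
```

The more unrelated columns you test, the stronger the best "relationship" tends to look, which is why the question and the experiment should come before the search.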

Week 4: Completing the Project
In Week 4, the last week of the course, we checked everyone else's work. This grading exercise was also interesting to me. Every student was asked to check the work of at least three other students, and the course then assigned a grade based on those results. I'm guessing they have some way to verify the information - maybe kick out the highest/lowest values and average the rest? Or kick out the results from students whose scores varied the most? Sounds like a good data science problem. I'm not sure of the answer, but I completed the course with a 100%!
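Just for fun, here is what my first guess might look like as a small Python sketch. This is only my speculation, not how Coursera actually computes peer grades, and the function name and sample scores are made up.

```python
from statistics import mean


def trimmed_peer_grade(scores):
    """Drop one highest and one lowest score, then average the rest."""
    if len(scores) < 3:
        # Too few reviews to trim anything, so fall back to a plain average.
        return mean(scores)
    ordered = sorted(scores)
    return mean(ordered[1:-1])


# Hypothetical scores from four peer reviewers, one of them a harsh grader.
peer_scores = [70, 95, 100, 100]
print(trimmed_peer_grade(peer_scores))  # 97.5
```

Dropping the extremes keeps a single unusually harsh or generous reviewer from swinging the final grade, at the cost of ignoring two reviews.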

If you're interested in taking the data science course described above, visit here: https://www.coursera.org/course/datascitoolbox

