


Becoming a better data scientist: Lessons from academia and industry


The start, scope and design of your project

Any good data science project starts with the end goal in mind. The number of possible analyses for a given data set is potentially limitless, and without a clear goal, you’re at risk of getting lost in the woods. Academia and industry have different goals and different takes on what the purpose of your research should be. Both takes can help you clarify the scope and direction of your research.

The goal of academic science is to increase knowledge by establishing statements as fact. To achieve this, every research project is guided by a carefully formulated hypothesis. Put very briefly, you start with a theory (how do you expect the world to work?) and operationalize it (what would this look like in observable data?), with a check for falsifiability (could the observable data also discredit this idea?). Hypotheses are great in part because they set a very clear goal for your work. A well-crafted hypothesis can go a long way towards informing your experimental design.
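To make the theory, operationalization, and falsifiability steps a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the theory ("weekly reminders increase app usage"), the made-up session counts, and the helper name `permutation_test` are hypothetical, and a permutation test is just one of many ways to check such a prediction.

```python
import random
from statistics import mean

# Theory (hypothetical): weekly reminders increase app usage.
# Operationalization: users who got reminders log more sessions per week
# than users who did not.
# Falsifiability: if the observed difference is no larger than what random
# relabeling of users produces, the data does not support the theory.

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test for mean(group_b) > mean(group_a).

    Returns the observed difference in means and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffle the group labels and recompute the difference in means.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up illustrative data: sessions per week for eight users per group.
control = [3, 4, 2, 5, 3, 4, 3, 2]
treated = [5, 6, 4, 7, 5, 6, 5, 4]

observed, p_value = permutation_test(control, treated)
print(f"difference in means: {observed:.2f}, p-value: {p_value:.4f}")
```

Note how the hypothesis already fixes the design: which metric to collect (sessions per week), which groups to compare, and which direction of effect would count as support.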

Not all industry data science projects lend themselves to formulating a hypothesis. For those that do, a hypothesis can be a great tool for determining scope: a good one will naturally inform (and constrain!) the type of data to collect and the types of analysis to perform.

Read the full blog post by our colleague Marrit Zuure.

Marrit Zuure, data scientist at Orikami