Dead Reckoning: A Framework for Data Science
Before satellites could accurately estimate the global position of an object within a few meters, sailors needed a way to determine where they were in a vast, remote, and otherwise unremarkable seascape. Their answer to the problem was data science, an early version of the practice to be sure, but data science nonetheless. They did not have R or Python scripts running code in the background, but they did develop specialized computers. They did not have pivot tables or relational databases, but they did have nautical charts. They did not have sophisticated statistical models, but they did have Euclidean geometry. With these arguably simple tools, ocean navigators were able to take patterns in observable phenomena and infer a ship’s location. They applied models that allowed them to make accurate predictions in the absence of precise measurements, and they revised their course predictions if and when new data became available.
The most famous of the data-driven navigational approaches is arguably celestial navigation. Celestial navigation required accurate and comprehensive cataloguing of the night sky at different latitudes and during different seasons. The creation of accurate celestial charts was an important scientific and commercial endeavor during the Age of Discovery. Combined with basic computers such as the sextant, these charts let mariners estimate their position at sea with an impressive degree of accuracy.
There were some notable drawbacks to celestial navigation, though. For one, sailors needed accurate historical celestial data. For another, getting the most out of celestial data required training, expertise, and basic literacy, which could be hard to come by in the typical 16th-century sailor. Celestial navigation would return more accurate geolocation than the alternatives, but it did so at a high resource cost. Smaller ships with more modest shipping routes had to rely on other, simpler navigation strategies like dead reckoning. The simplicity of dead reckoning belies its power, and it remains a useful navigation tool in information-poor settings.
To use dead reckoning, all you need are three things:
- Point of Origin/Reference
- Data Collection Procedures
- Mathematical Model
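To make these three ingredients concrete, here is a minimal sketch of a single dead reckoning step in Python. It assumes the classic "plane sailing" approximation (one nautical mile per minute of latitude, with longitude scaled by the cosine of the mean latitude), which is only reasonable for short legs away from the poles. The function name `dead_reckon` and its parameters are my own illustrative choices, not a standard API.

```python
import math


def dead_reckon(lat_deg, lon_deg, heading_deg, speed_knots, hours):
    """Estimate a new (lat, lon) in degrees from a known starting point.

    Point of origin: lat_deg, lon_deg
    Collected data:  heading_deg (clockwise from north), speed_knots, hours
    Model:           plane-sailing approximation (an assumption of this sketch)
    """
    distance_nm = speed_knots * hours
    heading = math.radians(heading_deg)

    # One nautical mile is roughly one minute (1/60 degree) of latitude.
    dlat = distance_nm * math.cos(heading) / 60.0
    new_lat = lat_deg + dlat

    # Meridians converge toward the poles, so scale the east-west
    # component by the cosine of the mean latitude of the leg.
    mean_lat = math.radians((lat_deg + new_lat) / 2.0)
    dlon = distance_nm * math.sin(heading) / (60.0 * math.cos(mean_lat))
    new_lon = lon_deg + dlon

    return new_lat, new_lon


# Hypothetical leg: 10 hours due north at 6 knots, starting from (0, 0).
lat, lon = dead_reckon(0.0, 0.0, heading_deg=0.0, speed_knots=6.0, hours=10.0)
print(round(lat, 4), round(lon, 4))
```

Each new estimate becomes the point of origin for the next leg, so any error in heading or speed accumulates until a fresh observation corrects it. That is why the quality of the starting point and of the data collection matters as much as the model itself.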
In a data science context, I conceptualize the point of origin as a company’s current state and recent history. Understanding where a company is and where it has been helps define the starting point of any data science initiative. It is not impossible to reach a destination when you do not know where you started, but it is far more difficult. Using data to add value to a company can be challenging enough; there is no need to make this work harder by skipping this important first step, however tempting it is to dig straight into the data.
Once we have a solid understanding of the current factors that influence relevant performance targets, the next step is to set a goal and define the procedures for meeting that goal. In practice, this stage requires operationalizing data collection procedures. Addressing this step in the process often requires the development and implementation of pipelines as well as mapping of any data extracted onto broader modeling and data-use aims. Ongoing communication with company stakeholders is critical here as we are building the data infrastructure they will rely on to make decisions and deliver value to their customers going forward.
If a data science solution also requires prediction, I work with clients to develop an appropriate mathematical model of their data. The “star” of data science is often the predictive model, but there is an old adage in statistics and data science: “Garbage in, garbage out.” Putting aside the challenge of selecting an appropriate modeling approach, rushing to a modeling solution before fully preparing your data is a recipe for disaster. That is why this step often comes last and, ironically, can be the least challenging aspect of implementing a data science product.
There is no guarantee that a given problem has a data science solution, but applying the principles of dead reckoning is an effective way to make seemingly intractable data problems tractable. I stand ready to help your team harness the power of your data and chart a course through this increasingly data-driven world. If there is a data science solution for your use case, I will work to help maximize your chances of finding it.