Agile Methodology, Agile Methodology in Data Science & How to split Machine Learning Stories, Data Science, machine learning

Agile Methodology in Data Science & How to split Machine Learning Stories

When it comes to software development, and to producing deliverables more broadly, the agile manifesto and its corresponding agile methodology have been powerful tools and frameworks that allow teams to navigate tasks and processes more freely and efficiently, adapting as necessary to the dynamic conditions of the workflow. However, there are certain places where applying these pillars of agility can be rather tricky, and one of these tricky spaces is Machine Learning and Data Science.

 

Rise of Machine Learning

Machine learning is a topic that quickly went from some futuristic concept to a fully functional process, revolutionizing the Data Science discipline in the process. Traditionally, more rigid and computational statistical methods build the basis of Data Science, making any sort of data-analytical project a more-or-less hands-on experience and effort. However, as the global data pool continued to grow exponentially, these rigid and traditional methods would no longer be enough to “crunch the numbers.”

Enter machine learning. With some carefully constructed “learning” algorithms, some well-tuned parameters, and quite a bit of computational power, Data Science and data analysis stepped into a brand-new world of possibilities. Machine learning is a powerful process that can shift through previously unfathomable amounts of data very quickly, and draw relevant conclusions and patterns afterwards.

This power comes with caveats, however. The process is much more fluid, much less well-defined, and that can make it difficult to manage in a developmental sense and in an organizational sense. Agile methodology is typically applied to revolutionize traditional workflow structures, but machine learning is far from traditional.

 

Introducing Agility to ML Development

When the research, development, and refinement processes are all ambiguously overlapping, it can be tough to fit a reasonable plan-of-action to the timeline—and much less an agile one. Instead, when it comes to machine learning, we need to chunk relevant and related processes together even if they would traditionally be separated into different incremental sprints. For example, and as mentioned, research and development no longer happen in a linear order.

“Incremental sprints” might not look the same way within the feedback-loop-context of ML development, and the final product of the process is impossible to predict due to the nature of the tool. Thus, it can make more sense to split the process up into much broader chunks:

          Research

          Adjustment

          Refinement

Agility is crucial within these three chunks, since the structure is rather vague by necessity. For the research chunk, it is often blocked off into two processes. First, the initial data analysis. While the hard data analysis will be run further down the line, it’s important to know what you’re dealing with from the start. Finding good and reliable data sources and researching what your data is and more or less how it looks is a crucial first step. Then, the team has to find an appropriate algorithm. This second process involves a lot of testing, trial and error, fitting, and adjusting from start to finish. Here, a strong sense of teamwork, communication, and certainly agility are all essential to making sure the process runs quickly and smoothly, since this stage will almost certainly be unique to each project.

Once the algorithm(s) has been selected, the strong adjustment process begins. If the selection was sharp enough, this process of training and fitting should be relatively straightforward, but complications can still arise. New information or updated client specifications might mean adjustments to the selection, and so being able to react on your feet is still important.

Finally, when the dust has settled, studying, interpreting, and finding your findings is what makes data science a science in the first place.

The machine learning agile development process can be a bit tricky to describe since both components are highly contextual and fluid, but it’s precisely those properties that make an agile approach so important in the first place. The most important takeaway with this ML thesis is that, true to the name, when it comes to applying agility in unfamiliar contexts, it’s important to adjust. 

Living Pono is dedicated to communicating business management concepts with Hawaiian values. Founded by Kevin May,  an established and successful leader and mentor, Living Pono is your destination to learn about how to live your life righteously and how that can have positive effects in your career. If you have any questions, please leave a comment below or contact us here. Also, join our mailing list below, so you can be alerted when a new article is released.

Leave a Reply

Your email address will not be published.