A definition of insanity often attributed to Albert Einstein describes a person who keeps making the same mistake over and over again – yet each time expects a different outcome. Implicit in this is the anti-strategy of doubling down on bad behavior in spite of clear evidence of its diminishing efficacy. The behavior is still useful, to be sure, but – if you were to graph its efficacy over time – you’d see a line clearly trending downward.

Unfortunately, some organizations are making the same mistake with their IT strategies. Their existing technologies and practices just aren’t working as well as they used to. They’re ill-suited for new business requirements and use cases. They limit the business’s ability to pursue new opportunities: to launch new products and services, to push into new markets, to expand into new territories. They constrain the scope, and the success, of its digital transformation efforts.

At the same time, the sheer profusion of new technologies and practices can seem confusing, if not daunting – especially to small- and medium-sized enterprises. It’s no wonder some are vacillating.

They don’t know how or where to begin. 

Analyst Eric Kavanagh captured their confusion in a recent episode of The Briefing Room, a regular series of sponsored webinars produced by IT research firm the Bloor Group. “How do you build out a stack to leverage machine learning and artificial intelligence?” Kavanagh asked. He compared the challenge of standing up a software stack to support machine learning (ML) and artificial intelligence (AI) engineering to that of bootstrapping a stack for streaming analytics.

“There were several big vendors that came out in the streaming analytics space probably seven years ago, and what they all found … [was that] a lot of companies were not ready to take advantage of this technology because their stack just wasn’t prepared,” he told viewers. “So, they kind of pivoted: these companies tried to articulate and evangelize how [companies could] get ready to leverage these technologies.”

Another thing several vendors did was to put together prebuilt, preconfigured software stacks designed for streaming analytics. These full-stack “solutions” incorporated everything – e.g., operating system software, middleware, apps and services, supporting frameworks, etc. – into a single, managed offering. “They gave you everything you would need to do streaming analytics. Everything was preconfigured. What was great about it was that it was fully managed, so you didn’t have to worry about patching software or managing dependencies,” Kavanagh said, in an offline exchange. “It’s as close to an out-of-the-box solution as you can get for something that’s inherently complex.” 

Kavanagh alluded to a similar push among vendors that market full-stack stream-processing or ML/AI engineering platforms and services. These vendors claim that their prebuilt software stacks help customers cut through the confusion, and the sheer profusion, of the New. They say their prebuilt stacks logically arrange the building blocks of an application, use case, or compute paradigm into a functional, ready-to-use configuration. As with the platform-as-a-service (PaaS) cloud, the vendor itself manages and maintains the prebuilt stack “solution.” The stack lives in public cloud infrastructure – usually in an infrastructure-as-a-service (IaaS) context – or in the business’s own data center.

So, for example, a stack designed for ML/AI engineering would comprise an operating system substrate, a middleware layer, the software platforms, frameworks, runtimes, etc. that support critical apps and services (along with their associated workflows), and, of course, the apps and services themselves.

It’s a compelling vision, even if, as with the PaaS cloud, it doesn’t absolve businesses of the responsibility to build and maintain the core IP – i.e., the data engineering logic, ML models, rule-based logic, and so on – that supports their ML/AI apps and services in production.

The sheer, staggering profusion of the New

What is most compelling about the prebuilt software stack is that it eliminates much of the uncertainty – the strangeness – that is always a factor when businesses decide to adopt and implement new technologies. 

So, for example, a prebuilt stream-processing stack obviates questions that might stymie or confuse IT. These include: Which core enabling technologies are required to “do” stream processing? How do these enabling technologies relate to one another? Depend on one another? Most important, how do they relate to – i.e., integrate with – the business’s existing tools, processes, and practices? How does all of this stuff work together? How, if at all, is it in conflict?

There’s something else, too. In the present context, IT’s natural confusion about any new technology is compounded by the Cambrian-like explosion of novel compute paradigms, tools, techniques, and, not least, practices. Even if IT is clear about how to stand up and support a data science, ML, or AI practice, it is less clear about how to integrate these practices with other disciplines – such as DevOps or site reliability engineering (SRE). Nor does IT know what to do with its legacy assets and practices. Which should it keep? Which can it safely get rid of? Again, which bits of the New conflict with which bits of the Old – and why?

The sheer profusion of the New has inculcated a kind of hesitancy among both information technologists and business decision makers, argued guest expert Harish Doddi, founder and CEO of Datatron, a company that develops a software platform used to deploy, monitor, and maintain ML models in production. As Doddi noted, almost all large organizations are standing up DataOps or MLOps practices. However, he told Kavanagh, organizations sometimes find it challenging to assimilate these cutting-edge practices into their existing IT practices.

Okay, you built it – now what do you do with it?

Doddi cited container virtualization – and container orchestration, in particular – as a case in point.

“There’s quite some hesitancy of adopting container frameworks because a lot of these frameworks have [yet] to be accepted by the internal enterprise architecture teams,” he told Kavanagh.

The reason this matters is that you can encapsulate part of the lifecycle of an ML app or service in a container.

So, for example, ML models, along with their data engineering and application logic, can be instantiated in virtual containers. These containers can be deployed as production-ready apps or services. An orchestration platform – usually Kubernetes (K8s) – can be used to spawn, terminate, or restart containerized apps or services. It can also spawn new, separate instances of apps or services in response to event-driven triggers, such as API calls or alerts. Container-level abstraction and encapsulation not only simplifies the design and versioning of ML models, but also consolidates other essential lifecycle activities – e.g., deployment, provisioning, and maintenance – into a single context: everything is encapsulated in the virtualized container, which is controlled by the orchestration platform. Best of all, the container itself is a blank slate: it is at once reusable and perfectly replicable; it behaves exactly the same way each and every time it spawns.
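To make the encapsulation idea concrete, here is a minimal sketch of the kind of self-contained unit – model parameters, data engineering logic, and a single scoring entrypoint – that a container image would package and an orchestrator would spawn and replicate. The toy linear model and the names (`featurize`, `predict`, `handle_request`) are illustrative assumptions, not Datatron’s or any other vendor’s actual API.

```python
# Sketch of a containerizable ML scoring unit (hypothetical example).
# In practice, the weights would be loaded from a serialized model
# artifact baked into, or mounted into, the container image.

import json

# Stand-in for a trained model's parameters.
WEIGHTS = {"tenure_months": 0.04, "support_tickets": -0.12}
BIAS = 0.5

def featurize(raw: dict) -> dict:
    """Data engineering logic: normalize raw input into model features."""
    return {name: float(raw.get(name, 0.0)) for name in WEIGHTS}

def predict(raw: dict) -> float:
    """Scoring logic: apply the model to the featurized input."""
    feats = featurize(raw)
    score = BIAS + sum(WEIGHTS[k] * v for k, v in feats.items())
    return max(0.0, min(1.0, score))  # clamp to a probability-like range

def handle_request(body: str) -> str:
    """JSON-in/JSON-out entrypoint an HTTP shim in the container would call."""
    return json.dumps({"score": round(predict(json.loads(body)), 4)})
```

Because everything the entrypoint needs ships inside the image, every container spawned from it behaves identically – which is precisely what lets an orchestrator such as K8s restart or replicate instances in response to triggers without special-casing any one of them.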

But what if an organization is not already using containers and, especially, container orchestration software? What if its existing IT policies, processes, and controls were not designed with containers in mind? Or, more likely, what if its siloed data science and MLOps teams prefer to use their own policies, processes, and controls? What if these are not exactly congruent with those of IT operations and (e.g.) its supervisory change control board? What if IT operations distrusts the technologies – i.e., the platforms, frameworks, runtimes, libraries, etc. – which data scientists and ML engineers use to design apps and services for production? At a minimum, this complicates the deployment of ML-infused software.

“Generally, data science teams operate as a siloed organization, and they leverage all sorts of tools, frameworks, and languages out there to build their models,” Doddi pointed out. “But when it comes to taking these models into a production environment … if they can adopt these container[-specific] practices, ahead of time, that’s going to remove quite a lot of headache … both from an operational standpoint … as well as keeping the data science teams happy, in terms of how fast they can go.”

Prefabricated peace of mind?

Doddi has a horse in this race, of course: Datatron’s platform is basically a prefab stack that experts can use to design and manage the ML models that power different kinds of “lite” AI apps and services. It provides built-in tools for deploying, monitoring, maintaining, and replacing these models. It provides a way for organizations to define and enforce policies with respect not only to model development, but to the performance – i.e., the consistent, replicable, and accurate results – of models in production. In this respect, Datatron provides the kinds of formal, standard controls that both internal auditors and external regulators tend to look for. This helps grease the skids for deploying ML apps and services in production.

With this caveat in mind, however, the problem Doddi describes is real. It gets at a disconnect between an organization’s ambitions – e.g., what it would like to be, what it would like to do – and its necessary commitments, not only with respect to its existing policies and processes, but to its technical debt, too. 

Another Briefing Room participant, Javier Perez, zeroed in on precisely this problem. Perez is chief evangelist for open source software (OSS) with Perforce Software, a provider of managed DevOps software and services. Perforce, too, has an interest in telling a story of this kind: under the auspices of its OpenLogic brand, it markets a series of managed stack offerings for DevOps, stream processing, K8s, AI/ML, and other use cases. Perforce says its OpenLogic offerings help to accelerate the deployment of these technologies, as well as to simplify their maintenance.

A related problem is that businesses have their hands full with the New as it is. Most are already struggling to make sense of seemingly disjunctive changes in software architecture and software engineering. The second decade of the new millennium brought with it the shift to DevOps and its emphasis on the continuous development, delivery, and integration of software. A more recent disjunction is that of cloud-native software design. Compared with monolithic software, cloud-native design seems like the stuff of alterity: a completely different way of thinking about, building, maintaining, scaling, and using software. It makes use of new concepts and technologies, such as container virtualization and automated container orchestration; new methods of designing and instantiating software (microservices); and wholly new compute paradigms – function-as-a-service (FaaS) and infrastructure-as-code – that eschew conventionally instantiated infrastructure.

With this as prelude, Perez and Perforce say that a prebuilt stack reduces the number of variables the business has to solve for, so to speak: IT experts can focus on improving, scaling, and supporting business-oriented practices, applications, and use cases – rather than on acquiring, building, and maintaining the technical bits that enable them.

If this sounds familiar, it should. In its essentials, it is consistent with the value proposition of the PaaS cloud.

Won’t (ever) get fooled again

The thing is, Perez told Kavanagh, uncertainty about the New isn’t the only source of hesitancy among businesses. Some companies feel constrained by the unpaid balance of their technical debt, and, relatedly, some are still traumatized by experiences with prior technology commitments gone bad – very bad indeed. They are determined not to make the same mistakes again; they’re loath to commit to perceived technology risks of any kind. At most, they are comfortable experimenting with novel technologies in skunkworks- or test-dev-like deployments. Even if a skunkworks experiment shows progress, these organizations are chary about promoting it to prod – especially if doing so entails making changes to the prod environment.

This anti-strategy is an Rx for inevitable failure, Perez argued. Not only does it restrict the business’s ability to change or to grow organically, but it constrains its ability to compete. What is just as important, it makes it difficult if not impossible for businesses to attract top-level coders, data scientists, ML engineers, and other talent. The kids are alright with the New; in most cases, in fact, they want little to do with the old.

“You need to plan your organization for the future: what is going to be the widely adopted technology, the widely adopted frameworks, and you will be getting a lack of talent if you don’t adapt to those things, because in the new generation, they are more comfortable with a certain stack,” he said, referring to Python and its preeminence in ML/AI engineering as one example. “In legacy AI and machine learning, there are some old frameworks people use, but the new generation of data scientists, they’re very much [accustomed to] Python. And if your organization does not have the ability to support production-level practices for Python-related models, then you’re obviously missing a lot.”

Again, Perez isn’t an entirely disinterested observer: Perforce claims that its managed DevOps, stream-processing, K8s, and ML/AI stack offerings help ameliorate some of the risk entailed in using these technologies. So, for example, businesses are not responsible for the configuration, deployment, and, above all, ongoing maintenance of these stacks: Perforce itself assumes these responsibilities.

Moreover, Perez claimed, Perforce’s managed stream-processing and ML/AI stacks can be deployed as complements to, not as replacements for, an organization’s existing data management infrastructure. He described Perforce’s prebuilt, managed stack “solutions” as analogous to a PaaS offering that automates the initial configuration, deployment, and delivery, as well as the periodic maintenance, of critical IT services. “The key here is you want to make sure that it all works, or works together, that you don’t lose cycles just doing support, that you can actually just go use the technologies, and … you build the expertise, the skills to work with those stacks,” he argued.

Heroes of our time

Not surprisingly, the young are quicker to cotton to new technologies and practices than are 40-plus-year-old IT dinosaurs. Analyst Kavanagh, host of The Briefing Room, likened the architects, software engineers, data scientists, data and ML engineers, and other experts who agitate for their employers to take up and use cloud-native design methods, K8s, FaaS, and other similar technologies to the IT heroes of our time.

And Kavanagh wasn’t channeling Russian literary great Mikhail Lermontov, or Welsh-born pop-rock rhapsode Bonnie Tyler, either. Not on this occasion, at least.

Nevertheless, Kavanagh also allowed for a different, even more compelling kind of heroism: that of the IT veteran who recognizes that she, too, must change the way she does things. “There was a great quote I heard … at a conference where they were talking about … AI-infused ERP, and [a speaker] made this great quote about some of the people who have been around for a while: ‘I am the hero of yesterday, what is my role now?’” he said. “And that’s the guys and girls who are in their 40s, 50s, even 60s, who are … [asking] ‘What does this mean for me?’ It means to change, and I think that’s the key. You need to embrace the up-and-coming approaches for working in this discipline, in this field.”

Perforce’s Perez picked up on this idea of heroism, using it to sketch a subtly different take on the problem. The issue isn’t necessarily that one group of people – say, younger technologists – tends to grok the New, while a separate group – older, entrenched technologists – tends to be more resistant to these new technologies and practices. This isn’t a helpful way of looking at it at all, he implied. Instead, he argued, the issue is that some people tend to be more comfortable with, and more conversant in, the open source technologies that disproportionately comprise the New; technologies that – in the form of larval OSS projects – function as incubators for so much innovation in IT. 

If all dichotomies are false, some are useful. In this context, young-old is a false dichotomy; as Perez puts it, a more useful dichotomy might distinguish between people who are open to – and, usually, involved in some way with – OSS development, and, conversely, those who are hostile to OSS. This is more helpful, although it still doesn’t quite capture the problem. 

In this respect, Perez posited, an even more useful dichotomy might distinguish between people who recognize the necessity of ongoing learning and skills acquisition … and those who do not. As he sees it, the person who accepts the inevitability of IT change and backs this up with a commitment to acquire new kinds of socio-technical skills: this person, irrespective of their age or their strongly held beliefs about the only proper way to indent code, is a hero for our time.

“If you don’t have time to retrain and learn the new stuff, to learn about the latest innovations in open source software, you can fall behind really, really quickly,” he told Kavanagh. “I was doing a chat with some college students, and I was giving them exactly the same advice: don’t stop learning, keep learning …. That’s good advice for anyone, because there’s so much good, interesting stuff going on in open source.”

About Vitaly Chernobyl

Vitaly Chernobyl is a technologist with more than 40 years of experience. Born in Moscow in 1969 to Ukrainian academics, Chernobyl solved his first differential equation when he was 7. By age 10, he was writing code for the then-state-of-the-art ES EVM mainframe systems at the Soviet Union’s Moscow Institute of Physics and Technology (MIPT). In 1980, he was unofficially tasked with writing software for MIPT’s M-222 minicomputers; in 1981, he designed a monotonic sorting algorithm that improved the M-222’s merge performance by 8.1%. In 1982, with encouragement from physicist Nikolay Nekrasov, Chernobyl devised a microcode change that reduced the time it took the M-222 to compute a series of complex Von Zeipel transformations by more than 50%, from >507 hours down to <244. That same year, he secured a prestigious fellowship with the Soviet Air Defense Forces (SADF) and spent most of 1982 and the first half of 1983 rewriting the software used to analyze the data collected by the Soviet Union’s Oko early-warning satellite network. His work radically enhanced Oko’s ability to quickly detect and track multiple incoming ICBMs. As a capstone to his fellowship, the SADF asked Chernobyl to design new software to control for the effects of refracted sunlight on Oko’s Molniya-orbit satellites. Chernobyl’s was an erratic émigré journey: in the half-decade prior to the fall of the Soviet Union, his family emigrated from Moscow to Minsk; by the early 1990s, Chernobyl, then in his early 20s, along with his oldest brother, Semyon, had settled in New Rochelle, NY. During this period, he authored a series of now-classic Usenet threads that explored the design of Intel’s then-new i860 RISC microprocessor. In addition to dozens of technical papers, he is the co-author, with Pavel Chichikov, of Eleven Ecstatic Discourses: On Programming Intel’s Revolutionary i860.