Analog
what my knowledge management job at turner taught me about why ai is failing in construction
my experience working at turner at the turn of the decade in knowledge management.
turner at the time had some of the best and brightest, but also some of the most wide-ranging commercial work in the industry, and the sheer breadth of knowledge was overwhelming, so much so that they created the knowledge management program in the mid-2010s. i was hired into that group at the end of 2018 and served in that department until march 2020, before leaving to join upstart and create a knowledge management function there. during my time within the knowledge and learning group, we had a wide mandate of extracting and sharing the company’s knowledge throughout the enterprise.

one of the ways we did that was through internal resumes and profiles. every employee had a semi-private resume that listed all of the projects they’d worked on during their time there, along with their skills/expertise for those projects and their overall career. we then paired that with an ask-an-expert service through a portal that i was responsible for managing. specifically, my job was to field questions as they came in and ideally answer them on my own, through my turner network across the enterprise (this is how i met so many leaders and people across the organization in such a short amount of time). for those instances where i couldn’t easily provide a response, i would cross-reference our rolodex of internal relevant knowledge, reach out to those listed individuals, and then either a.) retrieve the information myself and relay it back to the requestor, or b.) connect the two parties and follow up with them separately to capture context for future work. a classic example of this is when someone doing work at the cincinnati zoo wanted to learn about a super niche animal exhibit requirement from another pm/superintendent who had just completed some work at the san diego zoo. our team was responsible for this and many other knowledge and context sharing initiatives, and it was but one way that turner was able to rely on the expanse of the organization to better position itself against others.
however, given the work that llms have done over the past few years, it’s no surprise that that function no longer exists within turner. still, those experiences provide a surprisingly good analog framework for how llms in construction work. i recently wrote about how/why chatgpt doesn’t work that well with construction drawings. it’s almost a trope at this point in the industry to keep arguing this point, but it’s worth explaining a bit more. generally, when people upload a bunch of drawings/documents to claude/gpt/gemini/etc. they expect to get back an answer in one shot. this almost never works, and so they declare that claude in construction doesn’t work. but you have to go a couple layers deeper and explore why this doesn’t work. there are obviously many potential failure modes: pre- or post-training knowledge gaps, retrieval and augmentation, context windows, tabular and spatial reasoning, etc. some of these can and will be addressed by the foundational model providers; however, there still exists a gap in construction context to fully solve this issue. i suspect the most construction-contextually important challenges to overcome are retrieval/ingestion and pre- & post-training gaps.

going back to my experience at turner: given the scale of the organization, it was a fair assumption that there was someone within the company who had existing knowledge and experience for any given submitted query, and that it was our job to go retrieve that knowledge by searching for relevant context, knowing who held that knowledge in their head, extracting it, and then presenting it back to the requestor. over time, i had to capture those lessons learned and use them to seed an early chatbot we created for the staff in the company. this process took anywhere from hours to weeks, depending on the nature of the request. super high-level, but that is functionally how an llm-based workflow operates: multiple steps, each of which has to succeed, in order to get a positive, useful result back from gpt.

the reason i say that pre- & post-training gaps paired with retrieval/ingestion are the two biggest challenges is because i’ve lived the analog version and it’s hard to unsee it. it’s true that these llms are trained on the internet’s data, which is vast, but the construction industry was barely online until the last 10-15 years, and even assuming things were online, how much of that is good data (comprehensive, accurate, available, not behind a paywall, etc.)? for the sake of this conversation, let’s assume the model providers were able to scrape the necessary context to include construction knowledge in the training run (pre-training). there still exists a gap in post-training, whereby construction-specific experts have by and large been ignored in grading the trained model to provide accurate responses on construction-specific semantics and context for our industry, so it defaults to generalized, plausible outputs, i.e., failure. which is highly ironic given they’re all super constrained by construction as the bottleneck to accessing additional compute for training and inference.
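to put rough numbers on the “multiple steps, each of which has to succeed” point above: the per-step success rates in the sketch below are made up purely for illustration, not measured from any real system, but the compounding is the real takeaway. the end-to-end odds are the product of every stage working, which is why expecting a correct one-shot answer over a pile of drawings is usually a losing bet.

```python
# illustrative only: these per-step success rates are made up, not measured.
# the point is the compounding: the answer is only right if every stage of the
# pipeline works, so one-shot prompting over a pile of documents rarely lands.
steps = {
    "ingestion (parsing drawings/pdfs into text)": 0.90,
    "chunking (splitting without losing key details)": 0.90,
    "retrieval (surfacing the right chunks)": 0.85,
    "reasoning (tabular/spatial interpretation)": 0.80,
    "generation (grounded, non-generic answer)": 0.90,
}

end_to_end = 1.0
for step, p in steps.items():
    end_to_end *= p
    print(f"{step:<50} {p:.0%}")

print(f"\nchance the one-shot answer is actually right: {end_to_end:.0%}")
```

swap in whatever rates you think are realistic for each stage; the shape of the result is the same.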
all of this is also why leaders in the space rely on experience/years in industry as a proxy for knowledge: it takes years to start pattern matching.
nevertheless, let’s assume anthropic/openai now have access to construction data and construction experts to train and grade these models before release. there’s still an incredibly hard underlying retrieval/ingestion problem to overcome. the reason this is so hard to solve is because you need to be able to strip away all of the unnecessary information about construction/the prompt in order to return the correct answer, a challenge which actually gets worse, not better, as context windows increase and people stuff more documents into a single prompt. when this happens, the model loses information in the middle of the context, which is a well-documented failure mode. and even when you sidestep that by using vector retrieval instead of stuffing context, the retrieval problem itself gets harder as the document corpus grows.

an example of this dynamic playing out is a construction project. before the project begins, you have to learn about the job. you have to learn about the client, what they care about, contract requirements, site conditions, the schedule, the drawings, the budget, the cost ledger, the critical path, o&m requirements, inspections, submittals, rfis, change orders, etc. it’s a lot of context for a team to know, but it’s the expectation. now, when the job starts and changes start happening, someone has to hold in their head not only the original versions of all of these things but also their respective changes, how those changes are interconnected, and their downstream effects. because it’s too much to keep in your head, what people start doing is stripping away the non-relevant context of the project based on their role, function, experience, stage of the project, etc. then they figure out what the top items are to keep track of that could make the project successful or derail it.

that same process the construction team goes through is functionally what llms do through chunking and embedding/vectorization. chunking is when information is broken down into smaller pieces; through embedding, each piece is assigned a numerical representation, and some systems also cluster those numbers around related concepts. retrieval, then, is the process of finding previously stored related information, ranking its relevance, and presenting it back to the user.
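to make that concrete, here’s a minimal, hypothetical sketch of that chunk/embed/retrieve loop. the embedding is a toy bag-of-words vector rather than a learned model, and the documents, names, and numbers are made up, but the mechanics are the ones real systems use: break the corpus into pieces, assign each piece a numerical representation, then rank the pieces against a question.

```python
# a minimal, hypothetical sketch of chunk -> embed -> retrieve. the embedding
# here is a toy bag-of-words vector; production systems use a learned embedding
# model, but the mechanics are the same.
import re
from collections import Counter

import numpy as np

def tokenize(text: str) -> list[str]:
    """lowercase and strip punctuation so 'glazing?' matches 'glazing'."""
    return re.findall(r"[a-z0-9]+", text.lower())

def chunk(text: str, size: int = 12) -> list[str]:
    """chunking: break a document into smaller fixed-size word pieces."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str, vocab: list[str]) -> np.ndarray:
    """embedding: assign a piece of text a numerical representation."""
    counts = Counter(tokenize(text))
    vec = np.array([counts[w] for w in vocab], dtype=float)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query: str, chunks: list[str], vocab: list[str], k: int = 2):
    """retrieval: find stored pieces related to the query, ranked by relevance."""
    q = embed(query, vocab)
    scored = [(float(np.dot(q, embed(c, vocab))), c) for c in chunks]
    return sorted(scored, reverse=True)[:k]

# made-up stand-ins for project documents (rfis, submittals, change orders, etc.)
docs = [
    "rfi 042: clarification on waterproofing detail at the penguin exhibit pool wall",
    "submittal 117: acrylic glazing system for the primate exhibit viewing panels",
    "change order 08: revised critical path after owner-directed scope change to o&m turnover",
]
chunks = [piece for doc in docs for piece in chunk(doc)]
vocab = sorted({w for piece in chunks for w in tokenize(piece)})

for score, text in retrieve("what was decided about the exhibit glazing panels?", chunks, vocab):
    print(f"{score:.2f}  {text}")
```

a real pipeline swaps the toy embedding for a learned model and a vector database, but the behavior is the same one the project team does manually: strip away the irrelevant context and surface only the pieces that matter for the question at hand.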
through these processes, you can start to map the analog knowledge management experience i had at turner onto where, why, and how i believe llms are failing in the construction industry.
again, it’s true that some of these challenges could be solved by the foundational model companies including more construction data in their model development pipelines, but hope is not a strategy. it’s also clear that the foundational model providers won’t fix these issues because the construction industry isn’t their market. which is why you have companies in this space creating their “own models” to solve this problem. they’re not creating new foundational models; they’re creating frameworks and ways of working to solve our industry-specific challenges. given that, it also becomes clear that chatbots were never going to be the ai application that makes construction data more useful. they’re a marginal improvement on the existing paradigm. in a future piece, i’ll share more of my thinking around what separates real scaffolding and useful frameworks that unlock ai in our industry from the noise in the space, and how i evaluate those companies/efforts. that piece will require a paid subscription.

