Podcast Ep 9: Overcoming challenges in protein characterization

In this episode, Dr. Andrew Lee and Dr. Edward Kraft (Leash Bio) chat about the intricacies of high throughput protein characterization. They explore the role of automation in simplifying research processes, discuss various protein characterization methods, and consider the impact of data analytics on the field. Edward Kraft shares his career journey, insights on team management, and thoughts on biotechnology’s evolving landscape.


Andrew Lee: [00:00:00] Welcome to our podcast series, Imagine More, Create Solutions, where we talk about chemistry, biochemistry, and some engineering. For today, our discussions will be covering protein expression, purification, characterization, automating these processes, and an example of downstream application. We covered parts of the protein characterization, but protein characterization is so essential.

Andrew Lee: I mean, just purifying itself, you’ve got a nice piece of protein, but what does it do? And you have to identify the functional proteins. You express it, purify it, but if it’s non functional, then it’s useless. So what have you implemented for protein characterizations?

Edward Kraft: Yeah, and so in the context when you’re thinking high throughput, you don’t have a lot of individualized assets at your disposal.

Edward Kraft: However, things like HPLC based SEC, um, and variations have been kind of the most widely generalized technology that I’ve used for characterization. I’ve explored, like, nanoDSF [00:01:00] in specific instances as well. Um, well, not necessarily the easiest to interpret across a wide variety of proteins and protein classes.

Edward Kraft: Um, or if you’re dealing with protein complexes in high throughput setting, you know, those are examples of things that certainly have a place and they’re very valuable. And then more recently here, at least, you know, we’re looking at DEL based outputs and so for protein function characterization. By, you know, pairing individual proteins, um, or many in a particular class with small molecule binding profiles.

Edward Kraft: And so instead of being too worried about maybe if the, I have, you know, the most homogenous, um, perfect protein, we let the screen, uh, really be able to work for us with DEL outputs and computational classification to define your protein profile. Um, across many independent, um, proteins, as well as protein variants of an individual protein, um, to be able to get at the function.

Edward Kraft: And [00:02:00] so while you certainly can produce very well behaved inactive proteins, in many instances, take kinases, um, it’s a matter of phosphorylation state. Um, usually hit profiles are going to tell you the nature of that, and for many companies that’s untenable. They want to have beautiful, purified, super active protein up front.

Edward Kraft: Great. That’s a lot of work if you’re dealing with hundreds, if not thousands of proteins here.

Andrew Lee: If you don’t understand some of the functions already, or if you don’t have the assays, appropriate assays for it, then how do you know, right? So I think I like your approach with the DEL outputs, just looking at pairing the individual proteins that way.

Edward Kraft: Yep, and I’ve seen, you know, analysis paralysis kick in, right? You sit at a table and everybody talks about all the different ways that it’s not going to work. And you could have already done the experiment to see. And so do the work and let the outputs tell you before you pre qualify that [00:03:00] it’s going to fail.

Andrew Lee: And then one of the characterizations, uh, you mentioned, uh, running SDS PAGE. Uh, man, SDS PAGE, back in my grad school days, I actually had to cast my own SDS PAGE gels. And then I’d mess it up and shoot my whole day off and recast and remake these gels and make the mistake of not pouring the right percentages.

Andrew Lee: It’s just a nightmare. And then once the precast gels came in, for an academic, it’s so expensive and they’re so precious. So you set up the exact number of samples that you’re going to run and then you run exactly that number of samples only. But here you’re doing something different. Yeah, I mean, especially for high throughput.

Andrew Lee: I assume you still run gel electrophoresis, like the SDS PAGE. What do you do?

Edward Kraft: Yeah, you know, looking at alternatives there has been an area that I’ve worked on over the years, but I’ve yet to see a system that really truly covers the diversity of proteins like what SDS PAGE can do. If you’re in the [00:04:00] antibody space, then you’re in luck, as there’s been, you know, great focus on developing systems and automated systems that are generally pretty well suited for your use.

Edward Kraft: For the rest of us, however, diversity of proteins leaves us, uh, running SDS PAGE gels. And thankfully, while there’s a cost associated with precast gels, certainly, um, some of the imaging and annotation systems, uh, for scanning gels have made great advances for being able to make that less laborious. Um, you know, it’s much less painful than I think one assumes.

Edward Kraft: You can certainly crank out multiple gels, uh, within a few hours in an afternoon if you’ve got the right pipetting setup, uh, upstream of that with a little bit of help. However, it’s certainly an area that I’d like to see, and maybe there has been something very recently, um, some more innovation around,

Edward Kraft: for a truly general purpose system, uh, to make it out of [00:05:00] development and onto the market that really can provide, you know, a true replacement for something like SDS PAGE or proteome level type analysis.

Andrew Lee: Yeah. And another characterization, a very standard characterization is, uh, UV Vis spectroscopy. So you can use, uh, 280 nanometer absorbance.

Andrew Lee: And, you know, a very standard spectrophotometer, like the NanoDrop, or I think Unchained Labs has their, uh, I forget which system, it does very similar things, right? So you’re looking at quantifications using A280. Are there any particular, I don’t want to call out go to companies, it sounds like we’re supporting them, but go to protein quant methods?

Andrew Lee: You can use fluorophores or absorbance. That’s very compatible with automation.

Edward Kraft: Yep, I’ve used DropSense, which I think is now called Stunner, as the go to [00:06:00] in the space for years, and it’s truly wonderful. And so, a little micro capillary just takes a few microliters of solution, and then you do A280 to get your total protein quants out of your elutions.

Edward Kraft: They can be a little sensitive to air bubbles, and so, you know, being able to use automation to load it can be a little bit challenging. But you get values for pretty much everything the first time trying. If you don’t need that throughput, you can do similar things on, like, a NanoDrop, you know, for measuring A280s as well.

Edward Kraft: Um, you know, I think one of the things is making sure you’re not wasting a lot of sample, um, to be able to do this, and both of those ensure minimal sample usage and are a very quick way to quantify total protein. And I say that specifically, uh, because you need to factor in the purity of your samples.

Edward Kraft: Usually you couple that with some sort of analysis like densitometry on an SDS PAGE with a final plan to be able to get kind of a purity, and then [00:07:00] relative to the overall yield that you get by A280. And so it’s actually a very simple and fast process with being able to just measure A280s using those types of systems and get your total protein yields.
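
Editor’s note: the calculation described above, total protein from an A280 reading scaled by a purity estimate from gel densitometry, can be sketched roughly as follows. This is only an illustrative sketch, not the workflow used at any company mentioned in the episode. The function names, the example extinction coefficient, molecular weight, volume, and purity are all hypothetical, and it assumes the target protein’s extinction coefficient is a reasonable stand-in for the whole sample.

```python
def concentration_mg_per_ml(a280, ext_coeff_m1_cm1, mol_weight_da, path_cm=1.0):
    """Beer-Lambert law: molar concentration = A / (epsilon * path length).

    Multiplying mol/L by molecular weight (g/mol) gives g/L, which equals mg/mL.
    """
    if ext_coeff_m1_cm1 <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    molar = a280 / (ext_coeff_m1_cm1 * path_cm)
    return molar * mol_weight_da


def target_yield_mg(a280, ext_coeff_m1_cm1, mol_weight_da, volume_ml,
                    purity_fraction, path_cm=1.0):
    """Total protein (mg) from A280, scaled by a densitometry purity estimate (0 to 1)."""
    if not 0.0 <= purity_fraction <= 1.0:
        raise ValueError("purity_fraction must be between 0 and 1")
    total_mg = concentration_mg_per_ml(
        a280, ext_coeff_m1_cm1, mol_weight_da, path_cm) * volume_ml
    return total_mg * purity_fraction


# Hypothetical numbers: A280 of 1.0 at a 1 cm path, epsilon = 50,000 per M per cm,
# a 50 kDa protein, 2 mL of eluate, 80% pure by densitometry.
example_yield = target_yield_mg(1.0, 50_000, 50_000, volume_ml=2.0, purity_fraction=0.8)
print(f"{example_yield:.2f} mg of target protein")
```

Keeping the purity factor as a separate argument mirrors the point made in the conversation: A280 alone reports total protein, so the densitometry estimate has to be applied on top of it.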

Andrew Lee: Yeah, I mean, for any sort of quality aspects, uh, characterization, these tools are so important. I mean, you’re looking at sort of the purity of it, your quantity, the functionality, um, but you also mentioned kind of your approach or Leash’s approach using DEL outputs. But there’s, you know, sort of the more tedious approach, the SDS PAGE, you know, the gel imaging, while you couple the spectrophotometer, you can couple maybe dynamic light scattering for some of the sizing.

Andrew Lee: But are there any other analytical methods that you typically run? Um, I think you touched upon it earlier on about size exclusion chromatography, but those are pretty slow and not really for high throughput.

Edward Kraft: Yeah, absolutely. You know, as we kind of [00:08:00] drill down into, like, activity and finer assessment, we start to see splintering in the applicability of high throughput technologies to meet the needs across the totality of protein space for high throughput work around folding.

Edward Kraft: You know, you’re right. Yeah, SEC is slow. And even in higher throughput systems, if you’re starting to talk about hundreds of thousands, yeah, it’s just too slow to be able to do that. If you can do DLS or MALS type analysis to be able to look at uniformity of your samples, you know, I talked about, you know, DSF as a way to be able to get at a little bit around, um, sample quality and uniformity.

Edward Kraft: Once again, looking at fluorescence there, um, during unfolding. And so, you know, function, true function is a separate issue altogether and usually takes specific assays around antibody binding, small molecule binding, or some sort of enzymatic assay. You really [00:09:00] start to get outside the space of high throughput.

Edward Kraft: If, you know, a group is dealing with many different protein types, you know, SPR and similar outputs are a great technology. But you’ll need protein forms and specific outputs there to know if you consider it truly functional based on what you have in there and what it’s binding. And so these technologies are a little bit more difficult to automate, and the analysis for proteome level

Edward Kraft: is probably not there. However, if you get into protein class level analysis, then you’re talking about more friendly approaches to be able to automate through that. And one of the things that we do at Leash is looking at protein classes and DEL level outputs for small molecule binding profiles, you know, not unlike what you may do in the SPR type context.

Edward Kraft: And so, you know, you start to get data sets that give you coupled outputs with known molecules that bind to protein, you know, [00:10:00] when, when those are available. Um, and then you can determine what is your screen quality, protein quality, and infer protein function and form, you know, based on the screen. I expect that we’re going to see a lot of things on the computational modeling front, AI space and hit matrix that can better,

Edward Kraft: um, back calculate, uh, your overall protein quality based on screening outputs and being able to also account for not needing to have absolute uniformity and perfection, um, of that starting material by being able to, um, subtract out, uh, what would be the nonproductive information.

Andrew Lee: As you were talking about this, there was actually a company that came to my mind called SomaLogic.

Andrew Lee: They’re, I think, based out in California, they use small molecules as well as with DNA encoded libraries, right? And they have a particular molecule that’s supposed to bind to a specific protein per [00:11:00] se, and that’s really kind of their push towards proteomic analysis. So is that a similar concept?

Edward Kraft: In a way. I believe with SomaLogic that it’s specific, um, nucleotide structures that bind to specific

Edward Kraft: protein folds as their output there for being able to classify and get at some aspects of annotation in the, in the proteome. Rather than having that be the output, you know, DEL is actually chemistry with the DNA type readout, and so DEL is going to be more broadly applicable than a SomaLogic type technology, which is going to take a lot of development and defining to know what is binding to what and what it isn’t binding to.

Edward Kraft: But, yeah, those types of technologies are coming on the market to be able to get at some aspects of proteome annotation. I can see some of those things starting to get spun off as individual products for being [00:12:00] able to get at protein classes and being able to look at, you know, potentially protein quality for that as well.

Andrew Lee: Yeah, I was thinking along that similar line, that protein quality assessment, you could potentially run along this direction. So there was an interesting statement that you made in one of your past presentations, maybe one of our past discussions, how you enable this whole protein production or this high throughput process by breaking down into layers, uh, layers of expertise, and each layer or level has critical problem solving, rather than breaking it up into like silos or groups that feed one process to the next process to the next process.

Andrew Lee: What made you think of this approach rather than some of, I guess, more of the classic workflow where somebody will do expression only, another person will do protein purifications only, and another set of people will have the analytical chemistry background and [00:13:00] they’ll do the characterizations only. So you kind of fused your approach differently.

Edward Kraft: Yeah, I’m happy you used the word silo. Because I think that that term gets used in organizations as processes to avoid building in a modern organization. And so the layer approach really respects the expertise of biologists, biochemists, automation engineers, and data scientists as equal partners at the table.

Edward Kraft: And in conversations, and so the ultimate goal here is to develop an end to end process. And, you know, you talked about breaking that down into individual groups that are all, like, handing things off, um, you know, that, I think, is highly inefficient in that form to be able to deliver and truly get what an organization needs.

Edward Kraft: When you break this up as a process such as expression, purification, disconnect the automation and data science from the process, you’re asking for inefficiency and translatability problems. [00:14:00] You need to connect the approaches from small scale through large scale, um, and even into assay outputs to be deeply connected.

Edward Kraft: Um, with these groups to truly make that process work. And so if every group is doing their own thing, um, small scale screening is of questionable value to the company. You know, if you’re, if you’re optimizing around that, uh, without any line of sight into downstream scale up and things, you may be optimizing for processes that you can never actually attain in a scaled fashion.

Edward Kraft: And so if your outsourcing options use completely different approaches for expression, even if using the same cell line, your small scale screens become problematic for predicting downstream success. Now, everyone needs to be in the same room and be connected and invested in a workflow that runs end to end and avoid surprises in groups,

Edward Kraft: um, that are not getting the protein forms or amounts that they want, you know, ultimately from what [00:15:00] you get when you scale this up and try to deliver for an organization. Data science that builds and connects internal tracking systems is essential to provide transparency and easily referenced data throughout the organization.

Edward Kraft: You know, automation as well, they need to be in the room and be respected for what the equipment can and cannot do, um, as well as what a process would take relative to capital expenditure. You know, protein screening throughout an organization is and has become even more of a big data challenge and needs to be viewed as such, with systems that accommodate all types of proteins and are able to connect that information more effectively.

Andrew Lee: Yeah, the approach to this, it’s a very unique philosophy to it, you know, that work philosophy, uh, working broadly across, and it’s a huge team effort to really take this across, and it really shows with the years of experience that you’ve kind of [00:16:00] demonstrated over the years. I have to wonder, have you ever run across a particular protein that you couldn’t express or isolate?

Edward Kraft: Absolutely. Yeah, I think that everybody has their, the list of unattainables or their, you know, the, the book of things that when people mention, um, you know, you’re going to be in for it, um, a little recess. And so absolutely, you know, having had the great privilege of working, um, in large organizations and roles that.

Edward Kraft: At screening of everything that an organization was doing, you absolutely get exposed to the things that no common expression system, or even the niche expression systems just cannot handle when you try to really push them through at high expression levels, being able, you know, in finding what’s needed for the downstream assays.

Edward Kraft: Yeah. And you also have a [00:17:00] scenario where there’s a not insignificant number of proteins that are only stable when found in a complex or whichever. And so if you think that you’re going to be doing research on that in isolation, you are, you’re very mistaken. And so producing these proteins in that form requires,

Edward Kraft: you know, the ability to use them then for downstream analysis and being able to communicate and have that discussion, uh, openly and freely so that everybody really understands that. Um, and then factoring that in, what that would then look like for drug development efforts, or for whatever efforts that you’re looking at performing.

Edward Kraft: It is not an insignificant number of proteins that are going to give you trouble when you try to make milligrams of them.

Andrew Lee: I mean, I wish in life it was easy as, well, there are some protein targets that are not easily obtained or cannot be obtained, but then as you work with the, within the team, there’s so many other, I guess, [00:18:00] other layers, personality layers, some people have their ego, their power trip or power dynamics.

Andrew Lee: We have some personal agenda or there’s definitely incompatibilities in personalities. And this leans now outside of really the protein purification, protein characterization, but it definitely kind of intermingles in implementing high throughput process or kind of managing this whole rollout of any sort of workflow.

Andrew Lee: How have you dealt with some of these, I guess, softer skills, softer aspects of working in the team and managing the team?

Edward Kraft: Yeah, I love this topic. It’s a great topic. And it’s something that I’ve personally invested in getting better at as my career has gone along. And, you know, I always like to say that science is easy.

Edward Kraft: Um, but the non science aspects are certainly more challenging and so we don’t operate on teams that never change, you know, leadership that doesn’t change and [00:19:00] broader organization, um, priorities that don’t change. And so over the years, I’ve really seen a wide diversity of personalities and motivations that create their own combinations of productivity boosts as well as drags.

Edward Kraft: And so this has, um, been an area I’ve even started talking about at conferences, uh, working in high throughput settings, you know, and the challenges that you feel around feeling disconnected from the science or, uh, feeling like you are just a part of an assembly line. The staff that works in these groups is exquisitely talented at mastering the complexities of automation, big data, and absolute consistency in the work that they do.

Edward Kraft: And I really want to champion the people that work in these environments, because so often they do such a great job that, you know, if you’re not the problem, you can often go unnoticed, right? And so I’ve been just so fortunate over the years [00:20:00] to work with some of the most amazing individuals that can work in these groups and really leverage, um, their expertise across there, and so it becomes a question of, you know, how this role is valued and promoted as, you know, their career or personal careers progress.

Edward Kraft: You know, it’s a question for a company. Is it a role in a company that is recognized and promoted with similar attention to, like, the big, splashy science or not? More generally, the aspects of power, politics, and personalities aren’t unique in any way, um, to the work that we do in high throughput protein work. You know, the principles and tools to address this exist, and scientists and managers, um, have them at hand to use them and learn.

Edward Kraft: Uh, I, I certainly hope that, you know, organizations will continue to invest in people and leaders, making people feel valued when they work in these higher throughput spaces. [00:21:00]

Andrew Lee: I mean, it sounds cliche, but it is true that you have to make everybody feel valued and respected and that sort of communication, despite some of the failures or frustrations or missed expectations, you have to set those clear boundaries and those goals.

Andrew Lee: And it sounds like that’s all kind of translated as you’re working together within this team.

Edward Kraft: Absolutely. Yep. And it’s something I certainly got wrong a lot early on in my career, but I like to think with, with, with every passing year, I get a little bit better.

Andrew Lee: Yeah, I don’t think AI or artificial intelligence is going to take over on the HR side of things or management side of things, even though some of them, they think that they can.

Oh, boy.

Andrew Lee: Oh, boy. Exactly my response. You know, these new technologies that we’ve discussed, I mean, we’ve covered quite a few different technologies already, but, uh, I also see some of [00:22:00] this fear or reluctance, uh, that new technologies are changing the existing processes. Why change something if it’s working?

Andrew Lee: Why, you know, break it if it’s working? Tight deadlines, uncertainty of delivering their existing data set, because they have their primary duties, they have their primary roles, they have to deliver some certain things by their deadlines. Now they have to take this new technology, evaluate it, confirm it, validate it, and then absorb that.

Andrew Lee: So I can understand part of that reluctance, but how do you overcome that?

Edward Kraft: Yeah, this is a great question. Yeah, to your point, you know, why take a risk on a new technology that literally endangers your job security if it flops? Why would you do that? And so this depends on the organization and how they view risk taking and how tight deadlines really are, you know, and so is there flex there?

Edward Kraft: And so if an organization has a punitive approach to taking smart [00:23:00] risk in science, I wonder how long they’ll be successful or if they are successful at all. You know, it also becomes a question of how much time you or your team really has to pursue new technologies. You know, it must be balanced with

Edward Kraft: getting caught up and acquainted with what you’re trying to do new versus the outdated technology that makes you potentially attractive for outsourcing as an entire function. If you’ve gotten so far behind in the field that effectively outsourcing provides better opportunities there, yeah, you’re, you’re, you’re at risk.

Edward Kraft: I have always been willing to risk a lot to be on the cutting edge of technology that improves the quality of life of those in the lab and the quality of the data coming out of the lab. You know, doing so certainly has a personal cost, uh, and I’ve learned that over time, but it has also meant that I’m generally viewed well as a leader in this field.

Edward Kraft: And that said, I’ve been [00:24:00] very fortunate to work at companies that value being the very best in what they do at every level. You know, and so I really want to say that, generally speaking, I’ve, I’ve had great fortune over the years of being at companies that do value taking risk and working with companies like IMCS and others to be able to bring in a new technology in its early, early days and, and really, um, use it to great effect.

Edward Kraft: Automation, you know, is an area I want to call out as a particular area of reluctance. We kind of touched on that earlier. Yeah, I’ve yet to see automation not be a net positive for a group and improve specifically on the consistency of the work that gets done. You know, that said, it is far from an autonomous process.

Edward Kraft: So expectations for leadership can be that, you know, I’ll look at headcount savings, you know, and that shouldn’t be an expectation, but it will allow for scaling of a process. And so [00:25:00] IMCS has made the process of application support for methods development a quick, easy, customizable process that

Edward Kraft: makes this transition simple. You know, for example, in half a day, we were set up to run our high throughput purification process, uh, on site here. You know, I hope that people that may feel a little bit on edge around automation, um, realize that, you know, just partner with the right companies and people, and it is extremely accessible, and it doesn’t have to be something that generates fear.

Andrew Lee: It feeds into this next curiosity that I had, and for any other sort of collaborators, or other smaller startup companies, or other companies that are sort of the vendors, you covered this: not just a buyer vendor relationship, but a collaboration to make the science progress. If you could have that relationship, it actually progresses better.

Andrew Lee: Um, so in that same vein, how can an inventor or this vendor approach this thought and [00:26:00] the challenge and in promoting the new technology? So, rather than trying to sell or prove the technology, they should be solving a problem. So, how can they still work with the customer? It is a novel process, but understand the customer’s pain points, not just the fear, but the problem that they have any recommendations or suggestions that you have.

Edward Kraft: Yeah, I mean, it all comes down to working with organizations and individual leaders and people you might meet at conferences, right? You can tell the personality type sometimes when you have a conversation with somebody, if they’re going to be somebody who leans in, so, like, really being an early adopter.

Edward Kraft: And so working with those that have that track record of using and talking about the technology, you know, really is, is the best approach. You know, testimonials and potential consulting from these relationships can also help get past the, you know, maybe more the charlatans, right? You know, they promise a lot and deliver very little.

Edward Kraft: And so you develop that, you know, [00:27:00] trust with, with a vendor, and if it doesn’t feel like there’s a motivation, you know, to make the sale as well, you know, if, if I’m being asked or pressured effectively, purely from a monetary perspective, and I don’t feel that there’s much to gain from the science side,

Edward Kraft: then that’s how I view the relationship. The biggest challenge I come across is a vendor who makes a product in isolation of the process that’s in use by most out there in the field. You really have to understand what companies are doing and how they’re doing it. It isn’t always thought through for the realities of how work and workflows happen in a typical commercial lab, and so with continual downward pressure as well on headcount and research, you know, across biotech and pharma,

Edward Kraft: the product being proposed must be equally or more efficient for a team to use relative to the current state of the art. The product needs to provide that light at the end of the development [00:28:00] integration tunnel. I’m not going to ask a team to go evaluate and use a technology that does not in some way, um, have at least a net neutral on the amount of time that they’re spending doing it, in addition to producing at or likely better quality outcomes on the data.

Edward Kraft: I hope that companies who are developing products are really talking to, um, the actual users in a way, from a variety of different organizations, um, to make sure that they’re on the right track and make sure that they’ve thought through all of the integration processes from end to end.

Andrew Lee: Great, great recommendation.

Andrew Lee: I hope we can highlight that too, for a lot of the, I guess, new entrepreneurs as well, trying to figure out what their future directions would be. Now you’ve spent a big chunk of your career leading and building protein expression teams, protein purifications. Have you applied large data sets, and you’ve sort of touched [00:29:00] upon this with the DEL and other current work.

Andrew Lee: So how do large data sets impact your protein expression and purification? How do you apply them?

Edward Kraft: Yeah, and so I haven’t specifically used anything public, um, in, you know, organizations that I’ve been at, but they certainly have tool sets at their disposal, uh, at least from a database perspective. But I expect many organizations, um, have data sets relative to protein expression and protein production, uh, at their disposal, um, to be able to use to effectively build their processes better.

Edward Kraft: And, you know, being at a really small company now, you know, I love to reference PDB, um, to be able to effectively say, um, how people are going at specific protein classes. At the point where you’re delivering structures, you’re doing something very right. That’s the go to for defining starting point on a protein, and then kind of [00:30:00] more generalizing that on a protein class.

Edward Kraft: Now, they won’t be able to express that as a function, maybe outside of what’s available in PDB, but it definitely removes unnecessary risk for things that are represented there. Beyond that, there’s a lot of just protein domain level breakdowns that sometimes are a little bit too vague for reference in those general prediction tools for quote unquote making the protein as a soluble form.

Edward Kraft: And so there’s a lot of inter domain dependencies that make this more complicated than just chopping out individual domains and expressing them, and we mentioned a little bit earlier, uh, some of the tools that are starting to become available kind of around that and some of the prediction software that does a better job of accounting for the inter domain stuff.

Edward Kraft: I expect that we’re going to get a lot better for computational tools that are accessible to all. Whether or not data sets that big companies are, or already have generated become a useful thing more broadly, I question if that will truly ever happen, right? We’ll see. Maybe there’ll be academic

Edward Kraft: initiatives that fire up around this to be able to better predict how to make your protein and make those protein forms.

Andrew Lee: So one of the challenges that I have personally, you know, getting it in my head is the grasp around machine learning, you know, the data set is the most important part, but then like, even within protein, there’s so many subtle variations.

Andrew Lee: I mean, I think you touched on phosphorylation, glycosylation, you know, dimerization, chaperones. There’s just so many subtleties already around there. Um, So how does that all get fed into large data set? I mean, who feeds into it in the manner and the form [00:32:00] that is relevant to your work? It’s probably not always the case, right?

Andrew Lee: You can’t always just pull from a database. And I think you touched on it as well, like PDB, the protein database, you can rely on parts of it because it’s been crystallized. It’s been kind of formulated and tested, but then the solubility definition for even that database is relative and it’s not a standard.

Edward Kraft: Yeah, I’d say that this is an area that really is in its infancy, even though, you know, it’s been around for many, many years, right? You know, when you look at a static structure but you don’t consider the protein, um, I think you can get yourself a little bit in trouble.

Edward Kraft: And so some companies are doing this when we’re talking about specific therapeutics, looking at molecular dynamics as a way to better understand a particular protein or protein target. But the success of generalizing across the protein space is yet to happen, and it’s very computationally [00:33:00] costly to be able to try to do that. Pulling from legacy data sets with little or no standardization to go at this problem is really a flawed approach.

Edward Kraft: And I think a lot of organizations have been or are being burned by trying to mine their legacy data sets, where there were a lot of niche approaches to collecting all that information, um, and when you put it all together, all you get is noise. It would be easy to sit around the table and look at the complexities of this problem and talk yourself out of even doing the work, uh, to develop a testable dataset, right?

Edward Kraft: And so it’s important to consider that we don’t need all the options for potential binding agents and poses, but we do need to be able to do measurably better than the current state of the art. You need to commit to developing a broad, testable dataset of proteins, protein mutants, and diverse chemical space to really tackle this problem.

Edward Kraft: Um, the data set really needs to be generated with the ML process [00:34:00] in mind, right? And so going at this by first coupling together legacy data is not a particularly valuable approach. You will absolutely have artifacts in the data set, and generating a large enough data set to reveal those artifacts allows you to computationally mine and correct for that.

Edward Kraft: And so, the ML process isn’t a replacement for good science and traditional approaches, but another tool for accelerating discovery and better revealing high quality hits. And so, there’s too much hype that ML/AI is going to revolutionize drug discovery and that we’ll have autonomous processes for drug discovery and development.

Edward Kraft: Well, and we will see if this truly emerges, but they are just simply memorizing data sets right now. We’ll see how this comes about as efforts try to generalize in the coming years. But as we saw, at least here, with the recent Kaggle competition, it’s a memorization [00:35:00] exercise.

Edward Kraft: We’ll generate large data sets to be able to get cracking on this problem, but you will see this is certainly a problem that is yet to come close to being solved, you know. And you take the example of membrane proteins as an extraordinary challenge for conformational complexity, and you see efforts around this.

Edward Kraft: And so not only do you have to consider the matrix, but also the interacting proteins on both sides of the membrane that define the different protein states. Yeah, I expect that the GPCR space is going to be the first area where computational design makes headway, and it has in recent years. And there are some great papers that have come out recently around this.

Edward Kraft: And so I look at that as being kind of the real trendsetter for the field, as we try to take a protein class whose members are fantastic small molecule binders. There’s still a lot of need there for developing new small molecule binders and being [00:36:00] able to generate datasets that get at true ML/AI computational modeling and prediction of hits without a lot of human intervention.

Andrew Lee: You know, the space is kind of near and dear to me because one of the first internships I did in San Diego was for a company, a startup company, trying to crystallize. Yes, it was back in the days. It was right as I was trying to, I guess, leave undergrad and get some biotech experience. And I remember that company trying to crystallize, but, of course, now, 30 years down, uh, we’ve got the crystal structures, but we’re also making various predictions around them.

Andrew Lee: And I suspect, you know, for machine learning in these large data sets, like you touched upon, it’s some of these small molecules with unique designs to target, looking at predictability of toxicity or its profiles, uh, if we can kind of design. Because [00:37:00] nature has its own rules, right?

Andrew Lee: We’re trying to dissect that rule and understand it and build models around it. And the large data set with the machine learning algorithm is to help us get there. So really, the holy grail in my mind, I guess, is to find sort of this model that can provide us with, like you said, also the functionalities, binding profiles, metabolic profiles, even that would really be nice.

Andrew Lee: I think it’s just how much data do we need to generate those rules, right?

Edward Kraft: You know, absolutely. And I think this is a more defined question that is better suited for ML. You know, look at a typical Cerep panel that you run and screen against, and that could be a testable data set on potential toxicity.

Edward Kraft: You know, how much data you need in a chemical space to build a generalized model, though, becomes a question of who would then dedicate the time and effort for building such a data set to establish that [00:38:00] foundation, right? You won’t know until it starts to emerge. And so it might not ever emerge.

Edward Kraft: And so there’s a risk in taking that. And I expect we’ll see a lot of information coming out in the coming few years around this question of how much data it really took to generate true intelligence outside of the training data set.
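
Editor’s sketch: the memorization point in this exchange can be made concrete with a toy experiment. The Python below is a minimal illustration only; it is not Leash Bio’s method or the evaluation used in the Kaggle competition. The records, the `scaffold` grouping key, and the helper names `grouped_split` and `memorizer_accuracy` are all hypothetical. It scores a pure lookup-table "model" two ways: on a random split, where test molecules can share scaffolds with training molecules, and on a grouped split, where whole scaffolds are held out.

```python
import random
from collections import defaultdict


def grouped_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)
    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids = set(group_ids[:n_test])
    train = [r for g in group_ids if g not in test_ids for r in groups[g]]
    test = [r for g in group_ids if g in test_ids for r in groups[g]]
    return train, test


def memorizer_accuracy(train, test, feature_key, label_key):
    """Score a lookup table that only knows feature values seen in training."""
    seen = {r[feature_key]: r[label_key] for r in train}
    hits = sum(1 for r in test if seen.get(r[feature_key]) == r[label_key])
    return hits / len(test)


# Hypothetical toy data: 10 scaffolds, 5 molecules each; the label depends
# only on the scaffold, which is the easiest case for a memorizer.
records = [
    {"scaffold": f"S{i}", "molecule": f"S{i}-M{j}", "binds": i % 2 == 0}
    for i in range(10)
    for j in range(5)
]

# Random split: test molecules usually share a scaffold with training ones.
shuffled = records[:]
random.Random(0).shuffle(shuffled)
rand_test, rand_train = shuffled[:10], shuffled[10:]
random_acc = memorizer_accuracy(rand_train, rand_test, "scaffold", "binds")

# Grouped split: every test scaffold is unseen during training.
train, test = grouped_split(records, "scaffold")
grouped_acc = memorizer_accuracy(train, test, "scaffold", "binds")

print("random split accuracy:", random_acc)
print("grouped split accuracy:", grouped_acc)
```

On the grouped split the lookup table has never seen any test scaffold, so it scores zero, while the random split typically flatters it. A model that truly generalizes outside the training data should hold up on the grouped split as well.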

Andrew Lee: Um, as we come to wrap things up, I’d like to kind of add in a tidbit of life lessons for the younger generation.

Andrew Lee: My life lesson for the younger generation is that I did summer internships left and right ever since my freshman year in college. And it was a very old school medical school lab that, you know, poured silica gel columns, purifying fluorescein-conjugated bile acids. That still helps me. I mean, a lot of these tall silica columns, um, that kind of brings back memories.

Andrew Lee: I don’t run columns anymore, but I run microchromatography products on automated liquid handlers. [00:39:00] So the general concepts of science, the chromatography, the logic, uh, as long as you kind of pick those up early on and you continue to build upon them, I think that is always beneficial. So don’t lose opportunities to do a variety of different internships or experience in the lab, if you’re really curious about the lab.

Andrew Lee: That’s my tidbit. How about you?

Edward Kraft: Yeah, you know, I’ll step back to, you know, I grew up on a dairy farm, right? And so being on a small family farm taught me the importance of invention with what you have around you. It also taught me my first lessons in automation and efficiency for being on a farm and being able to get all the work done and still have something left of yourself at the end of the day, right? And so, why pick up one bale in the field at a time when there’s a machine that can do it at much greater efficiency and be very cost effective? You know, we don’t milk cows by hand anymore, right? And so, we have machines that do it and have done it for decades.

Edward Kraft: Why would we sit in a lab and purify a single [00:40:00] protein at a time when we have a Hamilton Vantage system that can take IMCStips and do 192 of them? Yeah, I can walk into a lab and see the inefficiencies that are holding it back and also making the people there unhappy. You know, if I’m enabled to chip away at these inefficiencies and improve work-life balance and contentment in the lab, that’s when I feel truly fulfilled.

Andrew Lee: So if you had a magic wand to wave one magic wish, the genie, to solve one of your technical problems now, uh, what would that problem be? And why do you want it solved?

Edward Kraft: Yeah, I’ll go on a little bit of a tangent since I would love to do that, but it ties into technical problems. You know, I would ask for a protected budget, you know, immune from leadership changes and the year-to-year flux, to pursue process efficiency and technical developments in a group.

Edward Kraft: You know, I have enjoyed giving those in the lab the ability to purchase, to pursue technical developments, to be on the forefront of technology, and to work [00:41:00] across the protein expression and purification space to really push the field forward while still delivering on what they need to deliver every week.

Edward Kraft: That would be my magic wand ask, that I had that percentage of time for everybody to continue to work with companies like IMCS and others, um, to evaluate technology and continue to push this field forward.

Andrew Lee: And if you had, in any project, one change that you could make, I mean, knowing what you know now, if you were to go back in time and say, oh, if I did this one thing differently, it would have saved so much time.

Andrew Lee: It would have been so much more efficient. What is that?

Edward Kraft: Yeah, I’ll go back to the list of unattainable, you know, proteins that, no matter what you do with common expression systems, you just can’t get. And so I wish I had a line in on screening those proteins across human cell lines, uh, to define a place where they truly are attainable.

Edward Kraft: Right, there’s a lot of value to humanity in being able to launch drug discovery [00:42:00] and therapeutic programs around those targets, if we can just solve some of the aspects around production. Those targets have a solution, you know. Heck, we make them in our own bodies, but we just need the right environment to truly make them tractable.

Andrew Lee: Well, thank you so much. It was a blast having you here at IMCS on our podcast series, Imagine More, Create Solutions. I’m the crazy post scientist, Andrew Lee. I am the scientific officer and I have here with me Edward Kraft, Senior Director of Small Molecule Discovery at Leash Bio. He’s implementing a major push in protein purification, expression, and characterization, and they’re building their own data set to, you know, feed into their own modeling platform.

Andrew Lee: So it’s very impressive stuff. So please do check out Leash Bio. Also read the book chapter that Edward coauthored, and then please listen to his presentation at PepTalk, and any others, like PEGS too. PEGS, I believe, is in Boston; PepTalk is in [00:43:00] San Diego.

Andrew Lee: Both beautiful locations.

Edward Kraft: Yeah, thanks Andrew and everyone at IMCS. And so it’s been a blast today covering a career in this space. I really appreciate being part of the podcast and working together in this protein purification space. And I appreciate the opportunity to share my expertise, thoughts, and opinions.

Edward Kraft: And so I look forward to continuing to work together to build large datasets across the human proteome as we make, you know, rational, purpose-built datasets to inform ML/AI efforts here at Leash.

Andrew Lee: Well, let’s wrap it up, folks. I hope you enjoyed this episode. If you want to stay connected, follow us on LinkedIn. And for more episodes, find us on Spotify or visit our website, www.imcstips.com. Catch you on the next one. Take [00:44:00] care.