Discussion about this post

Gustavo Seabra

Another issue is the _type_ of data. Can you please comment on what kind of data you are collecting? IC50s are relatively abundant in the open-source sets, but notoriously low in quality and even worse in reproducibility.

Gustavo Seabra

Thanks a lot! I'm glad to see this, as I've been harping on this point for a long time now.

I see too many new models out there claiming to "beat SOTA" that bring only small, incremental improvements which, more often than not, fall within the error bars and so are not statistically significant at all. The methods are great, but there's a barrier we cannot seem to cross, and I've always attributed that to the data, not the model. No matter how much we tinker with the models, there is only so much one can do with the low-quality open-source data available to most academic researchers.
