Copyright: © 2026 by the authors. Licensee: Pirogov University.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (CC BY).

ORIGINAL RESEARCH

Classification models for assessment of influenza virus A/H1N1 inhibitors

Stolbov LA1 , Tarasova OA1 , Borisevich SS2 , Gorokhov YV2 , Zarubaev VV3 , Poroykov VV1
About authors

1 Institute of Biomedical Chemistry (IBMC), Moscow, Russia

2 The Ufa Institute of Chemistry of the Ufa Federal Research Centre of the Russian Academy of Sciences (UFRC RAS), Ufa, Russia

3 Saint Petersburg Pasteur Research Institute of Epidemiology and Microbiology, Saint Petersburg, Russia

Correspondence should be addressed: Leonid A. Stolbov
10 Pogodinskaya St., b. 8, Moscow, 119121, Russia; stolbovla@yandex.ru

About paper

Financing: the development of the structure-activity relationship models was supported by the State Program for Fundamental Scientific Research in the Russian Federation for the long-term period (2021–2030) (No. 124050800018-9). The database was developed as part of the state research project “Kinetic, spectral, luminescent, and theoretical analysis of core intermediates in chemical and biochemical oxidation processes” of the Ufa Institute of Chemistry of the Ufa Federal Research Centre of the Russian Academy of Sciences (No. 125020601626-9).

Author contribution: Stolbov LA (data analysis, model building, manuscript preparation); Borisevich SS (idea, database preparation); Gorokhov YaV (scientific literature review for database compilation); Zarubaev VV (providing up-to-date results of biological testing); Tarasova OA (the idea and methodology of research); Poroikov VV (research methodology). Every author contributed to writing and editing the paper.

Received: 2026-01-29 Accepted: 2026-05-10 Published online: 2026-06-15

The development of new effective antiviral drugs is a top priority for modern medicine and pharmacology, both because viruses can exert severe direct cytopathic and immune-mediated effects on host cells and because viral infections spread through communities during epidemics and pandemics. The problem is especially acute for diseases caused by highly variable viruses, such as the influenza virus [1]. Annual epidemics and potential pandemics require continuous improvement of medical treatments, which demands a substantial investment of both money and time.

In silico techniques, particularly machine learning (ML), can accelerate the preliminary stages of drug development and reduce their cost. ML makes it possible to build predictive models that analyze existing data on the structure and activity of chemical compounds and predict the characteristics of both known substances and novel compounds that have not yet been synthesized. This enables computer-aided screening of extensive chemical libraries and helps to identify viable molecules for subsequent laboratory synthesis and biological experiments.

The effectiveness and reliability of such models depend on several aspects. The first is the quality and volume of the training data. Developing robust models requires a well-balanced training dataset with consistent and trustworthy measurements of both antiviral efficacy (e.g., IC50 concentrations) and cellular toxicity (e.g., CC50 values). Building such a database is a separate, difficult, and time-consuming task. Furthermore, the selection of an appropriate ML approach and feature space is critical.

Quantitative structure-activity relationship (QSAR) analysis is an essential tool in ligand-based drug design, facilitating both the identification of promising molecules and the optimization of lead compounds. Although building models on combined data broadens their applicability, it has a number of limitations: significant variation in compound activity values across different testing methods; the difficulty of quantifying categorical data; and a severe bias toward highly active compounds. Because experimental parameters are frequently poorly defined, generalized models have lower predictive accuracy than models trained on restricted, uniform datasets. Researchers must therefore choose between a generalized model that predicts poorly and a highly specific model that works only in limited situations.

Predictive performance and model interpretability vary significantly depending on the chosen methodology. The choices range from classical algorithms, such as Random Forests [2] and Support Vector Machines (SVM) [3], to advanced deep neural networks, and also concern the representation of molecules, such as molecular descriptors and fingerprints.

Several models have been developed to analyze structure-activity relationships in the search for potential anti-influenza compounds, including combined approaches that use both molecular modeling and machine learning. For example, Khomenko et al. [4] developed a regression model linking biological activity to computed data. The model related pIC50, the negative logarithm of the half-maximal inhibitory concentration (the concentration giving 50 % suppression of viral replication), to computational estimates of ligand binding within the active site of influenza virus hemagglutinin. The docking scores served as a measure of the ligand’s binding affinity to the target protein. The correlation between the pIC50 values and the scoring functions was 0.46.

Mercader et al. [5] used QSAR modeling to predict the inhibitory effects of flavonoids and biflavonoids against H1N1 influenza virus neuraminidase. The study analyzed 25 chemical compounds, combining experimental IC50 values (half-maximal inhibitory concentrations) with descriptors of the physicochemical and geometric features of each molecule. For the model, the correlation coefficient R was 0.971, with a root-mean-square error (RMSE) of less than 0.1. When applied to the test sample, the model displayed an RMSE of 0.1163.

Hammoudan et al. [6] created a regression model based on 168 compounds to predict potential activity against influenza virus neuraminidase. The model shows good predictive accuracy (R² = 0.82; Q² = 0.81). However, the paper does not state the exact size of the test set. We estimate that it contained about 20 % of the original data, as the authors used the Kennard-Stone algorithm to split the sample. Using a computational model, the authors designed several structures that could potentially inhibit the H1N1 influenza virus. Still, these compounds have not been evaluated in cellular assays to determine their ability to halt viral replication.

In publications [7–11], structure–property relationship models (2D/3D-QSAR) were constructed to evaluate binding (inhibitory) activity against influenza virus neuraminidase and hemagglutinin for heterocyclic and natural antiviral compounds, using training samples of 20 to 45 molecules. The coefficient of determination of the developed models ranges from R² = 0.847 to R² = 0.973. During cross-validation, the predictivity ranges from Q² = 0.610 to Q² = 0.950.

Thus, some ongoing studies using machine learning models [4–13] are aimed at finding inhibitors of specific viral targets (for example, neuraminidase, hemagglutinin, the M2 proton channel, or other proteins that affect viral replication).

Meanwhile, extensive data have been accumulated on the inhibition of viral replication in cell cultures for compounds whose specific mechanism of action remains undefined. These datasets enable the development of structure-activity models, which in turn facilitate the selection of promising candidates by virtual screening. For this purpose, we used a classification method called SCEC (Self-Consistent Extreme Classifier), which analyzes the relationship between chemical structure and biological activity using statistical regularization. The primary advantage of SCEC is the way it selects the features used for classification. Unlike traditional approaches that require prior feature selection or intricate mathematical mappings (e.g., the kernel trick for SVM [14]), SCEC embeds statistical regularization directly into the iterative training of the classifier. This makes it possible to filter out uninformative descriptors efficiently and to focus on the features most useful for separating the classes, even in a high-dimensional feature space [15].

This approach takes into account how far compounds lie from the class boundary. The SCEC algorithm ignores the descriptors of compounds that lie very far from this boundary, so the model concentrates on the compounds that are hardest to classify, both active and inactive. As a result, the algorithm prioritizes the most relevant data around the activity threshold, which is intended to improve the classification of borderline cases.

Validation of the SCEC approach on test datasets demonstrates that its predictive accuracy matches that of alternative models while using substantially fewer molecular descriptors, which highlights its efficiency and interpretability [15]. Quantitative (regression) models require homogeneous data, that is, data obtained with exactly the same experimental method for a single molecular target.

In this study, a broader yet highly practical objective is pursued: predicting whether a compound possesses antiviral properties against the influenza A/H1N1 virus, independent of its specific mechanism of action. For this reason, a binary classification approach (“active” versus “inactive”) was chosen instead of regression analysis. This approach makes it possible: 1) to merge heterogeneous data from different laboratories, virus strains, and cell types into one training dataset, which would not be feasible for a regression model; 2) to broaden the chemical space of the model by including diverse chemical classes and compounds known to have no effect; 3) to solve the key task of primary virtual screening, namely ranking compounds by their predicted activity so that the most promising drug candidates can be identified quickly. As part of this approach, the model provides a quantitative measure (for example, an estimate of the probability of being classified as “active”) suitable for comparing and prioritizing compounds.

It should be noted that this method does not aim to predict the exact numerical value of activity that can be obtained in a specific standardized experiment for a specific target; this task is solved at subsequent stages after identifying promising “candidates”. This model allows us to address two main objectives: determining if a compound is active against the influenza A/H1N1 virus, and identifying which candidates are best suited for further testing.

MATERIALS AND METHODS

The database described in publication [16] contains 2,255 records on the chemical structures of small molecules and the results of their biological testing against various influenza A/H1N1 strains in the Madin-Darby canine kidney (MDCK) cell line using the MTT test [17]. Selectivity indices (SI = CC50 / IC50) are calculated from the CC50 and IC50 values.

The IC50 values for compounds in this database span a wide range, from the subnanomolar to the micromolar level, which reflects the structural diversity of the studied compounds. The cytotoxic concentration CC50 also covers several orders of magnitude. This wide range makes it possible to assess both the antiviral efficacy and the potential toxicity of the compounds. The selectivity index, calculated as the ratio of CC50 to IC50, serves as an integral indicator of a compound’s promise, since it combines data on activity and safety. High SI values mean that a compound inhibits the virus at concentrations far below those that are toxic to host cells.

The choice of the threshold value by which a compound is classified as “active” is determined by the specific task of the study. In the search for novel antiviral compounds, this parameter is frequently governed by regulatory protocols, for instance, the Guideline for Experimental Preclinical Studies of New Pharmacological Substances [18]. Although the standard minimum threshold for the selectivity index is generally SI ≥ 8, this study applies a stricter criterion of SI = 200 to ensure a highly reliable selection of candidates. This strategy makes it possible to identify the most effective and specific compounds against the H1N1 influenza virus for further testing.

An independent test dataset was established to validate the ability of the developed classification models to identify potential antiviral compounds in virtual screening. The main design goal was to keep this dataset completely separate from model training and from the optimization of model parameters. Internal validation methods, such as cross-validation on the training sample, are necessary to assess the stability of a model and to select optimal parameters. However, a model that describes known data with high accuracy sometimes predicts the activity of fundamentally new, previously unseen chemical structures much worse. A completely independent test sample is required precisely to simulate this real-world task of predicting the activity of unknown compounds. The compounds of the test sample, selected randomly from literature sources [19–22], are structurally representative and novel relative to the training sample. The selection was limited to compounds whose antiviral efficacy was evaluated with the MTT assay against various H1N1 influenza virus strains cultured in the MDCK cell line. In other words, to test the predictive ability of the models, we selected only compounds that were tested under experimental conditions similar to those of the compounds in the database used to build the models. The results on this independent sample provide a realistic estimate of the reliability of our models. A total of 16 compounds were included in the test sample.

Following established (Q)SAR modeling standards [23, 24], we processed all chemical structures for the training and test sets. This process included removing salts and duplicates, and standardizing molecular forms. After removing duplicates and preprocessing, 1,816 unique structures were included in the training sample.

To build the models, we used QNA (Quantitative Neighborhood of Atoms) molecular descriptors as the feature space. These descriptors were originally developed and implemented in the GUSAR software [25, 26].

QNA descriptors characterize each atom in a molecule, taking into account the influence of all other atoms of the molecule on this atom. This provides a complete and informative description of the molecular structure that correlates with its biological properties. As in GUSAR, Chebyshev polynomials are used to convert these atomic descriptors into a vector of numerical values, which is then related to the molecule’s biological activity [27].

To create the predictive classification models in this study, we used the Self-Consistent Extreme Classifier (SCEC) technique. We provided a detailed explanation of this method in our previous work [15].

To ensure that the models generalize to new data rather than memorizing the training data (overfitting), 5-fold cross-validation (5F CV) was used. To obtain reliable estimates, the data were randomly split into folds ten times. Cross-validation was run for each split to check that the findings were consistent, and the performance scores were then averaged.

Random splits were generated using the C++ standard library’s implementation of the Mersenne Twister algorithm, with a modulo operation applied to restrict the range of the generated values.
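The repeated splitting procedure described above can be illustrated with a minimal sketch. It is written in Python rather than C++; Python's `random.Random` is also a Mersenne Twister generator. How exactly the modulo operation was used in the original implementation is not stated, so assigning each compound to a fold by `draw % n_folds` is an assumption, and the function name `repeated_kfold` is hypothetical.

```python
import random


def repeated_kfold(n_items, n_folds=5, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Assumption: each item is assigned to a fold by reducing a 32-bit
    Mersenne Twister draw modulo n_folds, so fold sizes are only
    approximately equal.
    """
    rng = random.Random(seed)  # CPython's random.Random uses MT19937
    for repeat in range(n_repeats):
        fold_of = [rng.getrandbits(32) % n_folds for _ in range(n_items)]
        for fold in range(n_folds):
            test_idx = [i for i, f in enumerate(fold_of) if f == fold]
            train_idx = [i for i, f in enumerate(fold_of) if f != fold]
            yield repeat, fold, train_idx, test_idx


# 10 random splits x 5 folds = 50 train/test partitions of 1,816 structures
splits = list(repeated_kfold(1816))
print(len(splits))
```

Within each repeat every compound appears in exactly one test fold; the metrics from all folds and repeats would then be averaged.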

The performance and comparison of the designed classification models were based on the following metrics:

Sensitivity (Recall): The percentage of correctly predicted active compounds among all truly active ones.

Specificity: The percentage of correctly predicted inactive compounds among all truly inactive ones.

Balanced Accuracy: The arithmetic mean of sensitivity and specificity, which is especially important for unbalanced samples.

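$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (1)$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (2)$$

$$\mathrm{Balanced\ Accuracy} = \frac{\mathrm{Sensitivity} + \mathrm{Specificity}}{2} \qquad (3)$$

$$\mathrm{AUC\text{-}ROC} = \int_{0}^{1} \mathrm{TPR}\; d(\mathrm{FPR}) \qquad (4)$$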

Area under the ROC curve (AUC-ROC): An integral metric that evaluates the ability of a model to separate classes at different classification thresholds. The AUC-ROC value was calculated by numerically integrating the dependence of the share of correctly classified active compounds (True Positive Rate, TPR) on the share of inactive compounds incorrectly classified as active (False Positive Rate, FPR).

In formulas:

TP (True Positives) shows the number of true positive predictions (active compounds predicted as active).

TN (True Negatives) shows the number of true negative predictions (inactive compounds predicted as inactive).

FP (False Positives) shows the number of false positive predictions (inactive compounds mistakenly predicted as active).

FN (False Negatives) shows the number of false negative predictions (active compounds mistakenly predicted as inactive).
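The metrics defined above can be computed as in the following minimal sketch. The function names are illustrative, and the trapezoidal rule is one common choice for the numerical integration of the ROC curve; the text does not specify which integration scheme was actually used.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and balanced accuracy from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def roc_auc(labels, scores):
    """AUC-ROC by trapezoidal integration of TPR over FPR.

    labels: 1 for active, 0 for inactive; scores: higher means "more active".
    Compounds with tied scores are added to the curve as a single step.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    auc = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return auc


# Toy confusion counts and scores, invented purely for illustration
print(classification_metrics(tp=8, tn=6, fp=2, fn=4))
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```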

The applicability domain (AD) of a prediction was defined using the average similarity of the compound to its three most similar compounds in the training dataset. If this descriptor-based average similarity is less than 0.5, the prediction is considered to be outside the model’s applicability domain [28].
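A minimal sketch of such an applicability domain check is given below. The similarity measure is not specified in the text, so the continuous Tanimoto coefficient used here is an assumption, as are the function names and the toy vectors; only the three-neighbor average and the 0.5 cutoff come from the description above.

```python
def tanimoto(a, b):
    """Continuous Tanimoto similarity of two descriptor vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


def in_applicability_domain(query, training, k=3, threshold=0.5, similarity=tanimoto):
    """Return (inside_domain, mean similarity to the k nearest training compounds)."""
    sims = sorted((similarity(query, t) for t in training), reverse=True)
    nearest = sims[:k]
    mean_sim = sum(nearest) / len(nearest)
    return mean_sim >= threshold, mean_sim


# Toy descriptor vectors, invented for illustration only
training_set = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
print(in_applicability_domain([1, 0, 1], training_set))
print(in_applicability_domain([0, 1, 0], training_set))
```

Passing the similarity function as a parameter keeps the check independent of the particular descriptor type.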

RESEARCH RESULTS

A carefully prepared training sample containing 1,816 unique chemical structures with experimentally measured values of antiviral activity (IC50), cytotoxicity (CC50), and selectivity index (SI) was used to build predictive models. Analysis of the sample showed that it includes many different structural types; however, the properties of these structures are not evenly distributed across the groups.

Three independent binary classification models were developed using the Self-Consistent Extreme Classifier (SCEC) method to predict antiviral activity (IC50), cytotoxicity (CC50), and selectivity of action (SI). The classification thresholds dividing the compounds into active/inactive, toxic/non-toxic, and selective/non-selective were established by balancing biological interpretability and predictive performance, as assessed by 5-fold cross-validation. To determine the best way of separating active from inactive compounds, several options for dividing the training sample into positive (active) and negative (inactive) examples were investigated.

To separate active from inactive compounds, we evaluated multiple thresholds: from 0.1 to 100 μg/ml for IC50, from 1 to 300 μg/ml for CC50, and from 0.1 to 200 for the selectivity index (SI). This was done to evaluate the sensitivity of the models to threshold variations. The primary models were chosen using threshold values standard for identifying active antiviral agents: (1) an IC50 of 5 μg/ml for antiviral activity, (2) a CC50 of 300 μg/ml for cytotoxicity, and (3) a selectivity index SI of 200.
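The way the three primary thresholds turn measured values into binary class labels can be sketched as follows. The function name is hypothetical, and whether the boundaries are inclusive (≤, ≥) or strict is not stated in the text, so the inclusive comparisons below are an assumption.

```python
def label_compound(ic50, cc50, ic50_max=5.0, cc50_min=300.0, si_min=200.0):
    """Binary labels for one compound; IC50 and CC50 in ug/ml.

    Assumption: thresholds are inclusive (IC50 <= 5, CC50 >= 300, SI >= 200).
    """
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    si = cc50 / ic50  # selectivity index, SI = CC50 / IC50
    return {
        "SI": si,
        "active": ic50 <= ic50_max,
        "non_toxic": cc50 >= cc50_min,
        "selective": si >= si_min,
    }


# Illustrative values, not taken from the database
print(label_compound(ic50=1.0, cc50=400.0))
print(label_compound(ic50=10.0, cc50=100.0))
```

Passing other cutoffs (for example `si_min=8.0`) reproduces the alternative labelings that were compared during threshold selection.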

Models developed using the optimal cutoffs were benchmarked against those derived from adjacent thresholds in datasets lacking severe class imbalance. Fig. 1 shows that the chosen thresholds produce the best SI models and highly competitive IC50 and CC50 models. With these thresholds, the training data included 523 compounds that were active by IC50, 768 compounds that were non-toxic by CC50, and 88 compounds that were selective by SI.
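The class proportions implied by these counts can be worked out directly, which shows why balanced accuracy rather than plain accuracy is the informative metric here. The sketch assumes that all three counts refer to the full training sample of 1,816 structures; the variable and function names are illustrative.

```python
# Counts reported in the text for the training sample of 1,816 structures
N_TOTAL = 1816
POSITIVES = {"IC50": 523, "CC50": 768, "SI": 88}


def positive_fractions(positives, n_total):
    """Share of positive examples for each endpoint."""
    return {name: n_pos / n_total for name, n_pos in positives.items()}


fractions = positive_fractions(POSITIVES, N_TOTAL)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} positive, {1 - frac:.1%} negative")
```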

With a cutoff of 5 μg/ml for identifying active compounds, the IC50 model demonstrates a balanced accuracy of 0.756 and an area under the ROC curve (AUC-ROC) of 0.822. The histogram profile across the various classification thresholds (fig. 1A) indicates that this model has room for further optimization. The CC50 cytotoxicity model with a threshold of 300 μg/ml has a balanced accuracy of 0.801 and an AUC-ROC of 0.875 (fig. 1B), which indicates its high quality for predicting the safety of compounds. The accuracy of the selectivity index model also depends on the chosen threshold (fig. 1C).

The SI = 200 threshold gave a balanced accuracy of 0.812 and an AUC-ROC of 0.861. These results are better than those for the SI = 8 threshold, which reached a balanced accuracy of only 0.682 and an AUC-ROC of 0.745. Although reference [18] suggests that compounds with an SI value over 8 show promise, we apply a stricter criterion to build our training set. This makes it possible to select compounds that are clearly active (and possibly less toxic).

Running 5-fold cross-validation across all 10 random-split experiments produced consistent results (fig. 2).

The relationship between TPR and FPR (the ROC curves) shows that model performance remains consistent when the data are split in different random ways. The average AUC is higher than 0.7, which indicates good predictive ability of the models.

The models were also tested on independent data. Using the three models together makes it possible to compile a better-justified list of candidate compounds. On the test sample, the models correctly classified 63 % of the compounds for IC50 and 75 % for SI and CC50. These results indicate that the in silico screening approach can identify promising candidates and thereby reduce the cost of experimental screening.

DISCUSSION OF RESULTS

The conducted research shows that machine learning, specifically the SCEC method [15], can effectively address the ligand-based prediction of the antiviral potential of drug-like compounds.

Using a carefully prepared database, classification (Q)SAR models have been created to estimate the antiviral efficacy, cytotoxicity, and selectivity of compounds targeting the influenza A/H1N1 virus. During cross-validation, the best and most stable models were obtained with the following thresholds: an IC50 of 5 μg/ml, a CC50 of 300 μg/ml, and an SI of 200.

To validate the models, an independent test set of 16 compounds was used; their synthesis and biological testing are detailed in publications [19–22]. Filtering candidates with the SCEC models improved the selection: up to 63 % of the selected compounds combined pronounced antiviral activity with low toxicity. This suggests that the method is an effective tool for the initial screening and optimization of new anti-influenza drug candidates. The approach does not require a priori knowledge of a specific molecular target, so it is universal and applicable at the early stages of research. The SCEC method improves classification by combining feature selection with data balancing, which makes it well suited to heterogeneous and imbalanced datasets.

In addition to the validation stage, where each compound was tested individually, the predictive models were also evaluated on an independent test sample of just 16 compounds. Sourced from scientific literature, these compounds are absent from the database. Furthermore, they lack structural similarity to the compounds in the training set.

To successfully find and apply new antiviral compounds, future testing must use larger, more diverse training and test samples that cover a much wider variety of chemical types. Nevertheless, despite the limited sample size, the significance of the chemical space modeled remains evident. For instance, the presence of quinazoline (fig. 3A) or bicyclooctane (fig. 3B) cores in the training set improved the prediction of antiviral activity for independent test compounds. In our database, highly active compounds with an SI over 200 include 36 structures with quinazoline and/or quinoline fragments, 7 structures with a bicyclooctane core, and 246 structures with a bicycloheptane-like core. Conversely, the properties of structures in the independent test sample that contain a dibenzothiepine core (fig. 3B) are predicted least correctly, since this complex heterocycle is absent from the structures of the training sample. Unlike all other predictions (green, fig. 3), the model identified structures with a dibenzothiepine core (red) as falling outside the models’ applicability domain.

At the same time, it is interesting to note that the test sample contains two compounds whose structures differ by only one atom: replacing the quinoline fragment with a quinazoline fragment increases the antiviral effect (fig. 4). The lower IC50 value results in a higher selectivity index. Most predictive algorithms are insensitive to such minor structural modifications; this problem is discussed in detail in paper [16]. Our predictive model classifies the compounds in fig. 4 as active, with IC50 in the range from 0.1 to 100 μg/ml, which corresponds to the experimental data [21]. The accuracy of the selectivity prediction depends on the chosen threshold. With the strict Selectivity Index (SI) threshold of 200, the model correctly identifies both structures as inactive. However, lowering the cutoff to SI = 8 leads to incorrect predictions (fig. 4).

This finding helps us define the boundaries of our predictive model. Nevertheless, applying a more rigorous cutoff for active compounds improves the likelihood of identifying promising candidates. Pharmacological guidelines state that compounds with a Selectivity Index (SI) over 8 show promise. However, the molecules in fig. 4 have SI values that are too low, so they are not promising as anti-influenza agents.

The testing outcomes of the predictive models are presented in fig. 5. In only two instances (compounds 10 and 13) do the IC50 predictions fall outside the established “conditionally active” range and conflict with the experimental observations. Nevertheless, our models show that a Selectivity Index (SI) threshold of 200 reliably identifies inactive structures. Specifically, compounds 513, 521, 8, 10, 11, 13, 20, 26, 30, and 33 had an SI below 200 and are classified as inactive, whereas cyperenoic acid and cyperenol had an SI above 200 and are classified as active. The incorrect predictions concern structures 38-S, 39-S, and 39-R, which have high selectivity index values. The CC50 prediction is also inaccurate for these compounds. This error is most likely due to the absence of compounds containing the dibenzothiepine core (fig. 3) in the training database, so that these structures fall outside the applicability domain of the models.

Besides their clear benefits, the models also have limitations, which stem from the source data and from the chosen ligand-based method. The core limitation of this technique is the lack of data on the molecular mechanisms of action of the compounds. The models can predict whether a compound has antiviral activity, but they do not indicate which viral component it targets, such as neuraminidase, the M2 protein, or the polymerase complex. On the one hand, this is beneficial for early screening, because compounds with novel or uncharacterized mechanisms of action can be identified. On the other hand, it complicates the validation of lead compounds, since identifying a specific target requires extensive experimental studies. Furthermore, the models may remain vulnerable to compounds that act through non-specific or cytotoxic mechanisms, despite efforts to filter them out using a CC50 threshold. Like all machine learning models, these classifiers rely heavily on representative, high-quality training data. The database we used was carefully prepared, yet it remains chemically heterogeneous and imbalanced: it includes far more inactive compounds than active ones. Errors or large discrepancies in the source data (for example, two different results reported for the same test) can reduce the robustness and accuracy of the models. The applicability domain of the models is limited by the chemical space represented in the training sample.

To overcome these limitations, several directions for further research can be identified. First, the training set should be carefully curated and regularly updated with new test data; this covers more chemical space and reduces the risk of overfitting (memorizing the data). Second, excluding hidden variants such as stereoisomers and tautomers improves the homogeneity of the sample.

The accuracy and reliability of predictive models can be further assessed in the course of a prospective study during the synthesis and testing of molecules selected as a result of virtual screening.

Despite the existing limitations, the models presented in this paper form the basis for creating a constantly improving computational tool integrated into the cycle of search and development of new anti-influenza drugs. Ongoing efforts to broaden the database and verify in vitro methods will enable this platform to transform from a preliminary screening instrument into a robust decision-support system during advanced phases of pharmaceutical development.

CONCLUSIONS

We developed a ligand-based approach to predicting the anti-influenza efficacy of drug-like molecules. Based on the SCEC classification method, this approach does not depend on the specific viral target. The main purpose of the study was to develop and validate SCEC-based predictive (Q)SAR models for evaluating the antiviral potential of compounds against the influenza A/H1N1 virus. A specialized database was built for this research. It contains small-molecule structures, their antiviral potency (IC50) against various influenza A/H1N1 strains, and their cytotoxicity (CC50) in Madin-Darby canine kidney (MDCK) cells. Using these data, we built classification models and validated them on an independent test set of 16 chemically diverse compounds. This demonstrates the practical value of the SCEC method for pre-screening candidates prior to synthesis and biological evaluation. On the test set, 63 % of the compounds were classified correctly for IC50 and 75 % for CC50 and SI.

Applying the original machine learning approach implemented in the SCEC tool to the selection of potentially safe and effective compounds active against influenza viruses showed that the developed models and the applied methodology form an effective toolkit for the accelerated search for new anti-influenza compounds and can be integrated into the cycle of rational drug design.
