Within this group, k-nearest neighbours (KNN) is a widely used technique when this is the goal. KNN uses samples or clusters of known class to identify other samples or clusters. This requires calculating the distances between them, using, for example, a Euclidean, Mahalanobis or Manhattan distance. The minimum distance is found and the object is assigned to the corresponding class. The classification depends on the number of objects in each class. The development of new drugs is a continuous challenge, given the countless diseases that lack an adequate pharmaceutical approach.
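The KNN procedure described above can be sketched as follows. This is a minimal illustration, not the chapter's own implementation: the function names, the two-descriptor training set and the choice of k = 3 are all assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (city-block) distance between two descriptor vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(train, query, k=3, distance=euclidean):
    """Assign `query` to the majority class among its k nearest training samples.

    `train` is a list of (descriptor_vector, class_label) pairs.
    """
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training set: two descriptors per sample, two classes.
training_set = [
    ((1.0, 1.1), "A"), ((1.2, 0.9), "A"), ((0.8, 1.0), "A"),
    ((3.0, 3.2), "B"), ((3.1, 2.9), "B"), ((2.9, 3.0), "B"),
]

predicted = knn_classify(training_set, (1.1, 1.0), k=3)
print(predicted)
```

Because the class is decided by majority vote, a class with many more training objects than the others tends to win more often, which is why the result depends on the number of objects in each class.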
Modern medicinal chemists are especially concerned with methods based on rational and quantitative procedures, aiming to focus on potentially efficient candidates. In that context, chemometric methods are very important in quantitative structure-activity relationship (QSAR) studies, which presuppose that the biological activity (BA), measured through a biological response (BR), is related to the chemical structure (CS). Several physico-chemical descriptors are useful in QSAR studies and can be divided into categories: constitutional, topological, stereochemical and electronic, besides the so-called indicator variables.
Constitutional descriptors are related to the presence of structural characteristics that can affect the BA, such as the number of unsaturated bonds, the number of hydrogen-bond donors, the average ring size, etc. Topological descriptors represent shape and connectivity, such as branching, spacer groups, unsaturations, etc. The Kier [ 5 ] and Wiener [ 6 ] descriptors are typical. Steric descriptors describe effects related to the size of chemical groups and to steric hindrance. The Taft steric descriptor, Es [ 7 ], is a common example.
Electronic descriptors are related to molecular electronic densities and are usually calculated by quantum methods. Examples include dipole moments, atomic partial charges, and the energies of the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO). Indicator variables are a useful way to convert qualitative information into quantitative information, such as the occurrence of some structural feature: the variable is set to 1 when the feature is present and to 0 otherwise.
Chemometric statistical methods find a large field of application in QSAR, since multivariate problems are inherent to it. These methods aim at grouping and classifying compounds and variables into classes or categories that share resemblances, and are very useful in pattern recognition and in the dimensionality reduction of complex systems. Principal component (PC) methods combine correlated variables, projecting them onto a new coordinate system, so that fewer variables are obtained, without any intercorrelation.
The original coordinates are projected onto a new axis system in which the system variability is maximum along PC1 and decreases along the other axes (PC2 and so on). Thus, from a multi-variable universe, commonly multicollinear, one can obtain a simpler system with almost the same amount of information.
The matrix T is the score matrix and represents the position of the compounds in a new coordinate system whose axes are the components, and L is the loading matrix. Plotting the PCs instead of the original descriptors reveals groups governed by the similarities among the data. This analysis is also useful for classifying compounds, since it allows patterns and clusters to be distinguished visually.
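As an illustration of scores and loadings, the sketch below computes principal components by power iteration on the covariance matrix of mean-centred data. It assumes the usual decomposition of the centred descriptor matrix X into scores and loadings (X = T Lᵀ); the descriptor values and all function names are made up for the example, and a real study would normally rely on a numerical library rather than this hand-written routine.

```python
import math


def center(X):
    """Subtract the column mean from every descriptor."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def covariance(Xc):
    """Sample covariance matrix of already centred data."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(C, iterations=500):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    p = len(C)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-symmetric start
    for _ in range(iterations):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(C[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v


def pca(X, n_components):
    """Return (scores T, loadings as rows of L transposed, PC variances)."""
    Xc = center(X)
    C = covariance(Xc)
    p = len(C)
    loadings, variances = [], []
    for _ in range(n_components):
        lam, v = leading_eigenpair(C)
        loadings.append(v)
        variances.append(lam)
        # Deflation: remove the component just found from the covariance.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(x * l for x, l in zip(row, v)) for v in loadings]
              for row in Xc]
    return scores, loadings, variances


# Hypothetical data: five compounds, two strongly correlated descriptors.
X = [[2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.1], [6.0, 6.0]]
scores, loadings, variances = pca(X, n_components=2)
explained_pc1 = variances[0] / sum(variances)
print(round(explained_pc1, 4))
```

With two correlated descriptors, almost all of the variability ends up along PC1, which is the sense in which a simpler system keeps almost the same amount of information.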
The tree-like plot, called a dendrogram, places similar compounds on the same branches. The branches are plotted from a similarity matrix, S, each element of which is the similarity index between two samples k and l, S_kl, usually defined as S_kl = 1 - d_kl / d_max. In this expression, d_kl is the Euclidean distance between k and l, and d_max is the maximum distance. Using electronic descriptors, it was possible to distinguish active from inactive compounds (Figure 4).
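The similarity matrix can be sketched in a few lines. The expression itself did not survive in this copy of the text, so the code assumes the form commonly used in hierarchical cluster analysis, S_kl = 1 - d_kl / d_max; the three sample points and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_matrix(samples):
    """Similarity index S_kl = 1 - d_kl / d_max for every pair of samples.

    Assumed form: identical samples get 1, the most distant pair gets 0.
    """
    n = len(samples)
    d = [[euclidean(samples[k], samples[l]) for l in range(n)]
         for k in range(n)]
    d_max = max(max(row) for row in d)
    if d_max == 0.0:  # all samples coincide
        return [[1.0] * n for _ in range(n)]
    return [[1.0 - d[k][l] / d_max for l in range(n)] for k in range(n)]


# Hypothetical samples described by two descriptors each.
samples = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
S = similarity_matrix(samples)
for row in S:
    print([round(value, 3) for value in row])
```

A clustering routine would then join the pairs with the highest similarity first, which is what places similar compounds on the same branch.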
The loading values indicate that the presence of high-density groups in the side chain and terminal positions favours activity. The same profile arises from the dendrogram analysis.
Statistical validation of the model is very important, and it requires consistency in the units of the descriptors Di, as well as in the magnitude of their values. Statistical parameters such as the fitting coefficient r, the sample standard deviation s, the cross-validation coefficient q2 and the Fisher test F are used in this task. This is a common problem in multi-descriptor systems that may be dealt with using other regression methods.
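The validation parameters named above can be computed for a simple model as in the sketch below. It assumes a one-descriptor linear model BR = a + b·D fitted by least squares, a leave-one-out scheme for q2, and made-up data; the chapter does not specify these details, so the code only illustrates how r, s, q2 and F are typically obtained.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def regression_statistics(x, y):
    """Return r, r2, s, F and leave-one-out q2 for a one-descriptor model."""
    n, k = len(x), 1  # k = number of descriptors in the model
    a, b = fit_line(x, y)
    my = sum(y) / n
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = max(1.0 - ss_res / ss_tot, 0.0)
    s = math.sqrt(ss_res / (n - k - 1))
    f_value = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    # Leave-one-out cross-validation: refit without sample i, then predict it.
    press = 0.0
    for i in range(n):
        a_i, b_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a_i + b_i * x[i])) ** 2
    q2 = 1.0 - press / ss_tot
    return {"r": math.sqrt(r2), "r2": r2, "s": s, "F": f_value, "q2": q2}


# Hypothetical descriptor values and biological responses.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
response = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
stats = regression_statistics(descriptor, response)
for name, value in stats.items():
    print(name, round(value, 4))
```

Since each left-out sample is predicted by a model that never saw it, q2 cannot exceed r2 for a least-squares fit; a large gap between the two is a common warning sign of an overfitted model.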
To avoid multicollinearity, the regression can be performed not with the descriptors themselves but with their principal components (PCs), generated by a PCA treatment. The main advantage of this approach is the assurance that all variables are independent and uncorrelated, although it is still necessary to analyze the loading matrix L. In this kind of regression, the variables are defined to maximize the variance of the descriptor matrix, without forcing a correlation with the BR.
As in PCR, PCs are employed, but in this case the BR matrix has maximum variability, so that each component of the loading matrix L is a good predictor for each component of the BR matrix. This is the most used regression method, and it is adequate for dealing with 3D-QSAR problems, in which a set of previously aligned compounds is placed within a grid of points of interaction with a molecular probe.
The energy at each point is a variable in the QSAR equation; these variables are in turn correlated with the BR to give a three-dimensional profile of the critical sites that favour or disfavour the interaction with a hypothetical biological receptor. The search for new sources of energy such as biodiesel, as well as for their production processes, is of great importance today. Factorial design is an important tool to reduce search time, waste of reagents and hence operating costs [ 10 ].
A factorial design is performed to determine which experimental variables, and which interactions between variables, have a significant influence on the different responses of interest [ 11 ].
After the significant variables have been selected, the experimental methodology and the influence of each variable on the reaction yield must be evaluated with a statistical experimental design of the full factorial type, in which the independent variables are the nature and concentration of the catalyst, the temperature, and the molar ratio between alcohol and oil, and the dependent variable is the yield of esters produced. The variables that were not selected must be kept fixed throughout the experiment [ 12 ].
In a subsequent step, one must choose which design to use for estimating the effects of the different variables with a reduced number of experiments. In the screening study, the main effects of the variables and their second-order interactions are usually obtained by full or fractional factorial designs.
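A two-level full factorial design and the estimation of its effects can be sketched as follows. The three factor names follow the variables mentioned in the text, but the coded levels, the simulated yield function and its coefficients are invented purely for illustration and are not results from the chapter.

```python
from itertools import product

# Factors in coded units: -1 = low level, +1 = high level.
factors = ["catalyst_concentration", "temperature", "molar_ratio"]

# Full 2^3 factorial design: every combination of low and high levels.
design = list(product((-1, 1), repeat=len(factors)))


def simulated_yield(run):
    """Hypothetical yield model, used only to generate example responses."""
    x1, x2, x3 = run
    return 80.0 + 5.0 * x1 + 2.0 * x2 + 0.0 * x3 + 1.5 * x1 * x2


responses = [simulated_yield(run) for run in design]


def contrast_effect(signs, response):
    """Mean response at the +1 sign minus mean response at the -1 sign."""
    high = [y for s, y in zip(signs, response) if s == 1]
    low = [y for s, y in zip(signs, response) if s == -1]
    return sum(high) / len(high) - sum(low) / len(low)


def main_effect(design, response, index):
    """Effect of a single factor on the response."""
    return contrast_effect([run[index] for run in design], response)


def interaction_effect(design, response, i, j):
    """Second-order interaction effect between factors i and j."""
    return contrast_effect([run[i] * run[j] for run in design], response)


for index, name in enumerate(factors):
    print(name, main_effect(design, responses, index))
print("catalyst x temperature", interaction_effect(design, responses, 0, 1))
```

In a real screening study the responses would be measured yields, and the effects would then be ranked (for example in a Pareto chart) to decide which variables to carry into the optimization step.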
The experiments evaluate the best experimental conditions, as well as the simultaneous effects that influence the reaction yield, and are therefore extremely important for understanding the behavior of the system [ 13 ]. The values of "p" and greater than or equal to 0.
Figure. Pareto chart of the resulting fractional factorial design, used to evaluate the effects of each variable and their interactions on the reaction yield.
The multivariate optimization consists of a preliminary assessment of the experimental variables (fractional factorial design), followed by a response surface methodology (central composite design) built from the screening of the variables that may affect the synthesis of biodiesel.
The generated model and the set of significant effects can be evaluated through response surface methodology, as shown in Figures 7 and 8, together with their influence on the response, i.e. the reaction yield; the dark area indicates the conditions under which the process has the highest yield. This chapter aimed to show the versatility of chemometric tools in several areas.
Applications of chemometric theory were shown in drug design and natural products chemistry, but they are not limited to these areas. We hope to have expanded the range of chemometrics. Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.
Copyright 2019 - All Right Reserved