Showing 1 - 10 of 122 results for '"Džeroski, Sašo"', query time: 1.52s
  1. Academic Journal

    Source: Cell Discovery; 1/16/2024, Vol. 10 Issue 1, p1-15, 15p

    Abstract: The regulation of protein function by external or internal signals is one of the key features of living organisms. The ability to directly control the function of a selected protein would represent a valuable tool for regulating biological processes. Here, we present a generally applicable regulation of proteins called INSRTR, based on inserting a peptide into a loop of a target protein that retains its function. We demonstrate the versatility and robustness of coiled-coil-mediated regulation, which enables designs for either inactivation or activation of selected protein functions, and implementation of two-input logic functions with rapid response in mammalian cells. The selection of insertion positions in tested proteins was facilitated by using a predictive machine learning model. We showcase the robustness of the INSRTR strategy on proteins with diverse folds and biological functions, including enzymes, signaling mediators, DNA binders, transcriptional regulators, reporters, and antibody domains implemented as chimeric antigen receptors in T cells. Our findings highlight the potential of INSRTR as a powerful tool for precise control of protein function, advancing our understanding of biological processes and developing biotechnological and therapeutic interventions. [ABSTRACT FROM AUTHOR]

    Copyright of Cell Discovery is the property of Springer Nature and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)

  2. Academic Journal

    Source: Scientific Data; 11/20/2023, Vol. 10 Issue 1, p1-13, 13p

    Geographic Terms: YUCATAN (Mexico : State)

    Company/Entity: SENTINEL-1 (Artificial satellite)

    Abstract: In our study, we set out to collect a multimodal annotated dataset for remote sensing of Maya archaeology that is suitable for deep learning. The dataset covers the area around Chactún, one of the largest ancient Maya urban centres in the central Yucatán Peninsula. The dataset includes five types of data records: raster visualisations and canopy height model from airborne laser scanning (ALS) data, Sentinel-1 and Sentinel-2 satellite data, and manual data annotations. The manual annotations (used as binary masks) represent three different types of ancient Maya structures (class labels: buildings, platforms, and aguadas – artificial reservoirs) within the study area, their exact locations, and boundaries. The dataset is ready for use with machine learning, including convolutional neural networks (CNNs) for object recognition, object localization (detection), and semantic segmentation. We would like to provide this dataset to help more research teams develop their own computer vision models for investigations of Maya archaeology or improve existing ones. [ABSTRACT FROM AUTHOR]

    Copyright of Scientific Data is the property of Springer Nature and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)

  3. Academic Journal

    Source: Machine Learning; Nov2023, Vol. 112 Issue 11, p4563-4596, 34p

    Subject Terms: DEEP learning, MACHINE learning, EVOLUTIONARY algorithms

    Abstract: We propose an approach to symbolic regression based on a novel variational autoencoder for generating hierarchical structures, HVAE. It combines simple atomic units with shared weights to recursively encode and decode the individual nodes in the hierarchy. Encoding is performed bottom-up and decoding top-down. We empirically show that HVAE can be trained efficiently with small corpora of mathematical expressions and can accurately encode expressions into a smooth low-dimensional latent space. The latter can be efficiently explored with various optimization methods to address the task of symbolic regression. Indeed, random search through the latent space of HVAE performs better than random search through expressions generated by manually crafted probabilistic grammars for mathematical expressions. Finally, the EDHiE system for symbolic regression, which applies an evolutionary algorithm to the latent space of HVAE, reconstructs equations from a standard symbolic regression benchmark better than a state-of-the-art system based on a similar combination of deep learning and evolutionary algorithms. [ABSTRACT FROM AUTHOR]

    Copyright of Machine Learning is the property of Springer Nature and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)

  4. Academic Journal

    Source: Machine Learning; Nov2023, Vol. 112 Issue 11, p4379-4408, 30p

    Abstract: The data used for analysis are becoming increasingly complex along several directions: high dimensionality, number of examples and availability of labels for the examples. This poses a variety of challenges for the existing machine learning methods, related to analyzing datasets with a large number of examples that are described in a high-dimensional space, where not all examples have labels provided. For example, when investigating the toxicity of chemical compounds, there are many compounds available that can be described with information-rich high-dimensional representations, but not all of the compounds have information on their toxicity. To address these challenges, we propose methods for semi-supervised learning (SSL) of feature rankings. The feature rankings are learned in the context of classification and regression, as well as in the context of structured output prediction (multi-label classification, MLC, hierarchical multi-label classification, HMLC and multi-target regression, MTR) tasks. This is the first work that treats the task of feature ranking uniformly across various tasks of semi-supervised structured output prediction. To the best of our knowledge, it is also the first work on SSL of feature rankings for the tasks of HMLC and MTR. More specifically, we propose two approaches—based on predictive clustering tree ensembles and the Relief family of algorithms—and evaluate their performance across 38 benchmark datasets. The extensive evaluation reveals that rankings based on Random Forest ensembles perform the best for classification tasks (incl. MLC and HMLC tasks) and are the fastest for all tasks, while ensembles based on extremely randomized trees work best for the regression tasks. Semi-supervised feature rankings outperform their supervised counterparts across the majority of datasets for all of the different tasks, showing the benefit of using unlabeled in addition to labeled data. [ABSTRACT FROM AUTHOR]

  5. Academic Journal

    Authors: Zhou, Naihui, Jiang, Yuxiang, Bergquist, Timothy R, Lee, Alexandra J, Kacsoh, Balint Z, Crocker, Alex W, Lewis, Kimberley A, Georghiou, George, Nguyen, Huy N, Hamid, Md Nafiz, Davis, Larry, Dogan, Tunca, Atalay, Volkan, Rifaioglu, Ahmet S, Dalkıran, Alperen, Cetin Atalay, Rengul, Zhang, Chengxin, Hurto, Rebecca L, Freddolino, Peter L, Zhang, Yang, Bhat, Prajwal, Supek, Fran, Fernández, José M, Gemovic, Branislava, Perovic, Vladimir R, Davidović, Radoslav S, Sumonja, Neven, Veljkovic, Nevena, Asgari, Ehsaneddin, Mofrad, Mohammad R K, Profiti, Giuseppe, Savojardo, Castrense, Martelli, Pier Luigi, Casadio, Rita, Boecker, Florian, Schoof, Heiko, Kahanda, Indika, Thurlby, Natalie, McHardy, Alice C, Renaux, Alexandre, Saidi, Rabie, Gough, Julian, Freitas, Alex A, Antczak, Magdalena, Fabris, Fabio, Wass, Mark N, Hou, Jie, Cheng, Jianlin, Wang, Zheng, Romero, Alfonso E, Paccanaro, Alberto, Yang, Haixuan, Goldberg, Tatyana, Zhao, Chenguang, Holm, Liisa, Törönen, Petri, Medlar, Alan J, Zosa, Elaine, Borukhov, Itamar, Novikov, Ilya, Wilkins, Angela, Lichtarge, Olivier, Chi, Po-Han, Tseng, Wei-Cheng, Linial, Michal, Rose, Peter W, Dessimoz, Christophe, Vidulin, Vedrana, Dzeroski, Saso, Sillitoe, Ian, Das, Sayoni, Lees, Jonathan Gill, Jones, David T, Wan, Cen, Cozzetto, Domenico, Fa, Rui, Torres, Mateo, Warwick Vesztrocy, Alex, Rodriguez, Jose Manuel, Tress, Michael L, Frasca, Marco, Notaro, Marco, Grossi, Giuliano, Petrini, Alessandro, Re, Matteo, Valentini, Giorgio, Mesiti, Marco, Roche, Daniel B, Reeb, Jonas, Ritchie, David W, Aridhi, Sabeur, Alborzi, Seyed Ziaeddin, Devignes, Marie-Dominique, Koo, Da Chen Emily, Bonneau, Richard, Gligorijević, Vladimir, Barot, Meet, Fang, Hai, Toppo, Stefano, Lavezzo, Enrico, Falda, Marco, Berselli, Michele, Tosatto, Silvio C E, Carraro, Marco, Piovesan, Damiano, Ur Rehman, Hafeez, Mao, Qizhong, Zhang, Shanshan, Vucetic, Slobodan, Black, Gage S, Jo, Dane, Suh, Erica, Dayton, Jonathan B, Larsen, Dallas J, Omdahl, Ashton R, 
McGuffin, Liam J, Brackenridge, Danielle A, Babbitt, Patricia C, Yunes, Jeffrey M, Fontana, Paolo, Zhang, Feng, Zhu, Shanfeng, You, Ronghui, Zhang, Zihan, Dai, Suyang, Yao, Shuwei, Tian, Weidong, Cao, Renzhi, Chandler, Caleb, Amezola, Miguel, Johnson, Devon, Chang, Jia-Ming, Liao, Wen-Hung, Liu, Yi-Wei, Pascarelli, Stefano, Frank, Yotam, Hoehndorf, Robert, Kulmanov, Maxat, Boudellioua, Imene, Politano, Gianfranco, Di Carlo, Stefano, Benso, Alfredo, Hakala, Kai, Ginter, Filip, Mehryary, Farrokh, Kaewphan, Suwisa, Björne, Jari, Moen, Hans, Tolvanen, Martti E E, Salakoski, Tapio, Kihara, Daisuke, Jain, Aashish, Šmuc, Tomislav, Altenhoff, Adrian, Ben-Hur, Asa, Rost, Burkhard, Brenner, Steven E, Orengo, Christine A, Jeffery, Constance J, Bosco, Giovanni, Hogan, Deborah A, Martin, Maria J, O'Donovan, Claire, Mooney, Sean D, Greene, Casey S, Radivojac, Predrag, Friedberg, Iddo

    Contributors: Bio-Ontology Research Group (BORG), Computational Bioscience Research Center (CBRC), Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA., Indiana University Bloomington, Bloomington, Indiana, USA., Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA., Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA., Geisel School of Medicine at Dartmouth, Hanover, NH, USA., Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA., European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom., Program in Bioinformatics and Computational Biology, Ames, IA, USA., Department of Computer Engineering, Hacettepe University, Ankara, Turkey., Department of Computer Engineering, Middle East Technical University (METU), Ankara, Turkey., CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey., Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA., Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA., Achira Labs, Bangalore, India., Institute for Research in Biomedicine (IRB Barcelona), Barcelona, Spain., INB Coordination Unit, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Catalonia, Spain., Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade, Serbia., Molecular Cell Biomechanics Laboratory, Departments of Bioengineering, University of California Berkeley, Berkeley, CA, USA., Departments of Bioengineering and Mechanical Engineering, Berkeley, CA, USA., Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy., University of Bonn: INRES Crop Bioinformatics, Bonn, North Rhine-Westphalia, Germany., INRES Crop Bioinformatics, University of Bonn, Bonn, Germany., Gianforte School of Computing, Montana State University, Bozeman, Montana, USA., University of Bristol, Computer Science, Bristol, Bristol, United Kingdom., Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Brunswick, Germany., Interuniversity Institute of Bioinformatics in Brussels, Université libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium., European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK., MRC Laboratory of Molecular Biology, Cambridge, United Kingdom., University of Kent, School of Computing, Canterbury, United Kingdom., School of Biosciences, University of Kent, Canterbury, Kent, United Kingdom., University of Missouri, Computer Science, Columbia, Missouri, USA., Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA., University of Miami, Coral Gables, Florida, USA., Centre for Systems and Synthetic Biology, Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom., School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway, Galway, Ireland., Department of Informatics, Bioinformatics & Computational Biology-i12, Technische Universitat Munchen, Munich, Germany., Faculty for Informatics, Garching, Germany., Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Finland, Helsinki, Finland., Institute of Biotechnology, University of Helsinki, Helsinki, Finland., Compugen Ltd., Holon, Israel., Baylor College of Medicine, Department of Biochemistry and Molecular Biology, Houston, TX, USA., Baylor College of Medicine, Department of Molecular and Human Genetics, Houston, TX, USA., National TsingHua University, 
Hsinchu, Taiwan., Department of Electrical Engineering in National Tsing Hua University, Hsinchu City, Taiwan., The Hebrew University of Jerusalem, Jerusalem, Israel., University of California San Diego, San Diego Supercomputer Center, La Jolla, California, USA., Department of Computational Biology and Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland., Department of Knowledge Technologies, Jozef Stefan Institute, Ljubljana, Slovenia., Jozef Stefan Institute, Ljubljana, Slovenia., Research Department of Structural and Molecular Biology, University College London, London, England., Research Department of Structural and Molecular Biology, University College London, London, United Kingdom., The Francis Crick Institute, Biomedical Data Science Laboratory, London, United Kingdom., Department of Computer Science, University College London, London, United Kingdom., Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, United Kingdom., Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain., Spanish National Cancer Research Centre (CNIO), Madrid, Spain., Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy., Department of Informatics, Bioinformatics and Computational Biology-i12, Technische Universitat Munchen, Munich, Germany., University of Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France., Department of Biology, New York University, New York, NY, USA., NYU Center for Data Science, New York, 10010, NY, USA., Center for Computational Biology (CCB), Flatiron Institute, Simons Foundation, New York, New York, USA., Center for Data Science, New York University, New York, 10011, NY, USA., Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK., Department of Molecular Medicine, University of Padova, Padova, Italy., Department of Biology, University of Padova, Padova, 
Italy., CNR Institute of Neuroscience, Padova, Italy., Department of Biomedical Sciences, University of Padua, Padova, Italy., Department of Computer Science, National University of Computer and Emerging Sciences, Peshawar, Khyber Pakhtoonkhwa, Pakistan., Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA., Department of Biology, Brigham Young University, Provo, UT, USA., School of Biological Sciences, University of Reading, Reading, England, United Kingdom., Department of Pharmaceutical Chemistry, San Francisco, CA, USA., UC Berkeley - UCSF Graduate Program in Bioengineering, University of California, San Francisco, 94158, CA, USA., Research and Innovation Center, Edmund Mach Foundation, San Michele all'Adige, Italy., State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, Shanghai, China., School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China., State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai, Shanghai, China., Department of Computer Science, Pacific Lutheran University, Tacoma, WA, USA., Department of Computer Science, National Chengchi University, Taipei, Taiwan., Okinawa Institute of Science and Technology, Tancha, Okinawa, Japan., Tel Aviv University, Tel Aviv, Israel., Control and Computer Engineering Department, Politecnico di Torino, Torino, TO, Italy., Department of Future Technologies, Turku NLP Group, University of Turku, Turku, Finland., Department of Future Technologies, Faculty of Science and Engineering, University of Turku, Turku, FI-20014, Finland., University of Turku, Turku, Finland., Department of Future Technologies, University of Turku, Turku, Finland., Department of Biological Sciences, 
Department of Computer Science, Purdue University, 47907, IN, USA., Department of Computer Science, Purdue University, West Lafayette, IN, USA., Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia., Department of Computer Science, ETH Zurich, Zurich, Switzerland., Department of Computer Science, Colorado State University, Fort Collins, CO, USA., University of California, Berkeley, CA, USA., Biological Sciences, University of Illinois at Chicago, Chicago, Illinois, USA., Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA., Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

    Description: BACKGROUND: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. RESULTS: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. CONCLUSION: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens. ; NZ and IF acknowledge the invaluable input from Michael C Gerten and Shatabdi Sen and all members of the Friedberg Lab for the ongoing support for stimulating discussions. 
; The work of IF was funded, in part, by the National Science Foundation award ...

    File Description: application/pdf

    Relation: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1835-8; https://deepblue.lib.umich.edu/bitstream/2027.42/152164/1/13059_2019_Article_1835.pdf; https://genomebiology.biomedcentral.com/track/pdf/10.1186/s13059-019-1835-8; Zhou, N., Jiang, Y., Bergquist, T. R., Lee, A. J., Kacsoh, B. Z., Crocker, A. W., … Hamid, M. N. (2019). The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens. Genome Biology, 20(1). doi:10.1186/s13059-019-1835-8; Genome biology; http://hdl.handle.net/10754/660434

  6. Academic Journal

    Source: Machine Learning; Apr2023, Vol. 112 Issue 4, p1337-1363, 27p

    Abstract: Inversion of radiative transfer models (RTMs) is key to interpreting satellite observations of air quality and greenhouse gases, but is computationally expensive. Surrogate models that emulate the full forward physical RTM can speed up the simulation, reducing computational and timing costs and allowing the use of more advanced physics for trace gas retrievals. In this study, we present the development of surrogate models for two RTMs: the RemoTeC algorithm using the LINTRAN RTM and the SCIATRAN RTM. We estimate the intrinsic dimensionality of the input and output spaces and embed them in lower dimensional subspaces to facilitate the learning task. Two methods are tested for dimensionality reduction, autoencoders and principal component analysis (PCA), with PCA consistently outperforming autoencoders. Different sampling methods are employed for generating the training datasets: sampling focused on expected atmospheric parameters and Latin hypercube sampling. The results show that models trained on the smaller (n = 1000) uniformly sampled dataset can perform as well as those trained on the larger (n = 50000), more focused dataset. Surrogate models for both datasets are able to accurately emulate Sentinel 5P spectra within a millisecond or less, as compared to the minutes or hours needed to simulate the full physical model. The SCIATRAN-trained forward surrogate models are able to generalize the emulation to a broader set of parameters and can be used for less constrained applications, while achieving a normalized RMSE of 7.3%. On the other hand, models trained on the LINTRAN dataset can completely replace the RTM simulation in more focused expected ranges of atmospheric parameters, as they achieve a normalized RMSE of 0.3%. [ABSTRACT FROM AUTHOR]

  7. Academic Journal

    Source: Journal of Cheminformatics; 9/15/2022, Vol. 14 Issue 1, p1-20, 20p

    Abstract: Motivation: Compound structure identification is using increasingly more sophisticated computational tools, among which machine learning tools are a recent addition that quickly gains in importance. These tools, of which the method titled Compound Structure Identification:Input Output Kernel Regression (CSI:IOKR) is an excellent example, have been used to elucidate compound structure from mass spectral (MS) data with significant accuracy, confidence and speed. They have, however, largely focused on data coming from liquid chromatography coupled to tandem mass spectrometry (LC–MS). Gas chromatography coupled to mass spectrometry (GC–MS) is an alternative which offers several advantages as compared to LC–MS, including higher data reproducibility. Of special importance is the substantial compound coverage offered by GC–MS, further expanded by derivatization procedures, such as silylation, which can improve the volatility, thermal stability and chromatographic peak shape of semi-volatile analytes. Despite these advantages and the increasing size of compound databases and MS libraries, GC–MS data have not yet been used by machine learning approaches to compound structure identification. Results: This study presents a successful application of the CSI:IOKR machine learning method for the identification of environmental contaminants from GC–MS spectra. We use CSI:IOKR as an alternative to exhaustive search of MS libraries, independent of instrumental platform and data processing software. We use a comprehensive dataset of GC–MS spectra of trimethylsilyl derivatives and their molecular structures, derived from a large commercially available MS library, to train a model that maps between spectra and molecular structures. We test the learned model on a different dataset of GC–MS spectra of trimethylsilyl derivatives of environmental contaminants, generated in-house and made publicly available. The results show that 37% (resp. 50%) of the tested compounds are correctly ranked among the top 10 (resp. 20) candidate compounds suggested by the model. Even though spectral comparisons with reference standards or de novo structural elucidations are necessary to validate the predictions, machine learning provides efficient candidate prioritization and reduction of the time spent for compound annotation. [ABSTRACT FROM AUTHOR]

    Copyright of Journal of Cheminformatics is the property of Springer Nature and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)

  8. Academic Journal

    Source: Scientific Data; 5/24/2022, Vol. 9 Issue 1, p1-8, 8p

    Company/Entity: EUROPEAN Space Agency

    Abstract: We present six datasets containing telemetry data of the Mars Express Spacecraft (MEX), a spacecraft orbiting Mars operated by the European Space Agency. The data, consisting of context data and thermal power consumption measurements, capture the status of the spacecraft over three Martian years, sampled at six different time resolutions that range from 1 min to 60 min. From a data analysis point of view, these data are challenging even for the more sophisticated state-of-the-art artificial intelligence methods. In particular, given the heterogeneity, complexity, and magnitude of the data, they can be employed in a variety of scenarios and analyzed through the prism of different machine learning tasks, such as multi-target regression, learning from data streams, anomaly detection, clustering, etc. Analyzing MEX's telemetry data is critical for aiding very important decisions regarding the spacecraft's status and operation, extracting novel knowledge, and monitoring the spacecraft's health, but the data can also be used to benchmark artificial intelligence methods designed for a variety of tasks. Measurement(s): electric current; Technology Type(s): current readings in spacecraft housekeeping telemetry; Sample Characteristic - Environment: outer space. [ABSTRACT FROM AUTHOR]

  9. Academic Journal

    Source: Scientific Reports; 5/4/2022, Vol. 12 Issue 1, p1-11, 11p

    Abstract: Multilabel classification (MLC) is a machine learning task where the goal is to learn to label an example with multiple labels simultaneously. It receives increasing interest from the machine learning community, as evidenced by the increasing number of papers and methods that appear in the literature. Hence, ensuring proper, correct, robust, and trustworthy benchmarking is of utmost importance for the further development of the field. We believe that this can be achieved by adhering to the recently emerged data management standards, such as the FAIR (Findable, Accessible, Interoperable, and Reusable) and TRUST (Transparency, Responsibility, User focus, Sustainability, and Technology) principles. We introduce an ontology-based online catalogue of MLC datasets originating from various application domains following these principles. The catalogue extensively describes many MLC datasets with comprehensible meta-features, MLC-specific semantic descriptions, and different data provenance information. The MLC data catalogue is available at: http://semantichub.ijs.si/MLCdatasets. [ABSTRACT FROM AUTHOR]

    Copyright of Scientific Reports is the property of Springer Nature and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)

  10. Academic Journal

    Source: Machine Learning; Jan2022, Vol. 111 Issue 1, p273-317, 45p

    Subject Terms: SPARSE matrices, MACHINE learning, ALGORITHMS

    Abstract: Feature ranking has been widely adopted in machine learning applications such as high-throughput biology and social sciences. The approaches of the popular Relief family of algorithms assign importances to features by iteratively accounting for nearest relevant and irrelevant instances. Despite their high utility, these algorithms can be computationally expensive and not well suited for high-dimensional sparse input spaces. In contrast, recent embedding-based methods learn compact, low-dimensional representations, potentially facilitating the downstream learning capabilities of conventional learners. This paper explores how the Relief branch of algorithms can be adapted to benefit from (Riemannian) manifold-based embeddings of instance and target spaces, where a given embedding's dimensionality is intrinsic to the dimensionality of the considered data set. The developed ReliefE algorithm is faster and can result in better feature rankings, as shown by our evaluation on 20 real-life data sets for multi-class and multi-label classification tasks. The utility of ReliefE for high-dimensional data sets is ensured by its implementation that utilizes sparse matrix algebraic operations. Finally, the relation of ReliefE to other ranking algorithms is studied via the Fuzzy Jaccard Index. [ABSTRACT FROM AUTHOR]
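Result 3 uses, as a baseline, random search over expressions generated by manually crafted probabilistic grammars for mathematical expressions. The Python sketch below illustrates only that baseline idea: it samples expressions from a small, hypothetical probabilistic grammar (not the grammar used in the paper), scores each on data by mean squared error and keeps the best. The grammar, its probabilities, the depth cap, the fixed constant 1 and the function names sample, mse and random_search are all assumptions made for illustration; it does not implement HVAE or EDHiE.

```python
import math
import random

# Hypothetical toy grammar (NOT the paper's). Each nonterminal maps to
# (probability, right-hand side) pairs; the LAST production of each
# nonterminal is the fallback used once the depth cap is reached.
GRAMMAR = {
    "E": [(0.35, ["E", "+", "F"]), (0.15, ["E", "-", "F"]), (0.5, ["F"])],
    "F": [(0.3, ["F", "*", "T"]), (0.7, ["T"])],
    "T": [(0.3, ["1"]), (0.15, ["sin", "(", "E", ")"]),
          (0.15, ["(", "E", ")"]), (0.4, ["x"])],
}


def sample(symbol="E", rng=random, depth=0, max_depth=6):
    """Expand `symbol` into a list of terminal tokens."""
    if symbol not in GRAMMAR:
        return [symbol]                          # terminal token
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        rhs = rules[-1][1]                       # force the terminating fallback
    else:
        weights = [p for p, _ in rules]
        rhs = rng.choices([r for _, r in rules], weights=weights)[0]
    tokens = []
    for s in rhs:
        tokens += sample(s, rng, depth + 1, max_depth)
    return tokens


def mse(expr, xs, ys):
    """Mean squared error of an infix expression string on (x, y) pairs."""
    env = {"__builtins__": {}, "sin": math.sin}  # only names the grammar can emit
    try:
        preds = [eval(expr, env, {"x": x}) for x in xs]
    except (OverflowError, ValueError, ZeroDivisionError):
        return math.inf
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def random_search(xs, ys, n_samples=200, seed=0):
    """Sample n_samples expressions and return the one with the lowest MSE."""
    rng = random.Random(seed)
    best, best_err = None, math.inf
    for _ in range(n_samples):
        expr = " ".join(sample(rng=rng))
        err = mse(expr, xs, ys)
        if err < best_err:
            best, best_err = expr, err
    return best, best_err
```

For example, random_search([0.0, 0.5, 1.0, 2.0], [0.0, 0.5, 1.0, 2.0]) should almost surely recover an expression equivalent to x, because the grammar produces the bare token x with probability 0.5 × 0.7 × 0.4 = 0.14 per sample. eval is used here only for brevity on grammar-generated strings.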
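Result 6 mentions Latin hypercube sampling as one way of generating training datasets for the surrogate models. As a minimal sketch of the general technique, assuming independent uniform input ranges given as (low, high) bounds, the Python function below splits each parameter range into n equal strata, draws one point per stratum, and shuffles the strata independently per dimension. The function name latin_hypercube and its arguments are hypothetical; the study's actual sampling code and parameter ranges are not described in the abstract.

```python
import numpy as np


def latin_hypercube(n, bounds, seed=0):
    """Draw n points in a box so that each dimension's range is split into
    n equal strata with exactly one point per stratum.

    bounds: sequence of (low, high) pairs, one per input parameter.
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    d = len(bounds)
    # Row k holds one uniform draw inside stratum [k/n, (k+1)/n) for every dimension.
    u = (np.arange(n)[:, None] + rng.random((n, d))) / n
    # Shuffle the strata independently per dimension to decorrelate the columns.
    for j in range(d):
        u[:, j] = u[rng.permutation(n), j]
    low, high = bounds[:, 0], bounds[:, 1]
    return low + u * (high - low)
```

For example, latin_hypercube(10, [(0.0, 1.0), (100.0, 200.0)]) returns a 10 × 2 array in which each tenth of both ranges contains exactly one sample. Recent SciPy versions also provide scipy.stats.qmc.LatinHypercube for the same purpose.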
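Result 7 reports that 37% (resp. 50%) of the tested compounds are correctly ranked among the top 10 (resp. 20) candidates. A top-k rate of this kind can be computed as sketched below in Python, assuming each query has a list of candidate scores (higher is better) and a known index of the correct candidate. The function name top_k_rate and the tie-breaking rule (ties counted in the correct candidate's favour) are assumptions; the paper's exact evaluation protocol may differ.

```python
from typing import Sequence


def top_k_rate(scores: Sequence[Sequence[float]], true_idx: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct candidate ranks within the top k.

    scores[q][c] is the model's score for candidate c of query q (higher = better);
    true_idx[q] is the index of the correct candidate for query q.
    Rank = 1 + number of candidates with a strictly higher score, so ties
    are resolved in favour of the correct candidate.
    """
    hits = 0
    for cand_scores, t in zip(scores, true_idx):
        rank = 1 + sum(s > cand_scores[t] for s in cand_scores)
        hits += rank <= k
    return hits / len(scores)
```

For example, with scores [[0.9, 0.1, 0.5], [0.2, 0.8, 0.7], [0.3, 0.4, 0.6]] and correct indices [0, 2, 0], the correct candidates rank 1, 2 and 3, giving top-1, top-2 and top-3 rates of 1/3, 2/3 and 1.0.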
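Result 10 summarises the Relief family as assigning importances to features by iteratively accounting for nearest relevant and irrelevant instances, and result 4 builds on the same family. The Python function below is a minimal sketch of the classic Relief update for a binary-labelled dataset, included only to make that idea concrete: for a randomly chosen instance it penalises features that differ from the nearest same-class neighbour (hit) and rewards features that differ from the nearest other-class neighbour (miss). It is not ReliefE (no manifold embeddings, no sparse-matrix operations), and the function name relief and its parameters are hypothetical.

```python
import numpy as np


def relief(X, y, n_iter=None, seed=0):
    """Classic Relief feature weights for a binary-labelled dataset.

    X: (n_samples, n_features) numeric array; y: (n_samples,) labels.
    Assumes each class has at least two instances. Returns one weight per
    feature; larger means more relevant.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    m = n if n_iter is None else n_iter
    # Normalise per-feature differences by the feature's range so they lie in [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    w = np.zeros(d)
    for _ in range(m):
        i = rng.integers(n)
        diffs = np.abs(X - X[i]) / span          # per-feature distances to every instance
        dist = diffs.sum(axis=1)                  # Manhattan distance over all features
        dist[i] = np.inf                          # never pick the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class neighbour
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class neighbour
        # Reward features that separate the classes, penalise ones that vary within a class.
        w += (diffs[miss] - diffs[hit]) / m
    return w
```

On a toy dataset where feature 0 equals the class label and feature 1 is noise, feature 0 receives weight 1.0 and feature 1 a smaller weight. The brute-force distance computation is O(n) per iteration, which is the kind of cost on large, sparse, high-dimensional data that the abstract says motivates ReliefE.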