{"entity": "journal", "iuid": "c6c199ec42364ba89753546daca03510", "timestamp": "2024-04-26T06:06:00.416Z", "links": {"self": {"href": "https://publications.scilifelab.se/journal/Gigascience.json"}, "display": {"href": "https://publications.scilifelab.se/journal/Gigascience"}}, "title": "Gigascience", "issn": "2047-217X", "issn-l": "2047-217X", "publications_count": 11, "publications": [{"entity": "publication", "iuid": "90f4fb69f7bd4582a7c2990e09b12c05", "links": {"self": {"href": "https://publications.scilifelab.se/publication/90f4fb69f7bd4582a7c2990e09b12c05.json"}, "display": {"href": "https://publications.scilifelab.se/publication/90f4fb69f7bd4582a7c2990e09b12c05"}}, "title": "Discovery of druggable cancer-specific pathways with application in acute myeloid leukemia.", "authors": [{"family": "Trac", "given": "Quang Thinh", "initials": "QT", "orcid": "0000-0003-2429-0287", "researcher": {"href": "https://publications.scilifelab.se/researcher/043294bad46e4ccaa3d0d7cd43ebdccd.json"}}, {"family": "Zhou", "given": "Tingyou", "initials": "T"}, {"family": "Pawitan", "given": "Yudi", "initials": "Y", "orcid": "0000-0003-0324-7052", "researcher": {"href": "https://publications.scilifelab.se/researcher/095052d8ea7c480b9c32b373795465b4.json"}}, {"family": "Vu", "given": "Trung Nghia", "initials": "TN", "orcid": "0000-0001-7945-5750", "researcher": {"href": "https://publications.scilifelab.se/researcher/d90993bc42694d969a24a50f21393b76.json"}}], "type": "journal article", "published": "2022-09-29", "journal": {"title": "Gigascience", "issn": "2047-217X", "volume": "11", "issn-l": "2047-217X"}, "abstract": "An individualized cancer therapy is ideally chosen to target the cancer's driving biological pathways, but identifying such pathways is challenging because of their underlying heterogeneity and there is no guarantee that they are druggable. 
We hypothesize that a cancer with an activated druggable cancer-specific pathway (DCSP) is more likely to respond to the relevant drug. Here we develop and validate a systematic method to search for such DCSPs, by (i) introducing a pathway activation score (PAS) that integrates cancer-specific driver mutations and gene expression profile and drug-specific gene targets, (ii) applying the method to identify DCSPs from pan-cancer datasets, and (iii) analyzing the correlation between PAS and the response to relevant drugs. In total, 4,794 DCSPs from 23 different cancers have been discovered in the Genomics of Drug Sensitivity in Cancer database and validated in The Cancer Genome Atlas database. Supporting the hypothesis, for the DCSPs in acute myeloid leukemia, cancers with higher PASs are shown to have stronger drug response, and this is validated in the BeatAML cohort. All DCSPs are publicly available at https://www.meb.ki.se/shiny/truvu/DCSP/.", "doi": "10.1093/gigascience/giac091", "pmid": "36173247", "labels": {"Bioinformatics Support for Computational Resources": "Service"}, "xrefs": [{"db": "pmc", "key": "PMC9520771"}, {"db": "pii", "key": "6730547"}], "notes": [], "created": "2023-11-27T21:53:02.313Z", "modified": "2024-01-16T13:48:34.955Z"}, {"entity": "publication", "iuid": "ac545b00aad54cdbb6eeaaed3124a362", "links": {"self": {"href": "https://publications.scilifelab.se/publication/ac545b00aad54cdbb6eeaaed3124a362.json"}, "display": {"href": "https://publications.scilifelab.se/publication/ac545b00aad54cdbb6eeaaed3124a362"}}, "title": "Benchmarking ultra-high molecular weight DNA preservation methods for long-read and long-range sequencing.", "authors": [{"family": "Dahn", "given": "Hollis A", "initials": "HA", "orcid": "0000-0001-9777-2303", "researcher": {"href": "https://publications.scilifelab.se/researcher/b82ad0bd1207419ca7c3d8792a7822df.json"}}, {"family": "Mountcastle", "given": "Jacquelyn", "initials": "J", "orcid": "0000-0003-1078-4905", 
"researcher": {"href": "https://publications.scilifelab.se/researcher/a35e17870a5c4572bcef476c32184824.json"}}, {"family": "Balacco", "given": "Jennifer", "initials": "J", "orcid": "0000-0001-7102-1632", "researcher": {"href": "https://publications.scilifelab.se/researcher/db20cd17c783475aa2e575af1984fb75.json"}}, {"family": "Winkler", "given": "Sylke", "initials": "S", "orcid": "0000-0002-0915-3316", "researcher": {"href": "https://publications.scilifelab.se/researcher/f292bbf542f244278bef19506b21b031.json"}}, {"family": "Bista", "given": "Iliana", "initials": "I", "orcid": "0000-0002-6155-3093", "researcher": {"href": "https://publications.scilifelab.se/researcher/b180ce2a74b94142b5eb05511c537b2f.json"}}, {"family": "Schmitt", "given": "Anthony D", "initials": "AD"}, {"family": "Pettersson", "given": "Olga Vinnere", "initials": "OV", "orcid": "0000-0002-5597-1870", "researcher": {"href": "https://publications.scilifelab.se/researcher/31689f508a984d0680d285c294669615.json"}}, {"family": "Formenti", "given": "Giulio", "initials": "G", "orcid": "0000-0002-7554-5991", "researcher": {"href": "https://publications.scilifelab.se/researcher/fdf78195993a481483d7cbbf1a6d64ed.json"}}, {"family": "Oliver", "given": "Karen", "initials": "K"}, {"family": "Smith", "given": "Michelle", "initials": "M", "orcid": "0000-0001-5288-0001", "researcher": {"href": "https://publications.scilifelab.se/researcher/a81888e8101a42e4a8aba00d6707b940.json"}}, {"family": "Tan", "given": "Wenhua", "initials": "W", "orcid": "0000-0002-5208-8126", "researcher": {"href": "https://publications.scilifelab.se/researcher/b57b8e811974434590b0e7159ff47a41.json"}}, {"family": "Kraus", "given": "Anne", "initials": "A"}, {"family": "Mac", "given": "Stephen", "initials": "S"}, {"family": "Komoroske", "given": "Lisa M", "initials": "LM", "orcid": "0000-0003-0676-7053", "researcher": {"href": "https://publications.scilifelab.se/researcher/cc12b38235894a74aab116b986d83f88.json"}}, {"family": "Lama", "given": 
"Tanya", "initials": "T", "orcid": "0000-0002-7372-8081", "researcher": {"href": "https://publications.scilifelab.se/researcher/8dfe6dd10e38419f8a2dc55c93780c34.json"}}, {"family": "Crawford", "given": "Andrew J", "initials": "AJ", "orcid": "0000-0003-3153-6898", "researcher": {"href": "https://publications.scilifelab.se/researcher/d3d8b6cfd6704ab8b4396ab35d68f49a.json"}}, {"family": "Murphy", "given": "Robert W", "initials": "RW", "orcid": "0000-0001-8555-2338", "researcher": {"href": "https://publications.scilifelab.se/researcher/e7cdddea1d2e4c9aa19bca8bb4e48f54.json"}}, {"family": "Brown", "given": "Samara", "initials": "S", "orcid": "0000-0003-0391-2016", "researcher": {"href": "https://publications.scilifelab.se/researcher/1dc7d389ee6245dc99480159b31ed88f.json"}}, {"family": "Scott", "given": "Alan F", "initials": "AF", "orcid": "0000-0002-9706-7839", "researcher": {"href": "https://publications.scilifelab.se/researcher/0eabc1a858ca4011b1ef57e6ea0b537b.json"}}, {"family": "Morin", "given": "Phillip A", "initials": "PA", "orcid": "0000-0002-3279-1519", "researcher": {"href": "https://publications.scilifelab.se/researcher/cde9e06251774274a55f089acd3be6ab.json"}}, {"family": "Jarvis", "given": "Erich D", "initials": "ED", "orcid": "0000-0001-8931-5049", "researcher": {"href": "https://publications.scilifelab.se/researcher/d565d5e1788e484d9d2da61af12f2120.json"}}, {"family": "Fedrigo", "given": "Olivier", "initials": "O", "orcid": "0000-0002-6450-7551", "researcher": {"href": "https://publications.scilifelab.se/researcher/fae69dfab7d841d4850da7349714cd9c.json"}}], "type": "journal article", "published": "2022-08-10", "journal": {"title": "Gigascience", "issn": "2047-217X", "issn-l": "2047-217X", "volume": "11", "issue": null, "pages": null}, "abstract": "Studies in vertebrate genomics require sampling from a broad range of tissue types, taxa, and localities. 
Recent advancements in long-read and long-range genome sequencing have made it possible to produce high-quality chromosome-level genome assemblies for almost any organism. However, adequate tissue preservation for the requisite ultra-high molecular weight DNA (uHMW DNA) remains a major challenge. Here we present a comparative study of preservation methods for field and laboratory tissue sampling, across vertebrate classes and different tissue types.\r\n\r\nWe find that storage temperature was the strongest predictor of uHMW fragment lengths. While immediate flash-freezing remains the sample preservation gold standard, samples preserved in 95% EtOH or 20-25% DMSO-EDTA showed little fragment length degradation when stored at 4\u00b0C for 6 hours. Samples in 95% EtOH or 20-25% DMSO-EDTA kept at 4\u00b0C for 1 week after dissection still yielded adequate amounts of uHMW DNA for most applications. Tissue type was a significant predictor of total DNA yield but not fragment length. Preservation solution had a smaller but significant influence on both fragment length and DNA yield.\r\n\r\nWe provide sample preservation guidelines that ensure sufficient DNA integrity and amount required for use with long-read and long-range sequencing technologies across vertebrates. 
Our best practices generated the uHMW DNA needed for the high-quality reference genomes for phase 1 of the Vertebrate Genomes Project, whose ultimate mission is to generate chromosome-level reference genome assemblies of all \u223c70,000 extant vertebrate species.", "doi": "10.1093/gigascience/giac068", "pmid": "35946988", "labels": {"NGI Uppsala (Uppsala Genome Center)": "Technology development", "NGI Long read": "Technology development", "National Genomics Infrastructure": "Technology development"}, "xrefs": [{"db": "pii", "key": "6659719"}, {"db": "pmc", "key": "PMC9364683"}], "notes": [], "created": "2022-11-21T10:23:04.540Z", "modified": "2022-11-21T10:23:30.973Z"}, {"entity": "publication", "iuid": "31af97f3e7794e698898124e67568507", "links": {"self": {"href": "https://publications.scilifelab.se/publication/31af97f3e7794e698898124e67568507.json"}, "display": {"href": "https://publications.scilifelab.se/publication/31af97f3e7794e698898124e67568507"}}, "title": "Arteria: An automation system for a sequencing core facility.", "authors": [{"family": "Dahlberg", "given": "Johan", "initials": "J"}, {"family": "Hermansson", "given": "Johan", "initials": "J"}, {"family": "Sturlaugsson", "given": "Steinar", "initials": "S"}, {"family": "Lysenkova", "given": "Mariya", "initials": "M"}, {"family": "Smeds", "given": "Patrik", "initials": "P"}, {"family": "Ladenvall", "given": "Claes", "initials": "C"}, {"family": "Guimera", "given": "Roman Valls", "initials": "RV"}, {"family": "Reisinger", "given": "Florian", "initials": "F"}, {"family": "Hofmann", "given": "Oliver", "initials": "O"}, {"family": "Larsson", "given": "Pontus", "initials": "P"}], "type": "journal article", "published": "2019-12-01", "journal": {"volume": "8", "issn": "2047-217X", "issue": "12", "pages": null, "title": "Gigascience", "issn-l": "2047-217X"}, "abstract": "In recent years, nucleotide sequencing has become increasingly instrumental in both research and clinical settings. 
This has led to an explosive growth in sequencing data produced worldwide. As the amount of data increases, so does the need for automated solutions for data processing and analysis. The concept of workflows has gained favour in the bioinformatics community, but there is little in the scientific literature describing end-to-end automation systems. Arteria is an automation system that aims at providing a solution to the data-related operational challenges that face sequencing core facilities.\n\nArteria is built on existing open source technologies, with a modular design allowing for a community-driven effort to create plug-and-play micro-services. In this article we describe the system, elaborate on the underlying conceptual framework, and present an example implementation. Arteria can be reduced to 3 conceptual levels: orchestration (using an event-based model of automation), process (the steps involved in processing sequencing data, modelled as workflows), and execution (using a series of RESTful micro-services). This creates a system that is both flexible and scalable. Arteria-based systems have been successfully deployed at 3 sequencing core facilities. 
The Arteria Project code, written largely in Python, is available as open source software, and more information can be found at https://arteria-project.github.io/ .\n\nWe describe the Arteria system and the underlying conceptual framework, demonstrating how this model can be used to automate data handling and analysis in the context of a sequencing core facility.", "doi": "10.1093/gigascience/giz135", "pmid": "31825479", "labels": {"National Genomics Infrastructure": "Technology development", "NGI Uppsala (SNP&SEQ Technology Platform)": "Technology development", "Clinical Genomics Uppsala": "Technology development"}, "xrefs": [{"db": "pii", "key": "5673459"}, {"db": "pmc", "key": "PMC6905352"}], "notes": [], "created": "2020-01-08T12:39:25.028Z", "modified": "2021-07-08T12:39:20.845Z"}, {"entity": "publication", "iuid": "147a11e9aff7495293bd505f345a38a1", "links": {"self": {"href": "https://publications.scilifelab.se/publication/147a11e9aff7495293bd505f345a38a1.json"}, "display": {"href": "https://publications.scilifelab.se/publication/147a11e9aff7495293bd505f345a38a1"}}, "title": "SciPipe: A workflow library for agile development of complex and dynamic bioinformatics pipelines.", "authors": [{"family": "Lampa", "given": "Samuel", "initials": "S", "orcid": "0000-0001-6740-9212", "researcher": {"href": "https://publications.scilifelab.se/researcher/d4be992e2eed49c8ba1aaa0a4319ec24.json"}}, {"family": "Dahl\u00f6", "given": "Martin", "initials": "M", "orcid": "0000-0001-5447-9465", "researcher": {"href": "https://publications.scilifelab.se/researcher/32395b4dcca540a7a997b88ed20c9252.json"}}, {"family": "Alvarsson", "given": "Jonathan", "initials": "J", "orcid": "0000-0002-8682-7206", "researcher": {"href": "https://publications.scilifelab.se/researcher/e8eb231a432647ee89b22e3e0dbbc651.json"}}, {"family": "Spjuth", "given": "Ola", "initials": "O", "orcid": "0000-0002-8083-2864", "researcher": {"href": 
"https://publications.scilifelab.se/researcher/605dbd52684d4e54ae4150a9933abe6e.json"}}], "type": "journal article", "published": "2019-05-01", "journal": {"volume": "8", "issn": "2047-217X", "issue": "5", "pages": null, "title": "Gigascience", "issn-l": "2047-217X"}, "abstract": "The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex workflows, such as with nested loops, dynamic scheduling, and parametrization, which is common in, e.g., machine learning.\n\nSciPipe is a workflow programming library implemented in the programming language Go, for managing complex and dynamic pipelines in bioinformatics, cheminformatics, and other fields. SciPipe helps in particular with workflow constructs common in machine learning, such as extensive branching, parameter sweeps, and dynamic scheduling and parametrization of downstream tasks. SciPipe builds on flow-based programming principles to support agile development of workflows based on a library of self-contained, reusable components. It supports running subsets of workflows for improved iterative development and provides a data-centric audit logging feature that saves a full audit trace for every output file of a workflow, which can be converted to other formats such as HTML, TeX, and PDF on demand. 
The utility of SciPipe is demonstrated with a machine learning pipeline, a genomics, and a transcriptomics pipeline.\n\nSciPipe provides a solution for agile development of complex and dynamic pipelines, especially in machine learning, through a flexible application programming interface suitable for scientists used to programming or scripting.", "doi": "10.1093/gigascience/giz044", "pmid": "31029061", "labels": {"Bioinformatics Support, Infrastructure and Training": "Technology development", "Bioinformatics Support and Infrastructure": "Technology development", "Bioinformatics Support for Computational Resources": "Service"}, "xrefs": [{"db": "pii", "key": "5480570"}, {"db": "pmc", "key": "PMC6486472"}, {"db": "figshare", "key": "10.6084/m9.figshare.3985674"}], "notes": [], "created": "2019-04-29T07:14:49.754Z", "modified": "2024-01-16T13:48:44.374Z"}, {"entity": "publication", "iuid": "b9b342441fc24f6b9cd0aa485a815d2c", "links": {"self": {"href": "https://publications.scilifelab.se/publication/b9b342441fc24f6b9cd0aa485a815d2c.json"}, "display": {"href": "https://publications.scilifelab.se/publication/b9b342441fc24f6b9cd0aa485a815d2c"}}, "title": "Comparative analyses identify genomic features potentially involved in the evolution of birds-of-paradise.", "authors": [{"family": "Prost", "given": "Stefan", "initials": "S"}, {"family": "Armstrong", "given": "Ellie E", "initials": "EE"}, {"family": "Nylander", "given": "Johan", "initials": "J"}, {"family": "Thomas", "given": "Gregg W C", "initials": "GWC"}, {"family": "Suh", "given": "Alexander", "initials": "A"}, {"family": "Petersen", "given": "Bent", "initials": "B"}, {"family": "Dalen", "given": "Love", "initials": "L", "orcid": "0000-0001-8270-7613", "researcher": {"href": "https://publications.scilifelab.se/researcher/48ecf726779249ac9d12f4f7a1cc62bf.json"}}, {"family": "Benz", "given": "Brett W", "initials": "BW"}, {"family": "Blom", "given": "Mozes P K", "initials": "MPK"}, {"family": "Palkopoulou", "given": 
"Eleftheria", "initials": "E"}, {"family": "Ericson", "given": "Per G P", "initials": "PGP"}, {"family": "Irestedt", "given": "Martin", "initials": "M"}], "type": "journal article", "published": "2019-05-01", "journal": {"volume": "8", "issn": "2047-217X", "issue": "5", "title": "Gigascience", "issn-l": "2047-217X"}, "abstract": "The diverse array of phenotypes and courtship displays exhibited by birds-of-paradise have long fascinated scientists and nonscientists alike. Remarkably, almost nothing is known about the genomics of this iconic radiation. There are 41 species in 16 genera currently recognized within the birds-of-paradise family (Paradisaeidae), most of which are endemic to the island of New Guinea. In this study, we sequenced genomes of representatives from all five major clades within this family to characterize genomic changes that may have played a role in the evolution of the group's extensive phenotypic diversity. We found genes important for coloration, morphology, and feather and eye development to be under positive selection. In birds-of-paradise with complex lekking systems and strong sexual dimorphism, the core birds-of-paradise, we found Gene Ontology categories for \"startle response\" and \"olfactory receptor activity\" to be enriched among the gene families expanding significantly faster compared to the other birds in our study. 
Furthermore, we found novel families of retrovirus-like retrotransposons active in all three de novo genomes since the early diversification of the birds-of-paradise group, which might have played a role in the evolution of this fascinating group of birds.", "doi": "10.1093/gigascience/giz003", "pmid": "30689847", "labels": {"National Genomics Infrastructure": "Service", "NGI Stockholm (Genomics Applications)": "Service", "NGI Stockholm (Genomics Production)": "Service", "Bioinformatics Support for Computational Resources": "Service"}, "xrefs": [{"db": "pii", "key": "5300102"}, {"db": "pmc", "key": "PMC6497032"}], "notes": [], "created": "2019-12-02T17:19:31.826Z", "modified": "2024-01-16T13:48:44.395Z"}, {"entity": "publication", "iuid": "ebb31f25c498403385dee32833ae650a", "links": {"self": {"href": "https://publications.scilifelab.se/publication/ebb31f25c498403385dee32833ae650a.json"}, "display": {"href": "https://publications.scilifelab.se/publication/ebb31f25c498403385dee32833ae650a"}}, "title": "Software engineering for scientific big data analysis.", "authors": [{"family": "Gr\u00fcning", "given": "Bj\u00f6rn A", "initials": "BA", "orcid": "0000-0002-3079-6586", "researcher": {"href": "https://publications.scilifelab.se/researcher/447295d5434d44f3998e33af808bdea3.json"}}, {"family": "Lampa", "given": "Samuel", "initials": "S", "orcid": "0000-0001-6740-9212", "researcher": {"href": "https://publications.scilifelab.se/researcher/d4be992e2eed49c8ba1aaa0a4319ec24.json"}}, {"family": "Vaudel", "given": "Marc", "initials": "M", "orcid": "0000-0003-1179-9578", "researcher": {"href": "https://publications.scilifelab.se/researcher/920056fb632147559a0d9e3f49c6bdbd.json"}}, {"family": "Blankenberg", "given": "Daniel", "initials": "D", "orcid": "0000-0002-6833-9049", "researcher": {"href": "https://publications.scilifelab.se/researcher/3ed1285d051048cf9a27a76e71bda3bc.json"}}], "type": "journal article", "published": "2019-05-01", "journal": {"volume": "8", "issn": 
"2047-217X", "issue": "5", "pages": null, "title": "Gigascience", "issn-l": "2047-217X"}, "abstract": "The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training, are finding themselves playing the impromptu role of software engineer. While several resources are available for introducing scientists to the basics of programming, researchers have been left with little guidance on approaches needed to advance to the next level for the development of robust, large-scale data analysis tools that are amenable to integration into workflow management systems, tools, and frameworks. The integration into such workflow systems necessitates additional requirements on computational tools, such as adherence to standard conventions for robustness, data input, output, logging, and flow control. Here we provide a set of 10 guidelines to steer the creation of command-line computational tools that are usable, reliable, extensible, and in line with standards of modern coding practices.", "doi": "10.1093/gigascience/giz054", "pmid": "31121028", "labels": {"Bioinformatics Support, Infrastructure and Training": "Collaborative", "Bioinformatics Support and Infrastructure": "Collaborative"}, "xrefs": [{"db": "pii", "key": "5497810"}, {"db": "pmc", "key": "PMC6532757"}], "notes": [], "created": "2020-01-07T14:38:59.181Z", "modified": "2021-06-21T11:56:23.801Z"}, {"entity": "publication", "iuid": "cec100d1ada4482681493c2ea3e13c74", "links": {"self": {"href": "https://publications.scilifelab.se/publication/cec100d1ada4482681493c2ea3e13c74.json"}, "display": {"href": "https://publications.scilifelab.se/publication/cec100d1ada4482681493c2ea3e13c74"}}, "title": "PhenoMeNal: Processing and analysis of Metabolomics data in the Cloud.", "authors": [{"family": "Peters", "given": "Kristian", "initials": "K"}, {"family": "Bradbury", "given": "James", "initials": "J"}, {"family": "Bergmann", "given": "Sven", "initials": "S"}, 
{"family": "Capuccini", "given": "Marco", "initials": "M"}, {"family": "Cascante", "given": "Marta", "initials": "M"}, {"family": "de Atauri", "given": "Pedro", "initials": "P"}, {"family": "Ebbels", "given": "Timothy M D", "initials": "TMD"}, {"family": "Foguet", "given": "Carles", "initials": "C"}, {"family": "Glen", "given": "Robert", "initials": "R"}, {"family": "Gonzalez-Beltran", "given": "Alejandra", "initials": "A"}, {"family": "G\u00fcnther", "given": "Ulrich L", "initials": "UL"}, {"family": "Handakas", "given": "Evangelos", "initials": "E"}, {"family": "Hankemeier", "given": "Thomas", "initials": "T"}, {"family": "Haug", "given": "Kenneth", "initials": "K"}, {"family": "Herman", "given": "Stephanie", "initials": "S"}, {"family": "Holub", "given": "Petr", "initials": "P"}, {"family": "Izzo", "given": "Massimiliano", "initials": "M"}, {"family": "Jacob", "given": "Daniel", "initials": "D"}, {"family": "Johnson", "given": "David", "initials": "D"}, {"family": "Jourdan", "given": "Fabien", "initials": "F"}, {"family": "Kale", "given": "Namrata", "initials": "N"}, {"family": "Karaman", "given": "Ibrahim", "initials": "I"}, {"family": "Khalili", "given": "Bita", "initials": "B"}, {"family": "Khonsari", "given": "Payam Emami", "initials": "PE"}, {"family": "Kultima", "given": "Kim", "initials": "K"}, {"family": "Lampa", "given": "Samuel", "initials": "S"}, {"family": "Larsson", "given": "Anders", "initials": "A"}, {"family": "Ludwig", "given": "Christian", "initials": "C"}, {"family": "Moreno", "given": "Pablo", "initials": "P"}, {"family": "Neumann", "given": "Steffen", "initials": "S"}, {"family": "Novella", "given": "Jon Ander", "initials": "JA"}, {"family": "O'Donovan", "given": "Claire", "initials": "C"}, {"family": "Pearce", "given": "Jake T M", "initials": "JTM"}, {"family": "Peluso", "given": "Alina", "initials": "A"}, {"family": "Piras", "given": "Marco Enrico", "initials": "ME"}, {"family": "Pireddu", "given": "Luca", "initials": "L"}, {"family": 
"Reed", "given": "Michelle A C", "initials": "MAC"}, {"family": "Rocca-Serra", "given": "Philippe", "initials": "P"}, {"family": "Roger", "given": "Pierrick", "initials": "P"}, {"family": "Rosato", "given": "Antonio", "initials": "A"}, {"family": "Rueedi", "given": "Rico", "initials": "R"}, {"family": "Ruttkies", "given": "Christoph", "initials": "C"}, {"family": "Sadawi", "given": "Noureddin", "initials": "N"}, {"family": "Salek", "given": "Reza M", "initials": "RM"}, {"family": "Sansone", "given": "Susanna-Assunta", "initials": "SA"}, {"family": "Selivanov", "given": "Vitaly", "initials": "V"}, {"family": "Spjuth", "given": "Ola", "initials": "O"}, {"family": "Schober", "given": "Daniel", "initials": "D"}, {"family": "Th\u00e9venot", "given": "Etienne A", "initials": "EA"}, {"family": "Tomasoni", "given": "Mattia", "initials": "M"}, {"family": "van Rijswijk", "given": "Merlijn", "initials": "M"}, {"family": "van Vliet", "given": "Michael", "initials": "M"}, {"family": "Viant", "given": "Mark R", "initials": "MR"}, {"family": "Weber", "given": "Ralf J M", "initials": "RJM"}, {"family": "Zanetti", "given": "Gianluigi", "initials": "G"}, {"family": "Steinbeck", "given": "Christoph", "initials": "C"}], "type": "journal article", "published": "2018-12-07", "journal": {"volume": null, "issn": "2047-217X", "issue": null, "title": "Gigascience", "issn-l": "2047-217X"}, "abstract": "Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism's metabolism. The research field is dynamic and expanding with applications across biomedical, biotechnological and many other applied biological domains. Its computationally-intensive nature has driven requirements for open data formats, data repositories and data analysis tools. 
However, the rapid progress has resulted in a mosaic of independent, and sometimes incompatible, analysis methods that are difficult to connect into a useful and complete data analysis solution.\n\nPhenoMeNal (Phenome and Metabolome aNalysis) is an advanced and complete solution to set up Infrastructure-as-a-Service (IaaS) that brings workflow-oriented, interoperable metabolomics data analysis platforms into the cloud. PhenoMeNal seamlessly integrates a wide array of existing open source tools which are tested and packaged as Docker containers through the project's continuous integration process and deployed based on a kubernetes orchestration framework. It also provides a number of standardized, automated and published analysis workflows in the user interfaces Galaxy, Jupyter, Luigi and Pachyderm.\n\nPhenoMeNal constitutes a keystone solution in cloud e-infrastructures available for metabolomics. PhenoMeNal is a unique and complete solution for setting up cloud e-infrastructures through easy-to-use web interfaces that can be scaled to any custom public and private cloud environment. By harmonizing and automating software installation and configuration and through ready-to-use scientific workflow user interfaces, PhenoMeNal has succeeded in providing scientists with workflow-driven, reproducible and shareable metabolomics data analysis platforms which are interfaced through standard data formats, representative datasets, versioned, and have been tested for reproducibility and interoperability. 
The elastic implementation of PhenoMeNal further allows easy adaptation of the infrastructure to other application areas and 'omics research domains.", "doi": "10.1093/gigascience/giy149", "pmid": "30535405", "labels": {"Bioinformatics Support, Infrastructure and Training": "Collaborative", "Bioinformatics Support and Infrastructure": "Collaborative"}, "xrefs": [{"db": "pii", "key": "5232984"}], "notes": [], "created": "2019-01-14T19:40:52.962Z", "modified": "2020-01-21T13:53:22.851Z"}, {"entity": "publication", "iuid": "ebbf7f6821024f629c61396f1163eea2", "links": {"self": {"href": "https://publications.scilifelab.se/publication/ebbf7f6821024f629c61396f1163eea2.json"}, "display": {"href": "https://publications.scilifelab.se/publication/ebbf7f6821024f629c61396f1163eea2"}}, "title": "Recommendations on e-infrastructures for next-generation sequencing.", "authors": [{"family": "Spjuth", "given": "Ola", "initials": "O"}, {"family": "Bongcam-Rudloff", "given": "Erik", "initials": "E"}, {"family": "Dahlberg", "given": "Johan", "initials": "J"}, {"family": "Dahl\u00f6", "given": "Martin", "initials": "M"}, {"family": "Kallio", "given": "Aleksi", "initials": "A"}, {"family": "Pireddu", "given": "Luca", "initials": "L"}, {"family": "Vezzi", "given": "Francesco", "initials": "F"}, {"family": "Korpelainen", "given": "Eija", "initials": "E"}], "type": "journal article", "published": "2016-06-07", "journal": {"volume": "5", "issn": "2047-217X", "issue": null, "pages": "26", "title": "Gigascience", "issn-l": "2047-217X"}, "abstract": "With ever-increasing amounts of data being produced by next-generation sequencing (NGS) experiments, the requirements placed on supporting e-infrastructures have grown. In this work, we provide recommendations based on the collective experiences from participants in the EU COST Action SeqAhead for the tasks of data preprocessing, upstream processing, data delivery, and downstream analysis, as well as long-term storage and archiving. 
We cover demands on computational and storage resources, networks, software stacks, automation of analysis, education, and also discuss emerging trends in the field. E-infrastructures for NGS require substantial effort to set up and maintain over time, and with sequencing technologies and best practices for data analysis evolving rapidly it is important to prioritize both processing capacity and e-infrastructure flexibility when making strategic decisions to support the data analysis demands of tomorrow. Due to increasingly demanding technical requirements we recommend that e-infrastructure development and maintenance be handled by a professional service unit, be it internal or external to the organization, and emphasis should be placed on collaboration between researchers and IT professionals.", "doi": "10.1186/s13742-016-0132-7", "pmid": "27267963", "labels": {"National Genomics Infrastructure": "Technology development", "NGI Uppsala (SNP&SEQ Technology Platform)": "Technology development", "Bioinformatics Support for Computational Resources": "Service"}, "xrefs": [{"db": "pii", "key": "10.1186/s13742-016-0132-7"}, {"db": "pmc", "key": "PMC4897895"}], "notes": [], "created": "2017-05-03T13:00:23.417Z", "modified": "2024-01-16T13:48:49.951Z"}, {"entity": "publication", "iuid": "42f4c093ad4944ef8a0a2edfeea4ff25", "links": {"self": {"href": "https://publications.scilifelab.se/publication/42f4c093ad4944ef8a0a2edfeea4ff25.json"}, "display": {"href": "https://publications.scilifelab.se/publication/42f4c093ad4944ef8a0a2edfeea4ff25"}}, "title": "De novo assembly of Dekkera bruxellensis: a multi technology approach using short and long-read sequencing and optical mapping.", "authors": [{"family": "Olsen", "given": "Remi-Andre", "initials": "RA"}, {"family": "Bunikis", "given": "Ignas", "initials": "I"}, {"family": "Tiukova", "given": "Ievgeniia", "initials": "I"}, {"family": "Holmberg", "given": "Kicki", "initials": "K"}, {"family": "L\u00f6tstedt", "given": "Britta", 
"initials": "B"}, {"family": "Pettersson", "given": "Olga Vinnere", "initials": "OV"}, {"family": "Passoth", "given": "Volkmar", "initials": "V"}, {"family": "K\u00e4ller", "given": "Max", "initials": "M", "orcid": "0000-0001-6813-3051", "researcher": {"href": "https://publications.scilifelab.se/researcher/536ad902a272482aba853c078557e240.json"}}, {"family": "Vezzi", "given": "Francesco", "initials": "F"}], "type": "journal article", "published": "2015-11-26", "journal": {"volume": "4", "issn": "2047-217X", "issue": null, "pages": "56", "title": "Gigascience", "issn-l": "2047-217X"}, "abstract": "It remains a challenge to perform de novo assembly using next-generation sequencing (NGS). Despite the availability of multiple sequencing technologies and tools (e.g., assemblers) it is still difficult to assemble new genomes at chromosome resolution (i.e., one sequence per chromosome). Obtaining high quality draft assemblies is extremely important in the case of yeast genomes to better characterise major events in their evolutionary history. The aim of this work is two-fold: on the one hand we want to show how combining different and somewhat complementary technologies is key to improving assembly quality and correctness, and on the other hand we present a de novo assembly pipeline we believe to be beneficial to core facility bioinformaticians. To demonstrate both the effectiveness of combining technologies and the simplicity of the pipeline, here we present the results obtained using the Dekkera bruxellensis genome.\n\nIn this work we used short-read Illumina data and long-read PacBio data combined with the extreme long-range information from OpGen optical maps in the task of de novo genome assembly and finishing. Moreover, we developed NouGAT, a semi-automated pipeline for read-preprocessing, de novo assembly and assembly evaluation, which was instrumental for this work.\n\nWe obtained a high quality draft assembly of a yeast genome, resolved on a chromosomal level. 
Furthermore, this assembly was corrected for mis-assembly errors as demonstrated by resolving a large collapsed repeat and by receiving higher scores by assembly evaluation tools. With the inclusion of PacBio data we were able to fill about 5 % of the optical mapped genome not covered by the Illumina data.", "doi": "10.1186/s13742-015-0094-1", "pmid": "26617983", "labels": {"National Genomics Infrastructure": null, "NGI Stockholm (Genomics Applications)": null, "NGI Stockholm (Genomics Production)": null, "NGI Uppsala (Uppsala Genome Center)": null}, "xrefs": [{"db": "pii", "key": "94"}, {"db": "pmc", "key": "PMC4661999"}], "notes": [], "created": "2017-05-02T12:58:21.226Z", "modified": "2021-07-07T15:22:42.719Z"}, {"entity": "publication", "iuid": "213a771d07904e0697738317d50322be", "links": {"self": {"href": "https://publications.scilifelab.se/publication/213a771d07904e0697738317d50322be.json"}, "display": {"href": "https://publications.scilifelab.se/publication/213a771d07904e0697738317d50322be"}}, "title": "A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data.", "authors": [{"family": "Siretskiy", "given": "Alexey", "initials": "A"}, {"family": "Sundqvist", "given": "Tore", "initials": "T"}, {"family": "Voznesenskiy", "given": "Mikhail", "initials": "M"}, {"family": "Spjuth", "given": "Ola", "initials": "O"}], "type": "journal article", "published": "2015-06-04", "journal": {"volume": "4", "issn": "2047-217X", "issue": null, "pages": "26", "title": "Gigascience", "issn-l": "2047-217X"}, "abstract": "New high-throughput technologies, such as massively parallel sequencing, have transformed the life sciences into a data-intensive field. The most common e-infrastructure for analyzing this data consists of batch systems that are based on high-performance computing resources; however, the bioinformatics software that is built on this platform does not scale well in the general case. 
Recently, the Hadoop platform has emerged as an interesting option to address the challenges of increasingly large datasets with distributed storage, distributed processing, built-in data locality, fault tolerance, and an appealing programming methodology.\n\nIn this work we introduce metrics and report on a quantitative comparison between Hadoop and a single node of conventional high-performance computing resources for the tasks of short read mapping and variant calling. We calculate efficiency as a function of data size and observe that the Hadoop platform is more efficient for biologically relevant data sizes in terms of computing hours for both split and un-split data files. We also quantify the advantages of the data locality provided by Hadoop for NGS problems, and show that a classical architecture with network-attached storage will not scale when computing resources increase in numbers. Measurements were performed using ten datasets of different sizes, up to 100 gigabases, using the pipeline implemented in Crossbow. To make a fair comparison, we implemented an improved preprocessor for Hadoop with better performance for splittable data files. For improved usability, we implemented a graphical user interface for Crossbow in a private cloud environment using the CloudGene platform. All of the code and data in this study are freely available as open source in public repositories.\n\nFrom our experiments we can conclude that the improved Hadoop pipeline scales better than the same pipeline on high-performance computing resources, we also conclude that Hadoop is an economically viable option for the common data sizes that are currently used in massively parallel sequencing. 
Given that datasets are expected to increase over time, Hadoop is a framework that we envision will have an increasingly important role in future biological data analysis.", "doi": "10.1186/s13742-015-0058-5", "pmid": "26045962", "labels": {"Bioinformatics Support, Infrastructure and Training": null, "Bioinformatics Support and Infrastructure": null}, "xrefs": [{"db": "pii", "key": "58"}, {"db": "pmc", "key": "PMC4455317"}], "notes": [], "created": "2017-05-02T12:58:34.234Z", "modified": "2020-01-21T13:53:20.379Z"}, {"entity": "publication", "iuid": "ef809a1996704838a59705082f7f5b37", "links": {"self": {"href": "https://publications.scilifelab.se/publication/ef809a1996704838a59705082f7f5b37.json"}, "display": {"href": "https://publications.scilifelab.se/publication/ef809a1996704838a59705082f7f5b37"}}, "title": "Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data.", "authors": [{"family": "Lampa", "given": "Samuel", "initials": "S"}, {"family": "Dahl\u00f6", "given": "Martin", "initials": "M"}, {"family": "Olason", "given": "Pall I", "initials": "PI"}, {"family": "Hagberg", "given": "Jonas", "initials": "J"}, {"family": "Spjuth", "given": "Ola", "initials": "O"}], "type": "journal article", "published": "2013-06-25", "journal": {"volume": "2", "issn": "2047-217X", "issue": "1", "pages": "9", "title": "Gigascience", "issn-l": "2047-217X"}, "abstract": "Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. 
Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts as well as a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole genome-, de novo- and exome sequencing, targeted resequencing, SNP discovery, RNASeq, and methylation analysis. There are over 300 projects that utilize UPPNEX and include large undertakings such as the sequencing of the flycatcher and Norwegian spruce. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, allocating resources, and illustrate major challenges such as managing data growth. We conclude with summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made.", "doi": "10.1186/2047-217X-2-9", "pmid": "23800020", "labels": {"Bioinformatics Support, Infrastructure and Training": null, "Bioinformatics Support and Infrastructure": null}, "xrefs": [{"db": "pii", "key": "2047-217X-2-9"}, {"db": "pmc", "key": "PMC3704847"}], "notes": [], "created": "2017-05-04T14:56:23.957Z", "modified": "2020-01-21T13:53:20.946Z"}], "created": "2017-05-09T09:12:18.445Z", "modified": "2020-11-27T13:14:08.840Z"}