Project Proposals for Google Summer of Code 2015
Students can apply to take part in projects proposed by mentoring organizations – please see our proposed projects below. To learn more about the program, use the links at the bottom of the page.
Proposals and ideas for potential INCF projects within Google Summer of Code:
1. Off-line mobile client for EEGBase
2. Scalable Brain Atlas: Responsive design, 3d printing, Image registration
2.1 Responsive design and client-side image analysis
2.2 3D printing of anatomical brain regions
2.3 Image registration
3. Open source, cross simulator, large scale cortical models
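4. A comprehensive Python-R online tools suite
5. Improving the Brian simulator's interoperability with simulator-independent model-description languages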
6. The Virtual Brain: An open-source simulator for whole brain network modeling
6.1 Integration of SCRIPTS in TVB
6.2 Numerical accuracy evaluation
6.3 Interactive Data Exploration
7. Python module for brain connectivity analysis based on multivariate autoregressive model
8. Neuroscience model exploration and development tools for PyDSTool
8.1 Improving the graphical model exploration and analysis environment for PyDSTool
8.2 Adding basic “literate modeling” capabilities for model development with PyDSTool
9. Improving the Q&A tool BioStars for NeuroStars.org
10. NIX
10.1 Language bindings for the NIX file format
10.2 Neurosuite NIX integration
11. Accessing a versioned RDF graph within a git environment
12.1 Multiscale Model Grid using MOOSE
12.2 GPU implementations for MOOSE
14. StackReg Plus: Improved multiple image alignment with ImageJ
15. Power/ARM support for the Cyme library
16. Toolboxing network complexity for connectomes: New code for big samples in C-PAC
18. A platform for neuroimaging workflows in distributed and heterogeneous environments
19. Next generation biological simulator user interface for neuroscientists
20. Modelling real dynamics of muscles and neurons
1. Off-line mobile client for EEGBase
We perform electroencephalography (EEG) and event-related potential (ERP) experiments. These experiments produce large collections of data and unstructured metadata, and managing and storing these data over the long term is crucial for future processing. To meet these needs we have implemented the EEG/ERP Portal, a web-based system that enables researchers to upload, download, or share experimental data and metadata using an ordinary computer connected to the Internet. Experiments are managed through a user interface connected to a database (a combination of an SQL store, Postgres, and a NoSQL store, ElasticSearch). The EEG/ERP Portal makes it possible to register users, manage user groups, create well-described experiments, group experiments into packages, etc. User accounts are secured by the users' credentials, and third-party tools communicate with the system through secured web services.
However, there are situations in which a computer connected to the Internet is not available. In such cases, handwritten forms are often used and later transcribed into electronic form. Another use case arises when experimental records and results are not at hand, for instance during discussions of scientific results at meetings.
To avoid this error-prone and redundant work, we have implemented a mobile version of the EEG/ERP Portal for devices such as mobile phones and tablets. This mobile client provides a user interface similar to that of the regular EEG/ERP Portal. Data entered in the mobile client are stored immediately in the EEG/ERP Portal.
Aims: In its current state, the mobile client communicates with the EEG/ERP Portal online only, which is restrictive in environments where no connection is available. The applicant's first task is to implement an embedded database in the client. Data will be stored in this database and synchronized with the EEG/ERP Portal when the client comes back online. In addition, a system of templates for different kinds of experiments is currently being implemented in the EEG/ERP Portal, whereas the client still uses predefined, fixed forms for storing metadata. The applicant's second task is to extend the client so that it can work with different layouts.
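The store-then-synchronize idea behind the first task can be sketched as follows. This is only an illustration under stated assumptions: the real client is an Android/Java application, whereas the sketch uses Python with SQLite, and the class `OfflineStore`, its table layout and the `upload` callable are hypothetical names that do not come from the EEG/ERP Portal code base.

```python
import json
import sqlite3


class OfflineStore:
    """Hypothetical embedded queue for records entered while offline."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "payload TEXT NOT NULL, "
            "synced INTEGER NOT NULL DEFAULT 0)"
        )

    def save(self, record):
        """Store a record locally; it stays pending until a sync succeeds."""
        cur = self.db.execute(
            "INSERT INTO records (payload) VALUES (?)", (json.dumps(record),)
        )
        self.db.commit()
        return cur.lastrowid

    def pending(self):
        """Return (id, record) pairs that have not been uploaded yet."""
        rows = self.db.execute(
            "SELECT id, payload FROM records WHERE synced = 0 ORDER BY id"
        ).fetchall()
        return [(row_id, json.loads(payload)) for row_id, payload in rows]

    def sync(self, upload):
        """Upload pending records in order; stop at the first failure.

        `upload` is an assumed callable that sends one record to the
        server and raises ConnectionError when the network is unavailable.
        """
        uploaded = 0
        for row_id, record in self.pending():
            try:
                upload(record)
            except ConnectionError:
                break  # still offline: keep the rest for the next attempt
            self.db.execute(
                "UPDATE records SET synced = 1 WHERE id = ?", (row_id,)
            )
            self.db.commit()
            uploaded += 1
        return uploaded
```

Marking each record as synced only after its upload succeeds means an interrupted synchronization can simply be retried later. Conflict handling, for example when the same experiment is edited on the server in the meantime, is deliberately left out.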
Skills: Java, Maven, Android, SQL (Postgres or similar), NoSQL databases (ElasticSearch or similar), XML, web services (SOAP and REST), GitHub
Mentors: Petr Ježek and Roman Mouček (University of West Bohemia, Czech Republic, Czech National INCF Node)
2. Scalable Brain Atlas: Responsive design, 3d printing, Image registration
The Scalable Brain Atlas (SBA) is an online web service for the interactive display of brain atlases. It had more than 13,000 unique visitors in 2014. You can help this number grow by choosing one of the following three project ideas.
2.1 Responsive design and client-side image analysis
A recent addition to the Scalable Brain Atlas is the "virtual microscope", with which brain images at microscopic resolution can be browsed and enhanced inside the client's browser window. You can see it in action at http://scalablebrainatlas.incf.org/ABA12?plugin=imaging by selecting the high-resolution "Nissl+" data. In this project you will enhance the plugin by
a) converting the Scalable Brain Atlas to a "responsive design" website that adapts itself to the user's screen resolution and in which page elements can be resized. This will make the imaging plugin much more attractive.
b) introducing multiple layers and creating tools to compare two different modalities, for example by making a subset of pixels transparent or using a checkerboard view.
c) improving the responsiveness of the plugin by eliminating unnecessary page refresh cycles that are currently imposed by the fabric.js framework (http://fabricjs.com).
d) enabling a new type of image alignment whereby the user draws corresponding regions of interest (such as "left side of brain") in two brain image sections and uses an online service to nonlinearly map the images to each other.
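The checkerboard view mentioned in item b) can be illustrated with a small sketch. It assumes two already aligned images of equal size, represented as nested lists of pixel values. The function name `checkerboard` and the `tile` parameter are hypothetical, and the plugin itself would implement this in JavaScript on an HTML5 canvas rather than in Python.

```python
def checkerboard(image_a, image_b, tile=2):
    """Interleave two equally sized images in square tiles of `tile` pixels."""
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have the same dimensions")
    result = []
    for y, (row_a, row_b) in enumerate(zip(image_a, image_b)):
        out_row = []
        for x, (pixel_a, pixel_b) in enumerate(zip(row_a, row_b)):
            # Tiles whose column and row indices sum to an even number
            # show image A; the remaining tiles show image B.
            use_a = ((x // tile) + (y // tile)) % 2 == 0
            out_row.append(pixel_a if use_a else pixel_b)
        result.append(out_row)
    return result
```

Discontinuities at the tile borders make misalignment between the two modalities easy to spot, which is why this view is commonly used to check a registration by eye.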
Skills: Javascript, HTML5 canvas
2.2 3D printing of anatomical brain regions
An SBA service to upload an MRI scan and register it to a reference atlas is under development. With that service in place, it would be very interesting to offer users of the site the possibility to 3d print the brains that they upload, or simply print the atlases that are already on the server. The software infrastructure required for this involves surface extraction and parcellation of brain volume data. The main 3d printing challenge is to do it such that different pieces of the brain can be put together like a puzzle. The output of the service should consist of files that are ready for submission to online 3d printing services. We will seek a (commercial) partner to provide test prints and give practical advice.
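As an illustration of the final export step, the sketch below writes a triangle mesh to ASCII STL, a format that many online 3D printing services accept. The choice of STL is an assumption and is not prescribed by the project. The functions `facet_normal` and `to_ascii_stl` are hypothetical names, and the surface extraction and parcellation that would produce the mesh are not shown.

```python
def facet_normal(a, b, c):
    """Unit normal of triangle (a, b, c), following the right-hand rule."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = (nx * nx + ny * ny + nz * nz) ** 0.5
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # degenerate triangle
    return (nx / length, ny / length, nz / length)


def to_ascii_stl(name, vertices, triangles):
    """Serialize a mesh given as vertex coordinates and index triples."""
    lines = ["solid " + name]
    for i, j, k in triangles:
        a, b, c = vertices[i], vertices[j], vertices[k]
        lines.append("  facet normal %e %e %e" % facet_normal(a, b, c))
        lines.append("    outer loop")
        for vertex in (a, b, c):
            lines.append("      vertex %e %e %e" % tuple(vertex))
        lines.append("    endloop")
        lines.append("  endfacet")
    lines.append("endsolid " + name)
    return "\n".join(lines) + "\n"
```

Exporting each parcellated region as its own solid would be one way to obtain the separate puzzle pieces. Whether the pieces fit together after printing depends on tolerances that this sketch does not address.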
Skills: Python and/or C++
2.3 Image registration
If you would rather work on the scientific side of brain atlasing, consider implementing image registration pipelines. We already have a prototype pipeline for brain-volume to brain-volume registration, but there is a lot of demand for brain-slice to brain-volume registration, which is harder because it requires manual interaction with the data: selecting the best matching brain slice, choosing the optimal slice orientation, carrying out 2D to 2D image warping, etc. For the warping part you have a choice of existing software packages, such as elastix.
Skills: Python, Javascript, PHP
Mentor: Rembrandt Bakker (Donders Institute, Radboud University, Netherlands, INCF Netherlands Node)
3. Open source, cross simulator, large scale cortical models
An increasing number of studies are using large scale network models incorporating realistic connectivity to understand information processing in cortical structures. High performance computational resources are becoming more widely available to computational neuroscientists for this type of modelling and general purpose, well tested simulation environments such as NEURON and NEST are widely used. In addition, hardware solutions (based on custom neuromorphic hardware or off the shelf FPGA/GPU hardware) are in development, promising ever larger, faster simulations. However, there is a lack of well tested, community maintained network model examples which can work across all of these simulation solutions, which both incorporate realistic cell and network properties and provide networks of sufficient complexity to test the performance and scalability of these platforms. This work will involve converting a number of published large scale network models into open, simulator independent formats such as PyNN and NeuroML, testing them across multiple simulator implementations and making them available to the community through the Open Source Brain repository.
Skills required: Python; XML; open source development; computational modelling experience.
Skills desired: Java experience; a background in theoretical neuroscience and/or large scale modelling.
Aims:
1) Select a number of large scale network models for the conversion & testing process. Examples could include Izhikevich and Edelman 2008, Hill and Tononi 2005, Potjans and Diesmann 2012.
2) Convert network structure and cell/synaptic properties to PyNN and NeuroML. Where appropriate use the simulator independent specification in LEMS to specify cell/synapse dynamics and to allow mapping to simulators.
3) Make models available on the Open Source Brain repository, along with documentation and references.
Mentors: Andrew Davison (French INCF Node), Padraig Gleeson (UK INCF Node)
4. A comprehensive Python-R online tools suite
Python and R are two of the languages most used by the scientific community for processing and displaying data. One of the main objectives of neuroinformatics is to provide free, online-accessible suites of tools and platforms for processing and displaying neurobiological data. Hence, the Python-R pair is well suited to the construction of such platforms. Even though neuroinformatic data sources are freely available, and the quantity of data is rapidly increasing, there is no system to date that is able to process this data for anonymous users and help scientists with data analysis and hypothesis generation. A freely accessible suite of statistical tools is therefore needed, and it is likely to be well received by the neuroinformatics community. The task is to create an autonomous Python-R suite of online statistical tools, which can be connected to classical database environments (MySQL, PostgreSQL), as well as to RDF and Excel (CSV) data sources. The suite will contain a comprehensive set of statistical tools, including clustering methods, principal and independent component analysis, network analysis, etc. The output will be in both tabular and graphical formats, the input will support heterogeneous data types, and both will be handled online. Users will be able to save their results in different formats (CSV, PDF), as well as images. The package will be designed so that it can be easily extended both horizontally and vertically.
Skills:
Mandatory: students should have a background in mathematics and statistics at least at the undergraduate level, as well as a working knowledge of Python (with Django as the Python-based framework), MySQL or PostgreSQL, RDF, and Javascript/HTML (on the client side). Ideally, experience with Bootstrap.
Useful: a working knowledge of R or a similar language. Students will have the opportunity to learn one of the most used freely available statistical languages.
Mentor: Mihail Bota (University of Southern California)
5. Improving the Brian simulator's interoperability with simulator-independent model-description languages
Brian is a widely used simulator for spiking neural networks, written in Python. The aim of this project is to automate the translation of models defined in Brian to and from simulator-independent formats. This will have two important applications:
(1) it will make it possible to validate simulations by comparing the behaviour of the same model across different simulators
(2) it will facilitate the construction of shared resources (e.g. neuron models on Open Source Brain), that can be used by the widest possible community of researchers.
This project has been made possible by a relatively recent convergence in the features offered by various simulators for flexible definitions of neural models. Brian can simulate arbitrary neural models that are described by the user with mathematical equations and statements. Brian 2 (the version of Brian currently in beta) has built-in facilities for code generation, and can therefore run simulations in various programming languages even though it is itself written in Python. Until recently, simulator-independent languages (such as PyNN [1] and NeuroML [2]) worked by selecting from a fixed set of standard models, allowing only the parameters and connectivity to be varied. However, in recent years, several initiatives have created simulator-independent model languages that also describe neural models based on mathematical equations, e.g. NineML [3] and LEMS [4] (used with NeuroML 2), and so it has now become possible to translate arbitrary models between these formats.
[1] Davison, A.P., Brüderle, D., Eppler, J., Kremkow, J., Muller, E., Pecevski, D., Perrinet, L., and Yger, P. (2009). PyNN: a common interface for neuronal network simulators. Front. Neuroinform. 2, 11.
[2] Gleeson, P., Crook, S., Cannon, R.C., Hines, M.L., Billings, G.O., Farinella, M., Morse, T.M., Davison, A.P., Ray, S., Bhalla, U.S., et al. (2010). NeuroML: A Language for Describing Data Driven Models of Neurons and Networks with a High Degree of Biological Detail. PLoS Comput Biol 6, e1000815.
[3] Raikov, I., Cannon, R., Clewley, R., Cornelis, H., Davison, A., Schutter, E.D., Djurfeldt, M., Gleeson, P., Gorchetchnikov, A., Plesser, H.E., et al. (2011). NineML: the network interchange for neuroscience modeling language. BMC Neuroscience 12, P330.
[4] Cannon, R.C., Gleeson, P., Crook, S., Ganapathy, G., Marin, B., Piasini, E., and Silver, R.A. (2014). LEMS: a language for expressing complex biological models in concise and hierarchical form and its use in underpinning NeuroML 2. Front Neuroinform 8.
Aims:
1. Write a Python script that imports either a NineML or a LEMS/NeuroML 2 description and generates a Brian script from it
2. Write an export module (based on Brian's code generation facilities) that exports a NineML or LEMS/NeuroML2 description from a Brian script
3. Validate the importer/exporter on published models across simulators
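The import direction (aim 1) can be pictured as template-based code generation. The sketch below is a deliberately simplified illustration: the `model` dictionary is a hypothetical stand-in for a parsed description and is neither NineML nor LEMS, and the template uses `string.Template` from the standard library rather than jinja2 to stay dependency-free. The generated text targets Brian 2's `NeuronGroup` and `run`; the integration method is fixed to `euler` purely for simplicity.

```python
from string import Template

# Skeleton of the generated Brian 2 script; $-placeholders are filled in below.
SCRIPT = Template(
    "from brian2 import *\n"
    "\n"
    "$parameters\n"
    "\n"
    "eqs = '''\n"
    "$equations\n"
    "'''\n"
    "\n"
    "group = NeuronGroup($size, eqs, threshold='$threshold', "
    "reset='$reset', method='euler')\n"
    "run($duration)\n"
)


def generate_brian_script(model):
    """Render a Brian 2 script from a simple (hypothetical) model dictionary."""
    parameters = "\n".join(
        "%s = %s" % (name, value)
        for name, value in sorted(model["parameters"].items())
    )
    return SCRIPT.substitute(
        parameters=parameters,
        equations="\n".join(model["equations"]),
        size=model["size"],
        threshold=model["threshold"],
        reset=model["reset"],
        duration=model["duration"],
    )


# Example description of a leaky integrate-and-fire population.
lif_model = {
    "size": 10,
    "parameters": {"tau": "10*ms", "v_rest": "-70*mV"},
    "equations": ["dv/dt = (v_rest - v) / tau : volt"],
    "threshold": "v > -50*mV",
    "reset": "v = v_rest",
    "duration": "100*ms",
}
```

Calling `generate_brian_script(lif_model)` returns the script as a string. A real importer would additionally have to map units, state variables and event conditions from the source format, which is where most of the project's work lies.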
Skills:
Required: Python programming
Desirable: XML, code generation techniques (in particular using the jinja2 templating system), experience/interest in computational neuroscience
Mentor: Marcel Stimberg (Institut de la Vision, Paris, France)
6. The Virtual Brain: An open-source simulator for whole brain network modeling
The Virtual Brain (TVB) is one of the few open source neuroinformatics platforms used to simulate whole brain dynamics. Models are not limited to the human brain but researchers can also work with the macaque's and/or the mouse's connectome. Models based on biologically realistic macroscopic connectivity will hopefully help us to understand the global dynamics observed in the healthy and diseased brain. Whether you are interested in beautiful visualizations or differential equations, you can join us and help us improve!
Several open issues addressed by the following proposals involve
- verifying numerical methods
- improving simulator performance
- enhancing data IO and visualization
6.1 Integration of SCRIPTS in TVB
Currently, TVB does not provide the tools to obtain a complete dataset from structural and diffusion MRI that is ready to use in simulations. Preparing such data is not trivial. A semi-automatic pipeline, SCRIPTS, addressing the specific needs of TVB is freely available. The aim of this project is to integrate the pipeline into TVB to simplify the task of data reconstruction. An API in Python using Nipype will be provided. In particular, a GUI will be built (PyQt or an equivalent in HTML5/WebGL) enabling a non-expert to perform the following steps:
- upload data
- visualize existing data
- launch pipeline
- view progress & visual verifiers
- visualize final results
Skills required: Shell scripting, Python, HTML5, JS, WebGL, D3.js. Some knowledge about MRI processing techniques is a plus.
Expected results: An adapter for the TVB web interface that allows users to run the pipeline and visualize the results.
Mentors: Timothée Proix (The Virtual Brain; @timpx), Paula Sanz-Leon (The Virtual Brain; @pausz), Marmaduke Woodman (The Virtual Brain; @maedoc), Lia Domide (The Virtual Brain; @liadomide)
6.2 Numerical accuracy evaluation
Assessing the numerical accuracy of a simulation is necessary in order to verify the results. This proposal involves applying TVB's numerical methods to systems with known solutions (such as a linear ordinary differential equation) to determine how accurate the methods are. These results may be tested against methods implemented by other software packages such as XPPAUT.
Skills required: Python; experience with differential equations and MATLAB would be helpful.
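A minimal sketch of such an accuracy test is given below. It integrates the linear equation dx/dt = -x, whose exact solution is x(t) = x(0) exp(-t), with textbook Euler and Heun schemes and estimates the observed order of convergence by halving the step size. The integrators are generic stand-ins written for this example, not TVB's own implementations, and all function names are hypothetical.

```python
import math


def euler_step(f, x, dt):
    """One explicit Euler step (first-order accurate)."""
    return x + dt * f(x)


def heun_step(f, x, dt):
    """One Heun (explicit trapezoidal) step (second-order accurate)."""
    predictor = x + dt * f(x)
    return x + 0.5 * dt * (f(x) + f(predictor))


def integrate(step, f, x0, t_end, n_steps):
    """Advance x0 from t = 0 to t_end using n_steps equal steps."""
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        x = step(f, x, dt)
    return x


def observed_order(step, f, x0, t_end, exact, n_steps=100):
    """Estimate the convergence order by halving the step size once."""
    coarse = abs(integrate(step, f, x0, t_end, n_steps) - exact)
    fine = abs(integrate(step, f, x0, t_end, 2 * n_steps) - exact)
    return math.log(coarse / fine, 2)


def decay(x):
    """Right-hand side of dx/dt = -x."""
    return -x


exact_value = math.exp(-1.0)  # x(1) for x(0) = 1
euler_order = observed_order(euler_step, decay, 1.0, 1.0, exact_value)
heun_order = observed_order(heun_step, decay, 1.0, 1.0, exact_value)
```

A test suite could assert that the observed order stays close to the theoretical one (1 for Euler, 2 for Heun); a method that falls short of its expected order usually points to an implementation error.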
Expected results: Test suite for assessing integration accuracy, and documentation of accuracy tests in user guide.
Mentors: Paula Sanz-Leon (The Virtual Brain; @pausz), Marmaduke Woodman (The Virtual Brain; @maedoc), Mihai Andrei (The Virtual Brain; @mihandrei)
6.3 Interactive Data Exploration
Data visualization plays a crucial role in TVB's neuroinformatics platform; effective interactive visualization can improve users' experience by helping them to quickly explore large datasets. Doing it properly in TVB is challenging because the web browser is still a developing platform with respect to graphics. Several tasks related to this project are available. Interested students are urged to select one or more from the list based on time, interest & experience:
a) Rewrite visualizers that are currently implemented using MatplotLib and MPLH5 with visualization libraries oriented toward web browsers, such as D3.js or Bokeh.
b) Refactor one of the time-series visualizers currently available.
c) Improve interactive editing of large matrices (O(1000^2)). Rendering performance as well as per-element interaction is important.
Skills required: HTML/JS/CSS & Python; Experience in web development, JQuery, SVG, WebGL, as well as server side frameworks such as CherryPy, is helpful.
Mentors: Lia Domide (The Virtual Brain; @liadomide), Mihai Andrei (The Virtual Brain; @mihandrei), Paula Sanz-Leon (The Virtual Brain; @pausz)
7. Python module for brain connectivity analysis based on multivariate autoregressive model
Functional brain connectivity has attracted growing interest in recent years. Estimating brain interactions from noisy EEG/MEG signals is a difficult task. Because classical bivariate estimators may supply misleading information in systems with more than two channels [1], [2], truly multivariate methods based on Granger causality are becoming more popular [3], [4]. Matlab-based toolboxes for connectivity analysis are available, e.g. SIFT (for EEGLAB) and eConnectome, but given the growing popularity of Python in neuroscience applications there is a great need for a similar module in this language. Connectivity estimation packages have recently started to appear. However, there is still a need for a simple open module that contains basic functionality and is easy to change and manipulate for students and advanced software developers alike, without heavy all-purpose libraries and a multitude of functions to comprehend before one can start using it. Moreover, we plan to implement various AR model fitting algorithms and short-time (dynamic) versions of the estimators, which do not seem to have been developed yet.
The aim of this project is to design and implement a Python module for the analysis and visualization of information flow between signals, e.g. from electrophysiological recordings. The analysis will be based on a multivariate AR model, which makes it possible to define estimators of causal relations such as DTF or PDC [3], [4]. Classical bivariate (such as coherency [5]) and multivariate (DTF, PDC, GC) estimators will be available in the toolbox. The module will contain multivariate AR model fitting routines, currently not available in any public Python numerical library. Special care will be given to the performance of the calculations and to the visualization of results. The module will be easy to use and open to many data formats.
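For orientation, the quantities named above can be written down in one commonly used notation. The symbols are assumptions made for this summary rather than notation fixed by the project: k channels, model order p, sampling interval \Delta t, coefficient matrices \mathbf{A}(r) and noise vector \mathbf{E}(t). Sign and normalization conventions differ between publications.

```latex
\begin{align}
\mathbf{X}(t) &= \sum_{r=1}^{p} \mathbf{A}(r)\,\mathbf{X}(t-r) + \mathbf{E}(t)
  && \text{(multivariate AR model)} \\
\bar{\mathbf{A}}(f) &= \mathbf{I} - \sum_{r=1}^{p} \mathbf{A}(r)\,
  e^{-2\pi i f r \Delta t},
  \qquad \mathbf{H}(f) = \bar{\mathbf{A}}(f)^{-1}
  && \text{(transfer matrix)} \\
\gamma_{ij}^{2}(f) &= \frac{|H_{ij}(f)|^{2}}{\sum_{m=1}^{k} |H_{im}(f)|^{2}}
  && \text{(normalized DTF, } j \to i\text{)} \\
\pi_{ij}(f) &= \frac{|\bar{A}_{ij}(f)|}{\sqrt{\sum_{m=1}^{k} |\bar{A}_{mj}(f)|^{2}}}
  && \text{(PDC, } j \to i\text{)}
\end{align}
```

Both estimators are computed from the same fitted coefficients: DTF normalizes the transfer matrix over the inflows to channel i, whereas PDC normalizes the frequency-domain coefficient matrix over the outflows from channel j.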
Skills needed:
- knowledge of mathematical analysis, algebra and statistics at least at the undergraduate level
- a working knowledge of Python including scipy, numpy and matplotlib libraries
- basics of connectivity analysis
Mentor: Maciej Kaminski (University of Warsaw)
8. Neuroscience model exploration and development tools for PyDSTool
PyDSTool (http://pydstool.sf.net) is a python-based modeling and simulation environment for dynamical systems, including biophysical models of neural membranes and small networks. It is used in multiple scientific fields where models involve differential equations, discrete-time mappings, and hybrid models, but many of its toolbox components focus on applications in neuroscience. PyDSTool can play two roles in providing next generation modeling capabilities for neuroscience applications. First, it can be used as a high quality reference simulator and a bifurcation analyzer for small-to-moderate sized models imported from other modeling platforms. Second, PyDSTool can help explore the geometric properties of a model’s parameter space, assess qualitative properties of model trajectories, derive reduced model approximations automatically using multi-scale analysis, and develop or diagnose features and metrics for use in parameter optimization. It is this second, more forward-looking, application area that is the focus of the two proposed GSoC projects.
8.1 Improving the graphical model exploration and analysis environment for PyDSTool
PyDSTool’s present capabilities for exploratory modeling are prototypical and require significant technical burden to set up and manage through scripting. The existing UI extensions are based on a user-configurable set of classes over core components imported from PyDSTool or Matplotlib. The extensions are “plug-and-play” objects inherited from a base class, which create “layers” in sub-plots that are associated with specific calculations. Potentially spanning multiple sub-plots, these layers can be dynamically updated as parameters change or time advances during analysis, and they can be grouped or hierarchically constructed. Examples are:
- Feature representations in state space, e.g. line/surface crossings, regions of high curvature, domains of an approximation’s error bound satisfaction
- Clustering of state space sub-domains within which user-defined functions are zero or below some threshold
- Vectors showing velocity or acceleration at a specified position or time point
- Other special geometric objects of interest, e.g. derived from approximations
- Raw or derived properties of experimental data in the same coordinate space as model trajectories, which can be used to visually guide parameter fitting of the model
For simplicity, this project will focus on 2D projections of high dimensional state spaces. Extensions to 3D will follow later, once the core program structure is established for 2D.
Aims: Restructure, rewrite, and potentially redesign the prototype UI components and their interfaces to PyDSTool, and document them thoroughly (including creating a tutorial for the website). There are opportunities to create innovative new forms of plug-and-play UI components that will assist in visualizing complex information about models. Test according to predefined examples that will include manipulations of the Hodgkin-Huxley model and other simple dynamical systems.
Required skills: Python, Matplotlib, GUI design, documentation tools, working knowledge of calculus and geometry
Desirable skills: general experience with simulator and/or math modeling with differential equations, technical writing
8.2 Adding basic “literate modeling” capabilities for model development with PyDSTool
A related prototypical extension is a part-graphical, part-command-line tool to explore and test model behavior and parameter variations against quantitative objective data or qualitative features while working inside PyDSTool. These tools are aimed at promoting the concept of “literate modeling,” a natural extension to “literate programming” and reproducible research concepts in which the goal is to create a rich audit trail of the model development process itself (e.g. in addition to recording the provenance of code versions and parameter choices for simulation runs once a model is developed). Literate modeling aims to add structured, interpretive metadata to model versions through “active documents”. Examples of such metadata include recording of model metadata, validation regression tests, exploratory hypotheses (sets of expected features and constraints), data-driven contexts for parameter search. Currently, only bare-bones workflows and code snippets have been explored.
Aims: Refine, improve, or redesign existing prototype classes, DB structure, and user workflow for basic functions of literate modeling. Add other core functionality to support simple workflows. Option to design a browser-based interface to this system using Django or similar technology. Document the tools, including creating a tutorial for the website. Test according to predefined examples that will include manipulations of the Hodgkin Huxley model and other simple dynamical systems.
Required skills: Python, git or similar VCS, documentation tools, automatic code generation principles, working knowledge of calculus and geometry
Desirable skills: IPython notebooks, Pweave/Ptangle or similar, Django or similar, SQL, XML, general experience with simulator and/or math modeling with differential equations, technical writing
Mentor: Robert Clewley (Georgia State University, Atlanta, GA, USA)
9. Improving the Q&A tool BioStars for NeuroStars.org
BioStars is a popular Q&A software package aimed at scientific communities. In this edition of GSoC we would like to further improve the NeuroStars.org website, which is based on BioStars, in order to benefit scientific communities in general and the neuroinformatics community in particular.
Our goals for the project are the following:
- Ease deployment. We believe that the mainstream success of this software depends on more streamlined deployment processes, which would allow users to set up a BioStar instance in the least time possible. For instance, http://wordpress.com allows a beginner to set up a blog in no time. We want to be the wordpress or tumblr for Q&A communities, allowing for easier deployment, operation, skinning, usability and customization. As a result of the easier deployment, people can use such a site locally at their own institutions or even for specific events (e.g., hackathons).
- Federated content searches. The second goal is to federate knowledge by connecting the search engines and background processes of different instances. Searching for a question will allow users to discover content from other sites and explore new horizons instead of being limited to the knowledge available in a single instance.
- Email list integration. As of today, many communities use mailing lists to communicate. We want not only to import those (which is possible today), but also to bridge the conversations happening on BioStar sites with mailing lists in both directions and, coupled with the deployment and federation goals above, to allow users running these mailing lists to deploy their own site.
Last but not least, if the student has time and energy left after implementing the above features, we have extra goodies, including but not limited to:
- We want to experiment with synchronous interaction with Q&A websites so that visiting users can see in real time how the site changes, which should further engage them in becoming part of it.
- We want to explore data sharing via BitTorrent and/or WebTorrent protocols. Removing silos for data sharing in the scientific world could be revolutionary.
Are you up for the challenge?
Required skills: Python, Django, Travis, Heroku, Ansible, WebRTC, BitTorrent
Mentors: Roman Valls (Stockholm, Sweden; NeuroStars.org) https://github.com/brainstorm and Albert Istvan (Penn State University, US; BioStars) https://github.com/ialbert
More info on student applications - post your questions and suggestions here: https://github.com/ialbert/biostar-central/issues/328
10. NIX
10.1 Language bindings for the NIX file format
NIX [1] is a versatile data model and format for comprehensive storage of scientific data and metadata [2]. In particular, the NIX initiative is focused on establishing a common data model and corresponding file format to store raw experimental data in the domains of electrophysiology, functional imaging, microscopy and several other fields of neuroscience. The goal is to bring experimental data and metadata together and provide a single consistent API to manage experimental entities and recorded datasets, enabling easy data access and data exchange among scientists.
The main NIX project is a C++ library [3] that uses the Hierarchical Data Format (HDF5) [4] as a backend. To facilitate data analysis, scientists need to be able to access these files natively from their computational environment. For this purpose, we seek to provide bindings for the most common languages. Currently, Python bindings are fully implemented [5]. This project aims at developing interfaces in other languages (Java, Julia, R or .NET). We expect you to pick one of these languages according to your interest and area of competence.
First, you would have to investigate how to create bindings to the C++ library with reasonable effort and which technology can be used for this purpose. Second, the library or the bindings have to be implemented. Finally, some use cases should be written to show the practical use of the library. You will work in close collaboration with the original NIX developers and use real neuroscience datasets to prepare examples. Writing a library will require a lot of programming and will give you an opportunity to learn about the chosen language (one of Java, Julia, R or .NET) and related technologies.
Aims:
Project #1: Java library (JNI, JNA, Swig or other)
Project #2: Julia library
Project #3: R library
Project #4: .NET library
Skills: C++, HDF5, good knowledge of the selected technology for bindings
[1] https://github.com/G-Node/nix/wiki
[2] Stoewer A, Kellner CJ, Benda J, Wachtler T and Grewe J (2014). File format and library for neuroscience data and metadata. Front. Neuroinform. Conference Abstract: Neuroinformatics 2014. doi: 10.3389/conf.fninf.2014.18.00027
[3] https://github.com/G-Node/nix
[4] http://www.hdfgroup.org/HDF5/
[5] https://github.com/G-Node/nixpy
Mentors: Christian Kellner, Andrey Sobolev, Adrian Stoewer (German INCF Node)
10.2 Neurosuite NIX integration
Neurosuite [1], consisting of NeuroScope, Klusters, and NDManager, is a set of powerful and free data visualization and processing tools for neurophysiologists that is being used by dozens of teams around the world. Currently the Neurosuite programs use various different file formats for storing raw and derived data.
NIX is a versatile data model and format for efficient and comprehensive storage of scientific data and metadata [2, 3]. The generic approach of NIX enables it to store a great variety of scientific data and metadata and their relations in a structured, well-organized and unified way. It comes with a well-tested and full-featured C++ API that uses the HDF5 file format for storage.
The project idea is to leverage the advantages of the NIX data model by integrating it into the Neurosuite analysis software tools so that data can be read from and written to HDF5 files via the NIX API.
Skills: Mainly C++ (C++11); knowledge of Qt (version 4) would be desirable. Familiarity with neurophysiology would be advantageous.
[1] http://neurosuite.sourceforge.net
[2] https://github.com/G-Node/nix/wiki
[3] Stoewer et al. (2014) doi: 10.3389/conf.fninf.2014.18.00027
Mentor: Christian Kellner (German INCF Node)
11. Accessing a versioned RDF graph within a git environment
G-Node aims at the development and provision of data formats handling data from neuroscientific experiments. We identified RDF as an ideal tool to store experimental data and associate them with the equally important metadata specifying the experimental environment. RDF is our data storage of choice because it can be used to define a strict core data structure used by all participants, but can be easily expanded to fit data from a wide range of different experiments. This in turn requires the management of an RDF file over multiple versions.
The project aims to develop a feasible approach to conveniently store, access and modify different versions of an RDF graph in a git environment. The graph has to reside within a git repository and different versions of this graph have to be easily accessible. Whether this is accomplished solely with the git command line environment or with git combined with additional file versioning technologies is a subject of investigation for this project. One such approach has been previously described [1].
To complete this project, the first part is to evaluate the approach suggested by Vander Sande et al. [1], research additional approaches and compare them to identify the most feasible approach. The second part of the project is the implementation of a minimal proof of concept application of the identified best solution for reading and writing from and to different versions of an RDF file in a programming language of choice.
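As a rough illustration of what a minimal proof of concept could look like, the sketch below reads two revisions of an RDF file from a git repository and reports which triples were added or removed. It assumes the graph is serialized as N-Triples (one triple per line, no blank nodes), so that a line-based set comparison is meaningful. The function names and the file name `graph.nt` are hypothetical, and this is not the approach described in [1].

```python
import subprocess


def read_version(repo_dir, revision, path):
    """Return the text of `path` as stored in `revision` (via `git show`)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "show", f"{revision}:{path}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_ntriples(text):
    """Collect non-empty, non-comment lines as a set of triple statements."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            triples.add(line)
    return triples


def diff_versions(old_text, new_text):
    """Return (added, removed) triples between two N-Triples serializations."""
    old, new = parse_ntriples(old_text), parse_ntriples(new_text)
    return new - old, old - new
```

For example, `diff_versions(read_version(".", "HEAD~1", "graph.nt"), read_version(".", "HEAD", "graph.nt"))` would list the triples changed by the latest commit, assuming such a file exists in the repository.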
Required skills: Git, RDF, programming language of choice
Participating in this project exposes the participant to the concepts of the Semantic Web in general and the handling of Linked Data in particular as well as pushing the knowledge of git, the RDF standard and SPARQL to its limits.
[1] http://ruben.verborgh.org/publications/vandersande_ldow_2013/
Mentors: Michael Sonntag (German INCF Node), Satrajit Ghosh (MIT)
12.1 Multiscale Model Grid using MOOSE
Neurons compute using a combination of electrical and chemical signals. This is particularly important in neuronal plasticity and memory, where there is a close interplay between different kinds of signaling.
This project is to build a grid of multiscale neuronal models that incorporate both electrical and chemical signaling events in plasticity. There are already numerous electrical-only cell models, and a similar variety of biochemical-only models of chemical computations during learning. It is very interesting to see how the diverse kinds of models behave when put together. This exercise will also explore a wide range of use cases for further development of the GUI and core capabilities of MOOSE, The Multiscale Object-Oriented Simulation Environment.
Skills: Python a must. XML based standards useful. NeuroML or SBML experience a plus.
More details: Detailed technical specification (PDF)
Mentors: Upi Bhalla, Dilawar Singh (Indian INCF Node)
12.2 GPU implementations for MOOSE
Neuronal and multiscale simulations are computationally demanding, yet they require large numbers of very similar calculations. This problem domain is well suited for GPU computation. In this project students will take further the work that has already been done on preparing a GPU-enabled numerical solver for MOOSE. This numerical engine will be used to speed up calculations of neuronal dynamics for multiscale models.
Skills: The project requires good C++ knowledge; experience with MPI and/or one of the GPU environments such as OpenCL or CUDA is recommended.
More details: Detailed technical specification (PDF)
Mentors: Upi Bhalla, Aviral Goel (Indian INCF Node)
13. iRODS client for ImageJ
ImageJ (http://rsbweb.nih.gov) is a public domain Java image processing program extensively used in life sciences. The program was designed with an open architecture that provides extensibility via Java plugins. User-written plugins make it possible to solve almost any image processing or analysis problem or integrate the program with 3rd party software. The project aims at developing a data upload and download client for the cloud storage service Integrated Rule-Oriented Data System (iRODS). The plugin will be a very useful application for scientists working in different labs who want to share data and collaborate.
iRODS (http://irods.org/about/overview/) is open source data management software used by research organizations and government agencies worldwide. It virtualizes data storage resources, so users can take control of their data, regardless of where and on what device the data is stored.
Aims: The project will develop an ImageJ plugin which will provide users with a Graphical User Interface (GUI) to upload and download datasets:
- binary files: images
- text files
- folder trees
using a suitable Java API for iRODS, for example https://github.com/DICE-UNC/jargon .
A similar plugin was developed during GSoC 2014 for Dropbox (http://atin007.github.io/dbclient/) and can be used as a starting point. Eventually the student may decide to implement a plugin supporting both protocols and accounts. Mentors will help set up the infrastructure for development and provide guidance and documents to the candidate for developing the plugin.
Required skills: Experience with Java
Desired skills: experience with JSON, RESTful web services, ImageJ
Mentors: Dimiter Prodanov, INCF Belgian Node, Visakh Muraleedharan, INCF
14. StackReg Plus: Improved multiple image alignment with ImageJ
ImageJ is an open source Java-based image processing program extensively used in life sciences. The main functionality of the TurboReg plugin (http://bigwww.epfl.ch/thevenaz/turboreg/) consists of aligning or matching two images, one of them called the source image and the other the target image. The plugin is widely used in neuroscience for pre-processing of both static and time-lapse imaging data.
After completion of the registration process, the plugin uses the final geometrical transformation of the source and target landmarks to create a warped image that has the size of the target and that contains a distorted version of the source. The rigid transformation is such that the source is well mapped to the target. TurboReg is well known to be fast and robust to noisy data, such as calcium images, thanks to its multi-resolution approach and smart optimization scheme. Nevertheless, neuroscientists face the following limitations:
- The transformation cannot be exported, which restricts processing to single-channel images
- Large datasets are difficult to process because the data must fit in RAM
- There is no parallelization, which limits performance
- There is limited possibility to tune the reference image
- There is no control over the coherence of the transformation over time, and no dynamic model of the transformation
- There are no constraints on the transformation. Typically, experimenters want to limit the transformation to a small drift.
Aims: The project will implement add-on functionality which will allow automation of pre-processing tasks and improved reproducibility of the results. The minimal goals of the project will be to
- import and export registration transform settings
- add preview functionality
- add functionality to define the reference image based on some easy user-defined rules, e.g. the mean of the first 100 images
- iterate over the images, one image per file
- propagate transforms across different images/fluorescent channels
- parallelize the registration algorithm
- implement an automatic testing mode for goodness of fit of the registration
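To make the reference-image rule mentioned above concrete ("the mean of the first 100 images"), here is a minimal sketch of how such a rule could be computed. It is written in plain Python on nested lists for illustration only; the function name `mean_reference` is hypothetical, and an actual plugin would be written in Java against the ImageJ API.

```python
def mean_reference(frames, count=100):
    """Average the first `count` frames pixel by pixel.

    `frames` is a sequence of equally sized 2D images (lists of rows).
    If fewer than `count` frames are available, all of them are used.
    """
    selected = frames[:count]
    if not selected:
        raise ValueError("at least one frame is required")
    height, width = len(selected[0]), len(selected[0][0])
    reference = [[0.0] * width for _ in range(height)]
    # Accumulate pixel sums over the selected frames.
    for frame in selected:
        for y in range(height):
            for x in range(width):
                reference[y][x] += frame[y][x]
    n = len(selected)
    return [[value / n for value in row] for row in reference]
```

Every frame of the stack would then be registered against this single averaged image rather than against an arbitrary individual frame.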
Required skills: Experience with Java
Desired skills: experience in ImageJ, knowledge of Linear Algebra/Algebraic Geometry
For more details, see a longer description here (link to Google Doc)
Mentors: Daniel Sage, EPFL (daniel.sage@epfl.ch); Dimiter Prodanov, INCF Belgian Node (dimiterpp@gmail.com); Tomasz Konopczyński, INCF Belgian Node (konopczynski.tomasz@gmail.com)
15. Power/ARM support for the Cyme library
Vectorization is a requirement in High Performance Computing to maximize hardware utilization. Compiler hints provide the most commonly used vectorization method. However, such a method tends to be suboptimal for implementations that include complex code with many indirections. To alleviate this issue, the Blue Brain Project has developed an alternative approach and software called Cyme. It hides vectorization complexity from the application developer through the use of high-level data containers and automated low-level SIMD code generation tailored to the targeted hardware. As a first proof of concept, we implemented a reduced set of computational neuroscience kernels which maximize utilization of the IBM Blue Gene/Q, Intel x86 and MIC hardware. Building on these results, the integration of Cyme into NEURON, an open source software package for modeling the electrical activity of neuronal networks, is being developed. It relies on reimplementing NEURON's high-level DSL conversion step using the Cyme library instead of translating the NEURON DSL into C code which the compiler later tries to vectorize. As the HPC team of the Blue Brain Project aims at supporting and optimizing NEURON and its reduced version, referred to as coreNeuron, on all available HPC platforms, we would like to extend Cyme support to two other backends: Power and ARM processors.
Cyme is a C++ library for vectorisation exploiting a user-friendly container approach. Based on template meta-programming and template expression trees, Cyme generates optimized assembly SIMD code for the specific targeted hardware. It has been demonstrated that the generated code can achieve the maximum performance of the processor. Cyme provides containers following different memory layouts (AoS-serial or AoSoA-SIMD computation) constructed over a `composite' SIMD vector class. Computational kernels based on Cyme containers have the same syntax regardless of the memory layout, but the generated assembly will be serial or SIMD depending on the type of the declared container.
Every expression is transformed at compile time into a directed acyclic graph. Each node of this graph is a mathematical operation that is associated with a final back-end for a given architecture.
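The idea of an expression graph whose nodes are dispatched to an architecture-specific back-end can be sketched as follows. This is only a runtime analogue in Python: Cyme itself builds the graph at compile time with C++ templates, and the names `Node`, `leaf`, `evaluate` and `SCALAR_BACKEND` are hypothetical and not part of the Cyme API.

```python
class Node:
    """A node of the expression graph: an operation and its operands."""

    def __init__(self, op, operands=(), value=None):
        self.op, self.operands, self.value = op, operands, value

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def leaf(value):
    """Wrap a plain value as a terminal node of the graph."""
    return Node("leaf", value=value)


# Hypothetical back-end: maps each operation name to an implementation.
# A different architecture would supply a different table.
SCALAR_BACKEND = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def evaluate(node, backend, cache=None):
    """Evaluate the graph, computing shared sub-expressions only once."""
    cache = {} if cache is None else cache
    if id(node) in cache:
        return cache[id(node)]
    if node.op == "leaf":
        result = node.value
    else:
        args = [evaluate(child, backend, cache) for child in node.operands]
        result = backend[node.op](*args)
    cache[id(node)] = result
    return result
```

Writing `s = leaf(2.0) + leaf(3.0)` and then `evaluate(s * s, SCALAR_BACKEND)` builds a graph in which the sum appears as a shared node and is evaluated once; swapping the back-end table changes how each operation is carried out without touching the expression.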
Skills needed: The student should know C++, SIMD intrinsics programming and the basics of template meta-programming. A minimum background in mathematics is needed to understand the mathematical library code (Newton-Raphson, Horner's method). Knowledge in assembly language and floating point representation would be a real plus. Familiarity with the Unix shell and command line tools will be necessary as the student will be working remotely on the BBP machines. As the regression tests for Cyme are based on Boost-test, familiarity with unit testing, test driven development and Boost test will be considered as an asset.
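For orientation, the two numerical methods named above can be written in a few lines. These are textbook forms given in Python for readability, not code taken from Cyme; the function names and the choice of the reciprocal as the Newton-Raphson target are assumptions made for the example.

```python
def horner(coefficients, x):
    """Evaluate a polynomial with Horner's method.

    Coefficients are ordered from the highest degree down,
    e.g. [1, 2, 3] stands for x**2 + 2*x + 3.
    """
    result = 0.0
    for c in coefficients:
        result = result * x + c
    return result


def newton_reciprocal(a, guess, iterations=8):
    """Approximate 1/a without division using x <- x * (2 - a * x).

    This is Newton-Raphson applied to f(x) = 1/x - a. For a > 0 it
    converges when the initial guess lies in the interval (0, 2/a).
    """
    x = guess
    for _ in range(iterations):
        x = x * (2.0 - a * x)
    return x
```

Both routines use only multiplications and additions, which is what makes them attractive building blocks for SIMD implementations of elementary functions.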
Optional - potential project branches:
This project would involve adding functionality to Cyme in one or more of the following areas:
• adding Power7/8 backend, based on the existing Power A2 backend (main objective)
• adding ARM backend (single precision only)
• Optimisation of the DAG expression representations (technically fun)
More info: view the Blue Brain/cyme framework on github
Mentors: Timothée Ewart - BBP <timothee.ewart@epfl.ch>, Sam Yates - BBP <sam.yates@epfl.ch>, Cremonesi Francesco - BBP <francesco.cremonesi@epfl.ch>, Fabien Delalondre <fabien.delalondre@epfl.ch>
16. Toolboxing network complexity for connectomes: New code for big samples in C-PAC
C-PAC (Configurable Pipeline for the Analysis of Connectomes, http://fcp-indi.github.io/) is a new, developing tool designed to study connectivity in brain images, transformed into connectomes. It allows us to work with big datasets using computer clusters. Integrating some of the most successful imaging environments to date (FSL, AFNI, ANTS), and advancing several innovative methods in preprocessing (automatic quality control, motion correction, regional homogeneity), it gives researchers the chance to test new graph and other complexity measures. Complexity measures (graph and non-linear time series) are currently the best tools to study functional network properties and the design of the brain's topological properties.
Aims: In contact with mentors, C-PAC developers and its forums, the successful student will develop very valuable and innovative tools that will help researchers to perform new analysis in connectomics:
- To add/expand a set of Python scripts (within the scope of the developers' documentation; http://fcp-indi.github.io/docs/developer/developer/index.html), oriented to the analysis of non-linear series within the time series workflow, thus giving the researcher a comprehensive set of analysis tools in this area. Some of the proposed measures, such as sample and transfer entropies, multifractal analysis and surrogate testing, would help to test dynamical properties not available in current packages.
- From a different perspective, this project also aims to produce and integrate code for new network connectivity measures (within centrality and complexity structure), complementing the existing ones, in the frame of C-PAC's analysis/networks workflow and providing a more comprehensive set of tools to describe its design traits and vulnerabilities.
- To put these tools to the test and document their performance with the study of available databases (i.e. 1000 connectomes datasets) and the properties they pinpoint in brain function. The student will also be expected to help in the integration of workflows and components, possibly interacting with the C-PAC community.
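As an indication of the kind of script meant by the first aim, the following is a minimal sketch of sample entropy for a single time series. It follows the common definition (the negative logarithm of the ratio of template matches of length m+1 to matches of length m, with Chebyshev distance and self-matches excluded). The absolute tolerance `r` and the function name are assumptions for the example; in practice `r` is often set relative to the standard deviation of the series, and the code is not part of C-PAC.

```python
import math


def sample_entropy(series, m=2, r=0.2):
    """Sample entropy of a 1D sequence with embedding m and tolerance r."""
    n = len(series)

    def count_matches(length):
        # Use the same number of templates (n - m) for both lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                distance = max(
                    abs(a - b) for a, b in zip(templates[i], templates[j])
                )
                if distance <= r:
                    count += 1
        return count

    b = count_matches(m)       # matches of length m
    a = count_matches(m + 1)   # matches of length m + 1
    if a == 0 or b == 0:
        return math.inf        # undefined: no matches found
    return -math.log(a / b)
```

A perfectly regular signal, such as a constant or strictly alternating series, gives a value of zero, while less predictable signals give larger values. The quadratic loop is only suitable for short series; a version for large samples would need a vectorized or tree-based search.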
Skills:
- Essential: Hands-on experience in Python, possibly in Nipype, and in software development. Familiarity with cluster computing (SGE) and software carpentry / collaborative software (GitHub). Basic knowledge of scientific implementation. Basic knowledge of functional imaging processing.
- Desirable: C++ and/or Fortran or Perl. Working knowledge of imaging preprocessing and frequency analysis. Knowledge of imaging packages (FSL, AFNI). Knowledge of graph methods, time-series complexity analysis and information theory.
Mentors: Ivan J Roijals (KI, ij.roijals@gmail.com), Cameron Craddock (cameron.craddock@childmind.org), Victor M Eguiluz (IFISC-UIB, victor@ifisc.uib-csic.es)
17. Finite element model of traumatic brain injury
The study of brain injury is an important area in neurological injury, recovery and preventive medicine. Brain injury parameters can be expressed in terms of pressure, shearing stresses or invariants of the strain tensor. However, a careful model of injury types and verification using imaging data still needs to be undertaken.
Aims: The goal of this project is to establish a traumatic brain injury (TBI) model using finite element modelling. The following steps will be taken: 1) modelling of the healthy brain using a mechanical model such as the Strasbourg University Finite Element Head Model; 2) introduction of brain injury types (deformation, shear) to establish a computational model of TBI; 3) verification of the computational model of TBI using advanced brain imaging data.
Skills:
- experience with at least one other widely-used scientific programming language (e.g. C++, Matlab, R)
- advanced mathematics.
18. A platform for neuroimaging workflows in distributed and heterogeneous environments
The need for workflows in processing and analysing neuroimaging data has become imperative due to large data sets and the need for replicable and reproducible science. There have been several workflow engines proposed with nipype being the current community lead, best practice solution for neuroimaging workflows. However nipype workflows are tied to their data sources and execution environments in their source code. By decoupling and templating the data and execution environments from the workflow description, it is possible to make workflows more generic and easy to use.
Aims: The goal of this project is to develop a platform and GUI application in python for configuring and remotely launching nipype workflows in a distributed and heterogeneous environment where the data and compute resources may be on different systems to each other and the end user. An early prototype for the system exists which will need to be redesigned and implemented so that it is useful to the broader neuroimaging community.
Skills:
1) Extensive experience with python.
2) Software design and development experience.
3) Knowledge of nipype, neuroinformatics systems such as xnat and job scheduling and management systems is a plus.
Mentors: Parnesh Raniga PhD, Monash University (parnesh.raniga@monash.edu), Toan Nguyen PhD, Monash University (toan.nguyen@monash.edu)
19. Next generation biological simulator user interface for neuroscientists
OpenWorm has been developing Geppetto, an open-source web-based multi-algorithm, multi-scale simulation platform engineered to support the simulation of complex biological systems and their surrounding environment. The platform aims at integrating multiscale models whose simulation depends on a variety of algorithms which need to interoperate. Great focus has been put on the engineering of the platform, aimed at having a robust and generic solution that can be applied to different classes of models. OpenWorm aims to make advancements in the OpenWorm simulation available to scientists and general public through Geppetto.
Geppetto supports NeuroML (a standard model specification for computational neuroscience models of cells and networks) and a way to simulate neuroscience models through jLEMS. Support for running NEURON simulations from the web is currently being added and will be implemented by the end of June 2015.
Geppetto functionality has been built with a strong focus on its API, both server side and client side with Javascript to ensure reproducibility and scripting capabilities. The console based interactions are ideal for developers and testers in order for scientists to easily access all the existing functionality.
A live demo of Geppetto can be found at https://live.geppetto.org and the documentation can be found at http://docs.geppetto.org. The public development kanban board can be found at https://waffle.io/openworm/org.geppetto. Additional samples are available (Hodgkin-Huxley Cell, Auditory cortex Network, and Fluid dynamics simulation).
Aims:
Geppetto’s ultimate goal is to offer an intuitive user interface for less technical neuroscientists. This project will add significant user interface and user experience improvements to the platform to accomplish this. The following features represent some of the specific improvements of this aim:
- Ability from GUI to view available widgets and easily add them (e.g. plot simulation variables, network connectivity widget, etc.)
- Ability for widgets to exchange information via intuitive GUI actions, e.g. drag & drop a variable from the model view into a plotting widget results in plotting the variable
- Easy display and access of the computational model for the selected entities from GUI
- Allow the user to reduce widgets to icons, anchor widgets at chosen locations, persist views state, all from GUI
- Addition of a “voltage/current clamp” widget enabling reading variables and setting parameters in a given compartment or in the whole cell by drag & drop of a virtual pipette to the 3D model
Skills:
Essential: Javascript, HTML5, CSS, Open source development
Desired: Backbone, WebGL, Java, UI/UX, Computational neuroscience training
Mentor: Matteo Cantarelli matteo@openworm.org
20. Modelling real dynamics of muscles and neurons
The OpenWorm project is building a simulation of C. elegans in an open science fashion. Currently we are working on a 3D neuromechanical model of C. elegans. The ultimate aim is to incorporate muscle and neuronal models with detailed morphologies and the known complement of ion channels which give rise to the cells’ electrical behaviour. Synaptic connectivity based on the identified chemical & electrical synapses in the C. elegans connectome will be incorporated. However, there is a lack of biophysically detailed models of C. elegans cells and synapses.
Aims: The successful applicant for this project will contribute to the effort to build biophysically detailed models of C. elegans cells and synapses being assembled in the ChannelWorm repository on GitHub. They will integrate and structure data related to ion channels in C. elegans, from genotype to phenotype. They will model ion channels based on available patch clamp studies, estimate kinetics and build models for ion channels with no patch clamp data available. They will further simulate and run the computational phase of a patch clamp experiment as well as customized versions of ion channel(s) in cell(s) and check if the simulation fits the biological boundaries. Further, they will search the literature for computational models incorporating biophysically detailed cells and networks from C. elegans and other small invertebrates (e.g. drosophila, other nematodes, leech, tritonia, cliona), and create an online resource summarising these models, e.g. on Open Source Brain. All the models are generated in NeuroML2 format and all the simulations in LEMS, and the verified models can be run in the Geppetto simulation platform.
Skills: Essential: experience with biophysically detailed neuronal modelling OR first hand experience with invertebrate physiology (ideally in an experimental lab) plus programming skills. Desired: publication record in either of these areas. Python. Experience with open source development.
Mentors: Stephen Larson (stephen@openworm.org) and Padraig Gleeson (p.gleeson@ucl.ac.uk)
Further information
- The Google Summer of Code home page: www.google-melange.com/
- In 2014, thirteen students were accepted and mentored by volunteers from the INCF scientific community. Read more about the projects at incf.org/gsoc/2014
- INCF's public developers list: http://lists.incf.org/mailman/listinfo/incf-developers
INCF is also prepared to serve as an umbrella mentoring organization for high-profile neuroinformatics projects. If interested, please contact gsoc@incf.org.