In addition, many data mining tools, such as the R environment (R Development Core Team, 2004) and the WEKA library (Witten and Frank, 2005), are becoming freely available and can be incorporated into the prescreening process, for example to find structure-activity relationships and to predict activity from structure.

sMOL Explorer, shown in Figure 1, has been developed as a collection of Java Server Pages (JSP) and Servlets running on the Apache Tomcat web server. It uses MySQL for database management and several chemical informatics libraries, such as CDK, JOELib and Open Babel, for data manipulation, and it connects to various data analysis and mining methods from the Weka library and the R statistical environment. After installation, only a web browser is needed to use sMOL Explorer.

sMOL Explorer is a centralized system that allows registered users to create a database of small molecules in two ways: direct entry and data upload. In direct entry mode, users can add the structure of a small molecule to the database via the web, with several options. After submitting the structure file, the user can enter the associated screening data. For data upload, users can prepare structure and screening data in either an SDF file or an sMOL-defined XML file and upload it into the database.

Summary: sMOL Explorer is a 2D ligand-based computational tool that provides three major functionalities through a Web interface: data management; information retrieval and extraction; and statistical analysis and data mining.
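To make the SDF upload path more concrete, the following minimal sketch shows how a file that combines structures and screening data can be split into records. It relies only on the general SDF layout (a molblock ending in `M  END`, data items introduced by `> <FIELD>` lines, and records terminated by `$$$$`). The function name `parse_sdf`, the sample molecule and the field names `ASSAY` and `IC50_uM` are illustrative assumptions; this is not the parser used by sMOL Explorer, which relies on libraries such as CDK, JOELib and Open Babel.

```python
def parse_sdf(text):
    """Split SDF text into records.

    Returns a list of dicts with 'molblock' (the structure block as text)
    and 'data' (a dict mapping data field names to their values).
    This is a simplified reader for illustration, not a validating parser.
    """
    records = []
    mol_lines, data, field, in_data = [], {}, None, False
    for line in text.splitlines():
        if line.strip() == "$$$$":
            # End of record: store it and reset the per-record state.
            records.append({
                "molblock": "\n".join(mol_lines),
                "data": {k: "\n".join(v) for k, v in data.items()},
            })
            mol_lines, data, field, in_data = [], {}, None, False
        elif not in_data:
            # Everything up to and including "M  END" is the structure block.
            mol_lines.append(line)
            if line.startswith("M  END"):
                in_data = True
        elif line.startswith(">"):
            # Data header such as "> <IC50_uM>"; the name sits between < and >.
            start, end = line.find("<"), line.rfind(">")
            field = line[start + 1:end] if start != -1 and end > start else None
            if field is not None:
                data[field] = []
        elif line.strip() and field is not None:
            data[field].append(line)
        else:
            field = None  # a blank line closes the current data item
    return records


# Hypothetical one-record file: a single carbon atom plus two screening fields.
SAMPLE = "\n".join([
    "methane",
    "  hypothetical-example",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <ASSAY>",
    "kinase_inhibition",
    "",
    "> <IC50_uM>",
    "12.5",
    "",
    "$$$$",
])

records = parse_sdf(SAMPLE)
```

Keeping the screening values as plain strings leaves unit handling and numeric conversion to a later step, which is convenient when fields from different projects are heterogeneous.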

With sMOL Explorer, users can create personal databases by adding each small molecule via a drawing interface or by uploading data files from internal and external projects into the sMOL database. The database can then be browsed and queried with textual and structural similarity searches. A molecule can also be submitted as a query against external public databases, including PubChem, KEGG, DrugBank and eMolecules. Moreover, users can easily access a variety of data mining tools from Weka and R packages to perform analyses including (1) finding frequent substructures, (2) clustering molecular fingerprints, (3) identifying and removing irrelevant attributes from the data and (4) building classification models of biological activity.

Availability: sMOL Explorer is an Open Source project and is freely available to all interested users at

Contact:

To increase the success rate in the laboratory and expedite drug discovery research, databases and software tools are required for computational prescreening of compound libraries. In particular, databases of compounds known to have the desired biological activity are important to prepare in the early stages of a virtual screening project.
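The structural similarity search and fingerprint clustering mentioned above both depend on a measure of similarity between molecular fingerprints. A common choice in chemical informatics is the Tanimoto coefficient, sketched below for fingerprints represented as sets of on-bit indices. The text does not state which measure sMOL Explorer uses, so this is an illustration under that assumption; the function names, the threshold and the toy fingerprints are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Two empty fingerprints share no information; treat them as dissimilar.
    return len(a & b) / union if union else 0.0


def similarity_search(query_fp, library, threshold=0.7):
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints (hypothetical bit positions, not derived from real molecules).
library = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 5, 6},
    "mol_c": {7, 8},
}
hits = similarity_search({1, 2, 3, 4}, library, threshold=0.5)
```

The same pairwise score can serve as the basis of a distance (for example, one minus the coefficient) when clustering fingerprints.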

