dataZense Technical Architecture
Technical Architecture Objectives

ChainSys' Smart Data Platform enables the business to achieve these critical needs:
1. Empower the organization to be data-driven
2. Solve all of your data management problems
3. Deliver world-class innovation at an accessible price

"Subash's expertise in the data management sphere is unparalleled. As the creative and technical brain behind ChainSys' products, no problem is too big for Subash, and he has been part of hundreds of data projects worldwide."
- Subash Chandar Elango, Chief Product Officer, ChainSys Corporation

Introduction

This document describes the Technical Architecture of the ChainSys Platform.

Purpose

The purpose of this Technical Architecture is to define the technologies, products, and techniques necessary to develop and support the system, and to ensure that the system components are compatible and comply with the enterprise-wide standards and direction defined by the Agency.

Scope

The scope of this document is to identify and explain the advantages and risks inherent in this Technical Architecture. It is not intended to address the installation and configuration details of the actual implementation; those details are provided in the technology guides produced during the project.

Audience

The intended audience for this document is project stakeholders, technical architects, and deployment architects.

Architecture Goals

The system's overall architecture goals are to provide a highly available, scalable, and flexible data management platform. A key architectural goal is to leverage industry best practices to design and develop a scalable, enterprise-wide J2EE application that follows industry-standard development guidelines. All aspects of security must be designed and built into the application based on best practices.

Platform Component Definition

[Platform component diagram: the Platform Foundation (Security with Authentication / Authorization / Crypto Management; User Management with Users / Groups, Roles, and Responsibilities; Object Access Manager; Base Components such as Workflow, Versioning, Notification, Logging, Scheduler, and Object Manager; and a Gateway Component with the API Gateway) underpins the Smart Data Platform applications (Data Migration, Data Quality, Data Masking, Data Compliance (PII, GDPR, CCPA, OIOO), Master Data Governance, Analytical MDM (Customer 360, Supplier 360, Product 360), Big Data Ingestion, Data Cataloging, Data Archival, Data Analytics, Data Reconciliation, and Data Visualization) and the Smart Business Platform (Rapid Application Development (RAD) setup, autonomous regression testing, test data preparation, load and performance testing, and a visual drag-and-drop development approach with design tools).]

The Platform Foundation forms the base on which the entire Platform is built. The major components that make up the Platform Foundation are described in brief below.
[Foundation component diagram: Security Management (Federated Authentication via SAML, JWT, OAuth 2.0, LDAP, and AD; Credential / Platform Authentication; a Crypto Engine with MD5 and SHA1 hashing and AES 128 / AES 256 encryption), User Management (Users, User Groups, Role Hierarchy, Responsibilities), Object Access Manager (Org/License, App/Node, and Access Authorization; Object Sharing; Sharing Manager; Dependent Sharing), Base Components (Platform Object Manager; Workflow with Constructs, Approvals, and Activities; Scheduler with SLA, Job Scheduler, and Job Feedback; Logging with Application, Execution, and Audit Logs; Collaborate with Email, Web Notification, and Chats; Versioning with SVN, GIT, and Database), and the Gateway Component (Platform API, Login API, API Gateway Engine, REST API Publisher, SOAP Service Publisher).]

User Management

This component manages all the Roles, Responsibilities, Hierarchies, Users, and User Groups.

Responsibilities: The Platform comes with preconfigured Responsibilities for dataZap, dataZen, and dataZense. Organizations can customize Responsibilities, which are assigned to platform objects with additional privileges.

Roles: The Platform comes with predefined Roles for dataZap, dataZen, and dataZense. Organizations can create their own Roles. A Role-based hierarchy is also configured for the user hierarchy, and the roles are assigned the default responsibilities.

Users: Users are assigned the applications necessary for them, and each User is given a Role. The hierarchy is formed using the role hierarchy setup, where a manager from the next role is assigned. The responsibilities against these roles are set by default for the users; a User can be given additional responsibilities, or an existing responsibility against a role can be revoked. Users gain access to objects based on the privileges assigned to their responsibilities.

Security Management

The security management component takes care of the following.

SSL

The Platform is SSL/HTTPS enabled on the transport layer with TLS 1.2 support. SSL is applied to the nodes exposed to users, such as the DMZ and Web nodes, and to the nodes exposed to third-party applications, such as the API Gateway nodes.

Authentication Engine

The Platform offers credential-based authentication handled by the Platform, as well as Single Sign-On based federated authentication. Both SSO and credential authentication can co-exist for an organization.

Credential Authentication

User authentication on the Platform happens with the supplied credentials. All successful sessions are logged, and failed attempts are tracked at the user level for locking the user account. A user can have only one web session at any given point in time. The password policy, including expiry, is configured at the organization level and applies to all users. Enforced password complexity (minimum length, maximum length, and usage of numbers, cases, and special characters) can be set, and the number of unsuccessful attempts before lockout is also configurable, as sketched at the end of this section.

Single Sign-On

SSO can be set up with federated services like SAML, OAuth 2.0, or JWT (JSON Web Tokens). The setup for an IdP is configured against the organization profile, and authentication happens in the IdP. This can be either IdP initiated or SP (ChainSys Smart Data Platform) initiated. Organization users with SSO get a different context to log in.
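The following is a minimal sketch of how such an organization-level password policy might be enforced. The class name, thresholds, and defaults are illustrative assumptions, not the Platform's actual implementation.

```java
import java.util.regex.Pattern;

// Illustrative sketch of an organization-level password policy check.
// The class name, fields, and default values are hypothetical.
public class PasswordPolicy {
    private final int minLength = 8;         // configurable per organization
    private final int maxLength = 64;        // configurable per organization
    private final int maxFailedAttempts = 5; // lockout threshold, also configurable

    private static final Pattern UPPER = Pattern.compile("[A-Z]");
    private static final Pattern LOWER = Pattern.compile("[a-z]");
    private static final Pattern DIGIT = Pattern.compile("[0-9]");
    private static final Pattern SPECIAL = Pattern.compile("[^A-Za-z0-9]");

    // Checks length bounds and the usage of numbers, cases, and special characters.
    public boolean isCompliant(String password) {
        return password.length() >= minLength
            && password.length() <= maxLength
            && UPPER.matcher(password).find()
            && LOWER.matcher(password).find()
            && DIGIT.matcher(password).find()
            && SPECIAL.matcher(password).find();
    }

    // Called after each failed login; returns true when the account
    // should be locked, mirroring the configurable attempt limit.
    public boolean shouldLock(int failedAttempts) {
        return failedAttempts >= maxFailedAttempts;
    }
}
```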
Authorization Engine

The first level of authorization is the Organization License; the Licensing engine is also used to set up the organization for authentication. The next level is the applications assigned to the Organization and to the respective User. The individual application nodes are given to the organization as per the service agreement to handle load balancing and high availability. Authorization of pages happens through the responsibilities assigned to the users, while authorization of a record happens through sharing the record with a group or with individual users. Responsibility and sharing carry the respective privileges to pages and records. On conflict, the Principle of Least Privilege is used to determine access.

Crypto Engine

The Crypto Engine handles both asymmetric encryption and hashing algorithms. AES 128 is the default encryption algorithm, with 256-bit keys also supported. The keys are managed within the Platform at the organization level, and the usage of keys maintained at the application level determines how they are used for encryption and decryption. All internal passwords are stored by default with MD5 hashing. Encryption of stored data can be done at the database layer as needed.

Workflow Engine

The workflow engine manages the orchestration of the flow of activities. It is part of the platform foundation and is extended by the applications to add application-specific activities.

Version Management

This component handles the versioning of objects and records eligible for versioning. The foundation provides the API to version objects and their records, and applications can extend it to add specific functionalities. Currently, the Platform supports SVN as the default and also supports database-level version management. Support for GIT is on the roadmap.

Notification Engine

The notification engine performs all notifications to the User in the system. It notifies users on the page when they are online in the application; other notifications, such as mail and chat notifications, are also part of this component.

Logging Engine

All activity logs, both foundation and application, are handled to help understand and debug the system.

Scheduler

Scheduler Creation: Enables you to schedule a job once or on a recurring basis. Recurring jobs can be scheduled minutely, hourly, weekly, or monthly.

Scheduler Execution: The scheduler execution engine uses the configuration and fires the job in the respective application. The next job is scheduled at the end of each job as per the configuration.

Job Monitoring: The scheduled jobs are monitored, and their progress and status are tracked at every stage. If there is a delay in an expected job or an unexpected error, the responsible users are notified for action.

API Gateway

The API Gateway forms the foundation for publishing and consuming services with the Platform. All eligible jobs or actions can be published for external applications to access. The following are the Gateway Engine components that form the publishing node.

Login Service

The login service authenticates whether the requesting consumer has the proper authentication or credentials to invoke the job or action. The publisher engine has two methods of authentication:
- Inline authentication, where every request carries the credentials for authentication and access control.
- Session authentication, where this service is explicitly invoked to get a token, and the other published services are then called using this token to authorize the request (see the sketch below).
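A hedged sketch of a consumer using the session authentication method: one explicit login call returns a token, which then authorizes subsequent service calls. The host, endpoint paths, and payloads are placeholders; the Platform's real gateway contract may differ.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical consumer of the gateway's session authentication:
// call the login service once, then reuse the returned token.
public class GatewayClient {
    private static final String BASE = "https://gateway.example.com"; // placeholder host

    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // Step 1: explicit login call returns a session token (endpoint name assumed).
        HttpRequest login = HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/api/login"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"username\":\"svc_user\",\"password\":\"***\"}"))
                .build();
        String token = http.send(login, HttpResponse.BodyHandlers.ofString()).body();

        // Step 2: invoke a published REST service, authorizing with the token.
        HttpRequest job = HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/api/jobs/run"))
                .header("Authorization", "Bearer " + token)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        System.out.println(http.send(job, HttpResponse.BodyHandlers.ofString()).statusCode());
    }
}
```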
SOAP Service

Eligible jobs or actions can be published using the Simple Object Access Protocol (SOAP). SOAP is a messaging protocol that allows programs running on disparate operating systems to communicate using Hypertext Transfer Protocol (HTTP) and its Extensible Markup Language (XML).

REST Service

Eligible jobs or actions can be published using Representational State Transfer (REST). REST communicates over HTTP like SOAP and can carry messages in multiple formats; in dataZap, services are published in XML or JSON (JavaScript Object Notation) format.

dataZense - Data Catalog Component

dataZense offers full trust in your enterprise data. The components of dataZense are described below.

[Component diagram: endpoint connectors (RDBMS via JDBC, enterprise applications via JCO, cloud applications via REST and SOAP, Big Data lakes, NoSQL databases, and enterprise storage systems) feed the Execution Engine and Execution Controller, which comprise a Profiling Engine (structured and unstructured profilers, metadata engines, sampling engine, relationship engine, OCR engine), a Catalog Handler (data registration, data governance, form-based engine), a Data Protection Handler (data protection engine, data identification rules, PII and business tag engines, rectification engine), an Analytical Engine with a Data Lineage Engine, a Virtualization Engine (query execution, virtualization handler, OData query generator), a Catalog Engine (search engine, registration, query sequence), and the dataZap extract adapter with its data object and data streaming engines.]

Profiling Engine

This component handles the profiling of the systems. It can address the profiling of both structured and unstructured data sources.

Structured Profiling

For structured data sources, the profiler can handle all the endpoints supported by the Platform. It can profile not only relational databases and Big Data sources but also cloud data sources, using the data connectors we produce.

Technical Metadata Engine

This Engine catalogs all the metadata information of the source. Collection is automated to capture most of the metadata information, and the metadata attributes are configurable to add more business terms as needed by the organization. Relationships between the entities are also suggested. Data citizens can approve and comment on all the suggestions put forward by the Engine.

Unstructured Profiling

In unstructured data profiling, the profiler reads and processes all text-based data files from the different connectors provided. It identifies text content, table formats, form-based data, and images, and rule-based reading algorithms can be created to categorize and identify data.

Cataloging Engine

This component creates a catalog of all the profiled data and helps give it a whole meaning and purpose.

Advanced Search Engine

The search engine searches the metadata from the catalog as well as all the data indexed across all sources, providing a unified search. The search results can also pull the data from the respective sources for viewing.

Data Lineage

Data lineage traces your data's origins, movements, and joins to provide insight into its quality. It also considers the indexed data to give more viable suggestions for the lineage.
The data lineage also offers options to indicate the forward and backward propagation of data through the lineage pipeline.

Data Security

This Engine helps ensure that row-level and column-level security allow only relevant data to be discovered in the catalog. It provides granular access management controls and supports secure, role-based access levels for individuals such as the data engineer, data steward, data analyst, and data scientist. Access notifications give managers the complete picture and let them approve requests to view the data.

Data Citizenship

Data citizens have a high level of flexibility to view the metadata and to comment on and approve all of the results of the engines above. This Engine also helps build data visualizations for all the cataloged data.

Data Protection Engine

dataZense protects you from personal data breaches by identifying and remediating the PII data. It supports both structured and unstructured data.

Personal Identifier Linked Engine

This Engine uses the profiling engine results to determine any piece of personal information that can be used to identify sensitive data, such as full name, home address, phone number, email address, national identification number, government IDs, credit card number, etc.

Personal Identifier Linkable Engine

This Engine uses the profiling results to determine information that may not identify a person on its own but, when combined with another piece of information, could locate or trace a person. Some examples are gender, ethnicity, job position, etc.

Identification Rule Engine

This Engine helps create rules to identify PII data not covered by the standard list of rules shipped with the application. The default rules can also be modified as per organizational requirements (a sketch follows at the end of this component description).

Data Lake or Virtualization

This is the data movement layer of the Platform. Data from multiple sources with different structures is brought into a data lake or virtualization environment. This can be used to catalog the data for creating lineages and to provision the data for users.

Data Provisioning

This is where the data is provisioned to users so they can view it in a unified manner across all the data sources. The application building layer in the Platform is used to enable this for the users.
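To illustrate the idea behind such identification rules, here is a minimal regex-based sketch; the rule names and patterns are simplified examples, not the product's built-in rule set.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative rule-based PII detector in the spirit of the Identification
// Rule Engine. The rules shown are simplified examples; a real deployment
// would load organization-specific rules instead.
public class PiiRuleEngine {
    private final Map<String, Pattern> rules = new LinkedHashMap<>();

    public PiiRuleEngine() {
        // Default rules (simplified); organizations can add or override.
        rules.put("EMAIL", Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.]+"));
        rules.put("US_PHONE", Pattern.compile("\\b\\d{3}[- .]?\\d{3}[- .]?\\d{4}\\b"));
        rules.put("CREDIT_CARD", Pattern.compile("\\b(?:\\d[ -]?){13,16}\\b"));
    }

    // Organizations can register custom rules beyond the defaults.
    public void addRule(String tag, String regex) {
        rules.put(tag, Pattern.compile(regex));
    }

    // Returns the first PII tag matched in the sampled value, or null.
    public String classify(String value) {
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            if (rule.getValue().matcher(value).find()) {
                return rule.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PiiRuleEngine engine = new PiiRuleEngine();
        System.out.println(engine.classify("reach me at jane.doe@example.com")); // EMAIL
    }
}
```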
dataZense - Analytics & Visualization Component

The Execution Engine is available both on the client side and in the cloud, so a pure cloud environment can be handled and data can be manipulated in the cloud for less load at the client end. The Execution Controller is available in the cloud to direct the execution handler in every step.

[Component diagram: endpoint connectors (RDBMS via JDBC, enterprise and cloud applications via JCO and the dataZap extract adapter, Big Data lakes, NoSQL databases, and enterprise storage systems) feed the Execution Engine and Execution Controller, which comprise a Foundation Engine (dataset, access, data stream, and query engines), an Analytics Engine (cube and dimension engines), a Visualization Engine (view, dashboard, snapshot, and formatting engines), a Learning Engine (supervised, unsupervised, and reinforcement learning with natural language processing), and a Processing Engine (data model and report schedule engines).]

Foundation Engine

This is the foundation component for all the activities and engines in the dataZense execution engine.

Dataset Engine

This Engine executes the defined data model required for the analytics of the data. It is the data holder from which reports are converted into the individual cubes and dimensions. Data can be fetched directly from the endpoints using the endpoint connector or, for complicated APIs, through the extract adapter in the dataZen application.

Access / Filter Engine

This Engine creates filters based on users or views. It can be used to filter the records needed for a particular chart, and to filter the data to be processed based on the User's access or privileges set in the dataset model.

Data Streaming Engine

This Engine streams the manipulated data to the controller to produce the visual effect on the data.

Analytics Engine

This Engine performs all the analytics on the dataset to produce the results. Its main components are the following.

Cube Engine

This Engine generates the cubes from the data in the dataset to run analytics based on the dimensions and KPIs configured in the dataset (a conceptual aggregation sketch appears at the end of this component description).

Query Engine

Generates queries on the cube to fetch the needed report. The data is streamed to the controller to form the visual effects on the data.

Visualization Engine

This is the foundation component for all visualization activities in the dataZense execution engine.

View Engine

Multiple chart types can be created in the visualization component. The User can switch between compatible charts based on the dimensions and KPIs at runtime. The data used for the views can be downloaded in CSV or Excel format.

Dashboard Engine

Multiple views are assigned and arranged into a dashboard, and views from various datasets can be brought into one dashboard. There are options to drill down based on the filters created, and a filter can be set to impact or not impact the other dataset views. The dashboard can be made fluid to change the views at runtime as per the User's choice.

Snapshot Engine

The snapshot engine generates a snapshot of a dashboard: not just what is in the browser's view, but the entire dashboard, even the parts not currently visible.

Formatting Engine

This component creates and handles the formatting of the fonts and colors of the charts. It also addresses conditional formatting based on the data.

Processing Engine

Report Scheduling Engine: This Engine processes the data and generates the reports/dashboards at a scheduled time. These dashboards and reports can also be sent as email attachments.
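As a conceptual illustration of what the Cube and Query Engines do, the sketch below groups dataset rows by a dimension and sums a KPI. The data shapes and names are hypothetical; the real engines operate on configured datasets, not in-memory lists.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Conceptual illustration of cube building: group dataset rows by a
// dimension and aggregate a KPI. Everything here is illustrative.
public class CubeSketch {
    // A dataset row with one dimension (region) and one KPI (revenue).
    static final class Row {
        final String region;
        final double revenue;
        Row(String region, double revenue) { this.region = region; this.revenue = revenue; }
    }

    public static void main(String[] args) {
        List<Row> dataset = List.of(
                new Row("EMEA", 120.0),
                new Row("EMEA", 80.0),
                new Row("APAC", 200.0));

        // One "cube cell" per dimension value, with the KPI aggregated.
        Map<String, Double> cube = dataset.stream()
                .collect(Collectors.groupingBy(r -> r.region,
                        Collectors.summingDouble(r -> r.revenue)));

        System.out.println(cube); // e.g. {APAC=200.0, EMEA=200.0}
    }
}
```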
System and Technology Landscape

The ChainSys Smart Data Platform is battle tested with several Fortune 500 organizations, and many more Fortune 500 organizations are well poised to select ChainSys for their data projects. Distributed computing helps the Platform scale both horizontally and vertically.

[Deployment diagram: DMZ nodes (Apache HTTPD server with load balancing, reverse and forward proxy, single sign-on, API Gateway), web nodes (Apache Tomcat 9 web application, Collaborate with Apache ActiveMQ, dimple.js and R analytics), data stores (data mart, indexing store, app data store, metadata store, versioning store, file/log server), and foundation nodes (caching node, Selenium WebDriver, scheduler node, NodeJS 12.16 with Ionic Framework V4, Apache CouchDB).]

DMZ Nodes

These nodes are generally the only nodes exposed to the external world outside the enterprise network. The two nodes in this layer are the Apache HTTPD server and the Single Sign-On node.

Apache HTTPD

The Apache HTTPD server is used to route calls to the Web nodes. The server also handles load balancing for both the Web Server nodes and the API Gateway nodes. The following features of Apache HTTPD are used:
- Highly scalable forward / reverse proxy with caching
- Multiple load balancing mechanisms
- Fault tolerance and failover with automatic recovery
- WebSocket support with caching
- Fine-grained authentication and authorization access control
- Loadable dynamic modules, such as ModSecurity for WAF
- TLS/SSL with SNI and OCSP stapling support

Web Nodes

This layer consists of the nodes exposed to the users for invoking actions through the frontend, or to third-party applications as APIs. The nodes available in this layer are the Web Server to render the web pages, the API Gateway for other applications to interact with the application, and the Collaborate node for notifications.

Single Sign-On

This Node is built as a Spring Boot application with Tomcat as the servlet container. Organizations opting for single sign-on have a separate SSO node with a particular context; the default context takes users to platform-based authentication.

Web Server

The web application server hosts all the web pages of the ChainSys Platform:
- Apache Tomcat 9.x is used as the servlet container.
- JDK 11 is the JRE used for the application. The Platform works on OpenJDK, Azul Zulu, AWS Corretto, and Oracle JDK.
- Struts 1.3 is used for the controllers.
- Integration between the web server and the application nodes is handled with microservices based on Spring Boot.
- The presentation layer uses HTML 5 / CSS 3 components and many scripting frameworks like jQuery, d3.js, etc.
- The web server can be clustered to n nodes as per the number of concurrent users and requests.

Gateway Node

This Node uses all the default application services and uses Jetty to publish the APIs as SOAP or REST services. The API Gateway can be clustered based on the number of concurrent API calls from external systems. Denial of Service (DoS) protection is implemented in both JAX-WS and JAX-RS to prevent attacks.

Collaborate

This Node handles all the different kinds of notifications to users, such as front-end notifications, emails, and push notifications (on the roadmap). It also provides chat services that the applications can use as needed. The notification engine uses Netty APIs for sending notifications from the Platform, and Apache ActiveMQ is used for messaging the notifications from the application nodes, as sketched below.
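A minimal sketch of an application node handing a notification message to the Collaborate node over ActiveMQ, using the standard JMS client API. The broker URL, queue name, and message shape are assumptions for illustration.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

// Sketch of an application node publishing a notification to the
// Collaborate node via ActiveMQ. Broker URL and queue name are placeholders.
public class NotificationPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("platform.notifications"); // assumed name
        MessageProducer producer = session.createProducer(queue);

        // The Collaborate node would consume this and fan it out as a
        // web notification, email, or chat message.
        producer.send(session.createTextMessage(
                "{\"user\":\"jdoe\",\"type\":\"WEB\",\"text\":\"Job 42 completed\"}"));

        connection.close();
    }
}
```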
Application Nodes

The application nodes are Spring Boot applications that communicate with the other application nodes and the web servers.
- JDK 11 is the JRE used for the application. The Platform works on OpenJDK, Azul Zulu, AWS Corretto, and Oracle JDK.
- Load balancing is handled by HAProxy, based on the number of nodes instantiated for each application.

The application node types differ in the services they use:
- The dataZap and dataZen nodes use only the default services mentioned above.
- The Analytical Services / Catalog Services node uses all the default services and additionally uses R analytics for Machine Learning algorithms, along with D3 and Dimple JS for the visual layer.
- The Smart BOTS node uses all the default services and additionally uses the Selenium API for web-based automation, along with Sikuli.
- The Smart App Builder node uses all the default services; these are used to configure the custom applications and to generate dynamic web applications as configured. The mobile applications' service needs NodeJS 12.16, which uses Ionic Framework V4 to build the web and mobile apps for the configured custom applications.

Scheduler Node

This Node uses only the default application node services and can be clustered only as failover nodes. When the primary Node is down, HAProxy makes the secondary Node the primary. The secondary Node handles notifications and automatic rescheduling of the jobs; it calls each schedulable application object so that all possible exception scenarios are addressed. Once the failed Node is up and running again, it becomes the secondary Node.

Data Storage Nodes

Database

The ChainSys Platform supports both PostgreSQL 9.6 or higher and Oracle 11g or higher, for both the metadata of the setups and configurations of the applications and the data marting for temporary storage of data. The Platform uses PostgreSQL for the metadata in the cloud.

PostgreSQL is a highly scalable database:
1. It is designed to scale vertically by running on bigger and faster servers when you need more performance.
2. It can be configured for horizontal scaling; Postgres has useful streaming replication features, so you can create multiple replicas that can be used for reading data.
3. It can easily be configured for High Availability based on the above.

PostgreSQL offers encryption at several levels and provides flexibility in protecting data from disclosure due to database server theft, unscrupulous administrators, and insecure networks. Encryption might also be required to secure sensitive data. The options include:
- Password storage encryption
- Encryption for specific columns
- Data partition encryption
- Encrypting passwords across a network
- Encrypting data across a network
- SSL host authentication
- Client-side encryption

The multi-tenant database architecture has been designed based on the following (a connection-routing sketch follows this list):
- A separate database for each tenant
- Trusted database connections for each tenant
- Secure database tables for each tenant
- Easily extensible custom columns
- Scalability handled by single-tenant scale-out
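A brief sketch of what the separate-database-per-tenant approach implies for connection handling: each tenant resolves to its own JDBC URL and dedicated credentials. The tenant ids, URLs, and credential scheme are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Map;

// Sketch of the "separate database per tenant" approach described above.
// Tenant ids, JDBC URLs, and the credential convention are illustrative.
public class TenantConnectionRouter {
    private static final Map<String, String> TENANT_URLS = Map.of(
            "tenant_a", "jdbc:postgresql://db1.internal:5432/tenant_a_meta",
            "tenant_b", "jdbc:postgresql://db2.internal:5432/tenant_b_meta");

    public static Connection connect(String tenantId) throws SQLException {
        String url = TENANT_URLS.get(tenantId);
        if (url == null) {
            throw new IllegalArgumentException("Unknown tenant: " + tenantId);
        }
        // Each tenant database has its own trusted service account,
        // so one tenant's credentials can never reach another's data.
        return DriverManager.getConnection(
                url, tenantId + "_svc", System.getenv("DB_PASSWORD"));
    }
}
```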
Cache Server

Redis is used for caching the platform configuration objects and execution progress information. This avoids network latency across the database and thus increases the performance of the application. When durability of the data is not needed, the in-memory nature of Redis allows it to perform well compared to database systems that write every change to disk before considering a transaction committed. The component is set up as a distributed cache service to enable better performance during data access. Redis can be configured as HA-enabled clusters and supports master-replica replication.

File / Log Server

This component is used for centralized logging; it stores the application logs, execution logs, and error logs of the platform applications on a common server. Log4j is used for distributed logging. These logs can be downloaded for monitoring and auditing purposes through a small HTTP service that lets users download the files from this component. It is implemented with the single-tenant scale-out approach.

Subversion (SVN) Server

Apache Subversion (abbreviated SVN) is a software versioning and revision control system distributed as open source under the Apache License. The Platform uses SVN to version all the metadata configurations, so they can be reverted within the same instance or moved to multiple instances for different milestones. All the applications in the Platform use the foundation APIs to version their objects as needed, including:
- Loader adapters, data models, data sets, object models, data objects
- Rules, views, layouts, data extracts, augmentations, dashboards
- Workflow data flows, workflow process flows, migration flows, reconciliations, ad-hoc reports

Apache SOLR

The ChainSys Platform uses SOLR for its data cataloging needs as an indexing and search engine. Solr is an open-source enterprise search platform. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features, and rich document handling. Apache Solr was chosen over the alternatives for the following reasons (an indexing sketch follows this section).

Real-Time, Massive Read and Write Scalability
Solr supports large-scale, distributed indexing, search, and aggregation/statistics operations, enabling it to handle both large and small applications. Solr also supports real-time updates and can take millions of writes per second.

SQL and Streaming Expressions/Aggregations
Streaming expressions and aggregations provide the basis for running traditional data warehouse workloads on a search engine, with the added enhancement of basing those workloads on much more complex matching and ranking criteria.

Security Out of the Box
With Solr, security is built in, integrating with systems like Kerberos, SSL, and LDAP to secure the design and the content inside of it.

Fully Distributed Sharding Model
Solr moved from a master-replica model to a fully distributed sharding model in Solr 4 to focus on consistency and accuracy of results over other distributed approaches.

Cross-Data Center Replication Support
Solr supports active-passive CDCR, enabling applications to synchronize indexing operations across data centers located in different regions without third-party systems.

Solr is Highly Big Data Enabled
Users can store Solr's data in HDFS. Solr integrates nicely with Hadoop's authentication approaches and leverages ZooKeeper to simplify the fault-tolerance infrastructure.

Documentation and Support
Solr has an extensive reference guide that covers the functional and operational aspects of Solr for every version.

Solr and Machine Learning
Solr is actively adding capabilities to make learning-to-rank (LTR) an out-of-the-box functionality.
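For illustration, a small SolrJ sketch showing how a cataloged entity could be indexed and then found through a unified search. The collection name and fields are assumptions, not the Platform's actual catalog schema.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

// Sketch of indexing and searching cataloged metadata with SolrJ.
// The "catalog" collection and its fields are placeholders.
public class CatalogIndexSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/catalog").build();

        // Index one profiled entity (field names assumed).
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "orders.customer_email");
        doc.addField("entity", "orders");
        doc.addField("tags", "PII");
        solr.add(doc);
        solr.commit();

        // Unified search across the indexed catalog.
        QueryResponse response = solr.query(new SolrQuery("tags:PII"));
        response.getResults().forEach(r -> System.out.println(r.get("id")));

        solr.close();
    }
}
```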
Apache CouchDB

The ChainSys Platform uses CouchDB for mobile applications in the Application Builder module. PostgreSQL is the initial entry point for the dynamic web applications, and its data syncs with CouchDB when mobile applications are enabled. In contrast, the initial entry point for the dynamic mobile applications is PouchDB: CouchDB syncs with the PouchDB on the mobile devices, which then syncs with PostgreSQL. The main reasons for choosing CouchDB are:
- CouchDB uses HTTP and REST as its primary means of communication, allowing client apps to talk to the database directly.
- The Couch Replication Protocol lets your data flow seamlessly between server clusters, mobile phones, and web browsers, enabling a compelling offline-first user experience while maintaining high performance and strong reliability.
- CouchDB was designed from the bottom up to enable easy synchronization between different databases.
- CouchDB has JSON as its data format.

Deployment at Customer - Distributed Mode

The ChainSys Smart Data Platform is a highly distributed application with a highly scalable environment. Most of the nodes are horizontally and vertically scalable.

[Distributed deployment diagram: a DMZ services VM (Apache HTTPD server, single sign-on) fronts a web container cluster (web page services, Collaborate services, API Gateway), a foundation services cluster (file/log services, scheduling services with primary and secondary nodes, caching node), a database cluster (metadata and datamart with primary and secondary nodes), a versioning VM, a SOLR cluster (master and slave nodes with multiple cores), an Apache CouchDB cluster, and the Smart Data Platform application clusters (design and process, layout build, and layout rendering nodes) behind a load balancer.]

DMZ Nodes

Apache HTTPD is needed in a distributed environment as a load balancer and is also used as a reverse proxy for access outside the network; it is a mandatory node. The SSO Node is needed only if Single-Sign-On capability with a federated service is required.

Web Cluster

ChainSys recommends a minimum of two web nodes in the cluster to provide high availability and load balancing for better performance. This is a mandatory node for the ChainSys Platform. The number of nodes is not restricted to two and can be scaled according to the concurrent usage of the application pages. The Collaborate node is generally a single node, but it can be configured for High Availability if needed.

Gateway Cluster

The API Gateway nodes are not mandatory; they are required only when the application APIs need to be exposed outside the Platform. When deployed, ChainSys recommends a two-node cluster to handle high availability and load balancing under high API call volumes. The number of nodes in the cluster can be determined based on the volume of API calls and is not restricted to two.

Application Cluster

HAProxy or Apache HTTPD acts as the load balancer, and all the calls within the application nodes are handled based on the node configuration. If Apache HTTPD is used in the DMZ for the reverse proxy, it is recommended to use HAProxy or a separate Apache HTTPD for internal routing.
The number of nodes in the cluster is not restricted to two. Individual application nodes can be scaled horizontally for load balancing as per the processing and mission-critical needs.

The Integration Cluster is a mandatory deployment in the Platform; all the other applications depend on it for their integration needs.

The Visualization Cluster is also a mandatory deployment; all the other applications depend on it for their dashboard and report needs. The visualization uses R Studio Server for Machine Learning capabilities, which is needed only when Machine Learning algorithms are to be used.

When deploying MDM, the "Smart Application Builder" node is needed for dynamic layout generation and augmentation. The reverse does not apply, as "Smart Application Builder" is not dependent on the MDM nodes. NodeJS is needed only when mobile applications are to be dynamically generated. The Apache HTTPD server handles the load balancing.

The Scheduler cluster is needed if even one of the applications uses the scheduling capability. The cluster provides High Availability (failover) only, not load balancing, and the number of nodes is restricted to two.

Data Storage Nodes

Generally, the PostgreSQL database is configured for High Availability as an Active-Passive instance. Depending on the number of read-write operations, it can be load balanced too. It can be replaced by Oracle 11g or higher if the client wants to use an existing database license.

A File Server is needed only if there is no NAS or SAN available to mount shared disk space into the clusters for distributed logging; the NFS operations for distributed logging require this Node.

The SVN server is mandatory to store all the configuration objects in the repository for porting from one instance to another. Generally, it is a single node, as the load on it is not too high.

REDIS is used as the cache engine and is mandatory for a distributed deployment. It can be configured for high availability using master-replica replication.

SOLR is needed only if data cataloging is implemented and search capability is enabled. It can be configured for High Availability, and SOLR sharding can be used when the data is too large for one Node or should be distributed to increase performance/throughput.

CouchDB is needed only if dynamic mobile applications are to be generated. CouchDB can be configured for high availability, and for better performance, ChainSys recommends individual instances of CouchDB for each active application.

Deployment at Customer - Single Node

"Single Node" does not mean that literally. Here it means that all application services produced by the ChainSys Platform are deployed on a single node or server, while the data storage nodes remain separate servers or nodes. This type of installation would generally be used for a patching environment where there are not too many operations. It is also recommended for non-mission-critical development activities where high availability and scalability are not a determining factor.
[Single-node deployment diagram: a DMZ services VM (Apache HTTPD server, single sign-on) fronts an application services VM running the Foundation Package (Apache Tomcat 9 web services, Apache ActiveMQ Collaborate services, foundation services, file/log server, scheduling services, caching service), the Smart Data Platform services (including analytics and catalog services), Smart BOTS services, and Smart App Builder services (design and process, layout build, and render services), backed by separate database (metadata/datamart), versioning, NoSQL, and indexing VMs.]

DMZ Nodes

Apache HTTPD is needed only if a reverse proxy is required for access outside the network; it is not a mandatory node for a single-node installation. The SSO Node is needed only if Single-Sign-On capability with a federated service is required.

Application Server

There will be just one Apache Tomcat as the web application service, and it will not be configured for high availability. The Collaborate service will have Apache ActiveMQ and the Spring integration service. The API Gateway is required only if objects are to be published as REST APIs or SOAP services; this service can be shut down if not needed. The Integration Service, Visualization Service, and Scheduler Service are mandatory running services. The rest of the applications can run or be shut down depending on the license and need.

Data Storage Nodes

PostgreSQL would be on a separate node; ChainSys does not recommend having the applications and the databases on the same machine. The SVN server is mandatory to store all the configuration objects in the repository for porting from one instance to another. SOLR is needed only if data cataloging is implemented and search capability is enabled. CouchDB is needed only if dynamic mobile applications are to be generated, and it runs as a separate node.

Instance Strategy

The Platform has built-in configuration management approaches for check-in and check-out without leaving the ChainSys Platform. This gives a great software development lifecycle process for your projects, and all your work is protected in a secure location and backed up regularly.

DEV (DEV Meta DB) -> TST/QA (TST Meta DB) -> PRD (PRD Meta DB)

Generally, the above instance propagation strategy is recommended. Depending on the applications in use and the load, either a single-node deployment or a distributed-model deployment can be chosen; a distributed deployment is generally recommended for Production instances. The adapters are forward-propagated using the SVN repository, and all the instances need not follow the same deployment model. For reverse propagation, for example from Production to non-Production instances, the application and the data storage layer can be cloned and the node configurations re-configured for the lower instances.

Pure Cloud Deployment

The ChainSys Platform is available on the cloud. The Platform is hosted as a Public Cloud and also has Private Cloud options.
[Cloud deployment diagram: public and private cloud virtual networks with per-tenant production and development data subnets, application subnets, and DMZ subnets fronted by Apache HTTPD servers, connected to the on-premise networks through gateways with site-to-site tunnels.]

Public Cloud

Connectivity to the customer data center is handled with site-to-site tunneling between the tenant's data center and the ChainSys data center. Individual gateway routers can be provisioned per tenant. Tenants share the same application and DMZ node clusters, but not the data storage nodes. If a tenant needs a separate application node for higher workloads, that particular application node can be set aside for that specific tenant. As mentioned earlier in the Database section, multi-tenancy is handled at the database level: tenants have separate database instances, and the databases are provisioned based on the license and the subscription. Depending on the workload on the nodes, each Node can be clustered to balance the load.

Private Cloud

Customers (tenants) have all applications, DMZ nodes, and data storage nodes assigned to the specific tenant; nothing is shared. Depending on the workload on the nodes, each Node can be clustered to balance the load. The application nodes and databases are provisioned based on the license and subscription.

Hybrid Cloud Deployment

[Hybrid deployment diagram: the ChainSys data center hosts the DMZ services (Apache HTTPD, single sign-on), web nodes (Apache Tomcat 9, Collaborate, API Gateway), application cluster nodes (analytics, catalog, design and process, and layout build services), and data stores (data mart, indexing, metadata, versioning, app data, caching, scheduler, and file/log nodes), while an agent deployed in the client data center (catalog, analytics, and application deployment executables with Apache CouchDB) connects to the on-premise endpoints.]

A hybrid deployment can be associated with either the Private or the Public cloud. An Agent is deployed on the client organization's premises or data center to access the endpoints, which avoids creating a site-to-site tunnel between the client data center and the ChainSys cloud data center. There is a proxy (Apache HTTPD server) on both sides, at the ChainSys data center and at the client data center, and all back-and-forth communication between the ChainSys data center and the Agent is routed through the proxies only. The ChainSys cloud sends instructions to the Agent to start a task along with the task information; the Agent executes the task and sends the response back to the cloud with the task's status (a conceptual agent loop is sketched below).

Agents for dataZap, dataZense, and Smart BOTS can be deployed. For dataZap, the existing database (either PostgreSQL or Oracle) can be used for the staging process; the Agent executes all integration and migration tasks by connecting directly to the source and target systems, validating and transforming data, and transferring data between them. For dataZen and Smart App Builder, data is streamed to the ChainSys Platform for manipulation.
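A conceptual sketch of the agent loop, under the assumption that the agent polls the cloud through the client-side proxy and reports status back; the actual product may push instructions instead, and all hosts, endpoints, and the task format here are hypothetical.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Conceptual agent loop for the hybrid deployment: fetch the next task
// through the on-premise proxy, execute it locally, and report status.
// Hosts, endpoints, and the task format are assumptions.
public class HybridAgentSketch {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newBuilder()
                // Route all traffic through the client-side Apache HTTPD proxy.
                .proxy(ProxySelector.of(new InetSocketAddress("proxy.client.local", 8080)))
                .build();

        while (true) {
            HttpRequest poll = HttpRequest.newBuilder()
                    .uri(URI.create("https://cloud.chainsys.example/agent/tasks/next"))
                    .GET().build();
            String task = http.send(poll, HttpResponse.BodyHandlers.ofString()).body();
            if (task.isEmpty()) { Thread.sleep(5000); continue; }

            // Connect directly to on-premise source/target systems here.
            String status = runLocally(task);

            HttpRequest report = HttpRequest.newBuilder()
                    .uri(URI.create("https://cloud.chainsys.example/agent/tasks/status"))
                    .POST(HttpRequest.BodyPublishers.ofString(status)).build();
            http.send(report, HttpResponse.BodyHandlers.discarding());
        }
    }

    private static String runLocally(String task) {
        return "{\"task\":" + task + ",\"status\":\"COMPLETED\"}";
    }
}
```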
Disaster Recovery

[DR diagram: the primary application and database stack replicates to a secondary stack, using RSYNC for the application nodes, streaming replication with archive log shipping for the PostgreSQL nodes, CDCR replication for the SOLR nodes, and native replication for the CouchDB nodes.]

All the application nodes and the web nodes are replicated using RSYNC; the specific install directories and any other log directories are synced to the secondary replication nodes. For PostgreSQL, the streaming replication feature, which uses archive log shipping, is employed. SOLR ships with an in-built CDCR (Cross Data Center Replication) feature, which is used for disaster recovery. CouchDB has an outstanding replication architecture, which replicates the primary database to the secondary database. The RPO can be set as needed, individually for both applications and databases; the RTO for the DR would be approximately an hour.

Application Monitoring

Third-Party Monitoring Tools

ChainSys uses third-party open-source monitoring tools, such as Zabbix and Jenkins, to monitor all the nodes. Zabbix supports tracking the availability and performance of the servers, virtual machines, applications (like Apache, Tomcat, ActiveMQ, and Java), and databases (like PostgreSQL, Redis, etc.) used in the Platform. Using Zabbix, the following are achieved:
- Various data collection methods and protocols
- Instant monitoring of all metrics using out-of-the-box templates
- Flexible trigger expressions and trigger dependencies
- Proactive network monitoring
- Remote command execution
- Flexible notifications
- Integration with external applications using the Zabbix API

We can also use individual application monitoring systems for more in-depth analysis, but an integrated approach to looking into problems helps us be proactive and faster.

In-Built Monitoring System

ChainSys is working on its own Application Monitoring tool that monitors the necessary parameters like CPU and memory. This tool is also planned to help monitor separate individual threads within the application. It is also intended to perform most maintenance activities, like patching, cloning, and database operations, from one single console, and it will be integrated with Zabbix for monitoring and alerting.
Supported Endpoints (Partial)

Cloud Applications: Oracle Sales Cloud, Oracle Marketing Cloud, Oracle Engagement Cloud, Oracle CRM On Demand, SAP C/4HANA, SAP S/4HANA, SAP BW, SAP Concur, SAP SuccessFactors, Salesforce, Microsoft Dynamics 365, Workday, Infor Cloud, Procore, Planview Enterprise One

Enterprise Applications: Oracle E-Business Suite, Oracle ERP Cloud, Oracle JD Edwards, Oracle PeopleSoft, SAP S/4HANA, SAP ECC, IBM Maximo, Workday, Microsoft Dynamics, Microsoft Dynamics GP, Microsoft Dynamics Nav, Microsoft Dynamics Ax, Smart ERP, Infor, BaaN, Mapics, BPICS

PLM, MES & CRM: Windchill PTC, Oracle Agile PLM, Oracle PLM Cloud, Teamcenter, SAP PLM, SAP Hybris, SAP C/4HANA, Enovia, Proficy, Honeywell OptiVision, Salesforce Sales, Salesforce Marketing, Salesforce CPQ, Salesforce Service, Oracle Engagement Cloud, Oracle Sales Cloud, Oracle CPQ Cloud, Oracle Service Cloud, Oracle Marketing Cloud, Microsoft Dynamics CRM

HCM & Supply Chain Planning: Oracle HCM Cloud, SAP SuccessFactors, Workday, ICON, SAP APO and IBP, Oracle Taleo, Oracle Demantra, Oracle ASCP, Steelwedge

Project Management & EAM: Oracle Primavera, Oracle Unifier, SAP PM, Procore, Ecosys, Oracle EAM Cloud, Oracle Maintenance Cloud, JD Edwards EAM, IBM Maximo

Enterprise Storage Systems: OneDrive, Box, SharePoint, File Transfer Protocol (FTP), Oracle Webcenter, Amazon S3

Big Data: HIVE, Apache Impala, Apache HBase, Snowflake, MongoDB, Elasticsearch, SAP HANA, Hadoop, Teradata, Oracle Database, Redshift, BigQuery

NoSQL Databases: MongoDB, Solr, CouchDB, Elasticsearch

Databases: PostgreSQL, Oracle Database, SAP HANA, SYBASE, DB2, SQL Server, MySQL, MemSQL

Message Brokers: IBM MQ, Active MQ

Development Platforms: Java, .Net, Oracle PaaS, Force.com, IBM, ChainSys Platform

One Platform for your end-to-end Data Management needs: Data Migration, Data Quality Management, Data Analytics, Data Reconciliation, Data Governance, Data Catalog, Data Integration, Analytical MDM, Data Security & Compliance.

www.chainsys.com