www.neiljohan.com

Computing Degree Revision Notes


Here are revision notes for my software engineering degree. I found them useful, maybe you will as well. Note that I make no guarantees about the accuracy of them. Use at your own risk!

Artificial Intelligence (AI)

Search

Start from initial state and search for goal state.
  • Depth First Search - May not find solution. Max. depth should usually be specified to stop it from getting trapped in a dead end.
  • Breadth First Search - Will find shortest solution.
Both DFS and BFS are not v.efficient
  • Hill Climbing - version of DFS where the most promising node is selected to expand. Great efficiency gains can be made as not all the search space has to be stored. It could however get trapped.
  • Best First Search - mixture of DFS and BFS. The most promising of all the nodes seen so far is selected to expand.
  • A* - specialised version of Best First Search where we take the cost of getting to the current node into account. We select best node by how good the node looks, and by how expensive the path to this node is. A* is good if we are interested in the quality of a solution.

  • f(x) = g(x) + h(x) - where g(x) is the cost of the cheapest known path from the source to x, and h(x) is the heuristic estimate of the cost of getting from node x to the goal node.

Heuristics

Heuristic knowledge can be used to guide the search process. Weak heuristics involve very general search strategies, with little domain knowledge. They can be used in a wide range of domains Strong heuristics involve a lot of specific domain knowledge, and can only be used on a limited number of domains.

Knowledge Representation

  • Resolution Refutation - We want to convert logical statements to clause form. This is a flat representation with no embedded components, and all quantifiers have been separated out.
    1. Eliminate <->
    2. Eliminate ->
    3. Reduce scope of ¬ to a single atom
    4. Rename variables so quantifiers bind to a unique variable
    5. Eliminate existential quantifiers (skolemzing) - Each existentially quantified variable must be replaced by a skolem function that is defined in terms of the universally quantified variables that it is in the scope of.
    6. Move quantifiers to left of formula
    7. Eliminate prefix (all variables are now assumed to be universally quantified)
    8. Transform into a conjunction of clauses
    9. Standardize clauses by renaming variables
  • Unification
  • Resolve clauses - Negate clause that you want to prove and add it to the set of clauses
Answer extraction - Add the goal and complement of it to the set of clauses. Tag the goal and don't select the goal as a complementary literal. It's variables are changed as a result of substitutions. At the end this tagged clause should be the only one left


Concurrency

Mutual Exclusion Problem

  • ME Property - The processes are mutually exclusive if at any one time, no two processes are in their critical sections.
  • Deadlock freedom (and livelock freedom) - deadlock is where the process remains in it's waiting stage, and will never be executed
  • No Starvation - A process is starved if other process can enter their critical sections many times, but this process cannot enter it.
  • Success when no contention - If no other processes need to enter the critical section, then a process which does, should be allowed to enter it.
Peterson's algorithm is correct

Semaphores

These are a higher level mechanism for concurrency.

Busy Wait Semaphores

  • Init(S)>=0
  • Wait(S) If S > 0 then S=S-1 else suspend the execution of this process. The process is said to be suspended on the semaphore S.
  • Signal(S) If there are processes that have been suspended on this semaphore, wake one of them else S=S+1
Blocked Queue Semaphores
  • Init(S)>=0
  • Wait(S) - If S > 0 then S=S-1 else suspend the execution of this process and put it into FIFO queue. enqueue(qs,p)
  • Signal(S) If qs is not empty then dequeue(qs,p) and wake P else S=S+1
At any point of execution of a program using semaphore S, S>=0 and S=Init(S) + #signals - #waiting operations

Bounded Buffer

Example of producer-consumer problem

Deadlock Model

The following conditions are necessary for deadlock.
  • ME
  • Hold and Wait - If a process wants a resource that is already being used then it will have to wait.
  • No Pre-emption
  • Circular Wait

Deadlock Prevention

  • Hold and Wait - 1. Get all or none - you get all resources or you get none 2. Get some - Release all - Get others.
  • Pre-emption
  • Circular Wait - in order to prevent circular waiting all resources must be numbered. The processes must then request resources from smaller to bigger

Resource Allocation Graphs

Deadlock Detection


Databases

Candidate Keys

Primary Keys are a specialised form of Candidate Key. If R is a relation then a Candidate Key (C) for R is a subset of R's attributes such that:
  • No two tupples of R have the same value for C. The Candidate Key is a unique identifier for the tupple.
  • No proper subset of C has a uniqueness property. C is irreducible.
Candidate keys are used by the DBMS to pinpoint a specific tuple. Each tuple may have multiple candidate keys. For example a table of atomic elements may have a candidate key for the element symbol, and a candidate key for the element number. A superkey is a relation that contains at least one candidate key as a subset.

Primary Key

One of the relation's Candidate Keys must be chosen to be the Primary Key. If the relation has more than one Candidate Key (rare) then it is generally up to the user to choose which one will be the Primary Key. No component of a Primary Key may accept nulls.

Foreign Key

The Foreign Key of a table must match the primary key of another table. Every value of a foreign key must appear as a value of a matching Candidate Key. However the primary key of the foreign table is not required to be a foreign key of the first table.

Referential Integrity

The database must not contain any unmatched foreign key values. If B referenced A, then A must exist.

Relational Algebra

Relation algebra is a theoretical language with operations that work on one or more relations without changing the original relations.
  • Closure - the output from one relation is another relation.
  • Union - Two relations can be unioned if they have the same set of attribute name, which are defined on the same domain.
  • Intersection - The intersection of the relations that have the same set of attribute names.
  • Difference - Minus
  • Product - Times
  • Restriction - selects the rows that satisfy the condition.
  • Projection - chooses the columns that match a condition.
  • Natural Join - Joins two relations that have matching attributes.
  • Division
Intersection and Times are associative and commutative. Minus is not.

Functional Dependencies

A functional dependency is a many-to-one relationship from one set of attributes to another in a given relationship. FD's tell us the meaning of data. i.e. every supplier is located in only one city. FD's represent integrity constraints. They are checked by the DBMS at every update. The left hand side of a FD is called a determinant. The right hand side is called the dependent.

For efficiency reasons we want to be able to find the smallest set of FD's for the database. An irreducible set of dependencies has the following properties:

  • RHS (the dependent) of every FD in S has just one attribute
  • LHS (the determinant) is irreducible
  • No FD can be discarded without changing the closure of S+
In order to reduce a set of FD's the following rules can be used.
  • Reflexivity - If B is a subset of A then A->B
  • Augmentation - If A->B then AC->BC
  • Transitivity - If A->B and B->C then A->C
  • Self-determination - A->A
  • Decomposition - A->BC then A->B and A->C
  • Union - If A->B and A->C then A->BC
  • Composition - A->B and C->D then AC->BD

Normalisation

  • 1NF - every field contains atomic data.
  • 2NF - every non-key field is irreducibly dependent on the primary key.
  • 3NF - All non-key fields (if any) are (1) mutually independent and (2) irreducibly dependent on the primary key.
  • BCNF - Only determinants (LHS of FD) are candidate keys - There are only dependency arrows out of the candidate keys.
Normalisation is always reversible as no information is lost. Only redundant and duplication is removed. If the relations are not normalised to a sufficient extent the Insert, Update and Delete problems can occur.

Calculating Closure

To calculate the closure {A,B}+ of attributes {A,B} using the FD's A->BC, E->CF, B->E, CD->EF
  1. Add A,B to the result
  2. Loop around the FD's, If the left hand side is a subset of the closure so far, add the attributes to result.
  3. Keep looping round the FD's until no new attributes can be added.
To tell if the FD X->Y follows under S (set of FD's). Calculate the closure of X+ of X under S. If Y is a subset of this closure then X->Y follows.


Functional Programming

Important Higher Order Functions

A higher order function that accepts another function as input.
  • foldr
  • foldr f s [] = s
    foldr f s (a:x) = f a (foldr f s x)
  • map - applies a function to each item of list
  • map :: (t->u)->[t]->[u]
    map f [] = []
    map f (a:x) = f a : map f x
  • filter - returns the elements of the list that satisfy a condition
  • filter :: (t->Bool)->[t]->[t]
    filter p [] = []
    filter p (a:x) | p a       = a : filter p x
                   | otherwise =     filter p x
A polymorphic function can involve more than one type of variable.

Other standard functions

  • length - returns length of list
  • concat - flattens a list of lists into a list
  • take, drop - takes or drops elements from a list

Currying

Currying is when a function that accepts two or more arguments is made to accept arguments one at a time.


Grammar

Phrase Structure Grammar

G = (N,T,P,S)
N= finite set of non-terminal symbols
T= finite set of terminal symbols (such that N intersection T = {), and write V=NUT)
P is a finite set of pairs called production rules
S= start symbol
  • Immediate derivation - New string that can be derived from old string by applying one production rule.
  • Derivation -New String that can be derived from old string by applying production rules a number of times
  • Sentinal form - This is any string that can be derived from start symbol. Made up of either terminals and/or non terminals
  • Sentence - Sentinal form consisting of only non-terminal symbols.
  • Language L(G) - set of all sentences generated by G

Deterministic Finite State Automaton

A deterministic FSA is a 5 tuple

M = (K, T, t, k, F) ---- Example M=(K,T,t,k0,F)

 
K= finite non-empty set of states
T= finite input alphabet
t= (possibly partial) mapping of KxT into K
k (inK) is the initial state
F (inK) is the set of final states
K={k1, k2, k3, k4}
T={0,1}
F={k1}
t(k1,0) = k3 
t(k1,1) = k2 etc

Non-Deterministic Finite State Automaton

A non-deterministic FSA is a 5 tuple

M = (K, T, t, k, F) ---- Example M=(K,T,t,k0,F)

 
K= finite non-empty set of states
T= finite input alphabet
t= mapping of KxT into subsets of K
k (inK) is the initial state
F (inK) is the set of final states
K={k0, k1, k2, k3, k4}
T={a,b}
F={k2,k4}
t(k0,a) = {k0,k3} 
t(k0,b) = {k0,k1} etc

Top Down Parsing

Generates a left most derivation tree. May need to eliminate left recursion.

Predictive Parser

G = (N,T,P,S)
N= {S,A}
T= {a,b,c}
P={S -> aS (a)
S -> bA (b)
A -> d (d)
A -> ccA} (c)
Predictive Parser Table
 
a b c d
S S -> aS S -> bA
A A -> ccA A -> d

Above: The look ahead value is in paremphasis. It is an LL(1) grammar as each starts with a different terminal symbol.

 
Input  -Stack- Action 
[a]abccd$ 
S$
apply S->aS
[a]abccd$
aS$
pop&scan
a[a]bccd$
S$
apply S->aS
a[a]bccd$
aS$
pop&scan
aa[b]ccd$
S$
apply S->bA
aa[b]ccd$
bA$
pop&scan
aab[c]cd$
A$
apply A->ccA
aab[c]cd$
ccA$
pop&scan
aabc[c]d$
cA$
pop&scan
aabcc[d]$
A$
apply A->d
aabcc[d]$
d$
pop&scan
aabccd[$]
$
String is valid


Java

Graphs

Various representations
  • 2D array - v.fast to check if edge exists, and to add edges.
  • List of edges
  • Adjacency list representation

Algorithms

  • Dijkstra's Algorithm for shortest paths - keeps array of shortest know distances to goal node
  • Floyd-Fulkerson - finds paths through graph to give largest network flow.
  • Formula Checker - Add 2 edges for each pair in formula. If there is a path from + to - version of any variable then formula is false.
  • Warshalls algorithm - Finds shortest path?

Java

  • Observers - addObserver(Observer o), notifyObservers(), notifyObserver(Observer O)
  • Client server system
  • GUI's
For more Java information see my Java source code page.


Networks

A network is a system of inter-connected communicating entities.

Data Transmission

  • Amplitude is the height of the waves
  • Wavelength is the length of the waves.
  • Frequency is the number of complete units per unit time
  • Phase is the relative position in time of a fixed point in a signal.
A codec converts analog data to digital data.
  • Duplex - In a full duplex system the two signals can be sent at different frequency, and then the correct frequencies extracted at each end.
  • Attenuation - the amount of reduction in signal strength due to boosting a weakened signal
  • Delay - the time of propagation
  • Noise - Thermal, intermodulation (signal interferes with itself), crosstalk (signal overlay), impulse (everything else, electrical). Noise is measured as a ratio in decibels.
Different media are used to transmit signals. Either fixed wire or free medium
  • Two wire - one line carries ground, the other carries the signal
  • Twisted pair - reduced interference, use repeater every 2-3km
  • Coaxial Cable - big bandwidth, up to 10,000 voice signals.
  • Optical fibre
Then there are free media
  • Terrestrial microwaves - subject to interference
  • Satellite microwaves
  • Radio
  • Infrared

Network Performance

We need to be able to measure network performance. We need to be able to work out reliability, availability, usage, response time and job loss. We can use system modelling to predict how a network will perform. Several types of model.
  • Deterministic - We know what will happen and when
  • Non-Deterministic - We don't know what will happen and when - uses random time e.g. component failure
  • Continuous event - weather system
  • Discrete event - things happen at a set time (e.g. light bulb breaking)
  • Continuous time
  • Discrete time - Things happen at a set CPU clock cycle
Roulette simulation - probability of a particular event occurring only depends on the current state

Network Architecture

OSI seven layer model
  • Application later - communication between end user applications
  • Presentation layer - data encryption, compression
  • Session layer - Deals with synchronisation of the dialogue, between users at each end.
  • Transport layer - this layer is based on an end-to-end connection
  • Network layer - establishes and maintains connection between end points
  • Data link layer - bits -> frames. Provides error correction.
  • Physical layer - check for bit errors, convert bits into signals

Protocols

Used for communication between entities in different systems.

Data Integrity

Data can be corrupted, and integrity checks are needed to detect this corruption. Parity can be used to check errors. If seven bits are used to represent a character then an 8th bit can be added so that the number of 1's is always even. If a single bit is corrupted then the number of 1's will no longer be even and the error will be detected (even parity).

Checksums are a more sophisticated way of checking for corruption. The numerical values can be added, and then the lower digits of the sum can be sent as a checksum. e.g. if the total is 67835, we might send he value 835 as the checksum. A cyclic redundancy check or frame check sequence can be used, these use a more complex mathematical equation to compute a checking value, and they are often useful for dealing with certain error patterns.

In order to help correct errors as well as detect them hamming codes and convolution codes can be used. Convolution codes operate on continuous streams of data rather than blocks. The value of each encoded bit is dependent on the number of preceding bits.

Data may be transmitted 'vertically'. This will make the use of hamming codes more effective as a burst of interference will affect fewer bits from each block.

Redundancy, Routing and re-routing, and acknowledgement and resend are other ways of ensuring data integrity.

Security

  • Static data access - restrict access to host, or material including passwords, read/write protection and firewalls.
  • Dynamic data access - measures to prevent data from being read, or altered while in transit. Uses encryption and authentication.
Border routers are simple gateways that use simple filtering. Bastion hosts link the internal and external networks at an application level (www or ftp). These can prevent connections at the lower TCP level, which prevents most types of hacking tools.

Networks

LANs are normally manages by a single organisation, whereas a WAN would be managed by a number of organisations.
  • Ethernet - computers connected by coaxial cable. Frames of data are transmitted to all computers. The computer whose address matches the destination of the frame can receive the frame. All ethernet frames are preceded by a preamble of 62 alternating bits. If two frames are sent by two different computers at the same time then a collision will occur, and the preamble will be corrupted. If a computer detects that another computer is sending a frame of data, then it will not send any other frames. If a collision is detected then the transmitting machines will wait a random amount of time before resending the data.
  • Cambridge ring - All the computers are connected in a loop. Each computer (host) has a repeater for the signal. Data moves round the ring in a train of mini-packets. Each mini packet has a destination address and a full/empty flag. When the train reaches a host which is the destination of one of the mini packets, the packet is removed from the train, and the packet header is sent on with the full/empty flag set to 0. A monitor (a separate machine) controls the data train. When it first encounters a packet it sets a flag to say that it has seen the packet. If it sees the packet again then the packet is removed from the train. To transmit a packet a host adds it to the end of the train.
  • Token Ring Network - This type of network uses a token to determine which host has the authority to send data. It adds data to the token, and this travels round to it's destination. A host wanting to send data must wait for a free token. The monitors main job is to look for lost-token errors.
  • Aloha - Packet based radio system used by Uni of Hawaii. Two frequencies are used, one for sending from the host, and one for sending to the host. If a collision occurs with two islands sending to the host, the the host will not be able to send an acknowledgement. The senders will then wait for a random amount of time before resending their messages.


Operating Systems

OS Components

  • Process Management
  • Main Memory management
  • Secondary storage management
  • I/O System Management
  • File Management
  • Protection System
  • Networking
  • Command-Interpreter System

Typical Layered Approach

  • User Programs
  • Buffering for input and output devices
  • Operator console device drivers
  • Memory management
  • CPU Scheduling
  • Hardware

Memory Protection

For each program two registers can be used for memory protection. The base register holds the smallest legal memory address. The limit register holds the size of the range.

Device Management

  • Direct I/O with Polling - The CPU regularly checks the controller state register to see if the device is idle, busy or has completed it's requested operation. If it is idle then the CPU can place a command in it's Command Register.
  • Interrupts - When the device has completed it's task it interrupts the CPU. The interrupt handler determines which device interrupted it and acts on the interrupt.

Privileged Instructions

These are instructions that only the OS can use. They control enabling and disabling interrupts, switching between processes, accessing memory protection registers, performing I/O and halting the CPU. These instructions can only be used in Supervisor mode, not User mode.

Interrupts

An interrupt is a signal that causes the control unit to branch to a specific location to execute code to service the occurrence of an external condition. The first level interrupt handler determines the source of an interrupt, then services the interrupt.

Dispatcher

This allocates the CPU amongst the various processes.

Scheduler

  • Long term controls the degree of multi programming i.e. the number of processes in memory.
  • Medium term scheduler - occasionally removes processes from memory if the memory becomes over allocated. Also swaps processes into memory.
  • Short term Scheduler - performed by the CPU scheduler. It decided which process should be executed next. In pre-emptive multitasking if a process arrives with a length of less than the remaining current process, then the current process will be removed from execution.

Types of Scheduling Algorithm

  • First Come First Serve
  • Shortest Job first - non-preemptive is when the process cannot be pre-empted until it completes it's CPU burst. Pre-emptive is where if a process arrives with a less remaining time than the currently executing process, the currently executing process is removed from memory
  • Priority Scheduling
  • Round Robin - Each process gets a small amount of CPU time. It is then pre-empted and added to the end of the ready queue.

Memory management

  • Overlays - only keep in memory those instructions that are needed at any given time. Implemented by user, no OS support is needed.
  • Logical / Physical Addressing - The programs see the logical or virtual address. The physical address is only seen by the memory management unit. In the MMU the value of the relocation register is added to the address generated by the user process at the time it is sent to memory.
  • Swapping - process can be swapped out of memory and into a backing store. It is brought back when it needs to be executed.
  • Page Table - held in memory and maps between logical and physical memory.
  • Segmentation - Program is split into logical units such as main program, functions, stacks etc.

Virtual Memory

  • Demand Paging - A page is brought into memory whenever it is needed. When there is a reference to a page a page fault is generated. If the page is not in memory then the page will be swapped into memory. Usually another page will have to be swapped out of memory in order to make way for the new page. There are a number of algorithms to decide which page to swap out of memory.
    • FIFO
    • Optimal - replaces page that will not be used for longest period of time. Used to identify how well other algorithms perform.
    • Least Recently Used

Secondary Storage

There are a number of disk scheduling algorithms.
  • First Come Fist Served
  • Shortest Seek Time First
  • Scan - The disk arm starts at one end of the disk are sweeps across it to the other end servicing requests as it passes them. It then goes back the other way.
  • C-Scan - Treats the cylinder as a circular list that wraps from the last cylinder to the first.
  • C-Look - Arm only goes as far as the last request in each direction, then reverses immediately and goes back.


Software Engineering

Management

  • Team - work towards a collective goal, collective ego
  • Group - individual goals and egos
Belbin's management roles - Chairman, Shaper, Plant, Evaluator, Company worker, Resource instigator, Team worker, Finisher progress

Software Problems

  • Sophisticated hardware has outpaced out ability to use it's potential
  • Our ability to build new programs cannot keep pace with demand
  • Ability to maintain existing programs cannot keep pace with demand.

Software Crisis

  • Schedule and cost estimates are often inaccurate
  • Productivity of software has not kept pace with demand
  • Quality is often inadequate
  • Existing software is difficult to maintain

Risk

Risks involve uncertainty and loss. We can never be 100% sure that they will occur, but if they do then we will experience an unwanted loss.
  • Examine potential damage and probability
  • They affect cost, schedule and performance
  • We need to decide a level at which the risks will cause the project to terminate

Schedules

Slipage, slacktime, Pert and Gantt charts

Requirements

A requirement is something that a customer needs.
  • Functional requirements - Interactions between the system and it's environment
  • Non Functional requirements - Restrictions on system, e.g. specific hardware or software or performance requirements

Cost Estimation

Cocomo produces an estimate of man months and developer schedule in months. It does this by taking LOC and cost drivers into account. There are three types of cocomo
  1. Basic - takes LOC into account
  2. Intermediate - uses LOC and cost drivers
  3. Advanced - Also includes departmental cost breakdown
Each software project can be either Organic (small in-house project), embedded (tight constraints, often tightly coupled to hardware) and semidetached (mixture of the two). There are four types of cost drivers - product, computer, personnel and project.

Design

Design is applying techniques and principles for the purpose of identifying the system in sufficient detail to permit it's physical implementation. It is the foundation for implementation, testing, maintenance and reuse. Design consists of the following concepts
  • Abstraction
  • Refinement
  • Information hiding
  • Structured programming
  • Modularity - It is easier to solve lots of little problems, divide and conquer
  • Reusability (concept supported by all the others)
Cohesion is the degree to which modules perform one, and only one function. Coupling is the degree to which the module is connected to other modules in the system. Design is essential for quality software.

DFD and JSP can be used. In JSP *-iteration 0-option/branch

OO Design

Objects (data structures), operations (specification of process), messages (invocation method). Encapsulation - user of object only needs to know what the object does, not how it does it. Polymorphism - the ability to take several forms.

SCM

This is the tracking and control of activities. It begins when software development starts, and ends when the software is taken out of use. SCM consists of Identifying Change, Controlling Change, Auditing Change and Reporting Change. The baseline is the foundation of SCM.

Testing

Testing is a process that leads to debugging. Verification - Are we designing the system right? Validation - Are we designing the right system. Technical reviews (Walkthroughs)
  • Unit Testing - usually white box, branch coverage, boundary value analysis and equivalence classes. Uses drivers (main programs) and stubs (replace subordinate modules that are not being tested)
  • Integration testing - (design) Mostly black box
  • Validation - (requirements) black box, demonstrates tractability to requirements.
  • System testing - validates software once it has been incorporated into a larger system.
Regression testing - re-doing tests when the program has changed, in order to verify that it is still correct.

The three standard methods of debugging are brute force, backtracking and cause elimination. When testing, code statement coverage, branch coverage and condition coverage can be used.

Metrics

  • McCalls quality factors - Audibility, accuracy, completeness, consistency, data communality, error tolerance, execution efficiency, expandability, generality, hardware independence, reliability, testability, modularity etc.
  • FURPS - functionality, usability, reliability, performance and supportability.
Process Metrics:
  • G/Q/M - Goal/Question/Metrics
  • QFD - Quality Functional Deployment - needed for identifying control points of quality assurance to be carried through the design
  • CMM - Compatibility Maturity Model - 5 levels of process maturity, 1.Initial, Repeatable, Defined, Management, 5.Optimising. The companies are independently assessed.

McCabe's Cyclomatic Number

Number of linear independent paths in control flow graph. V(G) = e-n+2 For programs consisting of more than one function (p functions) V(G) = e-n+(2*p) It is a good idea for the value to be <=7. Need to be able to construct Control Flow Graphs (program graphs)


Projects Home
Copyright © 1997 - 2023

Privacy policy
Hosted by 1&1